I have a Python script that mounts a storage account in Databricks and then installs a wheel from that storage account. I am trying to run it as a cluster init script, but it keeps failing. My script is of the form:

#/databricks/python/bin/python
mount_point = "/mnt/...."
configs = {....}
source = "...."
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)
dbutils.library.install("dbfs:/mnt/.....")
dbutils.library.restartPython()

It works when I run it directly in a notebook, but if I save it to a file called dbfs:/databricks/init_scripts/datalakes/init.py and use it as a cluster init script, the cluster fails to start and the error message says that the init script has a non-zero exit status. I've checked the logs and it appears that it is running as bash instead of Python:

bash: line 1: mount_point: command not found

I have tried running the python script from a bash script called init.bash containing this one line:

/databricks/python/bin/python "dbfs:/databricks/init_scripts/datalakes/init.py"

Then the cluster using init.bash fails to start, with the logs saying it can't find the python file:

/databricks/python/bin/python: can't open file 'dbfs:/databricks/init_scripts/datalakes/init.py': [Errno 2] No such file or directory

Can anyone tell me how I could get this working please?

Related question: Azure Databricks cluster init script - Install wheel from mounted storage

  • The reason your script cannot find the file is that you're using a dbfs:/ path. Replace dbfs:/ with /dbfs/ and it should work (DBFS is mounted at /dbfs on cluster nodes). Commented Apr 29, 2021 at 14:07
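
To illustrate the comment above: a dbfs:/ path is a URI that dbutils and Spark understand, whereas an ordinary process such as the Python interpreter launched from a bash init script only sees the local mount under /dbfs. Below is a minimal sketch of that translation. The helper name `dbfs_uri_to_local_path` is hypothetical and not part of any Databricks API.

```python
def dbfs_uri_to_local_path(uri: str) -> str:
    """Map a dbfs:/ URI to the corresponding path under the /dbfs mount.

    Hypothetical helper for illustration only; it does plain string
    manipulation and does not touch the file system.
    """
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    # Drop the scheme and any extra leading slashes, then re-root at /dbfs.
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


local_path = dbfs_uri_to_local_path(
    "dbfs:/databricks/init_scripts/datalakes/init.py"
)
print(local_path)  # /dbfs/databricks/init_scripts/datalakes/init.py
```

With that path, the one-line init.bash from the question would call /databricks/python/bin/python "/dbfs/databricks/init_scripts/datalakes/init.py". Note that dbutils is provided by the notebook environment, so the script would likely still fail at the dbutils calls when run this way.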

1 Answer

The solution I went with was to run a notebook which mounts the storage and creates a bash init script that just installs the wheel. Something like this:

mount_point = "/mnt/...."
configs = {....}
source = "...."
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)

dbutils.fs.put("dbfs:/databricks/init_scripts/datalakes/init.bash", """
/databricks/python/bin/pip install "../../../dbfs/mnt/package-source/parser-3.0-py3-none-any.whl"
""", True)
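
A variation on the approach above, as a rough sketch rather than the exact code I ran: build the init script text in a small function and hand it to a dbutils.fs.put-style writer. The function names `build_init_script` and `write_init_script` and the stand-in `fake_put` are hypothetical, and the absolute /dbfs/mnt/... wheel path is an assumption based on DBFS being mounted at /dbfs on cluster nodes (the snippet above uses a relative path instead).

```python
from typing import Callable, Dict


def build_init_script(wheel_path: str) -> str:
    """Return the text of a bash init script that pip-installs one wheel."""
    return (
        "#!/bin/bash\n"
        f'/databricks/python/bin/pip install "{wheel_path}"\n'
    )


def write_init_script(
    put: Callable[[str, str, bool], object],
    script_path: str,
    wheel_path: str,
) -> str:
    """Write the script using a callable shaped like dbutils.fs.put
    (path, contents, overwrite) and return the text that was written."""
    contents = build_init_script(wheel_path)
    put(script_path, contents, True)  # True = overwrite an existing script
    return contents


# Outside Databricks, a stand-in writer records what would be written.
written: Dict[str, str] = {}


def fake_put(path: str, contents: str, overwrite: bool) -> bool:
    written[path] = contents
    return True


write_init_script(
    fake_put,
    "dbfs:/databricks/init_scripts/datalakes/init.bash",
    "/dbfs/mnt/package-source/parser-3.0-py3-none-any.whl",  # assumed path
)
print(written["dbfs:/databricks/init_scripts/datalakes/init.bash"])
```

In a notebook, passing dbutils.fs.put in place of fake_put should write the same text to DBFS. The mount itself still has to be created from the notebook first, as in the snippet above.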