
I'm working on Azure Databricks. My PySpark project currently lives on DBFS, and I configured a spark-submit job to execute my PySpark code (a .py file). However, according to the Databricks documentation, spark-submit jobs can only run on new automated clusters (probably by design).

Is there a way to run my PySpark code on an existing interactive cluster?

I also tried running the spark-submit command from a notebook in a %sh cell, to no avail.
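For reference, a spark-submit job like this can be created through the Jobs API roughly as follows (the workspace URL, token, cluster settings, and script path below are placeholders, not my actual configuration):

    import requests

    # Placeholders -- substitute your own workspace URL, PAT token, and script path
    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
    TOKEN = "<personal-access-token>"

    job_spec = {
        "name": "pyspark-spark-submit-job",
        # spark-submit tasks only accept a new automated cluster definition
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
        "spark_submit_task": {
            "parameters": ["dbfs:/path/to/my_script.py"]
        },
    }

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_spec,
    )
    resp.raise_for_status()
    print(resp.json())  # returns the new job_id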

1 Answer


By default, when you create a job, the cluster type is selected as "New Automated cluster".

You can configure the cluster type to choose between an automated cluster or an existing interactive cluster.

Steps to configure a job:

Select the job => click on the cluster => click the Edit button => select "Existing Interactive Cluster" and choose the cluster you want the job to run on.
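If you want to make the same change programmatically, you can reset the job settings through the Jobs API to point at an existing cluster. A rough sketch, assuming a Python-file task (the host, token, cluster id, job id, and file path are placeholders):

    import requests

    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"                                  # placeholder

    # Overwrite the job's settings so it runs on an existing interactive cluster
    new_settings = {
        "name": "pyspark-job-on-interactive-cluster",
        "existing_cluster_id": "<cluster-id from the Clusters page>",  # placeholder
        "spark_python_task": {
            "python_file": "dbfs:/path/to/my_script.py"                # placeholder
        },
    }

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/jobs/reset",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"job_id": 123, "new_settings": new_settings},  # 123 is a placeholder job id
    )
    resp.raise_for_status()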



3 Comments

Hi. That only works if the job type is a notebook task. It doesn't work if it's a spark-submit job. Please change the job to execute a spark-submit task to see the difference.
Hi, you can configure and execute PySpark code with spark-submit using jobs.
Hi, thanks for the reply. Yes, we currently have a spark-submit job scheduled to run PySpark code, but it seems spark-submit jobs cannot be run on existing interactive clusters. Is there a way to run spark-submit on an existing cluster?
