
We just switched from Scala over to Python. I've got a dataframe that I need to push into SQL Server. I've done this multiple times before, using the Scala code below.

import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Column metadata: ordinal position, column name, JDBC type, precision, scale
var bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "Title", java.sql.Types.NVARCHAR, 128, 0)
bulkCopyMetadata.addColumnMetadata(2, "FirstName", java.sql.Types.NVARCHAR, 50, 0)
bulkCopyMetadata.addColumnMetadata(3, "LastName", java.sql.Types.NVARCHAR, 50, 0)

val bulkCopyConfig = Config(Map(
  "url"               -> "mysqlserver.database.windows.net",
  "databaseName"      -> "MyDatabase",
  "user"              -> "username",
  "password"          -> "*********",
  "dbTable"           -> "dbo.Clients",
  "bulkCopyBatchSize" -> "2500",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))

df.bulkCopyToSqlDB(bulkCopyConfig, bulkCopyMetadata)

That's documented here.

https://learn.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector

I'm looking for an equivalent Python script to do the same job. I searched, but didn't come across anything. Does someone here have something that would do the job? Thanks.


2 Answers


Please refer to the official PySpark documentation, JDBC To Other Databases, to write a PySpark dataframe directly to SQL Server via the MS SQL Server JDBC driver.

Here is the sample code.

# Note the trailing backslashes: without line continuations this chained
# call is a Python syntax error
spark_jdbcDF.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://yourserver.database.windows.net:1433") \
    .option("dbtable", "<your table name>") \
    .option("user", "username") \
    .option("password", "password") \
    .save()
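Two additions worth making in practice (a sketch with placeholder names, not the only way to do it): the default save mode errors out if the target table already exists, so set it explicitly, and it helps to name the database in the URL and the SQL Server JDBC driver class directly.

spark_jdbcDF.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://yourserver.database.windows.net:1433;databaseName=<your database>") \
    .option("dbtable", "<your table name>") \
    .option("user", "username") \
    .option("password", "password") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("append") \
    .save()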

Or

jdbcUrl = "jdbc:sqlserver://{0}:{1};databaseName={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
  "user" : jdbcUsername,
  "password" : jdbcPassword,
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
# DataFrameWriter.jdbc() performs the write itself and returns None,
# so do not chain .save() after it
spark_jdbcDF.write \
    .jdbc(url=jdbcUrl, table="<your table name>",
          properties=connectionProperties)
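If you want to sanity-check the write, spark.read.jdbc() is the standard counterpart and takes the same URL and properties (this assumes a SparkSession named spark, and the same placeholder table name as above):

# Read the table back to confirm the rows landed
check_df = spark.read.jdbc(url=jdbcUrl, table="<your table name>",
                           properties=connectionProperties)
check_df.show(5)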

Hope it helps.


2 Comments

I have tried the same, but I am getting an AttributeError: 'NoneType' object has no attribute 'save' error.
Were you able to resolve it?
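That AttributeError is what happens when .save() is chained after .jdbc(): DataFrameWriter.jdbc() performs the write itself and returns None, so calling .save() on the result fails. Drop the trailing .save() and the write goes through.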

Here is the complete PySpark code to write a Spark DataFrame to a SQL Server database, including where to put the database name and schema name:

df.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://<servername>:1433;databaseName=<databasename>") \
    .option("dbtable", "[<optional_schema_name>].<table_name>") \
    .option("user", "<user_name>") \
    .option("password", "<password>") \
    .save()
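If you also need the bulk-copy behaviour from the Scala snippet in the question (batch size, table lock), the plain JDBC writer above doesn't expose those knobs. The Apache Spark connector for SQL Server, the successor to the azure-sqldb-spark library used in the question, does, through its own data source format. A sketch only, assuming that connector is installed on your cluster; verify the option names against the version you deploy:

# Requires the com.microsoft.sqlserver.jdbc.spark connector on the cluster
(df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("append")
    .option("url", "jdbc:sqlserver://mysqlserver.database.windows.net:1433;databaseName=MyDatabase")
    .option("dbtable", "dbo.Clients")
    .option("user", "username")
    .option("password", "*********")
    .option("batchsize", "2500")   # counterpart of bulkCopyBatchSize
    .option("tableLock", "true")   # counterpart of bulkCopyTableLock
    .save())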

