
We just switched from Scala over to Python. I've got a dataframe that I need to push into SQL Server. I've done this multiple times before, using the Scala code below.

import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Column metadata: ordinal position, column name, JDBC type, precision, scale
var bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "Title", java.sql.Types.NVARCHAR, 128, 0)
bulkCopyMetadata.addColumnMetadata(2, "FirstName", java.sql.Types.NVARCHAR, 50, 0)
bulkCopyMetadata.addColumnMetadata(3, "LastName", java.sql.Types.NVARCHAR, 50, 0)

val bulkCopyConfig = Config(Map(
  "url"               -> "mysqlserver.database.windows.net",
  "databaseName"      -> "MyDatabase",
  "user"              -> "username",
  "password"          -> "*********",
  "dbTable"           -> "dbo.Clients",
  "bulkCopyBatchSize" -> "2500",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))

df.bulkCopyToSqlDB(bulkCopyConfig, bulkCopyMetadata)

That's documented here.

https://learn.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector

I'm looking for an equivalent Python script to do the same job. I searched, but didn't come across anything. Does someone here have something that would do the job? Thanks.


2 Answers


Please refer to the official PySpark documentation, JDBC To Other Databases, to write a PySpark dataframe directly to SQL Server via the MS SQL Server JDBC driver.

Here is the sample code.

# Note the trailing backslashes: without line continuations this chained
# call is a Python syntax error
spark_jdbcDF.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://yourserver.database.windows.net:1433") \
    .option("dbtable", "<your table name>") \
    .option("user", "username") \
    .option("password", "password") \
    .save()
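Two additions worth making in practice (a sketch with placeholder names, not the only way to do it): the default save mode errors out if the target table already exists, so set it explicitly, and it helps to name the database in the URL and the SQL Server JDBC driver class directly.

spark_jdbcDF.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://yourserver.database.windows.net:1433;databaseName=<your database>") \
    .option("dbtable", "<your table name>") \
    .option("user", "username") \
    .option("password", "password") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("append") \
    .save()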

Or

jdbcUrl = "jdbc:sqlserver://{0}:{1};databaseName={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
  "user" : jdbcUsername,
  "password" : jdbcPassword,
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
# DataFrameWriter.jdbc() performs the write itself and returns None,
# so do not chain .save() after it
spark_jdbcDF.write \
    .jdbc(url=jdbcUrl, table="<your table name>",
          properties=connectionProperties)
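If you want to sanity-check the write, spark.read.jdbc() is the standard counterpart and takes the same URL and properties (this assumes a SparkSession named spark, and the same placeholder table name as above):

# Read the table back to confirm the rows landed
check_df = spark.read.jdbc(url=jdbcUrl, table="<your table name>",
                           properties=connectionProperties)
check_df.show(5)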

Hope it helps.


2 Comments

I have tried the same, but I am getting an AttributeError: 'NoneType' object has no attribute 'save' error.
Were you able to resolve it?
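That AttributeError is what happens when .save() is chained after .jdbc(): DataFrameWriter.jdbc() performs the write itself and returns None, so calling .save() on the result fails. Drop the trailing .save() and the write goes through.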

Here is the complete PySpark code to write a Spark DataFrame to a SQL Server database, including where to put the database name and schema name:

df.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://<servername>:1433;databaseName=<databasename>") \
    .option("dbtable", "[<optional_schema_name>].<table_name>") \
    .option("user", "<user_name>") \
    .option("password", "<password>") \
    .save()
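If you also need the bulk-copy behaviour from the Scala snippet in the question (batch size, table lock), the plain JDBC writer above doesn't expose those knobs. The Apache Spark connector for SQL Server, the successor to the azure-sqldb-spark library used in the question, does, through its own data source format. A sketch only, assuming that connector is installed on your cluster; verify the option names against the version you deploy:

# Requires the com.microsoft.sqlserver.jdbc.spark connector on the cluster
(df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("append")
    .option("url", "jdbc:sqlserver://mysqlserver.database.windows.net:1433;databaseName=MyDatabase")
    .option("dbtable", "dbo.Clients")
    .option("user", "username")
    .option("password", "*********")
    .option("batchsize", "2500")   # counterpart of bulkCopyBatchSize
    .option("tableLock", "true")   # counterpart of bulkCopyTableLock
    .save())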

