
Going through the AWS Glue docs I can't see any mention of how to connect to a Postgres RDS via a Glue job of "Python shell" type. I've set up an RDS connection in AWS Glue and verified I can connect to my RDS instance. Also, when creating the Python job I can see my connection, and I've added it to the script.

How do I use the connection which I've added to the Glue job to run some raw SQL?

Thanks in advance,

  • Did you have any luck with it? Commented Jun 12, 2019 at 14:06

1 Answer


There are two possible ways to access data from RDS in Glue ETL (Spark):

1st Option:

  • Create a Glue connection on top of the RDS instance.
  • Create a Glue crawler on top of the connection created in the first step.
  • Run the crawler to populate the Glue Data Catalog with a database and table pointing to the RDS tables.
  • Create a dynamic frame in the Glue ETL script using the newly created database and table in the Glue Data Catalog.

Code sample:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())
# Read the RDS table through the Glue Data Catalog database/table created by the crawler
DyF = glueContext.create_dynamic_frame.from_catalog(database="{{database}}", table_name="{{table_name}}")
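
If the goal is to run raw SQL once the data has been loaded, the DynamicFrame can be converted to a Spark DataFrame and queried with Spark SQL. A minimal sketch, assuming the glueContext and DyF variables from the sample above (the view name is arbitrary); note the SQL runs in Spark over the loaded data, not inside Postgres:

# Convert the DynamicFrame to a Spark DataFrame and expose it to Spark SQL
df = DyF.toDF()
df.createOrReplaceTempView("my_rds_table")  # arbitrary view name

spark = glueContext.spark_session
spark.sql("SELECT count(*) FROM my_rds_table").show()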

2nd Option:

Create a DataFrame using Spark's JDBC reader:

url = "jdbc:postgresql://<rds_host_name>/<database_name>"
properties = {
"user" : "<username>",
"password" : "<password>"
}
df = spark.read.jdbc(url=url, table="<schema.table>", properties=properties)
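
If the aim is to push a raw SELECT down to Postgres rather than read the whole table, Spark's JDBC reader also accepts a parenthesized, aliased subquery in place of a table name. A sketch assuming the same url and properties as above; the query itself is only an illustration:

# Run an arbitrary SELECT on the Postgres side and read the result back as a DataFrame
raw_sql = "(SELECT id, name FROM <schema.table> WHERE id > 100) AS src"
df_filtered = spark.read.jdbc(url=url, table=raw_sql, properties=properties)
df_filtered.show()

This only covers SELECT statements; DDL such as CREATE cannot be executed through the JDBC reader.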

Notes:

  • You will need to pass the Postgres JDBC driver jar to Spark in order to create the DataFrame through the JDBC reader (see the sketch below).
  • I have tried the first method on Glue ETL and the second method on a Python shell (dev endpoint).
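
One way to make the driver available is to point Spark at a downloaded driver jar when the SparkSession is built. A sketch only; the jar path and version below are placeholders, and on a dev endpoint the jar can instead be passed with spark-submit --jars as mentioned in the comments (in a Glue Spark job it can be supplied through the job's dependent JARs path / --extra-jars setting):

from pyspark.sql import SparkSession

# Placeholder path: point this at wherever the Postgres JDBC jar was downloaded
spark = (
    SparkSession.builder
    .config("spark.jars", "/path/to/postgresql-42.2.x.jar")
    .getOrCreate()
)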

4 Comments

I want to be able to execute raw SQL queries, such as CREATE .... From my understanding, that's not possible in the above case. :/
@Harsh "You will need to pass the Postgres JDBC driver jar" - how would I do this?
@t_warsop: You will need to SSH to the dev endpoint, download the Postgres JDBC jar, and pass it with your spark-submit command. I couldn't figure out a better way for dev endpoints.
@mcm: You can use Spark's SQLContext to execute the CREATE command: sqlContext.sql(query).
