
I want to extract data from a PostgreSQL database and use it as a pandas DataFrame in a script. Here's my initial attempt:

from pandas import DataFrame
import psycopg2

conn = psycopg2.connect(host=host_address, database=name_of_database, user=user_name, password=user_password)

cur = conn.cursor()

cur.execute("SELECT * FROM %s;" % name_of_table)

the_data = cur.fetchall()

colnames = [desc[0] for desc in cur.description]

the_frame = DataFrame(the_data)
the_frame.columns = colnames

cur.close()
conn.close()

Note: I am aware that I should not use "string parameters interpolation (%) to pass variables to a SQL query string", but this works great for me as it is.
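
One way to keep the interpolation while narrowing the risk is to check the table name before it reaches the query string. The sketch below is a minimal guard under stated assumptions: the helper names `checked_table_name` and `select_all_query` and the allowed-identifier pattern are hypothetical, not part of psycopg2 or pandas. psycopg2 also provides a `psycopg2.sql` module with `sql.Identifier` for composing identifiers, which is probably the more robust route when it is available.

```python
import re

# Hypothetical rule: accept only plain, unquoted SQL identifiers
# (a letter or underscore first, then letters, digits, or underscores).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def checked_table_name(name):
    """Return name unchanged if it looks like a plain identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe table name: %r" % (name,))
    return name


def select_all_query(name_of_table):
    """Build the same query string as in the question, after validation."""
    return "SELECT * FROM %s;" % checked_table_name(name_of_table)
```

With this in place, `cur.execute(select_all_query(name_of_table))` would behave like the original call for ordinary table names and raise `ValueError` for anything containing spaces, quotes, or semicolons.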

Would there be a more direct approach to this?

Edit: Here's what I used from the selected answer:

import pandas as pd
import sqlalchemy as sq

engine = sq.create_engine("postgresql+psycopg2://username:password@host:port/database")

the_frame = pd.read_sql_table(name_of_table, engine)
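
If the password contains characters such as `@` or `/`, the connection URL above can be misparsed unless the credentials are percent-encoded. The following is a small sketch of assembling the URL with the standard library; the helper name `build_postgres_url` and all the sample values are placeholders of mine, not anything from the answer. Recent SQLAlchemy versions also offer `sqlalchemy.engine.URL.create`, which may be preferable.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port, database):
    """Assemble a postgresql+psycopg2 URL, percent-encoding the credentials."""
    return "postgresql+psycopg2://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, port, database,
    )


# Placeholder values for illustration only.
url = build_postgres_url("user_name", "p@ss/word", "localhost", 5432, "name_of_database")
print(url)
# prints: postgresql+psycopg2://user_name:p%40ss%2Fword@localhost:5432/name_of_database
```

The resulting string can then be passed to `sq.create_engine(url)` as in the snippet above.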

2 Answers


Pandas can load data from Postgres directly:

import psycopg2
import pandas.io.sql as pdsql

conn = psycopg2.connect(...)

the_frame = pdsql.read_frame("SELECT * FROM %s;" % name_of_table, conn)

If you have a recent pandas (>=0.14), use read_sql_query or read_sql_table with an SQLAlchemy engine instead, since read_frame is deprecated:

import pandas as pd
import sqlalchemy
import psycopg2

engine = sqlalchemy.create_engine("postgresql+psycopg2://...")

# Either run an arbitrary query...
the_frame = pd.read_sql_query("SELECT * FROM %s;" % name_of_table, engine)
# ...or load the whole table by name
the_frame = pd.read_sql_table(name_of_table, engine)

5 Comments

You don't need that deeper import anymore; pandas.read_sql_query is now available from a top-level import.
There's also pandas.read_sql_table, which I believe will serve the OP even better.
@PaulH: Thanks for that. I'll leave my answer as it is though, to avoid depending on very new Pandas (my personal one is too old for read_sql_query, and it isn't that old).
@JohnZwinck I added Paul H's suggestion (but kept the old version, so you have both). Is that OK? I can also post it as a separate answer if you prefer.
@JohnZwinck If I use create_engine('postgresql+psycopg2://postgres@ip_address/table_name'), the call to pd.read_sql_table('table_name', engine) returns a NotImplementedError "read_sql_table only support for SQLAlchemy connectable". I tested engine.has_table('table_name') and it returns true. Why does Pandas think I'm not using an sqlalchemy connectable?
3

Here is an alternate method:

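    from pandas import DataFrame

    # Run the SQL; conn is assumed to be an open SQLAlchemy connection
    result = conn.execute(sql)

    # Load the rows into a DataFrame, using the result's column names
    df = DataFrame(data=list(result), columns=result.keys())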

2 Comments

Note that this wastefully constructs a list that isn't needed. If the table is large, that will hurt performance.
You can, however, do df = DataFrame(iter(result), columns=result.keys()) which is not quite as wasteful.
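
To make the list-versus-iterator point concrete, here is a rough sketch that uses the standard library's sqlite3 module purely as a stand-in DB-API driver, so it runs without a PostgreSQL server. The `demo` table and its rows are made up. It shows column names taken from the cursor description and rows pulled from the cursor one at a time; whether pandas materializes the rows internally when given an iterator is outside what this sketch demonstrates.

```python
import sqlite3

# sqlite3 is only a stand-in driver here; the table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", [(1, "a"), (2, "b")])

cur = conn.execute("SELECT * FROM demo ORDER BY id")

# Column names come from the cursor description, as in the question.
columns = [desc[0] for desc in cur.description]

# Iterating the cursor yields rows one at a time; no intermediate list
# exists until something consumes the iterator.
row_iter = iter(cur)
first_row = next(row_iter)
remaining = list(row_iter)

conn.close()
```

In the DataFrame case, `columns` and the row iterator would play the roles of `result.keys()` and `iter(result)` from the comment above.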
