Is there some sort of adaptor that allows querying a postgresql database like it was a pandas dataframe?
2 Answers
Update (16th March 2016)
It is possible, but you would need a compiler that evaluates your dataframe-style expressions and translates them into SQL.
The fact that SQL is a higher-level language, and that a DBMS interprets SQL with regard not only to the query but also to the data and its distribution, makes this really hard to do performantly.
Wes McKinney is trying to do this with the Ibis project and has a nice write-up about some of the challenges.
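For a rough feel of the approach, here is a minimal Ibis sketch; the connection parameters, the orders table, and its columns are made up, and the exact API may vary between Ibis versions:

import ibis

# Connect to PostgreSQL (hypothetical credentials)
con = ibis.postgres.connect(
    host='localhost', database='test', user='scott', password='tiger'
)

# Build a deferred expression against a hypothetical orders table;
# nothing touches the database yet
orders = con.table('orders')
big = orders.filter(orders.amount > 100)
expr = big.group_by('customer_id').aggregate(total=big.amount.sum())

# Ibis compiles the expression to SQL, runs it on the server,
# and returns the result as a pandas DataFrame
df = expr.execute()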
Previous post
Unfortunately that's not possible, because SQL is a higher-level language than Python.
With pandas you specify both what you want and how to compute it, whereas with SQL you only specify what you want; the database server is then free to decide how to serve the query. When you add an index to a table, the server can use it to answer the same query faster without you changing a thing.
If, instead, you instructed the database exactly how to execute your query, you would have to rewrite your statements every time you wanted them to take advantage of a new index.
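To make the point concrete, here is a sketch with a hypothetical orders table: the SELECT is identical before and after the index exists, and only the server's execution strategy changes.

import pandas
from sqlalchemy import create_engine, text

engine = create_engine('postgresql+pg8000://scott:tiger@localhost/test')

query = 'SELECT * FROM orders WHERE customer_id = 42;'

# Without an index the server has to scan the table
df_before = pandas.read_sql(query, con=engine)

# Add an index; engine.begin() commits the DDL when the block exits
with engine.begin() as conn:
    conn.execute(text('CREATE INDEX ON orders (customer_id);'))

# The exact same query can now be served via the index,
# with no change on our side
df_after = pandas.read_sql(query, con=engine)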
That being said, I commonly use the pattern in neurite's answer for analysis: use SQL for the initial aggregation (to reduce the size of the data) and then do the remaining work in pandas.
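For example, a minimal sketch of that workflow (the orders table and its columns are again made up):

import pandas
from sqlalchemy import create_engine

engine = create_engine('postgresql+pg8000://scott:tiger@localhost/test')

# Aggregate on the server so only the (much smaller) summary
# travels over the wire
query = '''
    SELECT customer_id,
           date_trunc('month', order_date) AS month,
           SUM(amount) AS total
    FROM orders
    GROUP BY customer_id, month;
'''
monthly = pandas.read_sql(query, con=engine)

# Do the rest in pandas, e.g. one column per customer
wide = monthly.pivot(index='month', columns='customer_id', values='total')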
Not sure if this is exactly what you want, but you can load Postgres tables into pandas and manipulate them from there.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html
http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
Shamelessly stolen from the pages referenced above:
import pandas
from sqlalchemy import create_engine

# Create a SQLAlchemy engine for the PostgreSQL database
engine = create_engine(
    'postgresql+pg8000://scott:tiger@localhost/test',
    isolation_level='READ UNCOMMITTED'
)

# Load the query result into a pandas DataFrame
df = pandas.read_sql('SELECT * FROM <TABLE>;', con=engine)
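read_sql returns the result as a regular pandas DataFrame (replace <TABLE> with your table name). One caveat: PostgreSQL does not actually implement READ UNCOMMITTED and silently treats it as READ COMMITTED, so the isolation_level argument above, copied from the SQLAlchemy docs, is illustrative rather than essential.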