
I'm running a SQL query using cx_Oracle in Python, and the result of the query is a list. The dimensions of the list are 180 columns × 200,000+ rows. Whenever I try to convert it into a DataFrame using pd.DataFrame, I run into a MemoryError. For now, as a workaround, I've tried to break down my query by adding filters, or querying only a few columns, etc., which works. But if I change some filters I run into the error again, and I can't always be sure how many rows a query will return.

So I'm looking for any alternative data structures/libraries/packages that can be used, or any way I can handle this within Pandas. Since I'm doing data analysis with Pandas, I would prefer to handle this in Pandas rather than with another library.

The fields in the list are either float, string, or timestamp format.
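Roughly, what I'm doing currently looks like this (connection details, table name, and query are placeholders):

import cx_Oracle
import pandas as pd

# Fetch everything at once, then build the DataFrame; this holds two full
# copies of the data in memory (the list of tuples and the DataFrame).
conn = cx_Oracle.connect('user/password@host:1521/service_name')
cursor = conn.cursor()
cursor.execute('select * from some_table')
rows = cursor.fetchall()    # ~200,000+ rows x 180 columns as a list of tuples
df = pd.DataFrame(rows)     # MemoryError happens here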

1 Answer


Try to read data directly into Pandas DataFrame:

import cx_Oracle   # pip install cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

# SQLAlchemy connection string: oracle://user:password@host:port/service_name
engine = create_engine('oracle://user:password@host_or_scan_address:1521/ORACLE_SERVICE_NAME')

# read_sql streams the result set straight into a DataFrame,
# skipping the intermediate Python list of tuples
df = pd.read_sql('select * from table_name where ...', engine)

PS you may also want to make use of the chunksize parameter...
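A minimal sketch of chunked reading, assuming the same connection string as above and an illustrative filter column (some_column is a placeholder):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('oracle://user:password@host_or_scan_address:1521/ORACLE_SERVICE_NAME')

# With chunksize, read_sql returns an iterator of DataFrames instead of one
# big frame, so each chunk can be filtered/aggregated and released before
# the next one is fetched.
chunks = pd.read_sql('select * from table_name where ...', engine, chunksize=50000)

results = []
for chunk in chunks:
    # Keep only the rows you actually need from each chunk
    results.append(chunk[chunk['some_column'] > 0])

df = pd.concat(results, ignore_index=True)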


2 Comments

Thank you, I was able to query larger sizes using this. Is there a limit on the number of rows one can query using SQLAlchemy, or any form of restriction you're aware of?
@user23564, AFAIK it's just your RAM size ;-)
