I am running a very simple query from Python against a table in a Snowflake database, using the package snowflake-connector-python==2.3.3 installed with the additional [pandas] extra. I containerized my Python app using the python:3.7.0-slim image, and my script is extremely simple:
from snowflake import connector
import os
ctx = connector.connect(
    user=os.environ['USER'],
    password=os.environ['PASSWORD'],
    account=os.environ['ACCOUNT'],
    warehouse=os.environ['WAREHOUSE'],
    database=os.environ['DATABASE'],
    schema=os.environ['SCHEMA'])
cur = ctx.cursor()
# Execute a statement that will generate a result set.
sql = "SELECT * FROM MY_TABLE ORDER BY MY_COLUMN"
print("executing query: " + sql)
cur.execute(sql)
df = cur.fetch_pandas_all()
The actual table size, according to Snowflake, is 3.3 GB. However, when I run this app it crashes because it uses over 9 GB of RAM. I know this because I'm running it in a Kubernetes cluster and the pod is evicted with a message saying it used 9535336Ki of memory. Is there something I'm missing here? How can the memory usage be 3x the table size?
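In case it helps, here is a minimal sketch of how I would inspect where the memory actually goes, assuming the fetch completes somewhere with enough RAM (report_memory is just a name I made up for diagnosis, not part of my app):

import pandas as pd

def report_memory(df: pd.DataFrame) -> None:
    # deep=True counts the bytes actually held by object (string) columns,
    # which is typically much larger than the compressed size Snowflake reports
    total_bytes = df.memory_usage(deep=True).sum()
    print("DataFrame size in RAM: {:.2f} GiB".format(total_bytes / 1024**3))
    print(df.dtypes)  # per-column dtypes, e.g. object vs int64 vs float64

# report_memory(df)  # hypothetical call; only feasible where the full result fits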
Comments:

Commenter: SELECT * FROM. See why. Try selecting only the exact columns the app needs.

Asker: It's acceptable to use SELECT * when there is an explicit need for every column in the table(s) involved, and I need every column. It also still doesn't answer the question: the table itself is 3.3 GB, yet my container grows beyond 9 GB, so why is that?

Commenter: cur.fetch_pandas_all(). Try removing it, for debugging purposes, to isolate the problematic line. The docs indicate it is faster than pandas' read_sql, but I wonder.
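Following the comment about isolating fetch_pandas_all(), a minimal sketch of the alternative I am considering, assuming this connector version exposes fetch_pandas_batches() and that downstream processing can work on one chunk at a time (process_chunk is a hypothetical placeholder):

cur.execute(sql)
for chunk in cur.fetch_pandas_batches():
    # each chunk is a smaller pandas DataFrame, so peak memory is roughly
    # one chunk plus whatever process_chunk() retains
    process_chunk(chunk)  # hypothetical downstream handler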