I wrote a script that iterates through a large database table (~150K rows). To avoid using too much memory I'm using a windowed_query helper (a sketch of it follows the script below). My script goes something like this:
query = db.query(Table)
count = 0
for row in windowed_query(query, Table.id, 1000):
    points = 0
    # +100 points for a logo
    if row.logo_id:
        points += 100
    # +10 points for each image
    points += 10 * len(row.images)  # images is a SQLAlchemy one-to-many relationship
    # ...the script continues with much of the same...
    row.points = points
    db.add(row)
    count += 1
    if count % 100 == 0:
        db.commit()
        print count
db.commit()  # final commit for the last partial batch
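For reference, my windowed_query is modeled on the WindowedRangeQuery recipe from the SQLAlchemy wiki. A minimal sketch of that kind of helper (not my exact code) looks like this:

from sqlalchemy import and_, func

def column_windows(session, column, windowsize):
    # Generate WHERE clauses that carve column's values into
    # windows of roughly windowsize rows each.
    def int_for_range(start_id, end_id):
        if end_id is not None:
            return and_(column >= start_id, column < end_id)
        return column >= start_id

    # Number each row, then keep every windowsize-th value as a boundary.
    q = session.query(
        column,
        func.row_number().over(order_by=column).label('rownum')
    ).from_self(column)
    if windowsize > 1:
        q = q.filter("rownum %% %d=1" % windowsize)

    intervals = [id for id, in q]
    while intervals:
        start = intervals.pop(0)
        end = intervals[0] if intervals else None
        yield int_for_range(start, end)

def windowed_query(q, column, windowsize):
    # Run q one window at a time, so only ~windowsize rows
    # are loaded per SELECT.
    for whereclause in column_windows(q.session, column, windowsize):
        for row in q.filter(whereclause).order_by(column):
            yield row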
When I run it on a CentOS server, it makes it through about 9000 rows before being killed by the kernel for using ~2GB of memory.
On my Mac development environment it works like a charm, even though it's running exactly the same versions of Python (2.7.3), SQLAlchemy (0.7.8), and psycopg2 (2.4.5).
I did some simple debugging with memory_profiler. On Linux, every piece of code that queried the database increased memory usage by a small amount, and the growth never stopped. On the Mac the same thing happened, but after growing by ~4MB it leveled off. It's as if nothing is being garbage collected on Linux. (I even tried running gc.collect() every 100 rows; it did nothing.)
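Here's roughly the kind of per-batch check I ran. This sketch uses the stdlib resource module instead of my exact memory_profiler setup, but it shows the same pattern (note the unit difference between Linux and OS X):

import gc
import resource
import sys

def rss_mb():
    # Peak resident set size in MB. ru_maxrss is reported in
    # kilobytes on Linux but in bytes on OS X.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == 'darwin':
        return rss / (1024.0 * 1024.0)
    return rss / 1024.0

for count, row in enumerate(windowed_query(query, Table.id, 1000), 1):
    # ...scoring code as above...
    if count % 100 == 0:
        gc.collect()  # made no measurable difference on Linux
        print count, rss_mb()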
Does anybody have a clue what is happening?
Note: with windowed_query the largest single query is 1000 rows, so even if fetchone were using as much memory as fetchall, that wouldn't explain 2GB of memory use.