
I wrote a script that iterates through a large database table (~150K rows). To avoid using too much memory, I'm using a windowed_query method (sketched after the script below). My script goes something like this:

query = db.query(Table)

count = 0
for row in windowed_query(query, Table.id, 1000):

    points = 0

    # +100 points for a logo
    if row.logo_id:
        points += 100

    # +10 points for each image
    points += 10 * len(row.images)  # images is a SQLAlchemy one-to-many relationship

    # ...the script continues with much of the same...

    row.points = points
    db.add(row)

    count += 1
    if count % 100 == 0:
        db.commit()
        print count

db.commit()  # commit any remaining rows
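
For reference, here is a rough sketch of what windowed_query does (a simplified keyset-pagination version; the names match the call above, but the body is illustrative rather than my exact helper):

def windowed_query(q, column, windowsize):
    # Fetch the query in chunks of `windowsize`, keyed on a unique,
    # ordered column, so only one window of rows is in memory at a time.
    last_value = None
    while True:
        page = q
        if last_value is not None:
            page = page.filter(column > last_value)
        rows = page.order_by(column).limit(windowsize).all()
        if not rows:
            break
        for row in rows:
            yield row
        last_value = getattr(rows[-1], column.key)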

When I run it on a CentOS server, it makes it through 9000 rows before the kernel kills it for using ~2GB of memory.

On my Mac development environment, it works like a charm, even though it's running on exactly the same version of Python (2.7.3), SQLAlchemy (0.7.8), and psycopg2 (2.4.5).

I used memory_profiler for some simple debugging: on Linux, each piece of code that queries the database increased memory usage by a small amount, and the growth never stopped. On Mac, the same thing happened, but after growing ~4MB it leveled off. It's as if nothing is being garbage collected on Linux. (I even tried running gc.collect() every 100 rows; it did nothing.)
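
For the record, here's a sketch of one way to take such measurements with memory_profiler (not necessarily exactly how I ran it; memory_usage(-1, ...) samples the current process's resident memory in MiB):

from memory_profiler import memory_usage

def rss_mib():
    # Sample the resident set size of the current process (-1), in MiB;
    # memory_usage returns a list of samples, so take the first one.
    return memory_usage(-1, interval=0.1, timeout=0.2)[0]

# e.g. inside the loop above:
#     if count % 100 == 0:
#         db.commit()
#         print count, rss_mib()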

Does anybody have a clue what is happening?

  • Perhaps you're suffering from this bug? velocityreviews.com/forums/… Commented Feb 10, 2013 at 18:59
  • But with windowed_query the largest query is 1000 rows, so fetchone using the memory of fetchall doesn't explain 2GB of memory use. Commented Feb 10, 2013 at 19:37
  • Aaahhhh... I figured it out. I'm using Pyramid, and had the debug toolbar enabled. After disabling it, the memory usage plateaued at 73MB. Problem solved! Commented Feb 10, 2013 at 19:43
  • @TheronLuhn Hey, glad you solved this! Since your question is solved, feel free to post an answer to your own question and mark it as "Accepted" (click the big green checkmark). This helps other people know that your problem was solved :) Commented Dec 24, 2013 at 0:46
  • Alright, I will do that. Commented Dec 24, 2013 at 14:59

1 Answer


It turns out Pyramid's debugtoolbar was enabled, which was the reason for the high memory use. I disabled it and the script worked like a charm.
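
For anyone who hits the same thing: in a stock Pyramid scaffold the toolbar is enabled through pyramid.includes in the development .ini, so disabling it is just removing that entry (a sketch; your config file name and layout may differ):

# development.ini (stock Pyramid scaffold)
[app:main]
# ...
# was:  pyramid.includes = pyramid_debugtoolbar
pyramid.includes =

The toolbar keeps per-request and per-query debugging data around so it can display it later, which is harmless for a single web request but accumulates over a 150K-row batch script.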


1 Comment

Yup, most debug toolbars keep the SQL that was sent to the server for later debugging (I had similar fun with Flask's debug toolbar).
