
I have a custom Django (v 2.0.0) command that starts background job executors in a multi-threaded fashion, and it seems to be leaking memory.

The command can be started like so:

./manage.py start_job_executer --thread=1

Each thread has a while True loop that picks up jobs from a PostgreSQL table.

In order to pick up a job and change its status atomically, I use a transaction:

# atomic transaction to temporarily lock the matched rows and
# fetch the oldest job with status = pending from the db
with transaction.atomic():
    job = Job.objects.select_for_update() \
        .filter(status=Job.STATUS['pending']) \
        .order_by('created_at').first()
    if job:
        job.status = Job.STATUS['executing']
        job.save()
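The claim-a-job-atomically pattern above can be sketched outside Django with an in-memory lock standing in for the row lock taken by select_for_update. The jobs list, claim_next_job function, and status strings here are hypothetical stand-ins, not the models from the question:

```python
import threading

# In-memory stand-in for the jobs table; each entry mimics a Job row.
jobs = [
    {"id": 1, "created_at": 1, "status": "pending"},
    {"id": 2, "created_at": 2, "status": "pending"},
]
table_lock = threading.Lock()  # plays the role of SELECT ... FOR UPDATE

def claim_next_job():
    """Atomically pick the oldest pending job and mark it executing."""
    with table_lock:
        pending = [j for j in jobs if j["status"] == "pending"]
        if not pending:
            return None
        job = min(pending, key=lambda j: j["created_at"])
        job["status"] = "executing"
        return job

job = claim_next_job()
```

With the lock held for the whole read-then-write, two worker threads can never claim the same job, which is exactly what the row lock guarantees in the real query.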

It looks like the memory allocated by this custom Django command keeps growing.

Using tracemalloc, I tried to find the cause of the leak by running a background thread that compares memory snapshots:

def check_memory(self):
    while True:
        s1 = tracemalloc.take_snapshot()
        sleep(10)
        s2 = tracemalloc.take_snapshot()
        for alog in s2.compare_to(s1, 'lineno')[:10]:
            log.info(alog)

It produced the following log:

01.04.20 13:50:06   operations.py:222: size=23.7 KiB (+23.7 KiB), count=66 (+66), average=367 B
01.04.20 13:50:36   operations.py:222: size=127 KiB (+43.7 KiB), count=353 (+122), average=367 B
01.04.20 13:51:04   operations.py:222: size=251 KiB (+66.7 KiB), count=699 (+186), average=367 B
01.04.20 13:51:31   operations.py:222: size=379 KiB (+68.9 KiB), count=1056 (+192), average=367 B
01.04.20 13:51:57   operations.py:222: size=495 KiB (+60.3 KiB), count=1380 (+168), average=367 B

It looks like /usr/local/lib/python3.5/dist-packages/django/db/backends/postgresql/operations.py:222 never releases memory.

The leakage is slow with 1 thread, but with 8 threads the memory leak is much worse:

01.04.20 13:07:51   operations.py:222: size=68.3 KiB (+68.3 KiB), count=191 (+191), average=366 B
01.04.20 13:08:56   operations.py:222: size=770 KiB (+140 KiB), count=2151 (+390), average=367 B
01.04.20 13:10:07   operations.py:222: size=1476 KiB (+138 KiB), count=4122 (+386), average=367 B

01.04.20 13:36:22   operations.py:222: size=17.3 MiB (+138 KiB), count=49506 (+385), average=367 B

01.04.20 13:48:16   operations.py:222: size=24.5 MiB (+136 KiB), count=69993 (+379), average=367 B

This is the code at line 222 of /usr/local/lib/python3.5/dist-packages/django/db/backends/postgresql/operations.py:

def last_executed_query(self, cursor, sql, params):
    # http://initd.org/psycopg/docs/cursor.html#cursor.query
    # The query attribute is a Psycopg extension to the DB API 2.0.
    if cursor.query is not None:
        return cursor.query.decode()  # this is line 222!
    return None

I have no clue how to attack this problem. Any ideas at all?

Posted it here also: https://code.djangoproject.com/ticket/31419#ticket

I was thinking of forking a new process for every job that needs to be executed; once it finished, the memory would be deallocated along with the dying process. That would probably work, but it seems a little overkill.

Thanks in advance

UPDATE

I was using Django 2.0, so I updated to Django 3.0.5 (latest stable release), but unfortunately the problem is still there.

Below are the new logs:

01.04.20 20:15:06   operations.py:235: size=977 KiB (+53.9 KiB), count=2750 (+152), average=364 B
01.04.20 20:15:28   operations.py:235: size=1070 KiB (+50.1 KiB), count=3012 (+141), average=364 B
01.04.20 20:15:53   operations.py:235: size=1156 KiB (+43.7 KiB), count=3255 (+123), average=364 B
01.04.20 20:16:19   operations.py:235: size=1245 KiB (+44.7 KiB), count=3507 (+126), average=364 B

01.04.20 20:20:23   operations.py:235: size=2154 KiB (+44.3 KiB), count=6065 (+125), average=364 B

1 Answer

Django keeps a reference to every executed query in a ring buffer when settings.DEBUG = True.

From the DEBUG documentation:

It is also important to remember that when running with DEBUG turned on, Django will remember every SQL query it executes. This is useful when you’re debugging, but it’ll rapidly consume memory on a production server.

Setting DEBUG = False should address your issue.
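Under the hood, the buffer is (in the Django versions I've checked) a collections.deque with a large maxlen held per database connection, so it grows until the cap is hit — and with one connection per worker thread, each thread pays that cost separately. The bounded-ring-buffer behaviour is easy to demonstrate in isolation; the 9000 cap below is what I'd expect Django's default to be, so treat it as an assumption:

```python
from collections import deque

# Mimic a per-connection query log: a bounded ring buffer.
QUERIES_LIMIT = 9000  # assumed default cap, stand-in value
queries_log = deque(maxlen=QUERIES_LIMIT)

for i in range(10_000):
    queries_log.append({"sql": f"SELECT ... /* query {i} */", "time": "0.001"})

# The buffer never exceeds its cap, but until then every query string
# (like the decoded cursor.query seen in the tracemalloc diff) is retained.
print(len(queries_log))  # capped at QUERIES_LIMIT
```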

To wipe the ring buffer in situations where it may pose a problem in development:

from django.conf import settings
from django.db import reset_queries

if settings.DEBUG:
    reset_queries()

3 Comments

Thank you! A week of debugging later you finally nailed it!
Is there a way to tell Django not to store those SQL queries for a particular function? Or maybe it's possible to manually flush the buffer that stores the queries? The problem is that I want to use all the advantages of debug mode when testing my scripts, but the memory consumption is so huge that my PC runs out of memory.
@AndreyVolkov I know your comment is almost a year old, but I updated Simon's answer to demonstrate the mechanism for resetting the buffer. I don't know that a way exists to disable it in DEBUG mode, but you could possibly look at the source for the reset_queries function and monkeypatch the underlying data structure with a different implementation that just does a pass instead of holding it.
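On the monkeypatching idea from the comment above: since the buffer behaves like a deque, swapping in a zero-capacity deque would silently discard every append while leaving the rest of DEBUG mode alone. A pure-Python sketch of the mechanism — FakeConnection is a stand-in, and queries_log is the attribute name I'd expect on a real Django connection, so verify against your Django version's source before patching:

```python
from collections import deque

class FakeConnection:
    """Stand-in for a Django DB connection that logs queries in DEBUG mode."""
    def __init__(self):
        self.queries_log = deque(maxlen=9000)

    def execute(self, sql):
        self.queries_log.append({"sql": sql})  # what DEBUG query logging does

conn = FakeConnection()

# The "patch": a zero-capacity deque accepts appends but retains nothing.
conn.queries_log = deque(maxlen=0)

for i in range(1000):
    conn.execute(f"SELECT {i}")

print(len(conn.queries_log))  # 0 — nothing retained
```

The trade-off is that anything else relying on the query log (e.g. inspecting connection.queries while debugging) stops working, which is why clearing it periodically with reset_queries is the safer option.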
