1

I am trying to solve performance issues at a Django-based web site, with very little knowledge of Django and Python syntax. I seem to have correctly identified the problem. I also seem to know what to do next, but I can't get a grasp on the Python/Django syntax to make everything work.

class Company(models.Model):
    name = models.CharField(max_length=100)
    bic = models.CharField(max_length=100, blank=True)

    def get_order_count(self):
        return self.orders.count()

    def get_order_sum(self):
        total_sum = 0
        for contact in self.contacts.all():
            for order in contact.orders.all():
                total_sum += order.total
        return total_sum

class Contact(models.Model):
    company = models.ForeignKey(
        Company, related_name="contacts", on_delete=models.DO_NOTHING)
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100, blank=True)

    def get_order_count(self):
        return self.orders.count()

class Order(models.Model):
    order_number = models.CharField(max_length=100)
    company = models.ForeignKey(Company, related_name="orders", on_delete=models.DO_NOTHING)
    contact = models.ForeignKey(Contact, related_name="orders", on_delete=models.DO_NOTHING)
    total = models.DecimalField(max_digits=18, decimal_places=9)
    order_date = models.DateTimeField(null=True, blank=True)

    def __str__(self):
        return "%s" % self.order_number

My hunch is that the performance problems are caused by the nested loop in get_order_sum. And my solution is quite clear: nested "fors" should be replaced by a simple command, that uses aggregation and utilizes the database's own efficient internal SQL functionality. So in my mind, the solution should look something like this:

return self.contacts.all().orders.all().aggregate(Sum('total'))

The problem is that I can't figure out how to properly write what I want Django/Python to do. Please help me!

Or am I mistaken and is the problem (or part of it) in my View-code?

<table>
    <tr>
        <th>Name</th>
        <th>Order Count</th>
        <th>Order Sum</th>
        <th>Select</th>
    </tr>
    {% for company in company_list|slice:":100" %}
    <tr>
        <td>{{ company.name }}</td>
        <td>{{ company.orders.count }}</td>
        <td>{{ company.get_order_sum|floatformat:2 }}</td>
        <td><input type="checkbox" name="select{{company.pk}}" id=""></td>
    </tr>
    {% for contact in company.contacts.all %}
    <tr>
        <td>- {{ contact.first_name }} {{ contact.last_name }}</td>
        <td>Orders: {{ contact.orders.count }}</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
    </tr>
    {% endfor %}
    {% endfor %}
</table>

I would also appreciate any other hints and opinions on how this code could be further improved (especially from the performance POV).

2 Answers 2

1

The key is not to have excess database queries (this is not the same as having as few as possible) and, as you mentioned, to let the database do work it's good at.

In practical terms, a) all data you need should be in company_list by the time it gets to your template, so you don't have database queries sent from within the loops in your template, and b) company_list should be filled with data in an efficient way.

from django.db.models import Prefetch, Sum, Count

contacts_with_orders = Prefetch(
    'contacts',
    queryset=Contact.objects.annotate(order_count=Count('orders'))
)

company_list = (Company.objects
    .prefetch_related(contacts_with_orders)
    .annotate(order_sum=Sum('orders__total'),
              order_count=Count('orders'))
)

Now you can do your loops and access all required data without any further queries:

for company in company_list:
    company.name
    company.order_sum
    company.order_count
    for contact in company.contacts.all():
        contact.first_name
        contact.order_count

While this should be orders of magnitude faster than previously, it is still quite onerous. There is also some more room for optimization: If needed, you can shave off a bit more by querying only the columns you need instead of complete rows and by returning dictionaries instead of objects. See only(), values() and values_list() and the to_attr parameter in Prefetch.

Sign up to request clarification or add additional context in comments.

7 Comments

Unfortunately I couldn't make it work. :-( queryset=Contacts.objects.annotate(order_count=Count('orders')). NameError: name 'Contacts' is not defined. Changing it to singular also didn't work. Is orders__total written correctly: plural and with two underlines? I am also not sure if I am pasting it in the right place. It should be pasted as one block, right? In the beginning (right after 'import's?), after the rest of the classes or inside one of the classes? Thank you for helping! You already gave me great ideas!
The code belongs in your view where you define company_list that is used in the template. You may need to import your models to access them: from your_app_name.models import Contact.
I put it in the View as well as added the required from mailer.models import Company, Contact. So the code doesn't give any errors. But it doesn't provide any information: neither of contact.order_count, company.order_sum nor company.order_count give anything. Any ideas on how to fix it (or at least how to get more info about why it doesn't work)?
Try some old-fashioned debugging: Print out company_list after instantiating it; print out company_list.count() to see how many objects there are, print out vars(company_list[0])to see the fields in a company object etc.
Trying to put in index.html: 1. {{ company_list }} - [<Company: Company object>, <Company: Company object>, '...(remaining elements truncated)...'] ; 2. {{ company_list.count() }} - 100; 3. {{ vars(company_list[0]) }} - TemplateSyntaxError: Could not parse the remainder: '[0]' from 'company_list[0]'; 4. {{ vars(company_list[0]) }} - TemplateSyntaxError: Could not parse the remainder: '(company_list[0])' from 'vars(company_list[0])'; 5. Trying to put it in models.py ("return company_list"), causes NameError: name 'company_list' is not defined.
|
1

As you guessed, your performance problem is most likely caused by get_order_sum method. You are running a query to get all contats of a company, then for each contact, you are running a query to get that contact's ordesr. You can find this sum with a single query, like this in Django:

from django.db.models import Sum
def get_order_sum(self):
    return self.contacts.aggregate(order_sum=Sum('orders__total')).get('order_sum')

Note that aggregate function returns a dictionary in the following format:

{
    'order_sum': 123
}

5 Comments

I get the error NameError: name 'order__total' is not defined. Trying to rename it to order_total, orders__total, orders_total, total doesn't help either. The only other hunch I had was to try to rename self.contracts to self.contract, but that only caused the error AttributeError: 'Company' object has no attribute 'contact'.
@MikkoVedru sorry I had a type in the statement. Updated the answer now. Can you try with Sum('orders__total') (note the single quotes)
Thank you! It works now! The results are: "SQL: 1294 queries in 78ms" -> "SQL: 798 queries in 32-33ms"; "CPU: 4400ms-4500ms" -> "CPU: 2350-2450ms".
If you don't mind, could you please explain why if there a double underline used, instead of a singular? What should I google for to educate myself?
double underscore is used to follow relations in django queries. You can use double underscore in filters, annotations, aggregations, ordering etc. In you case, we are getting the usm from a related model field. orders__total means total field of model related to Contact model by 'orders' related name. You can read more about this here: django.readthedocs.io/en/2.1.x/topics/db/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.