0

Suppose I have the following table:

class Word(Model):
    id = UUIDField(primary_key=True,
                   default=uuid.uuid4,
                   editable=False)
    name = CharField(_('Word'),
                     db_index=True,
                     max_length=250)

This table contains around 3 million rows. Content is mainly Cyrillian words.

What I want to accomplish is to search through some predefined list of values. The problem is that words in the table are accented.

Content of tables is like:

мо́рква
буря́к
я́блуко

But I query without accent.

words = ['морква', 'буряк', 'яблуко']

Word.objects.annotated(name_without_accent=Func(
            F('name'),Value('[%s]' % ACCENTS), Value(''),
            function='regexp_replace',
        )).filter(reduce(operator.or_, [Q(name_without_accent__icontains=w) for w in words], Q()))

The problem is that this query on such big table runs very long. An words list can contain more elements (around 20-30).

Is there anyway to improve overall performance of querying such big table? Improve Django query? Tune Postgres? Use some external tools?

1 Answer 1

2

Try to use unicodedata.normalize('NFD', word) instead of regexp. And if you do a lot of such queries it would be reasonable to make additional field with unAccent versions of words and make index by that field, because in your situation db_index doesn't give any help it just slows table with reindexing.

Sign up to request clarification or add additional context in comments.

11 Comments

Where should I use unicodedata.normalize('NFD', word)?
In annotation field
Forgot my previous comment about annotation ((( It was too early in the morning for answering, sorry ((( I think the best solution would be creation of additional field or even additional temporary table
Did as you told, but still contains is very slow. How can I improve it? Query runs 7 seconds (instead of 65 as earlier).
7 seconds even after indexing that field? By, the way if you index it using Django migrations, check in db if it really crates index. Sometimes Django indexing doesn't have effect.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.