
I am a newbie to database optimisation. My table has around 29 million rows, and this query takes about 13 seconds. What can I do to improve its performance?

"Properties" column is int array. I created a GIN index on F."Properties",

```
SELECT
    F."Id",
    F."Name",
    F."Url",
    F."CountryModel",
    F."Properties",
    F."PageRank",
    F."IsVerify",
    count(*) AS Counter
FROM
    public."Firms" F,
    LATERAL unnest(F."Properties") AS P
WHERE
    F."CountryId" = 1
    AND P = ANY (ARRAY[126,128])
    AND F."Properties" && ARRAY[126,128]
    AND F."Deleted" = FALSE
GROUP BY
    F."Id"
ORDER BY
    F."IsVerify" DESC,
    Counter DESC,
    F."PageRank" DESC
OFFSET 0 ROWS FETCH FIRST 100 ROWS ONLY
```

This is the EXPLAIN ANALYZE output:

"Limit  (cost=801718.65..801718.70 rows=20 width=368) (actual time=12671.277..12674.826 rows=20 loops=1)"
"  ->  Sort  (cost=801718.65..802180.37 rows=184689 width=368) (actual time=12671.276..12674.824 rows=20 loops=1)"
"        Sort Key: f.""IsVerify"" DESC, (count(*)) DESC, f.""PageRank"" DESC"
"        Sort Method: top-N heapsort  Memory: 47kB"
"        ->  GroupAggregate  (cost=763260.63..796804.14 rows=184689 width=368) (actual time=12284.752..12592.010 rows=201352 loops=1)"
"              Group Key: f.""Id"""
"              ->  Nested Loop  (cost=763260.63..793110.36 rows=369378 width=360) (actual time=12284.734..12488.106 rows=205124 loops=1)"
"                    ->  Gather Merge  (cost=763260.62..784770.69 rows=184689 width=360) (actual time=12284.716..12389.961 rows=201352 loops=1)"
"                          Workers Planned: 2"
"                          Workers Launched: 2"
"                          ->  Sort  (cost=762260.59..762452.98 rows=76954 width=360) (actual time=12258.175..12309.931 rows=67117 loops=3)"
"                                Sort Key: f.""Id"""
"                                Sort Method: external merge  Disk: 35432kB"
"                                Worker 0:  Sort Method: external merge  Disk: 35536kB"
"                                Worker 1:  Sort Method: external merge  Disk: 35416kB"
"                                ->  Parallel Bitmap Heap Scan on ""Firms"" f  (cost=1731.34..743387.12 rows=76954 width=360) (actual time=57.500..12167.222 rows=67117 loops=3)"
"                                      Recheck Cond: (""Properties"" && '{126,128}'::integer[])"
"                                      Rows Removed by Index Recheck: 356198"
"                                      Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
"                                      Heap Blocks: exact=17412 lossy=47209"
"                                      ->  Bitmap Index Scan on ix_properties_gin  (cost=0.00..1685.17 rows=184689 width=0) (actual time=61.628..61.628 rows=201354 loops=1)"
"                                            Index Cond: (""Properties"" && '{126,128}'::integer[])"
"                    ->  Memoize  (cost=0.01..0.14 rows=2 width=0) (actual time=0.000..0.000 rows=1 loops=201352)"
"                          Cache Key: f.""Properties"""
"                          Hits: 179814  Misses: 21538  Evictions: 0  Overflows: 0  Memory Usage: 3076kB"
"                          ->  Function Scan on unnest p  (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=1 loops=21538)"
"                                Filter: (p = ANY ('{126,128}'::integer[]))"
"                                Rows Removed by Filter: 6"
"Planning Time: 2.542 ms"
"Execution Time: 12675.382 ms"

This is the EXPLAIN (ANALYZE, BUFFERS) output:

"Limit  (cost=793826.15..793826.20 rows=20 width=100) (actual time=12879.468..12882.414 rows=20 loops=1)"
"  Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
"  ->  Sort  (cost=793826.15..794287.87 rows=184689 width=100) (actual time=12879.468..12882.412 rows=20 loops=1)"
"        Sort Key: f.""IsVerify"" DESC, (count(*)) DESC, f.""PageRank"" DESC"
"        Sort Method: top-N heapsort  Memory: 29kB"
"        Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
"        ->  GroupAggregate  (cost=755368.13..788911.64 rows=184689 width=100) (actual time=12623.980..12845.122 rows=201352 loops=1)"
"              Group Key: f.""Id"""
"              Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
"              ->  Nested Loop  (cost=755368.13..785217.86 rows=369378 width=92) (actual time=12623.971..12785.946 rows=205124 loops=1)"
"                    Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
"                    ->  Gather Merge  (cost=755368.12..776878.19 rows=184689 width=120) (actual time=12623.945..12680.899 rows=201352 loops=1)"
"                          Workers Planned: 2"
"                          Workers Launched: 2"
"                          Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
"                          ->  Sort  (cost=754368.09..754560.48 rows=76954 width=120) (actual time=12613.425..12624.658 rows=67117 loops=3)"
"                                Sort Key: f.""Id"""
"                                Sort Method: external merge  Disk: 9848kB"
"                                Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
"                                Worker 0:  Sort Method: external merge  Disk: 9824kB"
"                                Worker 1:  Sort Method: external merge  Disk: 9808kB"
"                                ->  Parallel Bitmap Heap Scan on ""Firms"" f  (cost=1731.34..743387.12 rows=76954 width=120) (actual time=42.098..12567.883 rows=67117 loops=3)"
"                                      Recheck Cond: (""Properties"" && '{126,128}'::integer[])"
"                                      Rows Removed by Index Recheck: 356198"
"                                      Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
"                                      Heap Blocks: exact=17323 lossy=47429"
"                                      Buffers: shared hit=97 read=194118 written=1"
"                                      ->  Bitmap Index Scan on ix_properties_gin  (cost=0.00..1685.17 rows=184689 width=0) (actual time=41.862..41.862 rows=201354 loops=1)"
"                                            Index Cond: (""Properties"" && '{126,128}'::integer[])"
"                                            Buffers: shared hit=4 read=74"
"                    ->  Memoize  (cost=0.01..0.14 rows=2 width=0) (actual time=0.000..0.000 rows=1 loops=201352)"
"                          Cache Key: f.""Properties"""
"                          Hits: 179814  Misses: 21538  Evictions: 0  Overflows: 0  Memory Usage: 3076kB"
"                          ->  Function Scan on unnest p  (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=1 loops=21538)"
"                                Filter: (p = ANY ('{126,128}'::integer[]))"
"                                Rows Removed by Filter: 6"
"Planning:"
"  Buffers: shared hit=32 read=6 dirtied=1"
"Planning Time: 4.533 ms"
"Execution Time: 12883.604 ms"
  • This is based on the same table as a previous question, but the question is not the same. Commented Apr 10, 2022 at 15:33
  • Is ARRAY[126,128] a magic value which doesn't change from execution to execution? Or is it just an example you picked for demo purposes? Commented Apr 10, 2022 at 15:55
  • Yes, it varies from execution to execution; that was just an example: "restaurant" is id 126 and "good for kids" is id 128, or "no smoking" is id 3. Commented Apr 10, 2022 at 17:21

1 Answer


You should increase work_mem to get rid of the lossy pages in the bitmap. I don't think this will make a big difference, because I suspect most of your time goes to reading the pages from disk, and converting lossy pages to exact pages doesn't change how many pages get read (unless TOAST is involved, which I suspect it is not; how large does the "Properties" array get?). But I might be wrong, so try it and see. Also, if you turn on track_io_timing and collect your plans with EXPLAIN (ANALYZE, BUFFERS), we could immediately see whether I/O read time is the problem.
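
A minimal sketch of trying both suggestions in one session (the 64MB value is only a starting point, not a tuned recommendation, and changing track_io_timing needs superuser or an equivalent SET privilege):

```
-- Raise work_mem so the bitmap heap scan can stay exact instead of going lossy,
-- and record I/O timings so the plan shows time spent reading pages from disk.
SET work_mem = '64MB';        -- starting point only, not a tuned value; adjust and re-test
SET track_io_timing = on;     -- requires superuser or an equivalent SET privilege

-- then re-run the query with EXPLAIN (ANALYZE, BUFFERS) in the same session
```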

Beyond that, this looks very hard to optimize with traditional methods. You can usually optimize ORDER BY ... LIMIT by using an index to read rows already in order, but since the second column in your ordering is computed dynamically, that is unlikely to work here. Are the values within "Properties" unique? That is, can 126 and 128 each exist and be counted at most once per row, or can they appear and be counted multiple times?
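
To illustrate why that matters, a small standalone example with made-up arrays (not the real data): with the LATERAL unnest approach, duplicated values inside "Properties" are each counted.

```
-- Duplicates in the array are each counted, so this returns 3:
SELECT count(*) FROM unnest(ARRAY[126,128,128]) AS p WHERE p = ANY (ARRAY[126,128]);

-- With unique values, each searched id contributes at most once, so this returns 2:
SELECT count(*) FROM unnest(ARRAY[126,128]) AS p WHERE p = ANY (ARRAY[126,128]);
```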

The easiest way to optimize this might be on the app or business end. Do we really need to run this query at all, and why? What if we queried only rows where "IsVerify" is true, rather than sorting by it? If that returns only 95 rows, is it really necessary to go back and fill in 5 more where "IsVerify" is false, and so on?
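
A sketch of that idea applied to the original query, assuming the business side accepts it: "IsVerify" becomes a filter and drops out of the ORDER BY entirely.

```
SELECT
    F."Id",
    F."Name",
    count(*) AS Counter
FROM
    public."Firms" F,
    LATERAL unnest(F."Properties") AS P
WHERE
    F."CountryId" = 1
    AND P = ANY (ARRAY[126,128])
    AND F."Properties" && ARRAY[126,128]
    AND F."Deleted" = FALSE
    AND F."IsVerify"              -- filter instead of sorting by it
GROUP BY
    F."Id"
ORDER BY
    Counter DESC,
    F."PageRank" DESC
FETCH FIRST 100 ROWS ONLY;
```

If that returns fewer than 100 rows, the application can decide whether a second query for unverified firms is needed at all.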
