0

I have a query like this where join ~6000 values

SELECT DISTINCT ON(user_id)
                user_id,
                finished_at as last_deposit_date,
                CASE When currency = 'RUB' Then amount_cents  END as last_deposit_amount_cents
            FROM    payments
            JOIN (VALUES (5),(22),(26)) --~6000 values
            AS v(user_id) USING (user_id)
            WHERE action = 'deposit' 
                AND success = 't'
                AND currency IN ('RUB')
            ORDER BY user_id, finished_at DESC

QUERY PLAN for query with many VALUES:

Unique  (cost=444606.97..449760.44 rows=19276 width=24) (actual time=6129.403..6418.317 rows=5991 loops=1)
  Buffers: shared hit=2386527, temp read=7807 written=7808
  ->  Sort  (cost=444606.97..447183.71 rows=1030695 width=24) (actual time=6129.401..6295.457 rows=1877039 loops=1)
        Sort Key: payments.user_id, payments.finished_at DESC
        Sort Method: external merge  Disk: 62456kB
        Buffers: shared hit=2386527, temp read=7807 written=7808
        ->  Nested Loop  (cost=0.43..341665.35 rows=1030695 width=24) (actual time=0.612..5085.376 rows=1877039 loops=1)
              Buffers: shared hit=2386521
              ->  Values Scan on "*VALUES*"  (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..4.507 rows=6000 loops=1)
              ->  Index Scan using index_payments_on_user_id on payments  (cost=0.43..54.78 rows=172 width=28) (actual time=0.010..0.793 rows=313 loops=6000)
                    Index Cond: (user_id = "*VALUES*".column1)
                    Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
                    Rows Removed by Filter: 85
                    Buffers: shared hit=2386521
Planning time: 5.886 ms
Execution time: 6429.685 ms

I use PosgreSQL 10.8.0. Is there any chance to speed up this query?

I tried replacing DISTINCT with recursion:

WITH RECURSIVE t AS (
 (SELECT min(user_id) AS user_id FROM payments)
 UNION ALL
 SELECT (SELECT min(user_id) FROM payments  
 WHERE user_id > t.user_id      
 ) AS user_id FROM
t   
  WHERE t.user_id IS NOT NULL
 )
SELECT payments.* FROM t
JOIN (VALUES (5),(22),(26)) --~6000 VALUES
AS v(user_id) USING (user_id)
, LATERAL (
 SELECT user_id,
        finished_at as last_deposit_date,
        CASE When currency = 'RUB' Then amount_cents  END as last_deposit_amount_cents FROM payments            
        WHERE payments.user_id=t.user_id
            AND action = 'deposit' 
        AND success = 't'
        AND currency IN ('RUB')     
        ORDER BY finished_at DESC LIMIT 1
) AS payments

WHERE t.user_id IS NOT NULL;

But it turned out even slower.

Hash Join (cost=418.67..21807.22 rows=3000 width=24) (actual time=16.804..10843.174 rows=5991 loops=1) Hash Cond: (t.user_id = "VALUES".column1) Buffers: shared hit=6396763 CTE t -> Recursive Union (cost=0.46..53.73 rows=101 width=8) (actual time=0.142..1942.351 rows=237029 loops=1) Buffers: shared hit=864281 -> Result (cost=0.46..0.47 rows=1 width=8) (actual time=0.141..0.142 rows=1 loops=1) Buffers: shared hit=4 InitPlan 3 (returns $1) -> Limit (cost=0.43..0.46 rows=1 width=8) (actual time=0.138..0.139 rows=1 loops=1) Buffers: shared hit=4 -> Index Only Scan using index_payments_on_user_id on payments payments_2 (cost=0.43..155102.74 rows=4858092 width=8) (actual time=0.137..0.138 rows=1 loops=1) Index Cond: (user_id IS NOT NULL) Heap Fetches: 0 Buffers: shared hit=4 -> WorkTable Scan on t t_1 (cost=0.00..5.12 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=237029) Filter: (user_id IS NOT NULL) Rows Removed by Filter: 0 Buffers: shared hit=864277 SubPlan 2 -> Result (cost=0.48..0.49 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028) Buffers: shared hit=864277 InitPlan 1 (returns $3) -> Limit (cost=0.43..0.48 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028) Buffers: shared hit=864277 -> Index Only Scan using index_payments_on_user_id on payments payments_1 (cost=0.43..80786.25 rows=1619364 width=8) (actual time=0.007..0.007 rows=1 loops=237028) Index Cond: ((user_id IS NOT NULL) AND (user_id > t_1.user_id)) Heap Fetches: 46749 Buffers: shared hit=864277 -> Nested Loop (cost=214.94..21498.23 rows=100 width=32) (actual time=0.475..10794.535 rows=167333 loops=1) Buffers: shared hit=6396757 -> CTE Scan on t (cost=0.00..2.02 rows=100 width=8) (actual time=0.145..1998.788 rows=237028 loops=1) Filter: (user_id IS NOT NULL) Rows Removed by Filter: 1 Buffers: shared hit=864281 -> Limit (cost=214.94..214.94 rows=1 width=24) (actual time=0.037..0.037 rows=1 loops=237028) Buffers: shared hit=5532476 -> Sort (cost=214.94..215.37 rows=172 width=24) (actual time=0.036..0.036 rows=1 loops=237028) Sort Key: payments.finished_at DESC Sort Method: quicksort Memory: 25kB Buffers: shared hit=5532476 -> Index Scan using index_payments_on_user_id on payments (cost=0.43..214.08 rows=172 width=24) (actual time=0.003..0.034 rows=15 loops=237028) Index Cond: (user_id = t.user_id) Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text)) Rows Removed by Filter: 6 Buffers: shared hit=5532473 -> Hash (cost=75.00..75.00 rows=6000 width=4) (actual time=2.255..2.255 rows=6000 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 275kB -> Values Scan on "VALUES" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.004..1.206 rows=6000 loops=1) Planning time: 7.029 ms Execution time: 10846.774 ms

1 Answer 1

1

For this query:

SELECT DISTINCT ON (user_id)
       p.user_id,
       p.finished_at as last_deposit_date,
       (CASE WHEN p.currency = 'RUB' THEN p.amount_cents  END) as last_deposit_amount_cents
FROM payments p JOIN
     (VALUES (5),( 22), (26) --~6000 values
     ) v(user_id)
     USING (user_id)
WHERE p.action = 'deposit' AND
      p.success = 't' ND
      p.currency = 'RUB'
ORDER BY p.user_id, p.finished_at DESC;

I don't fully understand the CASE expression, because the WHERE is filtering out all other values.

That said, I would expect an index on (action, success, currency, user_id, finished_at desc) to be helpful.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.