
I have this simple query in PostgreSQL:

EXPLAIN ANALYZE
SELECT * FROM email_events
WHERE act_owner_id = 500
ORDER BY date DESC
LIMIT 500;

The first execution of the query takes a very long time, about 7 seconds.

"Limit  (cost=0.43..8792.83 rows=500 width=2311) (actual time=3.064..7282.497 rows=500 loops=1)"
"  ->  Index Scan Backward using email_events_idx_date on email_events  (cost=0.43..233667.36 rows=13288 width=2311) (actual time=3.059..7282.094 rows=500 loops=1)"
"        Filter: (act_owner_id = 500)"
"        Rows Removed by Filter: 1053020"
"Total runtime: 7282.818 ms"

After the first execution the query is cached, I guess, and runs in 20-30 ms.

Why is the LIMIT query so slow when there is no cache? How can I fix this?

  • The table has 2.5 million rows. Commented Feb 21, 2014 at 19:48
  • Do you have an index on (act_owner_id, date)? Commented Feb 21, 2014 at 19:53
  • Yes I do; even if I order by act_owner_id the result is the same. Commented Feb 21, 2014 at 20:06
  • Try a composite index on act_owner_id + date (in this order); see the sketch after these comments. Commented Feb 21, 2014 at 20:12
  • Roughly how big is your database? What does your system load look like? (A really huge database with small resources might be lagging on I/O on the first attempt.) Commented Feb 21, 2014 at 21:04
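
A sketch of the composite index suggested in the comments (the index name is hypothetical):

-- Equality column first: all rows for one owner sit together in the
-- index, already ordered by date, so Pg can read just the first 500.
CREATE INDEX email_events_owner_date_idx
    ON email_events (act_owner_id, date);

With act_owner_id = 500 fixed, Pg can walk this index backwards to satisfy ORDER BY date DESC without a sort and without filtering out non-matching rows.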

2 Answers


CLUSTER table ON index seems to fix the problem. It seems that after bulk data loading the data is scattered all over the disk; CLUSTER reorders the table on disk to match the index.
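
A minimal sketch of that fix, assuming a composite index on (act_owner_id, date) exists; the index name is hypothetical:

-- Rewrite the table on disk in index order (takes an ACCESS EXCLUSIVE
-- lock for the duration of the rewrite).
CLUSTER email_events USING email_events_owner_date_idx;

-- Refresh planner statistics after the rewrite.
ANALYZE email_events;

Note that CLUSTER is a one-time reordering: later inserts are not kept in index order, so it may need to be re-run after further bulk loads.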


1 Comment

Nice to see CLUSTERing on an index being useful for once.

PostgreSQL thinks it will be faster to scan the date-ordered index backwards (i.e. in DESC order), reading every row and throwing away the rows that don't have the right act_owner_id. That means 1053020 random reads, and backward index scans aren't very fast either.

Try creating an index on email_events(date DESC, act_owner_id). I think Pg will be able to do a forward index scan on that and then use the second index term to filter rows, so it shouldn't have to do a heap lookup. Test with EXPLAIN and see.
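
For example (the index name is illustrative):

-- Leading column matches the ORDER BY, so a forward scan returns rows
-- already in date DESC order; act_owner_id is checked in the index.
CREATE INDEX email_events_date_owner_idx
    ON email_events (date DESC, act_owner_id);

-- Re-check the plan afterwards:
EXPLAIN ANALYZE
SELECT * FROM email_events
WHERE act_owner_id = 500
ORDER BY date DESC
LIMIT 500;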

1 Comment

Not helping... The DB was filled with dummy data I generated, loaded table by table in bulk. I ran CLUSTER on one smaller table, and the problem seems to be fixed for that table. After clearing the cache, the first execution of the query takes less than 500 ms; before CLUSTER it used to take up to 20 seconds. I will run CLUSTER on the biggest table to see if that fixes the problem.
