
When I use the following query, the response time is really terrible (sometimes over a minute!).

select * from cdr where start_time < now() - interval '4 hours' and final = 0 limit 50 

I am trying to get records where final = 0 and start_time is more than 4 hours old. The following is the index I have on the table:

CREATE INDEX "cdr_Final_ix"
ON cdr
USING btree
(start_time , final );

The following is the explain analyze:

"Limit  (cost=0.00..167.81 rows=50 width=188) (actual time=64491.409..64650.635 rows=11 loops=1)"
"  ->  Seq Scan on cdr  (cost=0.00..749671.06 rows=223372 width=188) (actual time=64491.407..64650.625 rows=11 loops=1)"
"Filter: ((final = 0) AND (start_time < (now() - '04:00:00'::interval)))"
"Total runtime: 64650.690 ms"

Any help would be greatly appreciated. Thanks, Ari

  • How many rows? How many qualify? Note: you have no ORDER BY. Commented Jul 30, 2012 at 8:09
  • It doesn't matter to me what the order is. The number of rows in the table varies during the day between ~10K and 5 million. The number that qualifies varies between 0 and 200, because I usually update final after I query it. If I am not up to date, then more than 100K can qualify. Commented Jul 30, 2012 at 8:30
  • I get index-scans < 50 ms for 40K rows, even with "bad" tuning. Your tuning constants? Your distribution of values? Vacuum analyze ? version ? NB: I am now increasing the number of rows. Commented Jul 30, 2012 at 8:59
  • I'm using version 9.1. I was able to use the index successfully when I stated "start_time = '7/29/2012'", but this won't help me for my query. When it compares an inequality, it doesn't pick up the index. If my program that reads from Postgres is working correctly, then the rows where final = 0 should always be within 4 hours of the current date, or just a little outside the 4-hour window. Commented Jul 30, 2012 at 9:12
  • With a partial index (WHERE final = 0) I still get index scans, with results in sub-millisecond times; with a normal index, 100-200 ms, with about 2.5M rows. UPDATE: I'll post as an answer. Commented Jul 30, 2012 at 9:14
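The index behavior discussed in these comments follows from btree column order: in the original index (start_time, final), the leading column only has a range predicate, and nearly every row satisfies start_time < now() - 4 hours, so the planner prefers a sequential scan. Putting the equality column first lets the btree seek straight to final = 0 and then range-scan on start_time. A hedged alternative sketch (the index name here is illustrative, not from the original post):

```sql
-- Equality column first, range column second: the btree can seek
-- directly to final = 0, then scan start_time in order.
CREATE INDEX cdr_final_start_ix
ON cdr
USING btree
(final, start_time);
```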

1 Answer

-- DROP SCHEMA tmp CASCADE;
-- CREATE SCHEMA tmp ;
SET search_path='tmp';

-- Generate some data
CREATE TABLE cdr
        ( start_time TIMESTAMP NOT NULL
        , final INTEGER
        );
INSERT INTO cdr (start_time,final)
SELECT gs, random() * 1000
FROM generate_series('2012-07-01 00:00:00', '2012-08-01 00:00:00', '1 s'::interval) gs
        ;
DROP INDEX IF EXISTS "cdr_Final_ix"; -- IF EXISTS: the index may not exist on a fresh run
CREATE INDEX "cdr_Final_ix"
ON cdr
USING btree
(start_time , final )
WHERE final = 0 -- partial index here
;

-- Do some data-massaging
-- UPDATE cdr
-- SET final = random() * 100
-- WHERE final = 0
-- AND random() < 0.2 ;

VACUUM ANALYZE cdr;

-- SET tuning to default (the worst possible)
SET random_page_cost = 4;
SET work_mem = 64;
SET effective_cache_size = 64;
-- SET shared_buffers = 64;

EXPLAIN ANALYZE
SELECT * from cdr
WHERE start_time < now() - interval '4 hours'
AND final = 0
ORDER BY start_time
LIMIT 50
        ;

Result:

                                                           QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.01..88.49 rows=50 width=12) (actual time=0.191..0.452 rows=50 loops=1)
   ->  Index Scan using "cdr_Final_ix" on cdr  (cost=0.01..4310.95 rows=2436 width=12) (actual time=0.188..0.321 rows=50 loops=1)
         Index Cond: ((start_time < (now() - '04:00:00'::interval)) AND (final = 0))
 Total runtime: 0.569 ms
(4 rows)
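To apply the same idea to a live production table without blocking writes, something like the following should work (CREATE INDEX CONCURRENTLY cannot run inside a transaction block; the index name is illustrative). Note that when the partial predicate is WHERE final = 0, keeping final among the key columns is redundant:

```sql
-- Partial index covering only the "hot" rows; final is fixed by the
-- predicate, so start_time alone suffices as the key.
CREATE INDEX CONCURRENTLY cdr_final0_start_ix
ON cdr (start_time)
WHERE final = 0;
```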

5 Comments

OK, sounds good. I changed my index and I am vacuum analyzing now. What was the data massaging for, and is it necessary? I don't want to change my data.
No, the data massaging was just to alter the distribution of "final" in my synthetic dataset. I also shifted the dates up to 2012-07-30 + one month, with no change in the resulting plan.
Awesome! It worked! It's picking up the index. Thanks so much. I am getting a different query plan than you, though; see next comment.
Limit  (cost=204.54..391.71 rows=50 width=188) (actual time=0.469..0.533 rows=50 loops=1)
  ->  Bitmap Heap Scan on cdr  (cost=204.54..21298.88 rows=5635 width=188) (actual time=0.469..0.520 rows=50 loops=1)
        Recheck Cond: ((start_time < (now() - '04:00:00'::interval)) AND (final = 0))
        ->  Bitmap Index Scan on "cdr_Final_ix"  (cost=0.00..203.13 rows=5635 width=0) (actual time=0.394..0.394 rows=1194 loops=1)
              Index Cond: ((start_time < (now() - '04:00:00'::interval)) AND (final = 0))
Total runtime: 0.576 ms
Any idea why?
The reason is the tunables {random_page_cost, work_mem, effective_cache_size, shared_buffers}, plus maybe the distribution of the values (or the planner's estimate of it). YMMV...
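For reference, one way to inspect the planner settings mentioned in that comment on a given server (pg_settings is a standard system view):

```sql
-- Show the current value and unit of each tunable named above.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('random_page_cost', 'work_mem',
               'effective_cache_size', 'shared_buffers');
```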
