  1. Create 2 identical tables:
CREATE TABLE logs1 (
    user_id int4 NOT NULL,
    create_time timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE logs2 (
    user_id int4 NOT NULL,
    create_time timestamptz NOT NULL DEFAULT now()
);
  2. Insert 200,000,000 entries into the first table, 2,000,000 for each of the 100 users:
INSERT INTO public.logs1 (user_id) (select 1 from generate_series(1,2000000));
INSERT INTO public.logs1 (user_id) (select 2 from generate_series(1,2000000));
INSERT INTO public.logs1 (user_id) (select 3 from generate_series(1,2000000));
--...
INSERT INTO public.logs1
SELECT user_id
     , now()+make_interval(secs => user_id)
FROM generate_series(4,100) AS user_id
CROSS JOIN LATERAL generate_series(1,2e6) AS duplicate_count;
  3. Insert 200,000,000 entries into the second table, but skewed so that a handful of users account for most of the data:
INSERT INTO public.logs2(user_id) select 1 from generate_series(1,95802970);
INSERT INTO public.logs2(user_id) select 2 from generate_series(1,10561980);
INSERT INTO public.logs2(user_id) select 3 from generate_series(1,17014900);
--...
INSERT INTO public.logs2 
SELECT ordinality+3
     , now()+make_interval(secs => ordinality)
FROM unnest(ARRAY[/*95802970,10561980,17014900,*/14987000,14701010,13137950,7180130,6930550,4260890,3702960,3215300,2648260,1197320,802200,709840,661650,628120,590780,433570,291460,195000,158450,55980,34210,21660,17400,9500,7850,7450,4910,4860,3150,2760,2620,2160,1810,1630,1410,1090,1050,830,610,560,540,500,490,330,240,210,160,130,110,100,100,90,90,90,80,80,70,70,70,60,40,40,40,30,30,30,30,30,20,20,20,20,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10])
WITH ORDINALITY AS _(val,ordinality)
CROSS JOIN LATERAL generate_series(1,val);
  4. Create identical indexes:
CREATE INDEX logs1_user_id_idx ON ONLY logs1 USING btree (user_id);
CREATE INDEX logs1_create_time_idx ON ONLY logs1 USING btree (create_time);
CREATE INDEX logs1_user_id_create_time_idx ON ONLY logs1 USING btree (user_id, create_time);

CREATE INDEX logs2_user_id_idx ON ONLY logs2 USING btree (user_id);
CREATE INDEX logs2_create_time_idx ON ONLY logs2 USING btree (create_time);
CREATE INDEX logs2_user_id_create_time_idx ON ONLY logs2 USING btree (user_id, create_time);
  5. Run ANALYZE:
analyze logs1;
analyze logs2;

As a result, I have two tables with the same schema, the same indexes, and the same amount of data, but with a different distribution of that data.
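
To confirm the distribution, a simple aggregation over each table shows the difference (this is only a sanity check, not part of the setup):

-- Sanity check: per-user row counts.
-- logs1 returns 2,000,000 for every user; logs2 is heavily skewed.
SELECT user_id, count(*) AS rows_per_user
FROM logs2
GROUP BY user_id
ORDER BY rows_per_user DESC
LIMIT 10;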

I ran this query on both tables:

EXPLAIN ANALYZE
SELECT *
FROM logs1 -- logs2
WHERE user_id = 1001
ORDER BY create_time DESC
OFFSET 0 ROWS
FETCH FIRST 10 ROWS ONLY;

And here is the result:

--------- logs1 ---------
Limit  (cost=0.57..8.59 rows=1 width=12) (actual time=0.026..0.028 rows=0 loops=1)
  ->  Index Only Scan Backward using logs1_user_id_create_time_idx on logs1  (cost=0.57..8.59 rows=1 width=12) (actual time=0.024..0.024 rows=0 loops=1)
        Index Cond: (user_id = 1001)
        Heap Fetches: 0
Planning Time: 0.447 ms
Execution Time: 0.062 ms

--------- logs2 ---------
Limit  (cost=0.57..0.78 rows=10 width=12) (actual time=0.042..0.044 rows=0 loops=1)
  ->  Index Only Scan Backward using logs2_user_id_create_time_idx on logs2  (cost=0.57..414.57 rows=20000 width=12) (actual time=0.039..0.040 rows=0 loops=1)
        Index Cond: (user_id = 1001)
        Heap Fetches: 0
Planning Time: 0.220 ms
Execution Time: 0.080 ms

In both cases the correct index was used (logs1_user_id_create_time_idx and logs2_user_id_create_time_idx respectively), so the query executed quickly. Now I run this query:

EXPLAIN ANALYZE
SELECT *
FROM logs1 -- logs2
WHERE user_id IN (1001, 1002)
ORDER BY create_time DESC
OFFSET 0 ROWS
FETCH FIRST 10 ROWS ONLY;

And here is the result:

--------- logs1 ---------
Limit  (cost=13.18..13.18 rows=1 width=12) (actual time=0.086..0.087 rows=0 loops=1)
  ->  Sort  (cost=13.18..13.18 rows=1 width=12) (actual time=0.083..0.084 rows=0 loops=1)
        Sort Key: create_time DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Index Only Scan using logs1_user_id_create_time_idx on logs1  (cost=0.57..13.17 rows=1 width=12) (actual time=0.074..0.074 rows=0 loops=1)
              Index Cond: (user_id = ANY ('{1001,1002}'::integer[]))
              Heap Fetches: 0
Planning Time: 0.223 ms
Execution Time: 0.124 ms

--------- logs2 ---------
Limit  (cost=0.57..1303.11 rows=10 width=12) (actual time=254657.997..254658.003 rows=0 loops=1)
  ->  Index Scan Backward using logs2_create_time_idx on logs2  (cost=0.57..5210160.65 rows=40000 width=12) (actual time=254657.992..254657.993 rows=0 loops=1)
        Filter: (user_id = ANY ('{1001,1002}'::integer[]))
        Rows Removed by Filter: 200000000
Planning Time: 0.247 ms
Execution Time: 254658.069 ms

For logs1, the query executed quickly because the same index was used as in the previous query.

For logs2, the query took roughly 2,000,000 times longer (over 4 minutes) because a different index was used: logs2_create_time_idx. If I drop logs2_create_time_idx, Postgres comes to its senses and uses logs2_user_id_create_time_idx, and the query executes quickly again:

Limit  (cost=9.18..9.18 rows=1 width=12) (actual time=0.143..0.146 rows=0 loops=1)
  ->  Sort  (cost=9.18..9.18 rows=1 width=12) (actual time=0.141..0.143 rows=0 loops=1)
        Sort Key: create_time DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Index Only Scan using logs1_user_id_create_time_idx on logs1  (cost=0.57..9.17 rows=1 width=12) (actual time=0.122..0.123 rows=0 loops=1)
              Index Cond: (user_id = ANY ('{1001,1002}'::integer[]))
              Heap Fetches: 0
Planning Time: 0.245 ms
Execution Time: 0.188 ms
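
(Side note for anyone reproducing this: the index does not have to be dropped permanently to test the plan. Wrapping the experiment in a transaction and rolling back works too, with the caveat that DROP INDEX holds an ACCESS EXCLUSIVE lock on the table until the rollback.)

-- Sketch: test the plan without permanently dropping the index.
-- Note: DROP INDEX takes an ACCESS EXCLUSIVE lock on logs2 until ROLLBACK.
BEGIN;
DROP INDEX logs2_create_time_idx;
EXPLAIN ANALYZE
SELECT *
FROM logs2
WHERE user_id IN (1001, 1002)
ORDER BY create_time DESC
OFFSET 0 ROWS
FETCH FIRST 10 ROWS ONLY;
ROLLBACK;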

I am in a situation where Postgres selects the wrong index, so the query takes incomparably longer to execute than it would with the correct index. What can I do about it? In SQL Server I could force the engine to use a specific index, but in Postgres, as far as I know, this cannot be done.
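
The closest thing to a hint I have found is to fence the filter behind a MATERIALIZED CTE (PostgreSQL 12+), which keeps the planner away from logs2_create_time_idx. This is only a sketch of the idea, not something I consider a real solution:

-- Workaround sketch: the MATERIALIZED CTE acts as an optimization fence,
-- so the user_id filter is applied first (typically via one of the user_id
-- indexes) and only the matching rows are sorted.
WITH filtered AS MATERIALIZED (
    SELECT *
    FROM logs2
    WHERE user_id IN (1001, 1002)
)
SELECT *
FROM filtered
ORDER BY create_time DESC
OFFSET 0 ROWS
FETCH FIRST 10 ROWS ONLY;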

I will also add that Postgres uses the correct index when I operate on a smaller amount of data, e.g. 2,000,000 rows instead of 200,000,000.

1 Answer

When you have 200 million records spread over just a handful of distinct user IDs, you need better statistics.

Raising the statistics target for user_id already does the trick:

ALTER TABLE logs2 ALTER COLUMN user_id SET STATISTICS 200;
ANALYZE logs2;

This gave me the following plan:

Limit  (cost=560.60..560.62 rows=10 width=12) (actual time=0.166..0.167 rows=0 loops=1)
  Output: user_id, create_time
  Buffers: shared hit=4
  ->  Sort  (cost=560.60..595.44 rows=13939 width=12) (actual time=0.164..0.165 rows=0 loops=1)
        Output: user_id, create_time
        Sort Key: logs2.create_time DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=4
        ->  Index Only Scan using logs2_user_id_create_time_idx on public.logs2  (cost=0.57..259.38 rows=13939 width=12) (actual time=0.124..0.124 rows=0 loops=1)
              Output: user_id, create_time
              Index Cond: (logs2.user_id = ANY ('{1001,1002}'::integer[]))
              Heap Fetches: 0
              Buffers: shared hit=4
Settings: random_page_cost = '1.1'
Planning Time: 0.228 ms
Execution Time: 0.203 ms
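
If you want to see what the larger sample actually captured, pg_stats shows the per-column statistics after the ANALYZE:

-- Inspect the collected statistics for logs2.user_id after ANALYZE.
SELECT n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'logs2'
  AND attname = 'user_id';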

My non-default value for random_page_cost does not have any impact here; other values for the various CPU cost parameters might. You may want to check these.
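
For reference, these are the relevant planner cost settings and their defaults; any change is a workload-specific tuning decision, so re-check the plans after adjusting them:

-- Current values (PostgreSQL defaults shown in comments).
SHOW random_page_cost;      -- default 4.0; often lowered to ~1.1 on SSDs
SHOW cpu_tuple_cost;        -- default 0.01
SHOW cpu_index_tuple_cost;  -- default 0.005
SHOW cpu_operator_cost;     -- default 0.0025

-- Try a different value for the current session only:
SET random_page_cost = 1.1;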
