Postgres is using wrong index

Question

Create 2 identical tables:

CREATE TABLE logs1 (
    user_id int4 NOT NULL,
    create_time timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE logs2 (
    user_id int4 NOT NULL,
    create_time timestamptz NOT NULL DEFAULT now()
);

I insert 200,000,000 entries into the first table, 2,000,000 for each of the 100 users:

INSERT INTO public.logs1 (user_id) (select 1 from generate_series(1,2000000));
INSERT INTO public.logs1 (user_id) (select 2 from generate_series(1,2000000));
INSERT INTO public.logs1 (user_id) (select 3 from generate_series(1,2000000));
--...
INSERT INTO public.logs1
SELECT user_id
     , now()+make_interval(secs => user_id)
FROM generate_series(4,100) AS user_id
CROSS JOIN LATERAL generate_series(1,2e6) AS duplicate_count;

Insert 200,000,000 entries into the second table, but in such a way that the data of several users takes up most of the total data:

INSERT INTO public.logs2(user_id) select 1 from generate_series(1,95802970);
INSERT INTO public.logs2(user_id) select 2 from generate_series(1,10561980);
INSERT INTO public.logs2(user_id) select 3 from generate_series(1,17014900);
--...
INSERT INTO public.logs2 
SELECT ordinality+3
     , now()+make_interval(secs => ordinality)
FROM unnest(ARRAY[/*95802970,10561980,17014900,*/14987000,14701010,13137950,7180130,6930550,4260890,3702960,3215300,2648260,1197320,802200,709840,661650,628120,590780,433570,291460,195000,158450,55980,34210,21660,17400,9500,7850,7450,4910,4860,3150,2760,2620,2160,1810,1630,1410,1090,1050,830,610,560,540,500,490,330,240,210,160,130,110,100,100,90,90,90,80,80,70,70,70,60,40,40,40,30,30,30,30,30,20,20,20,20,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10])
WITH ORDINALITY AS _(val,ordinality)
CROSS JOIN LATERAL generate_series(1,val);

Create identical indexes:

CREATE INDEX logs1_user_id_idx ON ONLY logs1 USING btree (user_id);
CREATE INDEX logs1_create_time_idx ON ONLY logs1 USING btree (create_time);
CREATE INDEX logs1_user_id_create_time_idx ON ONLY logs1 USING btree (user_id, create_time);

CREATE INDEX logs2_user_id_idx ON ONLY logs2 USING btree (user_id);
CREATE INDEX logs2_create_time_idx ON ONLY logs2 USING btree (create_time);
CREATE INDEX logs2_user_id_create_time_idx ON ONLY logs2 USING btree (user_id, create_time);

Run analyze:

analyze logs1;
analyze logs2;

As a result, I have 2 tables with the same schemas, indexes, the same amount of data, but with a different distribution of this data.

I ran this query on both tables:

EXPLAIN ANALYZE
SELECT *
FROM logs1 -- logs2
WHERE user_id = 1001 
ORDER BY create_time  DESC
OFFSET 0 ROWS 
FETCH FIRST 10 ROWS only

And here is the result:

--------- logs1 ---------
Limit  (cost=0.57..8.59 rows=1 width=12) (actual time=0.026..0.028 rows=0 loops=1)
  ->  Index Only Scan Backward using logs1_user_id_create_time_idx on logs1  (cost=0.57..8.59 rows=1 width=12) (actual time=0.024..0.024 rows=0 loops=1)
        Index Cond: (user_id = 1001)
        Heap Fetches: 0
Planning Time: 0.447 ms
Execution Time: 0.062 ms

--------- logs2 ---------
Limit  (cost=0.57..0.78 rows=10 width=12) (actual time=0.042..0.044 rows=0 loops=1)
  ->  Index Only Scan Backward using logs2_user_id_create_time_idx on logs2  (cost=0.57..414.57 rows=20000 width=12) (actual time=0.039..0.040 rows=0 loops=1)
        Index Cond: (user_id = 1001)
        Heap Fetches: 0
Planning Time: 0.220 ms
Execution Time: 0.080 ms

In both cases the correct index was used - logs1_user_id_create_time_idx, which caused the query to execute quickly. Well, now I run this query:

EXPLAIN ANALYZE
SELECT *
FROM logs1 -- logs2
WHERE user_id IN (1001, 1002)
ORDER BY create_time  DESC
OFFSET 0 ROWS 
FETCH FIRST 10 ROWS only

And here is the result:

--------- logs1 ---------
Limit  (cost=13.18..13.18 rows=1 width=12) (actual time=0.086..0.087 rows=0 loops=1)
  ->  Sort  (cost=13.18..13.18 rows=1 width=12) (actual time=0.083..0.084 rows=0 loops=1)
        Sort Key: create_time DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Index Only Scan using logs1_user_id_create_time_idx on logs1  (cost=0.57..13.17 rows=1 width=12) (actual time=0.074..0.074 rows=0 loops=1)
              Index Cond: (user_id = ANY ('{1001,1002}'::integer[]))
              Heap Fetches: 0
Planning Time: 0.223 ms
Execution Time: 0.124 ms

--------- logs2 ---------
Limit  (cost=0.57..1303.11 rows=10 width=12) (actual time=254657.997..254658.003 rows=0 loops=1)
  ->  Index Scan Backward using logs2_create_time_idx on logs2  (cost=0.57..5210160.65 rows=40000 width=12) (actual time=254657.992..254657.993 rows=0 loops=1)
        Filter: (user_id = ANY ('{1001,1002}'::integer[]))
        Rows Removed by Filter: 200000000
Planning Time: 0.247 ms
Execution Time: 254658.069 ms

In case of logs1, the query executed quickly because the same index was used as in the previous query.

In logs2, the query executed 2 million times longer (over 4 minutes) because a different index was used - logs2_create_time_idx. If I remove this index: logs2_create_time_idx, then postgres will come to its senses and use logs2_user_id_create_time_idx, causing the query to execute quickly:

Limit  (cost=9.18..9.18 rows=1 width=12) (actual time=0.143..0.146 rows=0 loops=1)
  ->  Sort  (cost=9.18..9.18 rows=1 width=12) (actual time=0.141..0.143 rows=0 loops=1)
        Sort Key: create_time DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Index Only Scan using logs1_user_id_create_time_idx on logs1  (cost=0.57..9.17 rows=1 width=12) (actual time=0.122..0.123 rows=0 loops=1)
              Index Cond: (user_id = ANY ('{1001,1002}'::integer[]))
              Heap Fetches: 0
Planning Time: 0.245 ms
Execution Time: 0.188 ms

I am in a situation where postgres selects the wrong index for me, which is why the query takes incomparably longer to execute than if the correct index had been selected. What can I do about it? In sql server I could force the engine to use a specific index, but in pogres, as far as I know, this cannot be done.

I will also add that postgres uses the correct index when I operate on a smaller amount of data, e.g. not 200,000,000, but 2,000,000.

Same, but older, question: stackoverflow.com/questions/78789920/… — Frank Heikens
– Frank Heikens, Commented Oct 17, 2024 at 17:08
This question woul be better suited to DBA Stack Exchange: it is not about a programming problem. — JohnH
– JohnH, Commented Oct 18, 2024 at 14:05

Frank Heikens · Accepted Answer · 2024-10-17 15:52:49Z

When having 200 million records for just a few unique user IDs, you need better statistics.

This one already works:

ALTER TABLE logs2 ALTER COLUMN user_id SET STATISTICS 200;
ANALYZE logs2;

And gave me this result:

Limit  (cost=560.60..560.62 rows=10 width=12) (actual time=0.166..0.167 rows=0 loops=1)
  Output: user_id, create_time
  Buffers: shared hit=4
  ->  Sort  (cost=560.60..595.44 rows=13939 width=12) (actual time=0.164..0.165 rows=0 loops=1)
        Output: user_id, create_time
        Sort Key: logs2.create_time DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=4
        ->  Index Only Scan using logs2_user_id_create_time_idx on public.logs2  (cost=0.57..259.38 rows=13939 width=12) (actual time=0.124..0.124 rows=0 loops=1)
              Output: user_id, create_time
              Index Cond: (logs2.user_id = ANY ('{1001,1002}'::integer[]))
              Heap Fetches: 0
              Buffers: shared hit=4
Settings: random_page_cost = '1.1'
Planning Time: 0.228 ms
Execution Time: 0.203 ms

My non default value for random_page_cost doesn't have any impact, other values for the different cpu-costs might have. You might want to check these.

Collectives™ on Stack Overflow

Postgres is using wrong index

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related