Postgres using two indexes for WHERE clause in parallel

Question

I have a table with a few billion rows. There are two columns in the table:

match_id uuid,
group_id integer

And there are indexes created on both of the above columns:

create index if not exists my_tbl_match_id_idx on my_table (match_id);
create index if not exists my_tbl_group_id_idx on my_table (group_id);

When I run the following query, both indexes are scanned in parallel. The match_id index lookup is much faster and returns its result in 13 milliseconds, while the group_id index lookup is slower and takes 1.3 seconds. Because both indexes are looked up in parallel, the overall query time is 1.3 seconds, i.e. the lookup time of the group_id index.

Is there any way Postgres can just use group_id index after the match_id result set is computed?

SELECT *
FROM my_table
WHERE match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01'
AND (group_id IN (167,1704,1864,2065,2145,3812,3814,3855,7462,11393,11394,11396))

Or do I need to create a composite index for match_id and group_id?

I see. My query pattern is the one in the question above, but I also query on just match_id, i.e. SELECT * FROM my_table WHERE match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01'. Hopefully a composite index won't adversely affect this query pattern.
– Sagar, Commented Sep 15, 2023 at 14:36
You can try it with only the combined index. If the other query gets slower, you can always add more indexes, but too many make inserts and updates slower.
– nbk, Commented Sep 15, 2023 at 14:38
I don't think the parallelization you describe actually exists in PostgreSQL. Please show us the plan, not your interpretation of the plan.
– jjanes, Commented Sep 15, 2023 at 16:30

Erwin Brandstetter · Accepted Answer · 2023-09-17 08:09:24Z

Many factors bear on the best course of action, most of which are not clear from the question. Consider the instructions for postgresql-performance questions.
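
The plan requested in the comments can be captured as follows. This is a sketch that assumes the table is named my_table, as in the query in the question. Note that ANALYZE actually executes the query.

```sql
-- Run the query and report the actual plan, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   my_table
WHERE  match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01'
AND    group_id IN (167,1704,1864,2065,2145,3812,3814,3855,7462,11393,11394,11396);
```

If both single-column indexes are combined, the output would typically show a BitmapAnd node over two Bitmap Index Scan nodes, rather than two scans running in parallel.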

Generally speaking, a multicolumn index on (match_id, group_id) (with leading match_id!) performs substantially better for your query than combining two separate indexes in bitmap index scans.
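
A minimal sketch of such an index, assuming the table is named my_table as in the query. The index name is made up for illustration:

```sql
-- CONCURRENTLY avoids blocking writes while the index is built on a large table,
-- but it cannot run inside a transaction block and takes longer.
CREATE INDEX CONCURRENTLY IF NOT EXISTS my_tbl_match_id_group_id_idx
ON my_table (match_id, group_id);
```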

Multicolumn index and performance

In effect, this index implements your requirement exactly:

Is there any way Postgres can just use group_id index after the match_id result set is computed?

A multicolumn index is sorted by its leading expression first. Using this index, Postgres never has to visit the lion's share of entries that match on group_id but not on match_id. This should be particularly impactful in your case, since match_id seems to be much more selective.

This index also covers queries on just match_id like you mentioned in your comment. Since match_id is the more selective and bigger column (uuid = 16 bytes, int = 4 bytes), the multicolumn index isn't much bigger than the one on just (match_id), and should work almost as well for the purpose, making the overhead for maintaining a second index questionable. The other index on just (group_id) may still serve its purpose for queries on just group_id.
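
To check the size claim against your own data, the two indexes can be compared once the multicolumn index exists. A sketch, where my_tbl_match_id_group_id_idx is a hypothetical name for the multicolumn index:

```sql
-- Compare the on-disk size of the single-column and the multicolumn index.
SELECT pg_size_pretty(pg_relation_size('my_tbl_match_id_idx'))          AS match_id_only,
       pg_size_pretty(pg_relation_size('my_tbl_match_id_group_id_idx')) AS match_id_group_id;

-- Only if queries on just match_id perform well with the multicolumn index:
-- drop the now redundant single-column index without blocking writes.
DROP INDEX CONCURRENTLY IF EXISTS my_tbl_match_id_idx;
```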

See:

Is a composite index also good for queries on the first field?

Other advice

Your query can benefit a lot from physically clustering rows on match_id. Use CLUSTER or one of the less blocking community tools. This is only reasonable if there isn't too much write activity on the table (or large parts of it).
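
A sketch of the CLUSTER variant, assuming the table is named my_table and clustering on the existing single-column index (the multicolumn index would work as well). CLUSTER takes an exclusive lock on the table for its whole duration and is a one-time operation: later writes are not kept in index order.

```sql
-- Rewrite the table physically ordered by match_id.
-- Blocks all other access to the table while it runs.
CLUSTER my_table USING my_tbl_match_id_idx;

-- Refresh planner statistics after the rewrite.
ANALYZE my_table;
```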

Also, if you don't really need SELECT *, but only a small subset of columns, there is potential for optimization.
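
One possible optimization of this kind (an assumption about what is meant here) is a covering index that allows index-only scans. A sketch for Postgres 11 or later, where col_a and col_b are placeholders for the columns actually needed and the index name is made up:

```sql
-- Key columns drive the lookup; INCLUDE columns are only stored as payload.
CREATE INDEX CONCURRENTLY IF NOT EXISTS my_tbl_match_id_group_id_covering_idx
ON my_table (match_id, group_id) INCLUDE (col_a, col_b);

-- Select only columns contained in the index instead of SELECT *.
SELECT match_id, group_id, col_a, col_b
FROM   my_table
WHERE  match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01'
AND    group_id IN (167,1704,1864,2065,2145,3812,3814,3855,7462,11393,11394,11396);
```

Index-only scans only pay off while the visibility map is reasonably current, so the table needs to be vacuumed regularly.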
