1

I have table which has a few billion rows. There are two columns in the table:

match_id uuid,
group_id integer

And there are indexes created on both of the above columns:

create index if not exists my_tbl_match_id_idx on my_tbl (match_id);
create index if not exists my_tbl_group_id_idx on my_tbl (group_id);

When I perform following query, both indexes are scanned in parallel. match_id index look up is much faster and returns result back in 13 milliseconds while group_id index look up is slower and takes 1.3 seconds. Because both indexes are looked up in parallel, overall query time = 1.3 seconds i.e. look up time of group_id index.

Is there any way Postgres can just use group_id index after the match_id result set is computed?

SELECT *
FROM my_table
WHERE match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01'
AND (group_id IN (167,1704,1864,2065,2145,3812,3814,3855,7462,11393,11394,11396))

Or do I need to create a composite index for match_id and group_id?

4
  • 1
    usually you would make a combined combined index, Commented Sep 15, 2023 at 14:34
  • I see. My query pattern is what I mentioned in the question above but also just on match_id i.e. SELECT * FROM my_table WHERE match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01' hopefully composite index won't affect adversely on this query pattern Commented Sep 15, 2023 at 14:36
  • you can ry it only with the combined if the other query get slower you can add always more indexes, but too many make inserts and update slower Commented Sep 15, 2023 at 14:38
  • I don't think the parallelization you describe actually exists in PostgreSQL. Please show us the plan, not your interpretation of the plan. Commented Sep 15, 2023 at 16:30

1 Answer 1

3

There are many factors with bearing on the best course of action, most of which are not clear from the question. Consider instructions for questions here.

Generally speaking, a multicolumn index on (match_id, group_id) (with leading match_id!) performs substantially better for your query than combining two separate indexes in bitmap index scans.

In effect, this index implements your requirement exactly:

Is there any way Postgres can just use group_id index after the match_id result set is computed?

A multicolumn index is sorted by leading expressions first. Using this index, Postgres excludes the lion share of irrelevant matches on group_id for your query. Should be particularly impactful for your case since match_id seems to be much more selective.

See:

This index also covers queries on just match_id like you mentioned in your comment. Since match_id is the more selective and bigger column (uuid = 16 bytes, int = 4 bytes), the multicolumn index isn't much bigger than the one on just (match_id), and should work almost as well for the purpose, making the overhead for maintaining a second index questionable. The other index on just (group_id) may still serve its purpose for queries on just group_id.

See:

Other advice

Your query can benefit a lot from physically clustering rows on match_id. Use CLUSTER or one of the less blocking community tools. See:

Only reasonable if there isn't too much write activity on the table (or large parts of it).

Also, if you don't really need SELECT *, but only a small subset of columns, there is potential for optimization.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.