Multicolumn index for where and order by clause (postgres)

Question

Consider the query like this:

select ... from table where a = 1 and b > 2 order by c asc

What would be the ideal idex for this query? Should i use one index (a, b, c) or two separate (a, b) and (c)?

thanks in advance.

It depends on the statistics of the data and how many rows are selected. However, I would expect (a, b) to be optimal under many circumstances. — Gordon Linoff
– Gordon Linoff, Commented Apr 11, 2021 at 12:49
@GordonLinoff But ordering supposed to benefit from indexing too... does the second condition '>' breaks the index scan? — nsavinov
– nsavinov, Commented Apr 11, 2021 at 13:04
. . It is tricky sometimes to have an index used for sorting when there are inequalities in the where clause. — Gordon Linoff
– Gordon Linoff, Commented Apr 11, 2021 at 14:29

jjanes · Accepted Answer · 2021-04-11 19:00:07Z

2

You probably can't do both well. The equality is not a problem, but you can't cleanly combine inequality with ordering. How many rows do you expect to return? If b>2 is very selective leaving few rows left to sort, then clearly you want want (a,b) as getting the selectivity is useful while sorting the few remaining rows doesn't take very long. On the other hand, if b>2 only rules out a few rows, then (a,c) (perhaps with more columns at the end to allow index-only scans) avoids a big slow sort while removing the few rows that fail b "the hard way" doesn't take very long.

You could build both and let the planner make the call, although it is far from perfect at doing so.

It is possible to use a GiST index on all three columns to achieve both the inequality filtering and the ordering simultaneously, but GiST indexes have a much higher overhead than BTree indexes as well as being slower to build and maintain, so it is quite unlikely to be worth while, although if you used a LIMIT that could make it more attractive. You would also need to write the query in a contorted way, as a KNN query (assisted by the btree_gist extension):

where a = 1 and b > 2 order by c <-> impossibly_low_value

There are other advanced possibilities. If the thing you compare b to is always >2, then you could use a partial index (a,c) where b>2 or an expression index (a,(b>2),c). You might also be able to usefully partition on b. If b has a small number of distinct values, you could just UNION ALL the results of queries on each qualifying distinct value and use the index on (a,b,c), getting an efficient merge append:

(select * from foo where a=1 and b=3 order by c asc) 
    union all 
(select * from foo where a=1 and b=4 order by c asc) 
order by c asc;

edited Apr 11, 2021 at 19:00

answered Apr 11, 2021 at 18:54

jjanes

45k5 gold badges39 silver badges48 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Laurenz Albe Over a year ago

Will the UNION approach really help with sorting?

jjanes Over a year ago

It was to be UNION ALL, not just UNION. It provides order without doing a sort at all, by preserving the index ordering. How effective it is depends on how clustered the table is, and on sequential vs random page costs. Of course the effectiveness is most pronounced when there is a LIMIT specified.

Laurenz Albe Over a year ago

I mean, if you get two sorted results (using the index) and then UNION ALL them, the result has to be sorted again (like you did in your answer). But then you might as well omit the ORDER BY in the subqueries, and the index will not be used for sorting the overall result.

jjanes Over a year ago

@LaurenzAlbe My answer just specified they be ordered, not that be sorted. The planner is free to obtain the ordering any means it can, which is a merge append over index scans in this case. It should be able to do the same thing with just the outer ORDER BY and not the inner ones, but it isn't smart enough to push the ORDER BY down. I thought that that was to be fixed in v17, but I just checked and it still doesn't initiate the push down, but still benefits from it if you force it by manually over-specifying the ORDER BY

Collectives™ on Stack Overflow

Multicolumn index for where and order by clause (postgres)

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related