2

Consider the query like this:

select ... from table where a = 1 and b > 2 order by c asc

What would be the ideal idex for this query? Should i use one index (a, b, c) or two separate (a, b) and (c)?

thanks in advance.

3
  • 1
    It depends on the statistics of the data and how many rows are selected. However, I would expect (a, b) to be optimal under many circumstances. Commented Apr 11, 2021 at 12:49
  • @GordonLinoff But ordering supposed to benefit from indexing too... does the second condition '>' breaks the index scan? Commented Apr 11, 2021 at 13:04
  • . . It is tricky sometimes to have an index used for sorting when there are inequalities in the where clause. Commented Apr 11, 2021 at 14:29

1 Answer 1

2

You probably can't do both well. The equality is not a problem, but you can't cleanly combine inequality with ordering. How many rows do you expect to return? If b>2 is very selective leaving few rows left to sort, then clearly you want want (a,b) as getting the selectivity is useful while sorting the few remaining rows doesn't take very long. On the other hand, if b>2 only rules out a few rows, then (a,c) (perhaps with more columns at the end to allow index-only scans) avoids a big slow sort while removing the few rows that fail b "the hard way" doesn't take very long.

You could build both and let the planner make the call, although it is far from perfect at doing so.

It is possible to use a GiST index on all three columns to achieve both the inequality filtering and the ordering simultaneously, but GiST indexes have a much higher overhead than BTree indexes as well as being slower to build and maintain, so it is quite unlikely to be worth while, although if you used a LIMIT that could make it more attractive. You would also need to write the query in a contorted way, as a KNN query (assisted by the btree_gist extension):

where a = 1 and b > 2 order by c <-> impossibly_low_value

There are other advanced possibilities. If the thing you compare b to is always >2, then you could use a partial index (a,c) where b>2 or an expression index (a,(b>2),c). You might also be able to usefully partition on b. If b has a small number of distinct values, you could just UNION ALL the results of queries on each qualifying distinct value and use the index on (a,b,c), getting an efficient merge append:

(select * from foo where a=1 and b=3 order by c asc) 
    union all 
(select * from foo where a=1 and b=4 order by c asc) 
order by c asc;
Sign up to request clarification or add additional context in comments.

4 Comments

Will the UNION approach really help with sorting?
It was to be UNION ALL, not just UNION. It provides order without doing a sort at all, by preserving the index ordering. How effective it is depends on how clustered the table is, and on sequential vs random page costs. Of course the effectiveness is most pronounced when there is a LIMIT specified.
I mean, if you get two sorted results (using the index) and then UNION ALL them, the result has to be sorted again (like you did in your answer). But then you might as well omit the ORDER BY in the subqueries, and the index will not be used for sorting the overall result.
@LaurenzAlbe My answer just specified they be ordered, not that be sorted. The planner is free to obtain the ordering any means it can, which is a merge append over index scans in this case. It should be able to do the same thing with just the outer ORDER BY and not the inner ones, but it isn't smart enough to push the ORDER BY down. I thought that that was to be fixed in v17, but I just checked and it still doesn't initiate the push down, but still benefits from it if you force it by manually over-specifying the ORDER BY

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.