3

I have a table with sales records by year as follows:

id year sales
1  2001 10
2  2002 20
3  2003 30

I'm joining the table to itself in order to get a sales_difference from one year to the next:

SELECT s1.*, s1.sales - s2.sales AS sales_difference
FROM sales s1, sales s2
WHERE s1.year = s2.year + 1

This query runs pretty slowly, so I want to create an index on year + 1. According to the PostgreSQL docs you can create indexes on expressions such as:

CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));

So I try doing this:

CREATE INDEX sales_year_plus_one on sales (year + 1);

which gives me:

ERROR:  syntax error at or near "+"
LINE 1: ...sales_year_plus_one on sales (year + 1);
                                              ^

Why is this particular expression not allowed?

2 Answers 2

6

You need to enclose your expression in an additional set of parentheses:

CREATE INDEX sales_year_plus_one on sales ((year + 1));

See this excerpt from the documentation:

The syntax of the CREATE INDEX command normally requires writing parentheses around index expressions, as shown in the second example. The parentheses can be omitted when the expression is just a function call, as in the first example.

Sign up to request clarification or add additional context in comments.

1 Comment

Very well spotted. How odd that such a vital piece of information is so subtly tucked away in the small print of the documentation.
4

You can also use a window function to get the same effect without the additional index (and thus the overhead of maintaining that index):

SELECT *, sales - lag(sales) OVER (ORDER BY year) AS sales_difference
FROM sales;

A simple index on year, which you probably already have, will suffice here.

3 Comments

This is amazing. I guess this probably counts as some sort of best practice for this sort of task. But care must be taken around the ORDER BY inside the OVER as you have to take account of all the other fields in your query (clearly in this dumb case, that's not necessary.)
Using a self-join means two table scans, window functions get by with a single scan. Do an EXPLAIN ANALYZE on both queries to see the difference. And as with all SQL queries, except the most trivial, care must always be taken in getting it right.
You have to see the results from EXPLAIN ANALYZE to see the performance difference for both queries. explain.depesz.com can help

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.