1

The table is represented by following scripts:

CREATE TABLE sales (
  id SERIAL PRIMARY KEY,
  product_id INTEGER,
  sales_date DATE,
  quantity INTEGER,
  price NUMERIC
);

INSERT INTO sales (product_id, sales_date, quantity, price) VALUES
   (1, '2023-01-01', 10, 10.00),
   (1, '2023-01-02', 12, 12.00),
   (1, '2023-01-03', 15, 15.00),
   (2, '2023-01-01', 8, 8.00),
   (2, '2023-01-02', 10, 10.00),
   (2, '2023-01-03', 12, 12.00);

The task is to sum sales quantity for last 3 days per each product_id. The period must be counted backwards from maximum (last) date of each product_id. So for 1 maximum is 2023-01-03, same for 2. But for product_id 2 last day may be differ from 1 - let's say 2023-01-05.

By applying this query with window function in subquery:

select product_id, max(increasing_sum) as quantity_last_3_days
   from 
        (SELECT product_id,
         SUM(quantity) OVER (PARTITION BY product_id ORDER BY sales_date RANGE BETWEEN INTERVAL '2 days'
                PRECEDING AND CURRENT ROW) AS increasing_sum
         FROM sales) as s
   group by product_id;

I receive the expected output:

  | product_id | quantity_last_3_days |
  |____________|______________________|            
  |_____1______|___________37_________|
  |_____2______|___________30_________|     
 

But is it solution optimal? Is there any way to solve the problem, by using window function without subquery ?

1
  • 1
    The query you show does not match the description of what you want. It shows the highest 3-day window, not the most recent 3-day window. Those just happen to be the same thing given your demo data, but are not the same thing in general. Commented May 4, 2024 at 23:27

3 Answers 3

2

But is it solution optimal?

No, not if you just need "to sum sales quantity for last 3 days per each product_id".

Is there any way to solve the problem, by using window function without subquery?

You can usually trade window functions for correlated subqueries, lateral or scalar, but there's an easier way to speed up your query, and correct it. It is doing extra work trying to get you a rolling/stepping sum; the window isn't trying to get the 3 most recent days for each product_id.

Instead, for each row, it's looking back at rows with the same product_id up to 2 days prior. You're later picking the 3 day period with the highest sum of quantity, which won't necessarily be the 3 most recent dates.

On 400k samples, your query takes a whole 1.0s without an index, 0.7s with a covering index, and you can go down from that to 0.4s without or 0.1s with the covering index. You just need to only ask about the sum for the 3 most recent dates for each product_id: demo at db<>fiddle

select product_id,sum(quantity) as quantity_last_3_days
from(select *,row_number()over(PARTITION BY product_id ORDER BY sales_date DESC)
     from sales)_
where 3>=row_number
group by product_id;

The trick here is that the window function will execute with a Run Condition: (3 >= row_number() OVER (?)), which means it'll just grab the 3 most recent and quit. It can even get them straight off the top of the covering index, without ever having to visit the table.
Your original query has to scan the whole thing (either the entire table or the whole index, if it's available), then sort that to get the max().


As you pointed out, that's counting any last 3 dates for each product_id, which don't have to be consecutive and the most recent one doesn't have to be today or yesterday.
Not the last 3 days in general, counting back from current date.
Not the last 3 consecutive dates for each product_id.

In case you need that, last 3 from today are trivial to implement:

select product_id,sum(quantity) as quantity_last_3_days
from sales
where sales_date>='2023-01-03'::date--this would be current_date or now()
                  -'2d'::interval
group by product_id;

and the 3 most recent consecutive per product_id are similar

with cte as (
  select distinct on (product_id) *
  from sales s1
  order by product_id,sales_date desc)
select *,(select sum(quantity)
          from sales s2
          where s1.product_id=s2.product_id
          and s1.sales_date-'2 days'::interval >= s2.sales_date) 
          as quantity_last_3_days
from cte s1;
Sign up to request clarification or add additional context in comments.

3 Comments

One more moment. My sample represent the situation, when each sale happens each date and happens strictly ones a day. But, if have, for example dates sequence: '2023-01-07', '2023-01-05', '2023-01-01', and apply 'where 3>=row_number'-clause, we'll get '2023-01-07', '2023-01-05', '2023-01-01' records, although it is expected '2023-01-07', '2023-01-05', according to main condition "last 3 days"
@Jelly That is exactly right and I assumed that was your intention. That's what I understood from your "last 3 days per each product_id". I figured if you wanted the last date for each of those, plus two consecutive days prior, you'd say "last 3 consecutive days", or if you needed last 3 days from the current date, regardless of what's a given product_id's actual most recent recorded sales_date, you'd point that out as well. These alternative interpretations of the logic are both way easier to implement.
you're right - my answer does not clear at this moment. I don't take into account, that each sale not not necessary happens each date and not necessary happens strictly ones a day
2

If you want to avoid window functions (which one might because of their cognitive load), you could also solve the problem using a WITH clause (a.k.a. CTE). Since the query does not actually need anything more than a single threshold for each product id, you could express the filtering in a join condition with a CTE as follows:

WITH latest_sales AS (
  SELECT product_id, max(sales_date) AS time 
  FROM sales 
  GROUP BY product_id
)
SELECT sales.product_id, sum(sales.quantity) as quantity_last_3_days
FROM sales INNER JOIN latest_sales ON 
    sales.product_id = latest_sales.product_id 
    AND sales.sales_date >= latest_sales.time - interval '2 days'
GROUP BY sales.product_id;

5 Comments

I tried your solution, and it is nothing returns. Maybe it's a matter in Where-clause ?
Well, if you are using the sample data you provided, as I understand it, there should be no results (all sales were made in 2023, so no sales have been made in the past 3 days). If you update the data, you should get correct results. Or perhaps I misunderstood the intended result?
The period must be counted backwards from maximum (last) date of each product_id. So for 1 maximum is 2023-01-03, same for 2. But for product_id 2 last day may be differ from 1 - let's say 2023-01-05. I updated my question to make it more clearly
That makes sense. I'll update the answer!
I changed my answer to an example which does not use a window query.
0

My answer. It particulary learning on Zegarek and grundb answers:

select product_id, quantity_last_3_days
  FROM
     (SELECT product_id,
        CASE 
          WHEN
             ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sales_date DESC) < 2
          THEN
             SUM(quantity) OVER (PARTITION BY product_id ORDER BY sales_date RANGE BETWEEN INTERVAL '2 days'
            PRECEDING AND CURRENT ROW) 
          ELSE
             NULL
        END quantity_last_3_days
       FROM sales) as s
 WHERE  quantity_last_3_days IS NOT NULL;

It avoid expensive max operation and takes into account, that each sale not not necessary happens each date and not necessary happens strictly ones a day

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.