1

I have a query which get data summarised between two dates like so:

SELECT date(created_at),
COUNT(COALESCE(id, 0))                                                                      AS total_orders,
   SUM(COALESCE(total_price, 0))                                                               AS total_price,
   SUM(COALESCE(taxes, 0))                                                                     AS taxes,
   SUM(COALESCE(shipping, 0))                                                                  AS shipping,
   AVG(COALESCE(total_price, 0))                                                               AS average_order_value,
   SUM(COALESCE(total_discount, 0))                                                            AS total_discount,
   SUM(total_price - COALESCE(taxes, 0) - COALESCE(shipping, 0) - COALESCE(total_discount, 0)) as net_sales
FROM orders
WHERE shop_id = 43
  AND orders.active = true
  AND orders.created_at >= '2022-07-20'
  AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
order by created_at::date desc

However for dates that do not have any orders, the query returns nothing and I'd like to return 0.

I have tried with COALESCE but that doesn't seem to do the trick?

Any suggestions?

8
  • What date value do you want to return in the case of a 0 count? Commented Jul 25, 2022 at 7:01
  • Use generate_seies and left join to the resultant list of dates. Commented Jul 25, 2022 at 7:10
  • You can use a simple nested query and count the resulting rows if it's 0 pick the default values in the outer query. Commented Jul 25, 2022 at 7:10
  • @TimBiegeleisen 0 would be ideal. I'm making a graph with the result, so I can't have missing dates. Commented Jul 25, 2022 at 7:47
  • This sounds like something you should handle in the presentation layer which is calling Postgres. Just check for an empty result set, in that case, render a different graph. Commented Jul 25, 2022 at 7:48

3 Answers 3

2

This should be substantially faster - and correct:

SELECT *
     , total_price - taxes - shipping - total_discount AS net_sales  -- ⑤
FROM  (
   SELECT created_at
        , COALESCE(total_orders        , 0) AS total_orders
        , COALESCE(total_price         , 0) AS total_price
        , COALESCE(taxes               , 0) AS taxes
        , COALESCE(shipping            , 0) AS shipping
        , COALESCE(average_order_value , 0) AS average_order_value
        , COALESCE(total_discount      , 0) AS total_discount
   FROM   generate_series(timestamp '2022-07-20'  -- ①
                        , timestamp '2022-07-26'
                        , interval '1 day') AS g(created_at)
   LEFT  JOIN (  -- ③
      SELECT created_at::date
           , count(*)            AS total_orders  -- ⑥
           , sum(total_price)    AS total_price
           , sum(taxes)          AS taxes
           , sum(shipping)       AS shipping
           , avg(total_price)    AS average_order_value
           , sum(total_discount) AS total_discount
      FROM   orders
      WHERE  shop_id = 43
      AND    active  -- simpler
      AND    created_at >= '2022-07-20'
      AND    created_at <  '2022-07-27'  -- ② !
      GROUP  BY 1
      ) o USING (created_at)  -- ④
   ) sub
ORDER  BY created_at DESC;

db<>fiddle here

I copied, simplified, and extended Xu's fiddle for comparison.

① Why this particular form for generate_series()? See:

② Assuming created_at is data type timestamp your original formulation is most probably incorrect. created_at <= '2022-07-26' would include the first instant of '2022-07-26' and exclude the rest. To include all of '2022-07-26', use created_at < '2022-07-27'. See:

③ The LEFT JOIN is the core feature of this answer. Generate all days with generate_series(), independently aggregate days from table orders, then LEFT JOIN to retain one row per day like you requested.

④ I made the column name match created_at, so we can conveniently shorten the join syntax with the USING clause.

⑤ Compute net_sales in an outer SELECT after replacing NULL values, so we need COALESCE() only once.

count(*) is equivalent to COUNT(COALESCE(id, 0)) in any case, but cheaper. See:

Sign up to request clarification or add additional context in comments.

Comments

1

Please refer to the below script.

SELECT *
FROM 
    (SELECT date(created_at) AS created_at,
         COUNT(id) AS total_orders,
         SUM(total_price) AS total_price,
         SUM(taxes) AS taxes,
         SUM(shipping) AS shipping,
         AVG(total_price) AS average_order_value,
         SUM(total_discount) AS total_discount,
         SUM(total_price - taxes - shipping - total_discount) AS net_sales
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26'
    GROUP BY  date (created_at)
UNION
SELECT dates AS created_at,
         0 AS total_orders,
         0 AS total_price,
         0 AS taxes,
         0 AS shipping,
         0 AS average_order_value,
         0 AS total_discount,
         0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN 
    (SELECT created_at
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY  created_at::date desc;

There is one sample for your reference. Sample

I got your duplicate test cases at my side. The root cause is created_at field (datattype:timestamp), hence there are duplicate lines.

Below script is correct for your request.

SELECT *
FROM 
    (SELECT date(created_at) AS created_at,
         COUNT(id) AS total_orders,
         SUM(total_price) AS total_price,
         SUM(taxes) AS taxes,
         SUM(shipping) AS shipping,
         AVG(total_price) AS average_order_value,
         SUM(total_discount) AS total_discount,
         SUM(total_price - taxes - shipping - total_discount) AS net_sales
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26'
    GROUP BY  date (created_at)
UNION
SELECT dates AS created_at,
         0 AS total_orders,
         0 AS total_price,
         0 AS taxes,
         0 AS shipping,
         0 AS average_order_value,
         0 AS total_discount,
         0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN 
    (SELECT date (created_at)
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY  created_at::date desc;

Here is a sample that's same with your side. Link

5 Comments

you suggestion looks promising but it repeats the dates (for the ones where there are orders, I have two rows, one with data and one empty)
Please use the corrected script.
still shows duplicated. There was a duplicated ORDER BY which I removed as it was causing an error. Could have it been related?
Sorry. ORDER BY is duplicate. I modify “Union All” to "Union" for your case. Also, I create some data for testing, refer to them,
Thanks for the help and no need to apologise. I still get the duplicates though :/
0

You can use WITH RECURSIVE to build a table of dates and then select dates that are not in your table

WITH RECURSIVE t(d) AS (
  (SELECT '2015-01-01'::date)
UNION ALL
  (SELECT d + 1 FROM t WHERE d + 1 <= '2015-01-10')
) SELECT d FROM t WHERE d NOT IN (SELECT d_date FROM tbl);

[look on this post : ][1]

[1]: https://stackoverflow.com/questions/28583379/find-missing-dates-postgresql#:~:text=You%20can%20use%20WITH%20RECURSIVE,SELECT%20d_date%20FROM%20tbl)%3B

1 Comment

generate_series() is much easier to use and probably more efficient as well

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.