generate_series() not working as expected with sum in PostgreSQL

Question

I have a table called classifications that contains a classification_indicator_id column.
I need to sum this ID per day over a series of 1-day steps.
I need to add around 20 columns (each for a different classification_indicator_id).
I slightly modified the answer from my previous question:

select
data.d::date as "data",
sum(c.classification_indicator_id)::integer as "Segment1",
sum(c4.classification_indicator_id)::integer as "Segment2",
sum(c5.classification_indicator_id)::integer as "Segment3"
from 
  generate_series(
    '2013-03-25'::timestamp without time zone,
    '2013-04-01'::timestamp without time zone,
    '1 day'::interval
) data(d)
left join classifications c on (data.d::date = c.created::date and c.classification_indicator_id = 3)
left join classifications c4 on (data.d::date = c4.created::date and c4.classification_indicator_id = 4)
left join classifications c5 on (data.d::date = c5.created::date and c5.classification_indicator_id = 5)
group by "data"
ORDER BY "data"

But it still does not work properly. The sum for each row is too big, and it grows when I add more columns. In the second table (4 columns), Segment1 for 2013-03-26 should be the same as in the first table, and so on.

With 3 columns                      With 4 columns
data       | Segment1 | Segment2    data       | Segment1 | Segment2 | Segment3
-----------+----------+---------    -----------+----------+----------+---------
2013-03-25 | 12       | 16          2013-03-25 | 12       | 16       | 20
2013-03-26 | 18       | 24          2013-03-26 | 108      | 144      | 180

Community · Accepted Answer · 2017-05-23 10:25:12Z

2

As commented under your previous question, you are running into a "proxy cross join".
I explained it in more detail in this related answer:
Two SQL LEFT JOINS produce incorrect result
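
A minimal sketch of the effect, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The row counts (2, 3 and 6 rows for IDs 3, 4 and 5 on 2013-03-26) are invented; they are one combination that happens to reproduce the inflated figures from the question, and the real counts are unknown. Each LEFT JOIN multiplies the number of rows for the day, so every sum is taken over 2 * 3 * 6 = 36 joined rows instead of its own 2, 3 or 6.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classifications (created TEXT, classification_indicator_id INTEGER)"
)
rows = [("2013-03-26", 3)] * 2 + [("2013-03-26", 4)] * 3 + [("2013-03-26", 5)] * 6
conn.executemany("INSERT INTO classifications VALUES (?, ?)", rows)

# Same shape as the query in the question: three LEFT JOINs on the same day.
inflated = conn.execute("""
    SELECT count(*),
           sum(c3.classification_indicator_id),
           sum(c4.classification_indicator_id),
           sum(c5.classification_indicator_id)
    FROM (SELECT '2013-03-26' AS created) d
    LEFT JOIN classifications c3
           ON c3.created = d.created AND c3.classification_indicator_id = 3
    LEFT JOIN classifications c4
           ON c4.created = d.created AND c4.classification_indicator_id = 4
    LEFT JOIN classifications c5
           ON c5.created = d.created AND c5.classification_indicator_id = 5
""").fetchone()

# Per-ID sums computed without any join, for comparison.
correct = conn.execute("""
    SELECT classification_indicator_id, sum(classification_indicator_id)
    FROM classifications
    WHERE created = '2013-03-26'
    GROUP BY classification_indicator_id
    ORDER BY classification_indicator_id
""").fetchall()

print(inflated)  # (36, 108, 144, 180)
print(correct)   # [(3, 6), (4, 12), (5, 30)]
```

Aggregating each ID on its own before joining, or joining the table only once, avoids the multiplication.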

Your query should work like this:

SELECT d.created AS data
      ,c3.segment1
      ,c4.segment2
      ,c5.segment3
FROM (
   SELECT generate_series('2013-03-25'::date
                         ,'2013-04-01'::date
                         ,interval '1 day')::date AS created
    ) d
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment1
    FROM   classifications
    WHERE  classification_indicator_id = 3
    GROUP  BY 1
    ) c3 USING (created)
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment2
    FROM   classifications
    WHERE  classification_indicator_id = 4
    GROUP  BY 1
    ) c4 USING (created)
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment3
    FROM   classifications
    WHERE  classification_indicator_id = 5
    GROUP  BY 1
    ) c5 USING (created)
ORDER  BY 1;

This assumes that created is a date, not a timestamp.

Or, for an even faster query, since performance has become a topic:

SELECT d.created AS data
      ,count(classification_indicator_id = 3 OR NULL)::int * 3 AS segment1
      ,count(classification_indicator_id = 4 OR NULL)::int * 4 AS segment2
      ,count(classification_indicator_id = 5 OR NULL)::int * 5 AS segment3
FROM (
   SELECT generate_series('2013-03-25'::date
                         ,'2013-04-01'::date
                         ,interval '1 day')::date AS created
    ) d
LEFT   JOIN classifications c USING (created)
GROUP  BY 1
ORDER  BY 1;
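
The count(... OR NULL) expressions rely on count() ignoring NULL values: for a matching row the condition is true, for a non-matching row "false OR NULL" evaluates to NULL and the row is not counted. The sketch below checks this on invented sample data, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite stores booleans as 1/0, but the three-valued logic works the same way here). The two-row inline subquery is a hypothetical replacement for generate_series(), which stock SQLite does not provide.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classifications (created TEXT, classification_indicator_id INTEGER)"
)
conn.executemany(
    "INSERT INTO classifications VALUES (?, ?)",
    [("2013-03-25", 3), ("2013-03-25", 3), ("2013-03-25", 4), ("2013-03-25", 5)],
)

# Two hard-coded days replace generate_series(); 2013-03-26 has no rows.
counted = conn.execute("""
    SELECT d.created,
           count(classification_indicator_id = 3 OR NULL) * 3 AS segment1,
           count(classification_indicator_id = 4 OR NULL) * 4 AS segment2,
           count(classification_indicator_id = 5 OR NULL) * 5 AS segment3
    FROM (SELECT '2013-03-25' AS created
          UNION ALL
          SELECT '2013-03-26') d
    LEFT JOIN classifications c ON c.created = d.created
    GROUP BY d.created
    ORDER BY d.created
""").fetchall()

print(counted)  # [('2013-03-25', 6, 4, 5), ('2013-03-26', 0, 0, 0)]
```

Note that a day without rows yields 0 here, whereas the variant with pre-aggregated subqueries yields NULL for such a day.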

edited May 23, 2017 at 10:25

CommunityBot

answered Apr 4, 2013 at 12:54

Erwin Brandstetter


4 Comments

ssuperczynski Over a year ago

Thank you, we are now testing this solution. I will let you know if it helped us, but we think that this is it :)

ssuperczynski Over a year ago

We will use both solutions, they are great. From 10000 ms to 100 ms, this is it!

Erwin Brandstetter Over a year ago

@infaustus: cross joins can get very expensive. Since performance has become a topic, I provided a variant which might be even faster. Yes, might. The proof of the pudding is in the testing. ;)

Clodoaldo Neto Over a year ago

I like the direct casting of the function: generate_series(...)::date. That will clean up a lot of code.

Clodoaldo Neto · Accepted Answer · 2013-04-04 13:11:09Z

2

No need for joins:

select
    data.d::date as "data",
    sum((classification_indicator_id = 3)::integer * classification_indicator_id)::integer as "Segment1",
    sum((classification_indicator_id = 4)::integer * classification_indicator_id)::integer as "Segment2",
    sum((classification_indicator_id = 5)::integer * classification_indicator_id)::integer as "Segment3"
from 
    generate_series(
        '2013-03-25'::timestamp without time zone,
        '2013-04-01'::timestamp without time zone,
        '1 day'::interval
    ) data(d)
    left join
    classifications c on data.d::date = c.created::date
group by "data"
ORDER BY "data"

edited Apr 4, 2013 at 13:11

answered Apr 4, 2013 at 13:05

Clodoaldo Neto

3 Comments

Erwin Brandstetter Over a year ago

This might be faster than multiple joins. CASE would be even faster.

Clodoaldo Neto Over a year ago

@Erwin MIGHT??? You must be kidding. Or my English is precarious and I don't get the meaning of "might" :))

ssuperczynski Over a year ago

I will test it too, and I will let you know.
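
For reference, the CASE variant mentioned in the comments above might look roughly like the sketch below. It is not taken from either answer and makes no claim about speed. It runs on invented sample data with Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the inline two-day subquery is a hypothetical replacement for generate_series().

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classifications (created TEXT, classification_indicator_id INTEGER)"
)
conn.executemany(
    "INSERT INTO classifications VALUES (?, ?)",
    [("2013-03-25", 3), ("2013-03-25", 3), ("2013-03-25", 4)],
)

# A CASE without ELSE yields NULL for non-matching rows, and sum() skips NULLs.
case_sums = conn.execute("""
    SELECT d.created,
           sum(CASE WHEN classification_indicator_id = 3
                    THEN classification_indicator_id END) AS segment1,
           sum(CASE WHEN classification_indicator_id = 4
                    THEN classification_indicator_id END) AS segment2,
           sum(CASE WHEN classification_indicator_id = 5
                    THEN classification_indicator_id END) AS segment3
    FROM (SELECT '2013-03-25' AS created
          UNION ALL
          SELECT '2013-03-26') d
    LEFT JOIN classifications c ON c.created = d.created
    GROUP BY d.created
    ORDER BY d.created
""").fetchall()

print(case_sums)  # [('2013-03-25', 6, 4, None), ('2013-03-26', None, None, None)]
```

A segment with no matching rows comes back as NULL (None in Python) rather than 0; wrapping each sum in coalesce(..., 0) would change that if zeros are preferred.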
