Want to make a query using a running sum in Postgres

Question

I have a problem creating a query for Postgres (strictly speaking, it's Redshift).
The table data is below.
The table is PARTITION BY user_id ORDER BY created_at desc.

data

user_id| x | y |  min |     created_at      
-------+---+---+------+---------------------
      1| 1 | 1 |    1 | 2015-01-15 17:26:53
      1| 1 | 1 |    2 | 2015-01-15 17:26:54
      1| 1 | 1 |    3 | 2015-01-15 17:26:55
      1| 2 | 1 |   10 | 2015-01-16 02:46:21
      1| 1 | 1 |   15 | 2015-01-16 02:46:22
      1| 3 | 3 |   11 | 2015-01-16 03:01:44
      1| 3 | 3 |    2 | 2015-01-16 03:02:06
      2| 1 | 1 |    3 | 2015-01-16 03:02:12
      2| 2 | 1 |    4 | 2015-01-16 03:02:15
      2| 2 | 1 |    7 | 2015-01-16 03:02:18

And what I want is this:

ideal result

user_id| x | y |  sum_min |
-------+---+---+----------+
      1| 1 | 1 |        6 |
      1| 2 | 1 |       10 |
      1| 1 | 1 |       15 |
      1| 3 | 3 |       13 |
      2| 1 | 1 |        3 |
      2| 2 | 1 |       11 |

If I simply use GROUP BY user_id, x, y, the result will be

 user_id| x | y |  sum_min |
 -------+---+---+----------+
       1| 1 | 1 |       21 |
       :| : | : |        : |

This is not good for me :( because the separate stays at (1, 1) get merged into one row.

"The table is PARTITION BY user_id ORDER BY created_at desc" does not make sense. That is part of a query, not part of a table definition. Please post the query you have so far. – user330315, Jan 22, 2015 at 7:01
Do you want to consider x, y and created_at for grouping, or what? – Vivek S., Jan 22, 2015 at 7:30
In your expected output, why do 1 | 1 | 1 | 6 and 1 | 1 | 1 | 15 come in different rows (i.e. rows 1 and 3)? – Vivek S., Jan 22, 2015 at 7:36
x, y is the user's position. I want to calculate the stay_min for each place, so I want to group by user_id, x, y within each partition. – mo12mo34, Jan 22, 2015 at 7:41

Vivek S. · Accepted Answer · 2015-01-22 07:55:00Z

1

Try this:

with cte as (
  select user_id, x, y, created_at,
         -- total of "min" per user, position and calendar day
         sum(min) over (partition by user_id, x, y, created_at::date) as sum_min
  from usr
)
select user_id, x, y, sum_min
from cte
group by user_id, x, y, created_at::date, sum_min
order by user_id, min(created_at);

Phil Cairns · Accepted Answer · 2015-01-22 06:41:47Z

0

Maybe try grouping by the creation date as well:

select user_id, x, y, sum(min) as sum_min, created_at::date as day
from test
group by user_id, x, y, created_at::date
order by user_id, min(created_at);

1 Comment

mo12mo34 Over a year ago

In this case your query seems good, but the rows are not always on different days, so I think we have to compare each row with the next one to see whether the position is the same or not. Thanks :)
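The "compare each row with the next one" idea can be expressed with window functions instead of a cursor. First, flag each row whose position differs from the previous row of the same user. A running sum of those flags then numbers the stays.

Below is a minimal sketch, not a tested Redshift query. It makes two assumptions:

- The table is called usr, the name borrowed from the first answer.
- Python's built-in sqlite3 module (SQLite 3.25 or newer, for window functions) serves as a stand-in database, so the query can run on the question's sample data.

The SQL in QUERY uses only LAG, SUM ... OVER and CASE. The explicit ROWS UNBOUNDED PRECEDING frame is there because, as far as I know, Redshift requires a frame clause for aggregate window functions with an ORDER BY.

```python
import sqlite3

# Sample rows from the question: (user_id, x, y, min, created_at)
ROWS = [
    (1, 1, 1, 1, "2015-01-15 17:26:53"),
    (1, 1, 1, 2, "2015-01-15 17:26:54"),
    (1, 1, 1, 3, "2015-01-15 17:26:55"),
    (1, 2, 1, 10, "2015-01-16 02:46:21"),
    (1, 1, 1, 15, "2015-01-16 02:46:22"),
    (1, 3, 3, 11, "2015-01-16 03:01:44"),
    (1, 3, 3, 2, "2015-01-16 03:02:06"),
    (2, 1, 1, 3, "2015-01-16 03:02:12"),
    (2, 2, 1, 4, "2015-01-16 03:02:15"),
    (2, 2, 1, 7, "2015-01-16 03:02:18"),
]

# 1) new_stay = 1 when (x, y) differs from the previous row of the same user
#    (LAG is NULL on a user's first row, so that row also gets 1).
# 2) A running sum of new_stay numbers each uninterrupted stay.
# 3) "min" is summed per stay.
QUERY = """
WITH flagged AS (
    SELECT user_id, x, y, "min", created_at,
           CASE WHEN x = LAG(x) OVER (PARTITION BY user_id ORDER BY created_at)
                 AND y = LAG(y) OVER (PARTITION BY user_id ORDER BY created_at)
                THEN 0 ELSE 1 END AS new_stay
    FROM usr
), numbered AS (
    SELECT user_id, x, y, "min",
           SUM(new_stay) OVER (PARTITION BY user_id ORDER BY created_at
                               ROWS UNBOUNDED PRECEDING) AS stay_no
    FROM flagged
)
SELECT user_id, x, y, SUM("min") AS sum_min
FROM numbered
GROUP BY user_id, stay_no, x, y
ORDER BY user_id, stay_no
"""


def stay_sums(rows):
    """Load rows into an in-memory SQLite table named usr and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE usr (user_id INTEGER, x INTEGER, y INTEGER, '
                '"min" INTEGER, created_at TEXT)')
    con.executemany("INSERT INTO usr VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in stay_sums(ROWS):
        print(row)
```

Unlike grouping by created_at::date, this keeps together a stay that crosses midnight. It also keeps apart two stays at the same place on the same day.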

Patrick · Accepted Answer · 2015-01-22 10:02:15Z

It seems that what you want is to calculate an aggregate over clusters of consecutive records (ordered by created_at) that share the same values in three columns. The clusters are separated from each other only by a change in those three values. That cannot be done with a plain GROUP BY, because grouping ignores the order of the records. The fact that you order by date does not change which rows end up in the same group.

The only option that I am aware of is to write a PL/pgSQL function with a cursor over your data relation (presumably a view, but it would work equally well with a table). You iterate over the records in time order. For each cluster you encounter, you sum up the min values and output a new record with the clustering columns and the summed value.

CREATE FUNCTION sum_clusters()
RETURNS TABLE (user_id int, x int, y int, sum_min int) AS $$
DECLARE
  data_row data%ROWTYPE;
  -- rows must arrive per user in time order; columns are qualified (d.)
  -- so they don't clash with the OUT columns user_id, x, y
  cur CURSOR FOR SELECT d.* FROM data d ORDER BY d.user_id, d.created_at;
BEGIN
  OPEN cur;
  FETCH cur INTO data_row;
  LOOP
    EXIT WHEN NOT FOUND;
    -- start a new cluster; the OUT columns double as accumulators
    user_id := data_row.user_id;
    x := data_row.x;
    y := data_row.y;
    sum_min := data_row.min;
    LOOP
      FETCH cur INTO data_row;
      EXIT WHEN NOT FOUND;
      -- a change of user or position closes the current cluster
      EXIT WHEN data_row.user_id <> user_id OR data_row.x <> x OR data_row.y <> y;
      sum_min := sum_min + data_row.min;  -- PL/pgSQL has no += operator
    END LOOP;
    RETURN NEXT;  -- emits the current values of user_id, x, y, sum_min
  END LOOP;
  CLOSE cur;
  RETURN;
END;
$$ LANGUAGE plpgsql;

That is a lot of code and not particularly fast, but it should work.
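For comparison with the cursor, window functions can also find these clusters in a single query. This is commonly called the "gaps and islands" technique. Number each user's rows over time, and separately number them within each (user_id, x, y). Inside one uninterrupted stay both numbers grow in step, so their difference is a stable cluster id.

The sketch below is an alternative to the function above, not a translation of it. It rests on two assumptions, made so it can run on the question's sample data:

- The table is named usr.
- Python's sqlite3 module (SQLite 3.25 or newer) stands in for PostgreSQL/Redshift.

```python
import sqlite3

# Sample rows from the question: (user_id, x, y, min, created_at)
ROWS = [
    (1, 1, 1, 1, "2015-01-15 17:26:53"),
    (1, 1, 1, 2, "2015-01-15 17:26:54"),
    (1, 1, 1, 3, "2015-01-15 17:26:55"),
    (1, 2, 1, 10, "2015-01-16 02:46:21"),
    (1, 1, 1, 15, "2015-01-16 02:46:22"),
    (1, 3, 3, 11, "2015-01-16 03:01:44"),
    (1, 3, 3, 2, "2015-01-16 03:02:06"),
    (2, 1, 1, 3, "2015-01-16 03:02:12"),
    (2, 2, 1, 4, "2015-01-16 03:02:15"),
    (2, 2, 1, 7, "2015-01-16 03:02:18"),
]

# grp = (row number per user) - (row number per user and position).
# It is constant within one stay; two separate stays at the same position
# get different grp values because the rows in between push the per-user
# number ahead. Output is ordered by when each stay started.
QUERY = """
SELECT user_id, x, y, SUM("min") AS sum_min
FROM (
    SELECT user_id, x, y, "min", created_at,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at)
         - ROW_NUMBER() OVER (PARTITION BY user_id, x, y ORDER BY created_at) AS grp
    FROM usr
) AS t
GROUP BY user_id, x, y, grp
ORDER BY user_id, MIN(created_at)
"""


def cluster_sums(rows):
    """Load rows into an in-memory SQLite table named usr and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE usr (user_id INTEGER, x INTEGER, y INTEGER, '
                '"min" INTEGER, created_at TEXT)')
    con.executemany("INSERT INTO usr VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in cluster_sums(ROWS):
        print(row)
```

The grp value on its own means nothing; it only identifies a stay together with user_id, x and y, which is why all four appear in the GROUP BY.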
