I have a problem creating a query for Postgres (strictly speaking, it's Redshift).
The table data is below.
the table is PARTITION BY user_id ORDER BY created_at desc

data

user_id| x | y |  min |     created_at      
-------+---+---+------+---------------------
      1| 1 | 1 |    1 | 2015-01-15 17:26:53
      1| 1 | 1 |    2 | 2015-01-15 17:26:54
      1| 1 | 1 |    3 | 2015-01-15 17:26:55
      1| 2 | 1 |   10 | 2015-01-16 02:46:21
      1| 1 | 1 |   15 | 2015-01-16 02:46:22
      1| 3 | 3 |   11 | 2015-01-16 03:01:44
      1| 3 | 3 |    2 | 2015-01-16 03:02:06
      2| 1 | 1 |    3 | 2015-01-16 03:02:12
      2| 2 | 1 |    4 | 2015-01-16 03:02:15
      2| 2 | 1 |    7 | 2015-01-16 03:02:18

What I want is below.

ideal result

user_id| x | y |  sum_min |
-------+---+---+----------+
      1| 1 | 1 |        6 |
      1| 2 | 1 |       10 |
      1| 1 | 1 |       15 |
      1| 3 | 3 |       13 |
      2| 1 | 1 |        3 |
      2| 2 | 1 |       11 |

If I simply use GROUP BY user_id, x, y, the result will be

 user_id| x | y |  sum_min |
 -------+---+---+----------+
       1| 1 | 1 |       21 |
       :| : | : |        : |

This is not what I need :(

  • "the table is PARTITION BY user_id ORDER BY created_at desc" does not make sense. That is part of a query, not part of a table definition. Please post the query you have so far. Commented Jan 22, 2015 at 7:01
  • Do you want to group by x, y and created_at, or by something else? Commented Jan 22, 2015 at 7:30
  • In your expected output, why do 1| 1 | 1 | 6 | and 1| 1 | 1 | 15 | appear in different rows (i.e. rows 1 and 3)? Commented Jan 22, 2015 at 7:36
  • x and y are the user's position. I want to calculate stay_min for each place, so I want to group by id, x, y within each partition. Commented Jan 22, 2015 at 7:41
  • I just want a query which returns the ideal result Commented Jan 22, 2015 at 7:44

3 Answers

1

try this

with cte as (
  -- day_key is the calendar day of created_at, e.g. 20150115
  select user_id, x, y, created_at,
         sum(min) over (partition by user_id, x, y, day_key) as sum_min
  from (
    select user_id, x, y, min, created_at,
           replace(created_at::date::text, '-', '') as day_key
    from usr
  ) t
)
-- the window sum repeats on every row of a partition; group by collapses the duplicates
select user_id, x, y, sum_min
from cte
group by sum_min, user_id, x, y
order by user_id

0

Maybe try grouping it by the creation date as well:

select user_id, x, y, sum(min) as sum_min, created_at::date from test
group by user_id, x, y, created_at::date
order by user_id, x, y, created_at::date

1 Comment

In this case your query seems to work, but the created_at values do not always fall on different days. So I think we have to compare each row with the next one to see whether it is the same place or not. Thanks :)
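The "compare each row with its neighbour" idea from the comment above can be sketched with window functions: LAG marks the rows where x or y differs from the previous row of the same user, and a running SUM of those marks gives every stay its own number to group on. This is only a sketch and not something confirmed in the thread. To keep it self-contained it runs the SQL through Python's sqlite3 module, which assumes an SQLite build with window functions (3.25 or later). The table name usr is borrowed from the answer above; is_new_stay, stay_no and sum_stays_sql are made-up names.

```python
import sqlite3

# Sample rows from the question: (user_id, x, y, min, created_at).
SAMPLE_ROWS = [
    (1, 1, 1, 1, "2015-01-15 17:26:53"),
    (1, 1, 1, 2, "2015-01-15 17:26:54"),
    (1, 1, 1, 3, "2015-01-15 17:26:55"),
    (1, 2, 1, 10, "2015-01-16 02:46:21"),
    (1, 1, 1, 15, "2015-01-16 02:46:22"),
    (1, 3, 3, 11, "2015-01-16 03:01:44"),
    (1, 3, 3, 2, "2015-01-16 03:02:06"),
    (2, 1, 1, 3, "2015-01-16 03:02:12"),
    (2, 2, 1, 4, "2015-01-16 03:02:15"),
    (2, 2, 1, 7, "2015-01-16 03:02:18"),
]

STAY_QUERY = """
WITH flagged AS (
    -- is_new_stay is 1 on the first row of a user and whenever x or y changes
    SELECT user_id, x, y, "min", created_at,
           CASE WHEN x = LAG(x) OVER (PARTITION BY user_id ORDER BY created_at)
                 AND y = LAG(y) OVER (PARTITION BY user_id ORDER BY created_at)
                THEN 0 ELSE 1 END AS is_new_stay
    FROM usr
),
numbered AS (
    -- the running total of the flags numbers the stays 1, 2, 3, ... per user
    SELECT user_id, x, y, "min", created_at,
           SUM(is_new_stay) OVER (
               PARTITION BY user_id ORDER BY created_at
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS stay_no
    FROM flagged
)
SELECT user_id, x, y, SUM("min") AS sum_min
FROM numbered
GROUP BY user_id, stay_no, x, y
ORDER BY user_id, MIN(created_at)
"""


def sum_stays_sql(rows):
    """Load rows into an in-memory table and return one summed row per stay."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE usr (user_id INTEGER, x INTEGER, y INTEGER, '
            '"min" INTEGER, created_at TEXT)'
        )
        conn.executemany("INSERT INTO usr VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(STAY_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in sum_stays_sql(SAMPLE_ROWS):
        print(row)
```

On the sample data from the question this returns the six rows of the ideal result. LAG and a windowed SUM are standard window functions, so the query text should carry over to PostgreSQL with little change; I have not checked it on Redshift.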
0

It seems that what you want is to calculate an aggregate over clusters of consecutive records, ordered on a column, where a cluster is a run of records with the same values in three columns and ends as soon as any of those values changes. A plain GROUP BY cannot do that, because grouping does not take the order of records into account. The fact that you order by date does not change that: GROUP BY simply does not support this kind of stratification.

The only option that I am aware of is to create a plpgsql function with a cursor on your data relation (presumably a view, but would work equally well with a table). You iterate over all the records in the relation and for each cluster encountered sum up the min values and output a new record with the clustering columns and the summed value.

CREATE FUNCTION sum_clusters()
RETURNS TABLE (user_id int, x int, y int, sum_min int) AS $$
DECLARE
  data_row data%ROWTYPE;
  -- Read the rows in a defined order; without it "consecutive" means nothing.
  -- Columns are qualified with d. so they do not clash with the output columns.
  cur CURSOR FOR SELECT * FROM data d ORDER BY d.user_id, d.created_at;
BEGIN
  OPEN cur;
  FETCH cur INTO data_row;
  WHILE FOUND LOOP
    -- Start a new cluster; the output columns double as the cluster key.
    user_id := data_row.user_id;
    x := data_row.x;
    y := data_row.y;
    sum_min := data_row.min;
    LOOP
      FETCH cur INTO data_row;
      -- Stop at the end of the data or as soon as any key column changes.
      EXIT WHEN NOT FOUND
             OR data_row.user_id <> user_id
             OR data_row.x <> x
             OR data_row.y <> y;
      sum_min := sum_min + data_row.min;
    END LOOP;
    -- Emit the current (user_id, x, y, sum_min) as one result row.
    RETURN NEXT;
  END LOOP;
  CLOSE cur;
  RETURN;
END;
$$ LANGUAGE plpgsql;

That is a lot of code and not particularly fast, but it should work.
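
For comparison, the same single pass (walk the rows in order and start a new sum whenever user_id, x or y changes) can be sketched outside the database. This is a minimal Python sketch, assuming the rows have already been fetched as (user_id, x, y, min, created_at) tuples. That tuple layout is an assumption of the sketch, and the function name simply mirrors the one above.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (user_id, x, y, min, created_at).
SAMPLE_ROWS = [
    (1, 1, 1, 1, "2015-01-15 17:26:53"),
    (1, 1, 1, 2, "2015-01-15 17:26:54"),
    (1, 1, 1, 3, "2015-01-15 17:26:55"),
    (1, 2, 1, 10, "2015-01-16 02:46:21"),
    (1, 1, 1, 15, "2015-01-16 02:46:22"),
    (1, 3, 3, 11, "2015-01-16 03:01:44"),
    (1, 3, 3, 2, "2015-01-16 03:02:06"),
    (2, 1, 1, 3, "2015-01-16 03:02:12"),
    (2, 2, 1, 4, "2015-01-16 03:02:15"),
    (2, 2, 1, 7, "2015-01-16 03:02:18"),
]


def sum_clusters(rows):
    """Sum min over each run of consecutive rows sharing (user_id, x, y)."""
    # Order by user_id, then created_at, like the cursor's ORDER BY.
    ordered = sorted(rows, key=itemgetter(0, 4))
    result = []
    # groupby only merges adjacent rows with equal keys, which is exactly
    # the "cluster of consecutive records" behaviour wanted here.
    for (user_id, x, y), run in groupby(ordered, key=itemgetter(0, 1, 2)):
        result.append((user_id, x, y, sum(row[3] for row in run)))
    return result


if __name__ == "__main__":
    for row in sum_clusters(SAMPLE_ROWS):
        print(row)
```

Because groupby never merges rows that are not adjacent, the two separate visits of user 1 to position (1, 1) stay separate, which is the difference from a plain GROUP BY.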
