17

I have a slots table like this :

   Column   |            Type             |
------------+-----------------------------+
 id         | integer                     |
 begin_at   | timestamp without time zone |
 end_at     | timestamp without time zone |
 user_id    | integer                     |

and I like to select merged rows for continuous time. Let's say I have (simplified) data like :

(1, 5:15, 5:30, 1)
(2, 5:15, 5:30, 2)
(3, 5:30, 5:45, 2)
(4, 5:45, 6:00, 2)
(5, 8:15, 8:30, 2)
(6, 8:30, 8:45, 2)

I would like to know if it's possible to select rows formatted like :

(5:15, 5:30, 1)
(5:15, 6:00, 2) // <======= rows id 2,3 and 4 merged
(8:15, 8:45, 2) // <======= rows id 5 and 6 merged

EDIT: Here's the SQLfiddle

I'm using Postgresql, version 9.3!

Thank you!

3
  • Can you sqlfiddle your data? Commented Oct 12, 2014 at 20:30
  • Similar question for SQL Server stackoverflow.com/questions/7520556/… Commented Oct 12, 2014 at 21:04
  • Use a recursive query, start at front-unjoinable begin-stubs. Extend by joining following segment. Final select only the longest paths. Commented Oct 12, 2014 at 21:10

2 Answers 2

15

Here is one method for solving this problem. Create a flag that determines if a one record does not overlap with the previous one. This is the start of a group. Then take the cumulative sum of this flag and use that for grouping:

select user_id, min(begin_at) as begin_at, max(end_at) as end_at
from (select s.*, sum(startflag) over (partition by user_id order by begin_at) as grp
      from (select s.*,
                   (case when lag(end_at) over (partition by user_id order by begin_at) >= begin_at
                         then 0 else 1
                    end) as startflag
            from slots s
           ) s
     ) s
group by user_id, grp;

Here is a SQL Fiddle.

Sign up to request clarification or add additional context in comments.

Comments

2

Gordon Linoff already provided the answer (I upvoted).

I've used the same approach, but wanted to deal with tsrange type. So I came up with this construct:

SELECT min(id) b_id, min(begin_at) b_at, max(end_at) e_at, grp, user_id
  FROM (
    SELECT t.*, sum(g) OVER (ORDER BY id) grp
      FROM (
        SELECT s.*, (NOT r -|- lag(r,1,r)
                     OVER (PARTITION BY user_id ORDER BY id))::int g
          FROM (SELECT id,begin_at,end_at,user_id,
                       tsrange(begin_at,end_at,'[)') r FROM slots) s
      ) t
  ) u
 GROUP BY grp, user_id
 ORDER BY grp;

Unfortunately, on the top level one has to use min(begin_at) and max(end_at), as there're no aggregate functions for the range-based union operator +.

I create ranges with exclusive upper bounds, this allows me to use “is adjacent to” (-|-) operator. I compare current tsrange with the one on the previous row, defaulting to the current one in case there's no previous. Then I negate the comparison and cast to integer, which gives me 1 in cases when new group starts.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.