I have a table with userIds and some dates, like this:

    | userId |       dates        |
    | 1      | 2021-06-20 00:00:00|
    | 1      | 2021-06-24 00:00:00|
    | 2      | 2021-06-25 00:00:00|
    | 2      | 2021-06-28 00:00:00|
    | 2      | 2021-06-30 00:00:00|
    | 3      | 2021-06-22 00:00:00|
    | 3      | 2021-06-24 00:00:00|
    | 3      | 2021-06-27 00:00:00|

For every userId, I want to find the first date after that user's earliest date that doesn't exist in the table.

Expected output:

    | userId |       dates        |
    | 1      | 2021-06-21 00:00:00|
    | 2      | 2021-06-26 00:00:00|
    | 3      | 2021-06-23 00:00:00|

I'm using PostgreSQL. Can someone help? The data is pretty large (4M+ rows).

2 Answers

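I think the simplest method is lead() and aggregation. A date starts a gap when the next date for the same user is missing or is not exactly one day later; the earliest such date plus one day is the first missing date: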

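    select userid,
           min(dates) + interval '1 day' as first_missing_date
    from (select t.*,
                 -- next date recorded for the same user; null on the user's last row
                 lead(dates) over (partition by userid order by dates) as next_date
          from t
         ) t
    -- keep only the rows whose following day is absent
    where next_date is null or next_date <> dates + interval '1 day'
    group by userid;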
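
To check the query against the sample data from the question, a minimal setup could look like the sketch below. The table name `t`, the subquery alias `s`, and the output alias `first_missing_date` are assumptions made for this example; the column name `dates` is taken from the question.

```sql
-- Sample rows copied from the question (temporary table, name t is assumed).
create temporary table t (userid int, dates timestamp);

insert into t (userid, dates) values
    (1, '2021-06-20'), (1, '2021-06-24'),
    (2, '2021-06-25'), (2, '2021-06-28'), (2, '2021-06-30'),
    (3, '2021-06-22'), (3, '2021-06-24'), (3, '2021-06-27');

select userid,
       min(dates) + interval '1 day' as first_missing_date
from (select t.*,
             -- next recorded date for the same user, null on the last row
             lead(dates) over (partition by userid order by dates) as next_date
      from t
     ) s
-- rows that are followed by a gap (or by nothing at all)
where next_date is null or next_date <> dates + interval '1 day'
group by userid
order by userid;
```

With these rows the query should return 2021-06-21, 2021-06-26 and 2021-06-23 for users 1, 2 and 3, which matches the expected output in the question.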

Or using distinct on:

    select distinct on (userid) userid, dates + interval '1 day' as first_missing_date
    from (select t.*,
                 lead(dates) over (partition by userid order by dates) as next_date
          from t
         ) t
    where next_date is null or next_date <> dates + interval '1 day'
    -- distinct on keeps the first row per userid in this order, i.e. the earliest gap
    order by userid, dates;

You can also write the where clause as:

    where next_date is distinct from dates + interval '1 day'
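
The shorter form works because `is distinct from` treats null as a comparable value, whereas `<>` yields null when either side is null. A small illustration, with literal values and column aliases chosen only for this example:

```sql
-- Compare a null timestamp with a real one using both operators.
select cast(null as timestamp) <> timestamp '2021-06-21'
           as with_not_equal,
       cast(null as timestamp) is distinct from timestamp '2021-06-21'
           as with_is_distinct;
-- with_not_equal is null, with_is_distinct is true
```

So on a user's last row, where next_date is null, the condition is still true and the separate null check is not needed.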

Here is a db<>fiddle.

    select distinct on (userId) userId, dates from table_name order by userId, dates;

1 Comment

This only returns dates in the table. The OP is asking for dates that are not in the table.
