0

I have two tables:

Table 1: A table containing a listing of event occurrences with unique IDs per unique event (event_id), the date of the event (date_event), and each person's gender (gender):

person| event_id| date_event | gender|
----- |---------|------------|-------|
a     |  86i    | 2012-01-25 |   m   |
a     |  87i    | 2012-05-30 |   m   |
a     |  88i    | 2012-09-20 |   m   |
a     |  89i    | 2012-12-20 |   m   |
b     |  15i    | 2015-04-06 |   f   |
b     |  16i    | 2016-07-06 |   f   |
b     |  17i    | 2016-04-30 |   f   |
b     |  18i    | 2016-11-28 |   f   |
----- |---------|------------|-------|

Table 2: A table of dates listing the start dates (date_start) and end dates (date_end), representing ranges of time in which each person was enrolled in a program:

person| date_start | date_end   |
----- |------------|------------|
a     | 2012-02-05 | 2012-03-30 |
a     | 2012-06-26 | 2012-08-28 |
a     | 2012-09-15 | 2012-12-31 |
b     | 2015-01-24 | 2015-03-30 |
b     | 2016-07-01 | 2016-10-01 |
b     | 2016-11-25 | 2016-12-30 |
----- |------------|------------|

I would like to be able to filter Table 1 to only contain records of events that occurred when the person was enrolled in a program, which would be specified as date_event being between date_start and date_end. My desired resulting table would look like:

Desired result:

person| event_id| date_event | gender|
----- |---------|------------|-------|
a     |  88i    | 2012-09-20 |   m   |
a     |  89i    | 2012-12-20 |   m   |
b     |  16i    | 2016-07-06 |   f   |
b     |  18i    | 2016-11-28 |   f   |
----- |---------|------------|-------|

I have tried several variations of the following to no avail:

CREATE TABLE temptable AS
SELECT * 
FROM table1 t1
LEFT JOIN table2 t2 
USING(person);

CREATE TABLE desiredresult AS
SELECT * 
FROM temptable 
WHERE date_event BETWEEN date_start AND date_end
GROUP BY person;

I will admit that I am definitely a SQL novice and am used to using R where I can achieve my desired result using the below code. However, the dataset I'm using is gigantic so I need to figure out how to perform these operations on a SQL server.

library(dplyr)
library(lubridate)

desiredresult <- table1 %>%
    left_join(table2) %>%
    group_by(person) %>%
    mutate(keep_record = date_event %within% interval(start_date, end_date)) %>%
    filter(keep_record == TRUE)

Thanks in advance.

1 Answer 1

1

There is no reason for creating a temp table, and spend the processing time doing so. Just select directly from the JOIN. (See Demd)

select t1.* 
  from table1 t1
  join table2 t2 
    on (     t1.person = t2.person
         and t1.date_event between t2.date_start and t2.date_end 
       ) ;   
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.