
I have some date/time data with overlapping minutes/hours, and I want to calculate how much overlap there is. I'm a little stumped on how to do this. I'd prefer a tidyverse solution because I'm more familiar with reading it, but anything that works will be helpful. Example code and expected outcome below:

library(tidyverse)
library(lubridate)

main_start <- c("2024-01-01 4:50:00 PM", "2024-03-22 11:00:00 AM")
main_end <- c("2024-01-01 11:40:00 PM", "2024-03-22 9:00:00 PM")

second_start <- c("2024-01-01 2:00:00 PM", "2024-03-22 12:00:00 PM")
second_end <- c("2024-01-02 12:15:00 AM", "2024-03-22 8:00:00 PM")

third_start <- c("2024-01-01 8:00:00 AM", "2024-03-22 8:00:00 AM")
third_end <- c("2024-01-01 5:00:00 PM", "2024-03-22 5:00:00 PM")

df <- tibble::tibble(main_start, main_end,
                     second_start, second_end,
                     third_start, third_end) %>% 
  mutate(main_start = ymd_hms(main_start),
         main_end = ymd_hms(main_end),
         second_start = ymd_hms(second_start),
         second_end = ymd_hms(second_end),
         third_start = ymd_hms(third_start),
         third_end = ymd_hms(third_end))

I want to find the total amount of time that the MAIN interval overlaps with SECOND or THIRD.

If there is a double overlap, it only needs to count once.

For example, in row 1

  • Main time is from 4:50 PM to 11:40 PM.
  • Second is from 2:00 PM to 12:15 AM the next morning.
  • Third is from 8:00 AM to 5:00 PM

The entire main time overlaps with at least one of the other two. There is some double overlap, but it should not be counted twice. So the expected output would be 6 hours 50 minutes; decimal form is fine too.

Row 2

  • Main time 11:00 AM to 9:00 PM. Second 12:00 PM to 8:00 PM. Third 8:00 AM to 5:00 PM
  • The overlap here is from 11:00 AM to 8:00 PM, so the result should be a flat 9 hours.

The only way I can think of to calculate it is by listing every single minute in each interval and finding the duplicates.


3 Answers

3

I am sure there are more elegant approaches that follow the tidy philosophy (e.g., pivoting to longer) that will hopefully be added here, but I recently had to do something similar with my own data and used the IRanges package with purrr::pmap_dbl (purrr is part of the tidyverse) for a somewhat brute-force (i.e., lazy) approach:

Create a function to find the ranges and their overlap:

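overlap_fun <- function(startM, endM, start2, end2, start3, end3){
  # IRanges are closed integer ranges (width = end - start + 1), so subtract 1
  # from each end to represent the half-open interval [start, end) in seconds
  mrange <- IRanges::IRanges(start = as.numeric(startM), end = as.numeric(endM) - 1)
  range2 <- IRanges::IRanges(start = as.numeric(start2), end = as.numeric(end2) - 1)
  range3 <- IRanges::IRanges(start = as.numeric(start3), end = as.numeric(end3) - 1)
  # Merge the second and third ranges so doubly covered time counts once
  range23 <- IRanges::reduce(c(range2, range3))
  
  # Clip the merged range to the main range and convert seconds to hours
  sum(IRanges::width(IRanges::pintersect(mrange, range23)))/3600
}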

Then use pmap_dbl:

df$overlap_hours <- purrr::pmap_dbl(list(
  startM = df$main_start,
  endM = df$main_end,
  start2 = df$second_start,
  end2 = df$second_end,
  start3 = df$third_start,
  end3 = df$third_end), 
  overlap_fun)

Output:

# A tibble: 2 × 7
  main_start          main_end            second_start        second_end          third_start         third_end           overlap_hours
  <dttm>              <dttm>              <dttm>              <dttm>              <dttm>              <dttm>                      <dbl>
1 2024-01-01 16:50:00 2024-01-01 23:40:00 2024-01-01 14:00:00 2024-01-02 00:15:00 2024-01-01 08:00:00 2024-01-01 17:00:00          6.83
2 2024-03-22 11:00:00 2024-03-22 21:00:00 2024-03-22 12:00:00 2024-03-22 20:00:00 2024-03-22 08:00:00 2024-03-22 17:00:00          9.00
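
If installing a Bioconductor package is not an option, the same merge-then-clip idea can be sketched in base R. This is only a rough sketch, not the IRanges implementation: `covered_hours` is a hypothetical helper name, it works on seconds since the epoch, and it assumes every interval has start <= end. It clips each "other" interval to the main window, sorts by start, and counts only the time that extends past the furthest end seen so far, so doubly covered time is counted once.

```r
# Hypothetical helper: hours of [main_start, main_end) covered by the union of
# any number of other intervals (POSIXct or numeric seconds).
covered_hours <- function(main_start, main_end, starts, ends) {
  # Clip every other interval to the main window
  s <- pmax(as.numeric(starts), as.numeric(main_start))
  e <- pmin(as.numeric(ends), as.numeric(main_end))
  keep <- e > s                       # drop intervals that miss the window
  s <- s[keep]
  e <- e[keep]
  if (length(s) == 0) return(0)

  # Sweep in start order; add only the part beyond the furthest end so far
  ord <- order(s)
  s <- s[ord]
  e <- e[ord]
  total <- 0
  reach <- -Inf
  for (k in seq_along(s)) {
    if (e[k] > reach) {
      total <- total + e[k] - max(s[k], reach)
      reach <- e[k]
    }
  }
  total / 3600                        # seconds to hours
}

# Same instants as the question's data, built directly in UTC
utc <- function(x) as.POSIXct(x, tz = "UTC")
df <- data.frame(
  main_start   = utc(c("2024-01-01 16:50:00", "2024-03-22 11:00:00")),
  main_end     = utc(c("2024-01-01 23:40:00", "2024-03-22 21:00:00")),
  second_start = utc(c("2024-01-01 14:00:00", "2024-03-22 12:00:00")),
  second_end   = utc(c("2024-01-02 00:15:00", "2024-03-22 20:00:00")),
  third_start  = utc(c("2024-01-01 08:00:00", "2024-03-22 08:00:00")),
  third_end    = utc(c("2024-01-01 17:00:00", "2024-03-22 17:00:00"))
)

# Apply the helper one row at a time
df$overlap_hours <- vapply(seq_len(nrow(df)), function(r) {
  covered_hours(df$main_start[r], df$main_end[r],
                starts = c(df$second_start[r], df$third_start[r]),
                ends   = c(df$second_end[r], df$third_end[r]))
}, numeric(1))
```

Because `starts` and `ends` are plain vectors, a fourth or fifth interval could be appended without changing the helper, and intervals that do not touch the main window simply contribute zero.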

Data

df <- structure(list(main_start = structure(c(1704127800, 1711105200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), main_end = structure(c(1704152400, 
1711141200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    second_start = structure(c(1704117600, 1711108800), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), second_end = structure(c(1704154500, 
    1711137600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    third_start = structure(c(1704096000, 1711094400), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), third_end = structure(c(1704128400, 
    1711126800), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))

2

Pure dplyr:

df |>
  mutate(
    across(c(second_start, third_start), ~ pmax(main_start, pmin(main_end, .x))),
    across(c(second_end, third_end),     ~ pmin(main_end, pmax(main_start, .x))),
    third_start = if_else(between(third_start, second_start, second_end),
                          pmin(second_end, third_end), third_start),
    third_end   = if_else(between(third_end, second_start, second_end),
                          pmax(second_start, third_start), third_end),
    d = difftime(second_end, second_start, units = "hours") +
      difftime(third_end, third_start, units = "hours")
  )
# # A tibble: 2 × 7
#   main_start          main_end            second_start        second_end          third_start         third_end           d             
#   <dttm>              <dttm>              <dttm>              <dttm>              <dttm>              <dttm>              <drtn>        
# 1 2024-01-01 16:50:00 2024-01-01 23:40:00 2024-01-01 16:50:00 2024-01-01 23:40:00 2024-01-01 17:00:00 2024-01-01 17:00:00 6.833333 hours
# 2 2024-03-22 11:00:00 2024-03-22 21:00:00 2024-03-22 12:00:00 2024-03-22 20:00:00 2024-03-22 11:00:00 2024-03-22 12:00:00 9.000000 hours

(In retrospect, the code loosely follows user2554330's commentary.)
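
One case the adjustment above does not appear to cover is when, after clamping, third fully contains second (for example main 11:00 to 21:00, second 12:00 to 20:00, third 11:00 to 21:00): neither endpoint of third falls inside second, so both durations are added in full. A possible alternative, sketched here under the assumption that there are exactly two "other" intervals, is inclusion-exclusion: |M∩(S∪T)| = |M∩S| + |M∩T| − |M∩S∩T|. The helper names `common_hours` and `main_overlap_hours` are hypothetical, and the code is base R so it can be called from inside `mutate()` as well.

```r
# Hypothetical helper: hours shared by ALL of the given intervals, element-wise.
# `starts` and `ends` are lists of POSIXct (or numeric) vectors of equal length.
common_hours <- function(starts, ends) {
  lo <- do.call(pmax, lapply(starts, as.numeric))  # latest start, in seconds
  hi <- do.call(pmin, lapply(ends, as.numeric))    # earliest end, in seconds
  pmax(0, hi - lo) / 3600                          # empty intersections count as 0
}

# Inclusion-exclusion: main&second + main&third - main&second&third
main_overlap_hours <- function(main_start, main_end,
                               second_start, second_end,
                               third_start, third_end) {
  ms  <- common_hours(list(main_start, second_start), list(main_end, second_end))
  mt  <- common_hours(list(main_start, third_start),  list(main_end, third_end))
  mst <- common_hours(list(main_start, second_start, third_start),
                      list(main_end, second_end, third_end))
  ms + mt - mst
}

# Same instants as the question's data, built directly in UTC
utc <- function(x) as.POSIXct(x, tz = "UTC")
df <- data.frame(
  main_start   = utc(c("2024-01-01 16:50:00", "2024-03-22 11:00:00")),
  main_end     = utc(c("2024-01-01 23:40:00", "2024-03-22 21:00:00")),
  second_start = utc(c("2024-01-01 14:00:00", "2024-03-22 12:00:00")),
  second_end   = utc(c("2024-01-02 00:15:00", "2024-03-22 20:00:00")),
  third_start  = utc(c("2024-01-01 08:00:00", "2024-03-22 08:00:00")),
  third_end    = utc(c("2024-01-01 17:00:00", "2024-03-22 17:00:00"))
)

df$overlap_hours <- with(df, main_overlap_hours(main_start, main_end,
                                                second_start, second_end,
                                                third_start, third_end))
```

Everything is vectorised, so no row-wise iteration is needed; in a dplyr pipeline the same call would sit inside `mutate(overlap_hours = main_overlap_hours(...))`. The trade-off is that the formula grows quickly if more than two "other" intervals are involved.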


Data

df <- structure(list(main_start = structure(c(1704127800, 1711105200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), main_end = structure(c(1704152400, 1711141200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), second_start = structure(c(1704117600, 1711108800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), second_end = structure(c(1704154500, 1711137600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), third_start = structure(c(1704096000, 1711094400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),      third_end = structure(c(1704128400, 1711126800), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))


1

With {ivs} functions.

Y <- X |>
  dplyr::mutate(ID = dplyr::row_number()) |>
  tidyr::pivot_longer(-ID, names_to = c("type", ".value"), names_sep = "_") |>
  dplyr::mutate(
    i = ivs::iv(start, end),
    type = dplyr::if_else(type != "main", "other", type)
  ) |>
  dplyr::reframe(u = ivs::iv_set_union(i, i), .by = c(ID, type)) |>
  tidyr::pivot_wider(names_from = type, values_from = u) |>
  dplyr::mutate(
    d = {
      o <- ivs::iv_set_intersect(main, other)
      ivs::iv_end(o) - ivs::iv_start(o)
    },
    .by = ID
  )
Y
# A tibble: 2 × 4
    ID                                       main                                      other d             
 <int>                                 <iv<dttm>>                                 <iv<dttm>> <drtn>        
1     1 [2024-01-01 16:50:00, 2024-01-01 23:40:00) [2024-01-01 08:00:00, 2024-01-02 00:15:00) 6.833333 hours
2     2 [2024-03-22 11:00:00, 2024-03-22 21:00:00) [2024-03-22 08:00:00, 2024-03-22 20:00:00) 9.000000 hours

I am no {tidyverse} expert; the pivoting is not needed, but it is there for demonstration.


Input

X = structure(list(main_start = structure(c(1704127800, 1711105200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), main_end = structure(c(1704152400, 
1711141200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    second_start = structure(c(1704117600, 1711108800), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), second_end = structure(c(1704154500, 
    1711137600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    third_start = structure(c(1704096000, 1711094400), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), third_end = structure(c(1704128400, 
    1711126800), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))
