I’m working with machine event data in R, where each event has a start time, an end time, an event code, and a precedence level (an ordered factor). Events may overlap, and I need to transform them into non-overlapping time intervals. Each resulting interval should inherit the code and precedence of the highest-precedence event active during that time.
Additionally:
- If there are gaps (i.e., no events) between intervals, those gaps should be filled with a dummy event (event_code = 0, precedence = Level 0).
- Contiguous intervals with the same event code and precedence should be merged to avoid fragmentation.
Here’s a reproducible example:
Input data (startstate)
startstate <- structure(list(
event_code = c(101, 202, 303, 202, 404),
start_time = structure(c(1735689600, 1735707600, 1735725600, 1735743600, 1735756200), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
end_time = structure(c(1735718400, 1735722000, 1735740000, 1735758000, 1735776000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
precedence = structure(c(1L, 2L, 1L, 2L, 3L), levels = c("Level 1", "Level 2", "Level 3"), class = c("ordered", "factor"))
), class = "data.frame", row.names = c(NA, -5L))
> startstate
  event_code          start_time            end_time precedence
1        101 2025-01-01 00:00:00 2025-01-01 08:00:00    Level 1
2        202 2025-01-01 05:00:00 2025-01-01 09:00:00    Level 2
3        303 2025-01-01 10:00:00 2025-01-01 14:00:00    Level 1
4        202 2025-01-01 15:00:00 2025-01-01 19:00:00    Level 2
5        404 2025-01-01 18:30:00 2025-01-02 00:00:00    Level 3
Desired output (endstate)
endstate <- structure(list(
event_code = c(101, 202, 0, 303, 0, 202, 404),
start_time = structure(c(1735689600, 1735707600, 1735722000, 1735725600, 1735740000, 1735743600, 1735756200), tzone = "UTC", class = c("POSIXct", "POSIXt")),
end_time = structure(c(1735707600, 1735722000, 1735725600, 1735740000, 1735743600, 1735756200, 1735776000), tzone = "UTC", class = c("POSIXct", "POSIXt")),
precedence = c("Level 1", "Level 2", "Level 0", "Level 1", "Level 0", "Level 2", "Level 3")
), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
> endstate
  event_code          start_time            end_time precedence
1        101 2025-01-01 00:00:00 2025-01-01 05:00:00    Level 1
2        202 2025-01-01 05:00:00 2025-01-01 09:00:00    Level 2
3          0 2025-01-01 09:00:00 2025-01-01 10:00:00    Level 0
4        303 2025-01-01 10:00:00 2025-01-01 14:00:00    Level 1
5          0 2025-01-01 14:00:00 2025-01-01 15:00:00    Level 0
6        202 2025-01-01 15:00:00 2025-01-01 18:30:00    Level 2
7        404 2025-01-01 18:30:00 2025-01-02 00:00:00    Level 3
Edited: Fixed event_code typo in endstate. Thanks @jon-spring for spotting it.
I’ve had some success untangling the overlaps with the excellent ivs package (its intervals are right-open, which suits this data), but two problems remain:
- The result contains adjacent rows with the same event code and precedence, which should ideally be merged.
- I’m not sure of the cleanest way to insert filler rows for gaps.
Here’s an example of the undesired intermediate output:
event_code          start_time            end_time precedence
       101 2025-01-01 00:00:00 2025-01-01 05:00:00    Level 1
       202 2025-01-01 05:00:00 2025-01-01 08:00:00    Level 2
       202 2025-01-01 08:00:00 2025-01-01 09:00:00    Level 2   <- should merge with row above
Any ideas on how to:
- Break overlapping intervals into disjoint ones with correct precedence,
- Insert zero-precedence filler intervals for gaps, and
- Collapse adjacent intervals with identical attributes?
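To make the three steps concrete, here is a minimal base R sketch of one possible approach. It does not use ivs. It assumes right-open intervals, at least one input row, UTC timestamps as in the example, and that a tie in precedence goes to the first matching row. The function name flatten_events is hypothetical, chosen only for illustration.

```r
# Example input, rebuilt from the question (times in UTC).
startstate <- data.frame(
  event_code = c(101, 202, 303, 202, 404),
  start_time = as.POSIXct(c("2025-01-01 00:00:00", "2025-01-01 05:00:00",
                            "2025-01-01 10:00:00", "2025-01-01 15:00:00",
                            "2025-01-01 18:30:00"), tz = "UTC"),
  end_time   = as.POSIXct(c("2025-01-01 08:00:00", "2025-01-01 09:00:00",
                            "2025-01-01 14:00:00", "2025-01-01 19:00:00",
                            "2025-01-02 00:00:00"), tz = "UTC"),
  precedence = factor(c("Level 1", "Level 2", "Level 1", "Level 2", "Level 3"),
                      levels = c("Level 1", "Level 2", "Level 3"),
                      ordered = TRUE)
)

# Hypothetical helper: split, fill gaps, and merge in one pass.
flatten_events <- function(events) {
  s   <- as.numeric(events$start_time)
  e   <- as.numeric(events$end_time)
  lvl <- as.integer(events$precedence)  # ordered factor -> rank, 1 = lowest

  # Step 1: every start or end time is a possible boundary, so consecutive
  # boundaries define elementary segments that no event partially covers.
  bounds    <- sort(unique(c(s, e)))
  seg_start <- head(bounds, -1)
  seg_end   <- tail(bounds, -1)

  # For each segment, find the row index of the highest-precedence active
  # event, or NA when the segment is a gap. Ties go to the first row.
  winner <- vapply(seq_along(seg_start), function(i) {
    active <- which(s <= seg_start[i] & e >= seg_end[i])
    if (length(active) == 0L) return(NA_integer_)
    active[which.max(lvl[active])]
  }, integer(1))

  # Step 2: gaps become the filler event (code 0, "Level 0").
  code <- ifelse(is.na(winner), 0, events$event_code[winner])
  prec <- ifelse(is.na(winner), "Level 0",
                 as.character(events$precedence[winner]))

  # Step 3: a new run starts wherever code or precedence changes;
  # each run collapses to one row spanning its first start and last end.
  n       <- length(code)
  new_run <- c(TRUE, code[-1] != code[-n] | prec[-1] != prec[-n])
  first   <- which(new_run)
  last    <- c(first[-1] - 1L, n)

  data.frame(
    event_code = code[first],
    # Assumption: output is reported in UTC, matching the example data.
    start_time = as.POSIXct(seg_start[first], origin = "1970-01-01", tz = "UTC"),
    end_time   = as.POSIXct(seg_end[last],   origin = "1970-01-01", tz = "UTC"),
    precedence = prec[first]
  )
}

result <- flatten_events(startstate)
print(result)
```

On the example data this should give the same seven rows as endstate, with precedence returned as a character column. The per-segment lookup is quadratic in the number of events, so for large tables a join-based or ivs-based version would likely scale better.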