How can I aggregate a count of rows by time intervals in PostgreSQL?

Question

If I have a PostgreSQL table that has columns for datetime and for an array of items, such as:

| time                       | items                  |
| -------------------------- | ---------------------- |
| 2020-12-06 11:31:38.000    |  {item1, item2}        |
| 2020-12-06 11:48:11.304    |  {item1}               |
| 2020-12-06 11:48:48.654    |  {item1, item2, item3} |
| 2020-12-06 11:49:50.355    |  {item2}               |
| 2020-12-06 11:55:31.842    |  {item1, item2}        |

How can I query the table to aggregate the count of a specific item in equidistant time intervals?

For example, I'd like to count the occurrences of item1 in 5 minute intervals, so that the query result looks like this:

| start_time                 | end_time                            | item1 count     |
| -------------------------- | ----------------------------------- | --------------- |
| 2020-12-06 11:30:00.000    |  2020-12-06 11:34:59.999            |       1         |
| 2020-12-06 11:35:00.000    |  2020-12-06 11:39:59.999            |       0         |
| 2020-12-06 11:40:00.000    |  2020-12-06 11:44:59.999            |       0         |
| 2020-12-06 11:45:00.000    |  2020-12-06 11:49:59.999            |       2         |
| 2020-12-06 11:50:00.000    |  2020-12-06 11:54:59.999            |       0         |
| 2020-12-06 11:55:00.000    |  2020-12-06 11:59:59.999            |       1         |

I'm having a tough time trying to figure out what query can help me achieve this in the most optimal way. I've been thinking that Postgres' date_trunc or grid might help with this, but am really not sure how to approach the problem. Any suggestions?

A five-minute interval can be anchored anywhere, so do you want it starting on the hour or at some other starting point?
– Adrian Klaver, Commented Dec 7, 2020 at 19:32

Gordon Linoff · Accepted Answer · 2020-12-07 19:33:08Z

1

You can use generate_series() to generate the timestamps. Then unnest, filter and aggregate:

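-- One row per 5-minute bucket; the left join keeps buckets that have no matches.
select gs.ts, count(i.time) as num_item1
from generate_series('2020-12-06 11:30:00.000'::timestamp,
                     '2020-12-06 11:55:00.000'::timestamp,
                     interval '5 minute') gs(ts)
left join
     -- Expand each array and keep only the elements equal to 'item1'.
     (items i join lateral
      unnest(i.items) item
      on item = 'item1'
     )
     -- Half-open range: bucket start inclusive, bucket end exclusive.
     on i.time >= gs.ts and i.time < gs.ts + interval '5 minute'
group by gs.ts
order by 1;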

Here is a db<>fiddle.
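
As a minimal sketch of the same idea, the query below inlines the question's sample rows in a CTE so it can be run on its own, and adds an end_time column like the one in the desired output. The CTE name events and the column aliases are assumptions for illustration, not names from the original post. It tests membership with = any(...), so it counts rows that contain item1 rather than individual array elements; for the sample data the two are the same.

```sql
-- Sample rows from the question; "events" is an assumed name.
with events("time", items) as (
    values
        (timestamp '2020-12-06 11:31:38.000', array['item1', 'item2']),
        (timestamp '2020-12-06 11:48:11.304', array['item1']),
        (timestamp '2020-12-06 11:48:48.654', array['item1', 'item2', 'item3']),
        (timestamp '2020-12-06 11:49:50.355', array['item2']),
        (timestamp '2020-12-06 11:55:31.842', array['item1', 'item2'])
)
select gs.ts                        as start_time,
       gs.ts + interval '5 minutes' as end_time,    -- exclusive upper bound
       count(e."time")              as item1_count  -- 0 when no row matched
from generate_series(timestamp '2020-12-06 11:30:00',
                     timestamp '2020-12-06 11:55:00',
                     interval '5 minutes') as gs(ts)
left join events e
       on e."time" >= gs.ts
      and e."time" <  gs.ts + interval '5 minutes'
      and 'item1' = any(e.items)   -- row-level membership test
group by gs.ts
order by gs.ts;
```

With the sample rows this should return counts of 1, 0, 0, 2, 0, 1 for the six buckets, matching the table in the question. Reporting end_time as an exclusive bound avoids having to pick a "last millisecond" value for each interval.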


2 Comments

user12533955 Over a year ago

Works great, you're a wizard! Minor question: if I needed to further filter my query by some additional column, could I simply do something like on item = 'item1' and otherColumn = 'someValue'?

Gordon Linoff Over a year ago

@user12533955 . . . I think that will work.
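
To illustrate the follow-up above, here is a sketch with an extra column. The column other_column and its values are hypothetical (the question does not define one), as is the CTE name events. The extra condition sits inside the join's ON clause; placing it in a WHERE clause on the outer query would instead discard the empty buckets that the left join is there to keep.

```sql
-- Sample rows from the question plus a hypothetical other_column.
with events("time", items, other_column) as (
    values
        (timestamp '2020-12-06 11:31:38.000', array['item1', 'item2'],          'someValue'),
        (timestamp '2020-12-06 11:48:11.304', array['item1'],                   'someValue'),
        (timestamp '2020-12-06 11:48:48.654', array['item1', 'item2', 'item3'], 'otherValue'),
        (timestamp '2020-12-06 11:49:50.355', array['item2'],                   'someValue'),
        (timestamp '2020-12-06 11:55:31.842', array['item1', 'item2'],          'otherValue')
)
select gs.ts           as start_time,
       count(e."time") as num_item1
from generate_series(timestamp '2020-12-06 11:30:00',
                     timestamp '2020-12-06 11:55:00',
                     interval '5 minutes') as gs(ts)
left join
     (events e
      join lateral unnest(e.items) as u(item)
        on u.item = 'item1'
       and e.other_column = 'someValue')   -- extra filter lives in the inner join
  on e."time" >= gs.ts
 and e."time" <  gs.ts + interval '5 minutes'
group by gs.ts
order by gs.ts;
```

With these made-up values, only the 11:30 and 11:45 buckets should report a count of 1; the remaining four buckets still appear, with a count of 0.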

Nick S · Answer · 2024-08-21 12:21:36Z

0

For PostgreSQL 14 or later, you should probably use date_bin. As for 'item1', if you don't care whether it is listed multiple times in a single row and just want to count the rows where it is present, a simple filter will suffice:

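select
    -- date_bin takes three arguments; the third is the origin the bins align to
    date_bin('5 minutes', t.time, timestamp '2001-01-01') as start_time
    -- count only the rows whose array contains 'item1'
    , count(*) filter (where 'item1' = any(t.items)) as item1_count
from items t  -- table name as used in the accepted answer
group by 1
order by 1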
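
One caveat: grouping by date_bin alone only returns buckets that contain at least one row, so the zero-count intervals from the question's desired output would be missing. A possible way to restore them, sketched below, is to bin first and then left join the result onto a generate_series() of bucket starts. The CTE names events and binned, the origin timestamp, and the column aliases are assumptions for illustration; date_bin requires PostgreSQL 14 or later.

```sql
-- Sample rows from the question; "events" is an assumed name.
with events("time", items) as (
    values
        (timestamp '2020-12-06 11:31:38.000', array['item1', 'item2']),
        (timestamp '2020-12-06 11:48:11.304', array['item1']),
        (timestamp '2020-12-06 11:48:48.654', array['item1', 'item2', 'item3']),
        (timestamp '2020-12-06 11:49:50.355', array['item2']),
        (timestamp '2020-12-06 11:55:31.842', array['item1', 'item2'])
),
binned as (
    -- Count rows containing 'item1' per 5-minute bin, aligned to midnight.
    select date_bin('5 minutes', e."time", timestamp '2020-12-06 00:00:00') as start_time,
           count(*) as item1_count
    from events e
    where 'item1' = any(e.items)
    group by 1
)
select gs.ts                      as start_time,
       coalesce(b.item1_count, 0) as item1_count  -- fill in empty buckets with 0
from generate_series(timestamp '2020-12-06 11:30:00',
                     timestamp '2020-12-06 11:55:00',
                     interval '5 minutes') as gs(ts)
left join binned b on b.start_time = gs.ts
order by gs.ts;
```

The origin passed to date_bin has to line up with the series start (here both fall on 5-minute marks counted from midnight), otherwise the join on start_time will not match.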

