
If I have a PostgreSQL table that has columns for datetime and for an array of items, such as:

| time                       | items                  |
| -------------------------- | ---------------------- |
| 2020-12-06 11:31:38.000    |  {item1, item2}        |
| 2020-12-06 11:48:11.304    |  {item1}               |
| 2020-12-06 11:48:48.654    |  {item1, item2, item3} |
| 2020-12-06 11:49:50.355    |  {item2}               |
| 2020-12-06 11:55:31.842    |  {item1, item2}        |

How can I query the table to aggregate the count of a specific item in equidistant time intervals?

For example, I'd like to count the occurrences of item1 in 5 minute intervals, so that the query result looks like this:

| start_time                 | end_time                            | item1 count     |
| -------------------------- | ----------------------------------- | --------------- |
| 2020-12-06 11:30:00.000    |  2020-12-06 11:34:59.999            |       1         |
| 2020-12-06 11:35:00.000    |  2020-12-06 11:39:59.999            |       0         |
| 2020-12-06 11:40:00.000    |  2020-12-06 11:44:59.999            |       0         |
| 2020-12-06 11:45:00.000    |  2020-12-06 11:49:59.999            |       2         |
| 2020-12-06 11:50:00.000    |  2020-12-06 11:54:59.999            |       0         |
| 2020-12-06 11:55:00.000    |  2020-12-06 11:59:59.999            |       1         |

I'm having a tough time figuring out which query would achieve this efficiently. I've been thinking that Postgres' date_trunc or grid might help, but I'm really not sure how to approach the problem. Any suggestions?

  • A five minute interval can be anchored anywhere, so do you want it starting on the hour or some other starting point? Commented Dec 7, 2020 at 19:32
  • Arbitrary starting point was what I was going for Commented Dec 8, 2020 at 19:50

2 Answers


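You can use generate_series() to generate the start timestamp of each interval. Then unnest the array, keep only the item you want, and aggregate. The left join keeps intervals with no matches, so they show a count of 0: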

select gs.ts, count(i.time) as num_item1
from generate_series('2020-12-06 11:30:00.000'::timestamp, '2020-12-06 11:55:00.000', interval '5 minute') gs(ts) left join
     (items i join lateral
      unnest(i.items) item
      on item = 'item1'
     )
     on i.time >= gs.ts and i.time < gs.ts + interval '5 minute'
group by gs.ts
order by 1;

Here is a db<>fiddle.
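
If the result should also carry the interval end, as in the question's expected output, the same idea can be written without the unnest. This is only a sketch under a few assumptions: the table is named items, its columns are "time" (timestamp) and items (text[]), and each row should be counted at most once even if 'item1' is listed several times in its array. The end_time column here is an exclusive upper bound rather than the xx:x4:59.999 value shown in the question.

```sql
-- Sketch: one output row per 5-minute bucket, including empty buckets.
-- Assumed schema: items("time" timestamp, items text[]).
select gs.ts                        as start_time,
       gs.ts + interval '5 minutes' as end_time,    -- exclusive upper bound
       count(i."time")              as item1_count  -- NULLs from the left join are not counted
from generate_series(timestamp '2020-12-06 11:30:00',
                     timestamp '2020-12-06 11:55:00',
                     interval '5 minutes') as gs(ts)
left join items i
       on i."time" >= gs.ts
      and i."time" <  gs.ts + interval '5 minutes'
      and 'item1' = any(i.items)                    -- row matches if the array contains 'item1'
group by gs.ts
order by gs.ts;
```

Because the array test lives in the join condition rather than in a where clause, buckets without a matching row survive the left join and report 0.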


2 Comments

Works great, you're a wizard! Minor question: if I needed to further filter my query by some additional column, could I simply do something like on item = 'item1' and otherColumn = 'someValue'?
@user12533955 . . . I think that will work.

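For PostgreSQL 14 and later you should probably use date_bin(), which takes the stride, the timestamp to bin, and an origin timestamp that anchors the intervals. As for 'item1', if you don't care that it might be listed multiple times in a single row and just want to count the rows where it is present, a simple filter will suffice. Note that, unlike the generate_series() approach, this returns only intervals that contain at least one row:

select 
    date_bin('5 minutes', "time", timestamp '2020-12-06 11:30:00') as start_time  -- origin anchors the 5-minute grid
    , count(*) filter (where 'item1' = any(items)) as item1_count
from items  -- table name assumed from the other answer; "table" itself is a reserved word
group by 1
order by 1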
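
To also get the intervals that contain no rows at all, date_bin() can be combined with generate_series(). The following is a rough sketch rather than a tested solution: it assumes PostgreSQL 14 or later, a table named items with columns "time" (timestamp) and items (text[]), and a fixed time range taken from the question's sample data. The origin passed to date_bin() is set to the start of the series so that both grids line up.

```sql
-- Sketch: generate every bucket start, then attach rows whose binned
-- timestamp equals that start. Assumed schema: items("time" timestamp, items text[]).
select gs.ts           as start_time,
       count(i."time") as item1_count   -- 0 for buckets with no matching row
from generate_series(timestamp '2020-12-06 11:30:00',
                     timestamp '2020-12-06 11:55:00',
                     interval '5 minutes') as gs(ts)
left join items i
       on date_bin('5 minutes', i."time", timestamp '2020-12-06 11:30:00') = gs.ts
      and 'item1' = any(i.items)
group by gs.ts
order by gs.ts;
```

On a large table, a plain range condition on "time" may use an index more readily than comparing the date_bin() result, so it is worth checking the plan with explain before settling on this form.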
