I have a table of events with the columns first_seen (datetime), last_seen (datetime) and severity (an integer).

I'm trying to find how many events were active in discrete 15-minute intervals:

WITH intervals AS
  (
      SELECT
        '2016-08-02 00:00:00'::TIMESTAMP
            + (n||' minutes')::INTERVAL AS start_time,
        '2016-08-02 00:00:00'::TIMESTAMP
            + ((n + 15)||' minutes')::INTERVAL AS end_time
      FROM generate_series(0, 24 * 60, 15) n
  )
SELECT
  start_time,
  (SELECT count(*) FROM event
     WHERE first_seen < end_time AND last_seen > start_time
       AND severity = 5) red,
  (SELECT count(*) FROM event
     WHERE first_seen < end_time AND last_seen > start_time
       AND severity = 4) orange,
  (SELECT count(*) FROM event
     WHERE first_seen < end_time AND last_seen > start_time
       AND severity = 3) yellow
FROM intervals;

I also have an index on (first_seen, last_seen, severity).
My problem is that the query is too slow.
The table has about 100 thousand rows, and computing 100 intervals takes 10 seconds. The index scan seems to be the slow part.

Any ideas how to optimize this query?

1 Answer

The best thing is to get rid of the subselects, which run once per interval and per severity, and to join and aggregate in a single pass instead.

Try something like the following (untested, so it may contain errors):

WITH intervals AS
  (
      SELECT
        '2016-08-02 00:00:00'::TIMESTAMP
            + (n||' minutes')::INTERVAL AS start_time,
        '2016-08-02 00:00:00'::TIMESTAMP
            + ((n + 15)||' minutes')::INTERVAL AS end_time
      FROM generate_series(0, 24 * 60, 15) n
  )
SELECT
   start_time,
   sum(CASE WHEN severity = 5 THEN 1 ELSE 0 END) red,
   sum(CASE WHEN severity = 4 THEN 1 ELSE 0 END) orange,
   sum(CASE WHEN severity = 3 THEN 1 ELSE 0 END) yellow
FROM event
   RIGHT OUTER JOIN intervals
      ON first_seen < end_time AND last_seen > start_time
GROUP BY start_time
ORDER BY start_time;
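
On PostgreSQL 9.4 or later, the conditional sums can also be written with the aggregate FILTER clause, and, as suggested in the comments, generate_series() can produce the timestamps directly. This is a minimal sketch, assuming a hypothetical temporary event table with a few made-up rows so that it runs on its own; it uses a LEFT JOIN from the intervals, which is equivalent to the RIGHT JOIN in the query above.

```sql
-- Hypothetical schema and sample rows, for illustration only.
-- A TEMP table shadows any real "event" table for this session.
CREATE TEMP TABLE event (
    first_seen timestamp NOT NULL,
    last_seen  timestamp NOT NULL,
    severity   integer   NOT NULL
);

INSERT INTO event VALUES
    ('2016-08-02 00:05', '2016-08-02 00:40', 5),
    ('2016-08-02 00:20', '2016-08-02 00:25', 4),
    ('2016-08-02 01:00', '2016-08-02 01:10', 3);

WITH intervals AS (
    -- One row per 15-minute slot, same bounds as the original query (97 slots).
    SELECT gs AS start_time,
           gs + interval '15 minutes' AS end_time
    FROM generate_series('2016-08-02 00:00'::timestamp,
                         '2016-08-03 00:00'::timestamp,
                         interval '15 minutes') AS gs
)
SELECT i.start_time,
       -- FILTER counts only matching rows; the NULL row produced by the
       -- LEFT JOIN for an empty interval is not counted, giving 0.
       count(*) FILTER (WHERE e.severity = 5) AS red,
       count(*) FILTER (WHERE e.severity = 4) AS orange,
       count(*) FILTER (WHERE e.severity = 3) AS yellow
FROM intervals i
LEFT JOIN event e
       ON e.first_seen < i.end_time   -- event starts before the slot ends
      AND e.last_seen  > i.start_time -- and ends after the slot starts
GROUP BY i.start_time
ORDER BY i.start_time;
```

With these sample rows, the 00:00 and 00:30 slots should report red = 1, the 00:15 slot red = 1 and orange = 1, and the 01:00 slot yellow = 1; every other slot reports zeros.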

You may be able to speed things up with two separate indexes, one on first_seen and one on last_seen. A multicolumn index will not help.
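
A sketch of the two-index suggestion, assuming PostgreSQL 9.5 or later (for IF NOT EXISTS) and a hypothetical minimal schema; the index names are made up. Whether the planner actually prefers these over the existing multicolumn index depends on the data, so compare the plans with EXPLAIN ANALYZE before and after.

```sql
-- Hypothetical minimal schema so the sketch runs on its own;
-- this is a no-op if an "event" table already exists in the target schema.
CREATE TABLE IF NOT EXISTS event (
    first_seen timestamp NOT NULL,
    last_seen  timestamp NOT NULL,
    severity   integer   NOT NULL
);

-- Two single-column B-tree indexes (names are illustrative).
-- Each can serve one of the range conditions
-- (first_seen < end_time, last_seen > start_time),
-- and the planner may combine them with a BitmapAnd.
CREATE INDEX IF NOT EXISTS event_first_seen_idx ON event (first_seen);
CREATE INDEX IF NOT EXISTS event_last_seen_idx  ON event (last_seen);

-- Refresh the table statistics the planner uses to choose between plans.
ANALYZE event;
```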

Comments

Note: generate_series() works with timestamps, too: WITH intervals AS (SELECT gs AS start_time, gs + '15 minutes'::INTERVAL AS end_time FROM generate_series('2016-08-02 00:00:00'::TIMESTAMP, '2016-08-03 00:00:00'::TIMESTAMP, '15 minutes'::INTERVAL) gs)
Yes. I left that part the same to reduce potential confusion.
Wow, that's 2 times faster. Thanks :D.
I have added hints for potential indexes.
I also needed the intervals whose sums are 0, so I used a RIGHT JOIN instead of an inner JOIN, and added an ORDER BY start_time at the end.