1

I thought I understood GENERATE_SERIES(), but obviously not as well as I thought.

I have a table:

CREATE TABLE status
(
  start_tz  TIMESTAMPTZ NOT NULL,
  end_tz    TIMESTAMPTZ NOT NULL,
  ms        TEXT        NOT NULL,
  sensor_id SMALLINT    NOT NULL,
);

And I wish to fill this table with simulated data from 2022-01-01 00:00:00 in 20 minute intervals for 4 sensors (numbered 1 - 4) for 2 hours (or whatever - no hardcoding). The ms text is random - see below.

I have tried various combinations of

SELECT
  GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 02:00:00', (INTERVAL '20 MINUTES')),
  GENERATE_SERIES('2022-01-01 00:20:00', '2022-01-01 02:20:00', (INTERVAL '20 MINUTES')),
  CASE
     WHEN random() > 0.5 THEN 'in_motion)'
    ELSE                     'stationary'
  END,
  GENERATE_SERIES(1, 4);

and just can't get it. I also have various cominations and permutations of FROM GENERATE_SERIES(1, ... whatever guess...); and I'm stumped.

What I want looks like this:

start_tz                 end_tz                 ms      sensor_id
2022-01-01 00:00:00+00  2022-01-01 00:20:00+00  stationary  1
2022-01-01 00:20:00+00  2022-01-01 00:40:00+00  stationary  1
2022-01-01 00:40:00+00  2022-01-01 01:00:00+00  in_motion)  1
2022-01-01 01:00:00+00  2022-01-01 01:20:00+00  in_motion)  1
2022-01-01 01:20:00+00  2022-01-01 01:40:00+00  stationary  1
2022-01-01 01:40:00+00  2022-01-01 02:00:00+00  in_motion)  1
2022-01-01 02:00:00+00  2022-01-01 02:20:00+00  stationary  1
2022-01-01 00:00:00+00  2022-01-01 00:20:00+00  stationary  2
2022-01-01 00:20:00+00  2022-01-01 00:40:00+00  stationary  2
2022-01-01 00:40:00+00  2022-01-01 01:00:00+00  in_motion)  2
...
...
and so on

A fiddle is available here. The correct answer will give some explanation of where I was going wrong - I don't want to be given a fish, I would like to be taught how to do a little fishing of my own!

2
  • 2
    There's another StackExchange site for that Commented Sep 24, 2022 at 12:08
  • 1
    You need to use SELECT ... FROM GENERATE_SERIES, not SELECT GENERATE_SERIES, this is because GENERATE_SERIES returns a table, not a scalar: dbfiddle.uk/CyAMiDpy Commented Sep 24, 2022 at 12:24

2 Answers 2

1

The behavior you want would have been produced in 9.6, but perhaps only by accident. Back then multiple SRF in the select list would be cycled until all of them were exhausted at the same time, returning a number of rows which was the Least Common Multiple of each isolated row counts. Because 7 is prime, this would do what you apparently want. This was changed in v10, as a known side effect of other refactoring I think, so that each SRF was cycle just once with the "short" ones being augmented with NULLs.

To get what you want you can put one of the SRF into the FROM list:

SELECT
  GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 02:00:00', (INTERVAL '20 MINUTES')),
  GENERATE_SERIES('2022-01-01 00:20:00', '2022-01-01 02:20:00', (INTERVAL '20 MINUTES')),
  CASE
     WHEN random() > 0.5 THEN 'in_motion)'
    ELSE                     'stationary'
  END, 
  f 
  FROM 
  GENERATE_SERIES(1, 4) f(f);

But I think it is still discouraged to put SRF in the SELECT-list. The problem is that putting both of the time ones in the FROM-list would return 7*7*4 rows, not LCM(7,7,4) rows. To get around that, you would reduce it to one SRF for the time, and use an expression to get the parallel offset time point.

SELECT
      s,
      s+INTERVAL '20 MINUTES',
      CASE
         WHEN random() > 0.5 THEN 'in_motion)'
        ELSE                     'stationary'
      END, 
      f 
      FROM GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 02:00:00', (INTERVAL '20 MINUTES')) s(s), 
      GENERATE_SERIES(1, 4) f(f);

So this is the way I would do it. It is also less repetitive, repeating only the interval rather than the offsat start, offsat stop, and interval.

Sign up to request clarification or add additional context in comments.

1 Comment

Ahh... yes - I see - there's a CROSS JOIN going on there! Very nice - elegant, short - tha's given me considerable insight - thank you for that +1 and a correct tick!
1
  • GENERATE_SERIES returns a table, so to get rows from it you should only use it in the FROM clause of a query, not the SELECT part.
    • The generated table has a single column named "generate_series" - because that's also the name of the function itself it's a good idea to escape the column name (with double-quotes) to avoid potential trouble.
  • Because your end_tz column values are always 20-minutes ahead of start_tz you don't (and shouldn't) use GENERATE_SERIES again, instead define a new query column that simply adds 20 minutes to the already-generated start_tz value.
  • Multiplying the generated rows for each sensor_id can be done by re-querying that generated data 4 times over with UNION.
  • Finally, add the 'in_motion'/stationary' column in the last query step.

Like so:

WITH twentyMinuteIntervals AS (
    SELECT
        "generate_series" AS d0,
        ( "generate_series" + INTERVAL '20 MINUTES' ) AS d1
    FROM
        GENERATE_SERIES( '2022-01-01 00:00:00', '2022-01-01 02:00:00', INTERVAL '20 MINUTES' )
),
multiplyRowsBy4 AS (
    SELECT d0, d1, 1 AS "sensor_id" FROM twentyMinuteIntervals
    UNION
    SELECT d0, d1, 2 AS "sensor_id" FROM twentyMinuteIntervals
    UNION
    SELECT d0, d1, 3 AS "sensor_id" FROM twentyMinuteIntervals
    UNION
    SELECT d0, d1, 4 AS "sensor_id" FROM twentyMinuteIntervals
),
withRandom AS (
    SELECT
        d0 AS "start_tz",
        d1 AS "end_tz",
        "sensor_id",
        CASE WHEN random() > 0.5 THEN 'in_motion' ELSE 'stationary' END AS "status"
    FROM
        multiplyRowsBy4
)
SELECT
    *
FROM
    withRandom
ORDER BY
    "start_tz",
    "sensor_id";

Here's the DBFiddle, screenshot below:

enter image description here

3 Comments

Your answer has given me a bit more insight, but I did say "no hardcoding"! Also, I would have given a +1 for the humour of your teasing me with the link to the fishing group - but that's OK - I think it means that my metaphor found its mark! I'm giving your answer a +1 on merit - but @jjanes is considerably more elegant and has no hardcode wired into it - I do actually want way more than just 4 sensors!
@Vérace What hardcoding are you referring to? If you mean the UNION part, then that can be replaced with a CROSS JOIN to another GENERATE_SERIES call to generate the Cartesian product.
Which is exactly what @jjanes' code does!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.