I have a dataset that looks like this:

t                mean     max      min      std      data_id
4/14/2010 0:00   12.6941  12.6941  12.6941  12.6941  1
4/14/2010 0:00   12.3851  12.3851  12.3851  12.3851  2
4/14/2010 0:20   12.389   12.389   12.389   12.389   1
4/14/2010 0:20   12.1836  12.1836  12.1836  12.1836  2
4/14/2010 0:20   11.3887  11.3887  11.3887  11.3887  13

I want to transform the data into one row per timestamp, with the values of every data_id concatenated:

t,str_agg
'2010-04-14 00:00:00','12.6941','12.6941','12.6941','12.6941','12.3851','12.3851','12.3851','12.3851',,,,
'2010-04-14 00:20:00','12.3890','12.3890','12.3890','12.3890','12.1836','12.1836','12.1836','12.1836','11.3887','11.3887','11.3887','11.3887'

I tried the following query:

WITH dataset AS (
    SELECT *
    FROM
        (
            VALUES
            ('2010-04-14T00:00'::TIMESTAMP, 12.6941, 12.6941, 12.6941, 12.6941, 1),
            ('2010-04-14T00:00'::TIMESTAMP, 12.3851, 12.3851, 12.3851, 12.3851, 2),
            ('2010-04-14T00:20'::TIMESTAMP, 12.389, 12.389, 12.389, 12.389, 1),
            ('2010-04-14T00:20'::TIMESTAMP, 12.1836, 12.1836, 12.1836, 12.1836, 2),
            ('2010-04-14T00:20'::TIMESTAMP, 11.3887, 11.3887, 11.3887, 11.3887, 13)
        ) AS data(t, mean, max, min, std, data_id)
),
dataset_full AS (
    SELECT
        coalesce(t, time) AS t,
        mean,
        max,
        min,
        std,
        data_id
    FROM
        generate_series(
                (SELECT min(t) FROM dataset),
                (SELECT max(t) FROM dataset),
                '10 minutes')
            AS times(time)
        CROSS JOIN generate_series(
                       (SELECT min(data_id) FROM dataset),
                       (SELECT max(data_id) FROM dataset))
            AS data_id(id)
        LEFT JOIN dataset ON times.time = dataset.t AND data_id.id = dataset.data_id
)
SELECT
    t,
    string_agg(concat(mean, ',', max, ',', min, ',', std), ',')
FROM dataset_full
GROUP BY t
ORDER BY t;

It returns the following result:

'2010-04-14 00:00:00','12.6941,12.6941,12.6941,12.6941,12.3851,12.3851,12.3851,12.3851,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'
'2010-04-14 00:10:00',',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'
'2010-04-14 00:20:00','12.389,12.389,12.389,12.389,12.1836,12.1836,12.1836,12.1836,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,11.3887,11.3887,11.3887,11.3887'

But I want this result instead:

'2010-04-14 00:00:00','12.6941,12.6941,12.6941,12.6941,12.3851,12.3851,12.3851,12.3851,,,,'
'2010-04-14 00:20:00','12.389,12.389,12.389,12.389,12.1836,12.1836,12.1836,12.1836,11.3887,11.3887,11.3887,11.3887'

Can anyone help me resolve this?

2 Answers

1

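Your basic problem is that you are generating the time intervals and data_id values rather than reading them from the data. In the dataset_full CTE, generate_series produces every 10-minute step between the first and last timestamp and every id from 1 to 13, so the cross join contains 3 × 13 = 39 cells, most of which match no row and contribute only commas. You seem to want only the values that actually occur in the data.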

Hence:

with dataset as (
      select *
      from (values ('2010-04-14T00:00'::TIMESTAMP, 12.6941, 12.6941, 12.6941, 12.6941, 1),
                   ('2010-04-14T00:00'::TIMESTAMP, 12.3851, 12.3851, 12.3851, 12.3851, 2),
                   ('2010-04-14T00:20'::TIMESTAMP, 12.389, 12.389, 12.389, 12.389, 1),
                   ('2010-04-14T00:20'::TIMESTAMP, 12.1836, 12.1836, 12.1836, 12.1836, 2),
                   ('2010-04-14T00:20'::TIMESTAMP, 11.3887, 11.3887, 11.3887, 11.3887, 13)
           ) AS data(t, mean, max, min, std, data_id)
      ),
     dataset_full as (
       select t.t, d.data_id,
              ds.mean, ds.max, ds.min, ds.std
       from (select distinct t from dataset) t cross join
            (select distinct data_id from dataset) d left join
            dataset ds
            on ds.t = t.t and ds.data_id = d.data_id
     )
select t,string_agg(concat(mean, ',', max, ',', min, ',', std), ',' order by data_id)
from dataset_full
group by t
order by t;

Here is the SQL Fiddle.
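
To see why reading the keys from the data shrinks the output, here is a small Python model of the two grids. It is only a sketch of the cross join sizes, not PostgreSQL code; the variable names (generated_grid, distinct_grid) are made up for illustration, and the rows are the sample keys from the question.

```python
from datetime import datetime, timedelta
from itertools import product

# Sample keys from the question: (timestamp, data_id) pairs that have data.
rows = [
    (datetime(2010, 4, 14, 0, 0), 1),
    (datetime(2010, 4, 14, 0, 0), 2),
    (datetime(2010, 4, 14, 0, 20), 1),
    (datetime(2010, 4, 14, 0, 20), 2),
    (datetime(2010, 4, 14, 0, 20), 13),
]
times = [t for t, _ in rows]
ids = [i for _, i in rows]

# Grid built the way the generate_series calls do it:
# every 10-minute step and every id between min and max.
step = timedelta(minutes=10)
n_steps = (max(times) - min(times)) // step
generated_times = [min(times) + k * step for k in range(n_steps + 1)]
generated_ids = list(range(min(ids), max(ids) + 1))
generated_grid = list(product(generated_times, generated_ids))

# Grid built the way SELECT DISTINCT does it: only keys present in the data.
distinct_grid = list(product(sorted(set(times)), sorted(set(ids))))

print(len(generated_grid), len(distinct_grid))  # 39 6
```

Only 5 of the 39 generated cells match a row, which is where the long runs of commas come from. The distinct grid has 6 cells, and the single unmatched one (00:00 with data_id 13) yields the trailing empty group.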

Also note the ORDER BY inside string_agg(). Presumably you want the values ordered by data_id; without it, the order in which the rows are aggregated is not guaranteed.
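
The effect of that ORDER BY can be modelled in a few lines of Python. This is a hedged sketch: concat_group and string_agg below are hypothetical Python stand-ins for the SQL functions, None stands in for NULL, and the "arrival order" of the cells is chosen arbitrarily to show what can go wrong.

```python
# Cells of the distinct grid for t = 00:00, in an arbitrary arrival order.
# None stands in for SQL NULL (data_id 13 has no row at that time).
cells = [
    (13, (None, None, None, None)),
    (2, (12.3851, 12.3851, 12.3851, 12.3851)),
    (1, (12.6941, 12.6941, 12.6941, 12.6941)),
]

def concat_group(values):
    # Models concat(mean, ',', max, ',', min, ',', std):
    # PostgreSQL's concat() ignores NULL arguments.
    return ",".join("" if v is None else str(v) for v in values)

def string_agg(cells, ordered):
    # ordered=True models string_agg(..., ',' ORDER BY data_id).
    if ordered:
        cells = sorted(cells, key=lambda cell: cell[0])
    return ",".join(concat_group(values) for _, values in cells)

print(string_agg(cells, ordered=False))
print(string_agg(cells, ordered=True))
```

With ordering, the empty group for data_id 13 lands at the end of the string, which is the layout asked for in the question. Without it, the empty group can end up anywhere.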

0

You could replace LEFT JOIN with JOIN:

WITH dataset AS (
    SELECT *
    FROM(VALUES
     ('2010-04-14T00:00'::TIMESTAMP, 12.6941, 12.6941, 12.6941, 12.6941, 1),
     ('2010-04-14T00:00'::TIMESTAMP, 12.3851, 12.3851, 12.3851, 12.3851, 2),
     ('2010-04-14T00:20'::TIMESTAMP, 12.389, 12.389, 12.389, 12.389, 1),
     ('2010-04-14T00:20'::TIMESTAMP, 12.1836, 12.1836, 12.1836, 12.1836, 2),
     ('2010-04-14T00:20'::TIMESTAMP, 11.3887, 11.3887, 11.3887, 11.3887, 13)
     ) AS data(t, mean, max, min, std, data_id)
),
dataset_full AS (
    SELECT coalesce(t, time) AS t,
        mean,
        max,
        min,
        std,
        data_id
    FROM generate_series(
                (SELECT min(t) FROM dataset),
                (SELECT max(t) FROM dataset),
                '10 minutes')
            AS times(time)
        CROSS JOIN generate_series(
                       (SELECT min(data_id) FROM dataset),
                       (SELECT max(data_id) FROM dataset))
            AS data_id(id)
        JOIN dataset    -- inner join instead of LEFT JOIN: unmatched grid cells are dropped
          ON times.time = dataset.t
         AND data_id.id = dataset.data_id
)
SELECT t,string_agg(concat(mean, ',', max, ',', min, ',', std), ',')
FROM dataset_full
GROUP BY t
ORDER BY t;

DBFiddle Demo
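
As a rough illustration of what the join type changes, the following Python sketch walks the same generated grid twice, once keeping unmatched cells (LEFT JOIN) and once dropping them (JOIN). The aggregate helper and the shortened time labels are assumptions made for the example; it does not reproduce PostgreSQL's execution.

```python
# Rows keyed by (time, data_id); each value is the mean/max/min/std group as text.
data = {
    ("00:00", 1): "12.6941,12.6941,12.6941,12.6941",
    ("00:00", 2): "12.3851,12.3851,12.3851,12.3851",
    ("00:20", 1): "12.389,12.389,12.389,12.389",
    ("00:20", 2): "12.1836,12.1836,12.1836,12.1836",
    ("00:20", 13): "11.3887,11.3887,11.3887,11.3887",
}
# The generated grid: three 10-minute steps times ids 1..13.
grid = [(t, i) for t in ("00:00", "00:10", "00:20") for i in range(1, 14)]

def aggregate(join):
    result = {}
    for t, i in grid:
        if (t, i) in data:
            group = data[(t, i)]
        elif join == "left":
            group = ",,,"   # unmatched cell: four NULLs joined by three commas
        else:
            continue        # inner join: the unmatched cell is dropped
        result.setdefault(t, []).append(group)
    return {t: ",".join(groups) for t, groups in result.items()}

left = aggregate("left")
inner = aggregate("inner")
print(sorted(left))    # ['00:00', '00:10', '00:20']
print(sorted(inner))   # ['00:00', '00:20']
print(inner["00:00"])
```

The inner join removes the empty 00:10 row and all the filler commas, but it also removes the empty group for data_id 13 at 00:00, so the rows no longer have the same number of fields.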

EDIT:

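To address the comment "The ,,,, in the first line is getting deleted which I do not want", count the groups per timestamp and pad the shorter rows with empty groups: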

...
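cte2 AS (
    SELECT t,
           string_agg(concat(mean, ',', max, ',', min, ',', std), ',') AS s,
           COUNT(*) AS c    -- number of data_id groups present for this t
    FROM dataset_full
    GROUP BY t
)
-- append one empty group (',,,,') for every group a row is missing
SELECT t, s || REPEAT(',,,,', (MAX(c) OVER () - c)::int) AS result
FROM cte2
ORDER BY t;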

DBFiddle Demo2

Output:

┌─────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────┐
│ t                   │ result                                                                                      │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────┤
│ 2010-04-14 00:00:00 │ 12.6941,12.6941,12.6941,12.6941,12.3851,12.3851,12.3851,12.3851,,,,                         │
│ 2010-04-14 00:20:00 │ 12.389,12.389,12.389,12.389,12.1836,12.1836,12.1836,12.1836,11.3887,11.3887,11.3887,11.3887 │
└─────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────┘
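
The padding step can be checked with a short Python model. This is a sketch under the assumption that the aggregated rows look like the inner-join result above; groups_by_time, max_count and pad are hypothetical names that mirror s, MAX(c) OVER () and the REPEAT expression.

```python
# Aggregated rows after the inner join: time -> list of value groups.
groups_by_time = {
    "2010-04-14 00:00:00": [
        "12.6941,12.6941,12.6941,12.6941",
        "12.3851,12.3851,12.3851,12.3851",
    ],
    "2010-04-14 00:20:00": [
        "12.389,12.389,12.389,12.389",
        "12.1836,12.1836,12.1836,12.1836",
        "11.3887,11.3887,11.3887,11.3887",
    ],
}

# Models MAX(c) OVER (): the largest group count across all rows.
max_count = max(len(groups) for groups in groups_by_time.values())

def pad(groups):
    # Models s || REPEAT(',,,,', max_count - c): one separator plus three
    # inner commas per missing group, always appended at the end.
    return ",".join(groups) + ",,,," * (max_count - len(groups))

for t, groups in sorted(groups_by_time.items()):
    print(t, pad(groups))
```

One caveat worth keeping in mind: the padding is always appended at the end of the string. If a data_id in the middle were missing for some timestamp, the empty group would still show up last rather than in that data_id's position, so this approach seems best suited to cases like the sample data, where the missing id is the highest one.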

2 Comments

In that case I am getting the below result: '2010-04-14 00:00:00','12.6941,12.6941,12.6941,12.6941,12.3851,12.3851,12.3851,12.3851' '2010-04-14 00:20:00','12.389,12.389,12.389,12.389,12.1836,12.1836,12.1836,12.1836,11.3887,11.3887,11.3887,11.3887' ... The ,,,, in the first line is getting deleted, which I do not want.
@Sunny Easy to fix
