I want to create a table that returns the top 10 cons_name values by aggregate sum over a trailing week, recomputed for every day.

So for 5/29/2019 it will pull the top 10 cons_name by their sum dating back to 5/22/2019.

Then, for 5/28/2019, the top 10 cons_name by their sum back to 5/21/2019.

The result should be a table of each day's top 10 over the preceding 7 days, going all the way back to 2018-12-01.

I can write the simple query for the last 7 days, but I have tried window functions to no avail.

SELECT cons_name,
       pricedate,
       sum(shadow)
FROM spp.rtbinds
WHERE pricedate >= current_date - 7
GROUP BY cons_name, shadow, pricedate
ORDER BY shadow asc
LIMIT 10

This query generates the output below:

cons_name       pricedate               sum
"TEMP17_24078"  "2019-05-28 00:00:00"   "-1473.29723333333"
"TEMP17_24078"  "2019-05-28 00:00:00"   "-1383.56638333333"
"TMP175_24736"  "2019-05-23 00:00:00"   "-1378.40504166667"
"TMP159_24149"  "2019-05-23 00:00:00"   "-1328.847675"
"TMP397_24836"  "2019-05-23 00:00:00"   "-1221.19560833333"
"TEMP17_24078"  "2019-05-28 00:00:00"   "-1214.9914"
"TMP175_24736"  "2019-05-23 00:00:00"   "-1123.83254166667"
"TEMP72_22893"  "2019-05-29 00:00:00"   "-1105.93840833333"
"TMP164_23704"  "2019-05-24 00:00:00"   "-1053.051375"
"TMP175_24736"  "2019-05-27 00:00:00"   "-1043.52104166667"

I would like a table and function that returns a table of each day's top 10 dating back a week.

  • Some sample data with expected output would help. Commented May 29, 2019 at 18:32

1 Answer

Using window functions gets you on the right track, but you should read further in the documentation about what they can do.

We have multiple issues here that we need to solve:

  1. gaps in the data (missing pricedate values) mean that a fixed number of rows (7) does not always span exactly seven days, so the overall sum would be wrong
  2. the calculation itself needs rows from before the visible days, so the WHERE clause cannot simply restrict the data to the visible days
  3. to select the top 10 for each day, we have to generate a row number per partition, because the LIMIT clause cannot be applied per group
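
As a minimal illustration of the third point, the sketch below keeps the top 2 rows per day from a handful of made-up rows. The CTE name sample and its values are hypothetical and only there to make the statement self-contained; the pattern that matters is row_number() partitioned by day, followed by a filter on the resulting number.

```sql
-- Hypothetical sample rows; only the row_number() pattern matters here.
WITH sample (pricedate, cons_name, shadow_sum) AS (
    VALUES
        (date '2019-05-28', 'A', 30.0),
        (date '2019-05-28', 'B', 20.0),
        (date '2019-05-28', 'C', 10.0),
        (date '2019-05-29', 'A',  5.0),
        (date '2019-05-29', 'B', 15.0)
),
numbered AS (
    SELECT
        sample.*,
        -- numbering restarts at 1 for every pricedate
        row_number() OVER (PARTITION BY pricedate ORDER BY shadow_sum DESC) AS position
    FROM sample
)
SELECT numbered.pricedate, numbered.cons_name, numbered.shadow_sum
FROM numbered
WHERE numbered.position <= 2   -- a per-group limit, which LIMIT alone cannot express
ORDER BY numbered.pricedate, numbered.position;
```

For 2019-05-28 this keeps A and B and drops C; for 2019-05-29 it keeps B and then A.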

To address these issues, I came up with the following CTEs:

  1. CTE days: generate the gap-less date series and mark visible days
  2. CTE daily: LEFT JOIN the data to the generated days and produce daily sums (and handle NULL entries)
  3. CTE calc: produce the rolling 7-day sums
  4. CTE numbered: produce row numbers reset each day
  5. select the actual visible rows and limit them to max. 10 per day
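
The first three steps can be seen in isolation on a single series. The following is only a sketch with made-up values (the CTE sample is hypothetical): it generates a gap-less run of days, LEFT JOINs the data onto it, and sums over the current row plus the 6 preceding rows.

```sql
-- Hypothetical single series with a gap between 2019-05-02 and 2019-05-10.
WITH sample (pricedate, shadow) AS (
    VALUES
        (date '2019-05-01', 10.0),
        (date '2019-05-02', 20.0),
        (date '2019-05-10', 40.0)
),
days AS (
    -- one row per calendar day, whether or not data exists for it
    SELECT gen::date AS c_day
    FROM generate_series(date '2019-05-01', date '2019-05-10', interval '1 day') AS gen
)
SELECT
    days.c_day,
    -- missing days count as 0, so 7 rows always span exactly 7 days
    sum(coalesce(sample.shadow, 0)) OVER (
        ORDER BY days.c_day
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS week_sum
FROM days
LEFT JOIN sample ON (sample.pricedate = days.c_day)
ORDER BY days.c_day;
```

With the generated days, the row for 2019-05-10 sums only 2019-05-04 through 2019-05-10 and yields 40. Without them, the same frame would run over the three existing rows and yield 70, which is the problem described in the first issue.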

So for a specific week (2019-05-26 - 2019-06-01), the query will look like the following:

WITH
    -- one row per calendar day, starting 6 days before the first visible day;
    -- c_visible marks the days to report, c_lookback is the first day of each 7-day window
    days (c_day, c_visible, c_lookback) as (
        SELECT gen::date, (CASE WHEN gen::date < '2019-05-26' THEN false ELSE true END), gen::date - 6
        FROM generate_series('2019-05-26'::date - 6, '2019-06-01'::date, '1 day'::interval) AS gen
    ),
    -- daily sum per cons_name; a day without any data yields one row with NULL cons_name and sum 0
    daily (cons_name, pricedate, shadow_sum) AS (
        SELECT
            r.cons_name,
            r.pricedate::date,
            coalesce(sum(r.shadow), 0)
        FROM days
        LEFT JOIN spp.rtbinds AS r ON (r.pricedate::date = days.c_day)
        GROUP BY 1, 2
    ),
    -- rolling sum over the current row and the 6 preceding rows of the same cons_name
    calc (cons_name, pricedate, shadow_sum) AS (
        SELECT
            cons_name,
            pricedate,
            sum(shadow_sum) OVER (PARTITION BY cons_name ORDER BY pricedate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
        FROM daily
    ),
    -- position within each day, largest rolling sum first
    numbered (cons_name, pricedate, shadow_sum, position) AS (
        SELECT
            calc.cons_name,
            calc.pricedate,
            calc.shadow_sum,
            ROW_NUMBER() OVER (PARTITION BY calc.pricedate ORDER BY calc.shadow_sum DESC)
        FROM calc
    )
-- keep only the visible days and at most 10 rows per day
SELECT
    days.c_lookback,
    numbered.cons_name,
    numbered.shadow_sum
FROM numbered
INNER JOIN days ON (days.c_day = numbered.pricedate AND days.c_visible)
WHERE numbered.position < 11
ORDER BY numbered.pricedate DESC, numbered.shadow_sum DESC;

Online example with generated test data: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=a83a52e33ffea3783207e6b403bc226a

Example output:

 c_lookback |  cons_name   |    shadow_sum    
------------+--------------+------------------
 2019-05-26 | TMP400_27000 | 4578.04474575352
 2019-05-26 | TMP700_25000 | 4366.56857151864
 2019-05-26 | TMP200_24000 | 3901.50325547671
 2019-05-26 | TMP400_24000 | 3849.39595793188
 2019-05-26 | TMP700_28000 | 3763.51693260809
 2019-05-26 | TMP600_26000 | 3751.72016620729
 2019-05-26 | TMP500_28000 | 3610.75970225036
 2019-05-26 | TMP300_26000 | 3598.36888491176
 2019-05-26 | TMP600_27000 | 3583.89777677553
 2019-05-26 | TMP300_21000 | 3556.60386707587
 2019-05-25 | TMP400_27000 | 4687.20302128047
 2019-05-25 | TMP200_24000 | 4453.61603102228
 2019-05-25 | TMP700_25000 | 4319.10566615313
 2019-05-25 | TMP400_24000 | 4039.01832416654
 2019-05-25 | TMP600_27000 | 3986.68667223025
 2019-05-25 | TMP600_26000 | 3879.92447655788
 2019-05-25 | TMP700_28000 | 3632.56970774056
 2019-05-25 | TMP800_25000 |  3604.1630071504
 2019-05-25 | TMP600_28000 | 3572.50801157858
 2019-05-25 | TMP500_27000 | 3536.57885829499
 2019-05-24 | TMP400_27000 | 5034.53660146287
 2019-05-24 | TMP200_24000 | 4646.08844632655
 2019-05-24 | TMP600_26000 |  4377.5741555281
 2019-05-24 | TMP700_25000 | 4321.11906399066
 2019-05-24 | TMP400_24000 | 4071.37184911687
 2019-05-24 | TMP600_25000 | 3795.00857752701
 2019-05-24 | TMP700_26000 |  3518.6449117614
 2019-05-24 | TMP600_24000 | 3368.15348120732
 2019-05-24 | TMP200_25000 | 3305.84444172308
 2019-05-24 | TMP500_28000 | 3162.57388606668
 2019-05-23 | TMP400_27000 | 4057.08620966971
 2019-05-23 | TMP700_26000 | 4024.11812392669
...
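
To cover every day back to 2018-12-01 in one statement, as the question asks, one possible alternative is sketched below. It is not the query above: instead of a window frame it uses a range join, pairing each generated day with all rows from its 7-day window, so missing days for an individual cons_name need no special handling. It assumes that pricedate can be compared directly with a date, and it uses current_date as the last day, which is my assumption. The ordering is DESC as in the query above; switch it to ASC if the most negative sums should rank first, as in the question's own query.

```sql
-- Sketch only: a range join instead of a window frame.
WITH days AS (
    SELECT gen::date AS c_day
    FROM generate_series(date '2018-12-01', current_date, interval '1 day') AS gen
),
weekly AS (
    -- for every day, sum all rows from that day and the 6 days before it
    SELECT
        d.c_day,
        r.cons_name,
        sum(r.shadow) AS shadow_sum
    FROM days AS d
    INNER JOIN spp.rtbinds AS r
        ON (r.pricedate >= d.c_day - 6 AND r.pricedate < d.c_day + 1)
    GROUP BY d.c_day, r.cons_name
),
numbered AS (
    SELECT
        weekly.*,
        row_number() OVER (PARTITION BY c_day ORDER BY shadow_sum DESC) AS position
    FROM weekly
)
SELECT
    numbered.c_day - 6 AS c_lookback,
    numbered.c_day,
    numbered.cons_name,
    numbered.shadow_sum
FROM numbered
WHERE numbered.position <= 10
ORDER BY numbered.c_day DESC, numbered.position;
```

Each source row is joined to up to 7 days, so on a large table this may be noticeably slower than the window-function version.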

8 Comments

Thanks, this certainly helps. How do I run the query on a specific date? Say, "2019-02-07"
Provided a much better version of the query which allows you to just set the from - to dates in the first CTE and the rest is adapted automatically. For the single-day calculation just set all date constants to that date.
Thanks! I am getting an error: ERROR: column "r.pricedate" must appear in the GROUP BY clause or be used in an aggregate function LINE 7: SELECT r.cons_name, r.pricedate, sum(r.shadow)
Sorry, a typo. Added the second column to the GROUP BY in CTE daily.
So what the code generates is great, but is there a way to make it loop over each day, make a list of the top 10 for each day, and then create a column that is the 7-day "lookback" date? Does that make sense? So for what we have, 2019-06-01 is the lookback date and any cons_name sum generated will have "2019-06-01" instead of the pricedate. And on top of that, make a list looping over from 2018-12-01. More curveballs, I know!