
I'm looking at a table of orders for an ecommerce website and trying to build a customers table with some basic info about each customer.

I'm getting stuck when trying to combine window functions like NTH_VALUE with ordinary aggregate functions.

The orders table looks like this:

order_id | customer_id | order_date | revenue
----------------------------------------------
    1    |      11     | 2017-01-01 |  5.0
    2    |      11     | 2018-02-01 |  2.25
    3    |      12     | 2019-03-01 |  1.0
    4    |      13     | 2016-04-01 |  12.0
    5    |      13     | 2016-05-01 |  15.25
    6    |      13     | 2018-06-01 |  25.25

I'm looking to build a Customers table that looks like this:

customer_id | num_orders | first_order_date | first_order_revenue | second_order_date
--------------------------------------------------------------------------------------
      11    |     2      |    2017-01-01    |        5.0          |    2018-02-01
      12    |     1      |    2019-03-01    |        1.0          |        n/a
      13    |     3      |    2016-04-01    |        12.0         |    2016-05-01

My code should be something like this:

SELECT
customer_id,
COUNT(customer_id) num_orders,
MIN(order_date) first_order_date,
FIRST_VALUE(revenue) OVER w1 first_order_revenue,
NTH_VALUE(order_date, 2) OVER w1 second_order_date

FROM `orders`
GROUP BY customer_id
WINDOW w1 as (PARTITION BY customer_id ORDER BY order_date ASC)

But it's telling me I need to GROUP "revenue" and "order_date" via errors like this:

"SELECT list expression references column revenue which is neither grouped nor aggregated at [5:13]"

But when I do that, it returns one row per order: first_order_date is different in each row, first_order_revenue is the same (correct) value in each row, and second_order_date is correct except in the first row, where it is null:

customer_id | num_orders | first_order_date | first_order_revenue | second_order_date
--------------------------------------------------------------------------------------
      13    |      1     |    2016-04-01    |        12.0         |       *null*
      13    |      1     |    2016-05-01    |        12.0         |     2016-05-01
      13    |      1     |    2018-06-01    |        12.0         |     2016-05-01

I'm slowly teaching myself SQL, but I can't find a solution to this specific issue online. I'm guessing it might take a nested SELECT statement for the window functions that is then JOINed with the non-window aggregates? I've tried a few different approaches, but nothing has worked so far.

Thank you to anyone who can help!

1 Answer

I think a subquery and conditional aggregation might be simpler:

SELECT customer_id, COUNT(*) num_orders,
       MIN(order_date) first_order_date,
       MAX(CASE WHEN seqnum = 1 THEN revenue END) as revenue_1,
       MAX(CASE WHEN seqnum = 2 THEN revenue END) as revenue_2
FROM (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as seqnum
      FROM `orders` o
     ) o
GROUP BY customer_id;
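
Adapted to the exact columns asked for in the question, the same pattern might look like the sketch below. The inline `orders` CTE is only a hypothetical stand-in that copies the six sample rows from the question so the query can be run on its own in BigQuery; against a real table, drop that CTE and keep the rest.

```sql
-- Hypothetical inline copy of the question's sample data (an assumption,
-- used only so the query is self-contained).
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
),
-- Number each customer's orders from oldest to newest.
numbered AS (
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS seqnum
  FROM orders o
)
SELECT customer_id,
       COUNT(*) AS num_orders,
       MIN(order_date) AS first_order_date,
       -- The CASE is NULL for every row except the one with the wanted seqnum,
       -- and MAX ignores NULLs, so it returns that single value.
       MAX(CASE WHEN seqnum = 1 THEN revenue END) AS first_order_revenue,
       MAX(CASE WHEN seqnum = 2 THEN order_date END) AS second_order_date
FROM numbered
GROUP BY customer_id
ORDER BY customer_id;
```

For customer 12, who has only one order, second_order_date comes back as NULL. For customer 13 it is 2016-05-01, the date of that customer's second order.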

Or, put the values in an array:

SELECT customer_id, COUNT(*) num_orders,
       MIN(order_date) first_order_date,
       ARRAY_AGG(revenue ORDER BY order_date LIMIT 2) as revenue_1_2
FROM `orders` o
GROUP BY customer_id;
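
To turn the array back into separate columns, one option is to aggregate a STRUCT per order and index into the result. This is a sketch, not the only way to do it: the inline `orders` CTE is a hypothetical copy of the question's sample rows, and `first_two` and `per_customer` are names made up for illustration. SAFE_OFFSET returns NULL instead of raising an error when a customer has fewer than two orders.

```sql
-- Hypothetical inline copy of the question's sample data (an assumption,
-- used only so the query is self-contained).
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
),
-- Keep each customer's two earliest orders as an array of structs.
per_customer AS (
  SELECT customer_id,
         COUNT(*) AS num_orders,
         ARRAY_AGG(STRUCT(order_date AS order_date, revenue AS revenue)
                   ORDER BY order_date LIMIT 2) AS first_two
  FROM orders
  GROUP BY customer_id
)
SELECT customer_id,
       num_orders,
       first_two[OFFSET(0)].order_date      AS first_order_date,
       first_two[OFFSET(0)].revenue         AS first_order_revenue,
       -- SAFE_OFFSET yields NULL when there is no second element.
       first_two[SAFE_OFFSET(1)].order_date AS second_order_date
FROM per_customer
ORDER BY customer_id;
```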

3 Comments

Ah, interesting. Using ROW_NUMBER() to create the 'seqnum' column, you can then pull whatever column you want from the row whose seqnum matches.
Whoops, it sent too soon. I was going to add: the use of MAX is just a kind of workaround for pulling those values without having to GROUP them, correct?
@LeeLK . . . Yes. The MAX() is pulling one value per seqnum and there is only one value.
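
As a side note on the original attempt: the null in the first row is consistent with how window frames default in BigQuery. When a window has an ORDER BY and no explicit frame, the frame runs from the start of the partition to the current row, so on a customer's first order NTH_VALUE(order_date, 2) cannot yet see a second row. The sketch below stays with window functions only, widens the frame to the whole partition, and collapses the identical per-order rows with SELECT DISTINCT. The inline `orders` CTE is again a hypothetical copy of the question's sample data.

```sql
-- Hypothetical inline copy of the question's sample data (an assumption,
-- used only so the query is self-contained).
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
)
-- Every row of a customer now carries the same values, so DISTINCT
-- reduces the result to one row per customer.
SELECT DISTINCT
       customer_id,
       COUNT(*) OVER (PARTITION BY customer_id) AS num_orders,
       FIRST_VALUE(order_date) OVER w1 AS first_order_date,
       FIRST_VALUE(revenue) OVER w1 AS first_order_revenue,
       NTH_VALUE(order_date, 2) OVER w1 AS second_order_date
FROM orders
WINDOW w1 AS (
  PARTITION BY customer_id
  ORDER BY order_date
  -- Explicit frame: look at the whole partition, not just rows up to the current one.
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY customer_id;
```

This avoids mixing GROUP BY with window functions in one SELECT, which is what triggered the "neither grouped nor aggregated" error in the question.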
