Is it possible to limit the result of a partitioned window function without a subquery? I'm looking for a solution that works in both PostgreSQL and MySQL.

For example (the join is irrelevant to the point of the question):

select acct.name, we.channel, count(*) as cnt,
    max(count(*)) over (partition by name order by count(*) desc) as max_cnt
from web_events we join accounts acct
    on we.account_id=acct.id
group by acct.name, we.channel
order by name, max_cnt desc;

The result of this query gives:

[image: query output]

I only want to show the first line of each of the window's partitions. For example: lines with cnt: [3M, 19], [Abbott Laboratories, 20]

I tried the following, which doesn't work (I added limit 1 to the window specification):

select acct.name, we.channel, count(*) as cnt,
        max(count(*)) over (partition by name order by count(*) desc limit 1) as max_cnt
    from web_events we join accounts acct
        on we.account_id=acct.id
    group by acct.name, we.channel
    order by name, max_cnt desc;

2 Answers

1

I only want to show the first line of each of the window's partitions. For example: lines with cnt: [3M, 19], [Abbott Laboratories, 20]

You don't actually need a window function here, since the first row's max_cnt will always equal cnt. Instead, use DISTINCT ON in combination with the GROUP BY. Note that DISTINCT ON is a PostgreSQL extension, so this does not work in MySQL.

From the PostgreSQL documentation:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

SELECT DISTINCT ON(acct.name) 
  acct.name
, we.channel
, COUNT(*) cnt
FROM web_events we 
JOIN accounts acct
  ON we.account_id=acct.id
GROUP BY 1, 2
ORDER BY name, cnt DESC;

Here's a quick demo in sqlfiddle. http://sqlfiddle.com/#!17/57694/8

One thing I always got wrong when I first started using DISTINCT ON: the ORDER BY clause must start with the expressions in the DISTINCT ON. In the above example, the ORDER BY starts with acct.name.

If there is a tie for first position, only one of the tied rows is returned, and which one is non-deterministic. You can add further expressions to the ORDER BY to control which row is returned.

Example:

ORDER BY name, cnt DESC, channel = 'direct'

will return the row containing facebook if, for a given account, both facebook and direct yield the same cnt. This works because channel = 'direct' is false for facebook, and false sorts before true in ascending order.

However, note that with this approach, it is not possible to return all the rows that are tied for first position, i.e. both rows containing facebook & direct (without using a subquery).
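
For reference, here is a sketch of the full query with the tie-breaker in place, assuming the same web_events and accounts tables as above. The trailing we.channel sort key is my own addition, not something the question requires; it only makes the result deterministic when neither tied channel is direct.

```sql
-- PostgreSQL only: DISTINCT ON keeps the first row per acct.name
-- according to the ORDER BY below.
SELECT DISTINCT ON (acct.name)
       acct.name,
       we.channel,
       COUNT(*) AS cnt
FROM web_events we
JOIN accounts acct
  ON we.account_id = acct.id
GROUP BY acct.name, we.channel
ORDER BY acct.name,              -- must lead with the DISTINCT ON expression
         cnt DESC,               -- highest count first
         we.channel = 'direct',  -- false sorts first, so direct loses a tie
         we.channel;             -- assumed final tie-breaker (alphabetical)
```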

DISTINCT ON may be combined in the same statement with GROUP BY (example above) and window functions (example below). The DISTINCT ON clause is logically evaluated just before the LIMIT.

For instance, the following query (however pointless) shows off the combination of DISTINCT ON with a window function. It returns one row per distinct mxcnt value:

SELECT DISTINCT ON(mxcnt) 
  acct.name
, we.channel
, COUNT(*) cnt
, MAX(COUNT(*)) OVER (PARTITION BY acct.name) mxcnt
FROM web_events we 
JOIN accounts acct
  ON we.account_id=acct.id
GROUP BY 1, 2
ORDER BY mxcnt, cnt DESC;

0

Use a subquery. If you want exactly one row (even if there are ties), then use row_number():

select name, channel, cnt
from (select acct.name, we.channel, count(*) as cnt,
             row_number() over (partition by acct.name order by count(*) desc) as seqnum
      from web_events we join
           accounts acct
           on we.account_id = acct.id
      group by acct.name, we.channel
     ) wea
where seqnum = 1  -- keep only the top channel per account
order by name;

You can use rank() if you want multiple rows for an account, in the event of ties.
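
A sketch of the rank() variant, assuming the same web_events and accounts tables. The alias rnk is my own choice (rank is a reserved word in MySQL 8.0). Tied rows receive the same rank, so filtering on rnk = 1 keeps every channel that shares the top count for an account. This form should run on PostgreSQL and on MySQL 8.0 or later, which is where MySQL gained window functions.

```sql
select name, channel, cnt
from (select acct.name, we.channel, count(*) as cnt,
             -- rank() gives tied counts the same rank, unlike row_number()
             rank() over (partition by acct.name order by count(*) desc) as rnk
      from web_events we join
           accounts acct
           on we.account_id = acct.id
      group by acct.name, we.channel
     ) wea
where rnk = 1  -- all channels tied for first place per account
order by name, channel;
```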

2 Comments

Hi, thanks. I was trying to see if there's a way to do it without subquery.
@Lena . . . You can't really do without a subquery, because you need to calculate the count for all rows and then reduce the number of rows. I mean, the filtering could use a CTE or correlated subquery, but it will not be a simple query.
