I'm trying to run a calculation over an array of aggregated data. It works when I wrap the logic in a SQL function:

CREATE TEMPORARY FUNCTION uniq_sum(cls array<struct<word string,word_count int64>>) AS (
  (select sum(word_count) from (select row_number() over (partition by word) r,word_count from unnest(cls)) where r=1)
);

select 
  corpus,
  uniq_sum(array_agg(struct(word,word_count))) res
  from `bigquery-public-data.samples.shakespeare` 
  group by corpus

When I try to run this inline, I get an error: Aggregate function ARRAY_AGG not allowed in UNNEST.

Is it possible to run inline calculations over an array created by array_agg? In this case I'm trying to run a version of sum(distinct) where the distinct key is a string element: given many (word, word_count) pairs, I would like to compute sum(word_count) while counting only one element per word. This is the inline query that fails:

select 
  corpus,
  (select sum(word_count) from (select row_number() over (partition by word) r,word_count from unnest(array_agg(struct(word,word_count))) where r=1))
  from `bigquery-public-data.samples.shakespeare` 
  group by corpus

1 Answer

The simple query below returns exactly the same result as yours, because in this table each word appears only once per corpus, so it looks like you are overcomplicating things:

#standardSQL
SELECT 
  corpus,
  SUM(word_count) res
FROM `bigquery-public-data.samples.shakespeare` 
GROUP BY corpus  
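
Whether the plain SUM is safe depends on each word occurring at most once per corpus. A minimal sketch of how one might check that, assuming the same public table; if it returns no rows, the de-duplication step has nothing to remove:

```sql
#standardSQL
-- List any (corpus, word) pair that occurs more than once.
-- An empty result means SUM(word_count) per corpus already counts each word once.
SELECT
  corpus,
  word,
  COUNT(*) AS occurrences
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus, word
HAVING COUNT(*) > 1
```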

Meanwhile, formally, below is an inline version of what you asked for:

SELECT 
  corpus,
  (SELECT SUM(word_count) FROM (
    SELECT 
      word_count, 
      ROW_NUMBER() OVER(PARTITION BY word) r
    FROM UNNEST(cls)) 
    WHERE r=1  
  ) res
FROM (
  SELECT corpus, ARRAY_AGG(STRUCT(word,word_count)) cls
  FROM `bigquery-public-data.samples.shakespeare` 
  GROUP BY corpus
)    
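
If the goal is only to count each word once per corpus, an alternative sketch avoids arrays altogether: de-duplicate on (corpus, word) in an inner query, then sum. This assumes that any one word_count per word is acceptable, which matches ROW_NUMBER() without an ORDER BY picking an arbitrary row; ANY_VALUE is used here for that purpose. It still needs a subquery, just not UNNEST.

```sql
#standardSQL
-- Inner query: keep one arbitrary word_count per (corpus, word).
-- Outer query: sum the de-duplicated counts per corpus.
SELECT
  corpus,
  SUM(word_count) AS res
FROM (
  SELECT
    corpus,
    word,
    ANY_VALUE(word_count) AS word_count
  FROM `bigquery-public-data.samples.shakespeare`
  GROUP BY corpus, word
)
GROUP BY corpus
```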

Happy New Year! :o)

2 Comments

Is it possible to write the second query without a subquery? That is, as a scalar subquery run directly over ARRAY_AGG(STRUCT(word,word_count)), like: (select sum(x) from unnest(array_agg(STRUCT(word,word_count))) where...
Unfortunately not: the aggregate function ARRAY_AGG is not allowed in UNNEST, so you cannot do it in one shot.
