1

I have a table that looks like this:

+----+------+------+------+------+------+
| id | col1 | col2 | col3 | col4 | col5 |
+----+------+------+------+------+------+
| a  |    1 | null | null | null | null |
| b  |    1 | 2    | 3    | 4    | null |
| c  |    1 | 2    | 3    | 4    | 5    |
| d  |    2 | 1    | 7    | null | 4    |
+----+------+------+------+------+------+

I want to create an aggregated table where for each id I want an array that contains non null value from all the other columns. The output should look like this:

+-----+-------------+
| id  |   agg_col   |
+-----+-------------+
| a   | [1]         |
| b   | [1,2,3,4]   |
| c   | [1,2,3,4,5] |
| d   | [2,1,7,4]   |
+-----+-------------+

Is it possible to produce the output using bigquery standard sql?

1 Answer 1

3

Below is not super generic solution, but works for your specific example that you provided - id is presented with alphanumeric (not starting with digit) and rest of columns are numbers - integers

#standardSQL
SELECT id,
  ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
  CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t

You can test, play with above using sample data from your question as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
  SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
  SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
  SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
  ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
  CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id   

with result as

Row id  agg_col_as_array    agg_col_as_string    
1   a   1                   [1]  
2   b   1                   [1,2,3,4]    
        2        
        3        
        4        
3   c   1                   [1,2,3,4,5]  
        2        
        3        
        4        
        5        
4   d   2                   [2,1,7,4]    
        1        
        7        
        4        

Do you think it is possible to do this by mentioning specific columns and then binding them into an array?

Sure, it is doable - see below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
  SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
  SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
  SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id, 
  ARRAY(
    SELECT col 
    FROM UNNEST([col1, col2, col3, col4, col5]) col 
    WHERE NOT col IS NULL
  ) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
  ARRAY(
    SELECT CAST(col AS STRING) 
    FROM UNNEST([col1, col2, col3, col4, col5]) col 
    WHERE NOT col IS NULL
  ), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id       

BUT ... this is not the best option you have as you need to manage and adjust number and names of columns in each case for different uses

Below solution is adjusted version of my original answer to address your latest comment - Actually the sample was too simple. Both of my id and other columns have alphanumeric and special characters.

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
  SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
  SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
  SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
  ARRAY(
    SELECT col 
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET 
    WHERE col != 'null' AND OFFSET > 0
  ) AS agg_col_as_array,
  CONCAT('[', ARRAY_TO_STRING(
    ARRAY(
      SELECT col 
      FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET 
      WHERE col != 'null' AND OFFSET > 0
    ), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id   

both with same result as before

Row id  agg_col_as_array    agg_col_as_string    
1   a   1                   [1]  
2   b   1                   [1,2,3,4]    
        2        
        3        
        4        
3   c   1                   [1,2,3,4,5]  
        2        
        3        
        4        
        5        
4   d   2                   [2,1,7,4]    
        1        
        7        
        4         
Sign up to request clarification or add additional context in comments.

4 Comments

Actually the sample was too simple. Both of my id and other columns have alphanumeric and special characters. Do you think it is possible to do this by mentioning specific columns and then binding them into an array?
sure. everything is possible. I would hope my answer gives you at least a hint and you would expand on it to match your real use case as opposed to simplified version you presented in the question- but if not, not a problem, I will be back later when have some time to help you further :o)
I've added options for you in the answer. check them!
Sweet! I was actually looking for this technique: ARRAY( SELECT col FROM UNNEST([col1, col2, col3, col4, col5]) col WHERE NOT col IS NULL ) AS agg_col_as_array Thanks a lot!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.