1

I'm trying to group BigQuery columns using an array like so:

with test as (
   select 1 as A, 2 as B
   union all
   select 3, null
)

select *,
       [A,B] as grouped_columns
from test

However, this won't work, since there is a null value in column B row 2.

In fact this won't work either:

select [1, null] as test_array

However, the BigQuery documentation says NULLs should be allowed:

In BigQuery, an array is an ordered list consisting of zero or more values of the same data type. You can construct arrays of simple data types, such as INT64, and complex data types, such as STRUCTs. The current exception to this is the ARRAY data type: arrays of arrays are not supported. Arrays can include NULL values.

There don't seem to be any attributes or a SAFE prefix that can be used with ARRAY() to handle NULLs.

So what is the best approach for this?

2
  • I have no experience with BigQuery, but to bypass the NULL, maybe use an IFNULL function to convert it into a non-null value that represents NULL? For example, if column B should never be negative, set it to -1. This is just a method for when I need data to represent "null" but can't actually use NULL. Commented Jan 13, 2021 at 9:04
  • Right. So I have to group 20 columns this way? It should work, but it's not efficient. When I look at the documentation there are plenty of examples with [1, 2, NULL] arrays... Commented Jan 13, 2021 at 9:06

3 Answers

2

Per the documentation for the ARRAY type:

Currently, BigQuery has two following limitations with respect to NULLs and ARRAYs:

  • BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.

  • BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
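The first limitation can be illustrated with a minimal sketch. The column aliases element_count and null_count are made up for the illustration. Both expressions consume the array inside the query, so no array containing NULL ever reaches the result:

```sql
-- The array [1, null] is only used inside the query and is never returned,
-- so the NULL element does not trigger the error described above.
select
  array_length([1, null]) as element_count,  -- NULL elements still count toward the length
  (select count(*)
   from unnest([1, null]) as x
   where x is null) as null_count;           -- NULL elements can be inspected via UNNEST
```

By contrast, select [1, null] on its own fails because the array itself is part of the query result.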

So, for your example, you can use the following "trick":

with test as (
   select 1 as A, 2 as B union all
   select 3, null
)
select *, 
  array(select cast(el as int64) el
    from unnest(split(translate(format('%t', t), '()', ''), ', ')) el
    where el != 'NULL'
  ) as grouped_columns
from test t  

For the sample data, the query above returns grouped_columns = [1, 2] for the first row and [3] for the second.

Note: the above approach does not require explicitly referencing each of the involved columns!


1 Comment

Thanks! I had no idea you could pass an entire row to FORMAT(); that's very cool. I often work with REPLACE() or REGEXP_REPLACE(), but I'm only now discovering TRANSLATE(), which is a little mad and very useful. This solves quite a bit of the problem. However, if there's an index on each row, I couldn't avoid feeding it to FORMAT(), right? I'd have to pop it out after the array is constructed?
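One possible way to keep an index column out of the formatted row is to build a struct without it before calling FORMAT(). This is only a sketch: the id column and its values are hypothetical and do not appear in the original question.

```sql
with test as (
  select 100 as id, 1 as A, 2 as B   -- `id` is a hypothetical index column
  union all
  select 101, 3, null
)
select
  t.*,
  array(
    select cast(el as int64)
    from unnest(split(translate(
      -- format only the non-index columns by excluding `id` from the struct
      format('%t', (select as struct t.* except (id))),
      '()', ''), ', ')) as el
    where el != 'NULL'
  ) as grouped_columns
from test t;
```

With this variant the index never enters the string, so nothing has to be removed from the array afterwards.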
0

My current solution, and I'm not a fan of it, is to use a combination of IFNULL(), UNNEST() and ARRAY() like so:

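select
       *,
       array(
           -- Replace NULLs with a sentinel, then filter the sentinel back out.
           -- Assumes A and B are STRING columns: IFNULL needs both arguments to
           -- share a type, so INT64 columns need a numeric sentinel or a CAST.
           select el
           from unnest(
               [
                   ifnull(A, ''),
                   ifnull(B, '')
               ]
           ) as el  -- `grouping` is a reserved keyword in BigQuery, so use another alias
           where el <> ''
       ) as grouped_columns
from test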
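Since arrays containing NULL are allowed inside a query, the sentinel value may not be needed at all. The following is a sketch of that variation against the test data from the question. The aliases x and pos are made up for the illustration, and the ORDER BY on the offset is there to keep the original column order explicit:

```sql
with test as (
  select 1 as A, 2 as B
  union all
  select 3, null
)
select
  *,
  array(
    select x
    from unnest([A, B]) as x with offset as pos  -- [A, B] may contain NULL here
    where x is not null                           -- drop NULLs before the array is returned
    order by pos                                  -- preserve the original column order
  ) as grouped_columns
from test;
```

This still requires listing every column, but it works for any element type and cannot collide with a real value the way a sentinel can.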

1 Comment

Still, I can't quite understand why there's no way to have a NULL in an array when the documentation says there can be. The documentation also has clear examples of arrays with NULL, such as SELECT ["coffee", NULL, "milk"] as list UNION ALL SELECT ["cake", "pie"]
0

Alternatively, you can replace NULL values with a non-NULL value using the IFNULL() function, for example IFNULL(null, 0), as shown below:

with test as (
   select 1 as A, 2 as B
   union all
   select 3, IFNULL(null, 0)
)

select *,
       [A,B] as grouped_columns
from test
