I have a table that contains a string column containing a stringified list of JSON objects like so:

'[{"a": 5, "b": 6}, {"a": 7, "b": 8}]'

I would like to unnest this array, and then use json_extract() or json_extract_scalar() to get the values out of these objects.

It's unclear from BigQuery's JSON function documentation whether I can do this with built-in functionality.

Is a UDF required to accomplish this, or does this functionality exist in BigQuery?

The below UDF accomplishes what I'm looking for:

CREATE TEMP FUNCTION
  JSON_EXTRACT_ARRAY(input STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS """  
return JSON.parse(input).map(x => JSON.stringify(x));
""";

with

raw as (
  select
    1 as id,
    '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
)

select
  id,
  json_extract(entry, '$.a') as a,
  json_extract(entry, '$.b') as b
from
  raw,
  unnest(json_extract_array(body)) as entry
  • Looks like you have a solution. What is the question here? Do you want a solution that does not involve a UDF, or something else? Please clarify, as whatever you highlighted/bolded is still not clear to me. Commented Jul 19, 2019 at 18:32
  • Please vote for issuetracker.google.com/issues/63716683 Commented Jul 19, 2019 at 19:49
  • Ah sorry, I thought my question was pretty clear. To rephrase: "Is a UDF required to accomplish this, or does BigQuery Standard SQL support unnesting stringified JSON arrays without the need for a UDF?" Commented Jul 19, 2019 at 21:12
  • It's unclear to me why this question was downvoted without a suggestion as to what could make it better. To anyone stumbling on this, it is still a legitimate open question, though Felipe Hoffa's comment does contain a link tracking this missing feature if anyone would like to add a vote to it. Commented Jul 26, 2019 at 17:18

2 Answers

Try something like this:


with

raw as (
    select
        1 as id,
        '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
)

select
    r.id,
    r.body,
    regexp_extract_all(r.body, r'({.*?})'),
    json_extract(entry, '$.a') as a,
    json_extract(entry, '$.b') as b
from
    raw as r
    cross join  unnest(
                    regexp_extract_all(r.body, r'({.*?})')
                ) as entry

Or, for a slightly more general solution that also handles objects with nested values:

with

raw as (
    select
        1 as id,
        '[{"a": 5, "b": {"x": 1, "y": 2}}, {"b": {"c": 5, "d": 8}, "a": 7}]' as body
)

select
    r.id,
    r.body,
    split(trim(r.body, '[]{}'), '}, {'),
    json_extract(concat('{', entry, '}'), '$.a') as a,
    json_extract(concat('{', entry, '}'), '$.b') as b
from
    raw as r
    cross join  unnest(
                    split(trim(r.body, '[]{}'), '}, {')
                ) as entry

1 Comment

This is nifty, and the example code works as expected to solve the problem without recourse to a UDF. I'll kick the tires with this - thanks!

Google has since added the function JSON_EXTRACT_ARRAY to BigQuery Standard SQL, so this can now be accomplished without a UDF. In fact, because the UDF in the question has the same name (JSON_EXTRACT_ARRAY), you can drop the CREATE TEMP FUNCTION statement and run the query that follows it as is.
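As a minimal sketch of the UDF-free version, here is the question's query run against the built-in JSON_EXTRACT_ARRAY. The use of json_extract_scalar and safe_cast is my own assumption about the desired output types; the original query used json_extract, which returns the values as JSON-formatted strings.

```sql
-- No CREATE TEMP FUNCTION statement: JSON_EXTRACT_ARRAY is built in.
with

raw as (
  select
    1 as id,
    '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
)

select
  id,
  -- json_extract_scalar returns a STRING; safe_cast yields NULL
  -- instead of an error if a value is not a valid integer.
  safe_cast(json_extract_scalar(entry, '$.a') as int64) as a,
  safe_cast(json_extract_scalar(entry, '$.b') as int64) as b
from
  raw,
  -- Each array element comes back as a JSON-formatted string.
  unnest(json_extract_array(body)) as entry
```

This produces one output row per object in the array, each carrying the id of its source row.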

If performance matters, rather than completely denormalizing the table, you can also take advantage of BigQuery's nesting capabilities by extracting the body data into a repeated record.

with 
    raw as (
        select
            1 as id,
            '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
    )

select
    r.id,
    array(
        select
            struct (
                json_value(items, '$.a') as a,
                json_value(items, '$.b') as b
            ) as b 
        from unnest(json_extract_array(body, '$')) as items
    ) as body_record_repeated
from raw r

which will return a single row whose body_record_repeated column holds one record per array element:

[Screenshot: BigQuery repeated record result]
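If the flat shape is needed later, the repeated record can be expanded again with UNNEST. The following is only a sketch: the CTE name nested and the alias item are hypothetical names introduced for illustration, and it uses select as struct in place of the explicit struct(...) call above.

```sql
with
    raw as (
        select
            1 as id,
            '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
    ),
    -- Same nesting step as above, wrapped in a CTE (hypothetical name).
    nested as (
        select
            r.id,
            array(
                select as struct
                    json_value(items, '$.a') as a,
                    json_value(items, '$.b') as b
                from unnest(json_extract_array(r.body, '$')) as items
            ) as body_record_repeated
        from raw r
    )

-- Flatten on demand: one output row per element of the repeated record.
select
    n.id,
    item.a,
    item.b
from nested n, unnest(n.body_record_repeated) as item
```

Keeping the data nested and flattening only in the queries that need it avoids repeating the parent columns for every array element in storage.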
