
From the BigQuery import documentation:

Note: Null values are not allowed

So I assume null is not allowed in JSON-formatted data for BigQuery import. However, null values are actually very common in regular ETL tasks (due to missing data). What would be a good way to import such JSON source files? Note that my data contains nested structures, so I would rather not convert to CSV and use ,, to represent a null value.

One workaround I can think of is to replace all null values with default values for the respective data types, e.g. (sketched below):

  • string: null -> empty string
  • integer: null -> -1
  • float: null -> -1.0
  • ...
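A rough sketch of that substitution, assuming flat records and a known field-to-type mapping (the field names and mapping here are hypothetical):

# Hypothetical per-type default substitution, as described above.
FIELD_TYPES = {"fullName": "string", "age": "integer", "score": "float"}
DEFAULTS = {"string": "", "integer": -1, "float": -1.0}

def substitute_defaults(record):
    """Replace None values in a flat record with a per-type sentinel default."""
    return {
        key: DEFAULTS[FIELD_TYPES[key]] if value is None else value
        for key, value in record.items()
    }

# substitute_defaults({"fullName": None, "age": None, "score": 1.5})
# -> {"fullName": "", "age": -1, "score": 1.5}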

But I don't like it. I am looking for better options.

BTW, I tried bq load with a JSON file containing null values and got the error below:

Failure details:
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
...

I think these errors are caused by the null values; is that correct?

EDIT: If I remove all the null fields, it seems to work. I guess this is the way to handle null data: you cannot have null for a data field, but you can simply not include the field. So I need filtering code to remove all the null fields from my raw JSON.
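For what it's worth, here is a minimal sketch of such a filter, assuming the source is newline-delimited JSON with nested records (the file names are placeholders):

import json

def drop_nulls(value):
    # Recursively drop null-valued keys from nested objects; nulls inside
    # arrays are kept, since removing them would shift element positions.
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value

# Rewrite one record per line, so the whole file never has to fit in memory.
with open("raw.json") as src, open("clean.json", "w") as dst:
    for line in src:
        dst.write(json.dumps(drop_nulls(json.loads(line))) + "\n")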

  • NULL is allowed in JSON syntax. Different JSON packages use different software constructs to represent NULL -- either an explicit NULL object, or something like an empty array. But the messages you quote tell us very little. Commented Nov 7, 2012 at 2:18
  • But note that JSON is only a data format -- it does not describe semantics, and the semantics of the data must be agreed to by both ends of the "conversation". If NULL is not in the agreed-to semantics then JSON has nothing to do with it. The "BigQuery" document defines some rather restricted semantics. Commented Nov 7, 2012 at 2:22
  • Yeah, this might be a restriction of BigQuery import. I just want to know if there is any smart way to avoid the limitation. Commented Nov 7, 2012 at 2:27
  • You can maybe use something (eg, an empty array) as a "stand-in". I don't really know what BigQuery is doing or what you're doing with it, though -- you have to look at your use of it to see what tricks you can play. Commented Nov 7, 2012 at 2:35
  • Note that, in JSON, there's no requirement that a particular data item be of a specific type. E.g., "phone_number" can be a character string one time, an integer the next, and an array (or even an "object") the third time. So to represent a "null" integer, you do not have to use an integer value. Commented Nov 7, 2012 at 2:37

1 Answer


You can import NULL values using JSON-format source files: simply omit the key:value pair for values that are NULL.

Example - Let's say you have a schema like this:

[
  {
    "name": "kind",
    "type": "string"
  },
  {
    "name": "fullName",
    "type": "string"
  },
  {
    "name": "age",
    "type": "integer",
    "mode": "nullable"
  }
]

A record with no NULL values might look like this:

{"kind": "person",
 "fullName": "Some Person",
 "age": 22
}

However, when "age" is NULL, try this (note that there is no "age" key):

{"kind": "person",
 "fullName": "Some Person",
}

Please let us know if you have issues with this. I'll make a note to improve the documentation around using NULL values with JSON import formats.
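For completeness, loading the resulting newline-delimited file would then look something like this (dataset, table, and file names are placeholders; the schema file is a JSON array of field definitions like the one above):

bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.people people.json schema.json

Any record that simply omits the nullable "age" key will load with age set to NULL.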



  • Thanks. This confirmed that I need to omit null (key, value) pairs.
  • This doesn't help if pre-processing your upload involves many gigabytes of data, for example when loading log files of click data from a web server.
