From the BigQuery import documentation:
Note: Null values are not allowed
So I assume null values are not allowed in JSON-formatted data for BigQuery import. However, null values are actually very common in regular ETL tasks (due to missing data). What would be a good way to import such JSON source files? Note that my data contains nested structures, so I would rather not convert to CSV and use ,, to represent a null value.
One way I can think of is to replace all null values with type-appropriate default values (see the sketch after this list), e.g.:
- string: null -> empty string
- integer: null -> -1
- float: null -> -1.0
- ...
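For illustration, here is a minimal sketch of that default-substitution approach in Python. The DEFAULTS mapping, the fill_defaults helper, and the schema format are my own assumptions, not anything BigQuery provides; without a schema, a bare JSON null carries no type information, so you would need the target table's schema to pick the right default:

```python
# Assumed defaults per BigQuery type; adjust to taste.
DEFAULTS = {"STRING": "", "INTEGER": -1, "FLOAT": -1.0}

def fill_defaults(record, schema):
    """Recursively replace None values in `record`, where `schema` maps
    field name -> type name, or -> a nested dict for RECORD fields."""
    out = {}
    for field, field_type in schema.items():
        value = record.get(field)
        if isinstance(field_type, dict):  # nested RECORD: recurse
            out[field] = fill_defaults(value or {}, field_type)
        elif value is None:
            out[field] = DEFAULTS.get(field_type)
        else:
            out[field] = value
    return out
```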
But I don't like this approach; I am looking for better options.
BTW, I tried to bq load a JSON file containing null values and got the error below:
Failure details:
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n
...
I think this indicates the use of null values; is that correct?
EDIT: If I remove all the null fields, it seems to work. I guess this is the way to handle null data: you cannot have null for a field, but you can simply omit the field altogether. So I need some filtering code to remove all the null fields from my raw JSON (see the sketch below).
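A recursive filter along these lines should do it (a sketch only; the file names are placeholders, and it assumes the input is newline-delimited JSON, which is what bq load expects for JSON imports):

```python
import json

def strip_nulls(value):
    """Recursively drop null-valued fields from dicts and null elements
    from lists, so the output JSON simply omits missing fields."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value if v is not None]
    return value

# Rewrite a newline-delimited JSON file before loading it
# ("raw.json" / "clean.json" are placeholder names).
with open("raw.json") as src, open("clean.json", "w") as dst:
    for line in src:
        dst.write(json.dumps(strip_nulls(json.loads(line))) + "\n")
```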