I have the following dataframe schema:
root
 |-- firstname: string (nullable = true)
 |-- lastname: string (nullable = true)
 |-- cities: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- postcode: string (nullable = true)
And my dataframe looks like this:
+---------+--------+-----------------------------------+
|firstname|lastname|cities                             |
+---------+--------+-----------------------------------+
|John     |Doe     |[[New York,A000000], [Warsaw,null]]|
|John     |Smith   |[[Berlin,null]]                    |
|John     |null    |[[Paris,null]]                     |
+---------+--------+-----------------------------------+
I want to replace all null values with the string "unknown". When I use the na.fill function, I get the following dataframe:
df.na.fill("unknown").show()
+---------+--------+-----------------------------------+
|firstname|lastname|cities                             |
+---------+--------+-----------------------------------+
|John     |Doe     |[[New York,A000000], [Warsaw,null]]|
|John     |Smith   |[[Berlin,null]]                    |
|John     |unknown |[[Paris,null]]                     |
+---------+--------+-----------------------------------+
How can I replace all of the null values in the dataframe, including those inside nested arrays?