I want to create a DataFrame from a complex JSON string using Spark with Scala. I am on Spark 3.1.2 and Scala 2.12.14.
The source data looks like this:
{
"info": [
{
"done": "time",
"id": 9,
"type": "normal",
"pid": 202020,
"add": {
"fields": true,
"stat": "not sure"
}
},
{
"done": "time",
"id": 14,
"type": "normal",
"pid": 764310,
"add": {
"fields": true,
"stat": "sure"
}
},
{
"done": "time",
"id": 9,
"type": "normal",
"pid": 202020,
"add": {
"note": {
"id": 922,
"score": 0
}
}
}
],
"more": {
"a": "ok",
"b": "fine",
"c": 3
}
}
I have tried the following, but it does not work:
val schema = new StructType().add("info", ArrayType(StringType)).add("more", StringType)
val rdd = ss.sparkContext.parallelize(Seq(Row(data))) // data holds the JSON string above
val df = ss.createDataFrame(rdd, schema)
df.printSchema()
The schema is printed as below:
root
|-- info: array (nullable = true)
| |-- element: string (containsNull = true)
|-- more: string (nullable = true)
print(df.head())
The line above throws:
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.String is not a valid external type for schema of array<string>
As far as I can tell, Row(data) creates a row whose single field is the whole JSON string, while the schema expects the first field (info) to be an array<string>, hence the type mismatch.
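For reference, a minimal sketch of an alternative I have been considering (written here as a self-contained program with an abbreviated, single-line copy of the JSON above, so the variable names `ss` and `data` match my snippet): read the string as a `Dataset[String]` and let `ss.read.json` infer the nested schema instead of declaring it by hand. I am not sure this is the idiomatic approach:

```scala
import org.apache.spark.sql.SparkSession

object JsonInferSketch extends App {
  val ss = SparkSession.builder().master("local[*]").appName("json-infer").getOrCreate()
  import ss.implicits._

  // Abbreviated single-line copy of the JSON above
  val data =
    """{"info":[{"done":"time","id":9,"type":"normal","pid":202020,"add":{"fields":true,"stat":"not sure"}}],"more":{"a":"ok","b":"fine","c":3}}"""

  // Each element of the Dataset[String] is parsed as one JSON record,
  // so the nested schema (array of structs, struct) is inferred automatically
  val df = ss.read.json(Seq(data).toDS())
  df.printSchema()

  ss.stop()
}
```

With schema inference, `info` should come out as an array of structs rather than an array of strings, which seems closer to what I need.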
How can I parse this JSON into a DataFrame with the proper nested schema?