I want to reorganise the following JSON so that the array elements under docs are promoted to the root.
Example input
{
"response":{"docs":
[{
"column1":"dataA",
"column2":"dataB"
},
{
"column1":"dataC",
"column2":"dataD"
}]
}
}
Example PySpark script
from pyspark.sql import SQLContext
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("pyspark")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# multiLine=True is required because each JSON object spans several lines
df = sqlContext.read.json("file:///.../input.json", multiLine=True)

# pull the nested docs array up one level
new = df.select("response.docs")
new.printSchema()
new.write.mode("overwrite").format("json").save("file:///.../output.json")
This script already produces the following schema:
root
|-- docs: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- column1: string (nullable = true)
| | |-- column2: string (nullable = true)
However, the final JSON should look like this:
[
{"column1":"dataA","column2":"dataB"},
{"column1":"dataC","column2":"dataD"}
]
How can this be done using Spark?