
I want to reorganise the following JSON so that the array elements under docs are moved up to the root.

Example input

{
  "response":{"docs":
      [{
        "column1":"dataA",
        "column2":"dataB"
      },  
      {
        "column1":"dataC",
        "column2":"dataD"
      }]
   }
}

Example PySpark script

from pyspark.sql import SQLContext
from pyspark import SparkContext, SparkConf


conf = SparkConf().setAppName("pyspark")
sc = SparkContext(conf=conf)

sqlContext = SQLContext(sc)
df = sqlContext.read.json("file:///.../input.json", multiLine=True)
new = df.select("response.docs")
new.printSchema()
new.write.mode("overwrite").format('json').save("file:///.../output.json")

The script already produces the following schema

root
 |-- docs: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- column1: string (nullable = true)
 |    |    |-- column2: string (nullable = true)

However, the final JSON should look like this

[
 {"column1":"dataA","column2":"dataB"},
 {"column1":"dataC","column2":"dataD"}
]

How can this be done using Spark?

2 Answers


You can explode the response.docs column and then select column1 and column2 from the exploded struct, like this

from pyspark.sql import functions as F

df.select(F.explode('response.docs').alias('col')) \
  .select('col.column1', 'col.column2')

The result will look like this

+-------+-------+
|column1|column2|
+-------+-------+
|  dataA|  dataB|
|  dataC|  dataD|
+-------+-------+



Try using Spark's explode function.

2 Comments

Hi. Thanks for the answer. However, it seems like df.select(explode(df.response.docs)) will also create a new column called "col".
You just need to select the needed columns afterwards.
