Save dataframe as JSON in specific structure in Spark Scala

Question

I have a dataframe df which can be saved as a JSON file with the following structure: {"id":"1234567890","score":123.0,"date":yyyymmdd}

As a first attempt, I am saving it as follows:

df.write.format("json").save("path")

This df needs to be saved as a file in which every line has the following structure: id::1234567890\t{"id":"1234567890","score":123.0,"date":yyyymmdd}

I tried various ways but couldn't get it to work. How can I save it in the desired format?

Spark version: 1.6.0
Scala version: 2.10.6

Why would you need to save it in such a complicated format when you already have a dataframe from which you can always extract the id and the row to build your expected result?
– Anahcolus, Commented Jun 9, 2017 at 5:34

Nils · Accepted Answer · 2017-06-11 07:45:52Z

That is not JSON, so df.write.format("json") will not produce it. You are better off converting the dataframe to an RDD, building each line in that custom format yourself, and saving the result as text.

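// Define the case class at the top level (for example in its own file);
// Spark cannot generate an encoder for an inner class.
final case class LineOfSomething(id: String, score: BigDecimal, date: String)

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import sqlContext.implicits._

df
  .as[LineOfSomething]
  .rdd
  .mapPartitions { lines =>
    // Create one ObjectMapper per partition rather than one per record.
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)
    lines.map { line =>
      val json = mapper.writeValueAsString(line)
      s"id::${line.id}\t$json" // key prefix, tab, then the JSON document
    }
  }
  .saveAsTextFile(output) // output: path of the destination directory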
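
If the .as[LineOfSomething] step is a problem (Datasets were still experimental in Spark 1.6), the line format can be built without any encoder. The following is only a minimal sketch: the object CustomLineFormat, its helper names, and outputPath are hypothetical, and it assumes id and date are strings and score is a double. The JSON is assembled by hand for these three fields only, so it is not a general-purpose serializer.

```scala
// Sketch only: field names (id, score, date) come from the question;
// CustomLineFormat and its helpers are hypothetical names.
object CustomLineFormat {

  // Minimal JSON string escaping: quotes, backslashes and control characters.
  def jsonString(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' =>
        // Other control characters become a 4-digit hex escape.
        sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.append('"').toString
  }

  // Builds the JSON document for one record (assumes score is finite).
  def toJson(id: String, score: Double, date: String): String =
    s"""{"id":${jsonString(id)},"score":$score,"date":${jsonString(date)}}"""

  // Builds the full output line: "id::<id>", a tab, then the JSON document.
  def toLine(id: String, score: Double, date: String): String =
    s"id::$id\t${toJson(id, score, date)}"
}

// Hypothetical Spark 1.6 wiring (not compiled here; assumes df has columns
// id: String, score: Double, date: String and outputPath is a directory path):
//
//   df.rdd
//     .map(row => CustomLineFormat.toLine(
//       row.getAs[String]("id"),
//       row.getAs[Double]("score"),
//       row.getAs[String]("date")))
//     .saveAsTextFile(outputPath)
```

Because this works on the RDD[Row] returned by df.rdd, no case class or implicit encoder is involved. If the column types differ (for example a decimal score or a numeric date), the getAs type parameters and the formatting would need to change accordingly.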

edited Jun 11, 2017 at 7:45

answered Jun 9, 2017 at 8:06


5 Comments

qubiter Over a year ago

I am getting the following error, maybe because of the Spark version I am using (1.6.0): Exception in thread "main" org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class com.company.class.RowMapper$LineOfSomething without access to the scope that this class was defined in. Try moving this class out of its parent class.; sql.catalyst.encoders.ExpressionEncoder$$anonfun$2.applyOrElse(ExpressionEncoder.scala:264) at

Nils Over a year ago

Are you running it inside the main "object"? If not, move the case class into its own file.

qubiter Over a year ago

RowMapper is a separate singleton with other mappers and case classes. I added this new case class to RowMapper, but I still keep getting the same error. It also looks like the issue is with Datasets under the hood. I don't think 1.6.0 supports Datasets?

qubiter Over a year ago

Instead of this structure in a text file: id::1234567890\t{"id":"1234567890","score":123.0,"date":yyyymmdd}, can we get this structure as a JSON file: {"id"::1234567890, {"id":"1234567890","score":123.0,"date":yyyymmdd} }? I am still getting the "Unable to generate an encoder" exception.

Nils Over a year ago

Are you sure you have imported the implicits and tried moving the case class to its own file? Typically "Unable to generate an encoder" means the case class is an inner class that Spark cannot access.
