
I have a DataFrame df that can be saved as a JSON file with the following structure: {"id":"1234567890","score":123.0,"date":yyyymmdd}

Currently I am saving it as follows:

df.write.format("json").save("path")

Instead, df needs to be saved with each line in the following structure: id::1234567890\t{"id":"1234567890","score":123.0,"date":yyyymmdd}

I tried various ways but couldn't do it. How can we save it in the desired format?

Spark version: 1.6.0
Scala version: 2.10.6
  • Is this JSON even valid? What is this id::1234567890\t prefix? Commented Jun 9, 2017 at 5:29
  • Why would you need such a complicated format when you already have a DataFrame from which you can always extract the id and the row to build your expected result? Commented Jun 9, 2017 at 5:34

1 Answer


That is not JSON. You are better off converting the DataFrame to an RDD and then mapping each record into that custom format.

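import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Keep this case class at top level (e.g. in its own file); as an inner class
// it triggers the "Unable to generate an encoder" error.
final case class LineOfSomething(id: String, score: BigDecimal, date: String)

import sqlContext.implicits._

df
  .as[LineOfSomething]
  .rdd
  .mapPartitions { lines =>
    // Create the mapper once per partition rather than once per record
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)
    lines.map { line =>
      val json = mapper.writeValueAsString(line)
      // Prefix each JSON record with "id::<id>" and a tab
      s"id::${line.id}\t$json"
    }
  }
  .saveAsTextFile(output) // output: destination path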
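
The per-record formatting step can also be sketched in plain Scala, without Spark or Jackson, to make the target line layout concrete. This is only a minimal illustration under assumptions: the object LineFormat and its methods are hypothetical names, score is assumed to be a Double, date is assumed to be a string (so it is quoted in the output), and the hand-rolled escaping covers only backslashes and double quotes.

```scala
object LineFormat {
  // Escape backslashes and double quotes so a value is safe inside a JSON string.
  // Control characters are NOT handled; a real JSON library should be used for that.
  def escape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Hand-built JSON for the three fields from the question.
  // Assumption: date is a string and is therefore quoted.
  def toJson(id: String, score: Double, date: String): String =
    "{\"id\":\"" + escape(id) + "\",\"score\":" + score +
      ",\"date\":\"" + escape(date) + "\"}"

  // Final line layout: id::<id><TAB><json>
  def formatLine(id: String, score: Double, date: String): String =
    "id::" + id + "\t" + toJson(id, score, date)
}
```

A function like this could presumably be applied to the rows of df.rdd (an RDD of Row objects), for example by reading each field with row.getAs, and the result written with saveAsTextFile. Working on Row objects does not require a Dataset encoder, but the column names and types would have to match the actual schema.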

5 Comments

I am getting the following error, possibly because of the Spark version I am using (1.6.0): Exception in thread "main" org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class com.company.class.RowMapper$LineOfSomething without access to the scope that this class was defined in. Try moving this class out of its parent class.; sql.catalyst.encoders.ExpressionEncoder$$anonfun$2.applyOrElse(ExpressionEncoder.scala:264) at
Are you running it inside the main "object"? If not, move the case class into its own file.
RowMapper is a separate singleton that holds other mappers and case classes. I added this new case class to RowMapper, but I still keep getting the same error. Also, it looks like the issue is with Datasets under the hood. I don't think 1.6.0 supports Datasets?
Instead of this structure in a text file: id::1234567890\t{"id":"1234567890","score":123.0,"date":yyyymmdd}, can we get it in this structure as a JSON file instead: {"id"::1234567890, {"id":"1234567890","score":123.0,"date":yyyymmdd} }? I am still getting the "Unable to generate an encoder" exception.
Are you sure you have imported the implicits and tried moving the case class to its own file? Typically "Unable to generate an encoder" occurs because the case class is an inner class that Spark cannot access.
