I have a data file that looks like this:

// data.txt
1    2016-01-01
2    \N
3    2016-03-01

I use \N to represent a null value. (It is not a special character; it is a string consisting of two characters: \ and N.)

I want to create a DataFrame like this:

case class Data(
    val id   : Int, 
    val date : java.time.LocalDate)

val df = sc.textFile("data.txt")
           .map(_.split("\t"))
           .map(p => Data(
               p(0).toInt, 
               _helper(p(1))
           ))
           .toDF()

My question is: how can I write the helper method?

def _helper(s : String) = s match {
    case "\\N" => null,                // type error             
    case _     => LocalDate.parse(s, dateFormat) 
}

1 Answer

This is where an Option type will come in handy.

I changed the custom null marker to make the example more explicit, but the same approach should work with your \N marker. My data is in a .txt file like so:

Ryan,11
Bob,22
Kevin,23
Asop,-nnn-

Notice that -nnn- is my custom null marker. I use a slightly different case class:

case class DataSet(name: String, age: Option[Int])

Then I write a pattern-matching function that maps the marker to None and everything else to Some:

def customNull(col: String): Option[Int] = col match {
    case "-nnn-" => None
    case _       => Some(Integer.parseInt(col))
}
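
Applied to the question's \N marker and date column, the same pattern might look like the sketch below. The object name NullableDate and the ISO formatter are assumptions for illustration; the question does not show how dateFormat is defined.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object NullableDate {
  // Assumption: dates are ISO formatted (yyyy-MM-dd), as in the sample file.
  val dateFormat: DateTimeFormatter = DateTimeFormatter.ISO_LOCAL_DATE

  // "\\N" is the two-character string backslash + N used as the null marker.
  def parseDate(s: String): Option[LocalDate] = s match {
    case "\\N" => None
    case _     => Some(LocalDate.parse(s, dateFormat))
  }
}
```

The case class field would then be declared as date: Option[LocalDate]. Whether toDF() accepts java.time.LocalDate depends on the Spark version; if it does not, converting with java.sql.Date.valueOf inside the Some is a possible fallback.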

From here it should work as expected when you combine the two:

val df = sc.textFile("./data.txt")
           .map(_.split(","))
           .map(p => DataSet(p(0), customNull(p(1))))
           .toDF()

When I call df.show(), I get the following:

+-----+----+
| name| age|
+-----+----+
| Ryan|  11|
|  Bob|  22|
|Kevin|  23|
| Asop|null|
+-----+----+

Reading each age as a string and converting it to an Option[Int] gets around the problem. Parsing every value this way might not be the fastest approach. You could also use an Either, which can carry an error for malformed input, but that can get complex.
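
For what it is worth, here is a rough sketch of the Either variant, assuming you want malformed values reported instead of throwing. The names AgeParsing and parseAge are hypothetical and not part of any library.

```scala
import scala.util.Try

object AgeParsing {
  // Left describes the bad input; Right holds the parsed value,
  // with None standing for the custom null marker "-nnn-".
  def parseAge(col: String): Either[String, Option[Int]] = col match {
    case "-nnn-" => Right(None)
    case _ =>
      Try(col.trim.toInt).toOption match {
        case Some(n) => Right(Some(n))
        case None    => Left(s"not an integer: '$col'")
      }
  }
}
```

The Either would need to be resolved (for example by logging or filtering out the Left cases) before building the case class, since this sketch does not assume Spark can store an Either in a column.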
