
I am trying to convert a column that contains Array[String] to a String, but I consistently get this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 78.0 failed 4 times, most recent failure: Lost task 0.3 in stage 78.0 (TID 1691, ip-******): java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Ljava.lang.String; 

Here is the code:

val mkString = udf((arrayCol:Array[String])=>arrayCol.mkString(","))  
val dfWithString=df.select($"arrayCol").withColumn("arrayString",
      mkString($"arrayCol"))  

2 Answers

WrappedArray is not an Array (an Array is a plain old Java array, not a native Scala collection). You can either change the signature to:

import scala.collection.mutable.WrappedArray

(arrayCol: WrappedArray[String]) => arrayCol.mkString(",")

or use one of the supertypes like Seq:

(arrayCol: Seq[String]) => arrayCol.mkString(",")
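To see why the Seq signature works where Array[String] fails, here is a minimal sketch that runs without Spark. The object and value names are hypothetical, and `Array(...).toSeq` is only a stand-in for the wrapped value Spark passes to a UDF for an array column (a WrappedArray on Scala 2.12 and earlier).

```scala
import scala.util.Try

object WrappedArrayDemo {
  // Stand-in for what a UDF receives for an array column:
  // a Seq wrapper around the data, not a raw Java array.
  val fromSpark: Any = Array("a", "b", "c").toSeq

  // Casting the wrapper to Array[String] fails at runtime with a
  // ClassCastException, which is the failure reported in the question.
  val castToArray: Try[Int] = Try(fromSpark.asInstanceOf[Array[String]].length)

  // Treating the same value as a Seq[String] works, so mkString succeeds.
  val joined: String = fromSpark.asInstanceOf[Seq[String]].mkString(",")
}
```

Declaring the UDF parameter as Seq[String] avoids the cast entirely and does not depend on the concrete wrapper class.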

In recent Spark versions you can use the built-in concat_ws function instead, which avoids the UDF altogether:

import org.apache.spark.sql.functions.concat_ws

df.select(concat_ws(",", $"arrayCol"))
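As a rough end-to-end sketch, the UDF with a Seq[String] parameter and concat_ws can be compared side by side. This assumes Spark 2.x or later on the classpath and a local session; the object name, the `joinedValues` helper, and the sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat_ws, udf}

object ArrayColumnToString {
  // Returns the joined strings produced by the UDF and by concat_ws.
  def joinedValues(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._

    // Hypothetical sample data with a single array<string> column.
    val df = Seq(Seq("a", "b", "c"), Seq("d")).toDF("arrayCol")

    // UDF variant: the parameter is typed as Seq[String], not Array[String].
    val mkString = udf((arrayCol: Seq[String]) => arrayCol.mkString(","))
    val viaUdf = df.select(mkString($"arrayCol")).as[String].collect().toSeq

    // Built-in variant: concat_ws accepts an array<string> column directly.
    val viaBuiltin = df.select(concat_ws(",", $"arrayCol")).as[String].collect().toSeq

    (viaUdf, viaBuiltin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-column-to-string")
      .getOrCreate()
    try println(joinedValues(spark)) finally spark.stop()
  }
}
```

Where both are available, the built-in function is usually the better choice, since Spark can optimize it and no data has to pass through user code.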

1 Comment

I tried to use WrappedArray, but that type wasn't recognized. Seq works fine.

This code works for me:

df.select("wifi_ids").rdd.map(row =>row.get(0).asInstanceOf[WrappedArray[WrappedArray[String]]].toSeq.map(x=>x.toSeq.apply(0)))

In your case, I guess it would be:

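import scala.collection.mutable.WrappedArray
// The parameter needs an explicit type so that udf can infer the input schema
val mkString = udf((arrayCol: WrappedArray[String]) => arrayCol.toArray.mkString(","))
val dfWithString = df.select($"arrayCol").withColumn("arrayString", mkString($"arrayCol"))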
