How to convert WrappedArray to string in spark?

Question

I have a JSON file that contains a nested array, as shown below:

|    |    |-- coordinates: array (nullable = true)
|    |    |    |-- element: array (containsNull = true)
|    |    |    |    |-- element: array (containsNull = true)
|    |    |    |    |    |-- element: array (containsNull = true)
|    |    |    |    |    |    |-- element: long (containsNull = true)

I used Spark to read the JSON and exploded the array:

explode(col("list_of_features.geometry.coordinates"))

which returns values like this:

WrappedArray(WrappedArray(WrappedArray(1271700, 6404100), WrappedArray(1271700, 6404200), WrappedArray(1271600, 6404200), WrappedArray(1271600, 6404300),....

But the original input does not contain WrappedArray. It looks something like this:

[[[[1271700,6404100],[1271700, 6404200],[1271600, 6404200]

The ultimate aim is to store the coordinates without WrappedArray (maybe as a String) in a CSV file for Hive to read.

After the explode, is there any way to get just the coordinates enclosed in proper square brackets?

Or can I use replace to remove the WrappedArray string value in the RDD?

I don't know WrappedArray, but you should be able to write a recursive function that returns what you need. Maybe there is a cleaner option, though.
- Mafii, commented Apr 12, 2018 at 9:11

koiralo · Accepted Answer · 2018-04-12 16:23:45Z

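You can use a UDF to flatten the WrappedArray and turn it into a String value:

import org.apache.spark.sql.functions.udf

// UDF: flatten all four array levels and join the numbers with commas
val concatArray = udf((value: Seq[Seq[Seq[Seq[Long]]]]) => {
  value.flatten.flatten.flatten.mkString(",")
})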

Now use the UDF to create/replace the column:

// the $"..." column syntax needs import spark.implicits._
df1.withColumn("coordinates", concatArray($"coordinates"))

This should give you a comma-separated string in place of the WrappedArray.
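To see what the flattening does without running Spark, here is a minimal plain-Scala sketch on a small made-up sample. The object name FlattenDemo and the sample values are assumptions for illustration only. Note that the nesting levels are gone in the result:

```scala
// Hypothetical demo object (not part of the original answer).
object FlattenDemo {
  // A small four-level sample shaped like the coordinates column.
  val sample: Seq[Seq[Seq[Seq[Long]]]] =
    Seq(Seq(Seq(Seq(1271700L, 6404100L), Seq(1271700L, 6404200L))))

  // Same body as the UDF: three flattens remove the nesting, then join with commas.
  val flat: String = sample.flatten.flatten.flatten.mkString(",")
}
```

Here FlattenDemo.flat is "1271700,6404100,1271700,6404200", so this variant only fits if the array levels do not need to be preserved.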

UPDATE: If you want the same format as a string with brackets, you can do this:

val concatArray = udf((value: Seq[Seq[Seq[Seq[Long]]]]) => {
  // render each level with brackets, separating sibling elements with commas
  value.map(_.map(_.map(_.mkString("[", ",", "]")).mkString("[", ",", "]")).mkString("[", ",", "]")).mkString("[", ",", "]")
})

Output:

[[[[1271700,6404100],[1271700,6404200],[1271600,6404200]]]]
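If you would rather not hard-code the number of levels, a recursive renderer along the lines suggested in the comments is another option. This is only a sketch: NestedRender and render are hypothetical names, and it assumes that each array level arrives as a Scala Seq and that null elements should be written as the literal text null.

```scala
// Hypothetical helper (not from the original post): renders arbitrarily
// nested sequences as a bracketed string, keeping every nesting level.
object NestedRender {
  def render(value: Any): String = value match {
    // the schema says containsNull = true, so guard against null elements
    case null => "null"
    // a nested level: render each element recursively and wrap in brackets
    case seq: scala.collection.Seq[_] => seq.map(render).mkString("[", ",", "]")
    // a leaf value such as a Long
    case other => other.toString
  }
}
```

To use it from Spark you could presumably call it inside a typed UDF, for example udf((value: Seq[Seq[Seq[Seq[Long]]]]) => NestedRender.render(value)), so that Spark still knows the input type of the column.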

Hope this helps!

edited Apr 12, 2018 at 16:23

answered Apr 12, 2018 at 11:07

7 Comments

William R Over a year ago

It flattens the entire coordinates array, but I want to preserve the array levels with square brackets. These coordinates are later used to draw a polygon on a map, so the levels need to be maintained.

William R Over a year ago

With two levels of flatten it will never preserve the levels as I described in my post: [[[[1271700,6404100],[1271700, 6404200],[1271600, 6404200]][132122,24433]]]

koiralo Over a year ago

I guess that, per the schema you provided, two levels of flatten should work.

koiralo Over a year ago

Or do you want something like [[[[1271700,6404100],[1271700, 6404200],[1271600, 6404200]][132122,24433]]] as a string, i.e. with brackets?

William R Over a year ago

Yes, you are right. To be more specific, I want to store this data as a string in Hive instead of as complex data types, but I would like to keep the [ brackets as they are in the original data.
