
I have a JSON file that contains a nested array, as shown below:

|    |    |-- coordinates: array (nullable = true)
|    |    |    |-- element: array (containsNull = true)
|    |    |    |    |-- element: array (containsNull = true)
|    |    |    |    |    |-- element: array (containsNull = true)
|    |    |    |    |    |    |-- element: long (containsNull = true)

I used Spark to read the JSON and exploded the array:

explode(col("list_of_features.geometry.coordinates"))

which returns values like this:

WrappedArray(WrappedArray(WrappedArray(1271700, 6404100), WrappedArray(1271700, 6404200), WrappedArray(1271600, 6404200), WrappedArray(1271600, 6404300),....

But the original input does not contain WrappedArray; it looks something like this:

[[[[1271700,6404100],[1271700, 6404200],[1271600, 6404200]

The ultimate aim is to store the coordinates without WrappedArray (maybe as a String) in a CSV file for Hive to read.

After the explode, is there any way to get just the coordinates enclosed in proper square brackets?

Or can I use replace to replace the WrappedArray string value in the RDD?

  • I don't know WrappedArray, but you should be able to write a recursive function that returns what you need. Maybe there is a cleaner option, though. Commented Apr 12, 2018 at 9:11

1 Answer


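You can use a UDF to flatten the WrappedArray and turn it into a String value:

import org.apache.spark.sql.functions.udf

// UDF: flatten all four array levels into one comma-separated string
val concatArray = udf((value: Seq[Seq[Seq[Seq[Long]]]]) => {
  value.flatten.flatten.flatten.mkString(",")
})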

Now use the UDF to create or replace the column:

df1.withColumn("coordinates", concatArray($"coordinates"))

This should give you a comma-separated string in place of the WrappedArray.
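To see what this UDF body does to the nesting, here is a minimal sketch in plain Scala, outside Spark. The function name `flattenCoordinates` and the sample values are illustrative assumptions, not part of the original code.

```scala
// Hypothetical helper with the same body as the UDF, so it can be tried without Spark.
def flattenCoordinates(value: Seq[Seq[Seq[Seq[Long]]]]): String =
  // Each flatten removes one nesting level; three calls leave a flat Seq[Long].
  value.flatten.flatten.flatten.mkString(",")

// Sample shaped like the coordinates column: four nested array levels.
val sample: Seq[Seq[Seq[Seq[Long]]]] =
  Seq(Seq(Seq(Seq(1271700L, 6404100L), Seq(1271700L, 6404200L))))

println(flattenCoordinates(sample)) // 1271700,6404100,1271700,6404200
```

Note that the result no longer shows where one coordinate pair ends and the next begins, so this form only suits cases where the nesting does not need to be recovered.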

UPDATE: If you want the same format as a string with brackets, you can do this:

val concatArray = udf((value: Seq[Seq[Seq[Seq[Long]]]]) => {
  value.map(_.map(_.map(_.mkString("[", ",", "]")).mkString("[", ",", "]")).mkString("[", ",", "]")).mkString("[", ",", "]")
})

Output:

[[[[1271700,6404100],[1271700,6404200],[1271600,6404200]]]]
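The bracketed formatting above spells out one `mkString` per nesting level. A comment on the question suggests a recursive function instead; the following is a minimal sketch of that idea in plain Scala. The name `render` is hypothetical, and emitting nulls as the literal text `null` is an assumption prompted by the schema's `containsNull = true`.

```scala
// Hypothetical helper: renders any depth of nested sequences as a
// bracketed, comma-separated string.
def render(value: Any): String = value match {
  // Assumption: null elements are written as the text "null".
  case null                        => "null"
  // scala.collection.Seq covers both mutable and immutable sequences (including WrappedArray).
  case seq: scala.collection.Seq[_] => seq.map(render).mkString("[", ",", "]")
  // Leaf values such as Long are written via toString.
  case other                       => other.toString
}

val sample = Seq(Seq(Seq(Seq(1271700L, 6404100L), Seq(1271700L, 6404200L))))
println(render(sample)) // [[[[1271700,6404100],[1271700,6404200]]]]
```

Inside Spark, this could presumably be wrapped as `udf((value: Seq[Seq[Seq[Seq[Long]]]]) => render(value))`; the UDF's parameter type still has to match the column's schema.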

Hope this helps!


7 Comments

It flattens the entire coordinates, but I want to maintain the array levels with square brackets. These coordinates are later used to draw a polygon on the map, so maintaining the levels is needed.
Due to the two levels of flatten, it will never maintain the levels as I mentioned in my post: [[[[1271700,6404100],[1271700, 6404200],[1271600, 6404200]][132122,24433]]]
I guess, as per the schema you provided, two levels of flatten should work.
Or do you want it like [[[[1271700,6404100],[1271700, 6404200],[1271600, 6404200]][132122,24433]]] as a string, i.e. with brackets?
Yes, you are right. To be more specific, I want to store this data as a string in Hive instead of complex data types, but I would like to keep the [ as in the original data.