I have a DataFrame with the following structure:

| a        | b    |           c                                             |
-----------------------------------------------------------------------------
|01        |ABC   |    {"key1":"valueA","key2":"valueC"}                    |
|02        |ABC   |    {"key1":"valueA","key2":"valueC"}                    |
|11        |DEF   |    {"key1":"valueB","key2":"valueD", "key3":"valueE"}   |
|12        |DEF   |    {"key1":"valueB","key2":"valueD", "key3":"valueE"}   |

I would like to turn it into something like:

| a        | b    |      key         |       value     |
--------------------------------------------------------
|01        |ABC   |    key1          |     valueA      |
|01        |ABC   |    key2          |     valueC      |
|02        |ABC   |    key1          |     valueA      |
|02        |ABC   |    key2          |     valueC      |
|11        |DEF   |    key1          |     valueB      |
|11        |DEF   |    key2          |     valueD      |
|11        |DEF   |    key3          |     valueE      |
|12        |DEF   |    key1          |     valueB      |
|12        |DEF   |    key2          |     valueD      |
|12        |DEF   |    key3          |     valueE      |

This should be done in an efficient way, as the dataset can be quite large.

  • Note: the keys key1...keyXX are consistent only for a given value of column b. Commented Jul 27, 2020 at 13:26
  • I am using Spark 3.0. Commented Jul 27, 2020 at 13:56

1 Answer

Try using the from_json function to parse the JSON string into a map, then explode the map into one row per key/value pair.

Example:

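import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// toDF needs spark.implicits._; spark-shell imports it automatically
val df = Seq(("01", "ABC", """{"key1":"valueA","key2":"valueC"}""")).toDF("a", "b", "c")
// Parse each JSON string as a map of string keys to string values
val schema = MapType(StringType, StringType)
// explode on a map column yields one row per entry, in columns named key and value
df.withColumn("d", from_json(col("c"), schema)).selectExpr("a", "b", "explode(d)").show(10, false)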
//+---+---+----+------+
//|a  |b  |key |value |
//+---+---+----+------+
//|01 |ABC|key1|valueA|
//|01 |ABC|key2|valueC|
//+---+---+----+------+
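
Regarding the comment that the keys are consistent only for a given value of b: because the schema is a MapType and not a fixed struct, the set of keys is allowed to differ from row to row. Below is a minimal sketch on the four sample rows from the question, meant to be pasted into spark-shell or a notebook. The local SparkSession, the app name, and the names jsonMap and result are assumptions made for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{MapType, StringType}

// Local session for illustration only (assumption); in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-json-map")
  .getOrCreate()
import spark.implicits._

// Sample rows from the question: the key set differs between b = ABC and b = DEF.
val df = Seq(
  ("01", "ABC", """{"key1":"valueA","key2":"valueC"}"""),
  ("02", "ABC", """{"key1":"valueA","key2":"valueC"}"""),
  ("11", "DEF", """{"key1":"valueB","key2":"valueD", "key3":"valueE"}"""),
  ("12", "DEF", """{"key1":"valueB","key2":"valueD", "key3":"valueE"}""")
).toDF("a", "b", "c")

// A map schema does not fix the key names, so each row may carry its own keys.
val jsonMap = MapType(StringType, StringType)

// explode on a map column produces two columns, named key and value by default.
val result = df.select(col("a"), col("b"), explode(from_json(col("c"), jsonMap)))

result.show(false)
```

Both from_json and explode are built-in functions, so this should avoid the overhead of a UDF, and exploding within each row does not by itself require a shuffle. One thing to watch: from_json returns null for a string it cannot parse, and explode drops rows whose map is null or empty. If those rows must be kept, explode_outer may be a better fit.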

1 Comment

Brilliant, much appreciated. I was starting to look at building a UDF to parse the JSON string into a map. It sounds like from_json is much more powerful.
