Create Json from Dataframe

Question

I am new to Spark and stuck on the following; any pointers would help. I am trying to create JSON from a DataFrame, but the toJSON function does not give me what I need. My DataFrame looks like this:

+----------+------------+-------------+
|booking_id|status      |count(status)|
+----------+------------+-------------+
|132       |rent count  |6            |
|132       |rent booked |24           |
|132       |rent delayed|6            |
|134       |rent booked |34           |
|134       |rent delayed|21           |
+----------+------------+-------------+

The output I am looking for is a DataFrame with one row per booking_id, where each status and its count are combined into a single JSON object:

+----------+--------------------------------------------------------+
|booking_id|status_json                                             |
+----------+--------------------------------------------------------+
|132       |{ "rent count": 6, "rent booked": 24, "rent delayed": 6}|
|134       |{ "rent booked": 34, "rent delayed": 21}                |
+----------+--------------------------------------------------------+

Thanks in advance.

First create a map column from the status and count(status) columns. Then groupBy, agg(collect_list("yourmapcolumn")), and finally call toJSON.
– C.S.Reddy Gadipally, Commented Jun 15, 2020 at 3:44

murtihash · Accepted Answer · 2020-06-15 03:55:19Z

3

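For Spark 2.4 and later, use map_from_arrays: collect the statuses as keys and the counts as values for each booking_id, then serialize the resulting map with to_json.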

from pyspark.sql import functions as F

# Per booking_id: statuses become the map keys, counts become the map values.
status_map = F.map_from_arrays(F.collect_list("status"), F.collect_list("count(status)"))
df.groupBy("booking_id").agg(F.to_json(status_map).alias("status_json")).show(truncate=False)


#+----------+--------------------------------------------------+
#|booking_id|status_json                                       |
#+----------+--------------------------------------------------+
#|132       |{"rent count":6,"rent booked":24,"rent delayed":6}|
#|134       |{"rent booked":34,"rent delayed":21}              |
#+----------+--------------------------------------------------+
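To make the mechanics of that aggregation easier to follow, here is a minimal plain-Python sketch that models what happens per group. It is not Spark code: the function name status_json_by_booking and the rows list are illustrative assumptions, with two lists standing in for the two collect_list calls, dict(zip(...)) for map_from_arrays, and json.dumps for to_json.

```python
import json
from collections import defaultdict

# Sample rows mirroring the question's DataFrame: (booking_id, status, count).
rows = [
    (132, "rent count", 6),
    (132, "rent booked", 24),
    (132, "rent delayed", 6),
    (134, "rent booked", 34),
    (134, "rent delayed", 21),
]


def status_json_by_booking(rows):
    """Plain-Python model of groupBy + collect_list + map_from_arrays + to_json."""
    keys = defaultdict(list)    # stands in for collect_list("status")
    values = defaultdict(list)  # stands in for collect_list("count(status)")
    for booking_id, status, count in rows:
        keys[booking_id].append(status)
        values[booking_id].append(count)
    # Pair keys[i] with values[i] (the role of map_from_arrays),
    # then serialize each map compactly (the role of to_json).
    return {
        booking_id: json.dumps(
            dict(zip(keys[booking_id], values[booking_id])),
            separators=(",", ":"),
        )
        for booking_id in keys
    }


result = status_json_by_booking(rows)
for booking_id, status_json in result.items():
    print(booking_id, status_json)
```

One difference worth keeping in mind: a Python dict silently keeps the last value for a repeated key, whereas Spark's handling of duplicate map keys depends on the version and configuration, so the sketch only matches when each status appears once per booking_id.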


mvasyliv · Answer · 2020-06-15 05:17:19Z

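  import org.apache.spark.sql.functions.{col, collect_list, map, to_json}
  import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

  val sourceDF = Seq(
    (132, "rent count", 6),
    (132, "rent booked", 24),
    (132, "rent delayed", 6),
    (134, "rent booked", 34),
    (134, "rent delayed", 21)
  ).toDF("booking_id", "status", "count(status)")

  // Build a one-entry map per row, collect the maps per booking_id, and serialize the list.
  val resDF = sourceDF
    .groupBy("booking_id")
    .agg(to_json(collect_list(map(col("status"), col("count(status)")))).alias("status_json"))

  resDF.show(false)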

  //  +----------+--------------------------------------------------------+
  //  |booking_id|status_json                                             |
  //  +----------+--------------------------------------------------------+
  //  |132       |[{"rent count":6},{"rent booked":24},{"rent delayed":6}]|
  //  |134       |[{"rent booked":34},{"rent delayed":21}]                |
  //  +----------+--------------------------------------------------------+
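Note that the output above is a JSON array of single-key objects, not the single object per booking_id that the question asks for. If that array form is what you already have, one option is to merge it after the fact. The following is a minimal plain-Python sketch of such a merge, not Spark code; the helper name merge_single_key_objects is a hypothetical one introduced here for illustration.

```python
import json


def merge_single_key_objects(array_json):
    """Merge a JSON array of objects, e.g. [{"a":1},{"b":2}], into one JSON object."""
    merged = {}
    for entry in json.loads(array_json):
        merged.update(entry)  # a repeated key keeps the value from the later entry
    return json.dumps(merged, separators=(",", ":"))


array_json = '[{"rent count":6},{"rent booked":24},{"rent delayed":6}]'
print(merge_single_key_objects(array_json))
```

Within Spark itself, the map_from_arrays approach from the other answer produces the single object directly, which avoids this extra step.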
