
I created a Spark DataFrame from MongoDB data (in a Databricks Python notebook):

[screenshot: source DataFrame]

I need to convert this DataFrame to the following:

[screenshot: required output]

How can I do this?

  • Could you add the output of .printSchema()? Is this sal column just a string with newlines? Commented Nov 9, 2019 at 5:28
  • You can try pyspark.sql.functions.explode if the sal column entries are arrays. Commented Nov 9, 2019 at 5:33
  • Can you add how you create the DataFrame? Commented Nov 9, 2019 at 5:39
  • @Mahesh Gupta `spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", constring).load()` Commented Nov 9, 2019 at 5:46
  • @chlebek, yes, it's StringType. Commented Nov 9, 2019 at 5:47
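Combining the commenters' hints: since sal is a StringType with embedded newlines, splitting on the newline and then exploding should produce the desired rows. Below is a minimal plain-Python sketch of that split-then-explode transformation (the column names and sample values are assumed for illustration; in PySpark the equivalent would be roughly `F.explode(F.split('sal', '\n'))`):

```python
# Assumed sample data: (id, empno, sal) with sal as a newline-separated string.
rows = [
    (1, 101, "1000\n2000\n1500"),
    (2, 102, "1000\n1500"),
]

# Mimic split + explode: emit one output row per newline-separated value.
exploded = [
    (id_, empno, sal)
    for id_, empno, sal_str in rows
    for sal in sal_str.split("\n")
]

for r in exploded:
    print(r)
```

This is the same reshaping the answer below performs, just expressed without Spark so the mechanics are visible.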

1 Answer


Here is one proposed solution: organize your sal field into arrays (using $concatArrays in MongoDB, for example) before exporting it to Spark, then run something like this:

#df
#+---+-----+------------------+
#| id|empno|               sal|
#+---+-----+------------------+
#|  1|  101|[1000, 2000, 1500]|
#|  2|  102|      [1000, 1500]|
#|  3|  103|      [2000, 3000]|
#+---+-----+------------------+

import pyspark.sql.functions as F

# explode produces one output row per element of the sal array
df_new = df.select('id', 'empno', F.explode('sal').alias('sal'))

#df_new.show()
#+---+-----+----+
#| id|empno| sal|
#+---+-----+----+
#|  1|  101|1000|
#|  1|  101|2000|
#|  1|  101|1500|
#|  2|  102|1000|
#|  2|  102|1500|
#|  3|  103|2000|
#|  3|  103|3000|
#+---+-----+----+
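For the MongoDB side mentioned above, here is a hedged sketch of an aggregation pipeline that builds the sal array with $concatArrays before the data reaches Spark. The field names sal_2018 and sal_2019 are hypothetical, and passing the pipeline via `.option("pipeline", ...)` assumes the MongoDB Spark Connector's aggregation-pipeline read option:

```python
import json

# Hypothetical pipeline: merge two assumed array fields into a single
# `sal` array with $concatArrays. Field names are illustrative only.
pipeline = [
    {"$project": {
        "id": 1,
        "empno": 1,
        "sal": {"$concatArrays": ["$sal_2018", "$sal_2019"]},
    }}
]

# The connector option expects the pipeline serialized as JSON.
pipeline_json = json.dumps(pipeline)
```

If your sal values live in separate documents rather than separate fields, a $group stage with $push would be the more natural way to collect them into one array.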
