I have a DataFrame similar to the following:
new_df = spark.createDataFrame([
([['hello', 'productcode'], ['red','color']], 7),
([['hi', 'productcode'], ['blue', 'color']], 8),
([['hoi', 'productcode'], ['black','color']], 7)
], ["items", "frequency"])
new_df.show(3, False)
# +------------------------------------------------------------+---------+
# |items |frequency|
# +------------------------------------------------------------+---------+
# |[WrappedArray(hello, productcode), WrappedArray(red, color)]|7 |
# |[WrappedArray(hi, productcode), WrappedArray(blue, color)] |8 |
# |[WrappedArray(hoi, productcode), WrappedArray(black, color)]|7 |
# +------------------------------------------------------------+---------+
I need to generate a new DataFrame similar to the following:
# +-----------+-----+---------+
# |productcode|color|frequency|
# +-----------+-----+---------+
# |hello      |red  |7        |
# |hi         |blue |8        |
# |hoi        |black|7        |
# +-----------+-----+---------+
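from pyspark.sql.functions import col
# Each inner array is [value, label], so element 0 of each pair is the value.
new_df.select(
    col("items").getItem(0).getItem(0).alias("productcode"),
    col("items").getItem(1).getItem(0).alias("color"),
    col("frequency"),
).show()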
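The select above depends on position: it assumes the productcode pair is always first in items and the color pair second. To make that assumption visible, here is a minimal sketch in plain Python (not Spark) that flattens the same sample rows by reading the label in each pair instead of its index. The names rows and flatten_row are made up for illustration.

```python
# Plain-Python stand-in for the rows of new_df (same sample data as above).
rows = [
    ([['hello', 'productcode'], ['red', 'color']], 7),
    ([['hi', 'productcode'], ['blue', 'color']], 8),
    ([['hoi', 'productcode'], ['black', 'color']], 7),
]

def flatten_row(items, frequency):
    """Turn [[value, label], ...] pairs into {label: value} plus frequency.

    Hypothetical helper: it keys on the label, so the order of the
    pairs inside items does not matter.
    """
    record = {label: value for value, label in items}
    record['frequency'] = frequency
    return record

flat = [flatten_row(items, freq) for items, freq in rows]
for record in flat:
    print(record)
```

If the pairs always arrive in the same order, as in the sample data, the positional getItem version gives the same values.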