Given a DataFrame with a column that holds an array of structs
Schema:
|-- items: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- name: string (nullable = true)
| | |-- quantity: string (nullable = true)
+-------------------------------+
|items |
+-------------------------------+
|[[A, 1], [B, 1], [C, 2]] |
+-------------------------------+
How do I get a single string like this:
+-------------------------------+
|items |
+-------------------------------+
|A, 1, B, 1, C, 2 |
+-------------------------------+
Tried:
df.withColumn('item_str', concat_ws(" ", col("items"))).select("item_str").show(truncate = False)
Error:
: org.apache.spark.sql.AnalysisException: cannot resolve 'concat_ws(' ', `items`)' due to data type mismatch: argument 2 requires (array<string> or string) type, however, '`items`' is of array<struct<name:string,quantity:string>> type.;;
Also tried pyspark.sql.functions.flatten:
df.withColumn("items_flat", flatten("items")).show(False)
and got the error:
The argument should be an array of arrays, but 'items' is of array<struct<name:string,quantity:string>> type.;;