I have a pyspark.sql.dataframe.DataFrame that looks something like this:
+---------------------+--------------------+--------------------+
|collect_list(results)|              userid|                page|
+---------------------+--------------------+--------------------+
| [[[roundtrip, fal...|13482f06-9185-47f...|1429d15b-91d0-44b...|
+---------------------+--------------------+--------------------+
The collect_list(results) column holds an array of length 2 whose elements are themselves arrays (the first has length 1, the second has length 9).
Is there a way to flatten this array of arrays into a single array of length 10 using pyspark?
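To make the desired result concrete, here is a plain-Python sketch of the transformation I am after. It is only an illustration of the shapes involved, not PySpark code: the names `nested` and `flat` and the placeholder values are made up, and the real elements are themselves arrays like the truncated one shown in the table.

```python
from itertools import chain

# Placeholder values standing in for one row of collect_list(results).
# These are hypothetical; the real elements are arrays, not strings.
nested = [
    ["r0"],                            # first inner array, len = 1
    [f"r{i}" for i in range(1, 10)],   # second inner array, len = 9
]

# Remove exactly one level of nesting: array of arrays -> single array.
flat = list(chain.from_iterable(nested))

print(len(nested), len(flat))
```

Only the outer level of nesting should be removed, so each element of the result stays intact.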
Thanks!
For reference, the DataFrame comes from this query:
query1 = spark.sql(""" select collect_list(results), userid, page from table group by 2,3 """)