
I have a DataFrame df1 with a column col1 that has the structure:

StructField(recipientResource,ArrayType(StructType(List(StructField(resourceId,StringType,true),StructField(type,StringType,true))),true),true)

and another DataFrame df2 whose col1 has the structure:

StructField(recipientResource,StructType(List(StructField(resourceId,StringType,true),StructField(type,StringType,true))),true)

In order to run df1.union(df2), I tried to cast the column in df2 from StructType to ArrayType(StructType), but nothing I tried worked.

Can anyone suggest how to do this? I'm new to PySpark, so any help is appreciated.

  • array<struct<...>> and struct<...> are two completely different types; you cannot cast one into the other. You could add a wrapping array if that's what you mean, e.g. select(array(struct_column)). Commented May 10, 2018 at 18:19
  • A minimal reproducible example with a small sample of your DataFrames and the desired output would be helpful. See more on how to create good reproducible Apache Spark DataFrame examples. Commented May 10, 2018 at 18:35

1 Answer


Here is a simple solution using the array() function from pyspark.sql.functions:

Input:

df1 (with an ArrayType(StructType()) column): (screenshot not preserved)

df2 (with a StructType() column): (screenshot not preserved)

Code:

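from pyspark.sql.functions import array, col
# Wrap each struct in a one-element array: StructType() -> ArrayType(StructType())
df2 = df2.withColumn('recipientResource', array(col('recipientResource')))
# The column types now match, so the union succeeds
df3 = df1.union(df2)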

Output:

Modified df2: (screenshot not preserved)

df3 (the output of the union of df1 and df2): (screenshot not preserved)
