This is my actual code, and it works fine:
df_train_taxrate = (
    df_train.groupby(
        'Company_code_BUKRS',
        'Vendor_Customer_Code_WT_ACCO',
        'Expense_GL_HKONT',
        'PAN_J_1IPANNO',
        'HSN_SAC_HSN_SAC'
    ).agg(
        f.collect_set('Section_WT_QSCOD').alias('Unique_Sectio_Code'),
        f.collect_set('WHT_rate_QSATZ').alias('Unique_Wtax_rate')
    )
)
The problem is that the aggregated 'Section_WT_QSCOD' and 'WHT_rate_QSATZ' columns are arrays. When I try to convert these arrays into strings, I get the error below.
My code:
df_train_taxrate = df_train.groupby(
    'Company_code_BUKRS',
    'Vendor_Customer_Code_WT_ACCO',
    'Expense_GL_HKONT',
    'PAN_J_1IPANNO',
    'HSN_SAC_HSN_SAC'
).agg(
    f.collect_set('Section_WT_QSCOD').withColumn(
        'Section_WT_QSCOD',
        concat_ws(',', 'Unique_Sectio_Code')
    ),
    f.collect_set('WHT_rate_QSATZ').withColumn(
        'WHT_rate_QSATZ',
        concat_ws(',', 'Unique_W_tax_rate')
    )
)
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Column' object is not callable
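You are calling withColumn on the result of collect_set(), which doesn't make any sense: withColumn is a DataFrame method, while collect_set() returns a Column. That would explain why you get that error message. Wrap the aggregate in concat_ws instead:
concat_ws(",", f.collect_set('Section_WT_QSCOD')).alias('Section_WT_QSCOD')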