I'm trying to create a JSON structure from a pyspark dataframe. I have below columns in my dataframe - batch_id, batch_run_id, table_name, column_name, column_datatype, last_refresh_time, refresh_frequency, owner
I want it in below JSON structure -
{
"GeneralInfo": {
"DataSetID": "xxxx1234Abcsd",
"Owner" : ["[email protected]", "[email protected]", "[email protected]"]
"Description": "",
"BuisnessFunction": "",
"Source": "",
"RefreshRate": "Weekly",
"LastUpdate": "2020/10/15",
"InfoSource": "TemplateInfo"
},
"Tables": [
{
"TableName": "Employee",
"Columns" : [
{ "ColumnName" : "EmployeeID",
"ColumnDataType": "int"
},
{ "ColumnName" : "EmployeeName",
"ColumnDataType": "string"
}
]
}
}
}
I'm trying to assign the values in JSON string through dataframe column indexes but it is giving me an error as "Object of Type Column is not JSON serializable". I have used like below -
{
"GeneralInfo": {
"DataSetID": df["batch_id"],
"Owner" : list(df["owner"])
"Description": "",
"BuisnessFunction": "",
"Source": "",
"RefreshRate": df["refresh_frequency"],
"LastUpdate": df["last_update_time"],
"InfoSource": "TemplateInfo"
},
"Tables": [
{
"TableName": df["table_name"],
"Columns" : [
{ "ColumnName" : df["table_name"]["column_name"],
"ColumnDataType": df["table_name"]["column_datatype"]
}
]
}
}
}
Please help me on this, I have newly started coding in Pyspark.
