
I am trying to build a nested list of records (one list of dicts per row) in a pandas DataFrame column df['link'] and attach it back to the original DataFrame. What's wrong with my code, and how do I fix it?

Error:

TypeError                                 Traceback (most recent call last)

in
2 df2['shipmentNumber'] = df2.shipmentID.str.split('-',1).str[-1]
3 df2['link'] = pd.DataFrame({'link': df2.to_dict('records')})
----> 4 result['link'] = df2.groupby(df2.index).agg(list)['link']

c:\users\ashok.eapen\pycharmprojects\rs-components\venv\lib\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
940 def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs):
941
--> 942 relabeling, func, columns, order = reconstruct_func(func, **kwargs)
943
944 if maybe_use_numba(engine):

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

My input:

df6

shipmentID                                                                             CustomerCode
['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002','USWPR04-20210429-S-00006']    USWPR04
['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002']                               MSLPR04
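To reproduce the problem, df6 can be rebuilt from the table above. This is a minimal sketch that assumes the list column is named shipmentID, as in the question's code. It also shows that explode() keeps the original row labels, so df2 ends up with a duplicated index, which matters for any later index alignment.

```python
import pandas as pd

# Assumption: the list column is named 'shipmentID', matching df6.explode('shipmentID')
df6 = pd.DataFrame({
    'shipmentID': [
        ['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002', 'USWPR04-20210429-S-00006'],
        ['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002'],
    ],
    'CustomerCode': ['USWPR04', 'MSLPR04'],
})

# explode() puts each ID on its own row but repeats the original row label
df2 = df6.explode('shipmentID')
print(df2.index.tolist())  # [0, 0, 0, 1, 1]
```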

My code:

df2= df6.explode('shipmentID')
df2['shipmentNumber'] = df2.shipmentID.str.split('-',1).str[-1]
df2['link'] = pd.DataFrame({'link': df2.to_dict('records')})
result['link']  = df2.groupby(df2.index).agg(list)['link']

Expected output column:

df6['link']

[{ "shipID": "USWPR04-20210429-S-00001", "customerCode": "USWPR04", "shipNumber": "20210429-S-00001" },
 { "shipID": "USWPR04-20210429-S-00002", "customerCode": "USWPR04", "shipNumber": "20210429-S-00002" },
 { "shipID": "USWPR04-20210429-S-00006", "customerCode": "USWPR04", "shipNumber": "20210429-S-00006" }]

[{ "shipID": "MSLPR04-20210429-S-00001", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00001" },
 { "shipID": "MSLPR04-20210429-S-00002", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00002" }]

Expected final dataframe:

shipmentID                                                                             CustomerCode   link
['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002','USWPR04-20210429-S-00006']    USWPR04    [{ "shipID": "USWPR04-20210429-S-00001", "customerCode": "USWPR04", "shipNumber": "20210429-S-00001" },{ "shipID": "USWPR04-20210429-S-00002", "customerCode": "USWPR04", "shipNumber": "20210429-S-00002" },{ "shipID": "USWPR04-20210429-S-00006", "customerCode": "USWPR04", "shipNumber": "20210429-S-00006" }]
['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002']                               MSLPR04    [{ "shipID": "MSLPR04-20210429-S-00001", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00001" },{ "shipID": "MSLPR04-20210429-S-00002", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00002" }]

1 Answer

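Pass index=df2.index when you build the helper DataFrame, so each record lines up with its exploded row, and assign the grouped result to df6 (result is not defined in the code shown):

# explode() leaves df2 with a duplicated index (0, 0, 0, 1, 1), so build the helper frame on that same index
df2['link'] = pd.DataFrame({'link': df2.to_dict('records')}, index=df2.index)
# gather each original row's records into one list; assignment aligns on df6's index
df6['link'] = df2.groupby(df2.index)['link'].agg(list)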
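For reference, a self-contained sketch of this first approach end to end, assuming df6 has the columns shipmentID and CustomerCode as in the question's code. It passes n=1 to str.split by keyword, because recent pandas versions no longer accept that argument positionally.

```python
import pandas as pd

# Sample data from the question; assumes the list column is named 'shipmentID'
df6 = pd.DataFrame({
    'shipmentID': [
        ['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002', 'USWPR04-20210429-S-00006'],
        ['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002'],
    ],
    'CustomerCode': ['USWPR04', 'MSLPR04'],
})

df2 = df6.explode('shipmentID')
# n is passed by keyword; everything after the first '-' is the shipment number
df2['shipmentNumber'] = df2['shipmentID'].str.split('-', n=1).str[-1]

# Helper frame shares df2's duplicated index, so each record stays on its own row
df2['link'] = pd.DataFrame({'link': df2.to_dict('records')}, index=df2.index)

# One list of dicts per original df6 row, aligned back on the index
df6['link'] = df2.groupby(df2.index)['link'].agg(list)

print(df6.loc[1, 'link'])
```

Because to_dict('records') runs before the link column exists, each dict holds only shipmentID, CustomerCode and shipmentNumber.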

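Alternatively, skip explode and groupby entirely and use a list comprehension:

# one dict per shipment ID, keyed like df2's columns so the result matches df6['link']
df6['link1'] = [[{'shipmentID': x, 'CustomerCode': b, 'shipmentNumber': x.split('-', 1)[-1]}
                 for x in a] for a, b in zip(df6['shipmentID'], df6['CustomerCode'])]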
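The expected output in the question uses the keys shipID, customerCode and shipNumber. A minimal pure-Python sketch of the same list-comprehension idea with those key names, wrapped in a hypothetical helper build_links (not part of the original answer):

```python
def build_links(ship_ids, customer_code):
    """Hypothetical helper: one dict per shipment ID, using the key names
    from the question's expected output."""
    return [
        {
            'shipID': sid,
            'customerCode': customer_code,
            # everything after the first '-' is the shipment number
            'shipNumber': sid.split('-', 1)[-1],
        }
        for sid in ship_ids
    ]

links = build_links(['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002'], 'MSLPR04')
print(links)

# With pandas, assuming df6 as in the question:
# df6['link'] = [build_links(ids, code) for ids, code in zip(df6['shipmentID'], df6['CustomerCode'])]
```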

Both approaches give the same output:

print(df6['link'] == df6['link1'])
0    True
1    True
dtype: bool
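If you prefer the explode/groupby route but want the key names from the question's expected output (shipID, customerCode, shipNumber), one option is to rename the columns before calling to_dict('records'). This is a sketch, assuming the column names from the question's code; it also swaps the helper DataFrame for a plain Series built on df2's index, which relies on the same index alignment.

```python
import pandas as pd

# Sample data; assumes the list column is named 'shipmentID'
df6 = pd.DataFrame({
    'shipmentID': [
        ['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002', 'USWPR04-20210429-S-00006'],
        ['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002'],
    ],
    'CustomerCode': ['USWPR04', 'MSLPR04'],
})

df2 = df6.explode('shipmentID')
df2['shipmentNumber'] = df2['shipmentID'].str.split('-', n=1).str[-1]

# Rename to the expected key names, then turn each exploded row into a dict
records = df2.rename(columns={
    'shipmentID': 'shipID',
    'CustomerCode': 'customerCode',
    'shipmentNumber': 'shipNumber',
}).to_dict('records')

# A Series on df2's duplicated index keeps each record with its source row;
# groupby(level=0) gathers them back into one list per original df6 row
df6['link'] = pd.Series(records, index=df2.index).groupby(level=0).agg(list)
```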