I have a DataFrame like this:
```python
import pandas as pd

cols = ['a', 'b']
df = pd.DataFrame([[[3, 1, 5, 7], [42, 31]], [[], [44]], [[44, 3, 5, 5, 5, 10], []], [[], [44324, 3]]],
                  columns=cols)
```
As you can see, every cell contains a list. I want to do the following with each list:
- Calculate the mean of the list and add 5
- If the result is <= 0, replace the list with 1
- If the list is empty, replace it with 0
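Written as a formula, the three rules map a list $x = (x_1, \dots, x_n)$ to a single number. The symbols $f$, $n$ and $\bar{x}$ are notation introduced here only for illustration:

```latex
f(x) =
\begin{cases}
0 & \text{if } n = 0,\\
1 & \text{if } n > 0 \text{ and } \bar{x} + 5 \le 0,\\
\bar{x} + 5 & \text{otherwise},
\end{cases}
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

For example, `[3, 1, 5, 7]` has mean 4 and becomes 9; `[]` becomes 0; `[-10, -2]` has mean -6, and since -6 + 5 = -1 <= 0 it becomes 1.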
My working solution:
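```python
def convert_list(x):
    """Mean of the list + 5; 1 if that is <= 0; 0 for an empty list."""
    if len(x) != 0:
        res = (sum(x) / len(x)) + 5
        if res <= 0:
            res = 1
        return res
    return 0

for col in cols:
    # apply() calls convert_list once per cell, at Python speed
    df[col] = df[col].apply(convert_list)
```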
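The same per-column loop could perhaps be expressed without calling a Python function for every cell, using `Series.explode` plus a grouped mean. This is only a sketch: it assumes a pandas version that has `Series.explode` (0.25+) and a row index without duplicate labels, `convert_column` is a hypothetical helper name, and I have only checked it on the small example above, not benchmarked it.

```python
import pandas as pd

def convert_column(s: pd.Series) -> pd.Series:
    """Apply the three rules to a Series whose cells are lists.

    Assumes the row index has no duplicate labels (groupby(level=0) would merge them).
    """
    # One row per list element; an empty list becomes a single NaN row.
    exploded = pd.to_numeric(s.explode())
    # Mean per original row (NaN for empty lists), then add 5.
    res = exploded.groupby(level=0).mean() + 5
    # Rule 2: a non-empty list whose mean + 5 is <= 0 becomes 1 (NaN <= 0 is False).
    res = res.mask(res <= 0, 1.0)
    # Rule 3: empty lists become 0; reindex restores the original row order.
    return res.fillna(0.0).reindex(s.index)

cols = ['a', 'b']
df = pd.DataFrame([[[3, 1, 5, 7], [42, 31]], [[], [44]],
                   [[44, 3, 5, 5, 5, 10], []], [[], [44324, 3]]],
                  columns=cols)
for col in cols:
    df[col] = convert_column(df[col])
print(df)
```

`explode` turns each empty list into a single NaN row, so that row's group mean is NaN and the final `fillna(0.0)` turns it into 0.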
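Desired output: every list replaced by a single number, i.e. column `a` becomes 9.0, 0, 17.0, 0 and column `b` becomes 41.5, 49.0, 0, 22168.5.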
It works, but it is very slow (my real DataFrame has about 50k columns and 100k rows, and each list may contain many elements). Is there a more efficient solution? I also tried converting it to a NumPy array and using vectorized operations, but the lists can have different lengths, so I can't convert it directly (unless I pad the shorter lists with many extra elements...).
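One way to avoid padding might be to keep the data ragged: concatenate every list into one flat array, record each list's length, and reduce per cell with `np.bincount`. This sketch assumes every cell holds a list of numbers; `convert_frame` is a hypothetical name, and it has only been checked against the small example, not the full-size frame.

```python
import itertools

import numpy as np
import pandas as pd

def convert_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three rules to every list cell without padding the lists."""
    cells = df.to_numpy().ravel()  # 1-D object array of lists, row-major order
    lengths = np.fromiter((len(c) for c in cells), dtype=np.int64, count=cells.size)
    # Concatenate every list into a single flat float array.
    flat = np.fromiter(itertools.chain.from_iterable(cells),
                       dtype=np.float64, count=int(lengths.sum()))
    # Label each element with the index of the cell it came from, then sum per cell.
    cell_ids = np.repeat(np.arange(cells.size), lengths)
    sums = np.bincount(cell_ids, weights=flat, minlength=cells.size)
    with np.errstate(divide='ignore', invalid='ignore'):
        res = sums / lengths + 5                 # 0/0 -> NaN for empty lists
        res = np.where(res <= 0, 1.0, res)       # rule 2: mean + 5 <= 0 -> 1
    res = np.where(lengths == 0, 0.0, res)       # rule 3: empty list -> 0
    return pd.DataFrame(res.reshape(df.shape), index=df.index, columns=df.columns)

cols = ['a', 'b']
df = pd.DataFrame([[[3, 1, 5, 7], [42, 31]], [[], [44]],
                   [[44, 3, 5, 5, 5, 10], []], [[], [44324, 3]]],
                  columns=cols)
print(convert_frame(df))
```

It still walks every element once through Python-level iteration (`itertools.chain`), so any speed-up is unverified. With 50k columns and 100k rows (5 billion cells) the flat array may also not fit in memory, so it may be necessary to process a slice of columns at a time, e.g. `convert_frame(df[some_cols])`.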