Pandas dataframe to 3D array

Question

I have a dataframe like this

group           b             c           d           e        label
A           0.577535    0.299304    0.617103    0.378887       1
            0.167907    0.244972    0.615077    0.311497       0
B           0.640575    0.768187    0.652760    0.822311       0
            0.424744    0.958405    0.659617    0.998765       1
            0.077048    0.407182    0.758903    0.273737       0

I want to reshape it into a 3D array that an LSTM could use as input, padding shorter groups with zeros. So group A should feed in a sequence of length 3 (after padding) and group B a sequence of length 3. The desired output is something like:

array1 = [[[0.577535, 0.299304, 0.617103, 0.378887],
          [0.167907, 0.244972, 0.615077, 0.311497],
          [0, 0, 0, 0]],
         [[0.640575, 0.768187, 0.652760, 0.822311],
          [0.424744, 0.958405, 0.659617, 0.998765],
          [0.077048, 0.407182, 0.758903, 0.273737]]]

and then the labels have to be reshaped accordingly too

array2 = [[1,
           0,
           0],
          [0,
           1,
           0]]

How can I put in the padding and reshape my data?

Would you make your dataframe itself reproducible? I.e., what code should we run to get that dataframe? If so, I think I'll be able to help.
– zabop, Commented Aug 23, 2020 at 19:58
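
For anyone who wants to run the answers below, here is a minimal sketch of how the frame in the question could be rebuilt. It assumes the blank cells in the group column repeat the label above them (two rows for A, three for B); the original construction code was never posted.

```python
import pandas as pd

# Values copied from the question. Assumption: blank 'group' cells
# belong to the group shown on the nearest row above.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "b": [0.577535, 0.167907, 0.640575, 0.424744, 0.077048],
    "c": [0.299304, 0.244972, 0.768187, 0.958405, 0.407182],
    "d": [0.617103, 0.615077, 0.652760, 0.659617, 0.758903],
    "e": [0.378887, 0.311497, 0.822311, 0.998765, 0.273737],
    "label": [1, 0, 0, 1, 0],
})
print(df)
```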

Henry Yik · Accepted Answer · 2020-08-23 20:18:03Z

You can first use cumcount to create a count for each group, reindex by MultiIndex.from_product and fill with 0, and finally export to list:

df["count"] = df.groupby("group")["label"].cumcount()
mux = pd.MultiIndex.from_product([df["group"].unique(), range(max(df["count"]+1))], names=["group","count"])

df = df.set_index(["group","count"]).reindex(mux, fill_value=0)

print (df.iloc[:,:4].groupby(level=0).apply(pd.Series.tolist).values.tolist())

[[[0.577535, 0.299304, 0.617103, 0.378887],
  [0.167907, 0.24497199999999997, 0.6150770000000001, 0.31149699999999997],
  [0.0, 0.0, 0.0, 0.0]],
 [[0.640575, 0.768187, 0.65276, 0.822311],
  [0.42474399999999995, 0.958405, 0.659617, 0.998765],
  [0.077048, 0.40718200000000004, 0.758903, 0.273737]]]

print (df.groupby(level=0)["label"].apply(list).tolist())

[[1, 0, 0], [0, 1, 0]]
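
If an actual NumPy array is more convenient than nested lists, the same cumcount and reindex steps can be followed by a reshape. This is only a sketch: the frame is rebuilt inline under the assumption that blank group cells repeat the label above, and the names feature_cols, padded, X and y are made up for illustration.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question (group layout is an assumption).
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "b": [0.577535, 0.167907, 0.640575, 0.424744, 0.077048],
    "c": [0.299304, 0.244972, 0.768187, 0.958405, 0.407182],
    "d": [0.617103, 0.615077, 0.652760, 0.659617, 0.758903],
    "e": [0.378887, 0.311497, 0.822311, 0.998765, 0.273737],
    "label": [1, 0, 0, 1, 0],
})
feature_cols = ["b", "c", "d", "e"]

# Position of each row inside its group: 0, 1, 2, ...
df["count"] = df.groupby("group").cumcount()
groups = df["group"].unique()
max_len = int(df["count"].max()) + 1

# Full (group, position) grid; missing rows are filled with 0.
mux = pd.MultiIndex.from_product([groups, range(max_len)],
                                 names=["group", "count"])
padded = df.set_index(["group", "count"]).reindex(mux, fill_value=0)

# Rows are ordered group by group, so a row-major reshape is enough.
X = padded[feature_cols].to_numpy().reshape(len(groups), max_len, len(feature_cols))
y = padded["label"].to_numpy().reshape(len(groups), max_len)

print(X.shape)
print(y.tolist())
```

Note that padded label positions get 0, the same value as a real label of 0, so a mask may be needed if the model has to tell them apart.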

1 Comment

Mobeus Zoom Over a year ago

Thanks. I get an error on df.iloc[:,:4].groupby(level=0).apply(pd.Series.tolist).values.tolist(), saying 'DataFrame' object has no attribute 'dtype'. I must admit I replaced df.iloc[:,:4] with df.iloc[:,:-1] for the sake of generality, but I can't see how that should make a difference.
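
One way to sidestep that call entirely is to loop over the groups and convert each block of rows yourself, rather than passing pd.Series.tolist to apply. The sketch below is an assumption about what would work in the commenter's setup, not a diagnosis of the error; the small padded frame and its values are hypothetical stand-ins for the reindexed frame from the answer.

```python
import pandas as pd

# Hypothetical stand-in for the already padded frame, indexed by
# (group, count) as in the answer above.
idx = pd.MultiIndex.from_product([["A", "B"], range(2)],
                                 names=["group", "count"])
padded = pd.DataFrame(
    {"b": [0.1, 0.0, 0.3, 0.4],
     "c": [0.5, 0.0, 0.7, 0.8],
     "label": [1, 0, 0, 1]},
    index=idx,
)

# Every column except the last one (the label), as in the comment.
features = padded.iloc[:, :-1]

# One nested list per group; sort=False keeps the groups in index order.
nested = [g.to_numpy().tolist()
          for _, g in features.groupby(level=0, sort=False)]
print(nested)
```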

Dev Randalpura · Accepted Answer · 2020-08-23 20:34:22Z

I'm assuming your group column contains many values, not just one 'A' and one 'B'. This code worked for me; you can give it a try as well:

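import pandas as pd

df = pd.read_csv('file2.csv')
vals = df['group'].unique()

# Feature columns: everything except the group key and the label
feature_cols = [c for c in df.columns if c not in ('group', 'label')]

array1 = []  # one list of feature rows per group
array2 = []  # one list of labels per group

for val in vals:
    val_df = df[df.group == val]

    # Labels for this group, in row order
    array2.append(val_df.label.tolist())

    # Feature rows for this group, one list per row
    smaller_array = []
    for i in range(val_df.shape[0]):
        smallest_array = []
        for j in feature_cols:
            # Append the cell value, not the column name
            smallest_array.append(val_df.iloc[i][j])
        smaller_array.append(smallest_array)

    array1.append(smaller_array)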
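
The loop above leaves each group at its own length, so the padding the question asks for still has to be applied. A minimal sketch of that step follows, assuming zero padding at the end of each sequence; the ragged array1 and array2 below are typed in by hand from the question's data, and the names X and y are hypothetical.

```python
import numpy as np

# Ragged per-group lists, as a loop like the one above might produce.
array1 = [
    [[0.577535, 0.299304, 0.617103, 0.378887],
     [0.167907, 0.244972, 0.615077, 0.311497]],
    [[0.640575, 0.768187, 0.652760, 0.822311],
     [0.424744, 0.958405, 0.659617, 0.998765],
     [0.077048, 0.407182, 0.758903, 0.273737]],
]
array2 = [[1, 0], [0, 1, 0]]

n_groups = len(array1)
max_len = max(len(rows) for rows in array1)
n_features = len(array1[0][0])

# Start from all zeros and copy each group's rows into the front.
X = np.zeros((n_groups, max_len, n_features), dtype=float)
y = np.zeros((n_groups, max_len), dtype=int)
for i, (rows, labels) in enumerate(zip(array1, array2)):
    if rows:  # skip empty groups; nothing to copy
        X[i, :len(rows)] = rows
        y[i, :len(labels)] = labels

print(X.shape)
print(y.tolist())
```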
