Store array as a value within Pandas column

Question

I have a data set with two columns of categorical label data (NBA team names). I want to use one-hot encoding to generate a binary 1D vector, stored as an array, representing each team. Here is my code:

from sklearn.preprocessing import MultiLabelBinarizer
one_hot_encoder = MultiLabelBinarizer()
table["Teams"] = one_hot_encoder.fit_transform(table["Teams"])

The encoder works appropriately, and it generates the arrays accordingly. In other words,

one_hot_encoder.fit_transform(table["Teams"])

generates the following properly:

Link to encoder result screenshot

However, when I try to store the array into the column, as follows:

table["Teams"] = one_hot_encoder.fit_transform(table["Teams"])

It seems like it's not being saved properly.

Link to data frame result screenshot

Instead, it looks as if the column is just taking the first value of each array, and not storing the entire array. How should I go about resolving this?

Could you paste your sample data instead of an image?

– Frank AK, Commented Jul 13, 2018 at 7:21

jezrael · Accepted Answer · 2018-07-13 07:29:33Z

1

I think you need to convert the 2D array to a list of lists:

import pandas as pd

table = pd.DataFrame({"Teams": list('aaasdffds')})

from sklearn.preprocessing import MultiLabelBinarizer
one_hot_encoder = MultiLabelBinarizer()

table["Teams"] = one_hot_encoder.fit_transform(table["Teams"]).tolist()
print (table)
          Teams
0  [1, 0, 0, 0]
1  [1, 0, 0, 0]
2  [1, 0, 0, 0]
3  [0, 0, 0, 1]
4  [0, 1, 0, 0]
5  [0, 0, 1, 0]
6  [0, 0, 1, 0]
7  [0, 1, 0, 0]
8  [0, 0, 0, 1]
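
If the full matrix is needed again later, the column of lists can be turned back into a 2D array. This is only a minimal sketch using the same made-up sample data as above; the names `encoded` and `restored` are mine, not from the question:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

table = pd.DataFrame({"Teams": list("aaasdffds")})
one_hot_encoder = MultiLabelBinarizer()

# fit_transform returns a 2D array: one row per sample, one column per class
encoded = one_hot_encoder.fit_transform(table["Teams"])

# Store one Python list per row in the column
table["Teams"] = encoded.tolist()

# Rebuild the 2D array from the column of lists
restored = np.array(table["Teams"].tolist())
print(restored.shape)
```

Each cell then holds the whole vector for its row, and the round trip gives back the original matrix.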

However, storing arrays or lists in a single column is not recommended, because vectorized pandas methods and functions cannot operate on them. It is better to create a DataFrame with one column per class:

table = pd.DataFrame(one_hot_encoder.fit_transform(table["Teams"]), 
                     columns=one_hot_encoder.classes_)
print (table)

   a  d  f  s
0  1  0  0  0
1  1  0  0  0
2  1  0  0  0
3  0  0  0  1
4  0  1  0  0
5  0  0  1  0
6  0  0  1  0
7  0  1  0  0
8  0  0  0  1
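
To show what the wide layout makes possible, here is a small sketch that keeps the original labels next to the indicator columns and runs a vectorized count. It assumes the same sample data as above; `one_hot`, `combined` and `counts` are names I made up for the example:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

table = pd.DataFrame({"Teams": list("aaasdffds")})
one_hot_encoder = MultiLabelBinarizer()

# One indicator column per class, aligned on the original index
one_hot = pd.DataFrame(
    one_hot_encoder.fit_transform(table["Teams"]),
    columns=one_hot_encoder.classes_,
    index=table.index,
)

# Keep the original labels alongside the indicator columns
combined = table.join(one_hot)

# Vectorized operation: number of rows per class
counts = one_hot.sum()
print(combined)
print(counts)
```

The same count would need a Python-level loop or `apply` if every row held a list in a single column.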

edited Jul 13, 2018 at 7:29

answered Jul 13, 2018 at 7:23

jezrael

Vivek · Answer · 2018-07-13 15:44:29Z

0

It looks like you need a list within your DataFrame. You can store the arrays as a list with one entry per row, and pandas won't modify them.

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
encoded_array = mlb.fit_transform(table['Teams'])
# Store one 1D array per row
table['Teams'] = [encoded_array[i, :] for i in range(table.shape[0])]
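
A self-contained variant for checking what ends up in each cell. It borrows the made-up sample data from the other answer, and uses `list(encoded_array)` as a shorthand of my own for splitting the 2D array into one 1D array per row:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

table = pd.DataFrame({"Teams": list("aaasdffds")})
mlb = MultiLabelBinarizer()
encoded_array = mlb.fit_transform(table["Teams"])

# list() over a 2D array yields its rows, so each cell receives one 1D array
table["Teams"] = list(encoded_array)

first_cell = table["Teams"].iloc[0]
print(type(first_cell), first_cell)
```

The column has `object` dtype, and each cell is a full NumPy array rather than a single value.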

edited Jul 13, 2018 at 15:44

answered Jul 13, 2018 at 8:42

Vivek

1 Comment

jezrael Over a year ago

The OP needs a new column filled with arrays, so your answer doesn't address that. It is only a recommendation, and follows the same principle as my answer.
