I have a dataset with two columns of categorical labels (NBA team names). I want to use one-hot encoding to generate a binary 1D array representing each team. Here is my code:
from sklearn.preprocessing import MultiLabelBinarizer
one_hot_encoder = MultiLabelBinarizer()
table["Teams"] = one_hot_encoder.fit_transform(table["Teams"])
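For reference, here is a minimal sketch of what `fit_transform` returns. It uses small hypothetical data and assumes that each cell of `Teams` holds a list of team names, such as the two teams in a game. That is the input `MultiLabelBinarizer` expects; a bare string in a cell would be iterated character by character. The result is a 2D array with one row per DataFrame row and one column per distinct team, in the order given by `classes_`.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data (assumption): each row lists the two teams in a game.
# MultiLabelBinarizer expects an iterable of label collections per row.
table = pd.DataFrame({"Teams": [["Lakers", "Celtics"],
                                ["Bulls", "Lakers"],
                                ["Celtics", "Bulls"]]})

one_hot_encoder = MultiLabelBinarizer()
encoded = one_hot_encoder.fit_transform(table["Teams"])

# classes_ holds the distinct teams in sorted order; encoded has one
# row per DataFrame row and one 0/1 column per team.
print(list(one_hot_encoder.classes_))  # ['Bulls', 'Celtics', 'Lakers']
print(encoded.shape)                   # (3, 3)
print(encoded.tolist())                # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```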
The encoder itself works correctly. Running
one_hot_encoder.fit_transform(table["Teams"])
returns the expected arrays:
[Screenshot: encoder output]
However, when I try to store the result in the column like this:
table["Teams"] = one_hot_encoder.fit_transform(table["Teams"])
the result does not seem to be saved correctly.
[Screenshot: resulting DataFrame]
Instead, the column appears to hold only the first value of each array rather than the whole array. How can I store the full vector for each row?
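`fit_transform` returns a 2D array with one column per team, while a single DataFrame column is 1D, so the full vectors need to be packed differently. Below is a hedged sketch of two possible approaches on the same hypothetical data. It assumes each cell of `Teams` is a list of team names, and the `Teams_encoded` column name is made up for illustration. Option 1 wraps the rows in a Python list so that each cell holds one whole 1D array. Option 2 expands the result into one 0/1 column per team.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data (assumption): each row holds a list of team names.
table = pd.DataFrame({"Teams": [["Lakers", "Celtics"],
                                ["Bulls", "Lakers"],
                                ["Celtics", "Bulls"]]})

one_hot_encoder = MultiLabelBinarizer()
encoded = one_hot_encoder.fit_transform(table["Teams"])  # 2D: (rows, teams)

# Option 1: one whole vector per cell. list(encoded) is a list of 1D
# arrays (one per row), which pandas stores as objects in a 1D column.
table["Teams_encoded"] = list(encoded)  # hypothetical column name

# Option 2: one 0/1 column per team, aligned on the original index.
team_columns = pd.DataFrame(encoded,
                            columns=one_hot_encoder.classes_,
                            index=table.index)
expanded = table[["Teams"]].join(team_columns)

print(table)
print(expanded)
```

Option 1 keeps the layout the question describes (one array per row). Option 2 is usually easier to work with in pandas and in downstream models, because every team becomes an ordinary numeric column.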