1

I am working with categorical variables in Machine Learning.Here is sample of my data:

age,gender,height,class,label
25,m,43,A,0
35,f,45,B,1
12,m,36,C,0
14,f,42,A,0

There are two categorical variables gender and height.I have used LabelEncoding technique.

My code:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder,OneHotEncoder

df=pd.read_csv('test.csv')

X=df.drop(['label'],1)
y=np.array(df['label'])

data=X.iloc[:,:].values

lben = LabelEncoder()
data[:,1] = lben.fit_transform(data[:,1])
data[:,3] = lben.fit_transform(data[:,3])

onehotencoder = OneHotEncoder(categorical_features=[1])
data = onehotencoder.fit_transform(data).toarray()

onehotencoder = OneHotEncoder(categorical_features=[3])
data = onehotencoder.fit_transform(data).toarray()

print(data.shape)

np.savetxt('data.csv',data,fmt='%s')    

The data.csv looks like this:

0.0 0.0 1.0 0.0 0.0 1.0 25.0 0.0
0.0 0.0 0.0 1.0 1.0 0.0 35.0 1.0
1.0 0.0 0.0 0.0 0.0 1.0 12.0 2.0
0.0 1.0 0.0 0.0 1.0 0.0 14.0 0.0

I am unable to understand why the column is like this i.e where is the value of the 'height' column.Also the data.shape is (4,8) instead of (4,7) i.e(gender represented by 2 columns and class by 3 and 'age' and 'height' features.

1 Answer 1

1

Are you sure that you need to use LabelEncoder+OneHotEncoder? There is a much simpler method (which does not allow to do advanced procedures, but so far you seem to work on basics):

import pandas as pd
import numpy as np

df=pd.read_csv('test.csv')

X=df.drop(['label'],1)
y=np.array(df['label'])

data = pd.get_dummies(X)

The problem with the current code is that after you have done the first OHE:

onehotencoder = OneHotEncoder(categorical_features=[1])
data = onehotencoder.fit_transform(data).toarray()

the columns get shifted and column 3 is in fact the original height column instead of the label-encoded class column. So change the second one to use column 4 and you will get what you want.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.