11

I am learning how to use Imputer in Python.

This is my code:

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df["price"])

df["price"]=imp.transform(df["price"])

However, this raises the following error: ValueError: Length of values does not match length of index

What's wrong with my code?

Thanks for helping

4 Answers 4

17

This is because Imputer expects 2D input (such as a DataFrame) rather than a 1D Series. A possible solution is:

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])
df["price"]=imp.transform(df[["price"]]).ravel()

# Or even 
imp=Imputer(missing_values="NaN", strategy="mean" )
df["price"]=imp.fit_transform(df[["price"]]).ravel()
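For readers on newer scikit-learn: Imputer was removed in version 0.22, and SimpleImputer from sklearn.impute is its replacement. The following is a minimal sketch of the same idea, assuming a recent scikit-learn and pandas. The DataFrame is the one from the question.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame(
    [["XXL", 8, "black", "class 1", 22],
     ["L", np.nan, "gray", "class 2", 20],
     ["XL", 10, "blue", "class 2", 19],
     ["M", np.nan, "orange", "class 1", 17],
     ["M", 11, "green", "class 3", np.nan],
     ["M", 7, "red", "class 1", 22]],
    columns=["size", "price", "color", "class", "boh"],
)

# SimpleImputer takes np.nan itself, not the string "NaN"
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# df[["price"]] is a one-column DataFrame (2D); ravel() flattens the
# (6, 1) result back to 1D before assigning it to the column
df["price"] = imp.fit_transform(df[["price"]]).ravel()

print(df["price"].tolist())  # [8.0, 9.0, 10.0, 9.0, 11.0, 7.0]
```

The two missing prices are filled with 9.0, the mean of 8, 10, 11 and 7.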

2 Comments

Why is ravel() necessary here? It seems to return the correct type without it
1. If you pass the 2-dimensional df[["price"]], then ravel() is not strictly needed. Imputer and fit_transform only need 2-dimensional input, and df[["price"]] has shape (row count, 1). 2. If you pass the 1-dimensional df["price"], adding ravel() does not help: the call below still fails with ValueError: Expected 2D array, got 1D array instead: df["price"]=imp.fit_transform(df["price"]).ravel()
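The shapes discussed in this comment can be checked directly. This is a small sketch using SimpleImputer (an assumption, since the answer uses the older Imputer) on a stand-alone price column. The exact wording of the error message differs between scikit-learn versions, so only the exception type is relied on here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

price = pd.DataFrame({"price": [8, np.nan, 10, np.nan, 11, 7]})

series_shape = price["price"].shape    # 1D Series
frame_shape = price[["price"]].shape   # 2D DataFrame with one column

imp = SimpleImputer(strategy="mean")
filled = imp.fit_transform(price[["price"]])  # 2D NumPy array by default
flat = filled.ravel()                          # flattened to 1D

# A 1D Series is rejected inside fit_transform, so ravel() is never reached
try:
    imp.fit_transform(price["price"]).ravel()
    raised = False
except ValueError:
    raised = True

print(series_shape, frame_shape, filled.shape, flat.shape, raised)
```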
3

Here is the documentation for SimpleImputer. Its fit method takes an array-like or sparse matrix as its input parameter. You can try this:

imp.fit(df.iloc[:,1:2]) 
df['price']=imp.transform(df.iloc[:,1:2])

Pass the column to the fit method as a 2D slice selected by index location, then apply the transform.

>>> df
   size  price   color    class   boh
 0  XXL    8.0   black  class 1  22.0
 1    L    9.0    gray  class 2  20.0
 2   XL   10.0    blue  class 2  19.0
 3    M    9.0  orange  class 1  17.0
 4    M   11.0   green  class 3   NaN
 5    M    7.0     red  class 1  22.0

You can do the same for boh:

imp.fit(df.iloc[:,4:5])
df['boh']=imp.transform(df.iloc[:,4:5])
>>> df
    size  price   color    class   boh
 0  XXL    8.0   black  class 1  22.0
 1    L    9.0    gray  class 2  20.0
 2   XL   10.0    blue  class 2  19.0
 3    M    9.0  orange  class 1  17.0
 4    M   11.0   green  class 3  20.0
 5    M    7.0     red  class 1  22.0

Kindly correct me if I am wrong. Suggestions will be appreciated.
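Since the imputer works column by column on 2D input, both numeric columns can also be filled in one call. This is a sketch assuming SimpleImputer from a recent scikit-learn; numeric_cols is just an illustrative name.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame(
    [["XXL", 8, "black", "class 1", 22],
     ["L", np.nan, "gray", "class 2", 20],
     ["XL", 10, "blue", "class 2", 19],
     ["M", np.nan, "orange", "class 1", 17],
     ["M", 11, "green", "class 3", np.nan],
     ["M", 7, "red", "class 1", 22]],
    columns=["size", "price", "color", "class", "boh"],
)

numeric_cols = ["price", "boh"]  # illustrative name for the columns to impute

# Each column is filled with its own mean: 9.0 for price, 20.0 for boh
imp = SimpleImputer(strategy="mean")
df[numeric_cols] = imp.fit_transform(df[numeric_cols])

print(df)
```

The result is a (6, 2) array that lines up with the two selected columns, so no ravel() is involved here.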


2

I think you want to specify the axis for the imputer, then transpose the array it returns:

import pandas as pd
import numpy as np

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

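imp=Imputer(missing_values="NaN", strategy="mean",axis=1 ) #specify axis
# reshape the column to a single row (1, n) so the input is 2D, impute along that row
q = imp.fit_transform(df["price"].values.reshape(1, -1)).T #perform a transpose operation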


df["price"]=q
print(df)

2 Comments

Thank you Ryan. Really useful.
Unfortunately this isn't working for me :( ValueError: Expected 2D array, got 1D array instead:
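The error reported in this comment is what newer scikit-learn versions raise for 1D input. One way around it is to reshape the column into a single-column 2D array before imputing. The sketch below assumes SimpleImputer, which has no axis parameter, so it is a different route from the axis=1 plus transpose approach in the answer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

prices = pd.Series([8, np.nan, 10, np.nan, 11, 7], name="price")

# reshape(-1, 1) turns the 1D values into a 2D array of shape (n, 1)
column = prices.to_numpy().reshape(-1, 1)

imp = SimpleImputer(strategy="mean")
imputed = pd.Series(
    imp.fit_transform(column).ravel(),  # back to 1D
    index=prices.index,
    name=prices.name,
)

print(imputed.tolist())  # [8.0, 9.0, 10.0, 9.0, 11.0, 7.0]
```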
1

A simple solution is to provide a 2D array:

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])

df["price"]=imp.transform(df[["price"]])

df['boh'] = imp.fit_transform(df[['boh']])

Here is your DataFrame:

[Image: cleaned DataFrame]
