1

I am trying to impute missing values in a pandas dataframe using linear regression:

for index in [missing_data_df.horsepower.index]:
    i = 0
    if pd.isnull(missing_data_df.horsepower[index[i]]):
        # linear regression equation
        a = (0.25743277 * missing_data_df.displacement[index[i]]
             + 0.00958711 * missing_data_df.weight[index[i]]
             + 25.874947903262651)
        # replacing "nan" values in dataframe using .set_value
        missing_data_df.set_value(index[i], "horsepower", a)
    i += 1

It executes, but the missing values (NaN) in the dataframe are not replaced by the values predicted by the linear regression in variable `a`. Any suggestions why?

Below is the dataframe containing the missing data:

   >>> missing_data_df:
       mpg cylinders  displacement  horsepower  weight  acceleration  \
10    NaN       4.0         133.0       115.0  3090.0          17.5   
11    NaN       8.0         350.0       165.0  4142.0          11.5   
12    NaN       8.0         351.0       153.0  4034.0          11.0   
13    NaN       8.0         383.0       175.0  4166.0          10.5   
14    NaN       8.0         360.0       175.0  3850.0          11.0   
17    NaN       8.0         302.0       140.0  3353.0           8.0   
38   25.0       4.0          98.0         NaN  2046.0          19.0   
39    NaN       4.0          97.0        48.0  1978.0          20.0   
133  21.0       6.0         200.0         NaN  2875.0          17.0   
337  40.9       4.0          85.0         NaN  1835.0          17.3   
343  23.6       4.0         140.0         NaN  2905.0          14.3   
361  34.5       4.0         100.0         NaN  2320.0          15.8   
367   NaN       4.0         121.0       110.0  2800.0          15.4   
382  23.0       4.0         151.0         NaN  3035.0          20.5   

       model_year origin                          car_name  
10        70.0    2.0              citroen ds-21 pallas  
11        70.0    1.0  chevrolet chevelle concours (sw)  
12        70.0    1.0                  ford torino (sw)  
13        70.0    1.0           plymouth satellite (sw)  
14        70.0    1.0                amc rebel sst (sw)  
17        70.0    1.0             ford mustang boss 302  
38        71.0    1.0                        ford pinto  
39        71.0    2.0       volkswagen super beetle 117  
133       74.0    1.0                     ford maverick  
337       80.0    2.0              renault lecar deluxe  
343       80.0    1.0                ford mustang cobra  
361       81.0    2.0                       renault 18i  
367       81.0    2.0                         saab 900s  
382       82.0    1.0                    amc concord dl


2 Answers

1

You can use apply and lambda for this:

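import numpy as np

# Keep existing horsepower values; fill NaN rows with the regression prediction.
missing_data_df['horsepower'] = missing_data_df.apply(
    lambda row:
        0.25743277 * row.displacement + 0.00958711 * row.weight + 25.874947903262651
        if np.isnan(row.horsepower) else row.horsepower, axis=1)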
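
If the row-wise apply turns out to be slow on a larger frame, a column-wise version should give the same result. This is only a sketch: the three sample rows are copied from the question (index 38, 39 and 133), and the name `predicted` is mine, not something from the original code.

```python
import numpy as np
import pandas as pd

# Three rows copied from the question's dataframe (index 38, 39, 133).
missing_data_df = pd.DataFrame(
    {
        "displacement": [98.0, 97.0, 200.0],
        "horsepower": [np.nan, 48.0, np.nan],
        "weight": [2046.0, 1978.0, 2875.0],
    },
    index=[38, 39, 133],
)

# Regression prediction for every row, computed on whole columns at once.
predicted = (
    0.25743277 * missing_data_df["displacement"]
    + 0.00958711 * missing_data_df["weight"]
    + 25.874947903262651
)

# fillna aligns on the index and only touches entries that are NaN,
# so the existing value in row 39 is left alone.
missing_data_df["horsepower"] = missing_data_df["horsepower"].fillna(predicted)
print(missing_data_df)
```

Rows that already have a horsepower value keep it; only the NaN entries pick up the predicted value.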

0

Several things

  1. missing_data_df.horsepower has no missing values
  2. missing_data_df.weight, a variable in your formula, does have missing values
  3. if hp = 0.25743277 * disp + 0.00958711 * weight + 25.874947903262651
    then weight = (0.25743277 * disp + 25.874947903262651 - hp) / -0.00958711

To calculate weight, try:

for idx in missing_data_df.index:
    if pd.isnull(missing_data_df.loc[idx,"weight"]):
        disp = missing_data_df.loc[idx,"displacement"]
        hp = missing_data_df.loc[idx,"horsepower"]
        missing_data_df.loc[idx,"weight"] = (0.25743277 * disp + 25.874947903262651 - hp) / -0.00958711

In general, .loc[] and .iloc[] are a better way to go when finding or setting values.
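
If it is horsepower rather than weight that is missing, as the dataframe printed in the question suggests, the same loop can be pointed at that column. As far as I can tell, the loop in the question never fills anything because `[missing_data_df.horsepower.index]` is a one-element list: the body runs once with `index` bound to the whole index, `i` is 0, and only the first row (label 10, which already has a value) gets checked. Note also that `set_value` was deprecated and later removed in pandas 1.0. The sketch below uses three sample rows copied from the question; treat it as an illustration, not a tested drop-in.

```python
import numpy as np
import pandas as pd

# Three rows copied from the question's dataframe (index 38, 39, 133).
missing_data_df = pd.DataFrame(
    {
        "displacement": [98.0, 97.0, 200.0],
        "horsepower": [np.nan, 48.0, np.nan],
        "weight": [2046.0, 1978.0, 2875.0],
    },
    index=[38, 39, 133],
)

# Iterate over the row labels themselves, not over a list wrapping the index.
for idx in missing_data_df.index:
    if pd.isnull(missing_data_df.loc[idx, "horsepower"]):
        disp = missing_data_df.loc[idx, "displacement"]
        weight = missing_data_df.loc[idx, "weight"]
        # Regression equation from the question, written back with .loc.
        missing_data_df.loc[idx, "horsepower"] = (
            0.25743277 * disp + 0.00958711 * weight + 25.874947903262651
        )

print(missing_data_df)
```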

2 Comments

I have noticed it, sorry, my bad. It was an indentation problem when I posted it. Actually, there are no missing values in the weight column. I have corrected the indentation above.
OK. But with minor modifications I believe my script will solve your problem.
