Fill pandas DataFrame nans with index value

Question

I have a DataFrame that contains NaN values, and I would like to fill those NaNs with the index value. My actual use case is filling the NaNs with a string template that contains the index value; answering that part is a bonus.

Given:

In [31]: df
Out[31]:
          0         1         2         3
0       NaN  0.069419       NaN       NaN
1  2.439000  1.943944  0.279904  0.755746
2  0.013795  1.189474  0.834894  2.202108
3  0.520385       NaN       NaN  1.451822
4  0.153863  0.957394       NaN  0.052726
5  1.274204       NaN       NaN  0.169636
6       NaN  1.031703       NaN  0.267850
7  0.419157       NaN       NaN  0.409045
8       NaN  1.526764  0.947936  0.442226
9       NaN       NaN       NaN  0.458331

and

In [35]: tmp
Out[35]: 'i=%(idx)s'

Output should be something like the following:

          0         1         2         3
0       i=0  0.069419       i=0       i=0
1  2.439000  1.943944  0.279904  0.755746
2  0.013795  1.189474  0.834894  2.202108
3  0.520385       i=3       i=3  1.451822
4  0.153863  0.957394       i=4  0.052726
5  1.274204       i=5       i=5  0.169636
6       i=6  1.031703       i=6  0.267850
7  0.419157       i=7       i=7  0.409045
8       i=8  1.526764  0.947936  0.442226
9       i=9       i=9       i=9  0.458331

Just trying to fill the nans with the index.

Tried

In [32]: df.fillna(df.index)

ValueError: invalid fill value with a <class 'pandas.core.index.Int64Index'>

Tried

In [33]: df.replace(np.nan, df.index)

TypeError: Invalid "to_replace" type: 'float'

Tried

In [41]: df.fillna(df.index.values)

ValueError: invalid fill value with a <type 'numpy.ndarray'>

Tried

In [53]: df1 = df.astype(object)

and repeated the attempts above, but received the same errors.

Using pandas==0.17.1

Anton Protopopov · Accepted Answer · 2016-02-24 04:51:44Z

3

Similar to @maxymoo's solution using where, but with a pd.Series and axis=0 instead of a per-column lambda:

s = pd.Series(['i={}'.format(i) for i in df.index])

In [49]: df.where(df.notnull(), s, axis=0)
Out[49]:
          0         1         2         3
0       i=0  0.069419       i=0       i=0
1     2.439   1.94394  0.279904  0.755746
2  0.013795   1.18947  0.834894   2.20211
3  0.520385       i=3       i=3   1.45182
4  0.153863  0.957394       i=4  0.052726
5    1.2742       i=5       i=5  0.169636
6       i=6    1.0317       i=6   0.26785
7  0.419157       i=7       i=7  0.409045
8       i=8   1.52676  0.947936  0.442226
9       i=9       i=9       i=9  0.458331
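
A minimal sketch of the same idea using the question's original template 'i=%(idx)s' rather than str.format. The helper name fill_nans_with_template and the sample frame are hypothetical, made up for illustration. Building the label Series with index=df.index is an addition, so that the labels still line up when the index is not the default 0..n-1. Casting to object first is also an assumption, made so that the mixed float/string result is explicit.

```python
import numpy as np
import pandas as pd


def fill_nans_with_template(df, template="i=%(idx)s"):
    """Return a copy of df with each NaN replaced by the template rendered for its row label."""
    # One label per row, indexed like df so where() can align along axis 0.
    labels = pd.Series(
        [template % {"idx": idx} for idx in df.index], index=df.index
    )
    # Cast to object up front: the result mixes floats and strings.
    return df.astype(object).where(df.notna(), labels, axis=0)


# Hypothetical sample data with a non-default index.
sample = pd.DataFrame(
    {"a": [np.nan, 1.5, 2.5], "b": [0.5, np.nan, np.nan]},
    index=[10, 20, 30],
)
filled = fill_nans_with_template(sample)
print(filled)
```

With a default RangeIndex, a Series built without an explicit index happens to align as well, which is why the shorter form above works on the question's data.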

Timing:

def f1():
    nan_strings = ["i={}".format(i) for i in df.index]
    df.apply(lambda c: c.where(c.notnull(), nan_strings))

def f2():
    s = pd.Series(['i={}'.format(i) for i in df.index])
    df.where(df.notnull(), s, axis=0)

In [51]: %timeit f1()
100 loops, best of 3: 5.17 ms per loop

In [52]: %timeit f2()
1000 loops, best of 3: 1.34 ms per loop
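
%timeit is an IPython magic. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The random frame below is a hypothetical stand-in for the question's df, and the two functions are close variants of f1 and f2 (they cast to object first and pass the labels as an indexed Series), not exact copies. Absolute timings depend on the machine and the pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df: 10x4 floats with some NaNs.
rng = np.random.default_rng(0)
values = rng.random((10, 4))
values[rng.random((10, 4)) < 0.4] = np.nan
df = pd.DataFrame(values)


def column_by_column():
    # Variant of f1: apply where() to one column at a time.
    labels = pd.Series(["i={}".format(i) for i in df.index], index=df.index)
    return df.astype(object).apply(lambda c: c.where(c.notna(), labels))


def whole_frame():
    # Variant of f2: a single where() call, broadcasting the labels along axis 0.
    labels = pd.Series(["i={}".format(i) for i in df.index], index=df.index)
    return df.astype(object).where(df.notna(), labels, axis=0)


# Best of 3 repeats, 20 calls each, reported as seconds per call.
t_apply = min(timeit.repeat(column_by_column, number=20, repeat=3)) / 20
t_where = min(timeit.repeat(whole_frame, number=20, repeat=3)) / 20
print("apply: {:.6f} s/call, where: {:.6f} s/call".format(t_apply, t_where))
```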

answered Feb 24, 2016 at 4:51

Anton Protopopov


2 Comments

Jason Strimpel Over a year ago

Nice one, +1 for %timeit

miriamsimone Over a year ago

very nice, didn't know about using axis to broadcast across the columns

miriamsimone · Accepted Answer · 2016-02-24 04:47:22Z

You can use where to do your substitution (it's kind of like assignment with a reversed mask), but you'll need to apply it column by column. I can't think of how to do it all at once:

In [1]: nan_strings = ["i={}".format(i) for i in df.index]

In [2]: df.apply(lambda c: c.where(c.notnull(), nan_strings))
Out[2]:
          0         1         2         3
0       i=0  0.069419       i=0       i=0
1     2.439   1.94394  0.279904  0.755746
2  0.013795   1.18947  0.834894   2.20211
3  0.520385       i=3       i=3   1.45182
4  0.153863  0.957394       i=4  0.052726
5    1.2742       i=5       i=5  0.169636
6       i=6    1.0317       i=6   0.26785
7  0.419157       i=7       i=7  0.409045
8       i=8   1.52676  0.947936  0.442226
9       i=9       i=9       i=9  0.458331
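
The "reversed mask" remark can be illustrated on a single column: where keeps the values for which the condition is True and substitutes everywhere else, while mask substitutes where its condition is True. The column and labels below are hypothetical sample data, and the replacement is passed as a Series rather than a list so that alignment by index is explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column with two missing values.
col = pd.Series([np.nan, 1.5, np.nan], dtype=object)
labels = pd.Series(["i={}".format(i) for i in col.index], index=col.index)

# where: keep entries that are not null, take the label otherwise.
kept = col.where(col.notna(), labels)

# mask: the same substitution, expressed with the inverted condition.
masked = col.mask(col.isna(), labels)

print(kept.tolist())
print(masked.tolist())
```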

Kartik · Accepted Answer · 2016-02-24 04:55:59Z

0

OK, so what you are doing is going to cause problems. First, your columns all appear to be float64, while 'i=%(idx)s' is a string. So you will either have to convert all columns to object or fill the NaNs with float values. That said, why don't you try this and let me know if it gives you your answer:

df.fillna(df.index.values, inplace=True)
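
As the question reports, passing an array to fillna on a DataFrame raises ValueError in pandas 0.17.1. For the purely numeric case (each NaN replaced by its row's index value, so the columns stay float), one possible sketch uses where with axis=0, the technique from the other answers. The sample frame is hypothetical, and this is a different approach from the fillna call above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with a default integer index.
sample = pd.DataFrame({"a": [np.nan, 1.5, 2.5], "b": [0.5, np.nan, np.nan]})

# Row labels as floats, indexed like the frame so they align along axis 0.
row_values = pd.Series(sample.index, index=sample.index, dtype="float64")

# Keep existing values; where a cell is NaN, take that row's index value.
filled = sample.where(sample.notna(), row_values, axis=0)
print(filled)
```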

Since you say bonus, let's try to convert the columns to object type first:

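# One label per row, indexed like df so fillna can align on the index
fill_val = pd.Series(['i={}'.format(i) for i in df.index.values], index=df.index)
# astype returns a new object and has no inplace option, so reassign
df = df.astype('object')
# fillna rejects a plain list; filling each column from a Series aligns by row
df = df.apply(lambda col: col.fillna(fill_val))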

edited Feb 24, 2016 at 4:55

answered Feb 24, 2016 at 4:37

Kartik


4 Comments

Jason Strimpel Over a year ago

Should have noted I tried using index values, does not like numpy arrays: ValueError: invalid fill value with a <type 'numpy.ndarray'> Understood on the float v. object.

Jason Strimpel Over a year ago

You can do this in one line: df.astype(object, inplace=True)

Kartik Over a year ago

@strimp099: That error can be fixed if you convert it to list: list(df.index). I'm still Pandas 0.13 guy! Haha, that's when I started, haven't kept up with the changes in all functions.

Jason Strimpel Over a year ago

Tell me about it mate, started with 0.03 :)
