I have a DataFrame that contains NaN values, and I would like to fill those NaNs with the index value of their row. The actual use case is filling the NaNs with a string template containing the index value, which you can answer as a bonus.

Given:

In [31]: df
Out[31]:
          0         1         2         3
0       NaN  0.069419       NaN       NaN
1  2.439000  1.943944  0.279904  0.755746
2  0.013795  1.189474  0.834894  2.202108
3  0.520385       NaN       NaN  1.451822
4  0.153863  0.957394       NaN  0.052726
5  1.274204       NaN       NaN  0.169636
6       NaN  1.031703       NaN  0.267850
7  0.419157       NaN       NaN  0.409045
8       NaN  1.526764  0.947936  0.442226
9       NaN       NaN       NaN  0.458331

and

In [35]: tmp
Out[35]: 'i=%(idx)s'

Output should be something like the following:

          0         1         2         3
0       i=0  0.069419       i=0       i=0
1  2.439000  1.943944  0.279904  0.755746
2  0.013795  1.189474  0.834894  2.202108
3  0.520385       i=3       i=3  1.451822
4  0.153863  0.957394       i=4  0.052726
5  1.274204       i=5       i=5  0.169636
6       i=6  1.031703       i=6  0.267850
7  0.419157       i=7       i=7  0.409045
8       i=8  1.526764  0.947936  0.442226
9       i=9       i=9       i=9  0.458331

So far I am just trying to fill the NaNs with the plain index value.

Tried

In [32]: df.fillna(df.index)

ValueError: invalid fill value with a <class 'pandas.core.index.Int64Index'>

Tried

In [33]: df.replace(np.nan, df.index)

TypeError: Invalid "to_replace" type: 'float'

Tried

In [41]: df.fillna(df.index.values)

ValueError: invalid fill value with a <type 'numpy.ndarray'>

Tried

In [53]: df1 = df.astype(object)

and repeating the above, received same errors.

Using pandas==0.17.1

3 Answers

Similar to @maxymoo's solution using where, but with a pd.Series instead of a lambda:

s = pd.Series(['i={}'.format(i) for i in df.index])

In [49]: df.where(df.notnull(), s, axis=0)
Out[49]:
          0         1         2         3
0       i=0  0.069419       i=0       i=0
1     2.439   1.94394  0.279904  0.755746
2  0.013795   1.18947  0.834894   2.20211
3  0.520385       i=3       i=3   1.45182
4  0.153863  0.957394       i=4  0.052726
5    1.2742       i=5       i=5  0.169636
6       i=6    1.0317       i=6   0.26785
7  0.419157       i=7       i=7  0.409045
8       i=8   1.52676  0.947936  0.442226
9       i=9       i=9       i=9  0.458331
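
For the bonus part of the question, the same where call can be driven by the 'i=%(idx)s' template. The following is a minimal sketch, not a definitive implementation: the small frame is made-up stand-in data, the names labels and filled are illustrative, and the explicit cast to object is an assumption added so that strings and floats can share a column on newer pandas versions.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; the values are illustrative, not the question's data.
df = pd.DataFrame(
    {0: [np.nan, 2.439, 0.013795], 1: [0.069419, np.nan, 1.189474]},
    index=[0, 1, 2],
)

tmp = 'i=%(idx)s'  # the template from the question

# One label per row, indexed like df so that alignment works for any index.
labels = pd.Series([tmp % {'idx': i} for i in df.index], index=df.index)

# Cast to object first so strings and floats can live in the same column,
# then keep the non-null cells and take the row's label everywhere else.
filled = df.astype(object).where(df.notnull(), labels, axis=0)
print(filled)
```

Passing index=df.index when building the Series matters once the frame no longer has the default 0..n-1 index, because where aligns the Series on the row labels when axis=0.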

Timing:

def f1():
    nan_strings = ["i={}".format(i) for i in df.index]
    df.apply(lambda c: c.where(c.notnull(), nan_strings))

def f2():
    s = pd.Series(['i={}'.format(i) for i in df.index])
    df.where(df.notnull(), s, axis=0)

In [51]: %timeit f1()
100 loops, best of 3: 5.17 ms per loop

In [52]: %timeit f2()
1000 loops, best of 3: 1.34 ms per loop

2 Comments

Nice one, +1 for %timeit
very nice, didn't know about using axis to broadcast across the columns

You can use where to do your substitution (it's kind of like assignment with a reversed mask), but you'll need to apply it column by column; I can't think of a way to do it all at once:

In [1]: nan_strings = ["i={}".format(i) for i in df.index]

In [2]: df.apply(lambda c: c.where(c.notnull(), nan_strings))
Out[2]:
          0         1         2         3
0       i=0  0.069419       i=0       i=0
1     2.439   1.94394  0.279904  0.755746
2  0.013795   1.18947  0.834894   2.20211
3  0.520385       i=3       i=3   1.45182
4  0.153863  0.957394       i=4  0.052726
5    1.2742       i=5       i=5  0.169636
6       i=6    1.0317       i=6   0.26785
7  0.419157       i=7       i=7  0.409045
8       i=8   1.52676  0.947936  0.442226
9       i=9       i=9       i=9  0.458331

OK, so what you are doing is going to cause problems. First, your columns all appear to be float64, whereas 'i=%(idx)s' is a string. So you will either have to convert all columns to object, or fill the NaNs with float values. That said, why don't you try this and let me know whether it gives you your answer:

df.fillna(df.index.values, inplace=True)

Since you say bonus, let's try to convert the columns to object type first:

fill_val = ['i={}'.format(i) for i in df.index.values]
df = df.astype('object')  # astype returns a new frame rather than converting in place
df.fillna(fill_val, inplace=True)
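
As a hedged variant of the fillna idea: on current pandas versions DataFrame.fillna does not accept a list as the fill value, and a Series passed to it is matched against column labels rather than rows. One possible workaround, sketched here on made-up stand-in data with illustrative names, is to fill each column separately with a Series that shares the frame's index.

```python
import numpy as np
import pandas as pd

# Stand-in frame; the values are illustrative only.
df = pd.DataFrame({0: [np.nan, 2.439], 1: [0.069419, np.nan]})

# Index-aligned fill values, one string per row.
fill_val = pd.Series(['i={}'.format(i) for i in df.index], index=df.index)

# Work on an object-dtype copy so the string fills fit next to the floats.
df_obj = df.astype(object)

# Series.fillna aligns a Series argument on the index, so each column
# picks up the fill value belonging to the row of its missing cell.
filled = df_obj.apply(lambda col: col.fillna(fill_val))
print(filled)
```

This is effectively the column-by-column approach again, only expressed with fillna instead of where.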

4 Comments

Should have noted I tried using index values, does not like numpy arrays: ValueError: invalid fill value with a <type 'numpy.ndarray'> Understood on the float v. object.
You can do this in one line: df.astype(object, inplace=True)
@strimp099: That error can be fixed if you convert it to a list: list(df.index). I'm still a Pandas 0.13 guy! Haha, that's when I started, and I haven't kept up with the changes in all the functions.
Tell me about it mate, started with 0.03 :)
