Pandas: convert column with empty strings to float

Question

In my application, I receive a pandas DataFrame (say, block) that has a column called est. This column can contain a mix of strings and floats. I need to convert all values in the column to floats and have the column type be float64. I do so using the following code:

block[est].convert_objects(convert_numeric=True)
block[est].astype('float')

This works for most cases. However, in one case, est contains all empty strings. In this case, the first statement executes without error, but the empty strings in the column remain empty strings. The second statement then causes an error: ValueError: could not convert string to float:.

How can I modify my code to handle a column with all empty strings?

Edit: I know I can just do block[est].replace("", np.NaN), but I was wondering if there's some way to do it with just convert_objects or astype that I'm missing.
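The replace-based workaround mentioned in the edit can be sketched as follows. This is a minimal sketch rather than a recipe verified on pandas 0.16.2: it relies only on Series.replace and Series.astype, the helper name empty_to_nan_float is hypothetical, and the sample values are assumptions for illustration. Note that astype(float) will still raise ValueError for non-numeric strings other than the empty string.

```python
import numpy as np
import pandas as pd

def empty_to_nan_float(col):
    # Hypothetical helper: swap empty strings for NaN, then cast to float64.
    return col.replace("", np.nan).astype(float)

# Assumed sample data: one mixed column and one column of only empty strings.
mixed = pd.Series(["", "1.5", 2.0], dtype=object, name="est")
all_empty = pd.Series(["", ""], dtype=object, name="est")

mixed_f = empty_to_nan_float(mixed)
all_empty_f = empty_to_nan_float(all_empty)

print(mixed_f.dtype, all_empty_f.dtype)
```

Both results should come out as float64, with NaN wherever the input held an empty string.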

Clarification: For project-specific reasons, I need to use pandas 0.16.2.

Here's an interaction with some sample data that demonstrates the failure:

>>> block = pd.DataFrame({"eps":["", ""]})
>>> block = block.convert_objects(convert_numeric=True)
>>> block["eps"]
0
1
Name: eps, dtype: object
>>> block["eps"].astype('float')
...
ValueError: could not convert string to float:

I don't think so. I'm having problems here specifically with empty strings, not with modifying non-empty values.
– LateCoder, Commented Feb 17, 2016 at 19:50

Community · Accepted Answer · 2020-06-20 09:12:55Z

36

It's easier to do this with pandas.to_numeric:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.to_numeric.html

import pandas as pd
df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})

df['eps'] = pd.to_numeric(df['eps'], errors='coerce')

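errors='coerce' replaces every value that cannot be parsed as a number, including the empty string, with NaN. The column is then already float64, so the astype call is just a check: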

df['eps'].astype('float')
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64

You can then apply other numeric functions without getting errors:

df['eps'].round()
0    1.0
1    2.0
2    2.0
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64
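For the case in the question, a column made up entirely of empty strings, the same call can be checked with a short sketch. The frame below is an assumed reconstruction of the question's sample data. Keep in mind that pd.to_numeric was added in pandas 0.17.0, so this route is not available on 0.16.2.

```python
import pandas as pd

# Assumed reconstruction of the failing sample: every value is an empty string.
block = pd.DataFrame({"eps": pd.Series(["", ""], dtype=object)})

# errors="coerce" turns unparseable values (here, the empty strings) into NaN.
block["eps"] = pd.to_numeric(block["eps"], errors="coerce")

print(block["eps"].dtype)         # float64
print(block["eps"].isna().all())  # True
```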

edited Jun 20, 2020 at 9:12

CommunityBot

answered Aug 8, 2017 at 16:16

mcrrnz

Alexander · Accepted Answer · 2016-02-17 22:32:43Z

0

import numpy as np
import pandas as pd

def convert_float(val):
    # Return val as a float, or NaN if it cannot be parsed (e.g. '' or 'a').
    try:
        return float(val)
    except ValueError:
        return np.nan

df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})

df['eps'].apply(convert_float)
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64

edited Feb 17, 2016 at 22:32

answered Feb 17, 2016 at 20:06

Alexander

4 Comments

LateCoder Over a year ago

Doesn't work with my version of pandas: AttributeError: 'DataFrame' object has no attribute 'data'

Alexander Over a year ago

Sorry, that is the name I gave the column. That should be eps per your example.

LateCoder Over a year ago

This replaces valid data with NaN, which is not what I'm looking to do. E.g., if my DataFrame contains ["", "1.0"], the "1.0" is also converted to NaN.

Alexander Over a year ago

That's why sample data goes a long way. Could you please post some?
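The input raised in the comments, ["", "1.0"], can be checked against convert_float with a short sketch. It assumes an object-dtype column; the None entry and the extra TypeError handling are additions for illustration and are not part of the answer as posted.

```python
import numpy as np
import pandas as pd

def convert_float(val):
    # Same idea as the answer's helper; TypeError is also caught here
    # (an addition) so that None becomes NaN instead of raising.
    try:
        return float(val)
    except (ValueError, TypeError):
        return np.nan

# Assumed sample: an empty string, a valid numeric string, and a missing value.
s = pd.Series(["", "1.0", None], dtype=object, name="eps")
converted = s.apply(convert_float)

print(converted)
```

In this sketch the "1.0" entry is kept as 1.0, and only the empty string and None become NaN.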
