
In my application, I receive a pandas DataFrame (say, block) that has a column called est. This column can contain a mix of strings and floats. I need to convert every value in the column to a float so that the column's dtype is float64. I currently do so with the following code:

block[est].convert_objects(convert_numeric=True)
block[est].astype('float')

This works for most cases. However, in one case, est contains all empty strings. In this case, the first statement executes without error, but the empty strings in the column remain empty strings. The second statement then causes an error: ValueError: could not convert string to float:.

How can I modify my code to handle a column with all empty strings?

Edit: I know I can just do block[est].replace("", np.NaN), but I was wondering if there's some way to do it with just convert_objects or astype that I'm missing.
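
For reference, the replace-based workaround mentioned in the edit above could look roughly like the sketch below. It assumes the column holds only numeric strings, floats, and empty strings; the helper name to_float_column and the sample values are made up for illustration. Series.replace and Series.astype are long-standing pandas methods, but this sketch has not been checked against pandas 0.16.2 specifically.

```python
import numpy as np
import pandas as pd

# Illustrative data: an all-empty column and a mixed column.
empty_block = pd.DataFrame({"eps": ["", ""]}, dtype=object)
mixed_block = pd.DataFrame({"eps": ["", "1.0", 2.5]}, dtype=object)


def to_float_column(series):
    """Hypothetical helper: map empty strings to NaN, then cast to float."""
    # Replacing "" first leaves astype(float) with nothing it cannot parse,
    # provided every remaining value is a float or a numeric string.
    return series.replace("", np.nan).astype(float)


empty_block["eps"] = to_float_column(empty_block["eps"])
mixed_block["eps"] = to_float_column(mixed_block["eps"])
```

Unlike a coercing conversion, this still raises ValueError on a non-numeric string such as "a", so valid data is never silently turned into NaN.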

Clarification: For project-specific reasons, I need to use pandas 0.16.2.

Here's an interaction with some sample data that demonstrates the failure:

>>> block = pd.DataFrame({"eps":["", ""]})
>>> block = block.convert_objects(convert_numeric=True)
>>> block["eps"]
0
1
Name: eps, dtype: object
>>> block["eps"].astype('float')
...
ValueError: could not convert string to float:
  • Possible duplicate of How to remove characters from floats? Commented Feb 17, 2016 at 19:35
  • I don't think so. I'm having problems here specifically with empty strings, not with modifying non-empty values. Commented Feb 17, 2016 at 19:50
  • Could you please post a few lines of sample data? Commented Feb 17, 2016 at 19:50

2 Answers


It's easier to do this with pandas.to_numeric, which was added in pandas 0.17.0:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.to_numeric.html

import pandas as pd
df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})

df['eps'] = pd.to_numeric(df['eps'], errors='coerce')

errors='coerce' converts any value that cannot be parsed as a number, including the empty string, to NaN:

df['eps'].astype('float')
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64

You can then apply other numeric functions without getting errors:

df['eps'].round()
0    1.0
1    2.0
2    2.0
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64
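
Applied to the all-empty column from the question, the same call might look like the sketch below. Note that pd.to_numeric requires pandas 0.17.0 or later, so it does not satisfy the 0.16.2 constraint stated in the question; the column name eps follows the question's sample data.

```python
import pandas as pd

# The all-empty column from the question's sample data.
block = pd.DataFrame({"eps": ["", ""]}, dtype=object)

# errors="coerce" turns anything unparseable, including "", into NaN,
# so the resulting column is numeric rather than object.
block["eps"] = pd.to_numeric(block["eps"], errors="coerce")
```

Because the column is already float64 after this step, a follow-up astype('float') is a no-op rather than a source of ValueError.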

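Alternatively, use a small helper that falls back to NaN for anything float() rejects:

import numpy as np
import pandas as pd

def convert_float(val):
    # Return val as a float, or NaN if it cannot be parsed.
    try:
        return float(val)
    except ValueError:
        return np.nan

df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})
>>> df.eps.apply(convert_float)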
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64

4 Comments

Doesn't work with my version of pandas: AttributeError: 'DataFrame' object has no attribute 'data'
Sorry, that is the name I gave the column. That should be eps per your example.
This replaces valid data with NaN, which is not what I'm looking to do. E.g., if my DataFrame contains ["", "1.0"], the "1.0" is also converted to NaN.
That's why sample data goes a long way. Could you please post some?
