
I am trying to convert strings to floats, but I get the error in the title. I don't understand why a period ('.') is not recognised as a decimal separator. Here is the head of my DataFrame.

      Country                                           Variable  \
0  Afghanistan                 Inflation, GDP deflator (annual %)   
1  Afghanistan                            GDP (constant 2010 US$)   
2  Afghanistan                                  Population, total   
3  Afghanistan                       Population ages 15-64, total   
4  Afghanistan  Employment to population ratio, 15+, total (%)...   

2007 [YR2007]     2008 [YR2008]      2009 [YR2009]     2010 [YR2010]  \
0  22.3820157780035  2.17910328500052  -2.10708255443797  9.43779477259656   
1  11721187594.2052    12144482858.18   14697331940.6464  15936800636.2487   
2          26616792          27294031           28004331          28803167   
3          13293041          13602366           13950492          14372378   
4  47.1220016479492  47.0480003356934    47.015998840332  47.0429992675781   

And here is the code (Python 3.6):

growth_raw.iloc[:,3:] = growth_raw.iloc[:,3:].values.astype('float64')

I get:

ValueError: could not convert string to float: '.'

Any wise thoughts appreciated. Many thanks.

Update: I had accidentally converted NAs '..' to '.'. I have now converted them to ''. I now get

ValueError: could not convert string to float:

I have tried

growth_raw.apply(lambda x: x.str.strip())

For conversion, I have tried

growth_raw.iloc[:,2:].values.astype(float)

This gives me the above error. I have also tried the following two calls, which raise no error but leave the data unchanged:

growth_raw.iloc[:,2:].apply(lambda x: pd.to_numeric(x), axis=0)
growth_raw.iloc[:,2:].apply(pd.to_numeric,errors='coerce')
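A likely reason the last two calls appear to do nothing is that `apply` returns a new DataFrame rather than modifying `growth_raw` in place, so the result has to be assigned back. A minimal sketch, assuming a small hypothetical stand-in for `growth_raw` (the sample rows and the `value_cols` and `converted` names are illustrative and not taken from the real data):

```python
import pandas as pd

# Hypothetical stand-in for growth_raw; '..' marks a missing value.
growth_raw = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan"],
    "Variable": ["Population, total", "GDP (constant 2010 US$)"],
    "2007 [YR2007]": ["26616792", ".."],
    "2008 [YR2008]": ["27294031", "12144482858.18"],
})

# Columns holding the yearly values (everything after the two label columns).
value_cols = growth_raw.columns[2:]

# apply() builds a new frame; growth_raw itself still holds strings here.
converted = growth_raw[value_cols].apply(pd.to_numeric, errors="coerce")
print(growth_raw.dtypes)

# Assigning the result back by column label replaces the string columns.
growth_raw[value_cols] = converted
print(growth_raw.dtypes)
```

Assigning by column label is used here on the assumption that the goal is to replace the columns outright; with `errors="coerce"` the '..' entry ends up as NaN.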
  • It seems like 0.0 might be represented as just '.'. How do you want to handle that data? Commented Oct 23, 2017 at 16:14
  • Use pd.to_numeric Commented Oct 23, 2017 at 16:23
  • Thanks guys. Tried both. I have updated my original post. Commented Oct 23, 2017 at 17:42
  • Could not figure it out, but had no problems with R: growth_raw[,3:11] = lapply(growth_raw[,3:11], as.numeric) Commented Oct 24, 2017 at 16:31

2 Answers


Use pd.to_numeric with errors='coerce' to be on the safe side (the real data may contain some bad values), i.e.:

df.iloc[:,3:].apply(pd.to_numeric,errors='coerce')
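To illustrate what `errors='coerce'` changes, here is a small sketch using a made-up Series (the `raw` values are assumptions, chosen to resemble the question's data). By default `pd.to_numeric` raises on the first unparseable string, whereas with `errors='coerce'` that entry becomes NaN:

```python
import pandas as pd

# Made-up sample: two valid numbers and one '..' placeholder.
raw = pd.Series(["22.38", "..", "47.12"])

# Default behaviour (errors='raise'): the '..' entry triggers a ValueError.
try:
    pd.to_numeric(raw)
    strict_failed = False
except ValueError as exc:
    strict_failed = True
    print("strict conversion failed:", exc)

# With errors='coerce', unparseable entries are replaced by NaN instead.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)
```

Note that the call returns a new object, so the result still needs to be stored somewhere to have any effect on the original frame.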

1 Comment

Thanks. Tried that.

There seems to be nothing wrong with this sample of data and the way you convert it works fine for me. So what is causing the problem is somewhere else in the data.

I had accidentally converted NAs '..' to '.'. I have now converted them to ''.

Why did you do that? I don't follow. How is pandas supposed to convert '' (an empty string) to a float? Try float('') in an interactive session and you will get the error you're reporting here. Just leave the NaNs alone and see what happens.
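The point about the empty string can be checked with plain Python, no pandas needed. In this sketch `try_float` is a hypothetical helper written only for the demonstration:

```python
def try_float(text):
    """Return float(text), or None if Python cannot parse it."""
    try:
        return float(text)
    except ValueError:
        return None

# Surrounding whitespace is tolerated, but '', '.' and '..' are not numbers.
for sample in ["22.38", " 47.12 ", "", ".", ".."]:
    print(repr(sample), "->", try_float(sample))
```

So replacing the '..' placeholders with '.' or '' only swaps one unparseable string for another.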

Would you also please provide the full traceback of the error? It looks like you have '.' where it's supposed to be a number.

4 Comments

I had done the conversion because I had NAs as '..' and I received ValueError: could not convert string to float: '..' The full traceback of the error was the following:

Traceback (most recent call last):
  File "C:\Users\user\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-30-18b4e27ef82a>", line 1, in <module>
    growth_raw.iloc[:,3].values.astype('float64')
ValueError: could not convert string to float: '..'
@Minsky Do you have your original data in csv? If so, then you don't have to convert your double dots to NaNs, pandas can do this for you. Just load it as df = pd.read_csv(path_to_data, na_values='..') and you will get a frame of float convertible strings. To convert the data, you may use applymap or convert_objects.
@Minsky If that piece of advice is helpful, then I will make it my answer so that other people having similar problems may use it too. So please, care to reply.
Sorry about the delay. Yes, I had actually noticed that, but I wanted to keep these values distinct because I also had some extra 'NaN' rows which I wanted to drop easily before converting the double dots to 'NaN'. After doing so, I stripped the strings with growth_raw.apply(lambda x: x.str.strip()) and the conversion worked fine. Thanks for advising not to convert anything to ''. That was very helpful. I had assumed that pandas could read it as NaN as it is often used for stripping spaces.
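The two routes discussed in these comments can be sketched as follows. Both use made-up data (the CSV text, the `raw` frame, and all variable names are assumptions for illustration). The first lets `read_csv` treat '..' as missing via `na_values`; the second strips whitespace, turns the remaining '..' into NaN, and then converts. Note that `convert_objects`, mentioned above, was removed in later pandas versions, so `pd.to_numeric` is used instead.

```python
import io
import pandas as pd

# Route 1: mark '..' as missing while loading (made-up CSV content).
csv_text = """Country,Variable,2007 [YR2007],2008 [YR2008]
Afghanistan,"Population, total",26616792,27294031
Afghanistan,"GDP (constant 2010 US$)",..,12144482858.18
"""
df = pd.read_csv(io.StringIO(csv_text), na_values="..")
print(df.dtypes)  # the year columns are parsed as numbers directly

# Route 2: clean an already-loaded frame of padded strings.
raw = pd.DataFrame({"2007 [YR2007]": [" 22.38", ".. ", "47.12 "]})
cleaned = raw.apply(lambda col: col.str.strip())   # remove padding
cleaned = cleaned.where(cleaned != "..")           # '..' becomes NaN
numeric = cleaned.apply(pd.to_numeric)             # strings become floats
print(numeric)
```

Route 1 is the shorter option when every '..' should be treated as missing; Route 2 fits the case described above, where the placeholders need to stay distinct until other rows have been dropped.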
