Python - Pandas CSV file with error when converting mixed data types string to number

Question

I have a large CSV file with lots of data. The main column I am interested in, 'Result', contains integers, floats, NaN values, and also numbers stored as text. I need to aggregate 'Result', but before I do, I want to convert the column to the float data type. The text values have trailing spaces, like the following: ["1.07 ", "8.22 ", "8.6 ", "11.41 ", "7.93 "]

The error I get is...

AttributeError: Can only use .str accessor with string values!

import pandas as pd
import os
import numpy as np

csv_file = 'c:/path/to/file/big.csv'
# ... more lines of code ...

df = pd.read_csv(csv_file, usecols=my_cols, parse_dates=['Date'])
df = df[df['Company ID'].str.contains(my_company)]
print('df of csv created')
# Above code works great. 

# the below 2 tries did not work for me
# df['Result'] = pd.to_numeric(df['Result'].str.replace(' ', ''), errors='ignore')
# df['Result'] = df['Result'].str.strip() # causes an error 

# now let's try np.where...
# the below causes AttributeError: Can only use .str accessor with string values! 
df['Result'] = np.where(df['Result'].dtype == np.str, df['Result'].str.strip(), 
df['Result'])
df['Result'] = pd.to_numeric(df['Result'], downcast="float", errors='raise')

How should I resolve this?

If I remove the line df['Result'] = np.where(df['Result'].dtype == np.str, df['Result'].str.strip(), df['Result']) I get an error ValueError: Unable to parse string " " at position 1283
– Shane S, Commented Feb 11, 2022 at 0:53

Park · Accepted Answer · 2022-02-16 01:05:15Z


Why don't you try this code, which explicitly converts all the values to strings using astype(str) before calling .str.strip()?

import pandas as pd

df = pd.DataFrame({
    'Result': [' a ', ' b', 'c ']
})

df['Result'] = df['Result'].astype(str).str.strip()
print(df['Result'])

#0    a
#1    b
#2    c
#Name: Result, dtype: object

Sometimes I use this code when a Series includes NaN or numbers, to avoid getting the error message.
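To connect this back to the question's goal of getting a float column, here is a minimal sketch that applies the same astype(str).str.strip() step and then converts with pd.to_numeric. The sample values are hypothetical stand-ins for the asker's data, and errors="coerce" is an assumption on my part: it turns anything that still cannot be parsed (such as a blank " " entry) into NaN rather than raising, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the question's mixed 'Result' column:
# ints, floats, NaN, numbers stored as text with trailing spaces, and a blank.
df = pd.DataFrame({"Result": [1, 2.5, np.nan, "1.07 ", "8.22 ", " "]})

# Cast every value to str first so the .str accessor is valid for all rows,
# then strip the surrounding whitespace.
cleaned = df["Result"].astype(str).str.strip()

# Convert to numbers; errors="coerce" (an assumption, not from the answer)
# replaces unparseable entries with NaN instead of raising ValueError.
df["Result"] = pd.to_numeric(cleaned, errors="coerce")

print(df["Result"].dtype)
print(df["Result"].sum())  # NaN values are skipped by default when summing
```

If silently dropping bad values is not acceptable, keeping errors="raise" after the strip step should at least point to the first offending row.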

edited Feb 16, 2022 at 1:05

answered Feb 11, 2022 at 3:06


2 Comments

Shane S Over a year ago

I did use this method, but I modified it because I got AttributeError: 'Series' object has no attribute 'strip'. So I did it in 2 steps: df['Result'] = df['Result'].astype(str), then df['Result'] = df['Result'].str.strip()

Park Over a year ago

@Shane Oh, sorry, I forgot to add '.str' in there. You can do it in one line. I just edited my code with a sample dataset. Thank you for pointing that out :)
