I am reading in data with
df = pandas.read_csv("file.csv", names=['A','B','C','D','E','F','G', 'H','I','J', 'K'], header=None)
I get
df.dtypes
Out[54]:
A int64
B object
C int64
D int64
E object
F object
G object
H object
I object
J object
K object
dtype: object
The problem is that some of the fields in the original data have been replaced with the string SUPP when their value is less than 6 (but more than 0), so those columns are read as object rather than as a numeric dtype. I tried replacing them with
df.replace('SUPP', 3.0)
but I still don't get numerical data types.
Some typical input data looks like
931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
931,Oxfordshire,9316076,123298,Our Lady's Abingdon,Abingdon,57,57,SUPP,SUPP,16
The problem can be reproduced by just saving the example above as file.csv.
Did you try df.replace('SUPP', 3.0, inplace=True)?

Alternatively, you could treat these values as NaN when reading the file, like
df = pandas.read_csv("file.csv", names=['A','B','C','D','E','F','G', 'H','I','J', 'K'], header=None, na_values=['SUPP'])
This will replace 'SUPP' with NaN, which you should then be able to replace with 3.0, so it should achieve what you want, no?
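
A minimal sketch of the na_values suggestion, run against the four sample rows from the question. To keep it self-contained, the rows are read from an in-memory string instead of file.csv; the names sample and count_cols are made up for this illustration, and 3.0 is simply the fill value the question chose.

```python
import io

import pandas as pd

# The four sample rows from the question, held in memory instead of file.csv.
sample = """931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
931,Oxfordshire,9316076,123298,Our Lady's Abingdon,Abingdon,57,57,SUPP,SUPP,16
"""

names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

# na_values makes the parser treat 'SUPP' as missing, so the affected
# columns are read as numbers (float, because of the NaN) instead of strings.
df = pd.read_csv(io.StringIO(sample), names=names, header=None,
                 na_values=['SUPP'])

# Assumption: G..K are the count columns that may contain SUPP.
count_cols = ['G', 'H', 'I', 'J', 'K']

# Replace the missing values with the chosen stand-in, 3.0.
df[count_cols] = df[count_cols].fillna(3.0)

print(df.dtypes)
```

By contrast, replace on its own will most likely still leave those columns as object, because the other values in them were read as strings; a follow-up conversion (for example pandas.to_numeric on each column) would be needed in that case.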