
I'm trying to read a large and complex CSV file with pandas.read_csv. The exact command is

pd.read_csv(filename, quotechar='"', low_memory=True, dtype=data_types, usecols=columns, true_values=['T'], false_values=['F'])

I am pretty sure the data types are correct. I can read the first 16 million lines (by setting nrows=16000000) without problems, but somewhere after that I get the following error:

ValueError: could not convert string to float: '1,123'

It seems that, for some reason, pandas is treating two columns as one.

What could be the problem? How can I fix it?

  • Is it missing an expected delimiter in that row of data? Commented Dec 16, 2015 at 18:03
  •
    Have you done a visual inspection of the line at which the error is raised? Alternatively, could you provide us with that line +/- 1 line (so three lines in total)? Commented Dec 16, 2015 at 18:04
  • If losing some data is not an issue, you could add 'error_bad_lines=False' to skip the problematic rows. Commented Dec 16, 2015 at 18:12
  • I think it is very hard to tell without checking the problematic rows. But you could check for division by zero: a string like something/0 can cause this error. Commented Dec 16, 2015 at 20:39
  •
    How can I find the row? The error message does not say the row. Commented Dec 16, 2015 at 20:55
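One way to locate the offending row, since the error message does not report it, is to read the file in chunks and note which chunk fails. The sketch below uses a small in-memory CSV as a hypothetical stand-in for the real file; the chunk index times the chunk size brackets the bad row:

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real file: the third data row
# holds the quoted value "1,123", which cannot be converted to float.
csv_data = 'a,b\n1.0,2\n3.0,4\n"1,123",5\n'

chunksize = 1  # tiny here for illustration; use e.g. 100_000 on a 16M-line file
reader = pd.read_csv(io.StringIO(csv_data),
                     dtype={"a": float, "b": float},
                     chunksize=chunksize)

bad_chunk = None
i = 0
while True:
    try:
        chunk = next(reader)  # dtype conversion happens per chunk
    except StopIteration:
        break
    except ValueError as exc:
        # The bad row lies in [i * chunksize, (i + 1) * chunksize - 1]
        bad_chunk = i
        print(f"conversion failed in rows {i * chunksize}"
              f"-{(i + 1) * chunksize - 1}: {exc}")
        break
    i += 1
```

With a larger chunk size you would rerun on just the bracketed range (via skiprows and nrows) to narrow it down further.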

1 Answer


I found the mistake. The problem was a thousands separator.

When the CSV file was written, most numbers were below one thousand and were written correctly. However, this one value was greater than one thousand and was written as "1,123", which pandas recognized not as a number but as a string.
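A minimal sketch of the fix, using a small hypothetical sample that reproduces the problem: pandas.read_csv accepts a thousands parameter, which tells the parser to strip that separator before the dtype conversion is applied.

```python
import io
import pandas as pd

# Hypothetical sample: one quoted value contains a thousands separator,
# so forcing a float dtype would fail without help.
csv_data = 'value,flag\n512,T\n"1,123",F\n'

# thousands=',' strips the separator, so "1,123" converts to 1123.0.
df = pd.read_csv(io.StringIO(csv_data), quotechar='"',
                 dtype={'value': float}, thousands=',',
                 true_values=['T'], false_values=['F'])
print(df['value'].tolist())  # [512.0, 1123.0]
```

Note that the field must be quoted in the file (as it was here), otherwise the embedded comma would already have split the row into an extra column at parse time.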
