Error in CSV file in Python

Question

When I run my Python code I get this error:

df = pd.DataFrame(desm)
scaler = StandardScaler()
scaler.fit(df)


ValueError                                Traceback (most recent call last)
<ipython-input-32-266a989a8af0> in <module>()
      1 scaler = StandardScaler()
----> 2 scaler.fit(df)

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit(self, X, y)
    555         # Reset internal state before fitting
    556         self._reset()
--> 557         return self.partial_fit(X, y)
    558 
    559     def partial_fit(self, X, y=None):

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in partial_fit(self, X, y)
    578         X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
    579                         ensure_2d=False, warn_on_dtype=True,
--> 580                         estimator=self, dtype=FLOAT_DTYPES)
    581 
    582         if X.ndim == 1:

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    371                                       force_all_finite)
    372     else:
--> 373         array = np.array(array, dtype=dtype, order=order, copy=copy)
    374 
    375         if ensure_2d:

ValueError: could not convert string to float: 'PMP'

My Python code was:

import pandas as pd
desm = pd.read_csv("G:/BASES DE DATOS/desm4.csv")

I know it has something to do with the CSV format, but I don't know how to solve it. Please help! Here is a link to the CSV file for more information:

https://drive.google.com/file/d/0B7tO-O0lx79FSnR0cVA3MDhrTG8/view?usp=sharing

Satyadev · Accepted Answer · 2017-06-01 06:29:44Z

1

You are trying to scale a dataset whose first column contains strings, not floats.

You need to read the dataframe as follows:

import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('desm4.csv',index_col=0)
scaler = StandardScaler()
scaler.fit(df)

Try the code above and let me know if you run into any problems. It takes the site column and uses it as the index (the row label for each row). StandardScaler is not applied to the index, so you no longer get the error.
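A minimal sketch of that approach, using a small made-up table in place of desm4.csv. The column names (site, temp, rain) and all values are assumptions for illustration, not taken from the real file:

```python
import io

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for desm4.csv: a text label column followed by numbers.
csv_text = """site,temp,rain
PMP,20.0,1.0
ABC,22.0,3.0
XYZ,24.0,5.0
"""

# index_col=0 moves the label column into the row index,
# so only the numeric columns remain as data.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

scaler = StandardScaler()
scaled = scaler.fit_transform(df)  # NumPy array, one row per site

# Wrap the result in a DataFrame again to keep the site labels.
scaled_df = pd.DataFrame(scaled, index=df.index, columns=df.columns)
print(scaled_df)
```

Rebuilding a DataFrame at the end is optional; it is only there so the scaled values stay attached to their site labels.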

Also, you don't need this line:

df = pd.DataFrame(desm)

pd.read_csv reads the CSV and already returns a DataFrame.
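Since the result is already a DataFrame, you can inspect it directly to see which columns would fail the conversion to float. A small sketch, again with made-up data (the column names and values are assumptions):

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file.
csv_text = "site,temp,rain\nPMP,20.0,1.0\nABC,22.0,3.0\n"

desm = pd.read_csv(io.StringIO(csv_text))

# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed.
print(type(desm).__name__)

# List the columns that are not numeric; these are the ones a scaler cannot handle.
non_numeric = desm.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```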

edited Jun 1, 2017 at 6:29

answered Jun 1, 2017 at 6:22

Satyadev


Community · Accepted Answer · 2020-06-20 09:12:55Z

0

Your problem is related to how read_csv() handles the index. The default is index_col=None, which leaves it to pandas to decide whether the first column should be used as the index, as its documentation describes:

index_col : int or sequence or False, default None

Column to use as the row labels of the DataFrame.
If a sequence is given, a MultiIndex is used. 
If you have a malformed file with delimiters at the end of each line,
you might consider index_col=False to force pandas to _not_ use the first 
column as the index (row names)

Hence, try using this

import pandas as pd
desm = pd.read_csv("G:/BASES DE DATOS/desm4.csv",index_col = False)
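It may be worth checking what each index_col setting does on your own file before scaling. The sketch below compares the three settings on a small made-up, well-formed CSV (the column names and values are assumptions). On such a file, index_col=False leaves the text column in the data, so the conversion to float can still fail:

```python
import io

import pandas as pd

# Hypothetical, well-formed stand-in for the real CSV file.
csv_text = "site,temp,rain\nPMP,20.0,1.0\nABC,22.0,3.0\n"

default = pd.read_csv(io.StringIO(csv_text))                    # index_col=None
no_index = pd.read_csv(io.StringIO(csv_text), index_col=False)  # never use a column as index
labelled = pd.read_csv(io.StringIO(csv_text), index_col=0)      # first column becomes the index


def converts_to_float(frame):
    """Return True if every data column can be converted to float."""
    try:
        frame.to_numpy(dtype=float)
        return True
    except (ValueError, TypeError):
        return False


for name, frame in [
    ("index_col=None", default),
    ("index_col=False", no_index),
    ("index_col=0", labelled),
]:
    print(name, list(frame.columns), converts_to_float(frame))
```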

I hope it works. Do let me know if there is any problem. Happy Coding. Cheers!

edited Jun 20, 2020 at 9:12

CommunityBot


answered Jun 1, 2017 at 6:18

crazyglasses
