
When I run my Python code I get this error:

df = pd.DataFrame(desm)
scaler = StandardScaler()
scaler.fit(df)


ValueError                                Traceback (most recent call last)
<ipython-input-32-266a989a8af0> in <module>()
      1 scaler = StandardScaler()
----> 2 scaler.fit(df)

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit(self, X, y)
    555         # Reset internal state before fitting
    556         self._reset()
--> 557         return self.partial_fit(X, y)
    558 
    559     def partial_fit(self, X, y=None):

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in partial_fit(self, X, y)
    578         X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
    579                         ensure_2d=False, warn_on_dtype=True,
--> 580                         estimator=self, dtype=FLOAT_DTYPES)
    581 
    582         if X.ndim == 1:

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    371                                       force_all_finite)
    372     else:
--> 373         array = np.array(array, dtype=dtype, order=order, copy=copy)
    374 
    375         if ensure_2d:

ValueError: could not convert string to float: 'PMP'

My Python code was:

import pandas as pd
desm = pd.read_csv("G:/BASES DE DATOS/desm4.csv")

I know it has something to do with the CSV format, but I don't know how to fix it. Please help! Here is a link to the CSV file for more information:

https://drive.google.com/file/d/0B7tO-O0lx79FSnR0cVA3MDhrTG8/view?usp=sharing

2 Answers


You are trying to scale a dataset whose first column contains strings rather than floats.
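To see why a text column breaks the scaler, here is a minimal reproduction. It assumes a hypothetical DataFrame with a text site column and two made-up numeric columns (x1, x2) in place of the real desm4.csv. StandardScaler's input check converts the data to a float array, roughly like the np.array(...) call in the last frame of the traceback; the sketch performs that conversion directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for desm4.csv: a text "site" column plus numeric columns.
df = pd.DataFrame({"site": ["PMP", "ABC"], "x1": [1.5, 2.0], "x2": [3.0, 4.5]})

# Roughly the conversion StandardScaler's input validation performs:
# every cell has to become a 64-bit float, and "PMP" cannot.
message = None
try:
    np.array(df, dtype=np.float64)
except ValueError as exc:
    message = str(exc)  # should match the error shown in the traceback
```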

You need to read the dataframe as follows:

import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('desm4.csv', index_col=0)
scaler = StandardScaler()
scaler.fit(df)

Try the code above and let me know if you run into any problems. It takes the site column (the first column of the file) and uses it as the index (a row label for each row). StandardScaler is not applied to the index, so you no longer get the error.
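A minimal sketch of this approach, assuming a small hypothetical CSV in place of desm4.csv (the column names x1, x2 and the rows are made up). It checks that no text columns remain once index_col=0 moves site into the index, then standardizes with plain pandas as a stand-in for StandardScaler, assuming the same formula: subtract each column's mean and divide by its population standard deviation.

```python
import io

import pandas as pd

# Hypothetical CSV content; only the shape of desm4.csv is assumed
# (a text "site" column followed by numeric columns).
csv_text = "site,x1,x2\nPMP,1.0,10.0\nABC,2.0,20.0\nXYZ,3.0,30.0\n"

# index_col=0 moves the first column into the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# List any columns that are still non-numeric (should be none).
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

# Pandas-only standardization mirroring StandardScaler's formula:
# (x - column mean) / population std (ddof=0). Constant columns get a
# scale of 1 instead of 0, as scikit-learn does, to avoid dividing by zero.
std = df.std(ddof=0).replace(0, 1.0)
scaled = (df - df.mean()) / std
```

With scikit-learn installed, StandardScaler().fit(df) on the same df should likewise succeed, because every remaining column can be converted to float.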

Also, you don't need to do

df = pd.DataFrame(desm)

because pd.read_csv already reads the CSV and returns a DataFrame.
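A quick check of that point, using a short hypothetical CSV string instead of the real file:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for desm4.csv.
desm = pd.read_csv(io.StringIO("site,x1\nPMP,1.0\nABC,2.0\n"))

# read_csv already returns a DataFrame...
already_frame = isinstance(desm, pd.DataFrame)

# ...so wrapping it in pd.DataFrame again only yields an equivalent frame.
df = pd.DataFrame(desm)
same_contents = df.equals(desm)
```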


Your problem is related to how read_csv() chooses the index. The default is index_col=None; as the documentation explains, a malformed file can lead pandas to use the first column as the index, and index_col=False forces it not to:

index_col : int or sequence or False, default None

Column to use as the row labels of the DataFrame.
If a sequence is given, a MultiIndex is used. 
If you have a malformed file with delimiters at the end of each line,
you might consider index_col=False to force pandas to _not_ use the first 
column as the index (row names)

Hence, try this:

import pandas as pd
desm = pd.read_csv("G:/BASES DE DATOS/desm4.csv", index_col=False)

I hope it works. Let me know if there are any problems. Happy coding. Cheers!
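The difference between the default and index_col=False is easiest to see on the kind of malformed file the documentation describes. This sketch uses a small CSV string modeled on the example in the pandas I/O documentation (not on desm4.csv), in which every data row ends with an extra delimiter, so data rows have one more field than the header.

```python
import io

import pandas as pd

# Malformed CSV: each data row ends with a trailing delimiter.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default (index_col=None): pandas infers the first field as the index,
# and the trailing empty field ends up as NaN in column "c".
default = pd.read_csv(io.StringIO(data))

# index_col=False: the first field stays a regular column ("a"),
# the trailing empty field is dropped, and a RangeIndex is used.
forced = pd.read_csv(io.StringIO(data), index_col=False)
```

Note that with index_col=False a text column such as site remains an ordinary column, so it would still need to be dropped or moved to the index before the data is passed to StandardScaler.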
