1

In Chapter 2 of Machine Learning in Action, one example reads records from a file, where each line looks like:

124  110 223 largeDoses

(never mind what the values actually mean)

One function in kNN.py is:

from numpy import zeros

def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())       # count records to size the matrix
    returnMat = zeros((numberOfLines, 3))     # one row of 3 features per record
    classLabelVector = []
    fr = open(filename)                       # reopen to read from the start
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))  # fails on 'largeDoses'
        index += 1
    return returnMat, classLabelVector

The problem is that listFromLine[-1] is a string ('largeDoses', etc.), so how can it be converted to an int?

The book says NumPy can handle this.

(From the book: "You have to explicitly tell the interpreter that you’d like the integer version of the last item in the list, or it will give you the string version. Usually, you’d have to do this, but NumPy takes care of those details for you.") However,

ValueError: invalid literal for int() with base 10: 'largeDoses' 

occurs for

import kNN
kNN.file2matrix('dataset.txt')

BTW, the book's Chinese version is different from the English version.


  • How would you convert largeDoses to an integer? What value should it give you? Do you have a mapping of string-int pairs? Commented Oct 10, 2014 at 15:21
  • "In the book, it says numPy can handle this." => So what is the expected output of int('largeDoses')? Commented Oct 10, 2014 at 15:22
  • An error is expected. Commented Oct 10, 2014 at 15:24
  • The LabelsVector in the book's output is [3,1,2,....]. The code doesn't have any string-int mapping. Commented Oct 10, 2014 at 15:25
  • If you don't have a mapping, then there's nothing you can do. Python (or NumPy) can't magically convert a word to an integer arbitrarily. Commented Oct 10, 2014 at 15:27

3 Answers

1

A string like 'largeDoses' (indeed) cannot be converted to an int, neither in Python nor in any other environment.

However, the solution is to put Machine Learning (indeed) in action and store the label itself instead of forcing it into an int.

If all kNN input training / cross-validation records (a.k.a. observations, examples) conform to the convention [3x FEATURE, 1x LABEL], use:

classLabelVector.append( listFromLine[-1] )    # to .append a LABEL, not an int()
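
A minimal sketch of why string labels can be enough, assuming the downstream classifier only takes a majority vote among the labels of the nearest neighbours. The function name `majority_label` and the sample votes are hypothetical, not taken from the book:

```python
from collections import Counter

def majority_label(neighbor_labels):
    """Return the most common label among the k nearest neighbours."""
    # Counter works on any hashable value, so strings need no int() conversion.
    return Counter(neighbor_labels).most_common(1)[0][0]

votes = ['largeDoses', 'smallDoses', 'largeDoses']
print(majority_label(votes))  # largeDoses
```

Integer labels only become necessary if some later step does arithmetic on them.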

1 Comment

OK, so the code is wrong, right? BTW, I originally read the Chinese translation, and then found that its 'LabelsVector' output differs from the English version's (the former's output is int, the latter's is string).
1

You should convert the 'largeDoses', 'smallDoses' and 'didntLike' labels to numbers by hand. A string cannot be converted to an int unless the string itself contains an integer.

# Map each label string to a numeric string that int() can parse.
if listFromLine[-1] == 'largeDoses':
    listFromLine[-1] = '3'
elif listFromLine[-1] == 'smallDoses':
    listFromLine[-1] = '2'
else:
    listFromLine[-1] = '1'
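
The rule that int() only accepts strings that already contain an integer can be checked with a small sketch. `to_int_or_none` is a hypothetical helper for illustration, not part of kNN.py:

```python
def to_int_or_none(text):
    """Return int(text) if text holds an integer literal, else None."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for non-numeric text such as 'largeDoses'.
        return None

print(to_int_or_none('3'))           # 3
print(to_int_or_none('largeDoses'))  # None
```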


0

It can be seen that the labels are not simply cast from string to integer; they are mapped through a lookup table. So the modified code is as follows.

labels = {'didntLike':1,'smallDoses':2,'largeDoses':3}
classLabelVector.append(labels[listFromLine[-1]])
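
Put together, the whole parsing step might look like the sketch below. It assumes tab-separated records with three numeric features followed by one of the three label strings. `parse_records` and `LABELS` are hypothetical names, and plain lists are used instead of a NumPy array only to keep the example dependency-free:

```python
LABELS = {'didntLike': 1, 'smallDoses': 2, 'largeDoses': 3}

def parse_records(lines, labels=LABELS):
    """Split tab-separated records into feature rows and integer labels."""
    features, class_labels = [], []
    for line in lines:
        fields = line.strip().split('\t')
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        features.append([float(value) for value in fields[0:3]])
        # Look the label up in the table; an unknown label raises KeyError.
        class_labels.append(labels[fields[-1]])
    return features, class_labels

sample = ['124\t110\t223\tlargeDoses\n', '40\t8.3\t0.9\tdidntLike\n']
features, class_labels = parse_records(sample)
print(class_labels)  # [3, 1]
```

Letting an unknown label raise KeyError is a deliberate choice here: a typo in the data file surfaces immediately instead of silently becoming class 1.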

