In Chapter 2 of Machine Learning in Action, one example reads records from a file, where each line looks like:
124 110 223 largeDoses
(never mind what the fields actually mean)
One function in kNN.py is:
def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    fr = open(filename)
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector
The problem is that listFromLine[-1] is a string ('largeDoses', etc.), so how can it be converted to an int?
The book says NumPy can handle this.
(From the book: You have to explicitly tell the interpreter that you’d like the integer version of the last item in the list, or it will give you the string version. Usually, you’d have to do this, but NumPy takes care of those details for you.) However, the error
ValueError: invalid literal for int() with base 10: 'largeDoses'
occurs when I run:
import kNN
kNN.file2matrix('dataset.txt')
BTW, the Chinese edition of the book is different from the English edition.


largeDoses to an integer? What value should it give you? Do you have a mapping of string-int pairs?
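
If a string-to-int mapping is what's wanted, here is a minimal sketch of how the labels could be converted explicitly instead of calling int() on them. The mapping itself is an assumption: only 'largeDoses' appears in the question, so the other label names and all of the integer codes are hypothetical and should be adjusted to the real data. The helper name parse_lines is also made up for illustration, and the sketch splits on any whitespace rather than only on tabs.

```python
import numpy as np

# Hypothetical mapping: only 'largeDoses' is taken from the question.
# The other label names and every integer code are assumptions.
LABEL_TO_INT = {'didntLike': 1, 'smallDoses': 2, 'largeDoses': 3}


def parse_lines(lines, label_to_int=LABEL_TO_INT):
    """Turn an iterable of text lines into (feature matrix, integer labels)."""
    # Split each non-blank line on whitespace (covers tabs and spaces).
    rows = [line.split() for line in lines if line.strip()]
    # The first three fields are numeric features.
    features = np.array([[float(x) for x in row[0:3]] for row in rows])
    # The last field is a string label; look it up instead of calling int().
    # An unknown label raises KeyError rather than being silently accepted.
    labels = [label_to_int[row[-1]] for row in rows]
    return features, labels


def file2matrix(filename):
    """Read the file once and delegate the parsing to parse_lines."""
    with open(filename) as fr:
        return parse_lines(fr)
```

With this version, a line such as the one quoted in the question would yield the features 124, 110, 223 and the label code assigned to 'largeDoses' in the mapping. If the file already stores numeric labels in its last column, the original int() call works as written and no mapping is needed.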