First, if you are using Python 2.x, you can gain some speed by using xrange() instead of range(). In Python 3.x there is no xrange(), but the built-in range() is basically the same as xrange().
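A quick way to see this in Python 3: range() objects are computed lazily, so they stay tiny no matter how large the span (a sketch, just to illustrate the point):

```python
import sys

# range() in Python 3 is lazy: it stores start/stop/step, not the values,
# so even an astronomically large range costs only a handful of bytes.
big = range(10**12)
print(sys.getsizeof(big))   # small, constant size
print(big[999])             # indexing works without materializing anything
```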
Next, if we are going for speed, we need to write less code, and rely more on Python's built-in features (that are written in C for speed).
You could speed things up by using a generator expression inside of sum() like so:
    from itertools import izip

    def find_best(weights, fields):
        winner = -1
        best = -float('inf')
        for c in xrange(num_category):  # num_category comes from your existing code
            score = sum(float(t[0]) * t[1] for t in izip(fields, weights[c]))
            if score > best:
                best = score
                winner = c
        return winner
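If you are on Python 3, the same function uses the built-in range() and zip(); here is a quick sanity check on made-up toy data (I've also replaced num_category with len(weights), which is an assumption about how your data is laid out):

```python
def find_best(weights, fields):
    # weights: one list of weights per category
    # fields: values (possibly strings) scored against each category
    winner = -1
    best = -float('inf')
    for c in range(len(weights)):
        score = sum(float(t[0]) * t[1] for t in zip(fields, weights[c]))
        if score > best:
            best = score
            winner = c
    return winner

# Toy example: category 1 puts all its weight on the larger field value.
weights = [[1.0, 0.0], [0.0, 1.0]]
fields = ["1", "5"]
print(find_best(weights, fields))  # category 1 scores 5.0 vs 1.0
```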
Applying the same idea again, let's try to use max() to find the best result. I think this code is ugly to look at, but if you benchmark it and it's sufficiently faster, it might be worth it:
    from itertools import izip

    def find_best(weights, fields):
        tup = max(
            ((i, sum(float(t[0]) * t[1] for t in izip(fields, wlist)))
                for i, wlist in enumerate(weights)),
            key=lambda t: t[1]
        )
        return tup[0]
Ugh! But if I didn't make any mistakes, this does the same thing, and it should rely a lot on the C machinery in Python. Measure it and see if it is faster.
So, we are calling max() with a generator expression, and max() will find the largest value the generator produces. But you want the index of the best value, so the generator expression yields a tuple: (index, weight value). It builds each tuple from i and the weight calculated by the same sum() we used above. We pass the generator expression as the first argument, and the second argument is a key function that looks at the weight value from the tuple and ignores the index. Since the generator expression is not the only argument to max(), it needs to be wrapped in parentheses. Finally, once we get back a tuple from max(), we index it to get the index value, and return that.
We can make this much less ugly if we break out a function. This adds the overhead of a function call, but if you measure it I'll bet it isn't too much slower. Also, now that I think about it, it makes sense to build a list of fields values already pre-coerced to float; then we can use that multiple times. Also, instead of using izip() to iterate over two lists in parallel, let's just make an iterator and explicitly ask it for values. In Python 2.x we use the .next() method to ask for a value; in Python 3.x you would use the next() built-in function.
    def fweight(field_float_list, wlist):
        f = iter(field_float_list)
        return sum(f.next() * w for w in wlist)

    def find_best(weights, fields):
        flst = [float(x) for x in fields]
        tup = max(
            ((i, fweight(flst, wlist)) for i, wlist in enumerate(weights)),
            key=lambda t: t[1]
        )
        return tup[0]
If there are 30K fields values, then pre-computing the float() values is likely to be a big speed win.
EDIT: I missed one trick. Instead of the lambda function, I should have used operator.itemgetter() like some of the code in the accepted answer. Also, the accepted answer timed things, and it does look like the overhead of the function call was significant. But the Numpy answers were so much faster that it's not worth playing with this answer anymore.
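For reference, the itemgetter() swap is a one-line change to the key argument. Here it is spelled with next() so it also runs on Python 3; the toy data is invented for the demonstration:

```python
from operator import itemgetter

def fweight(field_float_list, wlist):
    f = iter(field_float_list)
    return sum(next(f) * w for w in wlist)

def find_best(weights, fields):
    flst = [float(x) for x in fields]
    # itemgetter(1) does the tuple indexing in C, avoiding the overhead
    # of calling a Python-level lambda for every candidate
    tup = max(
        ((i, fweight(flst, wlist)) for i, wlist in enumerate(weights)),
        key=itemgetter(1)
    )
    return tup[0]

weights = [[2.0, 0.0], [0.0, 3.0]]
fields = ["4", "1"]
print(find_best(weights, fields))  # 8.0 for category 0 beats 3.0
```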
As for the second part, I don't think it can be sped up very much. I'll try:
    def update_weights(weights, fields, toincrease, todecrease):
        w_inc = weights[toincrease]
        w_dec = weights[todecrease]
        for i, f in enumerate(fields):
            f = float(f)  # see note below
            w_inc[i] += f
            w_dec[i] -= f
So, instead of iterating over an xrange(), here we just iterate over the fields values directly. We have a line that coerces to float.
Note that if the weights values are already float, we don't really need to coerce to float here, and we can save time by just deleting that line.
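In Python 3 (and with the fields values already floats, so the coercion line is dropped), a quick check that the right rows get incremented and decremented — the sample numbers are invented:

```python
def update_weights(weights, fields, toincrease, todecrease):
    w_inc = weights[toincrease]
    w_dec = weights[todecrease]
    for i, f in enumerate(fields):
        w_inc[i] += f
        w_dec[i] -= f

weights = [[1.0, 1.0], [2.0, 2.0]]
update_weights(weights, [0.5, 1.5], toincrease=0, todecrease=1)
print(weights)  # [[1.5, 2.5], [1.5, 0.5]]
```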
Your code was indexing the weights list four times: twice to do the increment, twice to do the decrement. This code does the first index (using the toincrease or todecrease argument) just once per call. It still has to index by i in order for += to work. (My first version tried to avoid this with an iterator and didn't work. I should have tested before posting. But it's fixed now.)
One last version to try: instead of incrementing and decrementing values as we go, just use list comprehensions to build a new list with the values we want:
    def update_weights(weights, field_float_list, toincrease, todecrease):
        f = iter(field_float_list)
        weights[toincrease] = [x + f.next() for x in weights[toincrease]]
        f = iter(field_float_list)
        weights[todecrease] = [x - f.next() for x in weights[todecrease]]
This assumes you have already coerced all the fields values to float, as shown above.
Is it faster, or slower, to replace the whole list this way? I'm going to guess faster, but I'm not sure. Measure and see!
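One way to settle it is the timeit module. This sketch compares the in-place loop against rebuilding the lists (the list sizes are arbitrary, and the rebuild snippet uses the Python 3 next(f) spelling):

```python
import timeit

setup = """
weights = [[float(i) for i in range(1000)] for _ in range(5)]
flst = [float(i) for i in range(1000)]
"""

inplace = """
w_inc = weights[0]
w_dec = weights[1]
for i, f in enumerate(flst):
    w_inc[i] += f
    w_dec[i] -= f
"""

rebuild = """
f = iter(flst)
weights[0] = [x + next(f) for x in weights[0]]
f = iter(flst)
weights[1] = [x - next(f) for x in weights[1]]
"""

print("in-place:", timeit.timeit(inplace, setup, number=1000))
print("rebuild: ", timeit.timeit(rebuild, setup, number=1000))
```

The relative numbers will depend on your Python version and list sizes, so run it with data shaped like yours.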
Oh, I should add: note that my version of update_weights() shown above does not return weights. This is because in Python it is considered good practice not to return a value from a function that mutates a data structure, just to make sure that nobody ever gets confused about which functions do queries and which functions change things.
http://en.wikipedia.org/wiki/Command-query_separation
Measure measure measure. See how much faster my suggestions are, or are not.