First, if you are using Python 2.x, you can gain some speed by using xrange() instead of range(). In Python 3.x there is no xrange(), but the built-in range() is basically the same as xrange().
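A quick way to see this in Python 3: range() objects are computed lazily, so they stay tiny no matter how large the span (a sketch, just to illustrate the point):

```python
import sys

# range() in Python 3 is lazy: it stores start/stop/step, not the values,
# so even an astronomically large range costs only a handful of bytes.
big = range(10**12)
print(sys.getsizeof(big))   # small, constant size
print(big[999])             # indexing works without materializing anything
```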
Next, if we are going for speed, we need to write less code, and rely more on Python's built-in features (that are written in C for speed).
You could speed things up by using a generator expression inside of sum() like so:
    from itertools import izip

    def find_best(weights, fields):
        winner = -1
        best = -float('inf')
        for c in xrange(num_category):  # num_category comes from your existing code
            score = sum(float(t[0]) * t[1] for t in izip(fields, weights[c]))
            if score > best:
                best = score
                winner = c
        return winner
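If you are on Python 3, the same function uses the built-in range() and zip(); here is a quick sanity check on made-up toy data (I've also replaced num_category with len(weights), which is an assumption about how your data is laid out):

```python
def find_best(weights, fields):
    # weights: one list of weights per category
    # fields: values (possibly strings) scored against each category
    winner = -1
    best = -float('inf')
    for c in range(len(weights)):
        score = sum(float(t[0]) * t[1] for t in zip(fields, weights[c]))
        if score > best:
            best = score
            winner = c
    return winner

# Toy example: category 1 puts all its weight on the larger field value.
weights = [[1.0, 0.0], [0.0, 1.0]]
fields = ["1", "5"]
print(find_best(weights, fields))  # category 1 scores 5.0 vs 1.0
```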
Applying the same idea again, let's try to use max() to find the best result. I think this code is ugly to look at, but if you benchmark it and it's sufficiently faster, it might be worth it:
    from itertools import izip

    def find_best(weights, fields):
        tup = max(
            ((i, sum(float(t[0]) * t[1] for t in izip(fields, wlist)))
                for i, wlist in enumerate(weights)),
            key=lambda t: t[1]
        )
        return tup[0]
Ugh! But if I didn't make any mistakes, this does the same thing, and it should rely a lot on the C machinery in Python. Measure it and see if it is faster.
So, we are calling max() with a generator expression, and max() will find the largest value the generator produces. But you want the index of the best value, so the generator expression yields a tuple: (index, weight value). It builds each tuple from i and the weight calculated by the same sum() we used above. We pass the generator expression as the first argument, and the second argument is a key function that looks at the weight value from the tuple and ignores the index. Since the generator expression is not the only argument to max(), it needs to be wrapped in parentheses. Finally, once we get back a tuple from max(), we index it to get the index value, and return that.
We can make this much less ugly if we break out a function. This adds the overhead of a function call, but if you measure it I'll bet it isn't too much slower. Also, now that I think about it, it makes sense to build a list of fields values already pre-coerced to float; then we can use that multiple times. Also, instead of using izip() to iterate over two lists in parallel, let's just make an iterator and explicitly ask it for values. In Python 2.x we use the .next() method to ask for a value; in Python 3.x you would use the next() built-in function.
    def fweight(field_float_list, wlist):
        f = iter(field_float_list)
        return sum(f.next() * w for w in wlist)

    def find_best(weights, fields):
        flst = [float(x) for x in fields]
        tup = max(
            ((i, fweight(flst, wlist)) for i, wlist in enumerate(weights)),
            key=lambda t: t[1]
        )
        return tup[0]
If there are 30K fields values, then pre-computing the float() values is likely to be a big speed win.
EDIT: I missed one trick. Instead of the lambda function, I should have used operator.itemgetter() like some of the code in the accepted answer. Also, the accepted answer timed things, and it does look like the overhead of the function call was significant. But the Numpy answers were so much faster that it's not worth playing with this answer anymore.
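For reference, the itemgetter() swap is a one-line change to the key argument. Here it is spelled with next() so it also runs on Python 3; the toy data is invented for the demonstration:

```python
from operator import itemgetter

def fweight(field_float_list, wlist):
    f = iter(field_float_list)
    return sum(next(f) * w for w in wlist)

def find_best(weights, fields):
    flst = [float(x) for x in fields]
    # itemgetter(1) does the tuple indexing in C, avoiding the overhead
    # of calling a Python-level lambda for every candidate
    tup = max(
        ((i, fweight(flst, wlist)) for i, wlist in enumerate(weights)),
        key=itemgetter(1)
    )
    return tup[0]

weights = [[2.0, 0.0], [0.0, 3.0]]
fields = ["4", "1"]
print(find_best(weights, fields))  # 8.0 for category 0 beats 3.0
```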
As for the second part, I don't think it can be sped up very much. I'll try:
    def update_weights(weights, fields, toincrease, todecrease):
        w_inc = weights[toincrease]
        w_dec = weights[todecrease]
        for i, f in enumerate(fields):
            f = float(f)  # see note below
            w_inc[i] += f
            w_dec[i] -= f
So, instead of iterating over an xrange(), here we just iterate over the fields values directly. We have a line that coerces to float.
Note that if the weights values are already float, we don't really need to coerce to float here, and we can save time by just deleting that line.
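In Python 3 (and with the fields values already floats, so the coercion line is dropped), a quick check that the right rows get incremented and decremented — the sample numbers are invented:

```python
def update_weights(weights, fields, toincrease, todecrease):
    w_inc = weights[toincrease]
    w_dec = weights[todecrease]
    for i, f in enumerate(fields):
        w_inc[i] += f
        w_dec[i] -= f

weights = [[1.0, 1.0], [2.0, 2.0]]
update_weights(weights, [0.5, 1.5], toincrease=0, todecrease=1)
print(weights)  # [[1.5, 2.5], [1.5, 0.5]]
```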
Your code was indexing the weights list four times: twice to do the increment, twice to do the decrement. This code does the first index (using the toincrease or todecrease argument) just once per call. It still has to index by i in order for += to work. (My first version tried to avoid this with an iterator and didn't work. I should have tested before posting. But it's fixed now.)
One last version to try: instead of incrementing and decrementing values as we go, just use list comprehensions to build a new list with the values we want:
    def update_weights(weights, field_float_list, toincrease, todecrease):
        f = iter(field_float_list)
        weights[toincrease] = [x + f.next() for x in weights[toincrease]]
        f = iter(field_float_list)
        weights[todecrease] = [x - f.next() for x in weights[todecrease]]
This assumes you have already coerced all the fields values to float, as shown above.
Is it faster, or slower, to replace the whole list this way? I'm going to guess faster, but I'm not sure. Measure and see!
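One way to settle it is the timeit module. This sketch compares the in-place loop against rebuilding the lists (the list sizes are arbitrary, and the rebuild snippet uses the Python 3 next(f) spelling):

```python
import timeit

setup = """
weights = [[float(i) for i in range(1000)] for _ in range(5)]
flst = [float(i) for i in range(1000)]
"""

inplace = """
w_inc = weights[0]
w_dec = weights[1]
for i, f in enumerate(flst):
    w_inc[i] += f
    w_dec[i] -= f
"""

rebuild = """
f = iter(flst)
weights[0] = [x + next(f) for x in weights[0]]
f = iter(flst)
weights[1] = [x - next(f) for x in weights[1]]
"""

print("in-place:", timeit.timeit(inplace, setup, number=1000))
print("rebuild: ", timeit.timeit(rebuild, setup, number=1000))
```

The relative numbers will depend on your Python version and list sizes, so run it with data shaped like yours.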
Oh, I should add: note that my version of update_weights() shown above does not return weights. This is because in Python it is considered good practice not to return a value from a function that mutates a data structure, just to make sure that nobody ever gets confused about which functions do queries and which functions change things.
http://en.wikipedia.org/wiki/Command-query_separation
Measure measure measure. See how much faster my suggestions are, or are not.