
I am using Python and NumPy to do some data analysis.

I have a large 3D grid (NxNxN) where each cell is itself a 3x3 matrix, making the whole thing a five-dimensional NumPy array. Calling the array data, it looks like this:

data[N,N,N,3,3]  

I need to find the eigenvalues of all the 3x3 matrices, and for that I use NumPy's eigvals routine, but it takes ages. Right now I pretty much do this:

import numpy as np

for i in range(N):
    for j in range(N):
        for k in range(N):
            a = np.linalg.eigvals(data[i, j, k, :, :])

For N = 256, this takes about an hour. Any ideas on how to make this more efficient?

Many thanks for any suggestions!

2 Comments

  • Have you profiled? I suspect you're spending much more time in eigvals than you are iterating. Commented Aug 7, 2011 at 16:28
  • eigvals takes about three orders of magnitude longer by my timeit calculations, so I don't think changing the iteration is going to affect anything. Commented Aug 7, 2011 at 16:50

3 Answers


itertools.product is nicer than nested loops, aesthetically speaking. But I don't think it will make your code that much faster. My testing suggests that iteration is not your bottleneck.

>>> bigdata = numpy.arange(256 * 256 * 256 * 3 * 3).reshape(256, 256, 256, 3, 3)
>>> %timeit numpy.linalg.eigvals(bigdata[100, 100, 100, :, :])
10000 loops, best of 3: 52.6 us per loop

So, underestimating the total:

>>> .000052 * 256 * 256 * 256 / 60
14.540253866666665

That's 14 minutes minimum on my computer, which is pretty new. Let's see how long the loops take...

>>> def just_loops(N):
...     for i in xrange(N):
...         for j in xrange(N):
...             for k in xrange(N):
...                 pass
... 
>>> %timeit just_loops(256)
1 loops, best of 3: 350 ms per loop

Orders of magnitude smaller, as DSM said. Even the work of slicing out each 3x3 block, with no eigvals call at all, is more substantial:

>>> def slice_loops(N, data):
...     for i in xrange(N):
...         for j in xrange(N):
...             for k in xrange(N):
...                 data[i, j, k, :, :]
... 
>>> %timeit slice_loops(256, bigdata)
1 loops, best of 3: 33.5 s per loop
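For what it's worth, a sufficiently new NumPy (1.8 or later, which postdates this answer) lets eigvals broadcast over leading axes, so the whole stack of 3x3 matrices can be handled in one call with no per-matrix Python overhead. A minimal sketch on a small stand-in array:

>>> import numpy
>>> small = numpy.random.rand(8, 8, 8, 3, 3)   # stand-in for bigdata
>>> evs = numpy.linalg.eigvals(small)          # one call covers every 3x3 block
>>> evs.shape
(8, 8, 8, 3)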

1 Comment

Thanks for the very thorough answer! From your tests it does indeed look like there is not much to be done to make it faster.

I'm sure there is a good way to do this in NumPy, but in general, itertools.product is faster than nested loops over ranges.

from itertools import product

for i, j, k in product(xrange(N), xrange(N), xrange(N)):
    a = np.linalg.eigvals(data[i,j,k,:,:])
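Since the three ranges are identical, product's repeat argument gives a more compact spelling of the same triple loop:

from itertools import product

for i, j, k in product(xrange(N), repeat=3):
    a = np.linalg.eigvals(data[i,j,k,:,:])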

2 Comments

In this case -- since N is so small -- I actually find that the loop overhead is roughly twice as large with the product loop than it is with the nested ranges. I still like the product approach better, because it's flatter and the overhead is negligible here anyway.
That's interesting. It would seem to me that creating 65536 + 256 inner lists would be slower (although I didn't expect it to make a huge difference).
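A quick way to check that kind of overhead claim is to time both loop shapes with empty bodies, as in the answer above; this is just a sketch, with illustrative function names:

from itertools import product

def nested(N):
    for i in xrange(N):
        for j in xrange(N):
            for k in xrange(N):
                pass

def flat(N):
    for i, j, k in product(xrange(N), repeat=3):
        pass

# In IPython, compare: %timeit nested(256)  vs.  %timeit flat(256)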

Since all the calculations are independent, you can use the multiprocessing module to speed things up if you have a multi-core processor.
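A minimal sketch of that idea, assuming the work is split along the first axis of data; the helper name eig_slab and the chunking strategy are illustrative choices, not part of the original suggestion:

import numpy as np
from multiprocessing import Pool

def eig_slab(slab):
    # slab has shape (N, N, 3, 3); collect the eigenvalues of each 3x3 block
    out = np.empty(slab.shape[:2] + (3,), dtype=complex)
    for j in range(slab.shape[0]):
        for k in range(slab.shape[1]):
            out[j, k] = np.linalg.eigvals(slab[j, k])
    return out

if __name__ == '__main__':
    N = 64  # small N so the demo runs quickly
    data = np.random.rand(N, N, N, 3, 3)

    pool = Pool()  # one worker process per core by default
    # one task per slice along the first axis; results come back in order
    results = pool.map(eig_slab, [data[i] for i in range(N)])
    pool.close()
    pool.join()

    eigs = np.array(results)  # shape (N, N, N, 3)

Each worker pays a pickling cost to receive its slab, so the net speedup depends on how expensive the eigvals calls are relative to that transfer.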

