
I currently have the following double loop in my Python code:

for i in range(a):
    for j in range(b):
        A[:, i] *= B[j][:, C[i, j]]

(A is a float matrix, B is a list of float matrices, and C is a matrix of integers. By matrices I mean m x n np.arrays.

To be precise, the sizes are: A is m x a; B is a list of b matrices, each of size m x l (with l different for each matrix); C is a x b. Here m and a are very large, b is small, and the l's are even smaller than b.)

I tried to speed it up by doing

for j in range(b):
    A[:, :] *= B[j][:, C[:, j]]

but, to my surprise, this performed worse.

More precisely, it did improve performance for small values of m and a (the "large" numbers), but from m=7000, a=700 onwards the first approach is roughly twice as fast.

Is there anything else I can do?

Maybe I could parallelize? But I don't really know how.

(I am not committed to either Python 2 or 3)

  • Are all those float matrices in list B of the same shape? Commented Oct 25, 2016 at 7:39
  • Also, with * are we talking about matrix-multiplication or elementwise multiplication? Are you dealing with NumPy arrays or matrices? Commented Oct 25, 2016 at 7:51
  • No (same #rows but different #cols). But they do not differ by much, so padding wouldn't be too costly, if that helps Commented Oct 25, 2016 at 7:51
  • I am dealing with elementwise multiplication, and as mentioned in the question I am dealing with np.arrays Commented Oct 25, 2016 at 7:52
  • Typically how many arrays do you have in B, i.e. typical value of b? Commented Oct 25, 2016 at 9:01

1 Answer


Here's a vectorized approach, assuming B is a list of arrays that all have the same shape -

# Convert B to a 3D array
B_arr = np.asarray(B)

# Use advanced indexing to index into the last axis of B array with C
# and then do product-reduction along the second axis. 
# Finally, we perform elementwise multiplication with A
A *= B_arr[np.arange(B_arr.shape[0]), :, C].prod(1).T
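As a quick sanity check (a minimal sketch with small, made-up sizes, and assuming equal-shaped matrices in B as this answer requires), the vectorized expression reproduces the original double loop:

```python
import numpy as np

# Hypothetical small sizes just for verification
rng = np.random.default_rng(0)
m, a, b, l = 5, 4, 3, 2
A = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]
C = rng.integers(0, l, size=(a, b))

# Reference: the original double loop
A_loop = A.copy()
for i in range(a):
    for j in range(b):
        A_loop[:, i] *= B[j][:, C[i, j]]

# Vectorized version: index into B's last axis with C,
# product-reduce over the b axis, then multiply into A
B_arr = np.asarray(B)
A_vec = A * B_arr[np.arange(b), :, C].prod(1).T

assert np.allclose(A_loop, A_vec)
```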

For cases with smaller a, we could instead run a loop that iterates along a. Also, for more performance, it might be a better idea to store those columns into a separate 2D array and perform the elementwise multiplication only once, after we get out of the loop.

Thus, we would have an alternative implementation like so -

range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    out[:, i] = B_arr[range_arr, :, C[i, :]].prod(0)
A *= out
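The same kind of check works for this loop-over-a variant (again a sketch with hypothetical sizes and equal-shaped matrices in B):

```python
import numpy as np

rng = np.random.default_rng(1)
m, a, b, l = 6, 3, 4, 2
A = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]
C = rng.integers(0, l, size=(a, b))
B_arr = np.asarray(B)

# Reference: the original double loop
ref = A.copy()
for i in range(a):
    for j in range(b):
        ref[:, i] *= B[j][:, C[i, j]]

# Loop-over-a variant: one product-reduced column per iteration,
# multiplied into A only once at the end
range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    out[:, i] = B_arr[range_arr, :, C[i, :]].prod(0)
result = A * out

assert np.allclose(ref, result)
```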

4 Comments

  • Thanks for your answer. Your suggestion performs about as well as my alternative. That is, for not-so-large matrices it is better, but for large matrices it is worse than the double loop
  • Your second approach comes closer to the performance of the original double loop. But unlike the other approaches it is always inferior to the double loop. Do you have a suggestion how to parallelize the first loop of the double loop?
  • @Bananach "How to parallelize the first loop of the double loop?" - I thought you already did that in your second approach. As a small improvement, you could simply do A *= instead of A[:,:] *=. So, we have four approaches now - first one with that double loop, second one with the first loop removed, third with the second loop removed and finally a fourth one with everything vectorized. Seems like we have explored all possibilities :)
  • I guess you are right that that's what my second approach does. I am still confused that it does not beat the first one, but I am also happy to know that my naive code wasn't actually that naive after all
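The four approaches mentioned in the comments can be compared directly with timeit; this is just a sketch with small placeholder sizes (not the asker's actual m=7000, a=700), so the relative timings at real sizes may differ:

```python
import timeit
import numpy as np

# Placeholder sizes for a quick comparison run
rng = np.random.default_rng(2)
m, a, b, l = 300, 50, 4, 3
A0 = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]
C = rng.integers(0, l, size=(a, b))
B_arr = np.asarray(B)

def double_loop():
    A = A0.copy()
    for i in range(a):
        for j in range(b):
            A[:, i] *= B[j][:, C[i, j]]
    return A

def loop_over_j():          # i-loop removed
    A = A0.copy()
    for j in range(b):
        A *= B[j][:, C[:, j]]
    return A

def loop_over_i():          # j-loop removed
    A = A0.copy()
    out = np.empty_like(A)
    for i in range(a):
        out[:, i] = B_arr[np.arange(b), :, C[i, :]].prod(0)
    A *= out
    return A

def fully_vectorized():
    A = A0.copy()
    A *= B_arr[np.arange(b), :, C].prod(1).T
    return A

# All four must agree before timing them
reference = double_loop()
for f in (double_loop, loop_over_j, loop_over_i, fully_vectorized):
    assert np.allclose(f(), reference)
    print(f.__name__, timeit.timeit(f, number=10))
```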
