
I currently have the following double loop in my Python code:

for i in range(a):
    for j in range(b):
        A[:, i] *= B[j][:, C[i, j]]

(A is a float matrix, B is a list of float matrices, and C is a matrix of integers. By matrices I mean m x n np.arrays.

To be precise, the sizes are: A is m x a; B is a list of b matrices, each of size m x l (with l different for each matrix); C is a x b. Here m and a are very large, b is small, and the l's are even smaller than b.)

I tried to speed it up by doing

for j in range(b):
    A[:, :] *= B[j][:, C[:, j]]

but, to my surprise, this performed worse.

More precisely, it did improve performance for small values of m and a (the "large" numbers), but from m=7000, a=700 onwards the first approach is roughly twice as fast.

Is there anything else I can do?

Maybe I could parallelize? But I don't really know how.

(I am not committed to either Python 2 or 3)

  • Are all those float matrices in list B of the same shape? Commented Oct 25, 2016 at 7:39
  • Also, with * are we talking about matrix-multiplication or elementwise multiplication? Are you dealing with NumPy arrays or matrices? Commented Oct 25, 2016 at 7:51
  • No (same #rows but different #cols). But they do not differ by much, so padding wouldn't be too costly, if that helps Commented Oct 25, 2016 at 7:51
  • I am dealing with elementwise multiplication, and as mentioned in the question I am dealing with np.arrays Commented Oct 25, 2016 at 7:52
  • Typically how many arrays do you have in B, i.e. typical value of b? Commented Oct 25, 2016 at 9:01

1 Answer


Here's a vectorized approach, assuming B is a list of arrays that all have the same shape -

# Convert B to a 3D array
B_arr = np.asarray(B)

# Use advanced indexing to index into the last axis of B array with C
# and then do product-reduction along the second axis. 
# Finally, we perform elementwise multiplication with A
A *= B_arr[np.arange(B_arr.shape[0]), :, C].prod(1).T
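As a quick sanity check (a minimal sketch with small, made-up sizes, and assuming equal-shaped matrices in B as this answer requires), the vectorized expression reproduces the original double loop:

```python
import numpy as np

# Hypothetical small sizes just for verification
rng = np.random.default_rng(0)
m, a, b, l = 5, 4, 3, 2
A = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]
C = rng.integers(0, l, size=(a, b))

# Reference: the original double loop
A_loop = A.copy()
for i in range(a):
    for j in range(b):
        A_loop[:, i] *= B[j][:, C[i, j]]

# Vectorized version: index into B's last axis with C,
# product-reduce over the b axis, then multiply into A
B_arr = np.asarray(B)
A_vec = A * B_arr[np.arange(b), :, C].prod(1).T

assert np.allclose(A_loop, A_vec)
```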

For cases with smaller a, we could instead run a loop that iterates along a. Also, for more performance, it might be a better idea to store those columns into a separate 2D array and perform the elementwise multiplication only once, after we get out of the loop.

Thus, we would have an alternative implementation like so -

range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    out[:, i] = B_arr[range_arr, :, C[i, :]].prod(0)
A *= out
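The same kind of check works for this loop-over-a variant (again a sketch with hypothetical sizes and equal-shaped matrices in B):

```python
import numpy as np

rng = np.random.default_rng(1)
m, a, b, l = 6, 3, 4, 2
A = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]
C = rng.integers(0, l, size=(a, b))
B_arr = np.asarray(B)

# Reference: the original double loop
ref = A.copy()
for i in range(a):
    for j in range(b):
        ref[:, i] *= B[j][:, C[i, j]]

# Loop-over-a variant: one product-reduced column per iteration,
# multiplied into A only once at the end
range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    out[:, i] = B_arr[range_arr, :, C[i, :]].prod(0)
result = A * out

assert np.allclose(ref, result)
```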

4 Comments

  • Thanks for your answer. Your suggestion performs about as well as my alternative. That is, for not-so-large matrices it is better, but for large matrices it is worse than the double loop
  • Your second approach comes closer to the performance of the original double loop. But unlike the other approaches it is always inferior to the double loop. Do you have a suggestion how to parallelize the first loop of the double loop?
  • @Bananach "How to parallelize the first loop of the double loop?" - I thought you already did that in your second approach. As a small improvement, you could simply do A *= instead of A[:,:] *=. So, we have four approaches now - first one with that double loop, second one with the first loop removed, third with the second loop removed and finally a fourth one with everything vectorized. Seems like we have explored all possibilities :)
  • I guess you are right that that's what my second approach does. I am still confused that it does not beat the first one, but I am also happy to know that my naive code wasn't actually that naive after all
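The four approaches mentioned in the comments can be compared directly with timeit; this is just a sketch with small placeholder sizes (not the asker's actual m=7000, a=700), so the relative timings at real sizes may differ:

```python
import timeit
import numpy as np

# Placeholder sizes for a quick comparison run
rng = np.random.default_rng(2)
m, a, b, l = 300, 50, 4, 3
A0 = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]
C = rng.integers(0, l, size=(a, b))
B_arr = np.asarray(B)

def double_loop():
    A = A0.copy()
    for i in range(a):
        for j in range(b):
            A[:, i] *= B[j][:, C[i, j]]
    return A

def loop_over_j():          # i-loop removed
    A = A0.copy()
    for j in range(b):
        A *= B[j][:, C[:, j]]
    return A

def loop_over_i():          # j-loop removed
    A = A0.copy()
    out = np.empty_like(A)
    for i in range(a):
        out[:, i] = B_arr[np.arange(b), :, C[i, :]].prod(0)
    A *= out
    return A

def fully_vectorized():
    A = A0.copy()
    A *= B_arr[np.arange(b), :, C].prod(1).T
    return A

# All four must agree before timing them
reference = double_loop()
for f in (double_loop, loop_over_j, loop_over_i, fully_vectorized):
    assert np.allclose(f(), reference)
    print(f.__name__, timeit.timeit(f, number=10))
```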
