I currently have the following double loop in my Python code:
for i in range(a):
for j in range(b):
A[:,i]*=B[j][:,C[i,j]]
(A is a float matrix. B is a list of float matrices. C is a matrix of integers. By matrices I mean m x n np.arrays.
To be precise, the sizes are: A: mxa B: b matrices of size mxl (with l different for each matrix) C: axb. Here m is very large, a is very large, b is small, the l's are even smaller than b )
I tried to speed it up by doing
for j in range(b):
A[:,:]*=B[j][:,C[:,j]]
but surprisingly to me this performed worse.
More precisely, this did improve performance for small values of m and a (the "large" numbers), but from m=7000,a=700 onwards the first appraoch is roughly twice as fast.
Is there anything else I can do?
Maybe I could parallelize? But I don't really know how.
(I am not committed to either Python 2 or 3)
Bof the same shape?*are we talking about matrix-multiplication or elementwise multiplication? Are you dealing with NumPy arrays or matrices?B, i.e. typical value ofb?