2

This is an optimization problem. Given matrices E, H, Q, F and the logic in method my_func_basic (see code block), populate matrix V. Any potential ways, such as through vectorization, to speed up the computation? Thanks.

import timeit
import numpy as np
n = 20
m = 90
# E: m x n
E = np.random.randn(m,n)
# H, Q: m x m
H = np.random.randn(m,m)
Q = np.random.randn(m,m)
# F: n x n
F = np.random.randn(n,n)
# V: m x m
V = np.zeros(shape=(m,m))

def my_func_basic():
    for x in range(n):
        for y in range(n):
            if x == y:
                V[x][y] = np.nan
                continue   
            h = H[x][y]
            e = np.array([E[x,:]+h*E[y,:]])
            v1 = np.dot(np.dot(e,F),np.transpose(e))[0][0]
            v2 = Q[x][x]+h**2*Q[y][y]
            V[x][y] = v1/np.sqrt(v2)

print(timeit.timeit(my_func_basic,number=1000),'(sec), too slow...')
1
  • 1
    This will likely not make a dent in your overall timing, but E[x,:] + h*E[y,:] should already be an array. You can probably get away with just reshaping it to add the extra dimension. Otherwise, it might be a little easier to tell what's going on if you explain mathematically what it is that you're trying to accomplish. There might be some simplifications on that side that could help... Commented Jul 14, 2015 at 5:24

1 Answer 1

4

This would be one way to solve it with vectorized methods -

import numpy as np

def vectorized_approach(V,H,E,F,Q,n):

    # Create a copy of V to store output values into it
    V_vectorized = V.copy()

    # Calculate v1 in a vectorized fashion
    E1 = (E[None,:n,:]*H[:n,:n,None] + E[:n,None,:]).reshape(-1,n)
    E2 = np.dot(E1,F)
    v1_vectorized = np.einsum('ij,ji->i',E2,E1.T).reshape(n,n)
    np.fill_diagonal(v1_vectorized, np.nan)

    # Calculate v2 in a vectorized fashion
    Q_diag = np.diag(Q[:n,:n])
    v2_vectorized = Q_diag[:,None] + H[:n,:n]**2*Q_diag[None,:]

    # Finally, get vectorized version of output V
    V_vectorized[:n,:n] = v1_vectorized/np.sqrt(v2_vectorized)
    return V_vectorized 

Tests:

1) Setup inputs -

In [314]: n = 20
     ...: m = 90
     ...: # E: m x n
     ...: E = np.random.randn(m,n)
     ...: # H, Q: m x m
     ...: H = np.random.randn(m,m)
     ...: Q = np.random.randn(m,m)
     ...: # F: n x n
     ...: F = np.random.randn(n,n)
     ...: # V: m x m
     ...: V = np.zeros(shape=(m,m))
     ...: 

2) Verify results -

In [327]: out_basic_approach = my_func_basic(V,H,E,F,Q,n)
     ...: out_vectorized_approach = vectorized_approach(V,H,E,F,Q,n)
     ...: 
     ...: mask1 = ~np.isnan(out_basic_approach)
     ...: mask2 = ~np.isnan(out_vectorized_approach)
     ...: 

In [328]: np.allclose(mask1,mask2)
Out[328]: True

In [329]: np.allclose(out_basic_approach[mask1],out_vectorized_approach[mask1])
Out[329]: True

3) Runtime tests -

In [330]: %timeit my_func_basic(V,H,E,F,Q,n)
100 loops, best of 3: 12.2 ms per loop

In [331]: %timeit vectorized_approach(V,H,E,F,Q,n)
1000 loops, best of 3: 222 µs per loop
Sign up to request clarification or add additional context in comments.

1 Comment

55x speed up! That is some optimization skill you got there Divakar. Thanks also for doing the assert. I will need to take a look at the kind of wizardry einsum can do. Thank you!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.