Improving loop in loops with Numpy

Question

I am using NumPy arrays instead of pandas for speed. However, I have not managed to improve my code with broadcasting, indexing, and so on. Instead, I am using nested loops, as shown below. The code works, but it seems ugly and inefficient to me.

Basically, I am trying to imitate pandas' groupby at the step mydata[mydata[:,1]==i]; you can think of column 1 as a firm ID number. Then, for each array in the lookup data, I check whether all of its values occur in the selected firm at the step all(np.isin(lookup[u],d[:,3])). But, as I said at the beginning, I am not comfortable with this approach.

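out = []
# Outer loop: one pass per firm ID (column 1)
for i in np.unique(mydata[:,1]):
    d = mydata[mydata[:,1]==i]

    # Inner loop: keep the firm's matching rows when every lookup value occurs in column 3
    for u in range(len(lookup)):
        control = all(np.isin(lookup[u],d[:,3]))
        if control:
            out.append(d[np.isin(d[:,3],lookup[u])])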

It takes about 0.27 seconds, but there must be a cleverer alternative.

I also tried Numba's jit(), but it does not work.

Could anyone help me with this?

Thanks in advance!

Fake Data:

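import numpy as np

# Firm IDs 5000..5099, each repeated 50 to 99 times
a = np.repeat(np.arange(100) + 5000, np.random.randint(50, 100, 100))
b = np.random.randint(100, 200, len(a))
c = np.random.randint(10, 70, len(a))
index = np.arange(len(a))
# Columns: row index, firm ID, b, c
mydata = np.vstack((index, a, b, c)).T

# 60 lookup arrays, each holding 3 to 5 values drawn from 10..69
lookup = []
for i in range(60):
    lookup.append(np.random.randint(10, 70, np.random.randint(3, 6)))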

This is not the typical problem that broadcasting and indexing help with. For one thing, you are using unique, which under the covers uses sort to bring like values together, and you are doing that if test inside the inner loop. numpy doesn't have much in the way of grouping tools; Python's itertools and pandas are better suited to grouping. Or, if you really need speed, bite the bullet and use numba or cython.
– hpaulj, Commented Nov 24, 2021 at 8:54
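
To illustrate the sort-based grouping mentioned in the comment above, here is a minimal sketch that sorts once and then splits the rows into one sub-array per key. The helper name `group_rows` and the small `demo` array are hypothetical and not part of the original question; `np.argsort`, `np.unique(..., return_index=True)` and `np.split` are standard NumPy calls.

```python
import numpy as np

def group_rows(data, key_col=1):
    """Split the rows of `data` into one sub-array per distinct value in `key_col`."""
    # A stable sort keeps the original row order inside each group.
    order = np.argsort(data[:, key_col], kind="stable")
    sorted_data = data[order]
    # `starts` holds the index of the first row of each group in the sorted array.
    keys, starts = np.unique(sorted_data[:, key_col], return_index=True)
    groups = np.split(sorted_data, starts[1:])
    return keys, groups

# Hypothetical 4-row example with the same column layout as mydata.
demo = np.array([[0, 5001, 100, 10],
                 [1, 5000, 101, 11],
                 [2, 5001, 102, 12],
                 [3, 5000, 103, 13]])
keys, groups = group_rows(demo)
```

This replaces the repeated boolean mask `mydata[mydata[:,1]==i]` (one full scan per firm) with a single sort. Whether that matters depends on the number of firms, so it is worth measuring on real data.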

MunsMan · Accepted Answer · 2021-11-24 08:28:07Z

I had some trouble understanding the goal of your program, but I got a decent performance improvement by refactoring your second for loop. I was able to compress your code to 3 or 4 lines.

# Append the matching rows of the current firm `d` when every value in `lu` occurs in its column 3.
f = (
    lambda lu: out.append(d[np.isin(d[:, 3], lu)])
    if all(np.isin(lu, d[:, 3]))
    else None
)
out = []
for i in np.unique(mydata[:, 1]):
    d = mydata[mydata[:, 1] == i]
    list(map(f, lookup))

This produces the same output list as your original code, and it runs almost twice as fast (at least on my machine).
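
As a different technique from the map/lambda refactoring above, the repeated np.isin calls can be swapped for boolean membership tables. The sketch below assumes that column 3 and the lookup values are small non-negative integers, as in the fake data; the function name `match_lookups` and the seeded demo data are hypothetical. It is a sketch under those assumptions, not a drop-in replacement, and no speed-up is claimed here.

```python
import numpy as np

def match_lookups(mydata, lookup):
    """Sketch: same result as the nested loops, using boolean membership tables."""
    # Assumption: column 3 and all lookup values are small non-negative integers.
    size = int(max(mydata[:, 3].max(), max(int(lu.max()) for lu in lookup))) + 1

    # masks[k, v] is True when value v occurs in lookup[k].
    masks = np.zeros((len(lookup), size), dtype=bool)
    for k, lu in enumerate(lookup):
        masks[k, lu] = True

    # Group rows by firm ID (column 1) with one stable sort.
    data = mydata[np.argsort(mydata[:, 1], kind="stable")]
    _, starts = np.unique(data[:, 1], return_index=True)

    out = []
    for d in np.split(data, starts[1:]):
        present = np.zeros(size, dtype=bool)
        present[d[:, 3]] = True  # values that occur in this firm
        # A lookup matches when none of its values is missing from the firm.
        ok = ~(masks & ~present).any(axis=1)
        for k in np.flatnonzero(ok):
            out.append(d[masks[k, d[:, 3]]])
    return out

# Hypothetical demo data shaped like the fake data in the question.
rng = np.random.default_rng(0)
a = np.repeat(np.arange(20) + 5000, rng.integers(50, 100, 20))
b = rng.integers(100, 200, len(a))
c = rng.integers(10, 70, len(a))
mydata = np.vstack((np.arange(len(a)), a, b, c)).T
lookup = [rng.integers(10, 70, rng.integers(3, 6)) for _ in range(60)]
out = match_lookups(mydata, lookup)
```

The inner test for all lookups of one firm becomes a single array operation, while the outer loop over firms remains. Negative or very large values in column 3 would break the table indexing, so that assumption should be checked before use.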

1 Comment

datatech Over a year ago

Thanks for your effort. Upvoted. But unfortunately, on my computer it gives almost the same elapsed time.
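
Since the timings reported in this thread differ between machines, a repeatable measurement helps when comparing variants. The following is a minimal sketch using the standard library's timeit module; the helper `best_time` and the stand-in workload are hypothetical, and each variant would need to be wrapped in a function before being passed in.

```python
import timeit

def best_time(func, *args, repeat=5, number=3):
    """Return the best per-call time in seconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings) / number

# Stand-in workload; replace `sum` with a function wrapping the code being compared.
elapsed = best_time(sum, range(100_000))
print(f"best of 5 runs: {elapsed:.6f} s per call")
```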
