
I have an enormous 1D numpy array of booleans w and a non-decreasing array of indices i, which splits w into len(i)+1 subarrays. A toy example is:

w=numpy.array([True,False,False,False,True,True,True,True,False,False])
i=numpy.array([0,0,2,5,5,8,8])

I wish to compute a numpy array wi, whose i-th entry is 1 if the i-th subarray contains a True and 0 otherwise. In other words, the i-th entry of wi is the logical 'or' of the elements of the i-th subarray of w. In our example, the output is:

[0 0 1 1 0 1 0 0]

This is achieved with the code:

wi=numpy.fromiter(map(numpy.any,numpy.split(w,i)),int)
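
For completeness, here is the full snippet with the import, which reproduces the output above:

import numpy

w = numpy.array([True, False, False, False, True, True, True, True, False, False])
i = numpy.array([0, 0, 2, 5, 5, 8, 8])

# numpy.split(w, i) yields len(i)+1 pieces:
# w[:0], w[0:0], w[0:2], w[2:5], w[5:5], w[5:8], w[8:8], w[8:]
wi = numpy.fromiter(map(numpy.any, numpy.split(w, i)), int)
print(wi)  # [0 0 1 1 0 1 0 0]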

Is there a more efficient way of doing this or is this optimal as far as memory is concerned?

P.S. related post


2 Answers


For efficiency (memory and performance), use np.bitwise_or.reduceat, as it keeps the output boolean. Prepend 0 with np.r_ so that the first subarray (everything before the first split index) is covered -

In [10]: np.bitwise_or.reduceat(w,np.r_[0,i])
Out[10]: array([ True,  True, False,  True, False, False])

To get the output as int, view it as int8 -

In [11]: np.bitwise_or.reduceat(w,np.r_[0,i]).view('i1')
Out[11]: array([1, 1, 0, 1, 0, 0], dtype=int8)
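
One caveat worth keeping in mind (this comes from NumPy's reduceat documentation, not from the answer itself): when a split index is repeated, i.e. the corresponding subarray is empty, reduceat returns the element at that index rather than reducing an empty slice, so empty subarrays are not reported as False. A small standalone check, assuming import numpy as np:

a = np.array([False, True, False])
# index 1 is repeated, so the middle piece a[1:1] is empty,
# yet reduceat returns a[1] (True) for it instead of False
np.bitwise_or.reduceat(a, np.array([0, 1, 1]))
# -> array([False,  True,  True])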

Here's an all-weather solution that also handles repeated and out-of-range split indices -

import numpy as np

def slice_reduce_or(w, i):
    # drop split indices beyond the end of w; they correspond to
    # trailing empty subarrays, which reduce to False
    valid = i < len(w)
    invalidc = (~valid).sum()
    i = i[valid]

    # keep only the last occurrence of each repeated index, so that
    # reduceat never sees an empty slice
    mi = np.r_[i[:-1] != i[1:], True]
    pp = i[mi]
    p1 = np.bitwise_or.reduceat(w, pp)

    # scatter the reductions back; entries for empty subarrays stay False
    N = len(i) + 1
    out = np.zeros(N + invalidc, dtype=bool)
    out[1:N][mi] = p1
    out[0] = w[:i[0]].any()  # first subarray, before the first split index
    return out.view('i1')
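
As a quick sanity check, running this on the question's toy arrays (my own run, not part of the original answer) reproduces the expected output:

w = np.array([True, False, False, False, True, True, True, True, False, False])
i = np.array([0, 0, 2, 5, 5, 8, 8])
slice_reduce_or(w, i)
# -> array([0, 0, 1, 1, 0, 1, 0, 0], dtype=int8)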

1 Comment

When len(w)=10^7, len(i)=10^7, your method is 164x faster than mine. When len(w)=10^8, len(i)=10^7, your method is 69x faster than mine. I like your all-weather function : ). Thank you!

Let's try np.add.reduceat:

wi = np.add.reduceat(w,np.r_[0,i]).astype(bool).astype(int)

output:

array([1, 1, 0, 1, 0, 0])

And performance:

%timeit -n 100 wi = np.add.reduceat(w,np.r_[0,i]).astype(bool).astype(int)
21.7 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n 100 wi=np.fromiter(map(np.any,np.split(w,i)),int)
44.5 µs ± 7.79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So we're looking at about a 2x speedup here.
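
For sizes well beyond the toy example (see the comments below about in-cache data), one could time both approaches with a harness along these lines; the array sizes, seed, and repetition count here are my own assumptions, not from the post:

import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
n, m = 10**7, 10**6                  # hypothetical sizes: 1e7 booleans, 1e6 split points
w = rng.random(n) < 0.5              # random boolean data
i = np.sort(rng.integers(0, n, m))   # sorted (non-decreasing) split indices

baseline = lambda: np.fromiter(map(np.any, np.split(w, i)), int)
reduce_based = lambda: np.add.reduceat(w, np.r_[0, i]).astype(bool).astype(int)

print("split/any:", timeit(baseline, number=3))
print("reduceat :", timeit(reduce_based, number=3))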

4 Comments

Hmm, your command does not produce the same results as mine. For w=[True,False,False,True,False,True,True]; i=[0,0,2,4], I get [0 0 1 1 1] and you get [1 1 1 1 1].
Well, sirs, it makes little sense to post nano-"benchmarks" and derive performance expectations from them when all the data sits in cache. Kindly re-evaluate the tests with data sizes of roughly 1E+9 elements in each vector (numpy is quite storage-efficiency aware). At those real-world sizes one can say which strategy brings what storage and processing costs and which one actually performs better. That's fair, isn't it?
@user3666197 See my benchmarks in the accepted answer.
@Leo Exactly. @Divakar is a reliable source for highly performant processing strategies built on numpy tooling, so it was a good step to re-validate this quantitatively. The pure-[SERIAL], GIL-bound, naive map() iterator was a one-liner, but a lost game from the very beginning; now you know how much acceleration comes from the gems hidden inside numpy. Stay well & stay tuned, Leo
