You've created an object dtype array whose elements are Python lists (not subarrays):
In [2]: array = np.array([
...: [0, 1, 2],
...: [2],
...: [],
...: ])
/usr/local/bin/ipython3:4: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences is deprecated
(numpy 1.19dev started emitting this warning; passing dtype=object explicitly silences it.)
In [3]: array
Out[3]: array([list([0, 1, 2]), list([2]), list([])], dtype=object)
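To avoid the deprecation warning, you can ask for the object dtype up front; a minimal sketch that builds the same ragged array:

```python
import numpy as np

# Explicit dtype=object: a 1d array of 3 list objects, no warning
arr = np.array([[0, 1, 2], [2], []], dtype=object)
print(arr.shape)   # (3,) - one slot per list, not a 2d array
print(arr[0])      # the elements are still plain Python lists
```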
We could use a list comprehension:
In [4]: [a[0] for a in array]
....
IndexError: list index out of range
and correcting for the empty list:
In [5]: [a[0] if a else None for a in array]
Out[5]: [0, 2, None]
Most of numpy's fast compiled code - the "vectorized" stuff - only works with numeric dtype arrays. For object dtype it has to do something akin to a list comprehension, iterating and calling Python-level code on each element. Even when math "works", it's because numpy was able to delegate the operation to the elements' own methods.
For example applying list replication to all elements of your array:
In [7]: array*3
Out[7]:
array([list([0, 1, 2, 0, 1, 2, 0, 1, 2]), list([2, 2, 2]), list([])],
dtype=object)
and sum is just list concatenation:
In [8]: array.sum()
Out[8]: [0, 1, 2, 2]
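That sum is nothing more than repeated + on the elements, i.e. the lists' own concatenation; a sketch showing it matches a plain Python reduce:

```python
import operator
from functools import reduce

import numpy as np

arr = np.array([[0, 1, 2], [2], []], dtype=object)

# On an object array, sum delegates to the elements' __add__,
# which for lists is concatenation: [0,1,2] + [2] + [] == [0,1,2,2]
assert arr.sum() == reduce(operator.add, arr)
print(arr.sum())
```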
apply_along_axis isn't any faster than np.vectorize, and I can't see how it would be used in a case like this anyway - array is 1d.
Sometimes frompyfunc is handy when working with object dtype arrays (but it's not a speed solution):
In [11]: timeit np.frompyfunc(lambda a: a[0] if a else None, 1,1)(array)
3.8 µs ± 9.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: timeit [a[0] if a else None for a in array]
1.02 µs ± 5.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [14]: timeit np.vectorize(lambda a: a[0] if a else None, otypes=['O'])(array)
18 µs ± 46.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
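One practical difference between these approaches (aside from speed): frompyfunc hands back an object dtype ndarray, while the comprehension gives a plain list. A sketch of that distinction, using the same extraction function:

```python
import numpy as np

arr = np.array([[0, 1, 2], [2], []], dtype=object)
first = lambda a: a[0] if a else None

via_ufunc = np.frompyfunc(first, 1, 1)(arr)   # ndarray, dtype=object
via_comp = [first(a) for a in arr]            # plain Python list

print(type(via_ufunc), via_ufunc.dtype)
print(via_ufunc.tolist() == via_comp)          # same values either way
```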