How do you apply vectorized functions on sub-arrays? Suppose I have the following:

array = np.array([
    [0, 1, 2],
    [2],
    [],
])

I want to obtain the first element of each sub-array, or None if the sub-array is empty:

[0, 2, None]

While simple, is there a way to do this leveraging NumPy's pure vectorization? There doesn't seem to be a native operation, and np.vectorize() is documented as not being true vectorization, which has also been pointed out in various other threads.

Is my only option to use np.apply_along_axis()?

How do I know when a problem cannot be solved with NumPy's pure vectorization?

  • There's no true vectorization for this. Just loop through. Commented Jun 18, 2020 at 6:42
  • NumPy does not allow non-rectangular arrays. What you are referring to as sub-arrays are in fact lists, on which you cannot use NumPy tools beyond a simple loop. Commented Jun 18, 2020 at 6:43

1 Answer

You've created an object dtype array - containing lists (not subarrays):

In [2]: array = np.array([ 
   ...:     [0, 1, 2], 
   ...:     [2], 
   ...:     [], 
   ...: ])                                                                      
/usr/local/bin/ipython3:4: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (1.19dev gives warning)
In [3]: array                                                                   
Out[3]: array([list([0, 1, 2]), list([2]), list([])], dtype=object)

We could use a list comprehension:

In [4]: [a[0] for a in array]                                                   
....
IndexError: list index out of range

and correcting for the empty list:

In [5]: [a[0] if a else None for a in array]                                    
Out[5]: [0, 2, None]
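
As a self-contained sketch of the same loop: newer NumPy releases (1.24 and later, as far as I know) raise a ValueError for the ragged construction above instead of only warning, so this version fills an object array explicitly. The names make_object_array and first_or_none are hypothetical helpers for illustration, not NumPy functions.

```python
import numpy as np


def make_object_array(lists):
    """Build a 1-D object-dtype array whose elements are the given lists."""
    out = np.empty(len(lists), dtype=object)
    for i, item in enumerate(lists):
        out[i] = item  # store the list object itself in slot i
    return out


def first_or_none(arr):
    """Return the first element of each list, or None for an empty list."""
    return [a[0] if a else None for a in arr]


array = make_object_array([[0, 1, 2], [2], []])
firsts = first_or_none(array)
print(firsts)
```

Filling the array slot by slot avoids relying on how np.array() interprets nested sequences, at the cost of one more Python loop.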

Most of NumPy's fast compiled code (the "vectorized" stuff) only works with numeric dtype arrays. For object dtype it has to do something akin to a list comprehension, iterating over the elements at Python speed. Even when math works, it's because NumPy was able to delegate the operation to the elements themselves.

For example, multiplying applies list replication to every element of your array:

In [7]: array*3                                                                 
Out[7]: 
array([list([0, 1, 2, 0, 1, 2, 0, 1, 2]), list([2, 2, 2]), list([])],
      dtype=object)

and sum is just list concatenation:

In [8]: array.sum()                                                             
Out[8]: [0, 1, 2, 2]
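
To make the delegation concrete, here is a minimal sketch comparing the same operations on a numeric array and on an object array. The variable names are only illustrative; the object array is filled by hand so the example does not depend on how a given NumPy version treats ragged input.

```python
import numpy as np

# Numeric dtype: the multiplication runs elementwise in compiled code.
numeric = np.array([1, 2, 3])
numeric_tripled = numeric * 3

# Object dtype: each element is a Python list, so NumPy calls the
# list's own operators one element at a time.
objects = np.empty(2, dtype=object)
objects[0] = [1, 2]
objects[1] = [3]

tripled = objects * 3   # list * 3 repeats each list
joined = objects.sum()  # list + list concatenates the lists

print(numeric_tripled)
print(tripled)
print(joined)
```

The same `*` and `sum` spell arithmetic in the first case and list operations in the second, which is why the object-dtype version gains nothing in speed over a plain loop.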

apply_along_axis isn't any faster than np.vectorize, and I can't imagine how it would be used in a case like this, since array is 1d.
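
For contrast, a small sketch of the situation apply_along_axis is meant for: a rectangular 2d array, where the function receives one 1d slice at a time. The array rect is made up for illustration and is not the ragged data from the question.

```python
import numpy as np

# A rectangular (2, 3) numeric array; every row has the same length.
rect = np.array([
    [0, 1, 2],
    [2, 3, 4],
])

# Call the function once per row (1-D slices taken along axis 1).
firsts = np.apply_along_axis(lambda row: row[0], 1, rect)

# For rectangular data, plain indexing gives the same values directly.
by_indexing = rect[:, 0]

print(firsts)
print(by_indexing)
```

With a 1d object array there is only a single slice to hand over, so the function would still have to loop over the lists itself.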

Sometimes frompyfunc is handy when working with object dtype arrays (but it's not a speed solution):

In [11]: timeit np.frompyfunc(lambda a: a[0] if a else None, 1,1)(array)        
3.8 µs ± 9.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: timeit [a[0] if a else None for a in array]                            
1.02 µs ± 5.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [14]: timeit np.vectorize(lambda a: a[0] if a else None, otypes=['O'])(array)                                                                    
18 µs ± 46.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
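
If the frompyfunc form is used more than once, it can be built a single time and reused. This is only a sketch for convenience, not a speed-up; first_or_none and first_ufunc are hypothetical names, and the object array is filled by hand so the example also runs on NumPy versions that reject ragged input.

```python
import numpy as np


def first_or_none(seq):
    """Return the first item of a sequence, or None if it is empty."""
    return seq[0] if seq else None


# Wrap the Python function as a ufunc taking 1 input and giving 1 output.
first_ufunc = np.frompyfunc(first_or_none, 1, 1)

# Build the 1-D object array of lists element by element.
lists = [[0, 1, 2], [2], []]
array = np.empty(len(lists), dtype=object)
for i, item in enumerate(lists):
    array[i] = item

result = first_ufunc(array)  # object-dtype array, one entry per list
print(result)
```

The result is itself an object dtype array, so any later arithmetic on it is again delegated to Python objects.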