You've created an object dtype array whose elements are Python lists (not subarrays):
In [2]: array = np.array([
...: [0, 1, 2],
...: [2],
...: [],
...: ])
/usr/local/bin/ipython3:4: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences is deprecated
(numpy 1.19dev started emitting this warning; passing dtype=object explicitly silences it.)
In [3]: array
Out[3]: array([list([0, 1, 2]), list([2]), list([])], dtype=object)
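To avoid the deprecation warning, you can ask for the object dtype up front; a minimal sketch that builds the same ragged array:

```python
import numpy as np

# Explicit dtype=object: a 1d array of 3 list objects, no warning
arr = np.array([[0, 1, 2], [2], []], dtype=object)
print(arr.shape)   # (3,) - one slot per list, not a 2d array
print(arr[0])      # the elements are still plain Python lists
```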
We could use a list comprehension:
In [4]: [a[0] for a in array]
....
IndexError: list index out of range
and correcting for the empty list:
In [5]: [a[0] if a else None for a in array]
Out[5]: [0, 2, None]
Most of numpy's fast compiled code - the "vectorized" stuff - only works with numeric dtype arrays. For object dtype it has to do something akin to a list comprehension, iterating and calling Python-level code on each element. Even when math "works", it's because numpy was able to delegate the operation to the elements' own methods.
For example applying list replication to all elements of your array:
In [7]: array*3
Out[7]:
array([list([0, 1, 2, 0, 1, 2, 0, 1, 2]), list([2, 2, 2]), list([])],
dtype=object)
and sum is just list concatenation:
In [8]: array.sum()
Out[8]: [0, 1, 2, 2]
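That sum is nothing more than repeated + on the elements, i.e. the lists' own concatenation; a sketch showing it matches a plain Python reduce:

```python
import operator
from functools import reduce

import numpy as np

arr = np.array([[0, 1, 2], [2], []], dtype=object)

# On an object array, sum delegates to the elements' __add__,
# which for lists is concatenation: [0,1,2] + [2] + [] == [0,1,2,2]
assert arr.sum() == reduce(operator.add, arr)
print(arr.sum())
```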
apply_along_axis isn't any faster than np.vectorize, and I can't see how it would be used in a case like this anyway - array is 1d.
Sometimes frompyfunc is handy when working with object dtype arrays (but it's not a speed solution):
In [11]: timeit np.frompyfunc(lambda a: a[0] if a else None, 1,1)(array)
3.8 µs ± 9.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: timeit [a[0] if a else None for a in array]
1.02 µs ± 5.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [14]: timeit np.vectorize(lambda a: a[0] if a else None, otypes=['O'])(array)
18 µs ± 46.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
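One practical difference between these approaches (aside from speed): frompyfunc hands back an object dtype ndarray, while the comprehension gives a plain list. A sketch of that distinction, using the same extraction function:

```python
import numpy as np

arr = np.array([[0, 1, 2], [2], []], dtype=object)
first = lambda a: a[0] if a else None

via_ufunc = np.frompyfunc(first, 1, 1)(arr)   # ndarray, dtype=object
via_comp = [first(a) for a in arr]            # plain Python list

print(type(via_ufunc), via_ufunc.dtype)
print(via_ufunc.tolist() == via_comp)          # same values either way
```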