
Is there a faster/smarter way to perform operations on every element of a numpy array? What I specifically have is an array of date objects, e.g.:

hh = np.array( [ dt.date(2000, 1, 1), dt.date(2001, 1, 1) ] )

To get an array of years from that, at the moment I do:

years = np.array( [ x.year for x in hh ] )

Is there a smarter way to do this? I'm thinking something like

hh.year

which obviously doesn't work.

I have a script in which I constantly need different variations of a (much longer) array (year, month, hours...). Of course I could always just define a separate array for everything, but I feel like there should be a more elegant solution.


2 Answers

4

If you evaluate a Python expression for each element, it matters little whether the iteration itself is done in C++ or Python. What carries weight is the cost of the evaluated (in-loop) Python expression. That is, if your in-loop expression takes 1 microsecond (a very simple expression), that cost will significantly outweigh the difference between a Python iteration and a C++ iteration (there is "marshalling" between C++ and PyObjects, and that applies to Python functions as well).

For that reason, a call to vectorize still runs, under the hood, at Python speed: what is called inside is Python code. The idea behind vectorize is not performance but code readability and ease of iteration: vectorize inspects the function's parameters and serves well for N-dimensional iteration (e.g. a lambda x, y: x + y automatically iterates over two inputs).
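As a rough illustration of the point above, here is a minimal sketch of vectorize applied to the question's data and to a two-argument lambda. The sample dates and the names get_year, add and table are made up for the example; the function is still called once per element in both cases.

```python
import datetime as dt
import numpy as np

# Sample data in the spirit of the question (values are illustrative).
hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 6, 15)])

# vectorize wraps a Python callable; it is still invoked once per element.
get_year = np.vectorize(lambda d: d.year)
years = get_year(hh)

# A two-argument function is broadcast over its inputs:
# shapes (3, 1) and (2,) combine into a (3, 2) result.
add = np.vectorize(lambda x, y: x + y)
table = add(np.arange(3).reshape(3, 1), np.arange(2))

print(years)
print(table.shape)
```

The convenience here is the broadcasting, not the speed: each output element still costs one Python function call.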

So: no, there's no "fast" way to iterate python code. The final speed that matters is the speed of your inner python code.

Edit: your desired hh.year looks like the equivalent of hh*.year in Groovy, but even there it is, under the hood, the same as an in-code iteration. Comprehensions are the fastest (and equivalent) way in Python. The real pity is being forced to write:

years = np.array( [ x.year for x in hh ] )

(which forces you to create another, probably huge, list) instead of being allowed to use any kind of iterator:

years = np.array( x.year for x in hh )

Edit (suggestion by @Jaime): np.array cannot construct an array from an iterator. For that, you must use np.fromiter:

np.fromiter((x.year for x in hh), dtype=int, count=len(hh))

which saves the time and memory of building an intermediate list. Note that the generator expression needs its own parentheses when it is not the only argument. With count=len(hh), this works for any sequence of known length (which is your case) and avoids creating the inner list; for a generator whose length is not known in advance, omit count and numpy will read until the iterator is exhausted.
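A minimal sketch of both np.fromiter variants follows, using the two dates from the question. The leap_years generator is a hypothetical helper, included only to show a case where the length is not known up front.

```python
import datetime as dt
import numpy as np

hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 1, 1)])

# Known length: count lets numpy allocate the output array once.
years = np.fromiter((x.year for x in hh), dtype=int, count=len(hh))

# Hypothetical generator whose output length is not known in advance.
def leap_years(dates):
    for d in dates:
        if d.year % 4 == 0 and (d.year % 100 != 0 or d.year % 400 == 0):
            yield d.year

# Unknown length: omit count (default -1) and numpy reads until exhaustion.
leap = np.fromiter(leap_years(hh), dtype=int)

print(years)
print(leap)
```

Either way the per-element work (x.year) is still Python code, so the saving is the intermediate list, not the loop itself.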


2 Comments

There is np.fromiter, so np.fromiter((x.year for x in hh), dtype=int, count=len(hh)) is probably going to be as fast as it gets.
ufunc is another mechanism. docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html It doesn't speed up the iteration, but gives access to features like ndimensions and broadcasting.
0

You can use numpy.vectorize.

Doing some benchmarking, performance is pretty similar (vectorize slightly slower than a list comprehension), and in my opinion numpy.vectorize(lambda j: j.year)(hh) (or something similar) doesn't look super elegant.
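For anyone who wants to repeat the comparison, a sketch along these lines could be used. The array size, the repeat count and the function names are arbitrary choices for illustration, not the setup behind the numbers mentioned above, and timings will vary by machine.

```python
import datetime as dt
import timeit
import numpy as np

# Hypothetical test data: 10,000 consecutive dates (size chosen arbitrarily).
start = dt.date(2000, 1, 1)
hh = np.array([start + dt.timedelta(days=i) for i in range(10_000)])

def with_comprehension():
    # Plain list comprehension wrapped in np.array.
    return np.array([x.year for x in hh])

get_year = np.vectorize(lambda j: j.year)

def with_vectorize():
    # Same per-element work, dispatched through np.vectorize.
    return get_year(hh)

repeats = 5
for fn in (with_comprehension, with_vectorize):
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.4f} s per call")
```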
