Perform operations on elements of a NumPy array

Question

Is there a faster/smarter way to perform operations on every element of a NumPy array? What I specifically have is an array of datetime objects, e.g.:

hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 1, 1)])

To get an array of years from it, I currently do:

years = np.array([x.year for x in hh])

Is there a smarter way to do this? I'm thinking something like

hh.year

which obviously doesn't work.

I have a script in which I constantly need different variations of a (much longer) array (year, month, hours...). Of course I could always just define a separate array for each, but it feels like there should be a more elegant solution.

Maybe use pandas's datetime64? Check the answer to this: stackoverflow.com/questions/13648774/…
– ojy, Commented Aug 25, 2014 at 22:34
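Following that suggestion, here is a minimal NumPy-only sketch (no pandas) of the datetime64 route. You convert once, and after that each field is a vectorized cast rather than a per-element Python call. The sample dates are assumed for illustration. The `+ 1970` and `% 12 + 1` arithmetic relies on datetime64 values counting from the 1970-01-01 epoch.

```python
import datetime as dt

import numpy as np

# Sample data (assumed): an object array of datetime.date values.
hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 3, 15)])

# Convert once to NumPy's datetime64 with day precision.
d = hh.astype("datetime64[D]")

# Truncating to year/month precision gives counts since the 1970 epoch.
years = d.astype("datetime64[Y]").astype(int) + 1970
months = d.astype("datetime64[M]").astype(int) % 12 + 1
# Day of month = days elapsed since the first of that month, plus one.
days = (d - d.astype("datetime64[M]")).astype(int) + 1
```

Hours follow the same pattern once the data is stored as `datetime64[h]` or finer. This only pays off if the array stays in datetime64 form instead of being rebuilt from Python objects each time.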

Luis Masuelli · Accepted Answer · 2014-08-25 23:04:28Z

4

If you evaluate a Python expression for each element, it hardly matters whether the iteration itself is done in C or in Python. What carries weight is the cost of the evaluated (in-loop) Python expression. Even if that expression takes only about 1 microsecond (a very simple one), it will cost significantly more than the difference between a Python loop and a C loop. Each element still has to be marshalled between C and PyObjects, and the same applies to calling Python functions.

For the same reason, vectorize does not help here: under the hood, what it calls for each element is still your Python code. The idea behind vectorize is not performance but code readability and ease of iteration. It does some introspection for you (for instance, without otypes it calls the function on the first element to determine the outputs), and it serves well for N-dimensional iteration (e.g. np.vectorize(lambda x, y: x + y) automatically iterates over two arrays, broadcasting them against each other).
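A small sketch of that two-argument case. The wrapped lambda is still called once per element pair, so this shows convenience, not speed. The input arrays are assumed for illustration.

```python
import numpy as np

# np.vectorize wraps a plain Python function; each element pair is still
# passed to the lambda one at a time.
add = np.vectorize(lambda x, y: x + y)

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,)

# The two inputs are broadcast against each other to shape (3, 4).
table = add(col, row)
print(table.shape)  # (3, 4)
```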

So no: there is no "fast" way to iterate over Python code. What ultimately matters is the speed of the Python code inside the loop.
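If the main goal is the `hh.year` syntax rather than speed, a thin wrapper can forward attribute access to every element. This is a minimal sketch. The class name `Spread` is hypothetical and not part of NumPy. It is still an ordinary Python loop underneath, so it is no faster than the comprehension.

```python
import datetime as dt

import numpy as np


class Spread:
    """Hypothetical wrapper: attribute access is forwarded to every element."""

    def __init__(self, arr):
        self._arr = arr

    def __getattr__(self, name):
        # Only called for attributes not found normally (e.g. "year").
        # Builds the result with a plain list comprehension.
        return np.array([getattr(x, name) for x in self._arr])


hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 1, 1)])
years = Spread(hh).year
months = Spread(hh).month
```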

Edit: your desired hh.year resembles Groovy's spread operator (hh*.year), but even there it is, under the hood, the same as an explicit loop. In Python, comprehensions are the fastest (and equivalent) way to do it. The real pity is being forced to write:

years = np.array([x.year for x in hh])

(which forces you to build another, possibly huge, intermediate list) instead of being able to pass any kind of iterator:

years = np.array(x.year for x in hh)

Edit (suggestion by @Jaime): np.array can't build an array from an iterator; given a generator, it just wraps it in a 0-d object array. For that, you must use np.fromiter:

np.fromiter((x.year for x in hh), dtype=int, count=len(hh))

which saves the time and memory of building the intermediate list. Note that the generator expression needs its own parentheses here. Also, count=len(hh) only works because hh is a sequence with a known length (your case). For a generator whose length isn't known in advance, you can't pass len(). Omit count instead, and fromiter reads until the iterable is exhausted.
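A runnable sketch of both points. It shows what np.array does with a generator, then uses fromiter with and without count. The unknown-length generator (even numbers below 10) is an assumed stand-in.

```python
import datetime as dt

import numpy as np

hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 1, 1)])

# np.array does not consume a generator: it wraps it in a 0-d object array.
wrapped = np.array(x.year for x in hh)
print(wrapped.ndim, wrapped.dtype)  # 0 object

# Known length: pass count so fromiter can allocate the result up front.
years = np.fromiter((x.year for x in hh), dtype=int, count=len(hh))

# Unknown length: omit count (default -1) and fromiter reads everything.
evens = np.fromiter((n for n in range(10) if n % 2 == 0), dtype=int)
```

fromiter always produces a 1-D array, so for N-dimensional input you would iterate over `hh.ravel()` and reshape the result afterwards.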

edited Aug 25, 2014 at 23:04

answered Aug 25, 2014 at 22:04

Luis Masuelli



2 Comments

Jaime Over a year ago

There is np.fromiter, so np.fromiter((x.year for x in hh), dtype=int, count=len(hh)) is probably going to be as fast as it gets.

hpaulj Over a year ago

ufunc is another mechanism (docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html). It doesn't speed up the iteration, but it gives access to features like N-dimensional iteration and broadcasting.
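The linked tutorial covers writing ufuncs in C. For a Python function, `np.frompyfunc` is the quickest way to get a ufunc. A minimal sketch follows, with the lambda and sample dates assumed. It still calls Python per element and always returns an object array, so the result is cast explicitly.

```python
import datetime as dt

import numpy as np

hh = np.array([dt.date(2000, 1, 1), dt.date(2001, 1, 1)])

# Build a ufunc with 1 input and 1 output from a Python function.
get_year = np.frompyfunc(lambda d: d.year, 1, 1)

# The output dtype is object; cast if a numeric array is needed.
years = get_year(hh).astype(int)

# Being a ufunc, it accepts N-dimensional input and preserves the shape.
grid = get_year(hh.reshape(2, 1)).astype(int)
```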

colcarroll · Accepted Answer · 2014-08-25 21:56:17Z

0

You can use numpy.vectorize.

In some quick benchmarking, performance was pretty similar (vectorize slightly slower than a list comprehension), and in my opinion numpy.vectorize(lambda j: j.year)(hh) (or something similar) doesn't look super elegant.
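A sketch for reproducing that kind of comparison yourself. The data size and run count are assumptions, and timings depend on the machine and NumPy version, so no numbers are given here. `otypes=[int]` is passed so vectorize doesn't need a trial call to infer the output type.

```python
import datetime as dt
import timeit

import numpy as np

# Sample data (assumed): 20,000 consecutive dates.
hh = np.array([dt.date(2000, 1, 1) + dt.timedelta(days=i) for i in range(20_000)])


def with_comprehension():
    return np.array([x.year for x in hh])


def with_vectorize():
    return np.vectorize(lambda j: j.year, otypes=[int])(hh)


def with_fromiter():
    return np.fromiter((x.year for x in hh), dtype=int, count=len(hh))


for fn in (with_comprehension, with_vectorize, with_fromiter):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```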

edited Aug 25, 2014 at 21:56

answered Aug 25, 2014 at 21:45

colcarroll

