
I have two numpy arrays, looking like:

field = np.array([5,1,3,3,2,1,6])    
counts = np.array([100,210,300,150,20,90,170])

They are not sorted (and shouldn't be changed). I now want to calculate a third array (of the same length and order) that contains the sum of the counts for all entries sharing the same field. Here the result should be:

field_counts = np.array([100,300,450,450,20,300,170])

The arrays are very long, so iterating through them (and looking up the matching fields each time) is far too inefficient. Maybe I am just not seeing the wood for the trees... I hope someone can help me out on this!

  • Aside: when you find yourself needing a groupby operation, that's often a sign you should be using pandas instead of numpy; your operation would be something like df.groupby("field")["counts"].transform(sum). Commented Mar 26, 2015 at 20:53
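The pandas approach suggested in the comment might look like this (a sketch, assuming the data fits in a DataFrame):

```python
import pandas as pd

df = pd.DataFrame({
    "field":  [5, 1, 3, 3, 2, 1, 6],
    "counts": [100, 210, 300, 150, 20, 90, 170],
})

# transform("sum") broadcasts each group's sum back to the original rows,
# preserving the input order.
field_counts = df.groupby("field")["counts"].transform("sum").to_numpy()
# array([100, 300, 450, 450,  20, 300, 170])
```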

3 Answers


I don't know if it will be efficient enough (since I do iterate over field), but here is a suggestion. I first build a dictionary of field/counts values, then create the output array from it.

from collections import defaultdict
dic = defaultdict(int)
for j, f in enumerate(field):
    dic[f] += counts[j]

field_counts = np.array([dic[f] for f in field])
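For completeness, a self-contained version of the above with the sample arrays from the question (a sketch; only the imports and sample data are added):

```python
import numpy as np
from collections import defaultdict

field = np.array([5, 1, 3, 3, 2, 1, 6])
counts = np.array([100, 210, 300, 150, 20, 90, 170])

# One pass to accumulate the total per field value...
dic = defaultdict(int)
for f, c in zip(field, counts):
    dic[f] += c

# ...and one pass to map the totals back to the original order.
field_counts = np.array([dic[f] for f in field])
# array([100, 300, 450, 450,  20, 300, 170])
```

This is O(N) rather than O(N²), since each array is traversed only twice.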



Use the following list comprehension:

>>> [np.sum(counts[np.where(field==i)]) for i in field]
[100, 300, 450, 450, 20, 300, 170]

You can get the indices of equal elements in field with np.where:

>>> [np.where(field==i) for i in field]
[(array([0]),), (array([1, 5]),), (array([2, 3]),), (array([2, 3]),), (array([4]),), (array([1, 5]),), (array([6]),)]

Then get the corresponding elements of counts by indexing, and calculate the sum with np.sum.

1 Comment

This will be very slow if the arrays are long; you've made this an N^2 calculation.

This problem can be solved in a fully vectorized manner using the numpy_indexed package (disclaimer: I am its author):

import numpy_indexed as npi
g = npi.group_by(field)
field_counts = g.sum(counts)[1][g.inverse]

g.sum computes the sums for each group of unique fields, and g.inverse maps those values back to the original fields.
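For readers without the package, roughly the same grouping can be sketched in plain numpy with np.unique and np.bincount (my approximation of the idea, not the package's actual internals):

```python
import numpy as np

field = np.array([5, 1, 3, 3, 2, 1, 6])
counts = np.array([100, 210, 300, 150, 20, 90, 170])

# inverse[i] is the index of field[i] within the sorted unique values.
_, inverse = np.unique(field, return_inverse=True)

# bincount with weights sums counts per group; indexing by inverse
# broadcasts the group sums back to the original positions.
sums = np.bincount(inverse, weights=counts)
field_counts = sums[inverse].astype(counts.dtype)
# array([100, 300, 450, 450,  20, 300, 170])
```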

5 Comments

There is a reason I went through the hassle of packaging this functionality: there are indeed many questions of this type. In my perception, all of those questions stand to benefit from my answers, as does this one. It substantially improves upon the currently accepted answer in several respects. It is my understanding that the sections you refer to are aimed at commercial purposes, whereas this is a free-as-in-beer open-source package; correct me if I'm wrong. My only selfish motive here is getting it better tested :).
Subjectively, it feels more like self-promotion to me if I do mention my authorship; but thank you for the heads-up. Do you happen to have a link to any resources that are a bit more explicit about the distinction between commercial and non-commercial purposes?
Some of them are duplicates I would say, yes. I will follow your suggestion to disclose authorship then, thanks.
I do appreciate the feedback
Awesome @EelcoHoogendoorn I see you added disclosure :). Please do the same for your other answers as well. As a side-note, if some of them are duplicates, feel free to flag them as such! I will delete my previous comments to clean up.
