Extract arbitrary subtotals from 2D numpy array

Question

I have 2D arrays of counts from which I need to extract a sequence of arbitrary subtotals. In this example they are subtotal columns. Each subtotal is the sum of an arbitrary collection of the base columns, represented by a tuple of addend-indices:

>>> A
[[11, 12, 13, 14, 15]
 [21, 22, 23, 24, 25]
 [31, 32, 33, 34, 35]]

>>> subtotal_addend_idxs
((0, 1), (1, 2, 3), (3, 4))

>>> desired_result
[[23, 39, 29]
 [43, 69, 49]
 [63, 99, 69]]

The best code I have for this so far is this:

subtotal_addend_idxs = ((0, 1), (1, 2, 3), (3, 4))
np.hstack(
    tuple(
        # Sum the addend columns for one subtotal, keeping a column shape.
        np.sum(A[:, addend_idxs], axis=1, keepdims=True)
        for addend_idxs in subtotal_addend_idxs
    )
)

Is there a clever way I can do this with a single numpy call/expression where I don't need a for loop creating a tuple of individual subtotal columns?

Note that the addend-indices are arbitrary; not all indices need appear in a subtotal, the indices do not necessarily appear in increasing order, and the same index can appear in more than one subtotal.

mathfux · Accepted Answer · 2020-10-12 03:10:47Z

2

Try np.add.reduceat:

lens = [len(n) for n in subtotal_addend_idxs]
c = np.concatenate(subtotal_addend_idxs)
output = np.add.reduceat(A[:,c], np.cumsum([0]+lens)[:-1], axis=1)

Output:

array([[23, 39, 29],
       [43, 69, 49],
       [63, 99, 69]], dtype=int32)
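To make the steps above easier to follow, here is a self-contained sketch that names each intermediate value. The variable names `offsets` and `gathered` are my own, not from the answer, and the sketch assumes every subtotal has at least one addend index.

```python
import numpy as np

A = np.array([[11, 12, 13, 14, 15],
              [21, 22, 23, 24, 25],
              [31, 32, 33, 34, 35]])
subtotal_addend_idxs = ((0, 1), (1, 2, 3), (3, 4))

# Flatten all addend indices into one 1-D index array: [0 1 1 2 3 3 4]
c = np.concatenate(subtotal_addend_idxs)

# Start position of each subtotal group inside c: [0 2 5]
lens = [len(idxs) for idxs in subtotal_addend_idxs]
offsets = np.cumsum([0] + lens)[:-1]

# Gather the addend columns side by side (shape (3, 7)); repeated
# indices such as 1 and 3 simply appear twice.
gathered = A[:, c]

# Sum each contiguous run of columns: [0:2], [2:5], [5:]
subtotals = np.add.reduceat(gathered, offsets, axis=1)
print(subtotals)
```

Because the columns are gathered by fancy indexing first, the addend indices can be out of order or shared between subtotals. An empty addend tuple would need separate handling, since reduceat does not return 0 for an empty group.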

Remark: a faster option for np.concatenate would be np.fromiter(itertools.chain(*subtotal_addend_idxs), dtype=int).
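A minimal sketch of that alternative, checking that it yields the same flat index array as np.concatenate. The names `flat_concat` and `flat_iter` are illustrative only, and no timing is measured here, so whether it is faster for your data is something to verify yourself.

```python
import itertools

import numpy as np

subtotal_addend_idxs = ((0, 1), (1, 2, 3), (3, 4))

# Baseline: concatenate the index tuples into one 1-D array.
flat_concat = np.concatenate(subtotal_addend_idxs)

# Alternative: chain the tuples lazily and let fromiter build the array.
flat_iter = np.fromiter(itertools.chain(*subtotal_addend_idxs), dtype=int)

# Both should hold the same indices in the same order.
same = np.array_equal(flat_concat, flat_iter)
print(flat_iter, same)
```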

edited Oct 12, 2020 at 3:10

answered Oct 12, 2020 at 3:04


2 Comments

scanny Over a year ago

Hmm, this is a very interesting approach: basically, create a single new array of all the addend columns, then reduceat() them into each subtotal group. That's a clever idea :)

mathfux Over a year ago

@scanny Reordering indices is common practice with numpy arrays; don't hesitate to try it more often :)

meTchaikovsky · Accepted Answer · 2020-10-12 02:49:49Z

0

Since we cannot use np.take, here is my solution (there is still a for loop in the lambda function ...)

test = np.array([[11, 12, 13, 14, 15],
                 [21, 22, 23, 24, 25],
                 [31, 32, 33, 34, 35]])
inds = ((0, 1), (1, 2, 3), (3, 4))

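# For one 1-D row, sum the selected entries once per index group.
fake_take = lambda array, inds: [np.sum(array[list(ind)]) for ind in inds]
# Apply it to every row (axis 1) to build the subtotal columns.
np.apply_along_axis(lambda x: fake_take(x, inds), 1, test)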

answered Oct 12, 2020 at 2:49

