I have 2D arrays of counts from which I need to extract a sequence of arbitrary subtotals. In this example they are subtotal columns. Each subtotal is the sum of an arbitrary collection of the base columns, represented by a tuple of addend-indices:

>>> A
[[11, 12, 13, 14, 15]
 [21, 22, 23, 24, 25]
 [31, 32, 33, 34, 35]]

>>> subtotal_addend_idxs
((0, 1), (1, 2, 3), (3, 4))

>>> desired_result
[[23, 39, 29]
 [43, 69, 49]
 [63, 99, 69]]

The best code I have for this so far is this:

import numpy as np

subtotal_addend_idxs = ((0, 1), (1, 2, 3), (3, 4))
np.hstack(
    tuple(
        # sum the selected base columns into a single subtotal column
        np.sum(A[:, addend_idxs], axis=1, keepdims=True)
        for addend_idxs in subtotal_addend_idxs
    )
)

Is there a clever way I can do this with a single numpy call/expression where I don't need a for loop creating a tuple of individual subtotal columns?

Note that the addend-indices are arbitrary; not all indices need appear in a subtotal, the indices do not necessarily appear in increasing order, and the same index can appear in more than one subtotal.

2 Answers

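Try np.add.reduceat: concatenate all the addend indices into one index array, select those columns, and then sum each contiguous group: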

lens = [len(n) for n in subtotal_addend_idxs]
c = np.concatenate(subtotal_addend_idxs)
output = np.add.reduceat(A[:,c], np.cumsum([0]+lens)[:-1], axis=1)

Output:

array([[23, 39, 29],
       [43, 69, 49],
       [63, 99, 69]], dtype=int32)

Remark: np.fromiter(itertools.chain(*subtotal_addend_idxs), dtype=int) can be a faster alternative to np.concatenate for building the index array.
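For reference, here is a self-contained sketch of the approach above, using the sample data from the question and the fromiter variant from the remark. The names flat_idxs, starts and subtotals are only illustrative. It assumes every subtotal has at least one addend index; as far as I know, reduceat does not return 0 for an empty group, so that case would need separate handling.

```python
import itertools

import numpy as np

A = np.array([[11, 12, 13, 14, 15],
              [21, 22, 23, 24, 25],
              [31, 32, 33, 34, 35]])
subtotal_addend_idxs = ((0, 1), (1, 2, 3), (3, 4))

# Flatten all addend indices into one index array: [0 1 1 2 3 3 4]
flat_idxs = np.fromiter(
    itertools.chain.from_iterable(subtotal_addend_idxs), dtype=int
)

# Offset at which each subtotal group starts in flat_idxs: [0 2 5]
lens = [len(idxs) for idxs in subtotal_addend_idxs]
starts = np.cumsum([0] + lens)[:-1]

# Select (and repeat) the addend columns, then sum each contiguous group
subtotals = np.add.reduceat(A[:, flat_idxs], starts, axis=1)
print(subtotals)
```

Because A[:, flat_idxs] copies every addend column once per use, the intermediate array has one column per addend index in total, which may matter for very large inputs.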

2 Comments

Hmm, this is a very interesting approach, so basically create a new single array of all the addend columns, and then reduceat() them in each subtotal group. That's a clever idea :)
@scanny Reordering indices is common practice with numpy arrays, don't hesitate to try it more often :)

Since we cannot use np.take, here is my solution (there is still a for loop in the lambda function ...)

test = np.array([[11, 12, 13, 14, 15],
                 [21, 22, 23, 24, 25],
                 [31, 32, 33, 34, 35]])
inds = ((0, 1), (1, 2, 3), (3, 4))

fake_take = lambda array, inds: [np.sum(array[list(ind)]) for ind in inds]
np.apply_along_axis(lambda x: fake_take(x, inds), 1, test)
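As a quick check, the same idea can be written with a named function instead of lambdas and compared against the desired result from the question. This is only a sketch: the name result is illustrative, and it relies on np.apply_along_axis forwarding extra positional arguments to the function it calls.

```python
import numpy as np

test = np.array([[11, 12, 13, 14, 15],
                 [21, 22, 23, 24, 25],
                 [31, 32, 33, 34, 35]])
inds = ((0, 1), (1, 2, 3), (3, 4))


def fake_take(row, inds):
    # row is one 1D row of the array; sum the entries picked by each index group
    return [np.sum(row[list(ind)]) for ind in inds]


# Apply fake_take to every row (axis=1); inds is passed through as an extra argument
result = np.apply_along_axis(fake_take, 1, test, inds)
print(result)
```

This still loops in Python once per row and once per subtotal, so it is mainly useful for small arrays or as a readable baseline.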
