NumPy: Reduce the sum of an array by n

Question

I'm looking for an efficient way to reduce the sum of a NumPy array a by a given number n, such that no value in a is below 0, and I can specify probabilities pvals for the different values in a. So if the signature for my function is:

> def removeRandom(a, n, pvals):
>    ...

Then it should do the following:

> a     = np.array([2,   3,   5,   10])
> pvals = np.array([0.1, 0.1, 0.4, 0.4]) 
> removeRandom(a, 5, pvals)
array([2, 2, 3, 8])

Since the removal should be random, the output can look different next time:

> removeRandom(a, 5, pvals)
array([1, 3, 4, 7])

I currently have an approach that does the removal step, then checks whether any values in a have gone below 0, and if so, repeats the step until no value in a is below 0:

def removeRandom(a, n, pvals=None):
    if n < np.sum(a):
        # remove a total of n at random indexes, taking the pvals into account
        aranged = np.arange(a.size)
        randomIndexes = np.random.choice(aranged, n, p=pvals)
        np.subtract.at(a, randomIndexes, 1)

        while(a[a < 0].size > 0):   
            # what's the sum of all cells below 0?
            sumBelowZero = np.abs(np.sum(a[a < 0]))
            # set them to 0
            a[a < 0] = 0   

            # rinse and repeat the process
            randomIndexes = np.random.choice(aranged, n, p=pvals)
            np.subtract.at(a, randomIndexes, 1)
        return a
    else:
        return np.zeros_like(a)

That loop is obviously not very elegant, plus there is a chance that the function gets stuck in that loop if it keeps dropping at least one value below 0. The chance that this happens increases dramatically as n approaches np.sum(a).

A very elegant solution to this question has been posted here, but it does not allow for the setting of probabilities:

def removeRandom(a, n):
    c = np.cumsum(np.r_[0, a])
    if n < c[-1]:
        r = np.random.choice(np.arange(c[-1]) + 1, n, replace = False)
        d = np.sum(r[:,None] <= c[None,:], axis=0)
        return np.diff(c-d)
    else:
        return np.zeros_like(a)

Since np.random.choice is also used here and accepts probabilities, I have looked for a way to utilize that (without success, obviously) – can this be done at all?

I would also appreciate any other ideas to solve this, of course.

Nelewout · Accepted Answer · 2018-08-16 16:18:47Z

2

This took some head-scratching for me to wrap my head around, but I think I understand your problem. The following method removes the total sum from random elements in the array.

def remove_random(array, total, probs=None):
    if total >= np.sum(array):
        return np.zeros_like(array)

    if total < 0:
        raise ValueError("Cannot remove non-positive amount!")

    to_remove = total

    while to_remove != 0:
        idx = np.random.choice(range(len(array)), p=probs)

        removeable = min(array[idx], to_remove)

        array[idx] = array[idx] - removeable
        to_remove = to_remove - removeable

    return array

Output (e.g.),

>>>a = np.array([2, 3, 5, 10])
>>>pvals = np.array([0.1, 0.1, 0.4, 0.4])
>>>n = 10

>>>print(remove_random(a, n, pvals))

<<<[2 3 5 0]

As total approaches the sum of the array, this will slow down (many values will end up zero), but at least the method does not get stuck anymore. This slowdown can be prevented by e.g. taking only the non-zero elements and normalising their associated probabilities when calling np.random.choice.

edited Aug 16, 2018 at 16:18

answered Aug 16, 2018 at 16:07

Nelewout

6,6546 gold badges33 silver badges44 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Carsten Over a year ago

Perfect, that does the job.

Collectives™ on Stack Overflow

NumPy: Reduce the sum of an array by n

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related