0

I'm looking for an efficient way to reduce the sum of a NumPy array a by a given number n, such that no value in a is below 0, and I can specify probabilities pvals for the different values in a. So if the signature for my function is:

> def removeRandom(a, n, pvals):
>    ...

Then it should do the following:

> a     = np.array([2,   3,   5,   10])
> pvals = np.array([0.1, 0.1, 0.4, 0.4]) 
> removeRandom(a, 5, pvals)
array([2, 2, 3, 8])

Since the removal should be random, the output can look different next time:

> removeRandom(a, 5, pvals)
array([1, 3, 4, 7])

I currently have an approach that does the removal step, then checks whether any values in a have gone below 0, and if so, repeats the step until no value in a is below 0:

def removeRandom(a, n, pvals=None):
    if n < np.sum(a):
        # remove a total of n at random indexes, taking the pvals into account
        aranged = np.arange(a.size)
        randomIndexes = np.random.choice(aranged, n, p=pvals)
        np.subtract.at(a, randomIndexes, 1)

        while(a[a < 0].size > 0):   
            # what's the sum of all cells below 0?
            sumBelowZero = np.abs(np.sum(a[a < 0]))
            # set them to 0
            a[a < 0] = 0   

            # rinse and repeat the process
            randomIndexes = np.random.choice(aranged, n, p=pvals)
            np.subtract.at(a, randomIndexes, 1)
        return a
    else:
        return np.zeros_like(a)

That loop is obviously not very elegant, plus there is a chance that the function gets stuck in that loop if it keeps dropping at least one value below 0. The chance that this happens increases dramatically as n approaches np.sum(a).

A very elegant solution to this question has been posted here, but it does not allow for the setting of probabilities:

def removeRandom(a, n):
    c = np.cumsum(np.r_[0, a])
    if n < c[-1]:
        r = np.random.choice(np.arange(c[-1]) + 1, n, replace = False)
        d = np.sum(r[:,None] <= c[None,:], axis=0)
        return np.diff(c-d)
    else:
        return np.zeros_like(a)            

Since np.random.choice is also used here and accepts probabilities, I have looked for a way to utilize that (without success, obviously) – can this be done at all?

I would also appreciate any other ideas to solve this, of course.

1 Answer 1

2

This took some head-scratching for me to wrap my head around, but I think I understand your problem. The following method removes the total sum from random elements in the array.

def remove_random(array, total, probs=None):
    if total >= np.sum(array):
        return np.zeros_like(array)

    if total < 0:
        raise ValueError("Cannot remove non-positive amount!")

    to_remove = total

    while to_remove != 0:
        idx = np.random.choice(range(len(array)), p=probs)

        removeable = min(array[idx], to_remove)

        array[idx] = array[idx] - removeable
        to_remove = to_remove - removeable

    return array

Output (e.g.),

>>>a = np.array([2, 3, 5, 10])
>>>pvals = np.array([0.1, 0.1, 0.4, 0.4])
>>>n = 10

>>>print(remove_random(a, n, pvals))

<<<[2 3 5 0]

As total approaches the sum of the array, this will slow down (many values will end up zero), but at least the method does not get stuck anymore. This slowdown can be prevented by e.g. taking only the non-zero elements and normalising their associated probabilities when calling np.random.choice.

Sign up to request clarification or add additional context in comments.

1 Comment

Perfect, that does the job.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.