Can someone provide me with a better (simpler, more readable, more Pythonic, more efficient, etc.) way to remove multiple values from an array than what follows:

import numpy as np

# The array.
x = np.linspace(0, 360, 37)

# The values to be removed.
a = 0
b = 180
c = 360

new_array = np.delete(x, np.where(np.logical_or(np.logical_or(x == a,
                                                              x == b),
                                                x == c)))

A good answer to this question would produce the same result as the above code (i.e., new_array), but might do a better job dealing with equality between floats than the above code does.
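A sketch of the kind of simplification I have in mind would be a single boolean mask (assuming `np.isin` is available, i.e. NumPy >= 1.13; older versions spell it `np.in1d`):

```python
import numpy as np

x = np.linspace(0, 360, 37)

# One mask covering all values to drop, then keep the complement.
mask = np.isin(x, [0, 180, 360])
new_array = x[~mask]
```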

BONUS

Can someone explain to me why this produces the wrong result?

In [5]: np.delete(x, x == a)
/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:3254: FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
  "of casting it to integer", FutureWarning)
Out[5]: 
array([  20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,  100.,
        110.,  120.,  130.,  140.,  150.,  160.,  170.,  180.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.,  360.])

The values 0 and 10 have both been removed, rather than just 0 (a).

Note, x == a is as expected (so the problem is inside np.delete):

In [6]: x == a
Out[6]: 
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False, False], dtype=bool)

Note as well that np.delete(x, np.where(x == a)) produces the correct result. Thus, it appears to me that np.delete cannot handle Boolean indices.

1 Comment

If you have Boolean indices you don't need to use delete. The documentation specifies obj : slice, int or array of ints (no boolean). Commented Jun 6, 2015 at 6:23

3 Answers


You can also use np.ravel to get the indices of the matching values, and then remove them with np.delete:

In [32]: r =  [a,b,c]

In [33]: indx = np.ravel([np.where(x == i) for i in r])

In [34]: indx
Out[34]: array([ 0, 18, 36])

In [35]: np.delete(x, indx)
Out[35]: 
array([  10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,
        100.,  110.,  120.,  130.,  140.,  150.,  160.,  170.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.])
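A variant of the same idea (a sketch; not timed against the original): np.flatnonzero returns a 1-D index array directly, so the ravel step can be dropped:

```python
import numpy as np

x = np.linspace(0, 360, 37)
r = [0, 180, 360]

# flatnonzero gives flat indices directly; concatenate joins them.
indx = np.concatenate([np.flatnonzero(x == v) for v in r])
new_array = np.delete(x, indx)
```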

3 Comments

this wins for readability. i guess i'll wait a bit to see whether more answers come in and then do a speed test.
ah, hell, someone else can do the speed test. for now, you're the winner.
It does seem to be reasonably fast as well. Nice answer!

Your code does seem a little complex. I wondered whether you had considered numpy's Boolean vector indexing.

After the same setup as you I timed your code:

In [175]: %%timeit
   .....: np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
   .....:
10000 loops, best of 3: 32.9 µs per loop

I then timed two separate applications of Boolean indexing.

In [176]: %%timeit
   .....: x1 = x[x != a]
   .....: x2 = x1[x1 != b]
   .....: new_array = x2[x2 != c]
   .....:
100000 loops, best of 3: 6.56 µs per loop

Finally, for programming convenience and to extend the technique to an arbitrary number of excluded values I rewrote the same code as a loop. This will be a little slower, because of the need to make a copy first, but it's still quite respectable.

In [177]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[new_array != val]
   .....:
100000 loops, best of 3: 7.61 µs per loop

I think the real gain is in programming clarity, though. Finally I thought it best to verify that the three algorithms were giving the same results ...

In [179]: new_array1 = np.delete(x,
   .....:                 np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))

In [180]: x1 = x[x != a]

In [181]: x2 = x1[x1 != b]

In [182]: new_array2 = x2[x2 != c]

In [183]: new_array3 = x.copy()

In [184]: for val in (a, b, c):
   .....:         new_array3 = new_array3[new_array3 != val]
   .....:

In [185]: all(new_array1 == new_array2)
Out[185]: True

In [186]: all(new_array1 == new_array3)
Out[186]: True

To handle the issue of floating-point comparisons you need to use numpy's isclose() function. As expected, this sends the timing to hell:

In [188]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[~np.isclose(new_array, val)]
   .....:
10000 loops, best of 3: 126 µs per loop
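The per-value loop can also be vectorized by broadcasting x against all three excluded values at once (a sketch; I haven't timed it against the loop, so treat the speed claim above as the measured one):

```python
import numpy as np

x = np.linspace(0, 360, 37)
vals = np.array([0, 180, 360])

# (37, 3) boolean array: is element i of x close to excluded value j?
close = np.isclose(x[:, None], vals)
new_array = x[~close.any(axis=1)]
```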

The answer to your bonus is contained in the warning, but the warning isn't very useful unless you know that False and True compare numerically equal to zero and one respectively. The boolean array x == a is cast to an integer array, so your code is equivalent to

np.delete(x, [1, 0, 0, ..., 0])

which deletes the elements at indices 0 and 1 (the values 0 and 10). As the warning makes clear, the NumPy team intend to change np.delete() so that Boolean arguments are treated as a mask; at present it only takes index arguments.
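To see concretely what np.delete receives, you can reproduce the cast by hand (a sketch; note that in newer NumPy releases, 1.19 and later, a boolean array passed directly is treated as a mask instead of being cast):

```python
import numpy as np

x = np.linspace(0, 360, 37)
a = 0

# Casting the mask to integers gives [1, 0, 0, ..., 0] -- an index
# array, so np.delete removes the (unique) indices 1 and 0.
indices = (x == a).astype(int)
result = np.delete(x, indices)
```

result starts at 20.0: both index 0 (value 0) and index 1 (value 10) are gone, matching the output shown in the question.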

3 Comments

yikes, the floating-point comparison fix comes with a big speed cost.
re: the warning, thanks for explaining that. the warning says "insert" when it means "delete" -- that threw me.
Yes, the F-P comparisons cost a lot because instead of x == y the function has to compute x-delta <= y <= x+delta, a significantly more complex calculation. I reported the issue about the error message - this is the bug report.

You could borrow np.allclose's approach to testing whether floats are equal:

def float_equal(x, y, rtol=1.e-5, atol=1.e-8):
    return np.less_equal(abs(x - y), atol + rtol * abs(y))

np.delete(x,np.where(np.logical_or.reduce([float_equal(x,y) for y in [0,180,360]])))

The where part produces:

(array([ 0, 18, 36]),)

float_equal could probably be changed to broadcast x against y, eliminating the list comprehension.

I used the fact that logical_or is a ufunc, and has a reduce method.

You don't need the where; just use the result of the logical_or as a boolean index:

I = np.logical_or.reduce([float_equal(x,y) for y in [0,180,360]])
x[~I]

(with this small example, the direct use of the boolean is 2x faster than the np.delete(np.where(...)) approach.)


With this x, == produces the same thing:

np.where(np.logical_or.reduce([x==y for y in [0,180,360]]))
# (array([ 0, 18, 36]),)

so does this vectorized approach:

abc = np.array([0,180,360])
np.where(np.sum(x==abc[:,None],axis=0))
# (array([ 0, 18, 36]),)

x==abc[:,None] is a (3, 37) boolean array; np.sum acts like a logical or.

My float_equal also works this way:

float_equal(x,abc[:,None]).sum(axis=0)
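Putting that broadcast form to work end to end (a sketch reusing the float_equal defined above, with .any in place of .sum for clarity):

```python
import numpy as np

def float_equal(x, y, rtol=1.e-5, atol=1.e-8):
    return np.less_equal(abs(x - y), atol + rtol * abs(y))

x = np.linspace(0, 360, 37)
abc = np.array([0, 180, 360])

# Broadcast x (37,) against abc[:, None] (3, 1) -> a (3, 37) mask,
# then collapse over the values axis and keep the complement.
mask = float_equal(x, abc[:, None]).any(axis=0)
new_array = x[~mask]
```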

2 Comments

np.delete(x, np.where(x==a)) behaves correctly; x==a by itself is the wrong kind of input for delete.
ah, i see -- you're not using Boolean indices in your calls to np.delete. i misread your answer earlier.
