Can someone provide me with a better (simpler, more readable, more Pythonic, more efficient, etc.) way to remove multiple values from an array than what follows:

import numpy as np

# The array.
x = np.linspace(0, 360, 37)

# The values to be removed.
a = 0
b = 180
c = 360

new_array = np.delete(x, np.where(np.logical_or(np.logical_or(x == a,
                                                              x == b),
                                                x == c)))

A good answer to this question would produce the same result as the above code (i.e., new_array), but might do a better job dealing with equality between floats than the above code does.
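A sketch of the kind of simplification I have in mind would be a single boolean mask (assuming `np.isin` is available, i.e. NumPy >= 1.13; older versions spell it `np.in1d`):

```python
import numpy as np

x = np.linspace(0, 360, 37)

# One mask covering all values to drop, then keep the complement.
mask = np.isin(x, [0, 180, 360])
new_array = x[~mask]
```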

BONUS

Can someone explain to me why this produces the wrong result?

In [5]: np.delete(x, x == a)
/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:3254: FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
  "of casting it to integer", FutureWarning)
Out[5]: 
array([  20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,  100.,
        110.,  120.,  130.,  140.,  150.,  160.,  170.,  180.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.,  360.])

The values 0 and 10 have both been removed, rather than just 0 (a).

Note, x == a is as expected (so the problem is inside np.delete):

In [6]: x == a
Out[6]: 
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False, False], dtype=bool)

Note as well that np.delete(x, np.where(x == a)) produces the correct result. Thus, it appears to me that np.delete cannot handle Boolean indices.

1 Comment

If you have Boolean indices you don't need to use delete. The documentation specifies obj : slice, int or array of ints (no boolean). Commented Jun 6, 2015 at 6:23

3 Answers


You can also use np.ravel to get the indices of the matching values, and then remove them with np.delete:

In [32]: r =  [a,b,c]

In [33]: indx = np.ravel([np.where(x == i) for i in r])

In [34]: indx
Out[34]: array([ 0, 18, 36])

In [35]: np.delete(x, indx)
Out[35]: 
array([  10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,
        100.,  110.,  120.,  130.,  140.,  150.,  160.,  170.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.])
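A variant of the same idea (a sketch; not timed against the original): np.flatnonzero returns a 1-D index array directly, so the ravel step can be dropped:

```python
import numpy as np

x = np.linspace(0, 360, 37)
r = [0, 180, 360]

# flatnonzero gives flat indices directly; concatenate joins them.
indx = np.concatenate([np.flatnonzero(x == v) for v in r])
new_array = np.delete(x, indx)
```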

3 Comments

this wins for readability. i guess i'll wait a bit to see whether more answers come in and then do a speed test.
ah, hell, someone else can do the speed test. for now, you're the winner.
It does seem to be reasonably fast as well. Nice answer!

Your code does seem a little complex. I wondered whether you had considered numpy's Boolean vector indexing.

After the same setup as you I timed your code:

In [175]: %%timeit
   .....: np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
   .....:
10000 loops, best of 3: 32.9 µs per loop

I then timed two separate applications of Boolean indexing.

In [176]: %%timeit
   .....: x1 = x[x != a]
   .....: x2 = x1[x1 != b]
   .....: new_array = x2[x2 != c]
   .....:
100000 loops, best of 3: 6.56 µs per loop

Finally, for programming convenience and to extend the technique to an arbitrary number of excluded values I rewrote the same code as a loop. This will be a little slower, because of the need to make a copy first, but it's still quite respectable.

In [177]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[new_array != val]
   .....:
100000 loops, best of 3: 7.61 µs per loop

I think the real gain is in programming clarity, though. Finally I thought it best to verify that the three algorithms were giving the same results ...

In [179]: new_array1 = np.delete(x,
   .....:                 np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))

In [180]: x1 = x[x != a]

In [181]: x2 = x1[x1 != b]

In [182]: new_array2 = x2[x2 != c]

In [183]: new_array3 = x.copy()

In [184]: for val in (a, b, c):
   .....:         new_array3 = new_array3[new_array3 != val]
   .....:

In [185]: all(new_array1 == new_array2)
Out[185]: True

In [186]: all(new_array1 == new_array3)
Out[186]: True

To handle the issue of floating-point comparisons you need to use numpy's isclose() function. As expected, this sends the timing to hell:

In [188]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[~np.isclose(new_array, val)]
   .....:
10000 loops, best of 3: 126 µs per loop
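The per-value loop can also be vectorized by broadcasting x against all three excluded values at once (a sketch; I haven't timed it against the loop, so treat the speed claim above as the measured one):

```python
import numpy as np

x = np.linspace(0, 360, 37)
vals = np.array([0, 180, 360])

# (37, 3) boolean array: is element i of x close to excluded value j?
close = np.isclose(x[:, None], vals)
new_array = x[~close.any(axis=1)]
```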

The answer to your bonus is contained in the warning, but the warning isn't very useful unless you know that False and True compare numerically equal to zero and one respectively. The boolean array x == a is cast to an integer array, so your code is equivalent to

np.delete(x, [1, 0, 0, ..., 0])

which deletes the elements at indices 0 and 1 (the values 0 and 10). As the warning makes clear, the NumPy team intend to change np.delete() so that Boolean arguments are treated as a mask; at present it only takes index arguments.
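To see concretely what np.delete receives, you can reproduce the cast by hand (a sketch; note that in newer NumPy releases, 1.19 and later, a boolean array passed directly is treated as a mask instead of being cast):

```python
import numpy as np

x = np.linspace(0, 360, 37)
a = 0

# Casting the mask to integers gives [1, 0, 0, ..., 0] -- an index
# array, so np.delete removes the (unique) indices 1 and 0.
indices = (x == a).astype(int)
result = np.delete(x, indices)
```

result starts at 20.0: both index 0 (value 0) and index 1 (value 10) are gone, matching the output shown in the question.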

3 Comments

yikes, the floating-point comparison fix comes with a big speed cost.
re: the warning, thanks for explaining that. the warning says "insert" when it means "delete" -- that threw me.
Yes, the F-P comparisons cost a lot because instead of x == y the function has to compute x-delta <= y <= x+delta, a significantly more complex calculation. I reported the issue about the error message - this is the bug report.

You could borrow np.allclose's approach to testing whether floats are equal:

def float_equal(x, y, rtol=1.e-5, atol=1.e-8):
    return np.less_equal(abs(x - y), atol + rtol * abs(y))

np.delete(x,np.where(np.logical_or.reduce([float_equal(x,y) for y in [0,180,360]])))

The where part produces:

(array([ 0, 18, 36]),)

float_equal could probably be changed to broadcast x against y, eliminating the list comprehension.

I used the fact that logical_or is a ufunc, and has a reduce method.

You don't need the where; just use the result of the logical_or as a boolean index:

I = np.logical_or.reduce([float_equal(x,y) for y in [0,180,360]])
x[~I]

(with this small example, the direct use of the boolean is 2x faster than the np.delete(np.where(...)) approach.)


With this x, == produces the same thing:

np.where(np.logical_or.reduce([x==y for y in [0,180,360]]))
# (array([ 0, 18, 36]),)

so does this vectorized approach:

abc = np.array([0,180,360])
np.where(np.sum(x==abc[:,None],axis=0))
# (array([ 0, 18, 36]),)

x==abc[:,None] is a (3, 37) boolean array; np.sum acts like a logical or.

My float_equal also works this way:

float_equal(x,abc[:,None]).sum(axis=0)
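Putting that broadcast form to work end to end (a sketch reusing the float_equal defined above, with .any in place of .sum for clarity):

```python
import numpy as np

def float_equal(x, y, rtol=1.e-5, atol=1.e-8):
    return np.less_equal(abs(x - y), atol + rtol * abs(y))

x = np.linspace(0, 360, 37)
abc = np.array([0, 180, 360])

# Broadcast x (37,) against abc[:, None] (3, 1) -> a (3, 37) mask,
# then collapse over the values axis and keep the complement.
mask = float_equal(x, abc[:, None]).any(axis=0)
new_array = x[~mask]
```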

2 Comments

np.delete(x, np.where(x==a)) behaves correctly; x==a by itself is the wrong kind of input for delete.
ah, i see -- you're not using Boolean indices in your calls to np.delete. i misread your answer earlier.
