Delete all numbers from an array which are in another array

Question

I have an array "removable" containing a few numbers from another array "All" containing all numbers from 0 to k.

I want to remove all numbers in All which are listed in removable.

All = np.arange(k)
removable = np.array([1, 3, 4, 7, 9, ..., 200])

for i in removable:
    if i in All:
        All.remove(i)

ndarray has no remove attribute. I'm sure there is an easy method in NumPy to solve this problem, but I can't find it in the documentation.

I get removable from another method; sadly, I'm not able to change it.
– Tim4497, commented Feb 5, 2019 at 14:51

Brad Solomon · Accepted Answer · 2019-02-05 14:58:46Z

5

You could use the function setdiff1d from NumPy:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 2, 4, 1])
>>> b = np.array([3, 4, 5, 6])
>>> np.setdiff1d(a, b)
array([1, 2])
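Applied to the setup in the question, this might look like the sketch below. The value of k and the contents of removable are hypothetical stand-ins, since the question elides them, and all_numbers is just an illustrative name for the question's "All" array.

```python
import numpy as np

k = 10  # hypothetical size; the question does not give a value for k
all_numbers = np.arange(k)             # plays the role of "All" in the question
removable = np.array([1, 3, 4, 7, 9])  # hypothetical values to drop

# setdiff1d returns the sorted, unique values of the first array
# that do not appear in the second one.
remaining = np.setdiff1d(all_numbers, removable)
print(remaining)  # [0 2 5 6 8]
```

Because np.arange(k) is already sorted and free of duplicates, the sorting and de-duplication done by setdiff1d do not change anything for this particular input.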

edited Feb 5, 2019 at 14:58

Brad Solomon

answered Feb 5, 2019 at 14:52

f.wue

4 Comments

Brad Solomon Over a year ago

Note that this will de-duplicate the original entries in a (not done by the pseudocode in the question), and the result will be sorted

f.wue Over a year ago

That's true; however, np.arange(k) provides an array without duplicates. My answer will not work with duplicates.

Engineero Over a year ago

Oh snap, setdiff1d is even faster than explicit set conversion and differencing. I guess that makes sense, probably more optimized. I didn't know numpy had this!

Martin Over a year ago

Now the question is whether the OP wants to keep duplicates or de-duplicate.

Brad Solomon · Accepted Answer · 2019-02-05 15:21:40Z

np.setdiff1d() will de-duplicate the original entries, and will also return the result sorted.

That's fine in some cases, but if you want to avoid one or both of these aspects, have a look at np.in1d() with an (inverted) boolean mask:

>>> a = np.array([1, 2, 3, 2, 4, 1])
>>> b = np.array([3, 4, 5, 6])
>>> a[~np.in1d(a, b)]
array([1, 2, 2, 1])

The ~ operator does inversion on the boolean mask:

>>> np.in1d(a, b)
array([False, False,  True, False,  True, False])

>>> ~np.in1d(a, b)
array([ True,  True, False,  True, False,  True])

Disclaimer:

Note that this is not truly removal, as you described it in your question; boolean-mask indexing returns a new array holding the filtered elements, and the original array a is left unchanged. The same goes for np.delete(); there's no concept of in-place element deletion for NumPy arrays.
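One way to check how the filtered result relates to the original array is np.shares_memory. The sketch below reuses a and b from above and uses np.isin, which the NumPy documentation recommends over np.in1d for new code; the name filtered is only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 2, 4, 1])
b = np.array([3, 4, 5, 6])

# Boolean-mask indexing builds a new array from the selected elements.
filtered = a[~np.isin(a, b)]

# The result has its own memory, so writing to it leaves a untouched.
print(np.shares_memory(a, filtered))  # False
filtered[0] = 99
print(a)         # [1 2 3 2 4 1]
print(filtered)  # [99  2  2  1]
```

If the goal is to "remove" elements from a, the usual pattern is to rebind the name, for example a = a[~np.isin(a, b)].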

Martin · Accepted Answer · 2019-02-05 15:09:14Z

1

Solution: fast for big arrays, with no need to convert to a list (which would slow down the computation).

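import numpy as np

orig = np.arange(15)
to_remove = np.array([1, 2, 3, 4])
mask = np.isin(orig, to_remove)  # True where an element should be removed
orig = orig[np.invert(mask)]     # keep only the elements not in to_remove

>>> orig
array([ 0,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])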
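As a possible shortening of the same idea, np.isin accepts an invert=True argument, so the separate np.invert call can be folded into it. The sketch below also checks that order and duplicates survive; the names kept, with_dups and kept_dups are illustrative and not part of the original answer.

```python
import numpy as np

orig = np.arange(15)
to_remove = np.array([1, 2, 3, 4])

# invert=True yields True for the elements that are NOT in to_remove.
kept = orig[np.isin(orig, to_remove, invert=True)]
print(kept)  # [ 0  5  6  7  8  9 10 11 12 13 14]

# Unlike setdiff1d, masking keeps the original order and any duplicates.
with_dups = np.array([5, 1, 5, 2])
kept_dups = with_dups[np.isin(with_dups, [1], invert=True)]
print(kept_dups)  # [5 5 2]
```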

edited Feb 5, 2019 at 15:09

answered Feb 5, 2019 at 15:01

Martin

3 Comments

Brad Solomon Over a year ago

np.isin() calls np.asarray() + np.in1d() and does reshaping. If both of the inputs are 1d, those checks are probably not needed

Martin Over a year ago

There's no need to say it's impossible, or to give a different solution from what the OP is asking for.

Paritosh Singh Over a year ago

Constructive criticism is always welcome. We are all fellow programmers here, some having learnt a lot already, others just starting. Help correct mistakes or don't, but there is absolutely no need to mock them.

Mihai Andrei · Accepted Answer · 2019-02-05 14:55:05Z

-1

NumPy arrays have a fixed shape; you cannot remove elements from them.

You cannot do this with ndarrays.
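To illustrate the fixed-size point, here is a small sketch under the assumption that "removing" means producing a shorter array: functions such as np.delete return a new array and leave the original untouched. The names a and shorter are illustrative.

```python
import numpy as np

a = np.arange(5)

# ndarray offers no list-style remove() method.
print(hasattr(a, "remove"))  # False

# np.delete takes positions (indices), not values, and returns a new array.
shorter = np.delete(a, [1, 3])
print(shorter)  # [0 2 4]
print(a)        # [0 1 2 3 4]
```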

answered Feb 5, 2019 at 14:55

Mihai Andrei

1 Comment

Eelco Hoogendoorn Over a year ago

Upvote to counter the downvotes. Perhaps pedantic, but not incorrect. And not an unimportant aspect about ndarrays to appreciate, on the path to a good solution to this problem.

marc_s · Accepted Answer · 2019-02-08 12:59:05Z

You should do this with sets instead of lists/arrays, which is easy enough:

remaining = np.array(list(set(arr).difference(removable)))

where arr is your All array above (avoid naming a variable "all", since that would shadow Python's built-in all() function).

Granted, using sets will get rid of repeated elements if you have those in your arr, but it sounds like arr is just a sequence of unique values. Sets have much more efficient membership checking (constant-time vs. order N), so you get to go a lot faster. By comparison, I made a list version that builds a list if a value is not in removable:

def remove_list(arr, rem):
    result = []
    for i in arr:
        if i not in rem:
            result.append(i)
    return result

and made my set version a function as well:

def remove_set(arr, rem):
    # Convert the set to a list first; np.array(set(...)) would give a 0-d object array.
    return np.array(list(set(arr).difference(rem)))

Timing comparison with arr = np.arange(10000) and removable = np.random.randint(0, 10000, 1000):

remove_list(arr, removable)
# 55.5 ms ± 664 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

remove_set(arr, removable)
# 947 µs ± 3.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The set version is more than 50 times faster.
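Before comparing timings, it may be worth confirming that the approaches in this thread return the same values for this kind of input. The sketch below is a consistency check only, not a benchmark; the seed is arbitrary, and the sorted() call in remove_set is an addition made here because set iteration order is not guaranteed.

```python
import numpy as np

def remove_list(arr, rem):
    # List version: linear membership test for every element.
    return [i for i in arr if i not in rem]

def remove_set(arr, rem):
    # Set version; sorted() is added here to get a deterministic order.
    return np.array(sorted(set(arr.tolist()).difference(rem.tolist())))

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
arr = np.arange(10000)
removable = rng.integers(0, 10000, 1000)

via_list = np.array(remove_list(arr, removable))
via_set = remove_set(arr, removable)
via_setdiff = np.setdiff1d(arr, removable)
via_mask = arr[~np.isin(arr, removable)]

# All four agree because arr is sorted and has no duplicates.
print(np.array_equal(via_list, via_set))     # True
print(np.array_equal(via_set, via_setdiff))  # True
print(np.array_equal(via_setdiff, via_mask)) # True
```

For an input with duplicates or in unsorted order, only the list and mask versions would preserve the original layout.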
