30

A simple example of numpy indexing:

In: a = numpy.arange(10)
In: sel_id = numpy.arange(5)
In: a[sel_id]
Out: array([0,1,2,3,4])

How do I return the rest of the array, i.e. the elements that are not indexed by sel_id? What I can think of is:

In: numpy.array([x for x in a if x not in a[sel_id]])
Out: array([5,6,7,8,9])

Is there any easier way?

3
  • Is this a one-time operation? Or will you be reusing sel_id (and its negation) down the road? Also, are you interested in the multi-dimensional case, or just the 1D case? Commented Sep 20, 2012 at 17:58
  • In my application, it will be operated on a multi-dimensional massive array, and yes I will reuse sel_id. Commented Sep 21, 2012 at 19:56
  • Just realized my solution above is WRONG. If it is an array of ten 1's then the given code will give an empty array instead of an array of five 1's. Commented Sep 21, 2012 at 20:10
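The failure mode described in the last comment can be reproduced with a short sketch. It assumes an array of ten ones, as in the comment, and compares filtering by value with filtering by position; the names by_value and by_position are illustrative only.

```python
import numpy as np

a = np.ones(10, dtype=int)
sel_id = np.arange(5)

# Filtering by value: every element equals a selected value, so nothing survives.
by_value = np.array([x for x in a if x not in a[sel_id]])

# Filtering by position: mark the selected indices in a boolean mask instead.
mask = np.ones(a.shape[0], dtype=bool)
mask[sel_id] = False
by_position = a[mask]

print(by_value.size)   # 0
print(by_position)     # [1 1 1 1 1]
```

The value-based version only works when all values in the array are distinct, which is why the answers below work with indices.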

8 Answers 8

21

For this simple 1D case, I'd actually use a boolean mask:

import numpy

a = numpy.arange(10)
include_index = numpy.arange(4)
include_idx = set(include_index)  # A set is more efficient for lookups, but it won't preserve the order of your indices, if that matters
mask = numpy.array([(i in include_idx) for i in range(len(a))])

Now you can get your values:

included = a[mask]  # array([0, 1, 2, 3])
excluded = a[~mask] # array([4, 5, 6, 7, 8, 9])

Note that a[mask] doesn't necessarily yield the same thing as a[include_index], since the order of include_index matters for the output in that scenario (a[mask] should be roughly equivalent to a[sorted(include_index)]). However, since the order of your excluded items isn't well defined, this should work OK.


EDIT

A better way to create the mask is:

mask = numpy.zeros(a.shape, dtype=bool)
mask[include_index] = True  # index with the array, not the set

(thanks to seberg).
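Since the asker mentions a multi-dimensional array and reusing sel_id, here is a minimal sketch of how the same mask might be applied along the first axis of a 2D array. The array shape and the choice of axis 0 are assumptions made for illustration, not something stated in the question.

```python
import numpy as np

a = np.arange(20).reshape(10, 2)   # hypothetical 2D data with 10 rows
sel_id = np.array([0, 1, 2, 3, 4])

# Build the mask once over the rows; it can be reused for both halves.
mask = np.zeros(a.shape[0], dtype=bool)
mask[sel_id] = True

selected = a[mask]   # rows 0..4
rest = a[~mask]      # rows 5..9

print(selected.shape, rest.shape)  # (5, 2) (5, 2)
print(rest[0])                     # [10 11]
```

Keeping the mask around avoids recomputing the complement each time the two halves are needed.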


7 Comments

@BiRico -- wrong. I converted include_index to a set (called include_idx) which has a __contains__ method that goes in O(1). This solution has O(N) complexity.
+1, this is almost exactly what I was going to suggest, but I had to step away from the computer. Using a boolean mask is nice for operations like these because you don't have to do any extra work to calculate the relative complement. Just fyi, using fromiter on a generator instead of array on a list comprehension yields a small speed boost according to my tests.
Sorry, but using sets for creation of the mask is taking out the large weapons for a small problem...
Maybe another way to create the boolean array would be: mask = numpy.zeros(numpy.shape(a),bool); mask[sel_id] = True; I agree that using a boolean array is probably the best solution. Thanks!
@mgilson, if you don't want to, fine, but how about you edit your answer to use mask[sel_id] = True? That would make it a very nice answer for someone to find. (Yes, I really think sets are that awful here, for multiple reasons.)
6

You can do this nicely with boolean masks:

import numpy as np

a = np.arange(10)

mask = np.ones(len(a), dtype=bool)  # all elements included/True.
mask[[7, 2, 8]] = False             # Set unwanted elements to False

print(a[mask])
# Gives (removing entries 7, 2 and 8):
# [0 1 3 4 5 6 9]

Addition (taken from @mgilson): the boolean mask can also be used to get back the originally indexed elements with a[~mask]. However, this only matches a[indices] if the original indices were sorted.
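A small sketch of the ordering caveat, using the same unsorted indices [7, 2, 8] as above (the name idx is illustrative):

```python
import numpy as np

a = np.arange(10)
idx = [7, 2, 8]                  # unsorted indices

mask = np.ones(len(a), dtype=bool)
mask[idx] = False

print(a[idx])     # [7 2 8]  (order of idx is preserved)
print(a[~mask])   # [2 7 8]  (always in ascending position order)
print(a[mask])    # [0 1 3 4 5 6 9]
```

Fancy indexing follows the order of the index list, while a boolean mask always walks the array from start to end.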


EDIT: Moved down, as I had to realize that I would consider np.delete buggy at this time (Sep. 2012).

You could also use np.delete, though masks are more powerful (and in the future I think that should be an OK option). At the moment, however, it's slower than the above and will create unexpected results with negative indices (or steps when given a slice).

print(np.delete(a, [7, 2, 8]))
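For non-negative indices the two approaches should agree. The following is a minimal sketch comparing them; it makes no claim about the negative-index or slice behaviour mentioned above, which may differ between NumPy versions.

```python
import numpy as np

a = np.arange(10)
drop = [7, 2, 8]

mask = np.ones(len(a), dtype=bool)
mask[drop] = False

via_mask = a[mask]
via_delete = np.delete(a, drop)   # returns a new array; a itself is unchanged

print(via_delete)                            # [0 1 3 4 5 6 9]
print(np.array_equal(via_mask, via_delete))  # True
print(a.size)                                # 10
```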

4 Comments

Yes -- the second approach is the best so far, and the only pure-numpy linear approach... seems obvious in retrospect! (Note that behind the scenes, numpy.delete just uses setdiff1d which in turn uses in1d. So it's also n log n.) Would +1 but you already have mine!
@senderle seriously! That's funny, maybe np.delete could use a change for that execution path...
@PierreGM, true, I never use it either, but if you only want a copy of the array without those elements and nothing else, I honestly don't see a huge problem with using such a function.
@seberg The fact that it exists means that np.delete is useful for some, of course, but I keep thinking that it's a terrible function that never works as people expect it to (but works exactly as it should). Why add to the confusion when explicit fancy indexing is so much clearer?
4

It's more like:

a = numpy.array([1, 2, 3, 4, 5, 6, 7, 4])
exclude_index = numpy.arange(5)
include_index = numpy.setdiff1d(numpy.arange(len(a)), exclude_index)
a[include_index]
# array([6, 7, 4])

# Notice this is a little different from
numpy.setdiff1d(a, a[exclude_index])
# array([6, 7])

Comments

1

I would also do this with a Boolean mask, but a little differently. This approach has the benefit of working in N dimensions, with contiguous or non-contiguous indices. Memory usage will depend on whether a view or a copy is made for the masked array, and I am not sure which it is.

import numpy
a = numpy.arange(10)
sel_id = numpy.arange(5)
mask = numpy.ma.make_mask_none(a.shape)
mask[sel_id] = True
answer = numpy.ma.masked_array(a, mask).compressed()
print(answer)
# [5 6 7 8 9]

1 Comment

Masked arrays may be a really nice option. Though .compressed() somewhat defeats the masked array purpose IMO, as it creates a normal array copy.
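To illustrate the point in this comment, here is a hedged sketch showing that a masked array can be used directly for reductions, whereas .compressed() hands back a plain ndarray. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)
sel_id = np.arange(5)

mask = np.zeros(a.shape, dtype=bool)
mask[sel_id] = True

masked = np.ma.masked_array(a, mask)

# Reductions skip the masked entries without extracting them first.
print(masked.sum())               # 35

# compressed() returns the unmasked values as a regular 1D ndarray.
rest = masked.compressed()
print(type(rest).__name__)        # ndarray
print(rest)                       # [5 6 7 8 9]
```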
0

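Also, if the selected indices are contiguous and start at the beginning of the array, use the [N:] syntax to select the rest. For instance, arr[5:] would select everything from index 5 (the sixth element) through the last element of the array.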
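A short sketch of the slice approach, assuming the selected indices are exactly 0..4. One practical difference worth noting is that basic slicing gives a view of the original data, while indexing with an integer array gives a copy.

```python
import numpy as np

a = np.arange(10)

rest = a[5:]                       # elements at indices 5..9
fancy = a[np.arange(5, 10)]        # same values via an index array

print(rest)                        # [5 6 7 8 9]
print(np.shares_memory(a, rest))   # True  (slice is a view)
print(np.shares_memory(a, fancy))  # False (fancy indexing copies)
```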

Comments

0

Here's another way, using numpy's isin() function:

import numpy as np

a = np.arange(10)
sel_id = np.arange(5)

a[~np.isin(np.arange(a.size), sel_id)]

Explanation:

np.arange(a.size) gives all the indices of a, i.e. [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

np.isin(np.arange(a.size), sel_id) returns a boolean mask [ True, True, True, True, True, False, False, False, False, False] with True at indices which are in sel_id and False otherwise. Since we want to get the indices that are not in sel_id we use the bitwise NOT operator ~ to invert the boolean mask.
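As a variation on the above, np.isin also accepts an invert keyword, which may be used instead of the ~ operator. This sketch uses an array whose values differ from its indices (an assumption made for illustration) so that it is clear the selection is by position.

```python
import numpy as np

a = np.arange(10) * 10            # values 0, 10, ..., 90
sel_id = np.array([0, 1, 2, 3, 4])

positions = np.arange(a.size)

rest_not = a[~np.isin(positions, sel_id)]
rest_inv = a[np.isin(positions, sel_id, invert=True)]

print(rest_not)                            # [50 60 70 80 90]
print(np.array_equal(rest_not, rest_inv))  # True
```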

Comments

-1

numpy.setdiff1d(a, a[sel_id]) should do the trick. Don't know if there's something neater than this.

1 Comment

That's not going to work if there are repeated values in the array.
-1

Assuming that a is a 1D array, you could just pop the items you don't want from the list of indices:

accept = [i for i in range(a.size) if i not in avoid_list]
a[accept]

You could also try to use something like

accept = sorted(set(range(a.size)) - set(indices_to_discard))
a[accept]

The idea is to use fancy indexing on the complement of the set of indices you don't want.

Comments
