Numpy Indexing: Return the rest

Question

A simple example of numpy indexing:

In: a = numpy.arange(10)
In: sel_id = numpy.arange(5)
In: a[sel_id]
Out: array([0,1,2,3,4])

How do I return the rest of the array, i.e. the elements that are not indexed by sel_id? What I can think of is:

In: numpy.array([x for x in a if x not in a[sel_id]])
Out: array([5,6,7,8,9])

Is there any easier way?

Is this a one-time operation, or will you be reusing sel_id (and its negation) down the road? Also, are you interested in the multi-dimensional case, or just the 1D case?
– mgilson, Commented Sep 20, 2012 at 17:58
In my application, it will operate on a massive multi-dimensional array, and yes, I will reuse sel_id.
– CJLam, Commented Sep 21, 2012 at 19:56
Just realized my solution above is WRONG. If a is an array of ten 1's, then the given code will give an empty array instead of an array of five 1's.
– CJLam, Commented Sep 21, 2012 at 20:10

mgilson · Accepted Answer · 2012-09-22 00:46:48Z

21

For this simple 1D case, I'd actually use a boolean mask:

import numpy

a = numpy.arange(10)
include_index = numpy.arange(4)
include_idx = set(include_index)  # A set is more efficient, but it doesn't reorder your elements, if that is desirable
mask = numpy.array([(i in include_idx) for i in range(len(a))])

Now you can get your values:

included = a[mask]  # array([0, 1, 2, 3])
excluded = a[~mask] # array([4, 5, 6, 7, 8, 9])

Note that a[mask] doesn't necessarily yield the same thing as a[include_index], since the order of include_index matters for the output in that scenario (a[mask] should be roughly equivalent to a[sorted(include_index)]). However, since the order of your excluded items isn't well defined, this should work OK.
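To make the ordering point concrete, here is a minimal sketch. The unsorted include_index and the scaled values in a are illustrative assumptions, not part of the original answer.

```python
import numpy as np

a = np.arange(10) * 10               # values differ from their positions
include_index = np.array([3, 0, 2])  # deliberately unsorted (illustrative)

mask = np.zeros(a.shape, dtype=bool)
mask[include_index] = True

by_index = a[include_index]  # follows the order of include_index
by_mask = a[mask]            # follows the order of a
excluded = a[~mask]          # everything not selected, in the order of a

print(by_index)
print(by_mask)
print(excluded)
```

Fancy indexing preserves the order you asked for, while the mask always returns elements in array order.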

EDIT

A better way to create the mask is:

mask = numpy.zeros(a.shape, dtype=bool)
mask[include_index] = True  # index with the array, not the set

(thanks to seberg).
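Since the asker mentions reusing sel_id on a large multi-dimensional array, one possible way to package this is sketched below. The helper name complement_mask and the sample array are hypothetical; the idea is to build the mask once and then apply it along whichever axis is needed.

```python
import numpy as np

def complement_mask(indices, length):
    """Boolean mask that is False at `indices` and True elsewhere."""
    mask = np.ones(length, dtype=bool)
    mask[indices] = False
    return mask

data = np.arange(24).reshape(6, 4)  # hypothetical 2-D array
sel_id = np.array([0, 2, 5])

rest = complement_mask(sel_id, data.shape[0])  # build once, reuse as needed

selected_rows = data[sel_id]  # rows 0, 2 and 5
other_rows = data[rest]       # rows 1, 3 and 4
other_cols = data[:, complement_mask([1], data.shape[1])]  # drop column 1
```

The mask costs one byte per element along the masked axis, so it stays small even when the array itself is large.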

edited Sep 22, 2012 at 0:46

answered Sep 20, 2012 at 18:17

mgilson


7 Comments

mgilson Over a year ago

@BiRico -- wrong. I converted include_index to a set (called include_idx) which has a __contains__ method that goes in O(1). This solution has O(N) complexity.

senderle Over a year ago

+1, this is almost exactly what I was going to suggest, but I had to step away from the computer. Using a boolean mask is nice for operations like these because you don't have to do any extra work to calculate the relative complement. Just fyi, using fromiter on a generator instead of array on a list comprehension yields a small speed boost according to my tests.

seberg Over a year ago

Sorry, but using sets for creation of the mask is taking out the large weapons for a small problem...

CJLam Over a year ago

Maybe another way to create the boolean array is: mask = numpy.zeros(numpy.shape(a), bool); mask[sel_id] = True. I agree that using a boolean array is probably the best solution. Thanks!

seberg Over a year ago

@mgilson, if you don't want to, fine, but how about you edit your answer to use mask[sel_id] = True? That would make it a very nice answer for someone to find. (Yes, I really think sets are that awful here, for multiple reasons.)


seberg · Accepted Answer · 2012-09-22 10:06:17Z

6

You can do this nicely with boolean masks:

import numpy as np

a = np.arange(10)

mask = np.ones(len(a), dtype=bool)  # all elements included/True.
mask[[7, 2, 8]] = False             # Set unwanted elements to False

print(a[mask])
# Gives (removing entries 7, 2 and 8):
# [0 1 3 4 5 6 9]

Addition (taken from @mgilson): the boolean mask can also be used to get back the originally selected elements with a[~mask]. However, this gives the same result as indexing with the original indices only if those indices were sorted.

EDIT: Moved down, after realizing that I would consider np.delete buggy at this time (Sep. 2012).

You could also use np.delete, though masks are more powerful (and in the future I think that should be an OK option). At the moment, however, it's slower than the above, and it will produce unexpected results with negative indices (or with steps when given a slice).

print(np.delete(a, [7, 2, 8]))
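As a rough cross-check, the sketch below compares np.delete against the mask for non-negative, unique indices. The variable names and the small 2-D table are illustrative assumptions. The caveats above were written about 2012-era NumPy, so it may be worth running a comparison like this on your own version.

```python
import numpy as np

a = np.arange(10)
drop = [7, 2, 8]

mask = np.ones(len(a), dtype=bool)
mask[drop] = False

via_mask = a[mask]
via_delete = np.delete(a, drop)  # returns a new array; a is unchanged

# np.delete also accepts an axis argument for multi-dimensional input.
table = np.arange(12).reshape(4, 3)
without_rows = np.delete(table, [0, 2], axis=0)

print(np.array_equal(via_mask, via_delete))
```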

edited Sep 22, 2012 at 10:06

answered Sep 20, 2012 at 20:33

seberg


4 Comments

senderle Over a year ago

Yes -- the second approach is the best so far, and the only pure-numpy linear approach... seems obvious in retrospect! (Note that behind the scenes, numpy.delete just uses setdiff1d, which in turn uses in1d. So it's also n log n.) Would +1 but you already have mine!

seberg Over a year ago

@senderle seriously! That's funny, maybe np.delete could use a change for that execution path...

seberg Over a year ago

@PierreGM, true, I never use it either, but if you only want a copy of the array without those elements and nothing else, I honestly don't see a huge problem with using such a function.

Pierre GM Over a year ago

@seberg The fact that it exists means that np.delete is useful for some, of course, but I keep thinking that it's a terrible function that never works as people expect it to (but works exactly as it should). Why add to the confusion when explicit fancy indexing is so much clearer?

Bi Rico · Accepted Answer · 2012-09-20 18:07:12Z

4

It's more like:

a = numpy.array([1, 2, 3, 4, 5, 6, 7, 4])
exclude_index = numpy.arange(5)
include_index = numpy.setdiff1d(numpy.arange(len(a)), exclude_index)
a[include_index]
# array([6, 7, 4])

# Notice this is a little different from
numpy.setdiff1d(a, a[exclude_index])
# array([6, 7])

answered Sep 20, 2012 at 18:07

Bi Rico


Brian Larsen · Accepted Answer · 2012-09-20 22:24:32Z

1

I would do this with a Boolean mask, but a little differently. This has the benefit of working in N dimensions, with contiguous or non-contiguous indices. Memory usage will depend on whether a view or a copy is made for the masked array, and I am not sure which it is.

import numpy
a = numpy.arange(10)
sel_id = numpy.arange(5)
mask = numpy.ma.make_mask_none(a.shape)
mask[sel_id] = True
answer = numpy.ma.masked_array(a, mask).compressed()
print(answer)
# [5 6 7 8 9]
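For the N-dimensional case, a minimal sketch on a hypothetical 4x3 array follows. Indexing the mask with sel_id masks whole rows, and compressed() returns the remaining values flattened to 1-D. The rows_rest line is an added illustration of how to keep the 2-D shape.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)
sel_id = np.array([0, 2])             # rows to hide

mask = np.ma.make_mask_none(a.shape)  # all-False boolean array shaped like a
mask[sel_id] = True                   # masks every element of rows 0 and 2

masked = np.ma.masked_array(a, mask)
flat_rest = masked.compressed()       # 1-D copy of the unmasked values
rows_rest = a[~mask.any(axis=1)]      # keeps the 2-D row structure
total = masked.sum()                  # reductions skip masked entries

print(flat_rest)
print(rows_rest)
```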

answered Sep 20, 2012 at 22:24

Brian Larsen


1 Comment

seberg Over a year ago

Masked arrays may be a really nice option. Though .compressed() somewhat defeats the masked array purpose IMO, as it creates a normal array copy.

reptilicus · Accepted Answer · 2012-09-20 18:17:15Z

0

Also, if the selected indices are contiguous, use the [N:] syntax to select the rest. For instance, arr[5:] selects everything from index 5 (the sixth element) through the last element of the array.
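A small sketch of the contiguous case, assuming sel_id covers the first five positions as in the question. It also illustrates one practical difference: basic slicing returns a view of the original data, while boolean indexing returns a copy.

```python
import numpy as np

a = np.arange(10)
sel_id = np.arange(5)       # contiguous block at the start

rest_slice = a[5:]          # basic slicing: a view, no data copied

mask = np.ones(a.shape, dtype=bool)
mask[sel_id] = False
rest_mask = a[mask]         # boolean indexing: a new array

same_values = np.array_equal(rest_slice, rest_mask)
slice_is_view = np.shares_memory(a, rest_slice)
mask_is_view = np.shares_memory(a, rest_mask)

print(same_values, slice_is_view, mask_is_view)
```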

answered Sep 20, 2012 at 18:17

reptilicus


kuzand · Accepted Answer · 2020-01-06 08:02:52Z

0

Here's another way, using numpy's isin() function:

import numpy as np

a = np.arange(10)
sel_id = np.arange(5)

a[~np.isin(np.arange(a.size), sel_id)]

Explanation:

np.arange(a.size) gives all the indices of a, i.e. [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

np.isin(np.arange(a.size), sel_id) returns a boolean mask [ True, True, True, True, True, False, False, False, False, False] with True at indices which are in sel_id and False otherwise. Since we want to get the indices that are not in sel_id we use the bitwise NOT operator ~ to invert the boolean mask.
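As a variation, np.isin also takes an invert keyword, which produces the complementary mask directly. The sketch below uses an illustrative sel_id with repeated and unsorted entries to show that neither affects the result.

```python
import numpy as np

a = np.arange(10) * 2
sel_id = np.array([4, 1, 1, 0])  # order and repeats do not matter here

# invert=True yields True where the index is NOT in sel_id.
keep = np.isin(np.arange(a.size), sel_id, invert=True)
rest = a[keep]

print(rest)
```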

answered Jan 6, 2020 at 8:02

kuzand


Harel · Accepted Answer · 2012-09-20 17:55:30Z

-1

numpy.setdiff1d(a, a[sel_id]) should do the trick. Don't know if there's something neater than this.

answered Sep 20, 2012 at 17:55

Harel


1 Comment

reptilicus Over a year ago

That's not going to work if there are repeated values in the array.
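The repeated-value problem can be seen with the array of ten 1's mentioned in the question's comments. This is a minimal sketch; the names by_value and by_position are illustrative.

```python
import numpy as np

a = np.ones(10, dtype=int)
sel_id = np.arange(5)

# setdiff1d compares values, so every 1 is removed.
by_value = np.setdiff1d(a, a[sel_id])

# A mask compares positions, so the five unselected 1's remain.
mask = np.ones(a.shape, dtype=bool)
mask[sel_id] = False
by_position = a[mask]

print(by_value)
print(by_position)
```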

Pierre GM · Accepted Answer · 2012-09-20 21:21:32Z

-1

Assuming that a is a 1D array, you could just pop the items you don't want from the list of indices:

accept = [i for i in range(a.size) if i not in avoid_list]
a[accept]

You could also try to use something like

accept = sorted(set(range(a.size)) - set(indices_to_discard))
a[accept]

The idea is to use fancy indexing on the complement of the set of indices you don't want.
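A runnable sketch of both variants, with a and indices_to_discard filled in with illustrative values. Converting the discard list to a set first is an added assumption that keeps the membership test cheap.

```python
import numpy as np

a = np.arange(10) * 3
indices_to_discard = [7, 2, 8]

discard = set(indices_to_discard)

# Variant 1: list comprehension over all positions.
accept = [i for i in range(a.size) if i not in discard]
rest_list = a[accept]

# Variant 2: set difference, sorted to restore array order.
accept_sorted = sorted(set(range(a.size)) - discard)
rest_set = a[accept_sorted]

print(rest_list)
print(rest_set)
```

Both variants loop in Python over every index, so for large arrays a boolean mask is likely to be faster.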

answered Sep 20, 2012 at 21:21

Pierre GM
