If you have only a limited number of elements to index, you might want to store the indices as a list of indices. For this purpose the numpy.nonzero is very useful. However, if your number of elements to index is large, you use less memory by storing the boolean array (1 byte per element).
So, there are four possibilities:
- store a boolean array
- store indices of non-zeros
- always make the comparison separately
- masked arrays
From the memory storage point of view alternative 1 takes 8 bytes per indexed element per dimension. (Of course, the "per dimension" can be avoided by using flat indices.) The boolean approach takes 1 byte per element, so if you have more than 1/8 of the elements with a True in the boolean table, it is the more space-saving solution. Solution #3 probably takes the same space as the boolean solution.
(I do not know enough of the internals of NumPy to say much about the masked arrays. I suspect they behave in a similar way to the boolean indexing.)
Performance-wise the situation is similar. The boolean solution is efficient, if you have a lot of elements to pick, but if you have only a few elements, then the index solution is better.
Just to give some benchmarking idea:
import numpy as np
import time
def create_indices(prob):
data = np.random.random(100000000) < prob
return data, np.nonzero(data)
def bool_index(data):
return data[data]
def list_index(data, indices):
return data[indices]
By using timeit with different probabilities, the results are:
p boolean list
0.01 0.206 0.012
0.10 0.415 0.099
0.20 0.405 0.146
0.50 0.786 0.373
0.75 0.539 0.555
1.00 0.214 0.723
This is actually quite interesting: Using boolean indices is worst when half of the elements are True. The use of list indices behaves as expected.
This benchmark must not be taken as the whole truth. It may be that the type of the array to be indexed changes the situation (here bool which is the same as uint8), etc. However, it seems that performance-wise list indices are very good in most cases.
Wcontains only0or1, you can do:nulls = (W==0)and thenX0 = X[nulls]andX1 = X[np.logical_not(nulls)]. The latter could beX1 = X[~nulls]