I have a 2D array of arrays defined as follows:

traces = [['x1',11026,0,0,0,0],
          ['x0',11087,0,0,0,1],
          ['x0',11088,0,0,1,3],
          ['x0',11088,0,0,0,3],
          ['x0',11088,0,1,0,1]]

I want to find the index of the row that matches multiple conditions on selected columns. For example, I want to find the row in this array where

row[0]=='x0' & row[1]==11088 & row[3]==1 & row[5]==1

Searching on this criteria should return 4.

I attempted to use numpy.where but can't seem to make it work with multiple conditions

print np.where((traces[:,0] == 'x0') & (traces[:,1] == 11088) & (traces[:,3] == 1) & (traces[:,5] == 1))

The above creates the warning

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  print np.where((traces[:,0] == 'x0') & (traces[:,1] == 11088) & (traces[:,3] == 1) & (traces[:,5] == 1))
(array([], dtype=int32),)

I've attempted to use numpy.logical_and as well and that doesn't seem to work either, creating similar warnings.

Any way I can do this using numpy.where without iterating over the whole 2D array?

Thanks

2 Answers

I assume you did something like this (converting the list to an np.array):

import numpy as np

traces = [['x1',11026,0,0,0,0],
          ['x0',11087,0,0,0,1],
          ['x0',11088,0,0,1,3],
          ['x0',11088,0,0,0,3],
          ['x0',11088,0,1,0,1]]

traces = np.array(traces)

This exhibits the described error. The reason can be seen by printing the resulting array:

print(traces)
# array([['x1', '11026', '0', '0', '0', '0'],
#        ['x0', '11087', '0', '0', '0', '1'],
#        ['x0', '11088', '0', '0', '1', '3'],
#        ['x0', '11088', '0', '0', '0', '3'],
#        ['x0', '11088', '0', '1', '0', '1']],
#       dtype='<U5')

Numbers were converted to strings!

When an array is built from values of different types, numpy looks for a common dtype and falls back to dtype=object only when it finds none. Object arrays work in most cases but perform poorly.

However, in this case numpy apparently tried to be smart and converted the data to a string type, which is more specific than object but general enough to take numbers - as strings.
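The conversion described above can be checked directly. The following is a minimal sketch on a shortened copy of the data; the names rows, as_default and as_object are only illustrative and do not appear in the question.

```python
import numpy as np

# Shortened, hypothetical copy of the question's data: one string column, two int columns.
rows = [['x1', 11026, 0],
        ['x0', 11088, 1]]

as_default = np.array(rows)                # numpy picks a common dtype itself
as_object = np.array(rows, dtype=object)   # the original Python objects are kept

print(as_default.dtype.kind)    # 'U' means a unicode string dtype
print(repr(as_default[1, 1]))   # the number is now stored as the text '11088'
print(type(as_object[1, 1]))    # still a plain Python int
```

Since as_default holds the text '11088' rather than the integer 11088, a comparison against the integer cannot match, which is consistent with the empty result in the question.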

As a solution, construct the array explicitly as an object array:

traces = np.array(traces, dtype='object')

print(np.where((traces[:,0] == 'x0') & (traces[:,1] == 11088) & (traces[:,3] == 1) & (traces[:,5] == 1)))
# (array([4], dtype=int32),)

Note that although this works, object arrays are often not a good idea. Consider instead replacing the strings in the first column with numeric values.
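One way to avoid the object array, sketched here as a variation rather than the exact suggestion above, is to keep the string column in its own array so that the remaining columns form a purely numeric array. The names labels, values, mask and matches are assumptions made for this example; note that the numeric columns shift left by one once the label column is split off.

```python
import numpy as np

traces = [['x1', 11026, 0, 0, 0, 0],
          ['x0', 11087, 0, 0, 0, 1],
          ['x0', 11088, 0, 0, 1, 3],
          ['x0', 11088, 0, 0, 0, 3],
          ['x0', 11088, 0, 1, 0, 1]]

# Split the string column from the numeric columns.
labels = np.array([row[0] for row in traces])    # string dtype
values = np.array([row[1:] for row in traces])   # integer dtype

# Original columns 1, 3 and 5 are columns 0, 2 and 4 of `values`.
mask = ((labels == 'x0')
        & (values[:, 0] == 11088)
        & (values[:, 2] == 1)
        & (values[:, 4] == 1))

matches = np.where(mask)[0]
print(matches)   # [4]
```

Both arrays keep a native dtype, so every comparison is a vectorized elementwise operation and no dtype=object is needed.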

2 Comments

You are correct, it was cast into a numpy array and I didn't realize this would change the numbers to strings.
@ozal I was surprised by this too ;)
2

Consider this comparison:

>>> traces[:,[0,1,3,5]] == ['x0', 11088, 1, 1]
array([[False, False, False, False],
       [ True, False, False,  True],
       [ True,  True, False, False],
       [ True,  True, False, False],
       [ True,  True,  True,  True]])

We are looking for the row (or rows) in which all values are True:

>>> np.where(np.all(traces[:,[0,1,3,5]] == ['x0', 11088, 1, 1], axis=1))
(array([4]),)
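To illustrate what axis=1 does in the call above, here is a small sketch that starts from the boolean matrix shown in the first comparison. The names hits, per_row, per_col and row_index are chosen only for this example.

```python
import numpy as np

# Boolean result of the elementwise comparison: one row per trace,
# one column per tested condition.
hits = np.array([[False, False, False, False],
                 [True,  False, False, True],
                 [True,  True,  False, False],
                 [True,  True,  False, False],
                 [True,  True,  True,  True]])

per_row = np.all(hits, axis=1)   # collapse the columns: one result per row
per_col = np.all(hits, axis=0)   # collapse the rows: one result per column

row_index = np.where(per_row)[0]

print(per_row)     # [False False False False  True]
print(per_col)     # [False False False False]
print(row_index)   # [4]
```

With axis=1 the reduction runs across the columns of each row, so a row is kept only if every one of its conditions holds.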

3 Comments

Thanks, this is a very elegant solution. What is the purpose of axis=1 here?
To sum along a specified axis, in this case to sum by row.
Better to use np.all instead of np.sum, which is somewhat more elegant (and should perform better, if that matters).
