Searching in numpy array

Question

I have a 2D numpy array, say A, sorted with respect to Column 0, e.g.:

Col.0	Col.1	Col.2
10	2.45	3.25
11	2.95	4
12	3.45	4.25
15	3.95	5
18	4.45	5.25
21	4.95	6
23	5.45	6.25
27	5.95	7
29	6.45	7.25
32	6.95	8
35	7.45	8.25

The entries in each row are unique, i.e. Col. 0 is the identification number of a point in the xy plane, and Columns 1 and 2 are the x and y co-ordinates of that point. I have another array B (its rows can contain duplicate data), in which Column 0 and Column 1 store x and y co-ordinates.

Col.0	Col.1
2.45	3.25
4.45	5.25
6.45	7.25
2.45	3.25

My aim is to find the row index in array A corresponding to each row of array B without using a for loop. So, in this case, my output should be [0,4,8,0]. I know that numpy's searchsorted can look up multiple values in one shot, but it can only compare against a single column of A, not multiple columns. Is there a way to do this?

searchsorted won't help because the array is not sorted by the column(s) you're actually searching in.
– user4815162342, Commented Mar 4, 2021 at 22:32
Please avoid posting tables for data frames. Posting the actual data (or even better, the code for the data frame) would help us run your example data.
– Ehsan, Commented Mar 4, 2021 at 23:52

Naphat Amundsen · Accepted Answer · 2021-03-06 09:56:11Z

1

Pure numpy solution:

The idea is to take the difference between a[:,1:] and b by broadcasting, which gives an array of shape (11, 4, 2). Wherever a co-ordinate matches, the difference is zero, so comparing the difference with False (i.e. with 0) yields a boolean mask c of the same shape. c.all(2) then reduces this to a boolean array of shape (11, 4), in which each True element marks a row of a whose x and y both match a row of b. Finally, np.nonzero returns the indices of those elements.

import numpy as np

a = np.array([
    [10, 2.45, 3.25],
    [11, 2.95, 4],
    [12, 3.45, 4.25],
    [15, 3.95, 5],
    [18, 4.45, 5.25],
    [21, 4.95, 6],
    [23, 5.45, 6.25],
    [27, 5.95, 7],
    [29, 6.45, 7.25],
    [32, 6.95, 8],
    [35, 7.45, 8.25],
])

b = np.array([
    [2.45, 3.25],
    [4.45, 5.25],
    [6.45, 7.25],
    [2.45, 3.25],
])

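# Broadcast a's co-ordinates (11, 1, 2) against b (4, 2); a zero difference means equal
c = (a[:, np.newaxis, 1:] - b) == 0
# rows indexes into a, cols indexes into b, for pairs where both x and y match
rows, cols = c.all(2).nonzero()
# Reorder the indices into a so that they follow the row order of b
print(rows[cols.argsort()])
# [0 4 8 0]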
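
A possible variant, not part of the original answer: exact equality on floats can miss points that differ only by rounding error, and the snippet above assumes every row of b has exactly one match in a. The sketch below swaps in np.isclose (with its default tolerances, which is an assumption about what counts as "the same point") and returns -1 for rows of b with no match. The names match and idx are introduced here for illustration.

```python
import numpy as np

a = np.array([
    [10, 2.45, 3.25],
    [11, 2.95, 4],
    [12, 3.45, 4.25],
    [15, 3.95, 5],
    [18, 4.45, 5.25],
    [21, 4.95, 6],
    [23, 5.45, 6.25],
    [27, 5.95, 7],
    [29, 6.45, 7.25],
    [32, 6.95, 8],
    [35, 7.45, 8.25],
])

b = np.array([
    [2.45, 3.25],
    [4.45, 5.25],
    [6.45, 7.25],
    [2.45, 3.25],
])

# Compare every co-ordinate pair in a with every row of b, within a tolerance.
# Shape (11, 4): True where both x and y are close.
match = np.isclose(a[:, np.newaxis, 1:], b).all(axis=2)

# For each row of b, take the first matching row of a, or -1 if there is none.
idx = np.where(match.any(axis=0), match.argmax(axis=0), -1)
print(idx)
# [0 4 8 0]
```

Like the original, this builds an intermediate array with len(a) * len(b) entries, so memory use grows with the product of the two lengths.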

edited Mar 6, 2021 at 9:56

answered Mar 5, 2021 at 0:10

Naphat Amundsen


2 Comments

Sudipta Lal Basu Over a year ago

This really helped. I will now be able to implement this logic in a very large array.

Naphat Amundsen Over a year ago

@SudiptaLalBasu Nice to see that it worked for you! However, I noticed that my code matched when either the first or the second element was equal. It may have worked for you anyway if your points are continuous-valued, as they are unlikely to share exactly the same values, but it is not strictly correct because it can match too much. To require both elements to match, change rows, cols = c.sum(2).nonzero() to rows, cols = c.all(2).nonzero(). I will edit my submission to reflect this. Sorry for the inconvenience!
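
To illustrate the difference described in the comment above, here is a small sketch with made-up points (pts and query are hypothetical, not the asker's data). One point shares only its x value with the query, so the sum-based reduction reports it while the all-based reduction does not.

```python
import numpy as np

# Hypothetical points: row 1 shares x with the query but has a different y.
pts = np.array([
    [2.45, 3.25],
    [2.45, 9.0],
    [4.45, 5.25],
])
query = np.array([[2.45, 3.25]])

# Element-wise equality, shape (3, 1, 2)
eq = pts[:, np.newaxis, :] == query

# sum(2) is nonzero when at least one co-ordinate matches
loose = eq.sum(2).nonzero()[0]
# all(2) is True only when both co-ordinates match
strict = eq.all(2).nonzero()[0]

print(loose)   # [0 1]
print(strict)  # [0]
```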

Ehsan · Accepted Answer · 2021-03-04 23:58:08Z

0

You can use merge in pandas:

df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index']

output:

0    0
1    4
2    8
3    0
Name: index, dtype: int64

and if you want it as an array:

df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index'].to_numpy()
#array([0, 4, 8, 0])
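
The answer does not show how df1 and df2 are built. A minimal self-contained sketch, assuming df1 holds array A and df2 holds array B with the column names from the tables in the question (the variable names merged and idx are introduced here for illustration):

```python
import numpy as np
import pandas as pd

a = np.array([
    [10, 2.45, 3.25],
    [11, 2.95, 4],
    [12, 3.45, 4.25],
    [15, 3.95, 5],
    [18, 4.45, 5.25],
    [21, 4.95, 6],
    [23, 5.45, 6.25],
    [27, 5.95, 7],
    [29, 6.45, 7.25],
    [32, 6.95, 8],
    [35, 7.45, 8.25],
])
b = np.array([
    [2.45, 3.25],
    [4.45, 5.25],
    [6.45, 7.25],
    [2.45, 3.25],
])

# Assumed construction: column names follow the tables in the question.
df1 = pd.DataFrame(a, columns=['Col.0', 'Col.1', 'Col.2'])
df2 = pd.DataFrame(b, columns=['Col.0', 'Col.1'])

# reset_index() exposes df1's row numbers as an 'index' column, so the
# left merge carries the matching row number of A over to each row of B.
merged = df2.merge(df1.reset_index(), how='left',
                   left_on=['Col.0', 'Col.1'], right_on=['Col.1', 'Col.2'])
idx = merged['index'].to_numpy()
print(idx)
```

With how='left', a row of df2 that has no match in df1 would get NaN in the 'index' column, so the result would no longer be an integer array in that case.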

edited Mar 4, 2021 at 23:58

answered Mar 4, 2021 at 23:50

Ehsan
