2

I have a 2D NumPy array, say A, sorted by Column 0, e.g.:

Col.0 Col.1 Col.2
10 2.45 3.25
11 2.95 4
12 3.45 4.25
15 3.95 5
18 4.45 5.25
21 4.95 6
23 5.45 6.25
27 5.95 7
29 6.45 7.25
32 6.95 8
35 7.45 8.25

Each row is unique: Col. 0 is the identification number of a point in the xy-plane, and Columns 1 and 2 are the x and y coordinates of that point. I have another array B, whose rows can contain duplicate data. Its Column 0 and Column 1 store x and y coordinates.

Col.0 Col.1
2.45 3.25
4.45 5.25
6.45 7.25
2.45 3.25

My aim is to find, for each row of array B, the index of the matching row in array A, without using a for loop. In this case, the output should be [0, 4, 8, 0]. I know that numpy searchsorted can look up multiple values in one shot, but it compares against a single column of A, not multiple columns. Is there a way to do this?

2
  • searchsorted won't help because the array is not sorted by the column(s) you're actually searching in. Commented Mar 4, 2021 at 22:32
  • Please avoid posting tables for data frames. Posting the actual data (or, even better, the code that builds the data frame) would help us run your example. Commented Mar 4, 2021 at 23:52
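
Following up on the first comment: searchsorted can still be used if the coordinate pairs are sorted first. Here is a minimal sketch, assuming exact floating-point equality is acceptable and A is non-empty. It packs each (x, y) pair into a single complex number x + iy, which NumPy orders by real part and then by imaginary part. The helper name match_rows and the value -1 for "not found" are illustrative choices, not from the question.

```python
import numpy as np

def match_rows(a_xy, b_xy):
    """For each row of b_xy, return the index of the equal row in a_xy, or -1."""
    # Encode (x, y) as x + i*y so that a 1-D sorted search can be used.
    keys_a = a_xy[:, 0] + 1j * a_xy[:, 1]
    keys_b = b_xy[:, 0] + 1j * b_xy[:, 1]
    # Sort A's keys; 'order' maps sorted positions back to original row indices.
    order = np.argsort(keys_a)
    sorted_a = keys_a[order]
    # Candidate position of each B key in the sorted keys, clipped to stay in range.
    pos = np.clip(np.searchsorted(sorted_a, keys_b), 0, len(sorted_a) - 1)
    # Keep only exact matches; everything else is reported as -1.
    found = sorted_a[pos] == keys_b
    return np.where(found, order[pos], -1)

# A's coordinates are deliberately not sorted by x here.
a_xy = np.array([[4.45, 5.25], [2.45, 3.25], [6.45, 7.25]])
b_xy = np.array([[2.45, 3.25], [6.45, 7.25], [2.45, 3.25], [9.0, 9.0]])
idx = match_rows(a_xy, b_xy)
print(idx)  # [ 1  2  1 -1]
```

This avoids building an intermediate array of shape (len(A), len(B), 2), at the cost of one sort of A.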

2 Answers

1

Pure numpy solution:

My intuition is to take the difference between a[:,1:] and b by broadcasting, which gives an array of shape (11, 4, 2). Rows that match are all zeros. Comparing that difference with False (i.e. zero) gives a boolean mask c. Then c.all(2) results in a boolean array of shape (11, 4), where the True elements represent matches between a and b. Finally, np.nonzero gives the indices of those elements; because they come back ordered by row of a, I reorder them by the column index so that the result follows the order of b.

import numpy as np

a = np.array([
    [10, 2.45, 3.25],
    [11, 2.95, 4],
    [12, 3.45, 4.25],
    [15, 3.95, 5],
    [18, 4.45, 5.25],
    [21, 4.95, 6],
    [23, 5.45, 6.25],
    [27, 5.95, 7],
    [29, 6.45, 7.25],
    [32, 6.95, 8],
    [35, 7.45, 8.25],
])

b = np.array([
    [2.45, 3.25],
    [4.45, 5.25],
    [6.45, 7.25],
    [2.45, 3.25],
])

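# Broadcast a's (x, y) columns against every row of b: shape (11, 4, 2).
# A pair matches where both coordinate differences are exactly zero.
c = (a[:, np.newaxis, 1:] - b) == 0
# rows: indices into a, cols: indices into b, ordered by row of a.
rows, cols = c.all(2).nonzero()
# Reorder by cols so the result follows the order of b.
print(rows[cols.argsort()])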
# [0 4 8 0]
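
The comparison above relies on exact floating-point equality and assumes every row of b has a match in a. As a hedged variant, the sketch below swaps in np.isclose (with its default tolerances) and reports -1 for rows of b that have no match. The names a_xy, b_xy and the -1 sentinel are illustrative assumptions.

```python
import numpy as np

a_xy = np.array([[2.45, 3.25], [2.95, 4.0], [4.45, 5.25]])
# Second row differs from a_xy[0] only by rounding noise; third row has no match.
b_xy = np.array([[4.45, 5.25], [2.45, 3.25 + 1e-12], [9.0, 9.0]])

# match[i, j] is True when row i of a_xy is close to row j of b_xy in both coordinates.
match = np.isclose(a_xy[:, np.newaxis, :], b_xy).all(axis=2)
# First matching row of a_xy for each row of b_xy (0 when there is no match at all).
first = match.argmax(axis=0)
# Replace the meaningless 0 of unmatched columns with -1.
idx = np.where(match.any(axis=0), first, -1)
print(idx)  # [ 2  0 -1]
```

Using argmax along axis 0 yields the indices directly in the order of b, so no argsort step is needed.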

2 Comments

This really helped. I will now be able to implement this logic in a very large array.
@SudiptaLalBasu Nice to see that it worked for you! However, I noticed that my code reported a match when either the first or the second element was equal. It may have worked for you anyway if your points are continuous-valued, as they are unlikely to share exact values, but it is not strictly correct because it can match too much. To require both elements to match, change rows, cols = c.sum(2).nonzero() to rows, cols = c.all(2).nonzero(). I will edit my submission to reflect this. Sorry for the inconvenience!
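
The difference between the two reductions mentioned in this comment can be seen on a small case. This is only an illustrative sketch with made-up points (a_xy, b_xy are not from the question): the single row of b_xy shares its x value with one row of a_xy but not its y value.

```python
import numpy as np

a_xy = np.array([[2.45, 3.25], [4.45, 5.25]])
b_xy = np.array([[2.45, 9.99]])  # same x as a_xy[0], different y

c = (a_xy[:, np.newaxis, :] - b_xy) == 0  # shape (2, 1, 2)

# sum(2) is nonzero as soon as ONE coordinate matches: a false positive here.
loose = c.sum(2).nonzero()[0]
# all(2) requires BOTH coordinates to match: no match is reported.
strict = c.all(2).nonzero()[0]

print(loose)   # [0]
print(strict)  # []
```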
0

You can use merge in pandas (here df1 and df2 are data frames holding A and B):

df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index']

output:

0    0
1    4
2    8
3    0
Name: index, dtype: int64

and if you would like it as an array:

df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index'].to_numpy()
#array([0, 4, 8, 0])
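
The snippets above do not show how df1 and df2 are built. A minimal self-contained sketch, assuming they are created directly from the arrays with the column names used in the question's tables (only the first five rows of A are used here to keep it short):

```python
import numpy as np
import pandas as pd

a = np.array([
    [10, 2.45, 3.25],
    [11, 2.95, 4],
    [12, 3.45, 4.25],
    [15, 3.95, 5],
    [18, 4.45, 5.25],
])
b = np.array([
    [2.45, 3.25],
    [4.45, 5.25],
    [2.45, 3.25],
])

# Assumed construction: column names follow the tables in the question.
df1 = pd.DataFrame(a, columns=["Col.0", "Col.1", "Col.2"])
df2 = pd.DataFrame(b, columns=["Col.0", "Col.1"])

# reset_index() exposes df1's row numbers as an 'index' column;
# the left merge keeps the row order of df2.
merged = df2.merge(
    df1.reset_index(),
    how="left",
    left_on=["Col.0", "Col.1"],
    right_on=["Col.1", "Col.2"],
)
idx = merged["index"].to_numpy()
print(idx)  # [0 4 0]
```

Note that with how='left', a row of df2 without a match in df1 would produce NaN in the 'index' column, which turns the column into floats.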
