
Say I have a numpy array:

Y.shape = (n, 3)

where n is the number of rows in the array.

I split Y based on the values of the second column by following this thread:

distances = [Y[Y[:, 1] == k] for k in np.unique(Y[:, 1])]
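
To illustrate what this split produces, here is a minimal sketch on a small made-up array (the sample values are hypothetical and not my real data):

```python
import numpy as np

# Hypothetical sample data: column 1 holds the values to split on.
Y = np.array([[0, 10, 1],
              [1, 20, 1],
              [2, 10, 2],
              [3, 20, 1]])

# One sub-array per unique value in column 1 (here 10 and 20).
distances = [Y[Y[:, 1] == k] for k in np.unique(Y[:, 1])]

for k, dist in zip(np.unique(Y[:, 1]), distances):
    print(k, dist.tolist())
```

Each sub-array keeps the rows themselves, but not their positions in Y.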

distances is now a list of N numpy arrays, where N is the number of unique values in the second column. I then loop over distances and split each array again in the same way, this time by the last column:

for idx, dist in enumerate(distances):    
  conditions = [dist[dist[:, 2] == k] for k in np.unique(dist[:, 2])]
  # Save conditions list and do something with it 

How can I get, in numpy, the row indices of the original Y array that correspond to each array in conditions?

  • For me, the snippet you posted to find conditions results in losing any arrays in distances that were 1 row. E.g. if I start with Y = np.array([[1,2,3], [3,4,5], [5,6,7], [7,8,9], [10,8,3], [11,3,2]]), the step to find distances keeps all rows, but the final step leaves me with conditions = [array([[10, 8, 3]]), array([[7, 8, 9]])] while discarding all other rows. Is this supposed to happen? Commented Mar 10, 2022 at 18:59
  • Yes this is correct! As I will iterate through each numpy array in the distances list. Commented Mar 10, 2022 at 19:18
  • I meant that even with the enumerate statement, rows are discarded because conditions is being overwritten during every iteration of the loop. The whole for idx, dist in enumerate(distances) section only keeps rows from my Y array where 2+ rows have the same value in the middle column. Commented Mar 10, 2022 at 19:22
  • I updated the question: I am saving the conditions list after each iteration of the loop. What I am looking for is the matching original indices of Y. Commented Mar 10, 2022 at 19:27
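
The overwriting behaviour described in the comments can be reproduced with a short sketch. It uses the sample Y from the first comment and the loop exactly as originally posted, so nothing here is new apart from the final print:

```python
import numpy as np

# Sample array taken from the first comment.
Y = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9], [10, 8, 3], [11, 3, 2]])

# First split keeps all six rows, grouped by the middle column.
distances = [Y[Y[:, 1] == k] for k in np.unique(Y[:, 1])]

for idx, dist in enumerate(distances):
    # conditions is rebound on every pass, so only the last group survives.
    conditions = [dist[dist[:, 2] == k] for k in np.unique(dist[:, 2])]

print([c.tolist() for c in conditions])
```

After the loop, conditions holds only the split of the last group (middle column equal to 8), which is why the results need to be saved on each iteration.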

1 Answer


Assuming you're storing conditions in another list (I used all_conditions in my code), then this is a potential start-to-finish solution:

from functools import reduce
import operator

import numpy as np

# The code you posted
distances = [Y[Y[:, 1] == k] for k in np.unique(Y[:, 1])]

# conditions are stored in this list
all_conditions = []
for idx, dist in enumerate(distances):
    conditions = [dist[dist[:, 2] == k] for k in np.unique(dist[:, 2])]
    all_conditions.append(conditions)

# This step flattens all_conditions so there are no nested lists.
all_conditions = reduce(operator.concat, list(all_conditions))

# Each element of all_conditions is a 2-D array of shape (m, 3),
# so need to index the 0th row of each element to get a single row of 3.
# Note that any further rows in the same group are not used.
# There is probably a more efficient way to extract them than a for loop,
# but this is the best I can come up with.

indices = np.zeros((len(all_conditions),3), dtype=int)
for i in range(len(all_conditions)):
    indices[i] = all_conditions[i][0]

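# Select the values from X using the indices array as the indices.
# X is not defined in the question; it is assumed here to be an existing
# array with at least three dimensions.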
selected = X[tuple(indices.T)]
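
If what you need are the row positions in Y rather than the rows themselves, one option is to carry the indices through both splits with np.flatnonzero. The following is only a sketch under that assumption: the helper name group_row_indices is made up for illustration, and the sample Y is the one from the comments.

```python
import numpy as np

# Sample array from the comments under the question.
Y = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9], [10, 8, 3], [11, 3, 2]])


def group_row_indices(Y):
    """Map (column-1 value, column-2 value) to the matching row indices of Y.

    Hypothetical helper, not part of NumPy.
    """
    groups = {}
    for k1 in np.unique(Y[:, 1]):
        # Row positions in Y whose middle column equals k1.
        rows_k1 = np.flatnonzero(Y[:, 1] == k1)
        for k2 in np.unique(Y[rows_k1, 2]):
            # Narrow those positions down by the last column.
            rows = rows_k1[Y[rows_k1, 2] == k2]
            groups[(k1.item(), k2.item())] = rows
    return groups


index_groups = group_row_indices(Y)
for key, rows in index_groups.items():
    print(key, rows.tolist())
```

Indexing Y with any of these index arrays, for example Y[index_groups[(8, 9)]], gives back the same rows that the corresponding entry in conditions would hold.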

Let me know if there's anything that needs clarification.
