1

Suppose we have a two-dimensional array like the following:

import numpy as np

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

Rows 4 and 5 (zero-based) are not unique in terms of their columns 2, 3, and 4, so we want to drop one of them. Is there an efficient way to drop rows of an array so that its rows are unique with respect to a selected subset of columns?

7
  • possible duplicate of stackoverflow.com/questions/16970982/… Commented Dec 16, 2020 at 15:27
  • @QuangHoang but that does not allow selectivity by columns. Commented Dec 16, 2020 at 15:27
  • work on array1[:,your_select_column] and get the index? Commented Dec 16, 2020 at 15:28
  • 1
    If using pandas is an option, you can turn your array into a DataFrame and use the drop_duplicates function, whose subset argument selects the columns to compare. Commented Dec 16, 2020 at 15:32
  • 1
    pd.DataFrame(array1).drop_duplicates(your_select_column)? Commented Dec 16, 2020 at 15:35

2 Answers

2

A solution in pure numpy:

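# Index of the first occurrence of each unique combination in columns 2, 3, 4
_, idx = np.unique(array1[:, [2, 3, 4]], axis=0, return_index=True)
# Sort the indices so the kept rows stay in their original order
array1[np.sort(idx)]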

Output:

array([[    1,     4,     3, 64356,  5435,   434],
       [   11,    46,     3,  7356,   585,    74],
       [   51,   406,     3,   769,  5435,    24],
       [   12,    45,     5,   656,   135,   134],
       [  112,   475,     5,   656,  1385,   134]])
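
The same idea can be wrapped in a small reusable function. This is only a sketch: the name `drop_duplicate_rows` and its parameters are made up for illustration, and it assumes a 2-D array where the first occurrence of each duplicate group should be kept.

```python
import numpy as np


def drop_duplicate_rows(arr, cols):
    """Keep the first row for each unique combination of values in `cols`.

    `drop_duplicate_rows` is a hypothetical helper, not a NumPy function.
    """
    # return_index=True gives the index of the first occurrence of each unique row
    _, first_idx = np.unique(arr[:, cols], axis=0, return_index=True)
    # np.unique orders its results by value, so sort the indices to restore row order
    return arr[np.sort(first_idx)]


array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

result = drop_duplicate_rows(array1, [2, 3, 4])
print(result.shape)  # (5, 6): only the last row is dropped
```

Passing a different column list, for example `[2, 3]`, deduplicates on those columns instead.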

1

Convert to pandas and back as suggested by S.Mohsen

Code:

import pandas as pd
import numpy as np

array1 = np.array([[1,4,3, 64356,5435,434],
                   [11,46,3, 7356,585,74],
                   [51,406,3, 769,5435,24],
                   [12,45,5, 656,135,134],
                   [112,475,5, 656,1385,134],
                   [13,46,  5, 656,1385,19]])
                   
df = pd.DataFrame(data=array1)
print(df)
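# Note: subset=[2, 3] compares columns 2 and 3 only, so rows 4 and 5 both count
# as duplicates of row 3; use subset=[2, 3, 4] to also compare column 4, as in the question
df.drop_duplicates(subset=[2, 3], inplace=True)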
print(df)

array2 = df.to_numpy()
print(array2)

Output:

     0    1  2      3     4    5
0    1    4  3  64356  5435  434
1   11   46  3   7356   585   74
2   51  406  3    769  5435   24
3   12   45  5    656   135  134
4  112  475  5    656  1385  134
5   13   46  5    656  1385   19

    0    1  2      3     4    5
0   1    4  3  64356  5435  434
1  11   46  3   7356   585   74
2  51  406  3    769  5435   24
3  12   45  5    656   135  134

[[    1     4     3 64356  5435   434]
 [   11    46     3  7356   585    74]
 [   51   406     3   769  5435    24]
 [   12    45     5   656   135   134]]
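
For comparison, here is a minimal sketch of the same pandas round trip deduplicating on all three columns from the question (2, 3, and 4). The variable names `kept_first` and `kept_last` are illustrative only; `keep` is the `drop_duplicates` parameter that chooses which row of each duplicate group survives.

```python
import numpy as np
import pandas as pd

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

df = pd.DataFrame(array1)

# keep="first" (the default) retains the earliest row of each duplicate group
kept_first = df.drop_duplicates(subset=[2, 3, 4], keep="first").to_numpy()

# keep="last" retains the latest row instead
kept_last = df.drop_duplicates(subset=[2, 3, 4], keep="last").to_numpy()

print(kept_first.shape)  # (5, 6)
print(kept_last.shape)   # (5, 6)
```

With `keep="first"` the last row of `array1` is dropped; with `keep="last"` the second-to-last row is dropped instead.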
