Efficient way to make an array unique by column

Question

Assuming we have a two dimensional array like the following:

array1 = np.array([[1,4,3, 64356,5435,434],
                   [11,46,3, 7356,585,74],
                   [51,406,3, 769,5435,24],
                   [12,45,5, 656,135,134],
                   [112,475,5, 656,1385,134],
                   [13,46,  5, 656,1385,19]])

Rows 4 and 5 (counting from 0) are not unique in terms of their columns 2, 3, and 4, so we want to drop one of them. Is there an efficient way to drop rows of an array so that its rows are unique with respect to selected columns?

Possible duplicate of stackoverflow.com/questions/16970982/…
– Quang Hoang, Commented Dec 16, 2020 at 15:27
If using pandas is an option, you can turn your array into a DataFrame and use the drop_duplicates function, whose subset argument selects the columns to compare.
– S.Mohsen sh, Commented Dec 16, 2020 at 15:32

Quang Hoang · Accepted Answer · 2020-12-16 15:37:17Z

2

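A solution in pure NumPy: take the index of the first occurrence of each unique combination in columns 2, 3, 4, then sort those indices so the original row order is preserved: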

_, idx = np.unique(array1[:,[2,3,4]], axis=0, return_index=True)
array1[sorted(idx)]

Output:

array([[    1,     4,     3, 64356,  5435,   434],
       [   11,    46,     3,  7356,   585,    74],
       [   51,   406,     3,   769,  5435,    24],
       [   12,    45,     5,   656,   135,   134],
       [  112,   475,     5,   656,  1385,   134]])
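The same idea can be wrapped in a small helper. This is only a sketch: the name `unique_rows_by_columns` is made up for illustration and is not part of NumPy. It relies on `np.unique(..., return_index=True)` returning the index of the first occurrence of each unique row, in the order of the sorted unique rows, which is why the indices are sorted again before indexing.

```python
import numpy as np

def unique_rows_by_columns(arr, cols):
    """Keep the first row for each distinct combination of values in `cols`.

    Hypothetical helper for illustration; not a NumPy function.
    """
    # idx holds the first-occurrence index of each unique key row,
    # ordered by the sorted key rows rather than by position in arr
    _, idx = np.unique(arr[:, cols], axis=0, return_index=True)
    # sorting the indices restores the original row order
    return arr[np.sort(idx)]

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

result = unique_rows_by_columns(array1, [2, 3, 4])
print(result)
```

For the example array this keeps rows 0 through 4 and drops row 5, whose columns 2, 3, 4 repeat those of row 4.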


Aaj Kaal · Accepted Answer · 2020-12-16 15:44:43Z

Convert to pandas and back, as suggested by S.Mohsen sh in the comments.

Code:

import pandas as pd
import numpy as np

array1 = np.array([[1,4,3, 64356,5435,434],
                   [11,46,3, 7356,585,74],
                   [51,406,3, 769,5435,24],
                   [12,45,5, 656,135,134],
                   [112,475,5, 656,1385,134],
                   [13,46,  5, 656,1385,19]])
                   
df = pd.DataFrame(data=array1)
print(df)
# compare on columns 2, 3 and 4, as asked in the question; the first duplicate is kept
df.drop_duplicates(subset=[2,3,4],inplace=True)
print(df)

# convert back to a NumPy array
array2=df.values
print(array2)

Output:

     0    1  2      3     4    5
0    1    4  3  64356  5435  434
1   11   46  3   7356   585   74
2   51  406  3    769  5435   24
3   12   45  5    656   135  134
4  112  475  5    656  1385  134
5   13   46  5    656  1385   19

     0    1  2      3     4    5
0    1    4  3  64356  5435  434
1   11   46  3   7356   585   74
2   51  406  3    769  5435   24
3   12   45  5    656   135  134
4  112  475  5    656  1385  134

[[    1     4     3 64356  5435   434]
 [   11    46     3  7356   585    74]
 [   51   406     3   769  5435    24]
 [   12    45     5   656   135   134]
 [  112   475     5   656  1385   134]]
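If the last of the duplicated rows should survive instead of the first, `drop_duplicates` also takes a `keep` argument. A minimal sketch, assuming the columns of interest are 2, 3 and 4 as in the question (the variable names `deduped` and `array_last` are just for illustration):

```python
import numpy as np
import pandas as pd

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

# The default column labels of the DataFrame are the integers 0..5,
# so subset=[2, 3, 4] refers to the same columns as in the array.
deduped = pd.DataFrame(array1).drop_duplicates(subset=[2, 3, 4], keep="last")

# Back to a plain NumPy array
array_last = deduped.to_numpy()
print(array_last)
```

With `keep="last"`, row 5 is retained and row 4 is dropped; the default `keep="first"` does the opposite.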
