Drop specific rows in pandas from a numpy array

Question

I have a dataframe thousands of rows long that looks like this:

ID  Email  Address
1   ...    ... 
2   ...    ... 
3   ...    ... 
4   ...    ... 
1   ...    ... 
2   ...    ... 
5   ...    ... 
5   ...    ... 
6   ...    ...

What I want to do is drop duplicate IDs so that each person's ID appears only once. I can't use drop_duplicates() because most people don't have an ID, and it drops those rows too (not good!).

Is there a way to remove specific rows and keep only one instance of each ID?

I have a dataframe of all the duplicate IDs I want to remove, if that helps. For the example I gave above:

ID  Email  Address
1   ...    ...
2   ...    ...
5   ...    ...

Maybe there's a way to turn this into a series/array of IDs and remove them from the df that way?

@nixon I think that blank entries are also being considered duplicates, so thousands of rows are being removed just because an ID is not present.
– user8322222, Commented Dec 21, 2018 at 11:42

jezrael · Accepted Answer · 2018-12-21 11:46:20Z

1

I believe you need to chain 2 conditions: duplicated with keep=False, which flags every row whose ID is repeated, and duplicated with no parameter, which flags every occurrence after the first:

df = df[df.duplicated(subset='ID', keep=False) & df.duplicated(subset='ID')]
print(df)
   ID Email Address
4   1   ...     ...
5   2   ...     ...
7   5   ...     ...
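
As a minimal, runnable sketch of what these two masks select, assuming a small stand-in for the question's data (the frame below and the names `all_dupes`, `later_dupes`, `repeats` and `deduped` are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical stand-in for the question's data; Email/Address are placeholders.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 1, 2, 5, 5, 6],
    "Email": ["..."] * 9,
    "Address": ["..."] * 9,
})

# True for every row whose ID occurs more than once.
all_dupes = df.duplicated(subset="ID", keep=False)
# True only for the second and later occurrences of an ID (keep='first' is the default).
later_dupes = df.duplicated(subset="ID")

repeats = df[all_dupes & later_dupes]  # the rows selected by the expression above
deduped = df[~later_dupes]             # the first row for each ID

print(repeats.index.tolist())
print(deduped["ID"].tolist())
```

On this sample the combined mask picks out the repeated rows (index 4, 5 and 7), and negating `later_dupes` keeps the first row for each ID. Note that `duplicated` treats missing IDs as equal to each other, so this alone does not protect rows that have no ID.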

edited Dec 21, 2018 at 11:46

answered Dec 21, 2018 at 11:34

jezrael

1 Comment

jezrael Over a year ago

@user8322222 - Super, glad I could help!

yatu · Answer · 2018-12-21 11:52:04Z

1

Is this what you want?

df[df.duplicated(subset='ID')]

    ID Email Address
4   1   ...     ...
5   2   ...     ...
7   5   ...     ...
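
The rows selected above are the repeats, so one way to follow the question's idea of working from an array of IDs is sketched below. It assumes blank IDs are stored as NaN; the sample data and the names `dupe_ids` and `to_drop` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two rows have no ID (NaN); other values are placeholders.
df = pd.DataFrame({
    "ID": [1, 2, np.nan, 3, 4, 1, 2, np.nan, 5, 5, 6],
    "Email": ["..."] * 11,
    "Address": ["..."] * 11,
})

# Rows whose ID has already appeared earlier in the frame.
dupes = df[df.duplicated(subset="ID")]

# NumPy array of the real IDs that are repeated; missing IDs are left out.
dupe_ids = dupes["ID"].dropna().unique()

# Drop a row only if its ID is in the array AND it is not the first occurrence.
to_drop = df["ID"].isin(dupe_ids) & df["ID"].duplicated()
result = df[~to_drop]

print(result["ID"].tolist())
```

Because NaN is excluded from `dupe_ids`, rows without an ID never match `isin` and are all kept, while each real ID survives exactly once.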

edited Dec 21, 2018 at 11:52

answered Dec 21, 2018 at 11:38

yatu

4 Comments

user8322222 Over a year ago

Hi nixon, unfortunately this seems to be dropping blank entries for ID too (same issue as drop_duplicates() I imagine)

yatu Over a year ago

Blank entries for ID? Please could you give an example of your desired output?

yatu Over a year ago

Seeing what you want from the other answer, you can simply do this

user8322222 Over a year ago

Hi nixon, I was looking for the following:

ID  Email  Address
1   ...    ...
2   ...    ...
3   ...    ...
4   ...    ...
5   ...    ...

and it was answered. Thanks for your help and time though! :D
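
If the blank IDs mentioned in these comments are empty strings rather than NaN, a possible sketch is to treat them as missing first and then exempt missing IDs from the duplicate check. The sample data and the names `ids` and `keep` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: IDs stored as strings, with "" meaning "no ID".
df = pd.DataFrame({
    "ID": ["1", "2", "", "3", "4", "1", "2", "", "5", "5", "6"],
    "Email": ["..."] * 11,
    "Address": ["..."] * 11,
})

# Replace empty strings with missing values so they can be detected with isna().
ids = df["ID"].mask(df["ID"] == "")

# Keep a row if it has no ID, or if it is the first row carrying that ID.
keep = ids.isna() | ~ids.duplicated()
result = df[keep]

print(result["ID"].tolist())
```

The original `ID` column is left untouched; only the helper series `ids` is used to build the mask, so blank entries stay blank in the result.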
