Find if part of a string is in a column of another pandas DataFrame

Question

Kind of a confusing question, but I'll explain it thoroughly.

Here is dataframe 1

ID	Name	Letter	Range
16x019CF123	Mike	Aasd	12134
EMU_x123FF2	Lye	BASD	21231
SAT_xFF314C	Rike	GSDAS	21341

Dataframe 2

Index	ID
0	019CF123
1	123FF2
2	FF314C

So now I have 2 pandas DataFrames.

ID in DF2 corresponds to ID in DF1, however not fully.

ID in DF1 |ID in DF2

16x019CF123 | 019CF123 (Notice that the ID in DF2 is just everything after "x" in DF1)

Now, here is what I need to do.

I need to extract the entire rows from DF1 whose IDs are NOT in DF2.

Hope I made it as clear as I can.

mozway · Accepted Answer · 2021-10-25 03:16:08Z


You can extract the part of the ID after the 'x' (here with a regex, but you could also split on 'x' and take the last item) and check whether each value is in the reference column using isin. Passing expand=False makes str.extract return a Series rather than a one-column DataFrame, so the result of isin is a boolean Series. Finally, invert that mask with ~ (to get "not in") and use it to filter the initial dataframe:

df1[~df1['ID'].str.extract('(?<=x)(.*$)', expand=False).isin(df2['ID'])]
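The split-based alternative mentioned above could look roughly like the sketch below. The frames are rebuilt from the question's sample IDs, and one ID is deliberately left out of df2 (an assumption for illustration only) so that the filter returns a row. Note that taking the last item after splitting keeps the text after the last 'x', whereas the regex keeps everything after the first 'x'; the two agree when each ID contains exactly one 'x'.

```python
import pandas as pd

# Sample IDs from the question; only the ID column matters here.
df1 = pd.DataFrame({"ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C"]})
# "FF314C" is left out on purpose (illustrative assumption) so one row survives.
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2"]})

# Split each ID on 'x' and keep the last piece.
suffix = df1["ID"].str.split("x").str[-1]

# Keep the rows whose suffix does not appear in df2["ID"].
not_in_df2 = df1[~suffix.isin(df2["ID"])]
print(not_in_df2)
```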

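To better understand how the one-liner works, here is a version with intermediate variables; you can print each one to see the steps:

# part of each ID after the first 'x', as a Series
clean_ID = df1['ID'].str.extract('(?<=x)(.*$)', expand=False)
# True where the extracted ID appears in df2['ID']
mask = clean_ID.isin(df2['ID'])
# keep only the rows whose ID is NOT in df2
df3 = df1[~mask]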
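For anyone who wants to run the steps end to end, here is a minimal self-contained sketch built from the question's sample data. In the question every DF1 ID has a match in DF2, so the result would be empty; the fourth row of df1 below is a hypothetical addition (not from the original data) to make the "not in" result visible.

```python
import pandas as pd

df1 = pd.DataFrame({
    # The last entry in each list is a hypothetical row added for illustration.
    "ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C", "ABC_x000000"],
    "Name": ["Mike", "Lye", "Rike", "Test"],
    "Letter": ["Aasd", "BASD", "GSDAS", "ZZZ"],
    "Range": [12134, 21231, 21341, 99999],
})
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2", "FF314C"]})

# Everything after the first 'x'; expand=False returns a Series.
clean_id = df1["ID"].str.extract(r"(?<=x)(.*$)", expand=False)

# True where the extracted ID is one of the values in df2["ID"].
mask = clean_id.isin(df2["ID"])

# Invert the mask to keep full rows whose ID is NOT in df2.
df3 = df1[~mask]
print(df3)
```

Series.isin compares against the values of df2["ID"] and ignores its index, so the two frames do not need to be in the same order or have the same length.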

edited Oct 25, 2021 at 3:16

answered Oct 25, 2021 at 3:09

mozway
