Kind of a confusing question, but I'll explain it thoroughly.

Here is dataframe 1

ID           Name  Letter  Range
16x019CF123  Mike  Aasd    12134
EMU_x123FF2  Lye   BASD    21231
SAT_xFF314C  Rike  GSDAS   21341

Dataframe 2

Index  ID
0      019CF123
1      123FF2
2      FF314C

So now I have two pandas DataFrames.

The ID in DF2 corresponds to the ID in DF1, but only partially.

ID in DF1 |ID in DF2

16x019CF123 | 019CF123 (Notice that the ID in DF2 is just everything after "x" in DF1)

Now, here is what I need to do.

I need to extract the entire rows from DF1 whose IDs are NOT in DF2.

Hope I made it as clear as I can.

1 Answer

You can extract the part of the ID after the 'x' (here using a regex, but you could also split on 'x' and take the last item) and check whether each value is in the reference column with isin. Then use the result (a Series of booleans) to slice the initial dataframe, after inverting the condition with ~ (to get "not in"):

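df1[~df1['ID'].str.extract('(?<=x)(.*$)', expand=False).isin(df2['ID'])]

Note the expand=False: by default, str.extract returns a DataFrame in current pandas versions, and selecting rows this way needs a boolean Series.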

If you want to better understand how this works, here is a version with intermediate variables that you can print to see each step:

clean_ID = df1['ID'].str.extract('(?<=x)(.*$)', expand=False)
mask = clean_ID.isin(df2['ID'])
df3 = df1[~mask]
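
To try the steps end to end, here is a minimal self-contained sketch. It rebuilds the two dataframes from the question and adds one hypothetical row (ID "ABC_xDEAD01", not in the original data) so that the result is not empty; with the question's three rows alone, every ID is found in df2 and nothing is returned. It also shows the split-based alternative mentioned above.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical extra row
# ("ABC_xDEAD01") whose ID suffix is deliberately absent from df2.
df1 = pd.DataFrame({
    "ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C", "ABC_xDEAD01"],
    "Name": ["Mike", "Lye", "Rike", "Extra"],
    "Letter": ["Aasd", "BASD", "GSDAS", "ZZZ"],
    "Range": [12134, 21231, 21341, 99999],
})
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2", "FF314C"]})

# expand=False makes str.extract return a Series rather than a
# one-column DataFrame, so the mask can be used to select rows.
clean_id = df1["ID"].str.extract("(?<=x)(.*$)", expand=False)
mask = clean_id.isin(df2["ID"])
df3 = df1[~mask]

# Alternative without a regex: split on 'x' and keep the last piece.
clean_id_split = df1["ID"].str.split("x").str[-1]
df3_split = df1[~clean_id_split.isin(df2["ID"])]

print(df3)
```

The two variants agree on this data, but they differ if an ID contains more than one 'x': the regex keeps everything after the first 'x', while the split keeps only what follows the last one.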