I have a column of data that looks like this:

import pandas as pd
import numpy as np

   Items
0  Product A + Product B + Product C   
1  Product A + Product B + Product B1 + Product C1 
2  

I would like to check whether each entry in the Items column contains any of a few specific products that I am interested in flagging:

My_Items = ['Product B', 'Product C', 'Product C1']

I've tried the following lambda function, but it does not pick up the strings I'm searching for when a cell contains more than one product:

df['My Items'] = df['Items'].apply(lambda x: 'Contains my items' if x in My_Items else '')

Does anyone know how I can search for multiple strings from a list within a lambda function?

Thank you for any help or suggestions.

Kind regards

1
  • What is expected output? Commented Apr 24, 2020 at 10:22

3 Answers

3

Use Series.str.count to count the matched values, then test with Series.gt whether the count is greater than 1:

mask = df.Items.str.count('|'.join(My_Items)).gt(1)

df['My Items'] = np.where(mask,'Contains 2 or more items', '')
print (df)
                                             Items                  My Items
0                Product A + Product B + Product C  Contains 2 or more items
1  Product A + Product B + Product B1 + Product C1  Contains 2 or more items

Details:

print (df.Items.str.count('|'.join(My_Items)))
0    2
1    3
Name: Items, dtype: int64
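
As for why the lambda in the question never matches: `x in My_Items` is list membership, so it is True only when the whole cell string equals one of the list elements, not when the cell contains one of them. A minimal sketch of a lambda-style version, assuming the sample data from the question and no missing values in the column (the helper name `count_hits` is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Items': [
        'Product A + Product B + Product C',
        'Product A + Product B + Product B1 + Product C1',
    ]
})
My_Items = ['Product B', 'Product C', 'Product C1']

# List membership: True only if the whole cell equals one element of My_Items
whole_cell_match = df['Items'].apply(lambda x: x in My_Items)

# Substring test per item: how many entries of My_Items occur somewhere in the cell
def count_hits(cell):
    return sum(item in cell for item in My_Items)

hits = df['Items'].apply(count_hits)
df['My Items'] = hits.apply(lambda n: 'Contains my items' if n > 0 else '')
print(df)
```

This counts distinct list entries found in the cell, while `str.count` counts non-overlapping regex matches, so the two numbers can differ on other data.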

2

IIUC you may use str.findall and check we get at least 2 matches:

import numpy as np

m = df.Items.str.findall('|'.join(My_Items)).str.len().ge(2)
df['My items'] = np.where(m, 'Contains at least 2 items', '')

If we check with an additional row containing only 1 of the products:

print(df)

                        Items  \
0                Product A + Product B + Product C      
1  Product A + Product B + Product B1 + Product C1     
2                            Product A + Product D    

                    My items  
0  Contains at least 2 items  
1  Contains at least 2 items  
2                             

Here df.Items.str.findall('|'.join(My_Items)) gives, for each row, a list of all the matches found:

df.Items.str.findall('|'.join(My_Items))

0               [Product B, Product C]
1    [Product B, Product B, Product C]
2                                   []
Name: Items, dtype: object
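
One thing the output above makes visible: because the pattern is a plain substring alternation, `Product B1` is counted as `Product B` and `Product C1` as `Product C`. If exact product names matter (an assumption about the intent, not something stated in the question), one option is to split each cell on the `+` separator and compare whole names. A rough sketch using the same three sample rows; the column names `Exact matches` and `At least 2` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Items': [
    'Product A + Product B + Product C',
    'Product A + Product B + Product B1 + Product C1',
    'Product A + Product D',
]})
My_Items = ['Product B', 'Product C', 'Product C1']
wanted = set(My_Items)

# Split each cell on '+', strip surrounding whitespace,
# and keep only names that are exactly equal to an entry of My_Items
exact = df['Items'].apply(
    lambda cell: [p.strip() for p in cell.split('+') if p.strip() in wanted]
)

df['Exact matches'] = exact
df['At least 2'] = exact.str.len().ge(2)
print(df)
```

With this approach row 1 yields `Product B` and `Product C1` only, since `Product B1` is not in the list.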

-1

Thank you guys! The solution I was looking for ended up being a combination of both of your answers!

What I ended up doing was this for the mask, so I could filter:

DF['My_Items'] = DF.Items.str.findall('|'.join(My_list)).str.len().gt(1)

Then this for the list of items, so I can now analyse the combinations:

DF['My_Items'] = DF.Items.str.findall('|'.join(My_list)).astype(str)
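
Note that the two assignments above write to the same `My_Items` column, so running both leaves only the second result. A sketch that keeps both side by side, assuming the sample data from the question; the column names `Has_2_plus` and `Matched` are made up for illustration, and `re.escape` is added only as a precaution in case a product name contains regex metacharacters:

```python
import re
import pandas as pd

DF = pd.DataFrame({'Items': [
    'Product A + Product B + Product C',
    'Product A + Product B + Product B1 + Product C1',
]})
My_list = ['Product B', 'Product C', 'Product C1']

# Build the alternation once, escaping each name so it is matched literally
pattern = '|'.join(re.escape(item) for item in My_list)
found = DF.Items.str.findall(pattern)

# Boolean mask for filtering: more than one match in the cell
DF['Has_2_plus'] = found.str.len().gt(1)
# String form of the matched list, for analysing combinations
DF['Matched'] = found.astype(str)

print(DF[DF['Has_2_plus']])
```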
