1

I have a DataFrame with n-grams of Italian text. It looks like this:

    Name
0   accensione del drive ribobinatrice ho
1   actions urgente proporre al cliente
2   al cliente upgrade del drive
3   al drive con una smontata
4   causa di un problema di

I would like to search for the combination of words 'cliente problema'.

By my logic, it should return rows 1, 2 and 4.

I am using contains(), but it returns an empty Series:

Term = 'cliente problema'

x_word = df_pentagrams.Name[df_pentagrams.Name.str.contains(Term)]

How can this problem be solved in Pandas?

Thanks!

4 Answers

2

Your expectations about the behavior of str.contains are wrong. As written, your example searches for the exact string cliente problema, but judging by your expected output you are not looking for cliente problema as a single string; you want rows in which either cliente or problema occurs.

Instead of treating cliente problema as a single string, split it into a list of terms and use that list when you filter the DataFrame:

terms = Term.split(' ')
df_pentagrams.Name[df_pentagrams.Name.str.contains('|'.join(terms))]
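
As a quick check of the difference described above, here is a minimal sketch that rebuilds the question's data (the DataFrame construction is an assumption based on the table shown in the question) and compares the whole-phrase search with the `|`-joined pattern:

```python
import pandas as pd

# Data rebuilt from the table in the question (assumed construction)
df_pentagrams = pd.DataFrame({'Name': [
    'accensione del drive ribobinatrice ho',
    'actions urgente proporre al cliente',
    'al cliente upgrade del drive',
    'al drive con una smontata',
    'causa di un problema di',
]})

Term = 'cliente problema'

# Whole phrase: no row contains 'cliente problema' verbatim
exact = df_pentagrams.Name[df_pentagrams.Name.str.contains(Term)]

# Alternation: 'cliente|problema' matches rows containing either term
pattern = '|'.join(Term.split())
either = df_pentagrams.Name[df_pentagrams.Name.str.contains(pattern)]

print(exact.empty)          # True
print(list(either.index))   # [1, 2, 4]
```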

1 Comment

This method may not suit because it doesn't filter for words in Name, e.g. problematic would be caught.
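
To illustrate the point of this comment, here is a small sketch comparing plain substring matching with a word-boundary pattern. The second row, containing "problematico", is a made-up example and not part of the question's data:

```python
import pandas as pd

# Second row is hypothetical, added only to show the substring issue
names = pd.Series(['causa di un problema di',
                   'un caso problematico al drive'])
terms = ['cliente', 'problema']

# Plain alternation matches 'problema' inside 'problematico' too
substring = names.str.contains('|'.join(terms))

# \b anchors plus a non-capturing group match whole words only
whole_word = names.str.contains(r'\b(?:{})\b'.format('|'.join(terms)))

print(substring.tolist())   # [True, True]
print(whole_word.tolist())  # [True, False]
```
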
2

The problem is you are searching for the exact string 'cliente problema' not 'cliente' OR 'problema'.

This is what you want to do:

    Term1 = 'cliente'
    Term2 = 'problema'

    x_word = df_pentagrams.Name[df_pentagrams.Name.str.contains(Term1)
                                | df_pentagrams.Name.str.contains(Term2)]

2 Comments

My answer was posted before I saw vealkind's. I prefer their solution as it scales to any number of search terms.
This method may not suit because it doesn't filter for words in Name, e.g. problematic would be caught.
2

You can use either regex or a list comprehension to filter for words:

import pandas as pd

df = pd.DataFrame({'Name': ['accensione del drive ribobinatrice ho',
                            'actions urgente proporre al cliente',
                            'al cliente upgrade del drive',
                            'al drive con una smontata',
                            'causa di un problema di']})

Term = 'cliente problema'

# regex
p = '|'.join(Term.split())
# group the alternatives so that \b applies to both ends of every term
res = df[df['Name'].str.contains(r'\b(?:{})\b'.format(p))]

# list comprehension
res = df[[any(i in words for i in Term.split()) \
          for words in df['Name'].str.split().values]]

print(res)

                                  Name
1  actions urgente proporre al cliente
2         al cliente upgrade del drive
4              causa di un problema di
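
If the search terms may contain regex metacharacters, the regex approach can be wrapped in a small helper that escapes each term first. This is only a sketch: `filter_by_words` is a hypothetical name, and the sample rows with "v1.2" are invented for illustration.

```python
import re
import pandas as pd

def filter_by_words(df, column, term_string):
    """Keep rows where `column` contains any whitespace-separated term as a whole word.

    Hypothetical helper, not part of pandas.
    """
    # Escape each term so characters such as '.' are matched literally
    terms = [re.escape(t) for t in term_string.split()]
    pattern = r'\b(?:{})\b'.format('|'.join(terms))
    return df[df[column].str.contains(pattern)]

# Invented sample rows: without escaping, 'v1.2' would also match 'v1x2'
sample = pd.DataFrame({'Name': ['upgrade a v1.2 del drive',
                                'upgrade a v1x2 del drive',
                                'al cliente upgrade del drive']})

res_escaped = filter_by_words(sample, 'Name', 'cliente v1.2')
print(list(res_escaped.index))  # [0, 2]
```

Note that `\b` only behaves as expected when each term starts and ends with a word character.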

1

Try using the '|' character to join your separate terms in the search string. At the moment your code attempts to match the entire 'cliente problema' string, which none of your rows contain.

import pandas as pd

df = pd.DataFrame(data = ['accensione del drive ribobinatrice ho',
'actions urgente proporre al cliente',
'al cliente upgrade del drive',
'al drive con una smontata',
'causa di un problema di',], columns = ['Name'])

Term = 'cliente problema'

x_word = df.Name[df.Name.str.contains('|'.join(Term.split(' ')))]

1 Comment

This method may not suit because it doesn't filter for words in Name, e.g. problematic would be caught.
