Searching in Pandas Dataframe in Python

Question

I have a DataFrame of n-grams extracted from Italian text. It looks like this:

    Name
0   accensione del drive ribobinatrice ho
1   actions urgente proporre al cliente
2   al cliente upgrade del drive
3   al drive con una smontata
4   causa di un problema di

I would like to search for the combination of words 'cliente problema'.

By my logic, it should return rows 1, 2 and 4.

I tried str.contains(), but it returns an empty Series:

Term = 'cliente problema'

x_word = df_pentagrams.Name[df_pentagrams.Name.str.contains(Term)]
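For reference, here is a self-contained reproduction of the empty result. It assumes the five rows shown above are the whole of `df_pentagrams`. Because str.contains treats the whole of `Term` as one pattern, a row only matches if it contains the exact substring 'cliente problema'.

```python
import pandas as pd

# Sample rows copied from the question (assumed to be the whole DataFrame)
df_pentagrams = pd.DataFrame({'Name': [
    'accensione del drive ribobinatrice ho',
    'actions urgente proporre al cliente',
    'al cliente upgrade del drive',
    'al drive con una smontata',
    'causa di un problema di',
]})

Term = 'cliente problema'

# The whole phrase is one pattern: no row contains 'cliente problema' verbatim
x_word = df_pentagrams.Name[df_pentagrams.Name.str.contains(Term)]
print(x_word.empty)  # True
```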

How can this problem be solved in Pandas?

Thanks!

vielkind · Accepted Answer · 2018-07-16 15:19:56Z


Your expectations about the behavior of str.contains are wrong. In your example, str.contains searches for the literal string cliente problema. Based on your expected output, though, you aren't looking for cliente problema as a single string. You want the rows in which either cliente or problema occurs.

Instead of treating cliente problema as one string, split it into a list of words and use that list when you filter the DataFrame:

# split the search phrase into words and keep rows containing any of them
terms = Term.split(' ')
df_pentagrams.Name[df_pentagrams.Name.str.contains('|'.join(terms))]
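A runnable sketch of this alternation approach follows, using the question's sample rows. One addition goes beyond the answer: each word is passed through `re.escape`, on the assumption that search terms might contain regex metacharacters such as `.` or `+`. That makes them match literally. For plain words like these, the resulting pattern is unchanged.

```python
import re
import pandas as pd

df_pentagrams = pd.DataFrame({'Name': [
    'accensione del drive ribobinatrice ho',
    'actions urgente proporre al cliente',
    'al cliente upgrade del drive',
    'al drive con una smontata',
    'causa di un problema di',
]})

Term = 'cliente problema'

# split the search phrase into separate words
terms = Term.split()

# escape each word (an addition, not in the original answer), then join with '|'
# so a row matches if it contains ANY of the words
pattern = '|'.join(re.escape(t) for t in terms)

x_word = df_pentagrams.Name[df_pentagrams.Name.str.contains(pattern)]
print(x_word.index.tolist())  # [1, 2, 4]
```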


1 Comment

jpp Over a year ago

This method may not be suitable because it doesn't match whole words in Name; e.g. problematic would also be caught.
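To illustrate the caveat in this comment, the sketch below uses two hypothetical rows that are not from the question. Neither contains the standalone word cliente or problema. Plain alternation still matches them as substrings, while wrapping the alternation in word boundaries (`\b`) does not.

```python
import pandas as pd

# Hypothetical rows (not from the question): only longer words containing the terms
df = pd.DataFrame({'Name': ['una clientela nuova', 'situazione problematica']})

pattern = 'cliente|problema'

# plain alternation matches substrings, so both rows are caught
loose = df['Name'].str.contains(pattern).tolist()
print(loose)  # [True, True]

# a non-capturing group plus \b on both sides only matches whole words
strict = df['Name'].str.contains(r'\b(?:cliente|problema)\b').tolist()
print(strict)  # [False, False]
```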

Robert King · Accepted Answer · 2018-07-16 15:21:03Z


The problem is that you are searching for the exact string 'cliente problema', not for 'cliente' OR 'problema'.

This is what you want to do:

    Term1 = 'cliente'
    Term2 = 'problema'

    x_word = df_pentagrams.Name[df_pentagrams.Name.str.contains(Term1)
                                | df_pentagrams.Name.str.contains(Term2)]
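The one-mask-per-term idea can be extended to any number of terms by OR-ing the masks with `functools.reduce`. This is a sketch rather than the answerer's code. It also passes `regex=False`, which assumes the terms are plain words to be matched literally as substrings.

```python
import functools
import operator
import pandas as pd

df_pentagrams = pd.DataFrame({'Name': [
    'accensione del drive ribobinatrice ho',
    'actions urgente proporre al cliente',
    'al cliente upgrade del drive',
    'al drive con una smontata',
    'causa di un problema di',
]})

terms = ['cliente', 'problema']

# one boolean mask per term, like Term1/Term2 above (regex=False: literal match)
masks = [df_pentagrams.Name.str.contains(t, regex=False) for t in terms]

# element-wise OR of all masks, however many terms there are
combined = functools.reduce(operator.or_, masks)

x_word = df_pentagrams.Name[combined]
print(x_word.index.tolist())  # [1, 2, 4]
```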


2 Comments

Robert King Over a year ago

My answer was posted before I saw vielkind's. I prefer their solution, as it scales to any number of search terms.

jpp Over a year ago

This method may not be suitable because it doesn't match whole words in Name; e.g. problematic would also be caught.

jpp · Accepted Answer · 2018-07-16 15:24:44Z

You can use either a regex or a list comprehension to filter for whole words:

import pandas as pd

df = pd.DataFrame({'Name': ['accensione del drive ribobinatrice ho',
                            'actions urgente proporre al cliente',
                            'al cliente upgrade del drive',
                            'al drive con una smontata',
                            'causa di un problema di']})

Term = 'cliente problema'

# regex: group the alternatives so that \b applies to every word
p = '|'.join(Term.split())
res = df[df['Name'].str.contains(r'\b(?:{})\b'.format(p))]

# list comprehension: keep rows where any search term is one of the row's words
res = df[[any(i in words for i in Term.split())
          for words in df['Name'].str.split().values]]

print(res)

                                  Name
1  actions urgente proporre al cliente
2         al cliente upgrade del drive
4              causa di un problema di
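The question asks for row numbers. These are the index labels of the filtered frame. The sketch below gets them and checks that the word-boundary regex agrees with a word-set variant of the list comprehension. The set-based check uses `set.isdisjoint`, which is a substitution for the answer's `any(...)`, not the answer's own code.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['accensione del drive ribobinatrice ho',
                            'actions urgente proporre al cliente',
                            'al cliente upgrade del drive',
                            'al drive con una smontata',
                            'causa di un problema di']})

Term = 'cliente problema'
words = Term.split()

# regex approach, alternation grouped so \b applies to each word
p = r'\b(?:{})\b'.format('|'.join(words))
res_regex = df[df['Name'].str.contains(p)]

# set-based variant: a row matches when it shares at least one word with the terms
wanted = set(words)
mask = [not wanted.isdisjoint(name.split()) for name in df['Name']]
res_sets = df[mask]

# the "row numbers" are the index labels of the filtered DataFrame
print(res_regex.index.tolist())    # [1, 2, 4]
print(res_regex.equals(res_sets))  # True
```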

ac24 · Accepted Answer · 2018-07-16 15:17:55Z


Try using the '|' character to join the separate terms in your search string. At the moment, your code tries to match the entire string 'cliente problema', which none of your rows contain.

import pandas as pd

df = pd.DataFrame(data=['accensione del drive ribobinatrice ho',
                        'actions urgente proporre al cliente',
                        'al cliente upgrade del drive',
                        'al drive con una smontata',
                        'causa di un problema di'], columns=['Name'])

Term = 'cliente problema'

x_word = df.Name[df.Name.str.contains('|'.join(Term.split(' ')))]
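The '|' pattern is case-sensitive by default. If some rows might capitalise the search words, `str.contains` also accepts `case=False`. The rows below are hypothetical, not from the question, and this variant is an assumption about the data rather than part of the answer.

```python
import pandas as pd

# Hypothetical rows with capitalised words (not from the question)
df = pd.DataFrame({'Name': ['Cliente chiede upgrade',
                            'PROBLEMA al drive',
                            'al drive con una smontata']})

Term = 'cliente problema'
pattern = '|'.join(Term.split(' '))

# default matching is case-sensitive, so neither capitalised row is found
sensitive = df.Name.str.contains(pattern).tolist()
print(sensitive)    # [False, False, False]

# case=False ignores letter case when matching the pattern
insensitive = df.Name.str.contains(pattern, case=False).tolist()
print(insensitive)  # [True, True, False]
```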


1 Comment

jpp Over a year ago

This method may not be suitable because it doesn't match whole words in Name; e.g. problematic would also be caught.
