How to apply a function with argument to a Pandas dataframe

Question

I am trying to remove all accents from my data. I found a function that does this, but I can't work out how to apply it to the entire dataframe at once.

import unicodedata
import pandas as pd

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii


data = {'name': ['Guzmán', 'Molly'],
        'year': [2012, 2012]}
df = pd.DataFrame(data)
df

How can I apply the above function?

Alternatively, is there a parameter in pandas read_csv that I can use to get the same result while loading the data?

Have you looked at any examples of apply? Your case looks very straightforward. And I do not entirely understand your last question.
– DYZ, Commented Aug 4, 2017 at 2:23
It gives an error, as unicodedata.normalize('NFKD', input_str) expects two parameters.
– learner, Commented Aug 4, 2017 at 2:26
df.name.apply(lambda x: unicodedata.normalize('NFKD', x).encode('ASCII', 'ignore'))
– cs95, Commented Aug 4, 2017 at 2:26
@COLDSPEED, thanks, but I get the error TypeError: normalize() argument 2 must be unicode, not str. Also, I need to do it on the entire data frame all at once.
– learner, Commented Aug 4, 2017 at 2:27

Gustavo Bezerra · Accepted Answer · 2017-08-04 03:24:29Z

As others have pointed out, this is pretty straightforward:

df['name'] = df['name'].apply(remove_accents)
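
For reference, here is a minimal sketch of that one-liner run against the question's data, assuming Python 3. Under that assumption the function as posted returns bytes rather than str, which is what the next paragraph is about.

```python
import unicodedata
import pandas as pd

def remove_accents(input_str):
    # Function exactly as posted in the question
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii

df = pd.DataFrame({'name': ['Guzmán', 'Molly'],
                   'year': [2012, 2012]})

# Series.apply calls remove_accents once per value in the column
df['name'] = df['name'].apply(remove_accents)

# In Python 3 each value is now a bytes object, not a str
print(df['name'].tolist())
```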

Also, if you are using Python 3, I would recommend changing the last line of your remove_accents function. As written it returns bytes (only_ascii is the result of encode), and it's usually best practice to keep text as a regular Python 3 str.

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii.decode('utf-8')
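
The question also asked about read_csv, which this answer does not cover. One option that might fit, offered as a suggestion rather than something confirmed in this thread, is the converters parameter of pandas.read_csv, which maps a column name to a function applied to each value of that column while parsing. A sketch, assuming Python 3 and using a made-up in-memory CSV (csv_text is hypothetical) in place of a real file:

```python
import io
import unicodedata
import pandas as pd

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii.decode('utf-8')

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,year\nGuzmán,2012\nMolly,2012\n"

# converters applies remove_accents to every value in the 'name' column
df = pd.read_csv(io.StringIO(csv_text),
                 converters={'name': remove_accents})
print(df)
```

Each text column needs its own entry in the converters dict, so this is only convenient when the column names are known up front.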

3 Comments

learner Over a year ago

Thanks. Is there a way I can apply the function to the entire dataframe instead of doing it one column at a time?

Gustavo Bezerra Over a year ago

You can loop over the columns or try something like: df.apply(lambda x: [remove_accents(i) for i in x]). Probably not very efficient, but gets the job done.
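
One caveat with the question's data: the year column holds integers, and unicodedata.normalize only accepts str, so running remove_accents on every cell would likely raise a TypeError there. Below is a sketch of one way around that, assuming Python 3; remove_accents_if_text is a hypothetical helper name, not something from the thread.

```python
import unicodedata
import pandas as pd

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii.decode('utf-8')

def remove_accents_if_text(value):
    # Hypothetical helper: only touch str values, pass everything else through
    if isinstance(value, str):
        return remove_accents(value)
    return value

df = pd.DataFrame({'name': ['Guzmán', 'Molly'],
                   'year': [2012, 2012]})

# DataFrame.apply hands over one column at a time; Series.map then
# runs the helper on each value in that column
cleaned = df.apply(lambda col: col.map(remove_accents_if_text))
print(cleaned)
```

This still visits every cell in Python, so it is no faster than the list comprehension above; it just leaves non-text values alone.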

Gustavo Bezerra Over a year ago

Btw, a good way to "debug" apply and understand what your function is taking as input is to pass a function that simply prints the input it receives: lambda x: print(x). Probably Python 3 only, though.
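
A small sketch of that debugging trick, assuming Python 3. The function names and the two lists are made up for illustration; the lists only record what was received so the behaviour can be checked afterwards.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Guzmán', 'Molly'],
                   'year': [2012, 2012]})

column_inputs = []  # types seen by DataFrame.apply
value_inputs = []   # types seen by Series.apply

def show_column(col):
    # Called by DataFrame.apply with a whole column
    print('DataFrame.apply received:', type(col).__name__, col.name)
    column_inputs.append(type(col).__name__)

def show_value(value):
    # Called by Series.apply with a single cell value
    print('Series.apply received:', type(value).__name__, value)
    value_inputs.append(type(value).__name__)

df.apply(show_column)
df['name'].apply(show_value)
```

The printout makes the difference visible: DataFrame.apply passes each column as a Series, while Series.apply passes individual values, which is why a function written for a single string works on df['name'] but not directly on df.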
