I am trying to remove all accents from my data. I found a function that does this, but I cannot work out how to apply it to the entire DataFrame at once.

import unicodedata
import pandas as pd

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii


data = {'name': ['Guzmán', 'Molly'],
        'year': [2012, 2012]}
df = pd.DataFrame(data)
df

How can I apply the above function?

Is there any parameter in pandas read_csv that I can use to achieve similar output?

  • Have you looked at any examples of apply? Your case looks very straightforward. Also, I do not entirely understand your last question. Commented Aug 4, 2017 at 2:23
  • Try the apply docs Commented Aug 4, 2017 at 2:24
  • It gives an error because unicodedata.normalize('NFKD', input_str) expects two parameters. Commented Aug 4, 2017 at 2:26
  • df.name.apply(lambda x: unicodedata.normalize('NFKD', x).encode('ASCII', 'ignore')) Commented Aug 4, 2017 at 2:26
  • @COLDSPEED, thanks, but I get the error TypeError: normalize() argument 2 must be unicode, not str. Also, I need to do this on the entire DataFrame at once. Commented Aug 4, 2017 at 2:27

1 Answer

As others have pointed out, this is pretty straightforward:

df['name'] = df['name'].apply(remove_accents)
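
For comparison, here is a minimal sketch of the same single-column cleanup using pandas' string accessor instead of apply. It assumes the column contains only text, and it reuses the sample data from the question; it is one possible alternative, not something the original function requires.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Guzmán', 'Molly'], 'year': [2012, 2012]})

# Same three steps as remove_accents, expressed with the .str accessor:
# decompose accented characters, drop the non-ASCII marks, decode back to str.
df['name'] = (
    df['name']
    .str.normalize('NFKD')
    .str.encode('ascii', errors='ignore')
    .str.decode('ascii')
)

print(df)
```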

Also, if you are using Python 3, I would recommend changing the last line of your remove_accents function. As written, only_ascii is a bytes object, and it is usually best practice to keep text as a regular (Python 3) str, so decode it before returning:

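def remove_accents(input_str):
    # NFKD splits each accented character into a base letter plus combining marks.
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    # Encoding to ASCII with 'ignore' drops the marks and leaves bytes.
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    # Decode the bytes back to a regular str.
    return only_ascii.decode('utf-8')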
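
On the question's last point about read_csv: one option that may fit is the converters parameter, which maps a column name to a function applied to each raw value while the file is parsed. The sketch below assumes you know in advance which columns hold text; the in-memory CSV is made up for illustration and stands in for a real file path.

```python
import io
import unicodedata

import pandas as pd


def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return nfkd_form.encode('ASCII', 'ignore').decode('ascii')


# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,year\nGuzmán,2012\nMolly,2012\n"

# Each raw value in the 'name' column is passed through remove_accents on load.
df = pd.read_csv(io.StringIO(csv_text), converters={'name': remove_accents})

print(df)
```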

3 Comments

Thanks. Is there a way I can apply the function to the entire DataFrame instead of doing it one column at a time?
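You can loop over the columns or try something like: df.apply(lambda x: [remove_accents(i) for i in x]). Probably not very efficient, but it gets the job done as long as every column holds strings; a numeric column such as year makes normalize raise a TypeError.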
By the way, a good way to "debug" apply and see what your function receives as input is to pass a function that simply prints it: lambda x: print(x). That is probably Python 3 only, though.
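
To run this over the whole DataFrame when it also has non-text columns, one possible approach is to wrap the function so that non-string values pass through unchanged. This is a minimal sketch using the question's sample data; remove_accents_if_str is a hypothetical helper name, not part of pandas or the original post.

```python
import unicodedata

import pandas as pd


def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return nfkd_form.encode('ASCII', 'ignore').decode('ascii')


def remove_accents_if_str(value):
    # Hypothetical helper: leave ints, floats, NaN, etc. untouched.
    return remove_accents(value) if isinstance(value, str) else value


df = pd.DataFrame({'name': ['Guzmán', 'Molly'], 'year': [2012, 2012]})

# Apply the wrapper element by element, one column at a time.
df = df.apply(lambda col: col.map(remove_accents_if_str))

print(df)
```

Checking each value keeps the numeric year column intact, at the cost of a Python-level call per cell, so it may be slow on large frames.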
