I'm writing a small script to import my Facebook contacts' email addresses into Gmail/Android. My input file contains Unicode characters, like: Jasmin L\u00f3pez. The generated CSV output file looks like this:

Andr\u00e9 Zzz,,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,[email protected]
Andr\u00e9ia Ggg,,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,[email protected]
Andr\u00e9s Bbb,,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,[email protected]

As you can see, I have encoding problems. I'm creating a Google Contacts CSV file, and I need the names to display properly. I'm using this function to write the CSV:

def writecsv(self):
    if self.outfile != '':
        #fh = open(self.outfile, 'wb')
        #fh = codecs.open(self.outfile, "wb", "utf-8")
        fh = codecs.open(self.outfile, 'wb', encoding="latin-1")
    else:
        fh = sys.stdout

    csvhdlr = csv.writer(fh, quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csvhdlr.writerow("Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory Server,Mileage,Occupation,Hobby,Sensitivity,Priority,Subject,Notes,Group Membership,E-mail 1 - Type,E-mail 1 - Value".split(','))        
    for contact in self.clist:
        #csvhdlr.writerow(dict((vname, vtype, vnotes, vstereotype, vauthor, valias, vgenfile.encode('utf-8')) for vname, vtype, vnotes, vstereotype, vauthor, valias, vgenfile in row.iteritems()))
        row = contact.fullname + ',,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,' + contact.email
        csvhdlr.writerow(row.split(','))

Any ideas, please? I'm quite new to Python, and every time I have to deal with encodings, it doesn't work the way I'd like =(

Thanks a lot for your help!

1 Answer

If I understand you correctly, your file doesn't contain the non-ASCII characters themselves; it contains Unicode escape sequences like "\u00f3" that stand for them. If your file actually contains the string "Jasmin L\u00f3pez" (with a literal backslash and u), then you'll need to decode those escapes into real Unicode characters before writing the CSV. Take a look at the unicode_escape codec.

>>> x = b"\u00f3"
>>> print x
\u00f3
>>> print x.decode('unicode_escape')
ó
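
To tie this back to the CSV writing, here is a minimal sketch, assuming Python 3 (where the csv module accepts text directly) rather than the Python 2 shown above. The helper names decode_escapes and write_contacts and the sample address are made up for illustration; they are not from the question's script.

```python
import csv
import io


def decode_escapes(text):
    """Turn literal escapes such as \\u00f3 into the characters they name."""
    # unicode_escape works on bytes; this round trip assumes the input is
    # plain ASCII apart from the escape sequences themselves.
    return text.encode("ascii").decode("unicode_escape")


def write_contacts(rows, fh):
    """Write (name, email) pairs as CSV rows to an open text-mode file."""
    writer = csv.writer(fh, quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for name, email in rows:
        writer.writerow([decode_escapes(name), email])


# Hypothetical sample data; the address is a placeholder.
contacts = [(r"Jasmin L\u00f3pez", "jasmin@example.com")]

buffer = io.StringIO()
write_contacts(contacts, buffer)
print(buffer.getvalue())
```

To write to a real file instead of the in-memory buffer, something like open(path, "w", newline="", encoding="utf-8") should work; UTF-8 is a guess at what the importing side expects.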

2 Comments

Thanks for your fast answer. It works great in the console, but when I try to write to the CSV, I get this: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 3: ordinal not in range(128). Any clue?
What did you do to fix this?
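
A likely cause of the error in the first comment, assuming Python 2: the csv module there works on byte strings, so a unicode value handed to writerow gets implicitly encoded with the ASCII codec, which fails on any non-ASCII character. The usual workaround is to encode each cell explicitly (for example to UTF-8) before writing. The sketch below only reproduces the two encoding outcomes, written for Python 3 so it runs today; the name is a made-up example chosen so the offending character sits at position 3.

```python
# Hypothetical name; the character at index 3 is U+00E3 (a with tilde).
name = "Sim\u00e3o"

# Encoding with ASCII fails, which mirrors the error from the comment.
try:
    name.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding with UTF-8 succeeds and yields bytes that can go to the file.
utf8_bytes = name.encode("utf-8")

print(ascii_error)
print(utf8_bytes)
```

In the Python 2 script from the question, that would probably mean encoding every cell of the row as UTF-8 before calling writerow and opening the output file in plain binary mode rather than through codecs.open.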
