6
{u'Status': u'OK', u'City': u'Ciri\xe8', u'TimezoneName': '', u'ZipPostalCode': '', u'CountryCode': u'IT', u'Dstoffset': u'0', u'Ip': u'x.x.x.x', u'Longitude': u'7.6', u'CountryName': u'Italy', u'RegionCode': u'12', u'Latitude': u'45.2333', u'Isdst': '', u'Gmtoffset': u'0', u'RegionName': u'Piemonte'}

This is the output of my object. I would like to access City, but it's encoded. How can I read all the parameters and decode them?

>>> data['City']
u'Ciri\xe8'

>>> data['City'].decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 4: ordinal not in range(128)

I want plain text, not a Unicode string. Thank you!

5
  • I'm using this code github.com/sonicrules1234/pyipinfodb/blob/master/pyipinfodb.py Commented Apr 22, 2012 at 2:04
  • 1
    There is no such thing as "plaintext". Commented Apr 22, 2012 at 2:07
  • 2
    You don't have to do anything. It's already decoded... Try print data['City'] Commented Apr 22, 2012 at 2:07
  • As you see in the post the result of print data['City'] is u'Ciri\xe8' Commented Apr 22, 2012 at 2:10
  • 1
    No, you just typed data['City']. Try print data['City']. For me, in iPython, that makes a difference. Commented Apr 22, 2012 at 2:11

3 Answers

9

It is not clear what you want. If by 'plaintext' you mean removing the accents, try this:

>>> s = u'Ciri\xe8'
>>> from unicodedata import normalize
>>> normalize('NFKD', s).encode('ASCII', 'ignore')
'Cirie'
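Since the question asks about all the parameters, the same idea can be applied to every value in the dictionary. The following is only a sketch: `strip_accents` is a hypothetical helper name, the dictionary is a shortened copy of the one in the question, and treating byte strings as UTF-8 is an assumption.

```python
import unicodedata


def strip_accents(text):
    """Return text with accents removed and other non-ASCII characters dropped."""
    # Assumption: byte strings (Python 2 `str`) hold UTF-8 data.
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    # NFKD splits u'\xe8' into 'e' followed by U+0300 (combining grave accent).
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding to ASCII with 'ignore' then drops the combining mark.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# Hypothetical subset of the dictionary shown in the question.
data = {u'City': u'Ciri\xe8', u'CountryCode': u'IT', u'RegionName': u'Piemonte'}
plain = {key: strip_accents(value) for key, value in data.items()}
```

Note that characters with no ASCII decomposition (for example u'\xf8' or u'\xdf') are silently dropped rather than transliterated, so this loses information.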

8

Read this: http://nedbatchelder.com/text/unipain.html

Then just print it:

>>> data = {u'City':u'Ciri\xe8'}
>>> data['City']
u'Ciri\xe8'
>>> print data['City']
Ciriè

If you don't print it, the interactive interpreter shows the repr of the string: a safe representation in which the u'' prefix marks it as Unicode text and \xe8 is an escape for the non-ASCII character. print instead tries to display the character by encoding the Unicode string in the terminal's encoding. That can fail if the string contains characters the terminal encoding doesn't support:

>>> print u'\xe8'
è
>>> print u'\x81'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\dev\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x81' in position 0: character maps to <undefined>

In the above example, code page 437 supports Unicode character U+00E8, but not U+0081.
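To check ahead of time whether a string can be shown in a given encoding, one option is to try the encode step yourself. This is a minimal sketch: `can_encode` and `encode_for_terminal` are hypothetical helper names, and cp437 is used only because it is the code page in the traceback above (the real terminal encoding is usually available as `sys.stdout.encoding`).

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def encode_for_terminal(text, encoding):
    """Encode text, substituting '?' for characters the encoding lacks."""
    return text.encode(encoding, 'replace')


supported = can_encode(u'\xe8', 'cp437')      # True: cp437 has U+00E8
unsupported = can_encode(u'\x81', 'cp437')    # False: cp437 lacks U+0081
fallback = encode_for_terminal(u'\x81', 'cp437')  # b'?'
```

Using 'replace' trades a traceback for a visible placeholder, which is often preferable when the output is only for display.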


0

By plaintext, I suppose you mean ASCII. For this you can use:

data['City'].encode('ascii','ignore')

This will drop the non-ASCII character and return:

Ciri

See this link for more information: http://docs.python.org/howto/unicode.html
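'ignore' is only one of the standard error handlers that encode() accepts. As a rough comparison, here is what each of the common ones produces for the city name from the question (the variable names are illustrative):

```python
city = u'Ciri\xe8'

# Each handler decides what happens to characters that ASCII cannot represent.
handlers = ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace')
results = {name: city.encode('ascii', name) for name in handlers}

# results['ignore']            -> 'Ciri'        (character dropped)
# results['replace']           -> 'Ciri?'       (placeholder inserted)
# results['xmlcharrefreplace'] -> 'Ciri&#232;'  (XML/HTML character reference)
# results['backslashreplace']  -> 'Ciri\\xe8'   (Python-style escape)
```

All four lose or transform the original character, so they are best reserved for output that really must be ASCII.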

