Python how to solve Unicode Error in string

Question

I'm getting the classic error:

'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)

This time, I can't solve it. The error comes from this line:

mensaje_texto_inmobiliaria = "%s, con el email %s y el teléfono %s está se ha contactado con Inmobiliar" % (nombre, email, telefono)

Specifically, it comes from the word teléfono. I have tried adding # -*- coding: utf-8 -*- to the beginning of the file, wrapping the string in unicode( <string> ), and calling <string>.encode("utf-8"). Nothing worked. Any advice will help.

Try adding from __future__ import unicode_literals to the top of the file as well. Does that solve your issue?
– Thtu, Commented Aug 17, 2016 at 22:32
What version of Python are you running? I have 2.7.12 and it works fine; it just interprets that character as an escape sequence in the string literal.
– Alex W, Commented Aug 17, 2016 at 22:35

Alex W · Accepted Answer · 2016-08-17 23:13:41Z

3

This explains why the following imports solve the issue the OP is having, with some background on the problem the OP is trying to describe:

from __future__ import unicode_literals
from builtins import str

In the default iPython 2.7 kernel :

(iPython session)

In [1]: type("é") # By default, quotes in py2 create py2 strings, which are sequences of bytes that, given some encoding, can be decoded to characters in that encoding.
Out[1]: str

In [2]: type("é".decode("utf-8")) # We can get to the actual text data by decoding it if we know what encoding it was initially encoded in, utf-8 is a safe guess in almost every country but Myanmar.
Out[2]: unicode

In [3]: len("é") # Note that the py2 `str` representation has a length of 2.  In UTF-8, "é" is the two bytes 0xc3 0xa9, which together encode one character (not one byte for the "e" and one for the accent).
Out[3]: 2

In [4]: len("é".decode("utf-8")) # the py2 `unicode` representation has length 1, since an accented e is a single character
Out[4]: 1
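
As a quick check of the session above, here is a minimal sketch that spells the bytes out with escapes, so it does not depend on how the file or terminal is encoded. It should behave the same on Python 2.7 and Python 3; the variable names are only illustrative.

```python
# Escapes are used instead of a literal "é" so the result does not
# depend on the encoding of the source file.
raw = b"\xc3\xa9"            # the two UTF-8 bytes that encode "é"
text = raw.decode("utf-8")   # a single code point, U+00E9

print(len(raw))              # 2
print(len(text))             # 1
print(text == u"\xe9")       # True
```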

Some other things of note in python 2.7:

"é" is the same thing as str("é")
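u"é" is the same thing as "é".decode('utf-8') or unicode("é", 'utf-8'), provided the source file is saved as UTF-8
u"é".encode('utf-8') is the same thing as str("é"), but only when the source file is saved as UTF-8 (with any other file encoding the bytes differ)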
You typically call decode with a py2 str, and encode with py2 unicode.
- Due to early design issues, you can call both on either even though that doesn't really make any sense.
- In python3, str, which is the same as python2 unicode, no longer has a decode method, since a string is by definition already-decoded text. Its encode method uses utf-8 by default.
In python 2.7, byte sequences that contain only ascii characters behave almost exactly the same as their decoded counterparts.
- In python 2.7 with no future imports : type("a".decode('ascii')) gives a unicode object, but this behaves nearly identically to str("a"). This is not the case in python3.
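
The flip side of the ascii point is the error in the question. When Python 2 mixes a str with a unicode object, it decodes the str implicitly with the ascii codec, and that fails on the first byte above 127. The sketch below triggers the same decode explicitly so it also runs on Python 3. It assumes the template was a UTF-8 byte string and that at least one of nombre, email or telefono was a unicode object; the question does not show their types. The offset it reports, 28, is where é sits in the template, which matches the message in the question.

```python
# The question's template as UTF-8 bytes (what a py2 str literal holds
# when the file is saved as UTF-8).
template = (b"%s, con el email %s y el tel\xc3\xa9fono %s "
            b"est\xc3\xa1 se ha contactado con Inmobiliar")

try:
    # Python 2 does this step implicitly when a str meets a unicode object.
    template.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding with the encoding the bytes were actually written in works.
decoded = template.decode("utf-8")
```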

With that said, here's what the snippets above do :

__future__ is a module maintained by the core python team that backports python3 functionality to python2 to allow you to use python3 idioms within python2.
from __future__ import unicode_literals has the following effect :
- Without the future import "é" is the same thing as str("é")
- With the future import "é" is functionally the same thing as u"é"
builtins is a module that, on python2, is provided by the third-party future package; it contains safe aliases for using python3 idioms in python2 with the python3 api.
- Due to reasons beyond me, the package itself is named "future", so to install the builtins module you run : pip install future
from builtins import str has the following effect :
- the str constructor now gives what you think it does, i.e. text data in the form of python2 unicode objects. So it's functionally the same thing as str = unicode
- Note : Python3 str is functionally the same as Python2 unicode
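- Note : To get bytes, you can use the b prefix, e.g. b'\xc3\xa9' (b'é' also works in python2, but is a syntax error in python3, where bytes literals may only contain ascii characters)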
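
Applied to the line from the question, a minimal sketch could look like this. It uses the u prefix on the one literal, which has the same effect for that literal as unicode_literals has for the whole file. The sample values are hypothetical, since the question does not show where nombre, email and telefono come from, and on Python 2 the file is assumed to be saved as UTF-8 with the coding declaration in place.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample values, already text (unicode on py2, str on py3).
nombre = u"María"
email = u"maria@example.com"
telefono = u"555-0100"

# The u prefix makes the template text, so no implicit ascii decode happens.
plantilla = u"%s, con el email %s y el teléfono %s está se ha contactado con Inmobiliar"
mensaje_texto_inmobiliaria = plantilla % (nombre, email, telefono)

# Encode only at the boundary, e.g. right before sending or writing to disk.
mensaje_bytes = mensaje_texto_inmobiliaria.encode("utf-8")
```

If one of the inputs arrives as bytes instead, decoding it first (for example telefono.decode("utf-8")) keeps everything inside the template as text.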

The takeaway is this :

Decode on read (decode early) and encode on write (encode at the end)
In python2 terms, use str objects for bytes and unicode objects for text
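
A small sketch of that pattern follows. The helper names from_wire and to_wire are made up for illustration, and UTF-8 is assumed as the encoding on both sides.

```python
def from_wire(raw_bytes, encoding="utf-8"):
    """Decode once, where bytes enter the program."""
    return raw_bytes.decode(encoding)


def to_wire(text, encoding="utf-8"):
    """Encode once, where text leaves the program."""
    return text.encode(encoding)


incoming = b"tel\xc3\xa9fono"             # bytes read from a file or socket
palabra = from_wire(incoming)             # text from here on
mensaje = u"El %s es 555-0100" % palabra  # all formatting happens on text
outgoing = to_wire(mensaje)               # bytes again, only at the very end

print(len(incoming), len(palabra))        # 9 bytes, 8 characters
```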

edited Aug 17, 2016 at 23:13

Alex W


answered Aug 17, 2016 at 22:50

Thtu



6 Comments

Alex W Over a year ago

u"é".encode('utf-8') is the same thing as str("é") is not true

Thtu Over a year ago

That's good to know, do you think you can explain what's going on with u"é".encode('utf-8') == str("é") ?

Alex W Over a year ago

The double equals calls the special method __eq__ to check for equality. Both objects are of <type 'str'>, so they are compared byte for byte, and they come out equal only when the source file is encoded as UTF-8.

roeland Over a year ago

@ThomasTu the meaning of "é" (a byte string) depends on the encoding of the file, e.g. if the file is encoded in windows-1252, "é" == "\xe9", while if the file is encoded as UTF-8, then "é" == "\xc3\xa9". u"é".encode('utf-8') unambiguously equals "\xc3\xa9".

roeland Over a year ago

similarly, and very important, u"é".encode('utf-8') is not the same as str("é").
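
The point about file encodings in the comments above can be checked directly. This is only a sketch: it encodes the same character with both codecs to show which bytes a byte-string literal "é" would hold under each file encoding. The variable names are illustrative.

```python
text = u"\xe9"                       # the character é, as text

as_utf8 = text.encode("utf-8")       # bytes in a UTF-8 encoded file
as_cp1252 = text.encode("cp1252")    # bytes in a windows-1252 encoded file

print(as_utf8 == b"\xc3\xa9")        # True
print(as_cp1252 == b"\xe9")          # True
print(as_utf8 == as_cp1252)          # False: same character, different bytes
```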
