I'm getting the classic error:

'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)

This time, I can't solve it. The error comes from this line:

mensaje_texto_inmobiliaria = "%s, con el email %s y el teléfono %s está se ha contactado con Inmobiliar" % (nombre, email, telefono)

Specifically, it comes from the word teléfono. I have tried adding # -*- coding: utf-8 -*- to the beginning of the file, wrapping the string in unicode( <string> ), and calling <string>.encode("utf-8"). Nothing worked. Any advice will help.

  • Try adding from __future__ import unicode_literals to the top of the file as well. Does that solve your issue? Commented Aug 17, 2016 at 22:32
  • What version of Python are you running? I have 2.7.12 and it works fine; it just interprets that character as an escape sequence in the string literal. Commented Aug 17, 2016 at 22:35
  • @ThomasTu Thanks man, ... That solves it. Why? Commented Aug 17, 2016 at 22:48

1 Answer

This is in response to why this solves the issue the OP is having, along with some background on the issue the OP is trying to describe:

from __future__ import unicode_literals
from builtins import str

In the default IPython 2.7 kernel:

(iPython session)

In [1]: type("é") # By default, quotes in py2 create py2 `str` objects, which are sequences of bytes that, given some encoding, can be decoded to characters.
Out[1]: str

In [2]: type("é".decode("utf-8")) # We can get to the actual text data by decoding it, if we know which encoding it was originally encoded in. utf-8 is a safe guess in almost every country but Myanmar.
Out[2]: unicode

In [3]: len("é") # Note that the py2 `str` representation has a length of 2. In utf-8, é (U+00E9) is encoded as the two bytes 0xC3 0xA9, which together stand for one character; 0xc3 is the byte named in the error message.
Out[3]: 2

In [4]: len("é".decode("utf-8")) # the py2 `unicode` representation has length 1, since an accented e is a single character
Out[4]: 1
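
The session above depends on the terminal sending é as UTF-8. As a minimal sketch of the same check that avoids that dependency, the snippet below spells the bytes out with explicit escapes. The variable names are illustrative, and the code is written so that it should run on both Python 2.7 and Python 3.

```python
# Explicit escapes keep the example independent of file or terminal encoding.
encoded = b"\xc3\xa9"              # the two UTF-8 bytes for "é" (py2 str / py3 bytes)
decoded = encoded.decode("utf-8")  # text: one code point, U+00E9 (py2 unicode / py3 str)

print(len(encoded))           # 2
print(len(decoded))           # 1
print(decoded == u"\u00e9")   # True
```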

Some other things of note in python 2.7:

  • "é" is the same thing as str("é")
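  • u"é" is the same thing as "é".decode('utf-8') or unicode("é", 'utf-8'), provided the source file is saved as utf-8
  • u"é".encode('utf-8') is the same thing as str("é") only when the source file is saved as utf-8. The bytes in "é" depend on the encoding of the file, while u"é".encode('utf-8') is always "\xc3\xa9".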
  • You typically call decode with a py2 str, and encode with py2 unicode.
    • Due to early design issues, you can call both methods on either type, even though that doesn't really make sense.
    • In python3, str, which is the same as python2 unicode, can no longer be decoded, since a string is by definition already a decoded sequence of bytes. Its encode method uses the utf-8 encoding by default.
  • Byte sequences that were encoded with the ascii codec behave almost exactly the same as their decoded counterparts.
    • In python 2.7 with no future imports: type("a".decode('ascii')) gives unicode, but the resulting object behaves nearly identically to str("a"). This is not the case in python3.
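
To connect these notes to the error in the question: when Python 2 combines a byte str with a unicode object, it decodes the bytes with the ascii codec behind the scenes, and that fails on any non-ascii byte. The sketch below makes that decode explicit so it behaves the same on Python 2.7 and Python 3. The variable names are illustrative, not taken from the question.

```python
text = u"tel\u00e9fono"          # text (py2 unicode / py3 str)
encoded = text.encode("utf-8")   # bytes (py2 str / py3 bytes)

# Decoding with the codec that was used for encoding round-trips cleanly.
assert encoded.decode("utf-8") == text

# Decoding the same bytes as ascii fails on the first byte of "é".
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
```

The position differs from the one in the question only because this sample string is shorter.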

With that said, here's what the snippets above do :

  • __future__ is a module maintained by the core python team that backports python3 functionality to python2 to allow you to use python3 idioms within python2.
  • from __future__ import unicode_literals has the following effect :
    • Without the future import "é" is the same thing as str("é")
    • With the future import "é" is functionally the same thing as u"é"
  • builtins is a module that, on python2, is provided by a third-party package, and contains safe aliases for using python3 idioms in python2 with the python3 api.
    • Due to reasons beyond me, the package itself is named "future", so to install the builtins module you run : pip install future
  • from builtins import str has the following effect :
    • the str constructor now gives what you think it does, i.e. text data in the form of python2 unicode objects. So it's functionally the same thing as str = unicode
    • Note : Python3 str is functionally the same as Python2 unicode
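    • Note : To get bytes, you can use the b prefix, e.g. b'\xc3\xa9'. In python3 a bytes literal may contain only ascii characters, so non-ascii bytes have to be written as escapes.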
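
Applied to the line from the question, the effect can be sketched as follows. This version uses explicit u prefixes rather than the future import, which marks each literal as text in the same way. The values for nombre, email and telefono are hypothetical sample data, and the file is assumed to be saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample values; in the question these come from elsewhere.
nombre = u"Ana"
email = u"ana@example.com"
telefono = u"555-0100"

# The template is a text literal, so no implicit ascii decode is needed
# when it is combined with the other text values.
mensaje_texto_inmobiliaria = (
    u"%s, con el email %s y el teléfono %s está se ha contactado con Inmobiliar"
    % (nombre, email, telefono)
)

print(mensaje_texto_inmobiliaria)
```

If the values arrive as byte strings on Python 2 (for example from a form or a file), they would still need to be decoded before being formatted into the template.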

The takeaway is this :

  1. Decode on read (decode early) and encode on write (encode at the end)
  2. In python2, use str objects for bytes and unicode objects for text
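
As a rough illustration of the first point, io.open accepts an encoding argument on both Python 2.7 and Python 3, so the decoding and encoding happen at the file boundary and the rest of the program only handles text. The file name and contents below are made up for the example.

```python
import io
import os
import tempfile

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), "mensaje.txt")

# Encode on write: the program hands over text, io.open produces the bytes.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"tel\u00e9fono\n")

# Decode on read: the program receives text, not bytes.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Reading in binary mode shows what is actually stored on disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(len(text.strip()))  # 8 characters
print(len(raw.strip()))   # 9 bytes, because "é" takes two bytes in UTF-8
```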

6 Comments

u"é".encode('utf-8') is the same thing as str("é") is not true
That's good to know, do you think you can explain what's going on with u"é".encode('utf-8') == str("é") ?
The double equals calls the special method __eq__ to check for equality. On those two objects it most likely is saying they are equal because they are both of <type 'str'>
@ThomasTu the meaning of "é" (a byte string) depends on the encoding of the file, e.g. if the file is encoded in windows-1252, "é" == "\xe9", while if the file is encoded as UTF-8, then "é" == "\xc3\xa9". u"é".encode('utf-8') unambiguously equals "\xc3\xa9".
similarly, and very important, u"é".encode('utf-8') is not the same as str("é").