I have the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xd7' in position 31: ordinal not in range(128)

from this code:

test_string = """
Antelope Canyon, Arizona [1600×1068] </a>&#32; <span class="domain">(<a
"""

print(test_string)

Output of sys.getdefaultencoding():

In [6]: sys.getdefaultencoding()
Out[10]: 'utf-8'

I'm using a Chromebook with crouton - if that makes a difference (I've a feeling that it might).

I'm not sure if there's some way of 'forcing' the output of strings like this or just ignoring any chars that are problematic.

Your terminal, console, or redirect cannot handle UTF-8; what environment are you trying to print in?

I'm trying to run this using IPython within Spacemacs:

In [22]: sys.stdout.encoding
Out[27]: 'ANSI_X3.4-1968'

In the shell, what does the command locale output?

In the shell I'm running this in (IPython within Spacemacs), the command is undefined. In the default shell brought up with Ctrl+Alt+T, the output is:

$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
  • 3
    Ah, the × is the U+00D7 character. Commented Dec 7, 2015 at 10:09
  • 2
    What matters here is sys.stdout.encoding, which is set from the parent process (usually the shell). Commented Dec 7, 2015 at 10:11
  • 2
    In the shell, what does the command locale output? Commented Dec 7, 2015 at 10:12
  • 1
    @KevinGuan: exactly. But the OP is using a Chromebook.. Commented Dec 7, 2015 at 10:13
  • 1
    If you are using an Emacs environment, use (setenv "LC_CTYPE" "UTF-8") to set a locale. Commented Dec 7, 2015 at 10:26

2 Answers

2

On a POSIX host, Python determines the output encoding from the locale, a set of environment variables that communicate how the environment is configured for various language settings. See the locale.getdefaultlocale() function, or more specifically, the locale.getpreferredencoding() function.

The output of that function is used to set sys.stdout.encoding, which is then used to encode any Unicode text printed.
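
To see what Python actually derived from the environment, a small check like the following may help. It is only a sketch: the helper name describe_output_encoding is made up for illustration, and the values it reports depend entirely on how the process was started.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the encodings Python derived from the current environment.

    Hypothetical helper for illustration; not part of any library.
    """
    return {
        # Encoding implied by the locale (LC_ALL / LC_CTYPE / LANG).
        "preferred": locale.getpreferredencoding(False),
        # Encoding and error handler print() will use for this stream.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdout_errors": getattr(sys.stdout, "errors", None),
        # Always 'utf-8' on Python 3; unrelated to what print() uses.
        "default": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print(f"{name}: {value}")
```

This also shows why sys.getdefaultencoding() returning 'utf-8' in the question says nothing about the failure: the stream's own encoding is what counts.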

Your locale is set to POSIX, which means that the default encoding is ASCII. You'll need to configure that locale to use an encoding that supports all of Unicode. How to do this for Chromebooks, I don't know. On my Mac, the locale is set to en_US.UTF-8, mostly, so all of the Unicode standard is supported by my terminal. You could force the issue by setting export LC_CTYPE=en_US.UTF-8.

You can override Python's choices by setting the PYTHONIOENCODING environment variable.
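
As a rough illustration of the override, the sketch below starts a child Python process with different PYTHONIOENCODING values and captures the raw bytes it writes. The helper name run_child is hypothetical. The values use the encoding:errorhandler form that the variable accepts.

```python
import os
import subprocess
import sys


def run_child(extra_env):
    """Run a child Python that prints U+00D7; return (exit code, stdout bytes).

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        # The child sees: print('1600\xd71068'), i.e. "1600", U+00D7, "1068".
        [sys.executable, "-c", "print('1600\\xd71068')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    for value in ("utf-8", "ascii:backslashreplace", "ascii"):
        code, out = run_child({"PYTHONIOENCODING": value})
        print(value, "exit code:", code, "bytes:", out)
```

With utf-8 the character is written as two bytes, with ascii:backslashreplace it is written as an escape, and with plain ascii the child fails with the same UnicodeEncodeError as in the question.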

Note that the error handler matters too. sys.stderr always uses the backslashreplace error handler, which replaces any character your console can't handle with the standard \xhh, \uhhhh and \Uhhhhhhhh escape sequences. sys.stdout does not use it by default, but you can select it there as well, for example with PYTHONIOENCODING=ascii:backslashreplace; then instead of an exception you'd see:

Antelope Canyon, Arizona [1600\xd71068] </a>&#32; <span class="domain">(<a 
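
The same escaping can be reproduced without touching the environment by encoding the string yourself. This is a minimal sketch using a shortened version of the string from the question. It also shows the 'ignore' and 'replace' handlers, which cover the "just ignore the problematic characters" option the question asks about.

```python
text = "Antelope Canyon, Arizona [1600\u00d71068]"  # \u00d7 is the × sign

# The default (strict) handler raises, as in the question.
strict_failed = False
try:
    text.encode("ascii")
except UnicodeEncodeError:
    strict_failed = True

# backslashreplace keeps the information as an escape sequence.
escaped = text.encode("ascii", errors="backslashreplace").decode("ascii")

# ignore drops the character; replace substitutes a question mark.
ignored = text.encode("ascii", errors="ignore").decode("ascii")
replaced = text.encode("ascii", errors="replace").decode("ascii")

if __name__ == "__main__":
    print(strict_failed)
    print(escaped)
    print(ignored)
    print(replaced)
```

On Python 3.7 and later, sys.stdout.reconfigure(errors="backslashreplace") should apply the same handler to the stream itself, which avoids encoding each string by hand.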

1 Comment

I should be able to get it working from here, thanks
1

After a lot of searching, I found this. As it suggests, you could try the following:

  1. Edit the /etc/locale.gen file (create it first if it doesn't exist).
  2. Write the following text in it:

    en_GB.UTF-8 UTF-8
    LC_ALL="en_GB.UTF-8"
    
  3. Try rebooting the Chromebook.

And then check the locale command's output.
