3

A simple test program that shows an encoding issue:

#!/bin/env python
# -*- coding: utf-8 -*-
print u"Råbjerg"      # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'

Here is what I get when I run it from a Debian command line. I do not understand why redirecting the output breaks it, since it displays correctly without the redirect.

Can someone help me understand what I have missed? And what is the right way to print these characters so that they come out correctly everywhere?

$ python testu.py
Råbjerg

$ python testu.py > A
Traceback (most recent call last):
  File "testu.py", line 3, in <module>
    print u"Råbjerg"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 1: ordinal not in range(128)

This is on Debian GNU/Linux 6.0.7 (squeeze), configured with:

$ locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

EDIT: a workaround based on the similar questions pointed to below:

#!/bin/env python
# -*- coding: utf-8 -*-
import sys, locale
s = u"Råbjerg"      # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'
if sys.stdout.encoding is None: # when stdout is a pipe or a file, Python 2 seems to return None
    s = s.encode(locale.getpreferredencoding())
print s
  • OK, thanks, and sorry: it is indeed the same as the post you pointed to, and the explanation over there is interesting. For reference, import locale, sys ; print sys.stdout.encoding, locale.getpreferredencoding() helps to understand the pipe behaviour: the TTY encoding on a terminal versus None, which can default to ascii, when redirected. Commented Jul 3, 2013 at 12:33

3 Answers

5

When you redirect the output, sys.stdout is not connected to a terminal, so Python cannot determine the output encoding and falls back to ASCII. When the output is not redirected, Python detects that sys.stdout is a TTY and uses the codec configured for that TTY when printing unicode.
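
To see the difference for yourself, here is a minimal diagnostic sketch. The helper name describe_stdout is hypothetical (it is not part of any library), and the report goes to stderr so that it stays visible when stdout is redirected. On Python 2 the encoding is expected to be None under redirection; Python 3 generally fills it in from the locale, so the output there will differ.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import locale
import sys


def describe_stdout(stream=None):
    """Return (is_tty, encoding) for a stream, defaulting to sys.stdout.

    Hypothetical helper, for illustration only.
    """
    stream = sys.stdout if stream is None else stream
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    # On Python 2 this attribute is None when output is piped or redirected.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    is_tty, encoding = describe_stdout()
    # Write to stderr so the report survives "> A" style redirection.
    print("stdout is a TTY:", is_tty, file=sys.stderr)
    print("stdout encoding:", encoding, file=sys.stderr)
    print("preferred encoding:", locale.getpreferredencoding(), file=sys.stderr)
```

Run it once directly and once with "> A" to compare the two reports.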

Set the PYTHONIOENCODING environment variable to tell Python what encoding to use in such cases, or encode explicitly.
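
From the shell, the environment variable route looks like PYTHONIOENCODING=utf-8 python testu.py > A. The sketch below reproduces the same idea from Python so the effect can be checked: it starts a child interpreter with its stdout piped and PYTHONIOENCODING set, then inspects the raw bytes. The names CHILD_CODE and run_piped are hypothetical, chosen only for this illustration, and the child script is a stand-in for testu.py written so that it parses on both Python 2 and 3.

```python
import os
import subprocess
import sys

# Stand-in for testu.py: prints a non-ASCII string, valid on Python 2 and 3.
CHILD_CODE = 'print(u"R\\u00e5bjerg")'


def run_piped(io_encoding):
    """Run the child with stdout piped and PYTHONIOENCODING set; return raw bytes."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,  # a pipe, so the child's stdout is not a TTY
        env=env,
    )
    out, _ = proc.communicate()
    return out.strip()  # drop the trailing newline


if __name__ == "__main__":
    raw = run_piped("utf-8")
    print(repr(raw))
    # True when the child encoded its output as UTF-8.
    print(raw == u"R\u00e5bjerg".encode("utf-8"))
```

The variable only affects the process it is set for, so it is a per-invocation or per-environment choice rather than a change to the script itself.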

3

Use: print u"Råbjerg".encode('utf-8')

A similar question was asked today: Understanding Python Unicode and Linux terminal

2

I suggest outputting it already encoded:

print u"Råbjerg".encode('utf-8')

This writes the correct UTF-8 bytes of the string, and you will be able to view the result in almost any editor or console that supports UTF-8.
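
As a small illustration of what "already encoded" means, the sketch below checks the bytes that .encode('utf-8') produces for this string. An in-memory io.BytesIO buffer stands in for the redirected file; that substitution is an assumption made to keep the example self-contained.

```python
# -*- coding: utf-8 -*-
import io

text = u"R\u00e5bjerg"           # the same string as u"Råbjerg"
encoded = text.encode("utf-8")   # a byte string, no codec lookup needed at print time

# "å" (U+00E5) becomes the two bytes 0xC3 0xA5 in UTF-8.
assert encoded == b"R\xc3\xa5bjerg"
assert len(text) == 7 and len(encoded) == 8

# Round trip through a binary buffer standing in for the redirected file.
buf = io.BytesIO()
buf.write(encoded + b"\n")
assert buf.getvalue().decode("utf-8") == text + u"\n"
```

Because the bytes are fixed before they reach stdout, the result no longer depends on whether the output goes to a terminal, a pipe, or a file. The trade-off is that a terminal not configured for UTF-8 will display them incorrectly.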
