What is the default encoding method for code assumed by Python interpreter?

Question

Some people use the following to declare the encoding method for the text of their Python source code:

# -*- coding: utf-8 -*-

Back in 2001, the default encoding that the Python interpreter assumed was reportedly ASCII. I have worked with strings containing non-ASCII characters in my Python code without declaring an encoding for my source, and I don't remember ever running into an encoding error. What default encoding does the Python interpreter assume for source code now?

I am not sure whether this is relevant: my OS is Ubuntu, I am using the default Python interpreter, and I edit with gedit or emacs. Does the default encoding assumed by the Python interpreter change if any of these change?

Thanks.

Python 2 still uses ASCII as the default encoding. It only changed to UTF-8 with Python 3, and Arch Linux is the only distro where Python 3 is the default python.
– Wooble, Commented Aug 8, 2014 at 14:09
What exactly do you mean by "I have dealt with strings using non-ASCII characters in my Python code, without declaring encoding method of my code, and I don't remember I have bumped into encoding error before"? Unless you were working with Python 3, this is not possible, assuming you actually had non-ASCII characters in your source code.
– Lukas Graf, Commented Aug 8, 2014 at 14:29
@lukas: I remember writing a script that read a file containing non-ASCII characters and then wrote them to another file, without declaring any encoding. It all worked.
– Tim, Commented Aug 8, 2014 at 14:41
@Tim: That is something totally different. That's your program dealing with non-ASCII characters in strings, as part of processed data. But the source code encoding declaration affects what encoding your source code will be interpreted with, so it's only needed if you decide to directly put non-ASCII characters in your source code.
– Lukas Graf, Commented Aug 8, 2014 at 14:44
@Lukas: When my script reads a file, doesn't the file content become a string? I also did some regex matching to modify the string before writing it back to another file. What encoding does the Python interpreter use to interpret the string's content?
– Tim, Commented Aug 8, 2014 at 14:47

Lukas Graf · Accepted Answer · 2014-08-08 15:03:10Z

Without any explicit encoding declaration, the assumed encoding for your source code will be

ascii for Python 2.x
utf-8 for Python 3.x

See PEP 0263 and Using source code encoding for Python 2.x, and PEP 3120 for the new default of utf-8 for Python 3.x.

So the default encoding assumed for source code depends directly on the version of the Python interpreter, and it is not configurable.
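To check which encoding would be assumed for a given piece of source, Python 3's standard library exposes the detection logic as tokenize.detect_encoding. The following is a minimal sketch, assuming Python 3; source_encoding and the two sample byte strings are hypothetical names chosen for illustration.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding Python 3 would assume for these source bytes."""
    # detect_encoding looks for a BOM or a PEP 263 coding comment in the
    # first two lines and falls back to UTF-8 when neither is present.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


no_declaration = b"print('hello')\n"
with_declaration = b"# -*- coding: latin-1 -*-\nprint('hello')\n"

print(source_encoding(no_declaration))
print(source_encoding(with_declaration))
```

On Python 3 the first call should report utf-8, and the second should report the normalized name of the declared encoding (iso-8859-1). Python 2 has no equivalent helper in its tokenize module, so this sketch does not apply there.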

Note that the source code encoding is something entirely different than dealing with non-ASCII characters as part of your data in strings.

There are two distinct cases where you may encounter non-ASCII characters:

As part of your program's data, at runtime
As part of your source code (and since Python 2 doesn't allow non-ASCII characters in identifiers, that usually means hard-coded string data or comments in your source code).

The source code encoding declaration affects what encoding your source code will be interpreted with - so it's only needed if you decide to directly put non-ASCII characters in your source code.

So, the following code will eventually have to deal with the fact that there might be non-ASCII characters in data.txt:

with open('data.txt') as f:
    for line in f:
        pass  # do something with `line`

But it doesn't contain any non-ASCII characters in the source code, so it doesn't need an encoding declaration at the top of the file. It will, however, need to properly decode line if it wants to turn it into unicode. Simply calling unicode(line) will use the system default encoding, which is ascii (this is a separate setting from the default source encoding, although that also happens to be ascii). So to explicitly decode the string using utf-8, you'd need to call line.decode('utf-8').
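One way to make the decoding step explicit is to let the file object do it by passing an encoding to io.open. This is a minimal sketch rather than the answer's own code: read_lines is a hypothetical helper, and the sample data.txt is created in a temporary directory only so the example is self-contained. It assumes the file really is UTF-8.

```python
import io
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file, decoding every line with an explicit encoding."""
    with io.open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Create a small sample file containing the UTF-8 bytes for "Bär".
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "data.txt")
with io.open(sample_path, "wb") as f:
    f.write(b"B\xc3\xa4r\n")

lines = read_lines(sample_path)
print(len(lines[0]))  # number of characters, not bytes
```

Note that the source above contains only ASCII characters (the non-ASCII data is written as escaped bytes), so it needs no encoding declaration even though it processes non-ASCII data.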

This code however does contain non-ASCII characters directly in its source code:

TEST_DATA = 'Bär'    # <--- non-ASCII character on this line
print TEST_DATA

And it will fail with a SyntaxError similar to this, unless you declare an explicit source code encoding:

SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details

So assuming your text editor is configured to save files in utf-8, you'd need to put the line

# -*- coding: utf-8 -*-

at the top of the file for Python to interpret the source code correctly.
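The effect of the declaration can be reproduced without touching editor settings by compiling raw source bytes. The sketch below assumes Python 3, where the undeclared default is UTF-8 rather than ASCII, so it uses Latin-1 bytes to trigger the failure. run_source and the variable names are hypothetical.

```python
def run_source(source_bytes):
    """Compile and execute raw source bytes; return the resulting namespace."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace


# "Bär" as saved by an editor using Latin-1: the byte 0xE4 is not valid UTF-8.
latin1_without_declaration = b"TEST_DATA = 'B\xe4r'\n"
latin1_with_declaration = b"# -*- coding: latin-1 -*-\n" + latin1_without_declaration

try:
    run_source(latin1_without_declaration)
    undeclared_ok = True
except (SyntaxError, UnicodeError):
    # Without a declaration the bytes are treated as UTF-8 and rejected.
    undeclared_ok = False

declared_value = run_source(latin1_with_declaration)["TEST_DATA"]
print(undeclared_ok, len(declared_value))
```

The same source bytes are rejected without the declaration and accepted with it, which is the Python 3 analogue of the Python 2 SyntaxError shown above.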

My advice, however, would be to generally avoid putting non-ASCII characters in your source code, precisely because whether they are written and read correctly depends on your and your co-workers' editor and terminal settings.

Instead you can use escaped strings to safely enter non-ASCII characters in your code:

TEST_DATA = 'B\xc3\xa4r'
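In Python 2 the literal above is a byte string holding the UTF-8 encoding of "Bär". As a hedged sketch of the same idea in Python 3 (the variable names are illustrative), the bytes and text forms have to be written differently, but both stay pure ASCII in the source file:

```python
# UTF-8 bytes for "Bär", written with byte escapes only.
utf8_bytes = b"B\xc3\xa4r"

# The same text as a str, written with a Unicode escape.
escaped_text = "B\u00e4r"

# Decoding the bytes explicitly yields the same text.
decoded_text = utf8_bytes.decode("utf-8")

print(len(utf8_bytes), len(decoded_text))
print(decoded_text == escaped_text)
```

The byte string is four bytes long while the decoded text is three characters, which is the bytes-versus-text distinction the answer draws between source encoding and runtime data.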

The byte literal ambiguity is fixed in Python 3: b'ä' (non-ASCII) now causes SyntaxError: bytes can only contain ASCII literal characters.
Python 2 documentation states: "By default, Python source files are treated as encoded in UTF-8." But as you said ASCII is the default on Python 2, so this is a documentation error.

Kasravnd · Accepted Answer · 2014-08-08 14:16:46Z

By default, Python source files are treated as encoded in UTF-8. In that encoding, characters of most languages in the world can be used simultaneously in string literals, identifiers and comments, although the standard library only uses ASCII characters for identifiers, a convention that any portable code should follow. To display all these characters properly, the editor must recognize that the file is UTF-8, and it must use a font that supports all the characters in the file.

It is also possible to specify a different encoding for source files. To do this, put the following special comment at the top of the file, replacing encoding with the name of the encoding:

# -*- coding: encoding -*-

https://docs.python.org/dev/tutorial/interpreter.html

3 Comments

Tim Over a year ago

Thanks. "By default, Python source files are treated as encoded in UTF-8." Is this how any/most standard Python interpreters treat source files?

Lukas Graf Over a year ago

What you state is only true for Python 3. For Python 2, which is still in wide-spread use, the default encoding assumed is ASCII.

Kasravnd Over a year ago

@Tim As for interpreters, as Lukas says, that applies to Python 3. Also, some editors don't support UTF-8.
