
Python source files often come with a coding header similar to the following:

# -*- coding: iso-8859-1 -*-

How can I use this line to properly parse the contents of such a file? Is there a better way than manually opening the file in binary mode, reading one line, and checking whether it contains the header? Is there a library that does this?
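For illustration, the manual approach in the question might look roughly like this. The regular expression is the one the Python language reference gives for encoding declarations (PEP 263), which may appear on line 1 or 2. `find_declared_encoding` is a hypothetical helper, and this is a simplified sketch: it ignores a UTF-8 BOM and does not enforce that line 2 only counts when line 1 is a comment.

```python
import re

# Encoding-declaration regex from the Python language reference (PEP 263),
# applied to raw bytes because the encoding is not known yet.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_declared_encoding(first_lines):
    """Return the declared encoding from the first two byte lines, or None.

    Hypothetical helper; the lines would come from open(filename, "rb").
    Simplified: no BOM handling, no check that line 1 is a comment.
    """
    for line in first_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


print(find_declared_encoding([b"# -*- coding: iso-8859-1 -*-\n"]))  # iso-8859-1
```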


Background: this comes up while fixing this bug, which crashes elpy when it is used together with Python 3 and importmagic. The code that I'm trying to fix uses

with open(filename) as fd:
    success = subtree.index_source(filename, fd.read())

and crashes on non-UTF-8 files. Ideally I would like to keep changes to a minimum.

  • "better way" is such an extremely relative thing that I'm tempted to ignore your question. What is bad about the way you're currently doing it? Commented Feb 11, 2015 at 16:45
  • @MarcusMüller - considering that Python supports several source encoding schemes, it is reasonable to assume that a Python library for reading such files already exists. There are several formats, 8- vs. 16-bit encodings, BOMs, etc.; it's not an obvious thing to do on your own. Commented Feb 11, 2015 at 16:55
  • Ah, but there's a PEP that already describes how this should be handled. Commented Feb 11, 2015 at 16:56
  • @tdelaney: I've added an answer based on your inspiration; thanks! Commented Feb 11, 2015 at 17:32

1 Answer


There is tokenize.open() that does exactly that: it opens a Python source file using the character encoding specified in the coding header (encoding declaration).
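A minimal sketch of the change to the snippet in the question, assuming Python 3.2 or later (where `tokenize.open()` was added): replace `open(filename)` with `tokenize.open(filename)`. The temporary Latin-1 file only makes the example self-contained. When there is no header or BOM, `tokenize.open()` falls back to UTF-8, the default source encoding in Python 3.

```python
import os
import tempfile
import tokenize

# Test fixture (not part of the fix): a Latin-1 file with a coding header.
source = "# -*- coding: iso-8859-1 -*-\nname = 'Müller'\n"
handle, filename = tempfile.mkstemp(suffix=".py")
with os.fdopen(handle, "wb") as f:
    f.write(source.encode("iso-8859-1"))

# The fix: tokenize.open() instead of open(). It detects the coding header
# (or a UTF-8 BOM), defaults to UTF-8, and returns a decoded text stream.
with tokenize.open(filename) as fd:
    detected = fd.encoding
    text = fd.read()
    # in the elpy code this would be: success = subtree.index_source(filename, fd.read())

os.remove(filename)
print(detected)        # iso-8859-1
print(text == source)  # True
```

Note that the underlying `tokenize.detect_encoding()` raises `SyntaxError` when the declared encoding is unknown, so code that indexes arbitrary files may want to catch it.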

You can also decode remote Python files on the fly.
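If the source is already in memory as bytes (for example, after downloading it), a sketch of the same decoding uses `tokenize.detect_encoding()`, which takes a `readline`-style callable and reads at most two lines. `decode_python_source` is a hypothetical helper name, and it decodes a complete `bytes` object rather than streaming.

```python
import io
import tokenize


def decode_python_source(data):
    """Decode raw Python source bytes using their coding header or BOM.

    Hypothetical helper: detect_encoding() returns "utf-8-sig" when a
    UTF-8 BOM is present, so decoding with it also strips the BOM.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return data.decode(encoding)


raw = "# -*- coding: iso-8859-1 -*-\nname = 'Müller'\n".encode("iso-8859-1")
print(decode_python_source(raw))
```

The standard library's `importlib.util.decode_source()` does essentially the same thing and additionally translates newlines to `\n`.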
