I'm currently working on a project where I need to run a command in PowerShell, and part of the output is not in English (specifically, Hebrew).

For example (a simplified version of the problem), suppose I want to list the contents of my desktop, and one of the file names is in Hebrew:

import subprocess
command = "powershell.exe ls ~/Desktop"
print (subprocess.run(command.split(), stdout=subprocess.PIPE).stdout.decode())

This code raises the following error (or something similar with a different byte value):

UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

I tried running it on a different computer, and this was the output:

?????

Any idea why that is and how I can fix it? I tried a lot of things I saw in other questions, but none of them worked for me.

  • Try to use decode() with an encoding parameter, for example decode(encoding="latin1"). Commented May 31, 2021 at 13:01
  • Output character encoding is dependent on your system/OS/shell settings. If you get the UnicodeDecodeError, it means that the captured output is NOT valid UTF-8. You might be able to fetch the encoding with locale.getpreferredencoding() and pass that as the parameter to decode(), as @Marino pointed out above. Commented May 31, 2021 at 13:12
  • @Marino Latin-1 doesn't support Hebrew. Decoding will succeed (because any byte sequence can be decoded with Latin-1), but the result will probably be garbage. Commented May 31, 2021 at 13:12
  • Thank you for your comments. Unfortunately, none of them worked :( I think the command output in Python is literally the ? character; I'm not really sure why. Commented May 31, 2021 at 13:40
  • Can you give some example file names that you are having issues with? Commented May 31, 2021 at 14:25
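
To make the two symptoms from the question and the Latin-1 point from the comments concrete, here is a minimal sketch. It assumes (this is not confirmed by the question) that the first machine's console used the Hebrew OEM code page 862 and the second one a code page without Hebrew letters, such as 437; the sample string is a hypothetical stand-in for a Hebrew file name.

```python
# Hypothetical stand-in for a Hebrew file name ("shalom"), written with
# escapes so the source stays readable in left-to-right editors.
HEBREW = "\u05e9\u05dc\u05d5\u05dd"

# Assumption: the console on the first machine emits OEM code page 862 bytes.
oem_bytes = HEBREW.encode("cp862")


def try_utf8(data):
    """Decode bytes as UTF-8, returning the error reason instead of raising."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


# Symptom 1: code page 862 bytes are not valid UTF-8.
utf8_result = try_utf8(oem_bytes)

# Latin-1 never fails, but it maps these bytes to unrelated characters.
latin1_result = oem_bytes.decode("latin-1")

# Decoding with the code page that produced the bytes recovers the text.
round_trip = oem_bytes.decode("cp862")

# Symptom 2 (assumption: code page 437 on the second machine): characters
# the code page cannot represent are replaced by literal question marks.
lossy = HEBREW.encode("cp437", errors="replace")

print(utf8_result)
print(round_trip == HEBREW, latin1_result == HEBREW)
print(lossy)
```

Once the text has been reduced to question marks on the producing side, no choice of decoder in Python can bring it back, which is why the fix has to change the encoding PowerShell emits rather than only the decode() call.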

1 Answer

Note: The following are Python 3+ solutions, but there is a caveat:

  • With the first solution below, and with the second one if UTF-8 data must also be sent to PowerShell's stdin stream, a bug in powershell.exe (the Windows PowerShell CLI) switches the current console window to a raster font (potentially with a different font size), which cannot display most Unicode characters outside the extended ASCII range. While visually jarring, this is merely a display (rendering) problem: the data is handled correctly, and switching back to a Unicode-aware font such as Consolas reveals the correct output.

  • By contrast, pwsh.exe, the PowerShell (Core) (v6+) CLI, does not exhibit this problem.


Option A: Configure both the console and Python to use UTF-8 character encoding before executing your script:

  • Configure the console to use UTF-8:

    • From cmd.exe, by switching the active OEM code page to 65001 (UTF-8); note that this change potentially affects all later calls to console applications in the session, independently of Python, unless you restore the original code page (see Option B below):

      chcp 65001
      
    • From PowerShell:

      $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
      
  • And configure Python (v3+) to use UTF-8 consistently:[1]

    • Set environment variable PYTHONUTF8 to 1, possibly persistently, via the registry; to do it ad hoc:

      • From cmd.exe:

        Set PYTHONUTF8=1
        
      • From PowerShell:

        $env:PYTHONUTF8=1
        
    • Alternatively, for an individual call (v3.7+): Pass command-line option -X utf8 to the python interpreter (note: case matters):

        python -X utf8 somefile.py ...
      
    • Both options enable Python UTF-8 Mode, which will become the default in Python 3.15.

Now, your original code should work as-is (except for the display bug).
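
To confirm that either switch really enables UTF-8 Mode, the following sketch starts a child interpreter and reads back its sys.flags.utf8_mode value. The helper name child_utf8_mode is hypothetical; sys.flags.utf8_mode, -X utf8 and PYTHONUTF8 are the standard Python 3.7+ mechanisms.

```python
import os
import subprocess
import sys


def child_utf8_mode(extra_args=(), extra_env=None):
    """Start a child interpreter and return its sys.flags.utf8_mode value.

    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ)
    # Drop any inherited setting so only the arguments below decide the mode.
    env.pop("PYTHONUTF8", None)
    if extra_env:
        env.update(extra_env)
    cmd = [sys.executable, *extra_args, "-c",
           "import sys; print(sys.flags.utf8_mode)"]
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return int(result.stdout.strip())


# Enabled per call via the command-line option ...
via_option = child_utf8_mode(extra_args=["-X", "utf8"])
# ... or via the environment variable.
via_env = child_utf8_mode(extra_env={"PYTHONUTF8": "1"})

print(via_option, via_env)
```

The value without either switch is deliberately not checked here, because it depends on the platform, the locale and the Python version.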

Note:

  • A simpler alternative via a one-time configuration step is to configure your system to use UTF-8 system-wide, in which case both the OEM and the ANSI code pages are set to 65001. However, this has far-reaching consequences - see this answer.

Option B: (Temporarily) switch to UTF-8 for the PowerShell call:

import subprocess
import sys
from ctypes import windll

# Switch Python's own standard streams to UTF-8, if necessary (requires Python 3.7+).
# For the standard streams, this is the in-script equivalent of setting
# environment variable PYTHONUTF8 to 1 *before* calling the script.
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
sys.stderr.reconfigure(encoding='utf-8')

# Save the current console output code page and switch to 65001 (UTF-8)
previousCp = windll.kernel32.GetConsoleOutputCP()
windll.kernel32.SetConsoleOutputCP(65001)

# PowerShell now emits UTF-8-encoded output; decode it as such.
command = "powershell.exe ls ~/Desktop"
print(subprocess.run(command, stdout=subprocess.PIPE).stdout.decode('utf-8'))

# Restore the previous console output code page.
windll.kernel32.SetConsoleOutputCP(previousCp)

Note:

  • Because only the console output code page is changed, the Windows PowerShell display bug is avoided.
  • If you also wanted to send input to PowerShell's stdin stream, you'd have to set the console input code page too, via windll.kernel32.SetConsoleCP(65001) (which would then again surface the display bug).
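
If the PowerShell call can raise an exception, the save/switch/restore steps of Option B can be wrapped in a context manager so that the original code page is restored in every case. This is only a sketch: the names console_output_utf8 and run_powershell are hypothetical, and on non-Windows platforms the context manager deliberately does nothing.

```python
import contextlib
import subprocess
import sys

UTF8_CODE_PAGE = 65001


@contextlib.contextmanager
def console_output_utf8():
    """Temporarily switch the console output code page to UTF-8 (Windows only)."""
    if sys.platform != "win32":
        # Nothing to switch outside Windows; act as a no-op.
        yield
        return
    from ctypes import windll
    kernel32 = windll.kernel32
    previous = kernel32.GetConsoleOutputCP()  # returns 0 if no console is attached
    kernel32.SetConsoleOutputCP(UTF8_CODE_PAGE)
    try:
        yield
    finally:
        # Restore the saved code page even if the body raised.
        if previous:
            kernel32.SetConsoleOutputCP(previous)


def run_powershell(command):
    """Run a Windows PowerShell command and return its output decoded as UTF-8."""
    with console_output_utf8():
        completed = subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", command],
            stdout=subprocess.PIPE,
        )
    return completed.stdout.decode("utf-8")
```

On Windows, print(run_powershell("ls ~/Desktop")) would then correspond to the call in the script above; Python's own streams still need to be reconfigured separately if the output is passed on.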

[1] This isn't strictly necessary just for correctly decoding PowerShell's output, but matters if you want to pass that output on from Python: Python 3.x defaults to the active ANSI(!) code page for encoding non-console output, which means that Hebrew characters, for instance, cannot be represented in non-console output (e.g., when redirecting to a file), and cause the script to break.
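
The failure described in the footnote can be reproduced without a console. In this sketch, cp1252 stands in for the active ANSI code page of a Western-locale system (an assumption; the actual code page depends on the machine), and an in-memory stream stands in for redirected output.

```python
import io

# Hypothetical stand-in for a Hebrew file name ("shalom").
HEBREW = "\u05e9\u05dc\u05d5\u05dd"


def write_as(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


# Assumption: cp1252 plays the role of the ANSI code page used for
# non-console output; it has no Hebrew letters, so writing fails.
try:
    write_as(HEBREW, "cp1252")
    ansi_outcome = "ok"
except UnicodeEncodeError as exc:
    ansi_outcome = f"UnicodeEncodeError: {exc.reason}"

# With UTF-8, as used in UTF-8 Mode, the same write succeeds.
utf8_bytes = write_as(HEBREW, "utf-8")

print(ansi_outcome)
print(len(utf8_bytes))
```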
