I have a file named "SSE-Künden, SSE-Händler.pdf", which contains two Unicode characters (ü, ä). When I print this file name in the Python interpreter, the Unicode characters are converted into what I guess are their ASCII values, 'SSE-K\x81nden, SSE-H\x84ndler.pdf', but I want to see the original name.

The test directory contains the PDF file named 'SSE-Künden, SSE-Händler.pdf'.

I tried this:

import os
path = r'C:\test'
for a,b,c in os.walk(path):
    print c

['SSE-K\x81nden, SSE-H\x84ndler.pdf']

How do I convert these escaped characters back to their respective Unicode values? I want to show the original name ("SSE-Künden, SSE-Händler.pdf") in the interpreter and also write it to a file as it is. How do I achieve this? I am using Python 2.6 on Windows.

Thanks.

  • Is your terminal session's character encoding set to UTF-8? Commented Sep 22, 2011 at 6:57
  • Sorry, but how do I verify that? Commented Sep 22, 2011 at 6:59
  • If you're using Ubuntu, Terminal (from the menu) --> Set Character Encoding Commented Sep 22, 2011 at 7:00

3 Answers

Assuming your terminal supports displaying the characters, iterate over the list of files and print them individually (or use Python 3, which displays Unicode in lists):

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk(u'.'):
...  for n in f:
...   print n
...
SSE-Künden, SSE-Händler.pdf

Also note I used a Unicode string (u'.') for the path. This instructs os.walk to return Unicode strings as opposed to byte strings. When dealing with non-ASCII filenames this is a good idea.
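
A minimal sketch of the same idea, written for Python 3 so that it runs on a current interpreter (the helper name `walk_names` is hypothetical, not part of the answer above). There, a text path makes os.walk return text names and a bytes path makes it return bytes names; in Python 2 the corresponding pair is u'.' and '.'.

```python
import os

def walk_names(root):
    """Collect every file name under root; names have the same type as root."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return names
```

Calling `walk_names('.')` should give text (Unicode) names, while `walk_names(b'.')` should give bytes names that still need decoding before display.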

In Python 3 strings are Unicode by default and non-ASCII characters are displayed to the user instead of displayed as escape codes:

Python 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk('.'):
...  print(f)
...
['SSE-Künden, SSE-Händler.pdf']

9 Comments

Sorry, I didn't mention it before: I am using Python 2.6 on Windows, with IPython.
His question is how to display the unicode characters in their native form (non-byte format)
+1 Using a unicode path does indeed work, interesting and non-obvious.
No, I tried it on Python 2.6.7 and I am getting the following error: UnicodeEncodeError: 'charmap' codec can't encode character u'\x81' in position 22: character maps to <undefined>
@Shashi, interesting. Your filename is a Unicode string but contains the cp437 (US Windows console encoding) character value for ü. Was this file originally created on Windows? I created the file for the example above and the Unicode characters for ü and ä are \xfc and \xe4.
for a,b,c in os.walk(path):
    for n in c:
        print n.decode('utf-8')

3 Comments

+1: This should work if his terminal session is set to display unicode.
To set the windows terminal to unicode see stackoverflow.com/questions/5419/…
This won't work if the file system doesn't use UTF-8, such as Windows.
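
To illustrate the last comment, here is a small sketch in Python 3 syntax. It assumes the bytes shown in the question are in cp437, the console code page named in the comments on the first answer; the helper name `decode_name` is hypothetical. Decoding with that code page recovers the name, while decoding the same bytes as UTF-8 fails.

```python
# Byte string exactly as printed in the question.
raw = b"SSE-K\x81nden, SSE-H\x84ndler.pdf"

def decode_name(raw_name, encoding="cp437"):
    """Decode a file name from bytes; the cp437 default is an assumption."""
    return raw_name.decode(encoding)

name = decode_name(raw)

# The same bytes are not valid UTF-8, so decoding them that way raises.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The right encoding depends on the machine, so cp437 should be treated as a placeholder for whatever code page actually produced the bytes.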
For writing to a file: http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
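
As a rough sketch of what that page describes (the function name `write_names` and the choice of UTF-8 for the output file are assumptions): open the file with an explicit encoding and write Unicode strings to it. io.open is available from Python 2.6 on, where it expects unicode strings; in Python 3 it is the same function as the built-in open.

```python
import io

def write_names(names, out_path):
    """Write one Unicode file name per line to out_path, encoded as UTF-8."""
    with io.open(out_path, "w", encoding="utf-8") as fh:
        for name in names:
            fh.write(name + u"\n")
```

For example, `write_names([u'SSE-Künden, SSE-Händler.pdf'], 'names.txt')` would store the name with ü and ä intact, provided the file is later read back as UTF-8.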
