All my Python source code is encoded in UTF-8 and has this coding declared at the top of the file.
But sometimes the u prefix before a unicode string is missing.
Example: Umlauts = "üöä"
The above is a bytestring containing non-ASCII characters, and this causes trouble (UnicodeDecodeError).
I tried pylint and python -3, but neither produced a warning.
I am looking for an automated way to find non-ASCII characters in bytestrings.
My source code needs to support Python 2.6 and Python 2.7.
I get this well-known error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)
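For illustration: in Python 2, mixing a bytestring with a unicode string makes the interpreter decode the bytes implicitly with the ASCII codec. The sketch below does that decode explicitly, so the same message can be reproduced in isolation. The sample bytes and the name `raw` are made up for this example.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: the UTF-8 encoding of "ü" is the two bytes 0xc3 0xbc,
# placed here so that 0xc3 sits at index 7.
raw = b"Umlaut \xc3\xbc"

try:
    # This is what Python 2 does behind the scenes when bytes meet unicode.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```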
BTW: This question is only about Python source code, not about strings read from files or sockets.
Solution
- for projects which need to support Python 2.6+ I will use __future__.unicode_literals
- for projects which need to support 2.5 I will use the solution from thg435 (module ast)
... u in front of them is not going to solve your problem. This error appears whenever you do something with your data (like printing) where the accepting function doesn't expect characters encoded that way. You need to make sure that all strings in your program are handled as Unicode as early and for as long as possible, and only encoded to specific, matching encodings when exporting, printing, etc.
... __future__.unicode_literals. Second: to find those, I would probably try using grep, like in this example. Of course, this will also find those characters outside of bytestrings, but I assume there aren't many variables with umlaut names, are there?
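As a narrower alternative to grep, here is a minimal sketch that looks only at string literals, using the standard tokenize module. It is neither the grep approach nor the ast-based solution mentioned above. The function name `find_non_ascii_bytestrings` is hypothetical. It assumes the source has already been decoded to text, and it does not check whether the file uses `from __future__ import unicode_literals`.

```python
# -*- coding: utf-8 -*-
import io
import tokenize


def find_non_ascii_bytestrings(source):
    """Return (line_number, literal) pairs for string literals that have
    no u prefix but contain non-ASCII characters.

    `source` is the decoded text of a Python file (an assumption of this sketch).
    """
    hits = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        tok_type, text, start = tok[0], tok[1], tok[2]
        if tok_type != tokenize.STRING:
            continue  # comments, names and operators are ignored
        # Everything before the opening quote is the literal's prefix (u, r, b ...).
        prefix = text[:len(text) - len(text.lstrip("uUbBrR"))].lower()
        if "u" not in prefix and any(ord(ch) > 127 for ch in text):
            hits.append((start[0], text))
    return hits


if __name__ == "__main__":
    sample = 'Umlauts = "üöä"\nok = u"üöä"\n# a comment with ü is not reported\n'
    for line_number, literal in find_non_ascii_bytestrings(sample):
        print("line %d: %s" % (line_number, literal))
```

Unlike a plain grep, this skips comments and literals that already carry the u prefix, so only the first line of the sample is reported.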