Is there a way to find out which encoding was used for a byte string using the codecs module in Python? chardet has chardet.detect(string)['encoding']. Is there a similar method in codecs?
- Why not use chardet? – Thaer A, May 2, 2020 at 8:08
- If there was such a method in the standard library, chardet would most probably not exist. – MaxNoe, May 2, 2020 at 8:29
- Does this answer your question? "How to detect string byte encoding?" – Joe, May 2, 2020 at 9:15
1 Answer
There isn't a built-in method, because it wouldn't be possible to reliably determine this for arbitrary values and arbitrary encodings. (For example, any text containing only ASCII characters is valid in most other encodings.)
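The best you could do is a series of try/except blocks in which you try a list of candidate encodings (e.g. UTF-8, UTF-16) and move on to the next one whenever decoding raises an error. Keep in mind that a successful decode only shows the bytes are valid in that encoding, not that it is the encoding that was actually used.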
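The try-and-fall-through approach could be sketched roughly as follows. This is only an illustration: the function name guess_encoding and the default candidate list are made up for this example and are not part of codecs or any other library.

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "utf-16")):
    """Return the first candidate that decodes `data` without error, else None.

    Hypothetical helper: the name and the default candidates are assumptions
    chosen for illustration, not a standard-library API.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            # Bytes are not valid in this encoding; try the next candidate.
            continue
        return encoding
    # None of the candidates could decode the data.
    return None


if __name__ == "__main__":
    print(guess_encoding("héllo".encode("utf-8")))
```

The order of the candidates matters: stricter encodings should come first, because a permissive one such as latin-1 accepts any byte sequence and would therefore always "match". The result is a guess, not a detection.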