I am reading data from an Apache log file in which some of the text is encoded, like this line:
192.168.1.17 - - [04/Aug/2016:18:45:00 +0800] "GET /d/?q=\xa9\xfa\xa4\xd1\xb7|\xa7\xf3\xa6n HTTP/1.1" 302 3734 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
I want to decode '\xa9\xfa\xa4\xd1\xb7|\xa7\xf3\xa6n'.
In Python 2, I use this code:
print(line.decode('string-escape').decode('big5'))
The result:
明天會更好
But I can't work out the equivalent code in Python 3.
I tried this code:
with open('access.log', 'r') as f:
    line = f.read()
print(bytes(line, 'latin-1').decode('big5'))
The result:
\xa9\xfa\xa4\xd1\xb7|\xa7\xf3\xa6n
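A minimal sketch of why this attempt prints the escapes unchanged. It uses a hypothetical in-memory copy of the query text in place of access.log, assuming the file stores each escape as the literal characters backslash, 'x' and two hex digits:

```python
# Hypothetical stand-in for the text read from access.log in text mode.
# The r'' prefix keeps each backslash literal, as it is in the file.
line = r'\xa9\xfa\xa4\xd1\xb7|\xa7\xf3\xa6n'

# Each \xNN is four separate characters, not one byte.
print(len(line))   # 34
print(line[:4])    # \xa9

# Encoding to latin-1 yields plain ASCII bytes, and ASCII decodes to itself
# in Big5, so the "decoded" text is identical to the escaped input.
raw = bytes(line, 'latin-1')
print(raw.decode('big5') == line)   # True
```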
I also tried this code:
with open('access.log', 'rb') as f:
    line = f.read()
print(line.decode('big5'))
The result:
\xa9\xfa\xa4\xd1\xb7|\xa7\xf3\xa6n
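It seems that when Python 3 reads the file, each '\x' stays a literal backslash followed by 'x' (shown as '\\x' in a repr), so the escape sequences are never turned back into bytes. Can someone help me solve this problem? Thank you.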
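One possible Python 3 counterpart to the Python 2 'string-escape' chain is sketched below, assuming (as in the sample line) that each non-ASCII byte was logged as a \xNN escape and the rest of the line is ASCII. It reads bytes, lets the 'unicode_escape' codec turn each \xNN into the code point U+00NN, maps those back to single bytes with latin-1, and only then decodes as Big5. The helper name decode_escaped_big5 is hypothetical.

```python
def decode_escaped_big5(raw: bytes) -> str:
    """Hypothetical helper: expand literal \\xNN escapes, then decode as Big5."""
    # 'unicode_escape' turns the four bytes \xNN into the single code point U+00NN
    # (and also handles \" and \\); other ASCII bytes pass through unchanged.
    unescaped = raw.decode('unicode_escape')
    # latin-1 maps U+0000..U+00FF one-to-one back to bytes 0x00..0xFF,
    # recovering the original Big5 byte sequence.
    original_bytes = unescaped.encode('latin-1')
    return original_bytes.decode('big5')


# The query string exactly as it appears in the log (rb'' keeps the
# backslashes literal, like bytes read with open(..., 'rb')).
sample = rb'\xa9\xfa\xa4\xd1\xb7|\xa7\xf3\xa6n'
result = decode_escaped_big5(sample)
print(result)   # 明天會更好
```

To process the real file, open it with open('access.log', 'rb') and pass each line to decode_escaped_big5. If some requests were sent in another encoding such as UTF-8, decode('big5') will raise UnicodeDecodeError for those lines; passing errors='replace' to that call is one way to keep going.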