Different interpretation of hex strings in Python

Asked 5 years, 8 months ago

Modified 5 years, 8 months ago

Viewed 169 times

For the past few days, I've been struggling to understand why this piece of code behaves the way it does:

code:

file1 = open("input.txt","r")
M = file1.read()
file1.close()
print(M)
print(M.encode("latin"))
print(type(M.encode("latin")))
print("\n-----------------------------\n")
t = "\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59"
print(t)
print(t.encode("latin"))
print(type(t.encode("latin")))

file "input.txt" content:

\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59

output:

\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59
b'\\xAC\\x42\\x4C\\x45\\x54\\x43\\x48\\x49\\x4E\\x47\\x4C\\x45\\x59'
<class 'bytes'>

-----------------------------

¬BLETCHINGLEY
b'\xacBLETCHINGLEY'
<class 'bytes'>

What I don't understand is why the same string is interpreted in two different ways depending on whether I read it from the file or copy it by hand into a variable. I know that the double "\" is probably the result of printing the bytes to the console, but I cannot understand what is happening.

edited Mar 26, 2020 at 22:24

David Buck

asked Mar 23, 2020 at 9:45

Marco Borinato

Here you are fundamentally confusing two things. In your text file, you have \xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59, so literally those characters. In Python source code, string literals treat the backslash + x combination as an escape sequence. Similarly, if you write hello\nworld in a text file, load it in Python and print it, you'll see hello\nworld on a single line, but if your source code contains print("hello\nworld"), you will see hello and then world on the next line.

juanpa.arrivillaga

2020-03-24 09:54:27 +00:00
Commented Mar 24, 2020 at 9:54
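
The distinction in the comment above can be checked without a file. This is only a minimal sketch: the names from_file and from_literal are made up for illustration, and a raw string (r"...") stands in for the text that file1.read() would return, since a raw string keeps every backslash as a literal character.

```python
# Stand-in for the contents of input.txt: a raw string keeps each
# backslash, 'x' and hex digit as separate, literal characters.
from_file = r"\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59"

# The same text typed as an ordinary string literal: the parser turns
# each \xNN escape into a single character before the program runs.
from_literal = "\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59"

print(len(from_file))                # 52: 13 groups of 4 characters
print(len(from_literal))             # 13: one character per escape
print(from_file[:4])                 # \xAC
print(hex(ord(from_literal[0])))     # 0xac
print(from_file == from_literal)     # False
```

The length difference is the quickest way to see that the file holds four characters where the literal holds one.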

IOW, these are two completely different strings. In one, you've used a string literal with escape sequences for particular Unicode characters, t = "\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59". In the other, M, you have a string which happens to contain that same source code text. But that won't make Python magically execute this string. In the same way, if you write [1,2,3] in a text file and load it the same way as M, then type(M) will be str, not magically list, because strings are not source code. You would need to use eval.

juanpa.arrivillaga

2020-03-24 09:56:59 +00:00
Commented Mar 24, 2020 at 9:56
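
If the goal is to turn the file's text into the same string as t, one possible approach (not taken from the thread) is the standard unicode_escape codec; for the list case, ast.literal_eval is a narrower alternative to eval that only accepts Python literals. The sketch below assumes the file content is pure ASCII and may end with a newline; M is simulated with a raw string instead of being read from input.txt, and the names decoded, text and parsed are illustrative.

```python
import ast

# Hypothetical stand-in for file1.read(): the raw text plus a trailing newline.
M = r"\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59" + "\n"

# Interpret the \xNN escapes the way a string literal would.
# The text is pure ASCII, so encoding it to bytes first is lossless.
decoded = M.strip().encode("ascii").decode("unicode_escape")

t = "\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59"
print(decoded == t)                 # True
print(decoded.encode("latin-1"))    # b'\xacBLETCHINGLEY'

# The list case from the comment: literal_eval parses Python literals
# only, so it does not run arbitrary code the way eval would.
text = "[1,2,3]"
print(type(text).__name__)          # str
parsed = ast.literal_eval(text)
print(type(parsed).__name__)        # list
```

Both calls only make sense when the file is trusted to contain escape sequences or literals; ordinary text containing backslashes would be altered by the unicode_escape step.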
Solved it! Thanks again, I needed to understand the basics and your comment was very clear! Much appreciated.

Marco Borinato

2020-03-25 17:38:36 +00:00
Commented Mar 25, 2020 at 17:38
