
I'm trying to read a file's contents and convert them into what is actually stored in memory. If I write:

with open("filename", "rb") as file:   # "rb" = read raw bytes; the file is closed automatically
    binary = "0b"
    for i in file.read():              # each i is an int, one per byte
        binary += bin(i)[2:]

Will binary equal the actual value stored in memory? If so, how can I convert it back into a string?

EDIT: I tried

with open("filename.txt", "rb") as file:
    binary = ""
    for i in file.read():
        binary += bin(i)[2:]

stored = ""
for bit in binary:
    stored += bit
    if len(stored) == 7:
        print(chr(int(stored, 2)), end="")  # int(s, 2) parses a binary string; no eval needed
        stored = ""

It worked fine until it reached a space; after that, the output turned into strange symbols and mixed-up letters.

  • It's not really clear what you're trying to do. file.read() is literally the bytes that are in the file. Could you give an example of what you think is in the file and what you want the result to look like? Commented Sep 12, 2020 at 21:22
  • I'm trying to do this for any text file in general. Also, I want the result to be what's in the file, to prove to myself that I actually have the binary version for various purposes. Commented Sep 12, 2020 at 21:24
  • Also, you may not know that when you loop through a bytes object, each item is the integer value of one byte, the same number ord returns for that character.
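A small sketch of the point about looping over bytes (the sample bytes b"Hi !" are made up for illustration). It also shows a likely reason the 7-bit approach in the question breaks: bin() strips leading zeros, so letters come out as 7 digits but a space comes out as only 6.

```python
# Iterating over a bytes object yields one int (0-255) per byte,
# the same number ord() gives for the matching character.
data = b"Hi !"
values = list(data)
print(values)                     # [72, 105, 32, 33]
print([ord(c) for c in "Hi !"])   # [72, 105, 32, 33]

# bin() drops leading zeros, so the digit strings have different widths:
# the letters need 7 digits here, but the space (32) and "!" (33) need only 6.
digits = [bin(v)[2:] for v in values]
print(digits)                     # ['1001000', '1101001', '100000', '100001']
widths = [len(d) for d in digits]
```

Because the question's loop cuts the string into fixed 7-bit chunks, the first 6-digit byte shifts every chunk after it, which fits the garbled output that starts at the space.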

1 Answer


To get a (somewhat) accurate representation of the string as it is stored in memory, you need to convert each character into binary and pad it to a full byte.

Assuming basic ASCII (1 byte per character) encoding:

s = "python"
binlst = [bin(ord(c))[2:].rjust(8,'0') for c in s]  # remove '0b' from string, fill 8 bits
binstr = ''.join(binlst)

print(s)
print(binlst)
print(binstr)

Output

python
['01110000', '01111001', '01110100', '01101000', '01101111', '01101110']
011100000111100101110100011010000110111101101110
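To go back from the bit string to text (the second part of the question), one option, again assuming 1 byte per character, is to cut the string into 8-bit chunks and turn each chunk back into a character. The variable names below are only for illustration:

```python
s = "python"
binstr = ''.join(bin(ord(c))[2:].rjust(8, '0') for c in s)

# Every character was padded to exactly 8 bits, so fixed 8-bit slices line up.
chunks = [binstr[i:i + 8] for i in range(0, len(binstr), 8)]
decoded = ''.join(chr(int(chunk, 2)) for chunk in chunks)
print(decoded)  # python
```

The padding is what makes this work: with fixed-width chunks there is no ambiguity about where one character ends and the next begins.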

For Unicode text encoded as UTF-8, each character takes 1 to 4 bytes, so converting character by character won't give the exact binary representation. As @Yellen mentioned, it may be easier to convert the file's bytes to binary directly.
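A minimal sketch of the bytes-based approach, assuming the file is UTF-8 text. The helper names file_to_bits and bits_to_text are hypothetical, and the temporary file exists only for the demo. Working on raw bytes means every unit is exactly 8 bits, so multi-byte characters survive the round trip:

```python
import os
import tempfile

def file_to_bits(path):
    """Return the file's raw bytes as a string of 0/1, 8 bits per byte."""
    with open(path, "rb") as f:
        return ''.join(format(byte, '08b') for byte in f.read())

def bits_to_text(bits, encoding="utf-8"):
    """Rebuild the bytes from 8-bit chunks, then decode them as text."""
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(encoding)

# Demo: a temporary file with a space and two non-ASCII characters.
text = "héllo wörld"
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write(text.encode("utf-8"))
    path = tmp.name

bits = file_to_bits(path)
os.remove(path)
print(len(bits))           # 104: 13 bytes, because é and ö take 2 bytes each
print(bits_to_text(bits))  # héllo wörld
```

format(byte, '08b') does the same job as bin(...)[2:].rjust(8, '0') in one step. Decoding happens only after all the bytes are rebuilt, so the code never has to know where each character starts.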


3 Comments

I found an interesting article describing how to determine how many bytes must be read for each UTF-8 encoded character: johndcook.com/blog/2019/09/09/how-utf-8-works
@Mike67 So the problem was that bin deletes trailing zeros, so you need to add them back?
It deletes leading zeros, so 00001101 becomes 1101. You need to add the zeros back to fill 8 bits.
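The rule in the linked article comes down to the high bits of a character's first byte. Here is a simplified sketch: utf8_sequence_length is a hypothetical helper, and it does not reject every invalid lead byte (for example 0xC0 or 0xF5 pass the bit checks even though valid UTF-8 never uses them).

```python
def utf8_sequence_length(lead_byte):
    """Number of bytes in a UTF-8 sequence, judged from its first byte.

    Simplified: checks only the bit pattern, not overlong or
    out-of-range lead bytes such as 0xC0 or 0xF5.
    """
    if lead_byte < 0b10000000:      # 0xxxxxxx: ASCII, 1 byte
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: 2 bytes
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: 3 bytes
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: 4 bytes
        return 4
    # 10xxxxxx is a continuation byte, never the start of a character
    raise ValueError(f"{lead_byte:#010b} is not a UTF-8 lead byte")

for ch in "a\u00e9\u20ac\U0001F600":   # a, é, €, and an emoji
    encoded = ch.encode("utf-8")
    print(ch, utf8_sequence_length(encoded[0]), len(encoded))
```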
