
I have this list

bytes = ['11010001', '00100111']

And I want to write the content of bytes to my own binary file, one byte at a time. So I iterate through every element of the list, convert it from a bit string to an integer, and then write it to the file as the character represented by that value.

output = open(location + filename + '.enchuff', 'wb')
for byte in bytes:
    chunk = int(byte, base=2)
    output.write(chr(chunk))

It works well, but the problem appears when the bytes list gets big. I generate it from another file, and when I feed in, say, a 100 MB file, the list gets REALLY long and my program hangs on the for loop. I guess the loop must be the problem, since it iterates over probably hundreds of thousands of elements and writes down every single one of them. My memory consumption also jumps at that point, up to 4 GB of RAM. Is there any other way to achieve this faster and preserve precious RAM?

  • You could make an iterator out of the contents of your input file and pass the iterator to your byte conversion function. It won't make it faster, but it will conserve RAM. Commented Nov 1, 2015 at 20:10
  • Could you show me exactly how I would achieve that? I really need to conserve the RAM. Right now, when I input files that are too large, my whole laptop's RAM fills up and then I need to restart it. Commented Nov 1, 2015 at 20:16
  • You might like to try something like bitarray to represent your bits, instead of a list of strings. Instead of 1 byte for 8 bits, you're using 8 bytes (for the string) + whatever the list overhead is. Commented Nov 1, 2015 at 22:33
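Acting on the first comment's suggestion might look something like the following sketch (the `to_bytes` generator and the sample list are made up for illustration): instead of first building the whole list of bit strings, feed them through a generator that packs them into a `bytearray` and yields modest chunks, so memory stays bounded regardless of input size.

```python
import io


def to_bytes(bitstrings, chunksize=1024):
    """Pack an iterable of 8-character bit strings into byte chunks."""
    buf = bytearray()
    for s in bitstrings:
        buf.append(int(s, 2))   # e.g. '11010001' -> 209
        if len(buf) >= chunksize:
            yield bytes(buf)
            buf.clear()
    if buf:                     # flush whatever is left over
        yield bytes(buf)


output = io.BytesIO()           # stand-in for open(..., 'wb')
for chunk in to_bytes(['11010001', '00100111']):
    output.write(chunk)
```

Because `bitstrings` can be any iterable, including a generator that reads the source file lazily, the full list of strings never has to live in memory at once.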

1 Answer


Your code is probably slow because you perform a separate write call for every byte in the stream. The writes are likely buffered, but the buffering happens at a lower level, so each call still carries noticeable overhead.

You could instead convert the byte stream in memory and write it out in one go, e.g.:

data = [chr(int(x, base=2)) for x in bytes]
output.write(''.join(data))

If memory consumption is an issue, you could write the converted bytes in chunks, e.g.:

chunksize = 1024
for c in range(0, len(bytes), chunksize):
    data = [chr(int(x, base=2)) for x in bytes[c:c + chunksize]]
    output.write(''.join(data))
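Note that this answer's code is Python 2: in Python 3, `chr(...)` returns a `str`, which a file opened in `'wb'` mode refuses. A Python 3 equivalent of the same chunked approach (variable names here are illustrative, and `io.BytesIO` stands in for the real output file) builds `bytes` objects directly:

```python
import io

bits = ['11010001', '00100111']   # sample bit strings
output = io.BytesIO()             # stand-in for open(..., 'wb')

chunksize = 1024
for c in range(0, len(bits), chunksize):
    # bytes() accepts an iterable of ints in range(256)
    output.write(bytes(int(x, base=2) for x in bits[c:c + chunksize]))
```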

1 Comment

Thank you, in the end I used this. There was more to my problem than just a bad for loop in the output, but this really helped.
