
I have this list

bytes = ['11010001', '00100111']

And I want to write the content of bytes to my own binary file, one byte at a time. So I iterate through every element of the list, convert it from a bit string to an integer, and then write it to the file as the character represented by that value.

output = open(location + filename + '.enchuff', 'wb')
for byte in bytes:
    chunk = int(byte, base=2)
    output.write(chr(chunk))

It works well, but the problem appears when the bytes list gets big. I generate it from another file, and when I feed in, say, a 100 MB file, the list gets REALLY long and my program hangs on the for loop. I guess the loop must be the problem, since it iterates over probably hundreds of thousands of elements and writes down every single one of them. My memory consumption also jumps at that point, up to 4 GB of RAM. Is there any other way to achieve this faster and preserve precious RAM?

  • You could make an iterator out of the contents of your input file and pass the iterator to your byte conversion function. It won't make it faster, but it will conserve RAM. Commented Nov 1, 2015 at 20:10
  • Could you show me exactly how I would achieve that? I really need to conserve the RAM. Right now, when I input files that are too large, my whole laptop's RAM fills up and then I need to restart it. Commented Nov 1, 2015 at 20:16
  • You might like to try something like bitarray to represent your bits, instead of a list of strings. Instead of 1 byte for 8 bits, you're using 8 bytes (for the string) + whatever the list overhead is. Commented Nov 1, 2015 at 22:33
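Acting on the first comment's suggestion might look something like the following sketch (the `to_bytes` generator and the sample list are made up for illustration): instead of first building the whole list of bit strings, feed them through a generator that packs them into a `bytearray` and yields modest chunks, so memory stays bounded regardless of input size.

```python
import io


def to_bytes(bitstrings, chunksize=1024):
    """Pack an iterable of 8-character bit strings into byte chunks."""
    buf = bytearray()
    for s in bitstrings:
        buf.append(int(s, 2))   # e.g. '11010001' -> 209
        if len(buf) >= chunksize:
            yield bytes(buf)
            buf.clear()
    if buf:                     # flush whatever is left over
        yield bytes(buf)


output = io.BytesIO()           # stand-in for open(..., 'wb')
for chunk in to_bytes(['11010001', '00100111']):
    output.write(chunk)
```

Because `bitstrings` can be any iterable, including a generator that reads the source file lazily, the full list of strings never has to live in memory at once.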

1 Answer


Your code is probably slow because you perform a separate write call for every byte in the stream. The writes are likely buffered, but the buffering happens at a lower level, so each call still carries noticeable overhead.

You could instead convert the byte stream in memory and write it out in one go, e.g.:

data = [chr(int(x, base=2)) for x in bytes]
output.write(''.join(data))

If memory consumption is an issue, you could write the converted bytes in chunks, e.g.:

chunksize = 1024
for c in range(0, len(bytes), chunksize):
    data = [chr(int(x, base=2)) for x in bytes[c:c + chunksize]]
    output.write(''.join(data))
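Note that this answer's code is Python 2: in Python 3, `chr(...)` returns a `str`, which a file opened in `'wb'` mode refuses. A Python 3 equivalent of the same chunked approach (variable names here are illustrative, and `io.BytesIO` stands in for the real output file) builds `bytes` objects directly:

```python
import io

bits = ['11010001', '00100111']   # sample bit strings
output = io.BytesIO()             # stand-in for open(..., 'wb')

chunksize = 1024
for c in range(0, len(bits), chunksize):
    # bytes() accepts an iterable of ints in range(256)
    output.write(bytes(int(x, base=2) for x in bits[c:c + chunksize]))
```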

1 Comment

Thank you, in the end I used this. There was more to my problem than just a bad for loop in the output, but this really helped.
