Converting and writing list of strings as binary in Python 3

Question

I'm trying to convert a Python 2.x version of this code:

out_chunk = open('out.txt','w+b')
chunks.append(out_chunk)
out_chunk.writelines(chunk) # chunk is just a list of strings like ['a', 'b', ...]

into a Python 3.x version. If I run the above code in Python 3.x directly, I get an error like the one below, which is expected:

Traceback (most recent call last):
  File "C:/Users/Desktop/es/prog.py", line 145, in <module>
    ob.external_sort()
  File "C:/Users/Desktop/es/prog.py", line 70, in my_func
    out_chunk.writelines(chunk)
TypeError: a bytes-like object is required, not 'str'
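The failure can be reproduced without the rest of the program. This is a minimal sketch that uses io.BytesIO as a stand-in for the file opened with 'w+b' (an assumption for illustration; both accept only bytes-like objects), with a hypothetical chunk of three strings:

```python
import io

# io.BytesIO is an in-memory binary stream; like a file opened with 'w+b',
# it only accepts bytes-like objects.
chunk = ['a', 'b', 'c']  # hypothetical list of str, as in the question

buf = io.BytesIO()
error_message = None
try:
    buf.writelines(chunk)  # fails on the very first str item
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # a bytes-like object is required, not 'str'
```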

Is there a way to write a list of strings as bytes in Python 3.x? Or should I just write them as strings (and maybe take a performance hit)?

So why is your Python 3 version not producing bytes objects?
– Martijn Pieters, Commented Jul 31, 2016 at 22:11

Martijn Pieters · Accepted Answer · 2016-07-31 22:21:24Z

You opened the file in binary mode, so you'd have to encode your strings to bytes first.

If you drop the 'b' part from the file mode (so open with 'w+' rather than 'w+b'), you get an implementation of the TextIOBase interface instead, which will encode strings for you given an encoding. The default encoding is platform dependent (the result of locale.getpreferredencoding(False)), so you probably want to supply an explicit encoding argument to the open() call instead.
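A minimal sketch of the text-mode approach, assuming UTF-8 as the explicit encoding and a hypothetical chunk that contains one non-ASCII character so the encoding step is visible. The temporary directory only keeps the example from overwriting a real out.txt:

```python
import os
import tempfile

chunk = ['a', 'b', 'é']  # hypothetical list of str

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')
    # 'w+' (no 'b') returns a text-mode file; it encodes each str using
    # the encoding given here instead of the platform default.
    with open(path, 'w+', encoding='utf-8') as out_chunk:
        out_chunk.writelines(chunk)  # note: writelines adds no separators
    # Read the raw bytes back to see what actually landed on disk.
    with open(path, 'rb') as f:
        data = f.read()

print(data)  # b'ab\xc3\xa9'
```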

The alternative would be for you to encode your strings manually, using the str.encode() method on each chunk. Leaving encoding to the TextIOBase implementation is going to be a little faster, however, because the I/O layer can encode without having to look up a method object on each str chunk, and the resulting bytes don't have to be boxed in a Python bytes object again.
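For comparison, a sketch of the manual alternative: the file stays in binary mode and every string is encoded with str.encode() before it is written. UTF-8 and the sample chunk are assumptions for illustration:

```python
import os
import tempfile

chunk = ['a', 'b', 'é']  # hypothetical list of str

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')
    # Keep 'w+b', but hand writelines() bytes instead of str.
    # writelines() accepts any iterable, so a generator works here.
    with open(path, 'w+b') as out_chunk:
        out_chunk.writelines(s.encode('utf-8') for s in chunk)
    with open(path, 'rb') as f:
        data = f.read()

print(data)  # b'ab\xc3\xa9'
```

The bytes on disk are the same as with a text-mode file opened with encoding='utf-8'; only who does the encoding differs.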

Also, for encodings that require a byte order mark, it is best to leave writing that marker to the file implementation.
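UTF-16 is one such encoding, and it shows why the marker is best left to the file object. In this sketch (in-memory buffers and a two-item chunk, both assumptions for illustration), encoding each chunk separately emits a BOM per chunk, while io.TextIOWrapper writes a single BOM at the start of the stream:

```python
import codecs
import io

chunk = ['a', 'b']  # hypothetical data

# Manual encoding: every str.encode('utf-16') call prepends its own BOM.
manual = b''.join(s.encode('utf-16') for s in chunk)

# Text layer: one incremental encoder for the whole stream, so one BOM.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding='utf-16')
text.writelines(chunk)
text.detach()  # flush and release raw without closing it
via_text = raw.getvalue()

print(manual.count(codecs.BOM_UTF16))    # 2
print(via_text.count(codecs.BOM_UTF16))  # 1
```

Decoding the manual result gives 'a\ufeffb': the second BOM turns into a stray U+FEFF character inside the text.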

However, if you could produce bytes objects in the first place, you'd avoid having to encode at all.
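As a sketch of that last point, assuming whatever builds the chunk can yield bytes directly (for example b'...' literals or data read from a file opened with 'rb'), the original 'w+b' code works unchanged:

```python
import os
import tempfile

# Hypothetical chunk that is already bytes, so no encoding step is needed.
chunk = [b'a', b'b', b'c']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')
    with open(path, 'w+b') as out_chunk:
        out_chunk.writelines(chunk)  # bytes go straight to the buffer
    with open(path, 'rb') as f:
        data = f.read()

print(data)  # b'abc'
```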

cdonts · Accepted Answer · 2016-07-31 21:23:31Z

Just don't open the file in binary mode:

out_chunk = open('out.txt','w+')

Hope it helps!

5 Comments

user1330974 Over a year ago

Yeah, I figured that. But I wonder whether writing a list of strings as binary might give some performance gain over simply writing them via text I/O. Hopefully someone who knows Python 3.x I/O behavior well can comment here. :) Thank you for your suggestion.

Serge Ballesta Over a year ago

Don't worry about such low-level optimizations, as they could change in a minor Python release. If you have Python 3 (Unicode) strings and want to write them to a file, they must be encoded first. You can either encode explicitly and then write to a binary file, or let the TextIO engine encode implicitly. Either way, the same encoding has to take place, so performance should be very close.

Martijn Pieters Over a year ago

@SergeBallesta: well, the TextIOBase implementation doesn't have to resolve the .encode attribute for each chunk, nor push the current frame on the stack to make the call, nor does it have to create the bytes object (it can just leave the bytes as a C array to pass on to the wrapped buffer).

Martijn Pieters Over a year ago

@SergeBallesta: of course, compared to the slow speed of hardware access to a spinning disk that performance difference is going to be insignificant in most cases.

user1330974 Over a year ago

Thank you guys for the insightful answers. I learned more from them! :)
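To settle the speed question for a particular workload, one rough approach is to time both strategies with timeit against in-memory buffers, so disk speed does not swamp the difference. The workload below (10,000 short lines, UTF-8, 50 runs) is a hypothetical choice, and results will vary by Python version and platform, which is Serge's caveat:

```python
import io
import timeit

# Hypothetical workload: many short strings, roughly like one sort chunk.
chunk = ['line %d\n' % i for i in range(10_000)]

def write_text():
    # Let the text layer encode; newline='' disables '\n' translation
    # so the output matches the manual version on every platform.
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
    text.writelines(chunk)
    text.detach()  # flush and release raw without closing it
    return raw.getvalue()

def write_manual():
    # Encode each str ourselves and write bytes to a binary buffer.
    raw = io.BytesIO()
    raw.writelines(s.encode('utf-8') for s in chunk)
    return raw.getvalue()

for fn in (write_text, write_manual):
    seconds = timeit.timeit(fn, number=50)
    print(f'{fn.__name__}: {seconds:.3f}s for 50 runs')
```

Both functions produce identical bytes, so the comparison isolates where the encoding happens rather than what gets written.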
