0

I'm trying to convert a Python 2.x version of this code:

out_chunk = open('out.txt','w+b')
chunks.append(out_chunk) # out_chunk is just a list of strings like ['a', 'b', ...]
out_chunk.writelines(chunk)

into Python 3.x version. If I run the above code in Python 3.x directly, I get an error like below, which is expected:

Traceback (most recent call last):
  File "C:/Users/Desktop/es/prog.py", line 145, in <module>
    ob.external_sort()
  File "C:/Users/Desktop/es/prog.py", line 70, in my_func
    out_chunk.writelines(chunk)
TypeError: a bytes-like object is required, not 'str'

Is there a way to write list of strings as bytes in Python 3.x? Or should I just write as a list of strings (and take the performance hit, maybe?)

1
  • So why is your Python 3 version not producing bytes objects? Commented Jul 31, 2016 at 22:11

2 Answers 2

3

You opened the file in binary mode, so you'd have to encode your bytes.

If you drop the 'b' part from the file mode (so open with 'w+' rather than 'w+b'), you get an implementation of the TextIOBase interface instead, which will encode strings for you given an encoding (the default is to use the result of locale.getdefaultencoding(), you probably want to supply an explicit encoding argument to the open() call instead).

The alternative would be for you to encode your strings manually, using the str.encode() method on each chunk. Leaving encoding to the TextIOBase implementation is going to be a little faster however, because the I/O layer can encode without having to look up a method object on each str chunk, nor do the resulting bytes have to be boxed in a Python bytes object again.

Also, for encodings that require a byte order mark, it is best to leave writing that marker to the file implementation.

However, if you could produce bytes objects in the first place, you'd avoid having to encode at all.

Sign up to request clarification or add additional context in comments.

Comments

1

Just don't open the file in binary mode:

out_chunk = open('out.txt','w+')

Hope it helps!

5 Comments

Yeah, I figured that. But I wonder if writing a list of string as binary might add some performance gain as opposed to just simply writing them Text I/O. Hopefully, someone who knows Python 3.x IO behavior well might comment here. :) Thank you for your suggestion.
Don't worry for such lower case optimization as it could change in a minor Python release. If you have Python3 (unicode) strings and want to write them to a text file, you must first encode them. You can either do explicit encoding and then write to a binary file, or let the TextIO engine do an implicit encoding. Any way, the same encoding have took place so performance should be very close.
@SergeBallesta: well, the TextIOBase implementation doesn't have to resolve the .encode attribute for each chunk, nor push the current frame on the stack to make the call, nor does it have to create the bytes object (it can just leave the bytes as a C array to pass on to the wrapped buffer).
@SergeBallesta: of course, compared to the slow speed of hardware access to a spinning disk that performance difference is going to be insignificant in most cases.
Thank you guys for insightful answers. I learned more from them! :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.