I want a mutable version of buffer that points into a bytearray. I want to pass it to I/O functions like io.BufferedIOBase.readinto() so that I can avoid the overhead of allocating memory on every loop iteration.

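import sys, struct

ba = bytearray(2000)   # reusable payload buffer, allocated once
lenbuf = bytearray(8)  # reusable buffer for the 8-byte length prefix

with open(sys.argv[1], "rb") as fp:  # binary mode, since the data is raw bytes
  while True:
    if fp.readinto(lenbuf) < 8:  # efficient version of fp.read(8); stop at EOF
      break
    dat_len = struct.unpack("Q", lenbuf)[0]  # unpack() returns a tuple
    buf = buffer(ba, 0, dat_len)
    fp.readinto(buf)  # efficient version of fp.read(dat_len), but
                      # yields TypeError: must be read-write buffer, not buffer
    my_parse(buf)     # my_parse() is defined elsewhere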

I also tried buf = memoryview(buffer(ba, 0, dat_len)) but got (essentially) the same error.

I believe using Python shouldn't be synonymous with paying little attention to runtime performance.

I use Python 2.6, installed by default on CentOS 6, but can switch to 2.7 or 3.x if really necessary.

Thanks!

Update <- no, this is not the way to go

I'm perplexed by the behavior of a slice into a bytearray. The transcript below suggests that I can simply assign to a slice of a bytearray and have it modified in place:

>>> import cProfile
>>> x = bytearray(4*10**9)
>>> cProfile.run('x[10:13]="abc"')
         2 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

>>> x.count(b'\x00')
3999999997
>>> len(x)
4000000000

>>> cProfile.run('x[10:13]="abcd"')  # intentionally try an inefficient case
         2 function calls in 0.750 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.750    0.750    0.750    0.750 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

>>> len(x)
4000000001

But the "mutable slice" doesn't work as expected when assigning a single byte:

>>> x = bytearray(10)
>>> x[2] = 0xff
>>> x.count(b'\x00')
9
>>> x[3:5][0] = 0xff
>>> x.count(b'\x00')
9  # WHAT

I won't actually use single-byte assignment in my application, but I'm concerned that I may have a fundamental misunderstanding.
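The transcript above is consistent with x[3:5] producing a new, temporary bytearray when it is read in an expression, whereas slice assignment such as x[10:13] = "abc" writes into x itself. Below is a minimal sketch contrasting the two, assuming Python 2.7 or 3.x where memoryview is available. The variable names and the io.BytesIO stand-in for a file are made up for illustration.

```python
import io

x = bytearray(10)

# x[3:5] builds a temporary copy; the write goes to the copy, not to x.
x[3:5][0] = 0xff
zeros_after_copy_write = x.count(b"\x00")   # x is unchanged

# A memoryview slice refers to the same memory as x.
mv = memoryview(x)
mv[3:5][0:1] = b"\xff"                      # slice assignment works on 2.7 and 3.x
zeros_after_view_write = x.count(b"\x00")   # x[3] has been overwritten

# The same kind of writable view can be handed to readinto().
src = io.BytesIO(b"abcdef")                 # stand-in for a real binary file
n = src.readinto(mv[0:4])                   # reads at most 4 bytes into x[0:4]
```

The view is writable only because it wraps the bytearray directly; wrapping a read-only buffer object first would keep the view read-only.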

  • Why do you need a buffer when all the functions you mention actually expect a bytearray? Commented Feb 15, 2016 at 7:59
  • Because those I/O functions try to fill up to len(buf) bytes, but I want to keep reusing a single "long enough" buffer (bytearray(2000)). Commented Feb 15, 2016 at 8:11
  • I'm curious to see whether there is any performance improvement between your code and @ALGOholic's code. Frankly, with garbage collection, trying to fix the supposed overhead of memory allocation is rather bold. Commented Feb 15, 2016 at 8:48

1 Answer

You could let it read excess data and then simply consume all of the excess data from your bytearray before reading more from the file.

Otherwise you can use numpy:

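import sys, struct
import numpy as np

buf = np.zeros(2000, dtype=np.uint8)  # reusable payload buffer, allocated once
lenbuf = bytearray(8)                 # reusable buffer for the length prefix

with open(sys.argv[1], "rb") as fp:   # binary mode, since the data is raw bytes
    while True:
        if fp.readinto(lenbuf) < 8:   # stop at EOF
            break
        dat_len = struct.unpack("Q", lenbuf)[0]  # unpack() returns a tuple
        fp.readinto(buf[:dat_len])    # fills a view of buf, no copy
        my_parse(buf[:dat_len])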

numpy creates the read-write buffers you need, and indexing with [:dat_len] returns a "view" of a subset of the data rather than a copy. Since numpy arrays conform to the buffer protocol, you can also use them with struct.unpack() as if they were bytearrays or buffers.
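Here is a small self-contained check of the view behaviour described above. It is only a sketch: an in-memory io.BytesIO stream stands in for the real file, and the record contents are made up for illustration.

```python
import io
import struct
import numpy as np

# Hypothetical record: 8-byte native-endian length prefix, then the payload.
payload = b"hello"
stream = io.BytesIO(struct.pack("Q", len(payload)) + payload)

buf = np.zeros(2000, dtype=np.uint8)  # reusable buffer, allocated once
lenbuf = bytearray(8)

stream.readinto(lenbuf)
(dat_len,) = struct.unpack("Q", lenbuf)

view = buf[:dat_len]                  # basic slicing gives a view, not a copy
n = stream.readinto(view)             # writes straight into buf's memory

is_view = np.shares_memory(view, buf)
received = buf[:dat_len].tobytes()    # copy out only to compare with payload
```

Because view shares memory with buf, the bytes written by readinto() are visible through buf without any extra allocation for the payload.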


1 Comment

"You could let it read excess data and then simply use all excess data from your bytearray before reading more from file." Sorry but you're essentially saying I should implement buffered I/O myself here. But thanks for letting me know about NumPy array type.
