Correct use of buffer protocol for dynamic array

Question

I have a dynamic array type in C++ that I would like to expose through the buffer protocol. The type is already exposed as a sequence in Python, but I want to build NumPy arrays with the C++ array data, and I am hoping that a buffer will be faster than the element-wise construction from a sequence.

Reading through the protocol description, I am not sure what the correct way to do this is. The problem is that, being a dynamic array, its memory may be reallocated, and the bounds of valid memory may change. But, from what I understand, the buffer protocol assumes that the exposed buffer will remain intact on the native side, at least as long as one Python buffer object is alive.

The only solution I can think of is copying the array contents into a new memory area when a buffer is requested and deleting that memory once the buffer is no longer needed. But I am not sure if this complies with the buffer protocol, i.e. returning a buffer that may not represent the current state of the corresponding Python object.

The documentation on the obj field of the Py_buffer struct says:

As a special case, for temporary buffers that are wrapped by PyMemoryView_FromBuffer() or PyBuffer_FillInfo() this field is NULL. In general, exporting objects MUST NOT use this scheme.

If I did make a copy of the data on each buffer request, would it qualify as such a "temporary" buffer?

Obviously, making a copy of the data somewhat misses the point of the buffer protocol, but as I said I'm hoping NumPy array construction will be faster this way (that is, with just a memcpy copy instead of a loop over sequence items).

"I have a dynamic array type in C++" --> So why tag C and not C++? – chux, Commented Dec 13, 2023 at 12:48
@chux-ReinstateMonica It doesn't really matter that it is a C++ dynamic array, it would be the same for a C dynamic array. The question is about the Python C API, so the C tag seemed more appropriate. – javidcf, Commented Dec 13, 2023 at 13:09

DavidW · Accepted Answer · 2023-12-14 12:00:53Z


The option that's commonly taken by Python itself is to block resizing of the array while a buffer is held. It definitely does this for bytearray (and I think for array.array too).

e.g.

a = bytearray(b'abc')
a.append(ord(b'd')) # works
view = memoryview(a)
a.append(ord(b'e')) # fails

The last line raises BufferError: Existing exports of data: object cannot be re-sized.
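
To check the same behaviour for array.array, and to see that resizing works again once the view is released, something like the following can be run. This is only a small sketch using the standard library; `try_append` is a helper name made up for the example.

```python
import array


def try_append(container, value):
    """Return True if append succeeded, False if it raised BufferError."""
    try:
        container.append(value)
        return True
    except BufferError:
        return False


results = {}
for name, container in (
    ("bytearray", bytearray(b"abc")),
    ("array.array", array.array("i", [1, 2, 3])),
):
    view = memoryview(container)             # export a buffer
    blocked = not try_append(container, 100)  # resize attempted while exported
    view.release()                           # drop the export
    allowed_after = try_append(container, 100)
    results[name] = (blocked, allowed_after)

print(results)
```

Both containers refuse the append while the memoryview is alive and accept it after `view.release()`.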

Given that's what Python does I'd say it's fairly idiomatic. You just need to keep a counter of "number of buffers held".
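
The counter idea can be modelled in plain Python. This is a toy sketch of the bookkeeping only, not a real buffer exporter: the class and method names (`DynamicArray`, `get_buffer`, `_Export`) are hypothetical, and in a C or C++ extension the two marked methods would correspond to the type's `bf_getbuffer` and `bf_releasebuffer` slots.

```python
class DynamicArray:
    """Toy model of a resizable array that counts outstanding exports."""

    def __init__(self, items=()):
        self._items = list(items)
        self._exports = 0  # number of buffers currently held

    def get_buffer(self):
        # Plays the role of bf_getbuffer: count the export, hand out a handle.
        self._exports += 1
        return _Export(self)

    def _release_buffer(self):
        # Plays the role of bf_releasebuffer: one fewer buffer is alive.
        self._exports -= 1

    def append(self, value):
        # Any operation that may reallocate checks the counter first.
        if self._exports > 0:
            raise BufferError(
                "Existing exports of data: object cannot be re-sized"
            )
        self._items.append(value)

    def __len__(self):
        return len(self._items)


class _Export:
    """Handle standing in for a held buffer; release() is idempotent."""

    def __init__(self, owner):
        self._owner = owner
        self._released = False

    def release(self):
        if not self._released:
            self._released = True
            self._owner._release_buffer()


arr = DynamicArray([1, 2, 3])
first = arr.get_buffer()
second = arr.get_buffer()
first.release()
try:
    arr.append(4)          # still blocked: `second` is alive
    blocked = False
except BufferError:
    blocked = True
second.release()
arr.append(4)              # allowed again: no exports remain
```

The point of a counter rather than a boolean flag is that several buffers can be held at once, and resizing must stay blocked until the last one is released.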


2 Comments

javidcf Over a year ago

Thanks, that's a good example. My question was about the case where changes to the underlying array can happen while buffers are alive. I suppose copying the data is the only possibility, but I'm just not sure if that is a valid use of the protocol. I have the impression there is nothing forbidding it, so maybe it's just a matter of documenting the behaviour, like "getting a buffer view for this object gives you a copy and changes to the underlying object are not reflected in the buffer and vice versa".

DavidW Over a year ago

My personal feeling is that this would be OK, but maybe a bit surprising. But I'm not sure there's an official rule. It'd probably be better to return a read-only buffer if you were going to do that (so it's a snapshot at a certain point in time, and nobody is expecting to change the underlying data through the buffer protocol). I'd also be tempted to only make the copy when someone actually does a resize when there's a buffer held.
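
The read-only snapshot idea can be tried out from pure Python. A minimal sketch, assuming a bytearray stands in for the C++ array and with `snapshot_view` as a made-up helper name: copying into an immutable bytes object first makes the resulting memoryview read-only and independent of later resizes.

```python
def snapshot_view(data):
    """Return a read-only memoryview over a copy of `data` (hypothetical helper)."""
    # bytes(data) copies the contents; a memoryview of bytes is read-only.
    return memoryview(bytes(data))


data = bytearray(b"abc")
snap = snapshot_view(data)

data.append(ord("d"))      # allowed: the view exports the copy, not `data`

try:
    snap[0] = 0            # writing through the snapshot is rejected
    writable = True
except TypeError:
    writable = False
```

This version copies eagerly on every request. The lazier variant (copying only when a resize happens while a buffer is held) would also need the export counter and is not shown here.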
