
I have a dynamic array type in C++ that I would like to expose through the buffer protocol. The type is already exposed as a sequence in Python, but I want to build NumPy arrays with the C++ array data, and I am hoping that a buffer will be faster than the element-wise construction from a sequence.

Reading through the protocol description, I am not sure what the correct way to do this is. The problem is that, because it is a dynamic array, its memory may be reallocated and the bounds of valid memory may change. But, from what I understand, the buffer protocol assumes that the exposed buffer remains intact on the native side for at least as long as any Python buffer object is alive.

The only solution I can think of is copying the array contents into a new memory area when a buffer is requested and deleting that memory once the buffer is no longer needed. But I am not sure whether this complies with the buffer protocol, i.e. returning a buffer that may not represent the current state of the corresponding Python object.

The documentation on the obj field of the Py_buffer struct says:

As a special case, for temporary buffers that are wrapped by PyMemoryView_FromBuffer() or PyBuffer_FillInfo() this field is NULL. In general, exporting objects MUST NOT use this scheme.

If I did make a copy of the data on each buffer request, would it qualify as such a "temporary" buffer?

Obviously, making a copy of the data somewhat misses the point of the buffer protocol, but as I said I'm hoping NumPy array construction will be faster this way (that is, with just a memcpy copy instead of a loop over sequence items).
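The copy-on-request idea described above can be modelled in pure Python. This is only a sketch of the semantics, not of the C API: `storage` is a hypothetical stand-in for the C++ dynamic array, and the bulk copy done by `bytes()` plays the role of the memcpy.

```python
# Pure-Python model of "copy the data on every buffer request".
# `storage` stands in for the C++ dynamic array (an assumption for illustration).
storage = bytearray(b"abc")

# bytes(storage) copies the contents once, in bulk. Wrapping the immutable
# copy in a memoryview gives a read-only buffer independent of `storage`.
snapshot = memoryview(bytes(storage))

# The original can still be resized, because nothing exports its memory.
storage.append(ord("d"))

print(snapshot.readonly)   # True
print(snapshot.tobytes())  # b'abc'
print(bytes(storage))      # b'abcd'
```

The snapshot keeps showing the contents at the time of the request, which is exactly the behaviour the question is unsure about.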

  • "I have a dynamic array type in C++" --> So why tag C and not C++? Commented Dec 13, 2023 at 12:48
  • @chux-ReinstateMonica It doesn't really matter that it is a C++ dynamic array; it would be the same for a C dynamic array. The question is about the Python C API, so the C tag seemed more appropriate. Commented Dec 13, 2023 at 13:09

1 Answer


The option that's commonly taken by Python itself is to block resizing of the array while a buffer is held. It definitely does this for bytearray (and I think for array.array too).

e.g.

a = bytearray(b'abc')
a.append(ord(b'd')) # works
view = memoryview(a)
a.append(ord(b'e')) # fails

The last line raises BufferError: Existing exports of data: object cannot be re-sized.
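As a small extension of the example, the sketch below shows that the block is lifted as soon as the view is released, and that array.array behaves the same way. It uses only the standard library; the variable names are arbitrary.

```python
import array

a = bytearray(b"abc")
view = memoryview(a)
try:
    a.append(ord("e"))  # refused while the view is alive
except BufferError as exc:
    print("blocked:", exc)

# Releasing the view drops the export, so resizing works again.
view.release()
a.append(ord("e"))
print(a)  # bytearray(b'abce')

# array.array also refuses to resize while a buffer is exported.
arr = array.array("i", [1, 2, 3])
with memoryview(arr):
    try:
        arr.append(4)
    except BufferError as exc:
        print("blocked:", exc)
arr.append(4)  # fine once the with-block has released the view
print(arr.tolist())  # [1, 2, 3, 4]
```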

Given that's what Python does, I'd say it's fairly idiomatic. You just need to keep a counter of "number of buffers held".
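A minimal pure-Python model of such a counter might look like the following. The class and method names (`CountedArray`, `get_buffer`, `_Export`) are hypothetical. In a real C extension the increment would presumably live in the type's `bf_getbuffer` slot and the decrement in `bf_releasebuffer`.

```python
class CountedArray:
    """Model of an exporter that counts the buffers currently held."""

    def __init__(self, values):
        self._items = list(values)
        self._exports = 0  # number of live buffers

    def get_buffer(self):
        # Stands in for bf_getbuffer: count the export and hand it out.
        self._exports += 1
        return _Export(self)

    def append(self, value):
        # Any operation that may reallocate checks the counter first.
        if self._exports:
            raise BufferError("Existing exports of data: object cannot be re-sized")
        self._items.append(value)


class _Export:
    """Stands in for a held buffer; release() mirrors bf_releasebuffer."""

    def __init__(self, owner):
        self._owner = owner

    def release(self):
        if self._owner is not None:  # make release() idempotent
            self._owner._exports -= 1
            self._owner = None


arr = CountedArray([1, 2, 3])
buf = arr.get_buffer()
try:
    arr.append(4)  # refused: one buffer is held
except BufferError as exc:
    print(exc)
buf.release()
arr.append(4)  # allowed again
print(arr._items)  # [1, 2, 3, 4]
```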


2 Comments

Thanks, that's a good example. My question was about the case where changes to the underlying array can happen while buffers are alive. I suppose copying the data is the only possibility, but I'm just not sure if that is a valid use of the protocol. I have the impression there is nothing forbidding it, so maybe it's just a matter of documenting the behaviour, like "getting a buffer view for this object gives you a copy, and changes to the underlying object are not reflected in the buffer and vice versa".
My personal feeling is that this would be OK, but maybe a bit surprising. But I'm not sure there's an official rule. It'd probably be better to return a read-only buffer if you were going to do that (so it's a snapshot at a certain point in time, and nobody is expecting to change the underlying data through the buffer protocol). I'd also be tempted to only make the copy when someone actually does a resize when there's a buffer held.
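The last suggestion (read-only views, copying only when a resize happens while a buffer is held) can be sketched in pure Python as follows. `CopyOnResizeArray` is a hypothetical name, and the sketch leans on bytearray's own export check to detect held buffers rather than keeping a separate counter.

```python
class CopyOnResizeArray:
    """Model: views are read-only and the array detaches on resize."""

    def __init__(self, data=b""):
        self._storage = bytearray(data)

    def view(self):
        # toreadonly() (Python 3.8+) returns a read-only memoryview.
        return memoryview(self._storage).toreadonly()

    def append(self, value):
        try:
            self._storage.append(value)
        except BufferError:
            # A view still pins the old storage. Copy the contents into
            # fresh storage and resize that instead; the old block stays
            # alive for as long as the views that reference it.
            self._storage = bytearray(self._storage)
            self._storage.append(value)

    def __bytes__(self):
        return bytes(self._storage)


arr = CopyOnResizeArray(b"abc")
snap = arr.view()
arr.append(ord("d"))  # triggers the copy, since `snap` is alive
print(snap.readonly)   # True
print(snap.tobytes())  # b'abc'
print(bytes(arr))      # b'abcd'
```

One caveat of this design: until a resize detaches the storage, any in-place change to the array would still be visible through existing views, so a view is a true snapshot only from the moment of the first resize.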
