984 lines
35 KiB
Plaintext
984 lines
35 KiB
Plaintext
PEP: 3118
|
|
Title: Revising the buffer protocol
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Travis Oliphant <oliphant@ee.byu.edu>, Carl Banks <pythondev@aerojockey.com>
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 28-Aug-2006
|
|
Python-Version: 3.0
|
|
Post-History:
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes re-designing the buffer interface (``PyBufferProcs``
|
|
function pointers) to improve the way Python allows memory sharing in
|
|
Python 3.0
|
|
|
|
In particular, it is proposed that the character buffer portion
|
|
of the API be eliminated and the multiple-segment portion be
|
|
re-designed in conjunction with allowing for strided memory
|
|
to be shared. In addition, the new buffer interface will
|
|
allow the sharing of any multi-dimensional nature of the
|
|
memory and what data-format the memory contains.
|
|
|
|
This interface will allow any extension module to either
|
|
create objects that share memory or create algorithms that
|
|
use and manipulate raw memory from arbitrary objects that
|
|
export the interface.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
The Python 2.X buffer protocol allows different Python types to
|
|
exchange a pointer to a sequence of internal buffers. This
|
|
functionality is *extremely* useful for sharing large segments of
|
|
memory between different high-level objects, but it is too limited and
|
|
has issues:
|
|
|
|
1. There is the little used "sequence-of-segments" option
|
|
(bf_getsegcount) that is not well motivated.
|
|
|
|
2. There is the apparently redundant character-buffer option
|
|
(bf_getcharbuffer)
|
|
|
|
3. There is no way for a consumer to tell the buffer-API-exporting
|
|
object it is "finished" with its view of the memory and
|
|
therefore no way for the exporting object to be sure that it is
|
|
safe to reallocate the pointer to the memory that it owns (for
|
|
example, the array object reallocating its memory after sharing
|
|
it with the buffer object which held the original pointer led
|
|
to the infamous buffer-object problem).
|
|
|
|
4. Memory is just a pointer with a length. There is no way to
|
|
describe what is "in" the memory (float, int, C-structure, etc.)
|
|
|
|
5. There is no shape information provided for the memory. But,
|
|
several array-like Python types could make use of a standard
|
|
way to describe the shape-interpretation of the memory
|
|
(wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
|
|
Libraries, ctypes, NumPy, data-base interfaces, etc.)
|
|
|
|
6. There is no way to share discontiguous memory (except through
|
|
the sequence of segments notion).
|
|
|
|
There are two widely used libraries that use the concept of
|
|
discontiguous memory: PIL and NumPy. Their view of discontiguous
|
|
arrays is different, though. The proposed buffer interface allows
|
|
sharing of either memory model. Exporters will typically use only one
|
|
approach and consumers may choose to support discontiguous
|
|
arrays of each type however they choose.
|
|
|
|
NumPy uses the notion of constant striding in each dimension as its
|
|
basic concept of an array. With this concept, a simple sub-region
|
|
of a larger array can be described without copying the data.
|
|
Thus, stride information is the additional information that must be
|
|
shared.
|
|
|
|
The PIL uses a more opaque memory representation. Sometimes an
|
|
image is contained in a contiguous segment of memory, but sometimes
|
|
it is contained in an array of pointers to the contiguous segments
|
|
(usually lines) of the image. The PIL is where the idea of multiple
|
|
buffer segments in the original buffer interface came from.
|
|
|
|
NumPy's strided memory model is used more often in computational
|
|
libraries and because it is so simple it makes sense to support
|
|
memory sharing using this model. The PIL memory model is sometimes
|
|
used in C-code where a 2-d array can then be accessed using double
|
|
pointer indirection: e.g. ``image[i][j]``.
|
|
|
|
The buffer interface should allow the object to export either of these
|
|
memory models. Consumers are free to either require contiguous memory
|
|
or write code to handle one or both of these memory models.
|
|
|
|
Proposal Overview
|
|
=================
|
|
|
|
* Eliminate the char-buffer and multiple-segment sections of the
|
|
buffer-protocol.
|
|
|
|
* Unify the read/write versions of getting the buffer.
|
|
|
|
* Add a new function to the interface that should be called when
|
|
the consumer object is "done" with the memory area.
|
|
|
|
* Add a new variable to allow the interface to describe what is in
|
|
memory (unifying what is currently done now in struct and
|
|
array)
|
|
|
|
* Add a new variable to allow the protocol to share shape information
|
|
|
|
* Add a new variable for sharing stride information
|
|
|
|
* Add a new mechanism for sharing arrays that must
|
|
be accessed using pointer indirection.
|
|
|
|
* Fix all objects in the core and the standard library to conform
|
|
to the new interface
|
|
|
|
* Extend the struct module to handle more format specifiers
|
|
|
|
* Extend the buffer object into a new memory object which places
|
|
a Python veneer around the buffer interface.
|
|
|
|
* Add a few functions to make it easy to copy contiguous data
|
|
in and out of object supporting the buffer interface.
|
|
|
|
Specification
|
|
=============
|
|
|
|
While the new specification allows for complicated memory sharing,
|
|
simple contiguous buffers of bytes can still be obtained from an
|
|
object. In fact, the new protocol allows a standard mechanism for
|
|
doing this even if the original object is not represented as a
|
|
contiguous chunk of memory.
|
|
|
|
The easiest way to obtain a simple contiguous chunk of memory is
|
|
to use the provided C-API to obtain a chunk of memory.
|
|
|
|
|
|
Change the ``PyBufferProcs`` structure to ::
|
|
|
|
typedef struct {
|
|
getbufferproc bf_getbuffer;
|
|
releasebufferproc bf_releasebuffer;
|
|
} PyBufferProcs;
|
|
|
|
Both of these routines are optional for a type object
|
|
|
|
::
|
|
|
|
typedef int (*getbufferproc)(PyObject *obj, PyBuffer *view, int flags)
|
|
|
|
This function returns ``0`` on success and ``-1`` on failure (and raises an
|
|
error). The first variable is the "exporting" object. The second
|
|
argument is the address to a bufferinfo structure. Both arguments must
|
|
never be NULL.
|
|
|
|
The third argument indicates what kind of buffer the consumer is
|
|
prepared to deal with and therefore what kind of buffer the exporter
|
|
is allowed to return. The new buffer interface allows for much more
|
|
complicated memory sharing possibilities. Some consumers may not be
|
|
able to handle all the complexity but may want to see if the
|
|
exporter will let them take a simpler view to its memory.
|
|
|
|
In addition, some exporters may not be able to share memory in every
|
|
possible way and may need to raise errors to signal to some consumers
|
|
that something is just not possible. These errors should be
|
|
``PyErr_BufferError`` unless there is another error that is actually
|
|
causing the problem. The exporter can use flags information to
|
|
simplify how much of the PyBuffer structure is filled in with
|
|
non-default values and/or raise an error if the object can't support a
|
|
simpler view of its memory.
|
|
|
|
The exporter should always fill in all elements of the buffer
|
|
structure (with defaults or NULLs if nothing else is requested). The
|
|
PyBuffer_FillInfo function can be used for simple cases.
|
|
|
|
|
|
Access flags
|
|
------------
|
|
|
|
Some flags are useful for requesting a specific kind of memory
|
|
segment, while others indicate to the exporter what kind of
|
|
information the consumer can deal with. If certain information is not
|
|
asked for by the consumer, but the exporter cannot share its memory
|
|
without that information, then a ``PyErr_BufferError`` should be raised.
|
|
|
|
``PyBUF_SIMPLE``
|
|
|
|
This is the default flag state (0). The returned buffer may or may
|
|
not have writable memory. The format will be assumed to be
|
|
unsigned bytes. This is a "stand-alone" flag constant. It never
|
|
needs to be \|'d to the others. The exporter will raise an error if
|
|
it cannot provide such a contiguous buffer of bytes.
|
|
|
|
``PyBUF_WRITABLE``
|
|
|
|
The returned buffer must be writable. If it is not writable,
|
|
then raise an error.
|
|
|
|
``PyBUF_FORMAT``
|
|
|
|
The returned buffer must have true format information if this flag
|
|
is provided. This would be used when the consumer is going to be
|
|
checking for what 'kind' of data is actually stored. An exporter
|
|
should always be able to provide this information if requested. If
|
|
format is not explicitly requested then the format must be returned
|
|
as ``NULL`` (which means "B", or unsigned bytes)
|
|
|
|
``PyBUF_ND``
|
|
|
|
The returned buffer must provide shape information. The memory will
|
|
be assumed C-style contiguous (last dimension varies the fastest).
|
|
The exporter may raise an error if it cannot provide this kind of
|
|
contiguous buffer. If this is not given then shape will be NULL.
|
|
|
|
``PyBUF_STRIDES`` (implies ``PyBUF_ND``)
|
|
|
|
The returned buffer must provide strides information (i.e. the
|
|
strides cannot be NULL). This would be used when the consumer can
|
|
handle strided, discontiguous arrays. Handling strides
|
|
automatically assumes you can handle shape. The exporter may raise
|
|
an error if cannot provide a strided-only representation of the
|
|
data (i.e. without the suboffsets).
|
|
|
|
| ``PyBUF_C_CONTIGUOUS``
|
|
| ``PyBUF_F_CONTIGUOUS``
|
|
| ``PyBUF_ANY_CONTIGUOUS``
|
|
|
|
These flags indicate that the returned buffer must be respectively,
|
|
C-contiguous (last dimension varies the fastest), Fortran
|
|
contiguous (first dimension varies the fastest) or either one.
|
|
All of these flags imply PyBUF_STRIDES and guarantee that the
|
|
strides buffer info structure will be filled in correctly.
|
|
|
|
``PyBUF_INDIRECT`` (implies ``PyBUF_STRIDES``)
|
|
|
|
The returned buffer must have suboffsets information (which can be
|
|
NULL if no suboffsets are needed). This would be used when the
|
|
consumer can handle indirect array referencing implied by these
|
|
suboffsets.
|
|
|
|
|
|
Specialized combinations of flags for specific kinds of memory_sharing.
|
|
|
|
Multi-dimensional (but contiguous)
|
|
|
|
| ``PyBUF_CONTIG`` (``PyBUF_ND | PyBUF_WRITABLE``)
|
|
| ``PyBUF_CONTIG_RO`` (``PyBUF_ND``)
|
|
|
|
Multi-dimensional using strides but aligned
|
|
|
|
| ``PyBUF_STRIDED`` (``PyBUF_STRIDES | PyBUF_WRITABLE``)
|
|
| ``PyBUF_STRIDED_RO`` (``PyBUF_STRIDES``)
|
|
|
|
Multi-dimensional using strides and not necessarily aligned
|
|
|
|
| ``PyBUF_RECORDS`` (``PyBUF_STRIDES | PyBUF_WRITABLE | PyBUF_FORMAT``)
|
|
| ``PyBUF_RECORDS_RO`` (``PyBUF_STRIDES | PyBUF_FORMAT``)
|
|
|
|
Multi-dimensional using sub-offsets
|
|
|
|
| ``PyBUF_FULL`` (``PyBUF_INDIRECT | PyBUF_WRITABLE | PyBUF_FORMAT``)
|
|
| ``PyBUF_FULL_RO`` (``PyBUF_INDIRECT | PyBUF_FORMAT``)
|
|
|
|
Thus, the consumer simply wanting a contiguous chunk of bytes from
|
|
the object would use ``PyBUF_SIMPLE``, while a consumer that understands
|
|
how to make use of the most complicated cases could use ``PyBUF_FULL``.
|
|
|
|
The format information is only guaranteed to be non-NULL if
|
|
``PyBUF_FORMAT`` is in the flag argument, otherwise it is expected the
|
|
consumer will assume unsigned bytes.
|
|
|
|
There is a C-API that simple exporting objects can use to fill-in the
|
|
buffer info structure correctly according to the provided flags if a
|
|
contiguous chunk of "unsigned bytes" is all that can be exported.
|
|
|
|
|
|
The Py_buffer struct
|
|
--------------------
|
|
|
|
The bufferinfo structure is::
|
|
|
|
struct bufferinfo {
|
|
void *buf;
|
|
Py_ssize_t len;
|
|
int readonly;
|
|
const char *format;
|
|
int ndim;
|
|
Py_ssize_t *shape;
|
|
Py_ssize_t *strides;
|
|
Py_ssize_t *suboffsets;
|
|
Py_ssize_t itemsize;
|
|
void *internal;
|
|
} Py_buffer;
|
|
|
|
Before calling the bf_getbuffer function, the bufferinfo structure can
|
|
be filled with whatever, but the ``buf`` field must be NULL when
|
|
requesting a new buffer. Upon return from bf_getbuffer, the
|
|
bufferinfo structure is filled in with relevant information about the
|
|
buffer. This same bufferinfo structure must be passed to
|
|
bf_releasebuffer (if available) when the consumer is done with the
|
|
memory. The caller is responsible for keeping a reference to obj until
|
|
releasebuffer is called (i.e. the call to bf_getbuffer does not alter
|
|
the reference count of obj).
|
|
|
|
The members of the bufferinfo structure are:
|
|
|
|
``buf``
|
|
a pointer to the start of the memory for the object
|
|
|
|
``len``
|
|
the total bytes of memory the object uses. This should be the
|
|
same as the product of the shape array multiplied by the number of
|
|
bytes per item of memory.
|
|
|
|
``readonly``
|
|
an integer variable to hold whether or not the memory is readonly.
|
|
1 means the memory is readonly, zero means the memory is writable.
|
|
|
|
``format``
|
|
a NULL-terminated format-string (following the struct-style syntax
|
|
including extensions) indicating what is in each element of
|
|
memory. The number of elements is len / itemsize, where itemsize
|
|
is the number of bytes implied by the format. This can be NULL which
|
|
implies standard unsigned bytes ("B").
|
|
|
|
``ndim``
|
|
a variable storing the number of dimensions the memory represents.
|
|
Must be >=0. A value of 0 means that shape and strides and suboffsets
|
|
must be ``NULL`` (i.e. the memory represents a scalar).
|
|
|
|
``shape``
|
|
an array of ``Py_ssize_t`` of length ``ndims`` indicating the
|
|
shape of the memory as an N-D array. Note that ``((*shape)[0] *
|
|
... * (*shape)[ndims-1])*itemsize = len``. If ndims is 0 (indicating
|
|
a scalar), then this must be ``NULL``.
|
|
|
|
``strides``
|
|
address of a ``Py_ssize_t*`` variable that will be filled with a
|
|
pointer to an array of ``Py_ssize_t`` of length ``ndims`` (or ``NULL``
|
|
if ``ndims`` is 0). indicating the number of bytes to skip to get to
|
|
the next element in each dimension. If this is not requested by
|
|
the caller (``PyBUF_STRIDES`` is not set), then this should be set
|
|
to NULL which indicates a C-style contiguous array or a
|
|
PyExc_BufferError raised if this is not possible.
|
|
|
|
``suboffsets``
|
|
address of a ``Py_ssize_t *`` variable that will be filled with a
|
|
pointer to an array of ``Py_ssize_t`` of length ``*ndims``. If
|
|
these suboffset numbers are >=0, then the value stored along the
|
|
indicated dimension is a pointer and the suboffset value dictates
|
|
how many bytes to add to the pointer after de-referencing. A
|
|
suboffset value that it negative indicates that no de-referencing
|
|
should occur (striding in a contiguous memory block). If all
|
|
suboffsets are negative (i.e. no de-referencing is needed, then
|
|
this must be NULL (the default value). If this is not requested
|
|
by the caller (PyBUF_INDIRECT is not set), then this should be
|
|
set to NULL or an PyExc_BufferError raised if this is not possible.
|
|
|
|
For clarity, here is a function that returns a pointer to the
|
|
element in an N-D array pointed to by an N-dimensional index when
|
|
there are both non-NULL strides and suboffsets::
|
|
|
|
void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
|
|
Py_ssize_t *suboffsets, Py_ssize_t *indices) {
|
|
char *pointer = (char*)buf;
|
|
int i;
|
|
for (i = 0; i < ndim; i++) {
|
|
pointer += strides[i] * indices[i];
|
|
if (suboffsets[i] >=0 ) {
|
|
pointer = *((char**)pointer) + suboffsets[i];
|
|
}
|
|
}
|
|
return (void*)pointer;
|
|
}
|
|
|
|
Notice the suboffset is added "after" the dereferencing occurs.
|
|
Thus slicing in the ith dimension would add to the suboffsets in
|
|
the (i-1)st dimension. Slicing in the first dimension would change
|
|
the location of the starting pointer directly (i.e. buf would
|
|
be modified).
|
|
|
|
``itemsize``
|
|
This is a storage for the itemsize (in bytes) of each element of the shared
|
|
memory. It is technically un-necessary as it can be obtained using
|
|
``PyBuffer_SizeFromFormat``, however an exporter may know this
|
|
information without parsing the format string and it is necessary
|
|
to know the itemsize for proper interpretation of striding.
|
|
Therefore, storing it is more convenient and faster.
|
|
|
|
``internal``
|
|
This is for use internally by the exporting object. For example,
|
|
this might be re-cast as an integer by the exporter and used to
|
|
store flags about whether or not the shape, strides, and suboffsets
|
|
arrays must be freed when the buffer is released. The consumer
|
|
should never alter this value.
|
|
|
|
|
|
The exporter is responsible for making sure that any memory pointed to
|
|
by buf, format, shape, strides, and suboffsets is valid until
|
|
releasebuffer is called. If the exporter wants to be able to change
|
|
an object's shape, strides, and/or suboffsets before releasebuffer is
|
|
called then it should allocate those arrays when getbuffer is called
|
|
(pointing to them in the buffer-info structure provided) and free them
|
|
when releasebuffer is called.
|
|
|
|
|
|
Releasing the buffer
|
|
--------------------
|
|
|
|
The same bufferinfo struct should be used in the release-buffer
|
|
interface call. The caller is responsible for the memory of the
|
|
Py_buffer structure itself.
|
|
|
|
::
|
|
|
|
typedef void (*releasebufferproc)(PyObject *obj, Py_buffer *view)
|
|
|
|
Callers of getbufferproc must make sure that this function is called
|
|
when memory previously acquired from the object is no longer needed.
|
|
The exporter of the interface must make sure that any memory pointed
|
|
to in the bufferinfo structure remains valid until releasebuffer is
|
|
called.
|
|
|
|
If the bf_releasebuffer function is not provided (i.e. it is NULL),
|
|
then it does not ever need to be called.
|
|
|
|
Exporters will need to define a bf_releasebuffer function if they can
|
|
re-allocate their memory, strides, shape, suboffsets, or format
|
|
variables which they might share through the struct bufferinfo.
|
|
Several mechanisms could be used to keep track of how many getbuffer
|
|
calls have been made and shared. Either a single variable could be
|
|
used to keep track of how many "views" have been exported, or a
|
|
linked-list of bufferinfo structures filled in could be maintained in
|
|
each object.
|
|
|
|
All that is specifically required by the exporter, however, is to
|
|
ensure that any memory shared through the bufferinfo structure remains
|
|
valid until releasebuffer is called on the bufferinfo structure
|
|
exporting that memory.
|
|
|
|
|
|
New C-API calls are proposed
|
|
============================
|
|
|
|
::
|
|
|
|
int PyObject_CheckBuffer(PyObject *obj)
|
|
|
|
Return 1 if the getbuffer function is available otherwise 0.
|
|
|
|
::
|
|
|
|
int PyObject_GetBuffer(PyObject *obj, Py_buffer *view,
|
|
int flags)
|
|
|
|
This is a C-API version of the getbuffer function call. It checks to
|
|
make sure object has the required function pointer and issues the
|
|
call. Returns -1 and raises an error on failure and returns 0 on
|
|
success.
|
|
|
|
::
|
|
|
|
void PyBuffer_Release(PyObject *obj, Py_buffer *view)
|
|
|
|
This is a C-API version of the releasebuffer function call. It checks
|
|
to make sure the object has the required function pointer and issues
|
|
the call. This function always succeeds even if there is no releasebuffer
|
|
function for the object.
|
|
|
|
::
|
|
|
|
PyObject *PyObject_GetMemoryView(PyObject *obj)
|
|
|
|
Return a memory-view object from an object that defines the buffer interface.
|
|
|
|
A memory-view object is an extended buffer object that could replace
|
|
the buffer object (but doesn't have to as that could be kept as a
|
|
simple 1-d memory-view object). Its C-structure is ::
|
|
|
|
typedef struct {
|
|
PyObject_HEAD
|
|
PyObject *base;
|
|
Py_buffer view;
|
|
} PyMemoryViewObject;
|
|
|
|
This is functionally similar to the current buffer object except a
|
|
reference to base is kept and the memory view is not re-grabbed.
|
|
Thus, this memory view object holds on to the memory of base until it
|
|
is deleted.
|
|
|
|
This memory-view object will support multi-dimensional slicing and be
|
|
the first object provided with Python to do so. Slices of the
|
|
memory-view object are other memory-view objects with the same base
|
|
but with a different view of the base object.
|
|
|
|
When an "element" from the memory-view is returned it is always a
|
|
bytes object whose format should be interpreted by the format
|
|
attribute of the memoryview object. The struct module can be used to
|
|
"decode" the bytes in Python if desired. Or the contents can be
|
|
passed to a NumPy array or other object consuming the buffer protocol.
|
|
|
|
The Python name will be
|
|
|
|
``__builtin__.memoryview``
|
|
|
|
Methods:
|
|
|
|
| ``__getitem__`` (will support multi-dimensional slicing)
|
|
| ``__setitem__`` (will support multi-dimensional slicing)
|
|
| ``tobytes`` (obtain a new bytes-object of a copy of the memory).
|
|
| ``tolist`` (obtain a "nested" list of the memory. Everything
|
|
is interpreted into standard Python objects
|
|
as the struct module unpack would do -- in fact
|
|
it uses struct.unpack to accomplish it).
|
|
|
|
Attributes (taken from the memory of the base object):
|
|
|
|
* ``format``
|
|
* ``itemsize``
|
|
* ``shape``
|
|
* ``strides``
|
|
* ``suboffsets``
|
|
* ``readonly``
|
|
* ``ndim``
|
|
|
|
|
|
::
|
|
|
|
Py_ssize_t PyBuffer_SizeFromFormat(const char *)
|
|
|
|
Return the implied itemsize of the data-format area from a struct-style
|
|
description.
|
|
|
|
::
|
|
|
|
PyObject * PyMemoryView_GetContiguous(PyObject *obj, int buffertype,
|
|
char fortran)
|
|
|
|
Return a memoryview object to a contiguous chunk of memory represented
|
|
by obj. If a copy must be made (because the memory pointed to by obj
|
|
is not contiguous), then a new bytes object will be created and become
|
|
the base object for the returned memory view object.
|
|
|
|
The buffertype argument can be PyBUF_READ, PyBUF_WRITE,
|
|
PyBUF_UPDATEIFCOPY to determine whether the returned buffer should be
|
|
readable, writable, or set to update the original buffer if a copy
|
|
must be made. If buffertype is PyBUF_WRITE and the buffer is not
|
|
contiguous an error will be raised. In this circumstance, the user
|
|
can use PyBUF_UPDATEIFCOPY to ensure that a writable temporary
|
|
contiguous buffer is returned. The contents of this contiguous buffer
|
|
will be copied back into the original object after the memoryview
|
|
object is deleted as long as the original object is writable. If this
|
|
is not allowed by the original object, then a BufferError is raised.
|
|
|
|
If the object is multi-dimensional, then if fortran is 'F', the first
|
|
dimension of the underlying array will vary the fastest in the buffer.
|
|
If fortran is 'C', then the last dimension will vary the fastest
|
|
(C-style contiguous). If fortran is 'A', then it does not matter and
|
|
you will get whatever the object decides is more efficient. If a copy
|
|
is made, then the memory must be freed by calling ``PyMem_Free``.
|
|
|
|
You receive a new reference to the memoryview object.
|
|
|
|
::
|
|
|
|
int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
|
|
char fortran)
|
|
|
|
Copy ``len`` bytes of data pointed to by the contiguous chunk of
|
|
memory pointed to by ``buf`` into the buffer exported by obj. Return
|
|
0 on success and return -1 and raise an error on failure. If the
|
|
object does not have a writable buffer, then an error is raised. If
|
|
fortran is 'F', then if the object is multi-dimensional, then the data
|
|
will be copied into the array in Fortran-style (first dimension varies
|
|
the fastest). If fortran is 'C', then the data will be copied into
|
|
the array in C-style (last dimension varies the fastest). If fortran
|
|
is 'A', then it does not matter and the copy will be made in whatever
|
|
way is more efficient.
|
|
|
|
::
|
|
|
|
int PyObject_CopyData(PyObject *dest, PyObject *src)
|
|
|
|
These last three C-API calls allow a standard way of getting data in and
|
|
out of Python objects into contiguous memory areas no matter how it is
|
|
actually stored. These calls use the extended buffer interface to perform
|
|
their work.
|
|
|
|
::
|
|
|
|
int PyBuffer_IsContiguous(Py_buffer *view, char fortran)
|
|
|
|
Return 1 if the memory defined by the view object is C-style (fortran
|
|
= 'C') or Fortran-style (fortran = 'F') contiguous or either one
|
|
(fortran = 'A'). Return 0 otherwise.
|
|
|
|
::
|
|
|
|
void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape,
|
|
Py_ssize_t *strides, Py_ssize_t itemsize,
|
|
char fortran)
|
|
|
|
Fill the strides array with byte-strides of a contiguous (C-style if
|
|
fortran is 'C' or Fortran-style if fortran is 'F' array of the given
|
|
shape with the given number of bytes per element.
|
|
|
|
::
|
|
|
|
int PyBuffer_FillInfo(Py_buffer *view, void *buf,
|
|
Py_ssize_t len, int readonly, int infoflags)
|
|
|
|
Fills in a buffer-info structure correctly for an exporter that can
|
|
only share a contiguous chunk of memory of "unsigned bytes" of the
|
|
given length. Returns 0 on success and -1 (with raising an error) on
|
|
error.
|
|
|
|
::
|
|
|
|
PyExc_BufferError
|
|
|
|
A new error object for returning buffer errors which arise because an
|
|
exporter cannot provide the kind of buffer that a consumer expects.
|
|
This will also be raised when a consumer requests a buffer from an
|
|
object that does not provide the protocol.
|
|
|
|
|
|
Additions to the struct string-syntax
|
|
=====================================
|
|
|
|
The struct string-syntax is missing some characters to fully
|
|
implement data-format descriptions already available elsewhere (in
|
|
ctypes and NumPy for example). The Python 2.5 specification is
|
|
at http://docs.python.org/library/struct.html.
|
|
|
|
Here are the proposed additions:
|
|
|
|
|
|
================ ===========
|
|
Character Description
|
|
================ ===========
|
|
't' bit (number before states how many bits)
|
|
'?' platform _Bool type
|
|
'g' long double
|
|
'c' ucs-1 (latin-1) encoding
|
|
'u' ucs-2
|
|
'w' ucs-4
|
|
'O' pointer to Python Object
|
|
'Z' complex (whatever the next specifier is)
|
|
'&' specific pointer (prefix before another character)
|
|
'T{}' structure (detailed layout inside {})
|
|
'(k1,k2,...,kn)' multi-dimensional array of whatever follows
|
|
':name:' optional name of the preceding element
|
|
'X{}' pointer to a function (optional function
|
|
signature inside {} with any return value
|
|
preceded by -> and placed at the end)
|
|
================ ===========
|
|
|
|
The struct module will be changed to understand these as well and
|
|
return appropriate Python objects on unpacking. Unpacking a
|
|
long-double will return a decimal object or a ctypes long-double.
|
|
Unpacking 'u' or 'w' will return Python unicode. Unpacking a
|
|
multi-dimensional array will return a list (of lists if >1d).
|
|
Unpacking a pointer will return a ctypes pointer object. Unpacking a
|
|
function pointer will return a ctypes call-object (perhaps). Unpacking
|
|
a bit will return a Python Bool. White-space in the struct-string
|
|
syntax will be ignored if it isn't already. Unpacking a named-object
|
|
will return some kind of named-tuple-like object that acts like a
|
|
tuple but whose entries can also be accessed by name. Unpacking a
|
|
nested structure will return a nested tuple.
|
|
|
|
Endian-specification ('!', '@','=','>','<', '^') is also allowed
|
|
inside the string so that it can change if needed. The
|
|
previously-specified endian string is in force until changed. The
|
|
default endian is '@' which means native data-types and alignment. If
|
|
un-aligned, native data-types are requested, then the endian
|
|
specification is '^'.
|
|
|
|
According to the struct-module, a number can precede a character
|
|
code to specify how many of that type there are. The
|
|
``(k1,k2,...,kn)`` extension also allows specifying if the data is
|
|
supposed to be viewed as a (C-style contiguous, last-dimension
|
|
varies the fastest) multi-dimensional array of a particular format.
|
|
|
|
Functions should be added to ctypes to create a ctypes object from
|
|
a struct description, and add long-double, and ucs-2 to ctypes.
|
|
|
|
Examples of Data-Format Descriptions
|
|
====================================
|
|
|
|
Here are some examples of C-structures and how they would be
|
|
represented using the struct-style syntax.
|
|
|
|
<named> is the constructor for a named-tuple (not-specified yet).
|
|
|
|
float
|
|
``'d'`` <--> Python float
|
|
complex double
|
|
``'Zd'`` <--> Python complex
|
|
RGB Pixel data
|
|
``'BBB'`` <--> (int, int, int)
|
|
``'B:r: B:g: B:b:'`` <--> <named>((int, int, int), ('r','g','b'))
|
|
|
|
Mixed endian (weird but possible)
|
|
``'>i:big: <i:little:'`` <--> <named>((int, int), ('big', 'little'))
|
|
|
|
Nested structure
|
|
::
|
|
|
|
struct {
|
|
int ival;
|
|
struct {
|
|
unsigned short sval;
|
|
unsigned char bval;
|
|
unsigned char cval;
|
|
} sub;
|
|
}
|
|
"""i:ival:
|
|
T{
|
|
H:sval:
|
|
B:bval:
|
|
B:cval:
|
|
}:sub:
|
|
"""
|
|
Nested array
|
|
::
|
|
|
|
struct {
|
|
int ival;
|
|
double data[16*4];
|
|
}
|
|
"""i:ival:
|
|
(16,4)d:data:
|
|
"""
|
|
|
|
Note, that in the last example, the C-structure compared against is
|
|
intentionally a 1-d array and not a 2-d array data[16][4]. The reason
|
|
for this is to avoid the confusions between static multi-dimensional
|
|
arrays in C (which are laid out contiguously) and dynamic
|
|
multi-dimensional arrays which use the same syntax to access elements,
|
|
data[0][1], but whose memory is not necessarily contiguous. The
|
|
struct-syntax *always* uses contiguous memory and the
|
|
multi-dimensional character is information about the memory to be
|
|
communicated by the exporter.
|
|
|
|
In other words, the struct-syntax description does not have to match
|
|
the C-syntax exactly as long as it describes the same memory layout.
|
|
The fact that a C-compiler would think of the memory as a 1-d array of
|
|
doubles is irrelevant to the fact that the exporter wanted to
|
|
communicate to the consumer that this field of the memory should be
|
|
thought of as a 2-d array where a new dimension is considered after
|
|
every 4 elements.
|
|
|
|
|
|
Code to be affected
|
|
===================
|
|
|
|
All objects and modules in Python that export or consume the old
|
|
buffer interface will be modified. Here is a partial list.
|
|
|
|
* buffer object
|
|
* bytes object
|
|
* string object
|
|
* unicode object
|
|
* array module
|
|
* struct module
|
|
* mmap module
|
|
* ctypes module
|
|
|
|
Anything else using the buffer API.
|
|
|
|
|
|
Issues and Details
|
|
==================
|
|
|
|
It is intended that this PEP will be back-ported to Python 2.6 by
|
|
adding the C-API and the two functions to the existing buffer
|
|
protocol.
|
|
|
|
Previous versions of this PEP proposed a read/write locking scheme,
|
|
but it was later perceived as a) too complicated for common simple use
|
|
cases that do not require any locking and b) too simple for use cases
|
|
that required concurrent read/write access to a buffer with changing,
|
|
short-living locks. It is therefore left to users to implement their
|
|
own specific locking scheme around buffer objects if they require
|
|
consistent views across concurrent read/write access. A future PEP
|
|
may be proposed which includes a separate locking API after some
|
|
experience with these user-schemes is obtained
|
|
|
|
The sharing of strided memory and suboffsets is new and can be seen as
|
|
a modification of the multiple-segment interface. It is motivated by
|
|
NumPy and the PIL. NumPy objects should be able to share their
|
|
strided memory with code that understands how to manage strided memory
|
|
because strided memory is very common when interfacing with compute
|
|
libraries.
|
|
|
|
Also, with this approach it should be possible to write generic code
|
|
that works with both kinds of memory without copying.
|
|
|
|
Memory management of the format string, the shape array, the strides
|
|
array, and the suboffsets array in the bufferinfo structure is always
|
|
the responsibility of the exporting object. The consumer should not
|
|
set these pointers to any other memory or try to free them.
|
|
|
|
Several ideas were discussed and rejected:
|
|
|
|
Having a "releaser" object whose release-buffer was called. This
|
|
was deemed unacceptable because it caused the protocol to be
|
|
asymmetric (you called release on something different than you
|
|
"got" the buffer from). It also complicated the protocol without
|
|
providing a real benefit.
|
|
|
|
Passing all the struct variables separately into the function.
|
|
This had the advantage that it allowed one to set NULL to
|
|
variables that were not of interest, but it also made the function
|
|
call more difficult. The flags variable allows the same
|
|
ability of consumers to be "simple" in how they call the protocol.
|
|
|
|
|
|
Code
|
|
====
|
|
|
|
The authors of the PEP promise to contribute and maintain the code for
|
|
this proposal but will welcome any help.
|
|
|
|
|
|
Examples
|
|
========
|
|
|
|
Ex. 1
|
|
-----------
|
|
|
|
This example shows how an image object that uses contiguous lines might expose its buffer::
|
|
|
|
struct rgba {
|
|
unsigned char r, g, b, a;
|
|
};
|
|
|
|
struct ImageObject {
|
|
PyObject_HEAD;
|
|
...
|
|
struct rgba** lines;
|
|
Py_ssize_t height;
|
|
Py_ssize_t width;
|
|
Py_ssize_t shape_array[2];
|
|
Py_ssize_t stride_array[2];
|
|
Py_ssize_t view_count;
|
|
};
|
|
|
|
"lines" points to malloced 1-D array of ``(struct rgba*)``. Each pointer
|
|
in THAT block points to a separately malloced array of ``(struct rgba)``.
|
|
|
|
In order to access, say, the red value of the pixel at x=30, y=50, you'd use "lines[50][30].r".
|
|
|
|
So what does ImageObject's getbuffer do? Leaving error checking out::
|
|
|
|
int Image_getbuffer(PyObject *self, Py_buffer *view, int flags) {
|
|
|
|
static Py_ssize_t suboffsets[2] = { 0, -1};
|
|
|
|
view->buf = self->lines;
|
|
view->len = self->height*self->width;
|
|
view->readonly = 0;
|
|
view->ndims = 2;
|
|
self->shape_array[0] = height;
|
|
self->shape_array[1] = width;
|
|
view->shape = &self->shape_array;
|
|
self->stride_array[0] = sizeof(struct rgba*);
|
|
self->stride_array[1] = sizeof(struct rgba);
|
|
view->strides = &self->stride_array;
|
|
view->suboffsets = suboffsets;
|
|
|
|
self->view_count ++;
|
|
|
|
return 0;
|
|
}
|
|
|
|
|
|
int Image_releasebuffer(PyObject *self, Py_buffer *view) {
|
|
self->view_count--;
|
|
return 0;
|
|
}
|
|
|
|
|
|
Ex. 2
|
|
-----------
|
|
|
|
This example shows how an object that wants to expose a contiguous
|
|
chunk of memory (which will never be re-allocated while the object is
|
|
alive) would do that.
|
|
|
|
::
|
|
|
|
int myobject_getbuffer(PyObject *self, Py_buffer *view, int flags) {
|
|
|
|
void *buf;
|
|
Py_ssize_t len;
|
|
int readonly=0;
|
|
|
|
buf = /* Point to buffer */
|
|
len = /* Set to size of buffer */
|
|
readonly = /* Set to 1 if readonly */
|
|
|
|
return PyObject_FillBufferInfo(view, buf, len, readonly, flags);
|
|
}
|
|
|
|
/* No releasebuffer is necessary because the memory will never
|
|
be re-allocated
|
|
*/
|
|
|
|
Ex. 3
|
|
-----------
|
|
|
|
A consumer that wants to only get a simple contiguous chunk of bytes
|
|
from a Python object, obj would do the following:
|
|
|
|
::
|
|
|
|
Py_buffer view;
|
|
int ret;
|
|
|
|
if (PyObject_GetBuffer(obj, &view, Py_BUF_SIMPLE) < 0) {
|
|
/* error return */
|
|
}
|
|
|
|
/* Now, view.buf is the pointer to memory
|
|
view.len is the length
|
|
view.readonly is whether or not the memory is read-only.
|
|
*/
|
|
|
|
|
|
/* After using the information and you don't need it anymore */
|
|
|
|
if (PyBuffer_Release(obj, &view) < 0) {
|
|
/* error return */
|
|
}
|
|
|
|
|
|
Ex. 4
|
|
-----------
|
|
|
|
A consumer that wants to be able to use any object's memory but is
|
|
writing an algorithm that only handle contiguous memory could do the following:
|
|
|
|
::
|
|
|
|
void *buf;
|
|
Py_ssize_t len;
|
|
char *format;
|
|
int copy;
|
|
|
|
copy = PyObject_GetContiguous(obj, &buf, &len, &format, 0, 'A');
|
|
if (copy < 0) {
|
|
/* error return */
|
|
}
|
|
|
|
/* process memory pointed to by buffer if format is correct */
|
|
|
|
/* Optional:
|
|
|
|
if, after processing, we want to copy data from buffer back
|
|
into the object
|
|
|
|
we could do
|
|
*/
|
|
|
|
if (PyObject_CopyToObject(obj, buf, len, 'A') < 0) {
|
|
/* error return */
|
|
}
|
|
|
|
/* Make sure that if a copy was made, the memory is freed */
|
|
if (copy == 1) PyMem_Free(buf);
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This PEP is placed in the public domain.
|
|
|