2007-04-09 01:51:32 -04:00
|
|
|
PEP: 3118
|
|
|
|
Title: Revising the buffer protocol
|
|
|
|
Version: $Revision$
|
|
|
|
Last-Modified: $Date$
|
|
|
|
Authors: Travis Oliphant <oliphant@ee.byu.edu>, Carl Banks <pythondev@aerojockey.com>
|
|
|
|
Status: Draft
|
|
|
|
Type: Standards Track
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
Created: 28-Aug-2006
|
|
|
|
Python-Version: 3000
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
========
|
|
|
|
|
|
|
|
This PEP proposes re-designing the buffer interface (PyBufferProcs
|
|
|
|
function pointers) to improve the way Python allows memory sharing
|
|
|
|
in Python 3.0
|
|
|
|
|
|
|
|
In particular, it is proposed that the character buffer portion
|
|
|
|
of the API be elminated and the multiple-segment portion be
|
|
|
|
re-designed in conjunction with allowing for strided memory
|
|
|
|
to be shared. In addition, the new buffer interface will
|
|
|
|
allow the sharing of any multi-dimensional nature of the
|
|
|
|
memory and what data-format the memory contains.
|
|
|
|
|
|
|
|
This interface will allow any extension module to either
|
|
|
|
create objects that share memory or create algorithms that
|
|
|
|
use and manipulate raw memory from arbitrary objects that
|
|
|
|
export the interface.
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
=========
|
|
|
|
|
|
|
|
The Python 2.X buffer protocol allows different Python types to
|
|
|
|
exchange a pointer to a sequence of internal buffers. This
|
|
|
|
functionality is *extremely* useful for sharing large segments of
|
|
|
|
memory between different high-level objects, but it is too limited and
|
|
|
|
has issues:
|
|
|
|
|
|
|
|
1. There is the little used "sequence-of-segments" option
|
|
|
|
(bf_getsegcount) that is not well motivated.
|
|
|
|
|
|
|
|
2. There is the apparently redundant character-buffer option
|
|
|
|
(bf_getcharbuffer)
|
|
|
|
|
|
|
|
3. There is no way for a consumer to tell the buffer-API-exporting
|
|
|
|
object it is "finished" with its view of the memory and
|
|
|
|
therefore no way for the exporting object to be sure that it is
|
|
|
|
safe to reallocate the pointer to the memory that it owns (for
|
|
|
|
example, the array object reallocating its memory after sharing
|
|
|
|
it with the buffer object which held the original pointer led
|
|
|
|
to the infamous buffer-object problem).
|
|
|
|
|
|
|
|
4. Memory is just a pointer with a length. There is no way to
|
|
|
|
describe what is "in" the memory (float, int, C-structure, etc.)
|
|
|
|
|
|
|
|
5. There is no shape information provided for the memory. But,
|
|
|
|
several array-like Python types could make use of a standard
|
|
|
|
way to describe the shape-interpretation of the memory
|
|
|
|
(wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
|
|
|
|
Libraries, ctypes, NumPy, data-base interfaces, etc.)
|
|
|
|
|
|
|
|
6. There is no way to share discontiguous memory (except through
|
|
|
|
the sequence of segments notion).
|
|
|
|
|
|
|
|
There are two widely used libraries that use the concept of
|
|
|
|
discontiguous memory: PIL and NumPy. Their view of discontiguous
|
|
|
|
arrays is different, though. The proposed buffer interface allows
|
|
|
|
sharing of either memory model. Exporters will use only one
|
|
|
|
approach and consumers may choose to support discontiguous
|
|
|
|
arrays of each type however they choose.
|
|
|
|
|
|
|
|
NumPy uses the notion of constant striding in each dimension as its
|
|
|
|
basic concept of an array. With this concept, a simple sub-region
|
|
|
|
of a larger array can be described without copying the data. T
|
|
|
|
Thus, stride information is the additional information that must be
|
|
|
|
shared.
|
|
|
|
|
|
|
|
The PIL uses a more opaque memory representation. Sometimes an
|
|
|
|
image is contained in a contiguous segment of memory, but sometimes
|
|
|
|
it is contained in an array of pointers to the contiguous segments
|
|
|
|
(usually lines) of the image. The PIL is where the idea of multiple
|
|
|
|
buffer segments in the original buffer interface came from.
|
|
|
|
|
|
|
|
NumPy's strided memory model is used more often in computational
|
|
|
|
libraries and because it is so simple it makes sense to support
|
|
|
|
memory sharing using this model. The PIL memory model is sometimes
|
|
|
|
used in C-code where a 2-d array can be then accessed using double
|
|
|
|
pointer indirection: e.g. image[i][j].
|
|
|
|
|
|
|
|
The buffer interface should allow the object to export either of these
|
|
|
|
memory models. Consumers are free to either require contiguous memory
|
|
|
|
or write code to handle one or both of these memory models.
|
|
|
|
|
|
|
|
Proposal Overview
|
|
|
|
=================
|
|
|
|
|
|
|
|
* Eliminate the char-buffer and multiple-segment sections of the
|
|
|
|
buffer-protocol.
|
|
|
|
|
|
|
|
* Unify the read/write versions of getting the buffer.
|
|
|
|
|
|
|
|
* Add a new function to the interface that should be called when
|
|
|
|
the consumer object is "done" with the memory area.
|
|
|
|
|
|
|
|
* Add a new variable to allow the interface to describe what is in
|
|
|
|
memory (unifying what is currently done now in struct and
|
|
|
|
array)
|
|
|
|
|
|
|
|
* Add a new variable to allow the protocol to share shape information
|
|
|
|
|
|
|
|
* Add a new variable for sharing stride information
|
|
|
|
|
|
|
|
* Add a new mechanism for sharing arrays that must
|
|
|
|
be accessed using pointer indirection.
|
|
|
|
|
|
|
|
* Fix all objects in the core and the standard library to conform
|
|
|
|
to the new interface
|
|
|
|
|
|
|
|
* Extend the struct module to handle more format specifiers
|
|
|
|
|
|
|
|
* Extend the buffer object into a new memory object which places
|
|
|
|
a Python veneer around the buffer interface.
|
|
|
|
|
|
|
|
* Add a few functions to make it easy to copy contiguous data
|
|
|
|
in and out of object supporting the buffer interface.
|
|
|
|
|
|
|
|
Specification
|
|
|
|
=============
|
|
|
|
|
|
|
|
While the new specification allows for complicated memory sharing.
|
|
|
|
Simple contiguous buffers of bytes can still be obtained from an
|
|
|
|
object. In fact, the new protocol allows a standard mechanism for
|
|
|
|
doing this even if the original object is not represented as a
|
|
|
|
contiguous chunk of memory.
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
The easiest way to obtain a simple contiguous chunk of memory is
|
|
|
|
to use the provided C-API to obtain a chunk of memory.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
|
|
|
|
Change the PyBufferProcs structure to
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
typedef struct {
|
|
|
|
getbufferproc bf_getbuffer;
|
|
|
|
releasebufferproc bf_releasebuffer;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
typedef int (*getbufferproc)(PyObject *obj, struct bufferinfo *view, int flags)
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
This function returns 0 on success and -1 on failure (and raises an
|
|
|
|
error). The first variable is the "exporting" object. The second
|
|
|
|
argument is the address to a bufferinfo structure. If view is NULL,
|
|
|
|
then no information is returned but a lock on the memory is still
|
2007-04-10 05:45:04 -04:00
|
|
|
obtained. In this case, the corresponding releasebuffer should also
|
|
|
|
be called with NULL.
|
|
|
|
|
|
|
|
The third argument indicates what kind of buffer the exporter is allowed to return. It tells the
|
|
|
|
exporter what elements the bufferinfo structure the consumer is going to make use of. This
|
|
|
|
allows the exporter to simplify and/or raise an error if it can't support the operation.
|
|
|
|
|
|
|
|
It also allows the caller to make a request for a simple "view" and
|
|
|
|
receive it or have an error raised if it's not possible.
|
|
|
|
|
|
|
|
All of the following assume that at least buf, len, and readonly will be
|
|
|
|
utilized by the caller.
|
|
|
|
|
|
|
|
Py_BUF_SIMPLE
|
|
|
|
The returned buffer will only be assumed to be readable (the object
|
|
|
|
may or may not have writeable memory). Only the buf, len, and
|
|
|
|
readonly variables may be accessed. The format will be
|
|
|
|
assumed to be unsigned bytes. This is a "stand-alone" flag constant.
|
|
|
|
It never needs to be \|'d to the others.
|
|
|
|
|
|
|
|
Py_BUF_WRITEABLE
|
|
|
|
The returned buffer must be writeable. If it cannot be, then raise an error.
|
|
|
|
|
|
|
|
Py_BUF_READONLY
|
|
|
|
The returned buffer must be readonly and the underlying object should make
|
|
|
|
its memory readonly if that is possible.
|
|
|
|
|
|
|
|
Py_BUF_FORMAT
|
|
|
|
The consumer will be using the format string information so make sure that
|
|
|
|
member is filled correctly.
|
|
|
|
|
|
|
|
Py_BUF_SHAPE
|
|
|
|
The consumer can (and might) make use of using the ndims and shape members of the structure
|
|
|
|
so make sure they are filled in correctly.
|
|
|
|
|
|
|
|
Py_BUF_STRIDES (implies SHAPE)
|
|
|
|
The consumer can (and might) make use of the strides member of the structure (as well
|
|
|
|
as ndims and shape)
|
|
|
|
|
|
|
|
Py_BUF_OFFSETS (implies STRIDES)
|
|
|
|
The consumer can (and might) make use of the suboffsets member (as well as
|
|
|
|
ndims, shape, and strides)
|
|
|
|
|
|
|
|
Thus, the consumer simply wanting an contiguous chunk of bytes from
|
|
|
|
the object would use Py_BUF_SIMPLE, while a consumer that understands
|
|
|
|
how to make use of the most complicated cases would use
|
|
|
|
Py_BUF_OFFSETS.
|
|
|
|
|
|
|
|
There is a C-API that simple exporting objects can use to fill-in the
|
|
|
|
buffer info structure correctly according to the provided flags if a
|
|
|
|
contiguous chunk of memory is all that can be exported.
|
|
|
|
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
The bufferinfo structure is::
|
|
|
|
|
|
|
|
struct bufferinfo {
|
|
|
|
void *buf;
|
|
|
|
Py_ssize_t len;
|
|
|
|
int readonly;
|
|
|
|
char *format;
|
|
|
|
int ndims;
|
|
|
|
Py_ssize_t *shape;
|
|
|
|
Py_ssize_t *strides;
|
|
|
|
Py_ssize_t *suboffsets;
|
2007-04-10 05:45:04 -04:00
|
|
|
void *internal;
|
2007-04-09 01:51:32 -04:00
|
|
|
};
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
Before calling this function, the bufferinfo structure can be filled with
|
|
|
|
whatever. Upon return from getbufferproc, the bufferinfo structure is filled in
|
2007-04-09 01:51:32 -04:00
|
|
|
with relevant information about the buffer. This same bufferinfo
|
|
|
|
structure must be passed to bf_releasebuffer (if available) when the
|
|
|
|
consumer is done with the memory. The caller is responsible for
|
2007-04-10 05:45:04 -04:00
|
|
|
keeping a reference to obj until releasebuffer is called (i.e. this
|
|
|
|
call does not alter the reference count of obj).
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
The members of the bufferinfo structure are:
|
|
|
|
|
|
|
|
buf
|
|
|
|
a pointer to the start of the memory for the object
|
|
|
|
|
|
|
|
len
|
|
|
|
the total bytes of memory the object uses. This should be the
|
|
|
|
same as the product of the shape array multiplied by the number of
|
|
|
|
bytes per item of memory.
|
|
|
|
|
|
|
|
readonly
|
|
|
|
an integer variable to hold whether or not the memory is
|
|
|
|
readonly. 1 means the memory is readonly, zero means the
|
|
|
|
memory is writeable.
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
format
|
|
|
|
a NULL-terminated format-string (following the struct-style syntax
|
|
|
|
including extensions) indicating what is in each element of
|
|
|
|
memory. The number of elements is len / itemsize, where itemsize
|
|
|
|
is the number of bytes implied by the format. For standard
|
|
|
|
unsigned bytes use a format string of "B".
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
ndims
|
|
|
|
a variable storing the number of dimensions the memory represents.
|
2007-04-10 05:45:04 -04:00
|
|
|
Must be >=0.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
shape
|
|
|
|
an array of ``Py_ssize_t`` of length ``ndims`` indicating the
|
|
|
|
shape of the memory as an N-D array. Note that ``((*shape)[0] *
|
2007-04-10 05:45:04 -04:00
|
|
|
... * (*shape)[ndims-1])*itemsize = len``.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
strides
|
|
|
|
address of a ``Py_ssize_t*`` variable that will be filled with a
|
|
|
|
pointer to an array of ``Py_ssize_t`` of length ``*ndims``
|
|
|
|
indicating the number of bytes to skip to get to the next element
|
2007-04-10 05:45:04 -04:00
|
|
|
in each dimension. For C-style contiguous arrays (where the
|
|
|
|
last-dimension varies the fastest) this must be filled in.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
suboffsets
|
|
|
|
address of a ``Py_ssize_t *`` variable that will be filled with a
|
|
|
|
pointer to an array of ``Py_ssize_t`` of length ``*ndims``. If
|
|
|
|
these suboffset numbers are >=0, then the value stored along the
|
|
|
|
indicated dimension is a pointer and the suboffset value dictates
|
|
|
|
how many bytes to add to the pointer after de-referencing. A
|
|
|
|
suboffset value that it negative indicates that no de-referencing
|
|
|
|
should occur (striding in a contiguous memory block). If all
|
|
|
|
suboffsets are negative (i.e. no de-referencing is needed, then
|
|
|
|
this must be NULL.
|
|
|
|
|
|
|
|
For clarity, here is a function that returns a pointer to the
|
|
|
|
element in an N-D array pointed to by an N-dimesional index when
|
|
|
|
there are both strides and suboffsets.::
|
|
|
|
|
|
|
|
void* get_item_pointer(int ndim, void* buf, Py_ssize_t* strides,
|
|
|
|
Py_ssize_t* suboffsets, Py_ssize_t *indices) {
|
|
|
|
char* pointer = (char*)buf;
|
|
|
|
int i;
|
|
|
|
for (i = 0; i < ndim; i++) {
|
|
|
|
pointer += strides[i]*indices[i];
|
|
|
|
if (suboffsets[i] >=0 ) {
|
|
|
|
pointer = *((char**)pointer) + suboffsets[i];
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return (void*)pointer;
|
|
|
|
}
|
|
|
|
|
|
|
|
Notice the suboffset is added "after" the dereferencing occurs.
|
|
|
|
Thus slicing in the ith dimension would add to the suboffsets in
|
2007-04-10 05:45:04 -04:00
|
|
|
the (i-1)st dimension. Slicing in the first dimension would change
|
2007-04-09 01:51:32 -04:00
|
|
|
the location of the starting pointer directly (i.e. buf would
|
|
|
|
be modified).
|
2007-04-10 05:45:04 -04:00
|
|
|
|
|
|
|
internal
|
|
|
|
This is for use internally by the exporting object. For example,
|
|
|
|
this might be re-cast as an integer by the exporter and used to
|
|
|
|
store flags about whether or not the shape, strides, and suboffsets
|
|
|
|
arrays must be freed when the buffer is released. The consumer
|
|
|
|
should never touch this value.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
|
|
|
|
The exporter is responsible for making sure the memory pointed to by
|
|
|
|
buf, format, shape, strides, and suboffsets is valid until
|
|
|
|
releasebuffer is called. If the exporter wants to be able to change
|
|
|
|
shape, strides, and/or suboffsets before releasebuffer is called then
|
2007-04-10 05:45:04 -04:00
|
|
|
it should allocate those arrays when getbuffer is called (pointing to
|
|
|
|
them in the buffer-info structure provided) and free them when
|
|
|
|
releasebuffer is called.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
The same bufferinfo struct should be used in the release-buffer
|
2007-04-09 01:51:32 -04:00
|
|
|
interface call. The caller is responsible for the memory of the
|
2007-04-10 05:45:04 -04:00
|
|
|
bufferinfo structure itself.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
``typedef int (*releasebufferproc)(PyObject *obj, struct bufferinfo *view)``
|
|
|
|
Callers of getbufferproc must make sure that this function is
|
|
|
|
called when memory previously acquired from the object is no
|
|
|
|
longer needed. The exporter of the interface must make sure that
|
|
|
|
any memory pointed to in the bufferinfo structure remains valid
|
|
|
|
until releasebuffer is called.
|
|
|
|
|
|
|
|
Both of these routines are optional for a type object
|
|
|
|
|
|
|
|
If the releasebuffer function is not provided then it does not ever
|
|
|
|
need to be called.
|
|
|
|
|
|
|
|
Exporters will need to define a releasebuffer function if they can
|
|
|
|
re-allocate their memory, strides, shape, suboffsets, or format
|
|
|
|
variables which they might share through the struct bufferinfo.
|
|
|
|
Several mechanisms could be used to keep track of how many getbuffer
|
|
|
|
calls have been made and shared. Either a single variable could be
|
|
|
|
used to keep track of how many "views" have been exported, or a
|
|
|
|
linked-list of bufferinfo structures filled in could be maintained in
|
2007-04-10 05:45:04 -04:00
|
|
|
each object.
|
|
|
|
|
|
|
|
All that is specifically required by the exporter, however, is to
|
|
|
|
ensure that any memory shared through the bufferinfo structure remains
|
|
|
|
valid until releasebuffer is called on the bufferinfo structure.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
|
|
|
|
New C-API calls are proposed
|
|
|
|
============================
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
int PyObject_CheckBuffer(PyObject *obj)
|
|
|
|
|
|
|
|
Return 1 if the getbuffer function is available otherwise 0.
|
|
|
|
|
|
|
|
::
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
int PyObject_GetBuffer(PyObject *obj, struct bufferinfo *view, int flags)
|
|
|
|
|
|
|
|
This is a C-API version of the getbuffer function call. It checks to
|
|
|
|
make sure object has the required function pointer and issues the
|
|
|
|
call. Returns -1 and raises an error on failure and returns 0 on
|
|
|
|
success.
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
int PyObject_ReleaseBuffer(PyObject *obj, struct bufferinfo *view)
|
|
|
|
|
|
|
|
This is a C-API version of the releasebuffer function call. It checks to
|
|
|
|
make sure the object has the required function pointer and issues the call. Returns 0
|
|
|
|
on success and -1 (with an error raised) on failure. This function always
|
|
|
|
succeeds if there is no releasebuffer function for the object.
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
PyObject *PyObject_GetMemoryView(PyObject *obj)
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
Return a memory-view object from an object that defines the buffer interface.
|
|
|
|
If make_ro is non-zero then request that the memory is made read-only until
|
|
|
|
release buffer is called.
|
|
|
|
|
|
|
|
A memory-view object is an extended buffer object that should replace
|
|
|
|
the buffer object in Python 3K. It's C-structure is::
|
|
|
|
|
|
|
|
typedef struct {
|
|
|
|
PyObject_HEAD
|
|
|
|
PyObject *base;
|
|
|
|
struct bufferinfo view;
|
|
|
|
int itemsize;
|
|
|
|
int flags;
|
|
|
|
} PyMemoryViewObject;
|
|
|
|
|
|
|
|
This is very similar to the current buffer object except offset has
|
|
|
|
been removed because ptr can just be modified by offset and a single
|
2007-04-10 05:45:04 -04:00
|
|
|
offset is not sufficient for the sub-offsets. Also the hash has been
|
|
|
|
removed because using the buffer object as a hash even if it is
|
|
|
|
read-only is rarely useful.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
Also, the format, ndims, shape, strides, and suboffsets have been
|
|
|
|
added. These additions will allow multi-dimensional slicing of the
|
|
|
|
memory-view object which can be added at some point. This object
|
|
|
|
always owns it's own shape, strides, and suboffsets arrays and it's
|
|
|
|
own format string, but always borrows the memory from the object
|
|
|
|
pointed to by base.
|
|
|
|
|
|
|
|
The itemsize is a convenience and specifies the number of bytes
|
|
|
|
indicated by the format string if positive.
|
|
|
|
|
|
|
|
This object never reallocates ptr, shape, strides, subboffsets or
|
|
|
|
format and therefore does not need to keep track of how many views it
|
|
|
|
has exported.
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
It exports a view using the base object. It releases a view by
|
|
|
|
releasing the view on the base object. Because, it will never
|
|
|
|
re-allocate memory, it does not need to keep track of how many it has
|
|
|
|
exported but simple reference counting will suffice.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
int PyObject_SizeFromFormat(char *)
|
|
|
|
|
|
|
|
Return the implied itemsize of the data-format area from a struct-style
|
|
|
|
description.
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
int PyObject_GetContiguous(PyObject *obj, void **buf, Py_ssize_t *len,
|
|
|
|
int fortran)
|
|
|
|
|
|
|
|
Return a contiguous chunk of memory representing the buffer. If a
|
|
|
|
copy is made then return 1. If no copy was needed return 0. If an
|
|
|
|
error occurred in probing the buffer interface, then return -1. The
|
|
|
|
contiguous chunk of memory is pointed to by ``*buf`` and the length of
|
|
|
|
that memory is ``*len``. If the object is multi-dimensional, then if
|
|
|
|
fortran is 1, the first dimension of the underlying array will vary
|
|
|
|
the fastest in the buffer. If fortran is 0, then the last dimension
|
|
|
|
will vary the fastest (C-style contiguous). If fortran is -1, then it
|
2007-04-10 05:45:04 -04:00
|
|
|
does not matter and you will get whatever the object decides is more
|
|
|
|
efficient.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
|
|
|
|
int fortran)
|
|
|
|
|
|
|
|
Copy ``len`` bytes of data pointed to by the contiguous chunk of
|
|
|
|
memory pointed to by ``buf`` into the buffer exported by obj. Return
|
|
|
|
0 on success and return -1 and raise an error on failure. If the
|
|
|
|
object does not have a writeable buffer, then an error is raised. If
|
|
|
|
fortran is 1, then if the object is multi-dimensional, then the data
|
|
|
|
will be copied into the array in Fortran-style (first dimension varies
|
|
|
|
the fastest). If fortran is 0, then the data will be copied into the
|
|
|
|
array in C-style (last dimension varies the fastest). If fortran is -1, then
|
2007-04-10 05:45:04 -04:00
|
|
|
it does not matter and the copy will be made in whatever way is more
|
|
|
|
efficient.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
The last two C-API calls allow a standard way of getting data in and
|
|
|
|
out of Python objects into contiguous memory areas no matter how it is
|
|
|
|
actually stored. These calls use the extended buffer interface to perform
|
|
|
|
their work.
|
|
|
|
|
|
|
|
::
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
int PyObject_IsContiguous(struct bufferinfo *view, int fortran);
|
2007-04-09 01:51:32 -04:00
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
Return 1 if the memory defined by the view object is C-style (fortran = 0)
|
|
|
|
or Fortran-style (fortran = 1) contiguous. Return 0 otherwise.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
void PyObject_FillContiguousStrides(int *ndims, Py_ssize_t *shape,
|
|
|
|
int itemsize,
|
2007-04-10 05:45:04 -04:00
|
|
|
Py_ssize_t *strides, int fortran)
|
2007-04-09 01:51:32 -04:00
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
Fill the strides array with byte-strides of a contiguous (C-style if
|
|
|
|
fortran is 0 or Fortran-style if fortran is 1) array of the given
|
|
|
|
shape with the given number of bytes per element.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
::
|
|
|
|
|
|
|
|
int PyObject_FillBufferInfo(struct bufferinfo *view, void *buf, Py_ssize_t len,
|
|
|
|
int readonly, int infoflags)
|
|
|
|
|
|
|
|
Fills in a buffer-info structure correctly for an exporter that can only share
|
|
|
|
a contiguous chunk of memory of "unsigned bytes" of the given length. Returns 0 on success
|
|
|
|
and -1 (with raising an error) on error
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
|
|
|
|
Additions to the struct string-syntax
|
|
|
|
=====================================
|
|
|
|
|
|
|
|
The struct string-syntax is missing some characters to fully
|
|
|
|
implement data-format descriptions already available elsewhere (in
|
|
|
|
ctypes and NumPy for example). The Python 2.5 specification is
|
|
|
|
at http://docs.python.org/lib/module-struct.html
|
|
|
|
|
|
|
|
Here are the proposed additions:
|
|
|
|
|
|
|
|
|
|
|
|
================ ===========
|
|
|
|
Character Description
|
|
|
|
================ ===========
|
|
|
|
't' bit (number before states how many bits)
|
|
|
|
'?' platform _Bool type
|
|
|
|
'g' long double
|
|
|
|
'c' ucs-1 (latin-1) encoding
|
|
|
|
'u' ucs-2
|
|
|
|
'w' ucs-4
|
|
|
|
'O' pointer to Python Object
|
|
|
|
'Z' complex (whatever the next specifier is)
|
|
|
|
'&' specific pointer (prefix before another charater)
|
|
|
|
'T{}' structure (detailed layout inside {})
|
|
|
|
'(k1,k2,...,kn)' multi-dimensional array of whatever follows
|
|
|
|
':name:' optional name of the preceeding element
|
|
|
|
'X{}' pointer to a function (optional function
|
|
|
|
signature inside {})
|
2007-04-10 05:45:04 -04:00
|
|
|
' \n\t' ignored (allow better readability) -- this may already be true
|
2007-04-09 01:51:32 -04:00
|
|
|
================ ===========
|
|
|
|
|
|
|
|
The struct module will be changed to understand these as well and
|
|
|
|
return appropriate Python objects on unpacking. Un-packing a
|
2007-04-10 05:45:04 -04:00
|
|
|
long-double will return a decimal object or a ctypes long-double.
|
|
|
|
Unpacking 'u' or 'w' will return Python unicode. Unpacking a
|
|
|
|
multi-dimensional array will return a list of lists. Un-packing a
|
|
|
|
pointer will return a ctypes pointer object. Un-packing a bit will
|
|
|
|
return a Python Bool. Spaces in the struct-string syntax will be
|
|
|
|
ignored. Unpacking a named-object will return a Python class with
|
|
|
|
attributes having those names.
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
Endian-specification ('=','>','<') is also allowed inside the
|
|
|
|
string so that it can change if needed. The previously-specified
|
|
|
|
endian string is in force until changed. The default endian is '='.
|
|
|
|
|
|
|
|
According to the struct-module, a number can preceed a character
|
|
|
|
code to specify how many of that type there are. The
|
|
|
|
(k1,k2,...,kn) extension also allows specifying if the data is
|
|
|
|
supposed to be viewed as a (C-style contiguous, last-dimension
|
|
|
|
varies the fastest) multi-dimensional array of a particular format.
|
|
|
|
|
|
|
|
Functions should be added to ctypes to create a ctypes object from
|
|
|
|
a struct description, and add long-double, and ucs-2 to ctypes.
|
|
|
|
|
|
|
|
Examples of Data-Format Descriptions
|
|
|
|
====================================
|
|
|
|
|
|
|
|
Here are some examples of C-structures and how they would be
|
|
|
|
represented using the struct-style syntax:
|
|
|
|
|
|
|
|
float
|
|
|
|
'f'
|
|
|
|
complex double
|
|
|
|
'Zd'
|
|
|
|
RGB Pixel data
|
|
|
|
'BBB' or 'B:r: B:g: B:b:'
|
|
|
|
Mixed endian (weird but possible)
|
|
|
|
'>i:big: <i:little:'
|
|
|
|
Nested structure
|
|
|
|
::
|
|
|
|
|
|
|
|
struct {
|
|
|
|
int ival;
|
|
|
|
struct {
|
|
|
|
unsigned short sval;
|
|
|
|
unsigned char bval;
|
|
|
|
unsigned char cval;
|
|
|
|
} sub;
|
|
|
|
}
|
2007-04-10 05:45:04 -04:00
|
|
|
"""i:ival:
|
|
|
|
T{
|
|
|
|
H:sval:
|
|
|
|
B:bval:
|
|
|
|
B:cval:
|
|
|
|
}:sub:
|
|
|
|
"""
|
2007-04-09 01:51:32 -04:00
|
|
|
Nested array
|
|
|
|
::
|
|
|
|
|
|
|
|
struct {
|
|
|
|
int ival;
|
|
|
|
double data[16*4];
|
|
|
|
}
|
|
|
|
'i:ival: (16,4)d:data:'
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
|
2007-04-09 01:51:32 -04:00
|
|
|
Code to be affected
|
|
|
|
===================
|
|
|
|
|
|
|
|
All objects and modules in Python that export or consume the old
|
|
|
|
buffer interface will be modified. Here is a partial list.
|
|
|
|
|
|
|
|
* buffer object
|
|
|
|
* bytes object
|
|
|
|
* string object
|
|
|
|
* array module
|
|
|
|
* struct module
|
|
|
|
* mmap module
|
|
|
|
* ctypes module
|
|
|
|
|
|
|
|
Anything else using the buffer API.
|
|
|
|
|
|
|
|
|
|
|
|
Issues and Details
|
|
|
|
==================
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
It is intended that this PEP will be back-ported to Python 2.6 by
|
|
|
|
adding the C-API and the two functions to the existing buffer
|
|
|
|
protocol.
|
|
|
|
|
2007-04-09 01:51:32 -04:00
|
|
|
The proposed locking mechanism relies entirely on the exporter object
|
|
|
|
to not invalidate any of the memory pointed to by the buffer structure
|
|
|
|
until a corresponding releasebuffer is called. If it wants to be able
|
|
|
|
to change its own shape and/or strides arrays, then it needs to create
|
|
|
|
memory for these in the bufferinfo structure and copy information
|
|
|
|
over.
|
|
|
|
|
|
|
|
The sharing of strided memory and suboffsets is new and can be seen as
|
|
|
|
a modification of the multiple-segment interface. It is motivated by
|
|
|
|
NumPy and the PIL. NumPy objects should be able to share their
|
|
|
|
strided memory with code that understands how to manage strided memory
|
|
|
|
because strided memory is very common when interfacing with compute
|
|
|
|
libraries.
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
Also, with this approach it should be possible to write generic code
|
2007-04-09 01:51:32 -04:00
|
|
|
that works with both kinds of memory.
|
|
|
|
|
|
|
|
Memory management of the format string, the shape array, the strides
|
|
|
|
array, and the suboffsets array in the bufferinfo structure is always
|
|
|
|
the responsibility of the exporting object. The consumer should not
|
|
|
|
set these pointers to any other memory or try to free them.
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
Several ideas were discussed and rejected:
|
|
|
|
|
|
|
|
Having a "releaser" object whose release-buffer was called. This
|
|
|
|
was deemed unacceptable because it caused the protocol to be
|
|
|
|
asymmetric (you called release on something different than you
|
|
|
|
"got" the buffer from). It also complicated the protocol without
|
|
|
|
providing a real benefit.
|
|
|
|
|
|
|
|
Passing all the struct variables separately into the function.
|
|
|
|
This had the advantage that it allowed one to set NULL to
|
|
|
|
variables that were not of interest, but it also made the function
|
|
|
|
call more difficult. The flags variable allows the same
|
|
|
|
ability of consumers to be "simple" in how they call the protocol.
|
|
|
|
|
2007-04-09 01:51:32 -04:00
|
|
|
Code
|
|
|
|
========
|
|
|
|
|
|
|
|
The authors of the PEP promise to contribute and maintain the code for
|
|
|
|
this proposal but will welcome any help.
|
|
|
|
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
|
|
|
|
|
2007-04-09 01:51:32 -04:00
|
|
|
Examples
|
|
|
|
=========
|
|
|
|
|
|
|
|
Ex. 1
|
|
|
|
----------
|
|
|
|
|
|
|
|
This example shows how an image object that uses contiguous lines might expose its buffer.::
|
|
|
|
|
|
|
|
struct rgba {
|
|
|
|
unsigned char r, g, b, a;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct ImageObject {
|
|
|
|
PyObject_HEAD;
|
|
|
|
...
|
|
|
|
struct rgba** lines;
|
|
|
|
Py_ssize_t height;
|
|
|
|
Py_ssize_t width;
|
|
|
|
Py_ssize_t shape_array[2];
|
|
|
|
Py_ssize_t stride_array[2];
|
|
|
|
Py_ssize_t view_count;
|
|
|
|
};
|
|
|
|
|
|
|
|
"lines" points to malloced 1-D array of (struct rgba*). Each pointer
|
|
|
|
in THAT block points to a seperately malloced array of (struct rgba).
|
|
|
|
|
|
|
|
In order to access, say, the red value of the pixel at x=30, y=50, you'd use "lines[50][30].r".
|
|
|
|
|
|
|
|
So what does ImageObject's getbuffer do? Leaving error checking out::
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
int Image_getbuffer(PyObject *self, struct bufferinfo *view, int flags) {
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
static Py_ssize_t suboffsets[2] = { -1, 0 };
|
|
|
|
|
|
|
|
view->buf = self->lines;
|
|
|
|
view->len = self->height*self->width;
|
|
|
|
view->readonly = 0;
|
|
|
|
view->ndims = 2;
|
|
|
|
self->shape_array[0] = height;
|
|
|
|
self->shape_array[1] = width;
|
|
|
|
view->shape = &self->shape_array;
|
|
|
|
self->stride_array[0] = sizeof(struct rgba*);
|
|
|
|
self->stride_array[1] = sizeof(struct rgba);
|
|
|
|
view->strides = &self->stride_array;
|
|
|
|
view->suboffsets = suboffsets;
|
|
|
|
|
|
|
|
self->view_count ++;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
int Image_releasebuffer(PyObject *self, struct bufferinfo *view) {
|
|
|
|
self->view_count--;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-04-10 05:45:04 -04:00
|
|
|
Ex. 2
|
|
|
|
-----------
|
|
|
|
|
|
|
|
This example shows how an object that wants to expose a contiguous
|
|
|
|
chunk of memory (which will never be re-allocated while the object is
|
|
|
|
alive) would do that.::
|
|
|
|
|
|
|
|
int myobject_getbuffer(PyObject *self, struct bufferinfo *view, int flags) {
|
|
|
|
|
|
|
|
void *buf;
|
|
|
|
Py_ssize_t len;
|
|
|
|
int readonly=0;
|
|
|
|
|
|
|
|
buf = /* Point to buffer */
|
|
|
|
len = /* Set to size of buffer */
|
|
|
|
readonly = /* Set to 1 if readonly */
|
|
|
|
|
|
|
|
return PyObject_FillBufferInfo(view, buf, len, readonly, flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* No releasebuffer is necessary because the memory will never
|
|
|
|
be re-allocated so the locking mechanism is not needed
|
|
|
|
*/
|
|
|
|
|
|
|
|
Ex. 3
|
|
|
|
-----------
|
|
|
|
|
|
|
|
A consumer that wants to only get a simple contiguous chunk of bytes
|
|
|
|
from a Python object, obj would do the following::
|
|
|
|
|
|
|
|
|
|
|
|
struct bufferinfo view;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (PyObject_GetBuffer(obj, &view, Py_BUF_SIMPLE) < 0) {
|
|
|
|
/* error return */
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Now, view.buf is the pointer to memory
|
|
|
|
view.len is the length
|
|
|
|
view.readonly is whether or not the memory is read-only.
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
|
|
/* After using the information and you don't need it anymore */
|
|
|
|
|
|
|
|
if (PyObject_ReleaseBuffer(obj, &view) < 0) {
|
|
|
|
/* error return */
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
2007-04-09 01:51:32 -04:00
|
|
|
|
|
|
|
Copyright
|
|
|
|
=========
|
|
|
|
|
|
|
|
This PEP is placed in the public domain
|
|
|
|
|