PEP 3118 updates from Travis

2007-04-10 09:45:04 +00:00 · ead0191bee
parent f77cdbd262
commit ead0191bee
1 changed file with 223 additions and 55 deletions
--- a/pep-3118.txt
+++ b/pep-3118.txt
@@ -135,8 +135,8 @@ object.  In fact, the new protocol allows a standard mechanism for
 doing this even if the original object is not represented as a
 contiguous chunk of memory.

-The easiest way is to use the provided C-API to obtain a contiguous
-chunk of memory like the old buffer protocol allowed.
+The easiest way to obtain a simple contiguous chunk of memory is
+to use the provided C-API.


 Change the PyBufferProcs structure to
@@ -151,13 +151,64 @@ Change the PyBufferProcs structure to

 ::

-    typedef int (*getbufferproc)(PyObject *obj, struct bufferinfo *view) 
+    typedef int (*getbufferproc)(PyObject *obj, struct bufferinfo *view, int flags);

 This function returns 0 on success and -1 on failure (and raises an
 error). The first variable is the "exporting" object.  The second
 argument is the address to a bufferinfo structure.  If view is NULL,
 then no information is returned but a lock on the memory is still
-obtained.  In this case, releasebuffer should also be called with NULL.
+obtained.  In this case, the corresponding releasebuffer should also
+be called with NULL.
+
+The third argument indicates what kind of buffer the exporter is allowed to return.  It tells the
+exporter which members of the bufferinfo structure the consumer is going to use.  This
+allows the exporter to simplify its work and/or raise an error if it can't support the request.
+
+It also allows the caller to make a request for a simple "view" and
+receive it or have an error raised if it's not possible. 
+
+All of the following assume that at least buf, len, and readonly will be
+utilized by the caller.
+
+ Py_BUF_SIMPLE
+   The returned buffer will only be assumed to be readable (the object
+   may or may not have writeable memory).  Only the buf, len, and
+   readonly variables may be accessed. The format will be
+   assumed to be unsigned bytes.  This is a "stand-alone" flag constant.
+   It never needs to be \|'d with the others.
+
+ Py_BUF_WRITEABLE
+   The returned buffer must be writeable.  If it cannot be, then raise an error. 
+
+ Py_BUF_READONLY
+   The returned buffer must be readonly and the underlying object should make
+   its memory readonly if that is possible.  
+
+ Py_BUF_FORMAT
+   The consumer will be using the format string information, so make sure that
+   member is filled in correctly.
+
+ Py_BUF_SHAPE
+   The consumer can (and might) make use of the ndims and shape members of the structure,
+   so make sure they are filled in correctly.
+   
+ Py_BUF_STRIDES (implies SHAPE)
+   The consumer can (and might) make use of the strides member of the structure (as well
+   as ndims and shape).
+
+ Py_BUF_OFFSETS (implies STRIDES)
+   The consumer can (and might) make use of the suboffsets member (as well as
+   ndims, shape, and strides).
+
+Thus, the consumer simply wanting a contiguous chunk of bytes from
+the object would use Py_BUF_SIMPLE, while a consumer that understands
+how to make use of the most complicated cases would use
+Py_BUF_OFFSETS.
+
+There is a C-API that simple exporting objects can use to fill in the
+buffer info structure correctly according to the provided flags if a
+contiguous chunk of memory is all that can be exported.
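
The "implies" relationships between the flags can be modelled by letting each stronger flag include the bits of the weaker one. The following is a minimal Python sketch of that idea, not the proposed C implementation. The numeric values and the helper ``members_consumer_may_use`` are assumptions made for illustration; the text above names the flags but does not fix their values.

```python
# Hypothetical bit values: the flag names come from the text above, but the
# numbers are assumptions chosen only to make the sketch work.
_STRIDES_BIT = 0x10
_OFFSETS_BIT = 0x20

PY_BUF_SIMPLE = 0x00                              # stand-alone: buf, len, readonly
PY_BUF_WRITEABLE = 0x01
PY_BUF_READONLY = 0x02
PY_BUF_FORMAT = 0x04
PY_BUF_SHAPE = 0x08
PY_BUF_STRIDES = _STRIDES_BIT | PY_BUF_SHAPE      # STRIDES implies SHAPE
PY_BUF_OFFSETS = _OFFSETS_BIT | PY_BUF_STRIDES    # OFFSETS implies STRIDES


def members_consumer_may_use(flags):
    """Return the bufferinfo members a consumer passing `flags` may read."""
    members = ["buf", "len", "readonly"]   # always assumed to be used
    if flags & PY_BUF_FORMAT:
        members.append("format")
    if flags & PY_BUF_SHAPE:
        members += ["ndims", "shape"]
    if flags & _STRIDES_BIT:
        members.append("strides")
    if flags & _OFFSETS_BIT:
        members.append("suboffsets")
    return members
```

With this encoding an exporter only has to test individual bits; a request for ``PY_BUF_OFFSETS`` automatically reports that shape and strides will be read as well.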
+

 The bufferinfo structure is::

@@ -170,18 +221,19 @@ The bufferinfo structure is::
       Py_ssize_t *shape;
       Py_ssize_t *strides;
       Py_ssize_t *suboffsets;
+     void *internal;
  };

-Upon return from getbufferproc, the bufferinfo structure is filled in
+Before calling this function, the bufferinfo structure may contain
+arbitrary data.  Upon return from getbufferproc, the bufferinfo structure is filled in
 with relevant information about the buffer.  This same bufferinfo
 structure must be passed to bf_releasebuffer (if available) when the
 consumer is done with the memory. The caller is responsible for
-keeping a reference to obj until releasebuffer is called.
-
+keeping a reference to obj until releasebuffer is called (i.e. this
+call does not alter the reference count of obj). 

 The members of the bufferinfo structure are:

-
 buf
    a pointer to the start of the memory for the object

@@ -195,29 +247,28 @@ readonly
    readonly.  1 means the memory is readonly, zero means the
    memory is writeable.

-
-format
-    a format-string (following extended struct syntax) indicating what
-    is in each element of of memory.  The number of elements is len /
-    itemsize, where itemsize is the number of bytes implied by the
-    format.  For standard unsigned bytes use a format string of "B".
+format 
+    a NULL-terminated format-string (following the struct-style syntax
+    including extensions) indicating what is in each element of
+    memory.  The number of elements is len / itemsize, where itemsize
+    is the number of bytes implied by the format.  For standard
+    unsigned bytes use a format string of "B".

 ndims
    a variable storing the number of dimensions the memory represents.
-    Should be >=0. 
+    Must be >=0. 

 shape
    an array of ``Py_ssize_t`` of length ``ndims`` indicating the
    shape of the memory as an N-D array.  Note that ``((*shape)[0] *
-    ... * (*shape)[ndims-1])*itemsize = len``.  This can be NULL
-    to indicate 1-d arrays. 
+    ... * (*shape)[ndims-1])*itemsize = len``.

 strides 
    address of a ``Py_ssize_t*`` variable that will be filled with a
    pointer to an array of ``Py_ssize_t`` of length ``*ndims``
    indicating the number of bytes to skip to get to the next element
-    in each dimension.  If this is NULL, then the memory is assumed to
-    be C-style contigous with the last dimension varying the fastest.
+    in each dimension.  For C-style contiguous arrays (where the
+    last dimension varies the fastest) this must still be filled in.

 suboffsets
    address of a ``Py_ssize_t *`` variable that will be filled with a
@@ -249,22 +300,30 @@ suboffsets

    Notice the suboffset is added "after" the dereferencing occurs.
    Thus slicing in the ith dimension would add to the suboffsets in
-    the i-1st dimension.  Slicing in the first dimension would change
+    the (i-1)st dimension.  Slicing in the first dimension would change
    the location of the starting pointer directly (i.e. buf would
    be modified).  
+
+internal
+    This is for use internally by the exporting object.  For example,
+    this might be re-cast as an integer by the exporter and used to 
+    store flags about whether or not the shape, strides, and suboffsets
+    arrays must be freed when the buffer is released.   The consumer
+    should never touch this value. 
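
The relation between format, shape, and len stated for the members above can be checked with a few lines of Python. This is only a sketch: ``expected_len`` is a hypothetical helper, and it relies on the existing ``struct`` module, which understands plain format characters such as "B" or "d" but not the extensions proposed in this PEP.

```python
import math
import struct


def expected_len(fmt, shape):
    """Return the number of bytes implied by a format string and a shape."""
    itemsize = struct.calcsize(fmt)   # bytes per element implied by the format
    count = math.prod(shape)          # the product of an empty shape is 1
    return count * itemsize
```

For example, a (16, 4) array of little-endian doubles ("<d") should report a len of 512, and ten unsigned bytes ("B") should report a len of 10.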
    

 The exporter is responsible for making sure the memory pointed to by
 buf, format, shape, strides, and suboffsets is valid until
 releasebuffer is called.  If the exporter wants to be able to change
 shape, strides, and/or suboffsets before releasebuffer is called then
-it should allocate those arrays when getbuffer is called and free them
-when releasebuffer is called.
+it should allocate those arrays when getbuffer is called (pointing to
+them in the buffer-info structure provided) and free them when
+releasebuffer is called.


-The same bufferinfo struct should be used in the other buffer
+The same bufferinfo struct should be used in the release-buffer
 interface call. The caller is responsible for the memory of the
-bufferinfo object itself.
+bufferinfo structure itself. 

 ``typedef int (*releasebufferproc)(PyObject *obj, struct bufferinfo *view)``
    Callers of getbufferproc must make sure that this function is
@@ -285,9 +344,11 @@ Several mechanisms could be used to keep track of how many getbuffer
 calls have been made and shared.  Either a single variable could be
 used to keep track of how many "views" have been exported, or a
 linked-list of bufferinfo structures filled in could be maintained in
-each objet.  All that is needed is to ensure that any memory shared
-through the bufferinfo structure remains valid until releasebuffer is
-called on that memory.
+each object.  
+
+All that is specifically required of the exporter, however, is to
+ensure that any memory shared through the bufferinfo structure remains
+valid until releasebuffer is called on the bufferinfo structure.
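
The "single variable" bookkeeping mentioned above might look like the following toy model. It is written in Python purely for illustration; ``ResizableExporter`` and its methods are hypothetical names, and a real exporter would implement the same idea in C inside its getbuffer and releasebuffer functions.

```python
class ResizableExporter:
    """Toy exporter that counts outstanding views (hypothetical, not CPython code)."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._exports = 0   # views handed out and not yet released

    def getbuffer(self):
        self._exports += 1
        return self._data   # stands in for a filled-in bufferinfo structure

    def releasebuffer(self):
        if self._exports == 0:
            raise RuntimeError("releasebuffer called without a matching getbuffer")
        self._exports -= 1

    def resize(self, size):
        # Re-allocating would invalidate memory that a consumer still points at,
        # so refuse while any view is outstanding.
        if self._exports:
            raise BufferError("cannot resize while views are exported")
        self._data = bytearray(size)
```

The counter is enough when the exporter only needs to know whether any view exists; the linked-list alternative would be needed only if it had to inspect the individual views.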


 New C-API calls are proposed
@@ -301,7 +362,25 @@ Return 1 if the getbuffer function is available otherwise 0.

 ::

-    PyObject *PyObject_GetBuffer(PyObject *obj)
+    int PyObject_GetBuffer(PyObject *obj, struct bufferinfo *view, int flags)
+
+This is a C-API version of the getbuffer function call.  It checks to
+make sure the object has the required function pointer and issues the
+call.  Returns -1 and raises an error on failure and returns 0 on
+success.
+
+::
+
+    int PyObject_ReleaseBuffer(PyObject *obj, struct bufferinfo *view)
+
+This is a C-API version of the releasebuffer function call.  It checks to
+make sure the object has the required function pointer and issues the call.  Returns 0
+on success and -1 (with an error raised) on failure. This function always 
+succeeds if there is no releasebuffer function for the object. 
+
+::
+
+    PyObject *PyObject_GetMemoryView(PyObject *obj)

 Return a memory-view object from an object that defines the buffer interface. 
 If make_ro is non-zero then request that the memory is made read-only until 
@@ -320,9 +399,9 @@ the buffer object in Python 3K.  It's C-structure is::

 This is very similar to the current buffer object except offset has
 been removed because ptr can just be modified by offset and a single
-offset is not sufficient.  Also the hash has been removed because
-using the buffer object as a hash even if it is read-only is rarely
-useful.  
+offset is not sufficient for the sub-offsets.  Also the hash has been
+removed because using the buffer object as a hash even if it is
+read-only is rarely useful.

 Also, the format, ndims, shape, strides, and suboffsets have been
 added. These additions will allow multi-dimensional slicing of the
@@ -338,10 +417,10 @@ This object never reallocates ptr, shape, strides, subboffsets or
 format and therefore does not need to keep track of how many views it
 has exported.

-It exports a view using the base object.  It releases a view by releasing
-the view on the base object.  Because, it will never re-allocate memory, 
-it does not need to keep track of how many it has exported but simple 
-reference counting will suffice. 
+It exports a view using the base object.  It releases a view by
+releasing the view on the base object.  Because it will never
+re-allocate memory, it does not need to keep track of how many views it
+has exported; simple reference counting will suffice.

 ::

@@ -363,7 +442,8 @@ that memory is ``*len``.  If the object is multi-dimensional, then if
 fortran is 1, the first dimension of the underlying array will vary
 the fastest in the buffer.  If fortran is 0, then the last dimension
 will vary the fastest (C-style contiguous). If fortran is -1, then it
-does not matter and you will get whatever the object decides is easiest.
+does not matter and you will get whatever the object decides is more
+efficient.

 :: 

@@ -378,8 +458,8 @@ fortran is 1, then if the object is multi-dimensional, then the data
 will be copied into the array in Fortran-style (first dimension varies
 the fastest).  If fortran is 0, then the data will be copied into the
 array in C-style (last dimension varies the fastest).  If fortran is -1, then
-it does not matter and the copy will be made in whatever way is
-easiest. 
+it does not matter and the copy will be made in whatever way is more
+efficient.

 The last two C-API calls allow a standard way of getting data in and
 out of Python objects into contiguous memory areas no matter how it is
@@ -388,20 +468,29 @@ their work.

 ::

-    int PyObject_IsContiguous(struct bufferinfo *view);
+    int PyObject_IsContiguous(struct bufferinfo *view, int fortran);

-Return 1 if the memory defined by the view object is C-style
-contiguous.  Return 0 otherwise.
+Return 1 if the memory defined by the view object is C-style (fortran = 0)
+or Fortran-style (fortran = 1) contiguous.  Return 0 otherwise.

 ::

    void PyObject_FillContiguousStrides(int *ndims, Py_ssize_t *shape,
                                        int itemsize, 
-                                        Py_ssize_t *strides)
+                                        Py_ssize_t *strides, int fortran)

-Fill the strides array with byte-strides of a contiguous array of the
-given shape with the given number of bytes per element. 
+Fill the strides array with byte-strides of a contiguous (C-style if
+fortran is 0 or Fortran-style if fortran is 1) array of the given
+shape with the given number of bytes per element.
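
The stride computation described above, and the contiguity test that goes with it, can be sketched in Python as follows. The function names are hypothetical stand-ins for the proposed C calls, and the contiguity check is deliberately simplified: it compares against the freshly computed strides and does not special-case dimensions of length 1.

```python
def fill_contiguous_strides(shape, itemsize, fortran):
    """Return byte strides for a contiguous array of the given shape."""
    ndims = len(shape)
    strides = [0] * ndims
    step = itemsize
    # Fortran order: the first dimension varies fastest, so walk forwards.
    # C order: the last dimension varies fastest, so walk backwards.
    order = range(ndims) if fortran else reversed(range(ndims))
    for dim in order:
        strides[dim] = step
        step *= shape[dim]
    return strides


def is_contiguous(shape, strides, itemsize, fortran):
    """Return True if `strides` describe a contiguous layout in the given order."""
    return list(strides) == fill_contiguous_strides(shape, itemsize, fortran)
```

For a (2, 3, 4) array of 8-byte elements this gives strides of 96, 32, 8 in C order and 8, 16, 48 in Fortran order.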

+::
+
+    int PyObject_FillBufferInfo(struct bufferinfo *view, void *buf, Py_ssize_t len,
+                                 int readonly, int infoflags)
+
+Fills in a buffer-info structure correctly for an exporter that can only share
+a contiguous chunk of memory of "unsigned bytes" of the given length.  Returns 0 on success
+and -1 (with an error raised) on failure.


 Additions to the struct string-syntax
@@ -432,18 +521,18 @@ Character         Description
 ':name:'          optional name of the preceding element
 'X{}'             pointer to a function (optional function 
                                         signature inside {})
-' '               ignored (allow better readability)
+' \n\t'           ignored (allow better readability) -- this may already be true
 ================  ===========

 The struct module will be changed to understand these as well and
 return appropriate Python objects on unpacking.  Un-packing a
-long-double will return a decimal object.  Unpacking 'u' or
-'w' will return Python unicode.  Unpacking a multi-dimensional
-array will return a list of lists.  Un-packing a pointer will
-return a ctypes pointer object.  Un-packing a bit will return a
-Python Bool.  Spaces in the struct-string syntax will be ignored.
-Unpacking a named-object will return a Python class with attributes 
-having those names. 
+long-double will return a decimal object or a ctypes long-double.
+Unpacking 'u' or 'w' will return Python unicode.  Unpacking a
+multi-dimensional array will return a list of lists.  Un-packing a
+pointer will return a ctypes pointer object.  Un-packing a bit will
+return a Python Bool.  Spaces in the struct-string syntax will be
+ignored.  Unpacking a named-object will return a Python class with
+attributes having those names.

 Endian-specification ('=','>','<') is also allowed inside the
 string so that it can change if needed.  The previously-specified
@@ -483,7 +572,13 @@ Nested structure
                 unsigned char cval;
             } sub;
        }
-        'i:ival: T{H:sval: B:bval: B:cval:}:sub:'
+        """i:ival: 
+           T{
+              H:sval: 
+              B:bval: 
+              B:cval:
+            }:sub:
+        """
 Nested array
    ::

@@ -493,6 +588,7 @@ Nested array
        }
        'i:ival: (16,4)d:data:'
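
The existing ``struct`` module does not understand the ``T{}`` grouping or the ``:name:`` syntax proposed here, but the intended result of unpacking the nested structure example can be approximated today. The sketch below is an assumption-laden illustration: it flattens the example to "<iHBB" (little-endian, standard sizes, no padding) and uses hypothetical namedtuples ``Outer`` and ``Sub`` to stand in for the "Python class with attributes having those names".

```python
import struct
from collections import namedtuple

# Hypothetical result types mirroring the field names in the example.
Sub = namedtuple("Sub", ["sval", "bval", "cval"])
Outer = namedtuple("Outer", ["ival", "sub"])

# Flat equivalent of 'i:ival: T{H:sval: B:bval: B:cval:}:sub:' that the
# current struct module accepts.
FLAT_FORMAT = "<iHBB"


def unpack_nested(data):
    """Unpack the flat bytes and rebuild the nested, named record."""
    ival, sval, bval, cval = struct.unpack(FLAT_FORMAT, data)
    return Outer(ival, Sub(sval, bval, cval))
```

A record packed with ``struct.pack(FLAT_FORMAT, -5, 513, 7, 9)`` can then be read back as ``rec.ival`` and ``rec.sub.sval``.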

+
 Code to be affected
 ===================

@@ -513,6 +609,10 @@ Anything else using the buffer API.
 Issues and Details
 ==================

+It is intended that this PEP will be back-ported to Python 2.6 by
+adding the C-API and the two functions to the existing buffer
+protocol.
+
 The proposed locking mechanism relies entirely on the exporter object
 to not invalidate any of the memory pointed to by the buffer structure
 until a corresponding releasebuffer is called.  If it wants to be able
@@ -527,7 +627,7 @@ strided memory with code that understands how to manage strided memory
 because strided memory is very common when interfacing with compute
 libraries.

-Also with this approach it should be possible to write generic code
+Also, with this approach it should be possible to write generic code
 that works with both kinds of memory.

 Memory management of the format string, the shape array, the strides
@@ -535,6 +635,20 @@ array, and the suboffsets array in the bufferinfo structure is always
 the responsibility of the exporting object.  The consumer should not
 set these pointers to any other memory or try to free them. 

+Several ideas were discussed and rejected: 
+
+    Having a "releaser" object whose release-buffer was called.  This
+    was deemed unacceptable because it caused the protocol to be
+    asymmetric (you called release on something different than you
+    "got" the buffer from).  It also complicated the protocol without
+    providing a real benefit.
+
+    Passing all the struct variables separately into the function.
+    This had the advantage that it allowed one to pass NULL for
+    variables that were not of interest, but it also made the function
+    call more difficult.  The flags variable gives consumers the same
+    ability to be "simple" in how they call the protocol.
+
 Code
 ========

@@ -542,6 +656,8 @@ The authors of the PEP promise to contribute and maintain the code for
 this proposal but will welcome any help.


+
+
 Examples
 =========

@@ -572,7 +688,7 @@ In order to access, say, the red value of the pixel at x=30, y=50, you'd use "li

 So what does ImageObject's getbuffer do?  Leaving error checking out::

-  int Image_getbuffer(PyObject *self, struct bufferinfo *view) {
+  int Image_getbuffer(PyObject *self, struct bufferinfo *view, int flags) {

      static Py_ssize_t suboffsets[2] = { -1, 0 };

@@ -600,6 +716,58 @@ So what does ImageObject's getbuffer do?  Leaving error checking out::
  }


+Ex. 2
+-----------
+
+This example shows how an object that wants to expose a contiguous
+chunk of memory (which will never be re-allocated while the object is
+alive) would do that::
+
+  int myobject_getbuffer(PyObject *self, struct bufferinfo *view, int flags) {
+
+    void *buf;
+    Py_ssize_t len;
+    int readonly=0;
+        
+    buf = /* Point to buffer */
+    len = /* Set to size of buffer */
+    readonly = /* Set to 1 if readonly */
+
+    return PyObject_FillBufferInfo(view, buf, len, readonly, flags);    
+  }
+
+  /* No releasebuffer is necessary because the memory will never 
+  be re-allocated so the locking mechanism is not needed
+  */
+
+Ex. 3
+-----------
+
+A consumer that wants to only get a simple contiguous chunk of bytes
+from a Python object, obj, would do the following::
+
+
+  struct bufferinfo view;
+  int ret;
+      
+  if (PyObject_GetBuffer(obj, &view, Py_BUF_SIMPLE) < 0) {
+       /* error return */
+  }
+
+  /* Now, view.buf is the pointer to memory
+          view.len is the length
+          view.readonly is whether or not the memory is read-only.
+   */
+  
+
+  /* Once you are done with the information and no longer need it */
+  
+  if (PyObject_ReleaseBuffer(obj, &view) < 0) {
+          /* error return */
+  }
+  
+
+

 Copyright
 =========