reSTify PEP 296 (#352)
parent 5aea3b9792
commit e6fe4f377f
pep-0296.txt | 90
@@ -5,17 +5,20 @@ Last-Modified: $Date$
 Author: xscottg at yahoo.com (Scott Gilbert)
 Status: Withdrawn
 Type: Standards Track
+Content-Type: text/x-rst
 Created: 12-Jul-2002
 Python-Version: 2.3
 Post-History:


 Notice
+=======

 This PEP is withdrawn by the author (in favor of PEP 358).


 Abstract
+========

 This PEP proposes the creation of a new standard type and builtin
 constructor called 'bytes'. The bytes object is an efficiently
@@ -24,6 +27,7 @@ Abstract


 Rationale
+=========

 Python currently has many objects that implement something akin to
 the bytes object of this proposal. For instance the standard
@@ -36,6 +40,7 @@ Rationale


 Specification
+=============

 The bytes object has the following important characteristics:

@@ -65,7 +70,7 @@ Specification
 the extension author sees fit.

 This alignment restriction should allow the bytes object to be
-used as storage for all standard C types - including PyComplex
+used as storage for all standard C types - including ``PyComplex``
 objects or other structs of standard C types. Further
 alignment restrictions can be provided by extensions as necessary.

@@ -81,7 +86,7 @@ Specification
 applications, one motivation for the decision to use view slicing
 is that copying between bytes objects should be very efficient and
 not require the creation of temporary objects. The following code
-illustrates this:
+illustrates this::

 # create two 10 Meg bytes objects
 b1 = bytes(10000000)
@@ -95,8 +100,8 @@ Specification
 work correctly with overlapping slices (typically implemented with
 memmove).

-4. The bytes object will be recognized as a native type by the pickle and
-cPickle modules for efficient serialization. (In truth, this is
+4. The bytes object will be recognized as a native type by the ``pickle`` and
+``cPickle`` modules for efficient serialization. (In truth, this is
 the only requirement that can't be implemented via a third party
 extension.)

@@ -115,7 +120,7 @@ Specification
 string object.

 When unpickling, the bytes object will be created from memory
-allocated from Python (via malloc). As such, it will lose any
+allocated from Python (via ``malloc``). As such, it will lose any
 additional properties that an extension supplied pointer might
 have provided (special alignment, or special types of memory).

@@ -131,19 +136,19 @@ Specification

 At least on platforms supporting large files (many of them),
 pickling large bytes objects to files should be possible via
-repeated calls to the file.write() method.
+repeated calls to the ``file.write()`` method.

-5. The bytes type supports the PyBufferProcs interface, but a bytes object
+5. The bytes type supports the ``PyBufferProcs`` interface, but a bytes object
 provides the additional guarantee that the pointer will not be
 deallocated or reallocated as long as a reference to the bytes
 object is held. This implies that a bytes object is not resizable
 once it is created, but allows the global interpreter lock (GIL)
 to be released while a separate thread manipulates the memory
-pointed to if the PyBytes_Check(...) test passes.
+pointed to if the ``PyBytes_Check(...)`` test passes.

 This characteristic of the bytes object allows it to be used in
 situations such as asynchronous file I/O or on multiprocessor
-machines where the pointer obtained by PyBufferProcs will be used
+machines where the pointer obtained by ``PyBufferProcs`` will be used
 independently of the global interpreter lock.

 Knowing that the pointer can not be reallocated or freed after the
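Review note: the hunks above describe view slicing, overlap-safe copies between slices, and a buffer that cannot be resized while it is referenced. The bytes type proposed here was withdrawn (see the Notice), so the sketch below does not use the PEP's API. It assumes, purely for illustration, that a ``memoryview`` over a ``bytearray`` in current Python is a fair stand-in for those three properties; the variable names are invented for this note.

```python
# Sketch only: memoryview over bytearray stands in for the proposed bytes
# type; none of the PEP's own names are used here.
src = bytearray(b"abcdefghij")
dst = bytearray(10)
src_view = memoryview(src)
dst_view = memoryview(dst)

# Slicing a memoryview returns another view onto the same memory, not a copy.
window = src_view[2:6]
src[2] = ord("X")            # write through the underlying buffer
first_byte = window[0]       # the view observes that write

# Slice assignment copies the bytes from one buffer into the other without
# first copying the data into an intermediate bytes object.
dst_view[0:4] = src_view[2:6]

# Source and destination may overlap within a single buffer.
src_view[1:5] = src_view[0:4]

# While views are exported, the bytearray refuses to resize, so the
# underlying memory stays in place for as long as a view is held.
try:
    src.append(0)
    resized = True
except BufferError:
    resized = False
```

The resize refusal is the closest everyday analogue to item 5's promise that the pointer is not reallocated while a reference is held; it says nothing about releasing the GIL, which is a C-level concern.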
@@ -170,27 +175,27 @@ Specification
 bytes object.

 8. The bytes object keeps track of the length of its data with a Python
-LONG_LONG type. Even though the current definition for PyBufferProcs
+``LONG_LONG`` type. Even though the current definition for ``PyBufferProcs``
 restricts the length to be the size of an int, this PEP does not propose
 to make any changes there. Instead, extensions can work around this limit
-by making an explicit PyBytes_Check(...) call, and if that succeeds they
-can make a PyBytes_GetReadBuffer(...) or PyBytes_GetWriteBuffer call to
-get the pointer and full length of the object as a LONG_LONG.
+by making an explicit ``PyBytes_Check(...)`` call, and if that succeeds they
+can make a ``PyBytes_GetReadBuffer(...)`` or ``PyBytes_GetWriteBuffer``
+call to get the pointer and full length of the object as a ``LONG_LONG``.

-The bytes object will raise an exception if the standard PyBufferProcs
+The bytes object will raise an exception if the standard ``PyBufferProcs``
 mechanism is used and the size of the bytes object is greater than can be
 represented by an integer.

 From Python scripting, the bytes object will be subscriptable with longs
 so the 32 bit int limit can be avoided.

-There is still a problem with the len() function as it is PyObject_Size()
-and this returns an int as well. As a workaround, the bytes object will
-provide a .length() method that will return a long.
+There is still a problem with the ``len()`` function as it is
+``PyObject_Size()`` and this returns an int as well. As a workaround,
+the bytes object will provide a ``.length()`` method that will return a long.

 9. The bytes object can be constructed at the Python scripting level by
 passing an int/long to the bytes constructor with the number of bytes to
-allocate. For example:
+allocate. For example::

 b = bytes(100000) # alloc 100K bytes

@@ -200,26 +205,28 @@ Specification
 designate creation of a readonly bytes object.

 10. From the C API, the bytes object can be allocated using any of the
-following signatures:
+following signatures::

 PyObject* PyBytes_FromLength(LONG_LONG len, int readonly);
 PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly,
 void (*dest)(void *ptr, void *user), void* user);

-In the PyBytes_FromPointer(...) function, if the dest function pointer is
-passed in as NULL, it will not be called. This should only be used for
-creating bytes objects from statically allocated space.
+In the ``PyBytes_FromPointer(...)`` function, if the dest function pointer
+is passed in as ``NULL``, it will not be called. This should only be used
+for creating bytes objects from statically allocated space.

 The user pointer has been called a closure in other places. It is a
 pointer that the user can use for whatever purposes. It will be passed to
 the destructor function on cleanup and can be useful for a number of
-things. If the user pointer is not needed, NULL should be passed instead.
+things. If the user pointer is not needed, ``NULL`` should be passed
+instead.

 11. The bytes type will be a new style class as that seems to be where all
 standard Python types are headed.


 Contrast to existing types
+==========================

 The most common way to work around the lack of a bytes object has been to
 simply use a string object in its place. Binary files, the struct/array
@@ -233,7 +240,7 @@ Contrast to existing types

 The buffer object seems like it was intended to address the purpose that
 the bytes object is trying to fulfill, but several shortcomings in its
-implementation [1] have made it less useful in many common cases. The
+implementation [1]_ have made it less useful in many common cases. The
 buffer object made a different choice for its slicing behavior (it returns
 new strings instead of buffers for slicing and other operations), and it
 doesn't make many of the promises on alignment or being able to release
@@ -242,7 +249,7 @@ Contrast to existing types
 Also in regards to the buffer object, it is not possible to simply replace
 the buffer object with the bytes object and maintain backwards
 compatibility. The buffer object provides a mechanism to take the
-PyBufferProcs supplied pointer of another object and present it as its
+``PyBufferProcs`` supplied pointer of another object and present it as its
 own. Since the behavior of the other object can not be guaranteed to
 follow the same set of strict rules that a bytes object does, it can't be
 used in places that a bytes object could.
@@ -268,6 +275,7 @@ Contrast to existing types


 Backward Compatibility
+======================

 The only possibility for backwards compatibility problems that the author
 is aware of are in previous versions of Python that try to unpickle data
@@ -275,18 +283,19 @@ Backward Compatibility


 Reference Implementation
+========================

 XXX: Actual implementation is in progress, but changes are still possible
 as this PEP gets further review.

-The following new files will be added to the Python baseline:
+The following new files will be added to the Python baseline::

 Include/bytesobject.h # C interface
 Objects/bytesobject.c # C implementation
 Lib/test/test_bytes.py # unit testing
 Doc/lib/libbytes.tex # documentation

-The following files will also be modified:
+The following files will also be modified::

 Include/Python.h # adding bytesmodule.h include file
 Python/bltinmodule.c # adding the bytes type object
@@ -303,6 +312,7 @@ Reference Implementation


 Additional Notes/Comments
+=========================

 - Guido van Rossum wondered whether it would make sense to be able
 to create a bytes object from a mmap object. The mmap object
@@ -311,43 +321,45 @@ Additional Notes/Comments
 for the lifetime of the object.) As such, a method could be added
 to the mmap module such that a bytes object could be created
 directly from a mmap object. An initial stab at how this would be
-implemented would be to use the PyBytes_FromPointer() function
-described above and pass the mmap_object as the user pointer. The
-destructor function would decref the mmap_object for cleanup.
+implemented would be to use the ``PyBytes_FromPointer()`` function
+described above and pass the ``mmap_object`` as the user pointer. The
+destructor function would decref the ``mmap_object`` for cleanup.

 - Todd Miller notes that it may be useful to have two new functions:
-PyObject_AsLargeReadBuffer() and PyObject_AsLargeWriteBuffer that are
-similar to PyObject_AsReadBuffer() and PyObject_AsWriteBuffer(), but
-support getting a LONG_LONG length in addition to the void* pointer.
+``PyObject_AsLargeReadBuffer()`` and ``PyObject_AsLargeWriteBuffer`` that are
+similar to ``PyObject_AsReadBuffer()`` and ``PyObject_AsWriteBuffer()``, but
+support getting a ``LONG_LONG`` length in addition to the ``void*`` pointer.
 These functions would allow extension authors to work transparently with
-bytes object (that support LONG_LONG lengths) and most other buffer like
+bytes object (that support ``LONG_LONG`` lengths) and most other buffer like
 objects (which only support int lengths). These functions could be in
-lieu of, or in addition to, creating a specific PyBytes_GetReadBuffer() and
-PyBytes_GetWriteBuffer() functions.
+lieu of, or in addition to, creating a specific ``PyBytes_GetReadBuffer()`` and
+``PyBytes_GetWriteBuffer()`` functions.

 XXX: The author thinks this is a very good idea as it paves the way for
 other objects to eventually support large (64 bit) pointers, and it should
 only affect abstract.c and abstract.h. Should this be added above?

 - It was generally agreed that abusing the segment count of the
-PyBufferProcs interface is not a good hack to work around the 31 bit
+``PyBufferProcs`` interface is not a good hack to work around the 31 bit
 limitation of the length. If you don't know what this means, then you're
 in good company. Most code in the Python baseline, and presumably in many
 third party extensions, punt when the segment count is not 1.


 References
+==========

-[1] The buffer interface
+.. [1] The buffer interface
 https://mail.python.org/pipermail/python-dev/2000-October/009974.html


 Copyright
+=========

 This document has been placed in the public domain.


-
+..
 Local Variables:
 mode: indented-text
 indent-tabs-mode: nil
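Review note on the mmap comment in the last hunk: the proposed ``PyBytes_FromPointer()`` route was never part of a released API, so it cannot be demonstrated directly. As a loose analogue only (an assumption of this note, not something the PEP specifies), current Python lets a ``memoryview`` wrap an ``mmap`` object, and the mapping refuses to close while that view is exported, which mirrors the idea of the mapping staying alive for as long as the derived object needs its pointer.

```python
import mmap

# Anonymous 4 KiB mapping; no file on disk is needed for the illustration.
mapping = mmap.mmap(-1, 4096)

# The view exposes the mapped memory directly; writes go to the mapping.
view = memoryview(mapping)
view[0:5] = b"hello"
head = mapping[0:5]

# Closing the mapping while the view is still exported is refused.
try:
    mapping.close()
    closed_early = True
except BufferError:
    closed_early = False

# Release the view first, then the mapping can be closed normally.
view.release()
mapping.close()
```

In the PEP's design the destructor callback would drop the reference to the mmap object; here the explicit ``view.release()`` plays a comparable role.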