reSTify PEP 296 (#352)

Huang Huang 2017-08-19 03:00:20 +08:00 committed by Brett Cannon
parent 5aea3b9792
commit e6fe4f377f
1 changed files with 261 additions and 249 deletions


@ -5,41 +5,46 @@ Last-Modified: $Date$
Author: xscottg at yahoo.com (Scott Gilbert)
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Jul-2002
Python-Version: 2.3
Post-History:

Notice
=======

This PEP is withdrawn by the author (in favor of PEP 358).

Abstract
========

This PEP proposes the creation of a new standard type and built-in
constructor called 'bytes'. The bytes object is an efficiently
stored array of bytes with some additional characteristics that
set it apart from several similar existing implementations.

Rationale
=========

Python currently has many objects that implement something akin to
the bytes object of this proposal. For instance, the standard
string, buffer, array, and mmap objects are all very similar in
some regards to the bytes object. Additionally, several
significant third-party extensions have created similar objects to
try to fill similar needs. Frustratingly, each of these objects
is too narrow in scope and is missing critical features to make it
applicable to a wider category of problems.

Specification
=============

The bytes object has the following important characteristics:

1. Efficient underlying array storage via the standard C type "unsigned
   char". This allows fine-grained control over how much memory is
   allocated. With the alignment restrictions designated in the next
   item, it is trivial for low-level extensions to cast the pointer
@ -59,17 +64,17 @@ Specification
   extensions would handle this correctly, but Python script could be
   portable in these cases.

2. Alignment of the allocated byte array is whatever is promised by the
   platform implementation of ``malloc``. A bytes object created from an
   extension can provide any alignment the extension author sees fit.

   This alignment restriction should allow the bytes object to be
   used as storage for all standard C types, including ``PyComplex``
   objects or other structs of standard C types. Further
   alignment restrictions can be provided by extensions as necessary.

3. The bytes object implements a subset of the sequence operations
   provided by string/array objects, but with slightly different
   semantics in some cases. In particular, a slice always returns a
   new bytes object, but the underlying memory is shared between the
@ -81,7 +86,7 @@ Specification
   applications, one motivation for the decision to use view slicing
   is that copying between bytes objects should be very efficient and
   not require the creation of temporary objects. The following code
   illustrates this::

     # create two 10 Meg bytes objects
     b1 = bytes(10000000)
@ -95,8 +100,8 @@ Specification
   work correctly with overlapping slices (typically implemented with
   ``memmove``).

4. The bytes object will be recognized as a native type by the
   ``pickle`` and ``cPickle`` modules for efficient serialization. (In
   truth, this is the only requirement that can't be implemented via a
   third-party extension.)
@ -115,7 +120,7 @@ Specification
   string object.

   When unpickling, the bytes object will be created from memory
   allocated from Python (via ``malloc``). As such, it will lose any
   additional properties that an extension-supplied pointer might
   have provided (special alignment, or special types of memory).
@ -131,19 +136,19 @@ Specification
   At least on platforms supporting large files (many of them),
   pickling large bytes objects to files should be possible via
   repeated calls to the ``file.write()`` method.

5. The bytes type supports the ``PyBufferProcs`` interface, but a bytes
   object provides the additional guarantee that the pointer will not be
   deallocated or reallocated as long as a reference to the bytes
   object is held. This implies that a bytes object is not resizable
   once it is created, but allows the global interpreter lock (GIL)
   to be released while a separate thread manipulates the memory
   pointed to if the ``PyBytes_Check(...)`` test passes.

   This characteristic of the bytes object allows it to be used in
   situations such as asynchronous file I/O or on multiprocessor
   machines where the pointer obtained by ``PyBufferProcs`` will be used
   independently of the global interpreter lock.

   Knowing that the pointer cannot be reallocated or freed after the
@ -151,7 +156,7 @@ Specification
   concurrency and make use of additional processors for long-running
   computations on the pointer.

6. In C/C++ extensions, the bytes object can be created from a supplied
   pointer and destructor function to free the memory when the
   reference count goes to zero.
@ -165,32 +170,32 @@ Specification
   actual Python object. If a good use case arises, it should be possible
   for this to be implemented later without breaking backwards
   compatibility.

7. It is also possible to mark the bytes object as read-only; in this
   case it isn't actually mutable, but it still provides the other
   features of a bytes object.

8. The bytes object keeps track of the length of its data with a Python
   ``LONG_LONG`` type. Even though the current definition for
   ``PyBufferProcs`` restricts the length to be the size of an int, this
   PEP does not propose to make any changes there. Instead, extensions can
   work around this limit by making an explicit ``PyBytes_Check(...)``
   call, and if that succeeds they can make a ``PyBytes_GetReadBuffer(...)``
   or ``PyBytes_GetWriteBuffer(...)`` call to get the pointer and full
   length of the object as a ``LONG_LONG``.

   The bytes object will raise an exception if the standard
   ``PyBufferProcs`` mechanism is used and the size of the bytes object
   is greater than can be represented by an integer.

   From Python scripting, the bytes object will be subscriptable with
   longs so the 32-bit int limit can be avoided.

   There is still a problem with the ``len()`` function because it is
   implemented via ``PyObject_Size()``, which also returns an int. As a
   workaround, the bytes object will provide a ``.length()`` method that
   will return a long.

9. The bytes object can be constructed at the Python scripting level by
   passing an int/long to the bytes constructor with the number of bytes to
   allocate. For example::

     b = bytes(100000) # alloc 100K bytes
@ -199,158 +204,165 @@ Specification
   object into a read-only one. An optional second argument will be used to
   designate creation of a read-only bytes object.

10. From the C API, the bytes object can be allocated using either of the
    following signatures::

      PyObject* PyBytes_FromLength(LONG_LONG len, int readonly);
      PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly,
               void (*dest)(void *ptr, void *user), void* user);

    In the ``PyBytes_FromPointer(...)`` function, if the ``dest`` function
    pointer is passed in as ``NULL``, it will not be called. This should
    only be used for creating bytes objects from statically allocated space.

    The user pointer has been called a closure in other places. It is a
    pointer that the user can use for any purpose. It will be passed to
    the destructor function on cleanup and can be useful for a number of
    things. If the user pointer is not needed, ``NULL`` should be passed
    instead.

11. The bytes type will be a new-style class, as that seems to be where
    all standard Python types are headed.
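
Items 3, 7 and 9 above describe three behaviors: mutable storage whose slices are views, copies between overlapping slices that behave like ``memmove``, and an optional read-only flag. This PEP was withdrawn, and Python 3's built-in ``bytes`` is immutable. The following is therefore only an analogy in present-day Python 3. It assumes a ``bytearray`` as the storage and ``memoryview`` slices as the shared views. It also uses ``memoryview.toreadonly()`` (Python 3.8 or later) in place of the read-only flag::

    # Analogy only: bytearray stands in for the proposed storage and
    # memoryview slices stand in for the proposed view slices.
    storage = bytearray(16)          # zero-filled, like bytes(16) in this PEP
    view = memoryview(storage)

    # Slicing a memoryview yields another view of the same memory, not a copy.
    window = view[4:8]
    window[0] = 0xFF
    assert storage[4] == 0xFF        # the write is visible through the original

    # Copying between overlapping slices behaves like memmove: the result is
    # as if the source had been read in full before any byte was written.
    view[:] = bytes(range(16))
    view[2:10] = view[0:8]
    print(list(storage[:10]))        # [0, 1, 0, 1, 2, 3, 4, 5, 6, 7]

    # A read-only view approximates the proposed readonly flag.
    frozen = view.toreadonly()
    try:
        frozen[0] = 1
    except TypeError:
        print("write through read-only view rejected")

Here ``bytes(range(16))`` is Python 3's immutable ``bytes``, used only as a source of sample values.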
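
Item 4 expects large bytes objects to be pickled to files through repeated calls to ``file.write()``. The sketch below shows that kind of chunked writing in present-day Python 3. It assumes a C-contiguous buffer and a file object opened in binary mode. The helper ``write_in_chunks`` and its 1 MiB default chunk size are hypothetical, chosen only for illustration::

    import io

    def write_in_chunks(f, data, chunk_size=1 << 20):
        """Hypothetical helper: write buffer `data` to `f` in bounded pieces.

        Slicing a memoryview does not copy, so each write receives a view
        into `data` rather than a newly allocated bytes object.
        """
        view = memoryview(data).cast("B")   # byte-oriented view (C-contiguous)
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])
        return len(view)

    payload = bytearray(b"x" * 2_500_000)   # about 2.5 MB, written in 3 chunks
    out = io.BytesIO()
    written = write_in_chunks(out, payload)
    print(written)                          # 2500000

The same loop works unchanged with a real file returned by ``open(path, "wb")``.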

Contrast to existing types
==========================

The most common way to work around the lack of a bytes object has been to
simply use a string object in its place. Binary file I/O, the struct and
array modules, and several other examples all use strings this way.
Putting aside the style issue that these uses typically have nothing to
do with text strings, there is the real problem that strings are not
mutable, so direct manipulation of the data returned in these cases is
not possible. Also, numerous optimizations in the string implementation
(such as caching the hash value or interning the pointers) mean that
extension authors are on very thin ice if they try to break the rules
with the string object.

The buffer object seems to have been intended to serve the purpose that
the bytes object is trying to fulfill, but several shortcomings in its
implementation [1]_ have made it less useful in many common cases. The
buffer object made a different choice for its slicing behavior (it returns
new strings instead of buffers for slicing and other operations), and it
doesn't make many of the promises about alignment or about releasing the
GIL that the bytes object does.

Also, with regard to the buffer object, it is not possible to simply
replace the buffer object with the bytes object and maintain backwards
compatibility. The buffer object provides a mechanism to take the pointer
that another object supplies through ``PyBufferProcs`` and present it as
its own. Since the behavior of the other object cannot be guaranteed to
follow the same set of strict rules that a bytes object does, it can't be
used in places that a bytes object could.

The array module supports the creation of an array of bytes, but it does
not provide a C API for supplying pointers and destructors to
extension-supplied memory. This makes it unusable for constructing
objects out of shared memory, or memory that has special alignment or
locking for things like DMA transfers. Also, the array object does not
currently pickle. Finally, since the array object allows its contents to
grow via the ``extend`` method, the pointer can be changed if the GIL is
not held while using it.

Creating a buffer object from an array object has the same problem of
leaving an invalid pointer when the array object is resized.
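
Present-day CPython guards against this resizing hazard at the Python level. An ``array.array`` refuses to change size while a buffer export, such as a ``memoryview``, is still alive. The following Python 3 sketch illustrates that check; it is not part of this proposal::

    import array

    arr = array.array("B", b"abcd")
    view = memoryview(arr)          # an active export of the array's memory

    # Growing the array could move its storage and leave `view` dangling,
    # so the resize is refused while the export is alive.
    try:
        arr.extend(b"efgh")
        resize_refused = False
    except BufferError:
        resize_refused = True
    print("resize refused:", resize_refused)   # resize refused: True

    view.release()                  # drop the export ...
    arr.extend(b"efgh")             # ... and growing is allowed again
    print(arr.tobytes())            # b'abcdefgh'

``bytearray`` applies the same rule and also raises ``BufferError`` when asked to resize while exported.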

The mmap object caters to its particular niche, but does not attempt to
solve a wider class of problems.

Finally, no third-party extension can implement pickling without
creating a temporary object of a standard Python type. For example, in the
Numeric community, it is unpleasant that a large array can't pickle
without creating a large binary string to duplicate the array data.

Backward Compatibility
======================

The only backwards compatibility problem the author is aware of would
arise when previous versions of Python try to unpickle data containing
the new bytes type.

Reference Implementation
========================

XXX: Actual implementation is in progress, but changes are still possible
as this PEP gets further review.

The following new files will be added to the Python baseline::

  Include/bytesobject.h  # C interface
  Objects/bytesobject.c  # C implementation
  Lib/test/test_bytes.py # unit testing
  Doc/lib/libbytes.tex   # documentation

The following files will also be modified::

  Include/Python.h       # adding bytesobject.h include file
  Python/bltinmodule.c   # adding the bytes type object
  Modules/cPickle.c      # adding bytes to the standard types
  Lib/pickle.py          # adding bytes to the standard types

It is possible that several other modules could be cleaned up and
implemented in terms of the bytes object. The mmap module comes to mind
first, but as noted above it would be possible to reimplement the array
module as a pure Python module. While it is attractive that this PEP
could actually reduce the amount of source code, the author feels that
this could create unnecessary risk of breaking existing applications and
should be avoided at this time.

Additional Notes/Comments
=========================

- Guido van Rossum wondered whether it would make sense to be able
  to create a bytes object from an mmap object. The mmap object
  appears to support the requirements necessary to provide memory
  for a bytes object. (It doesn't resize, and the pointer is valid
  for the lifetime of the object.) As such, a method could be added
  to the mmap module so that a bytes object could be created
  directly from an mmap object. An initial approach would be to use
  the ``PyBytes_FromPointer()`` function described above and pass the
  ``mmap_object`` as the user pointer. The destructor function would
  decref the ``mmap_object`` for cleanup.

- Todd Miller notes that it may be useful to have two new functions:
  ``PyObject_AsLargeReadBuffer()`` and ``PyObject_AsLargeWriteBuffer()``
  that are similar to ``PyObject_AsReadBuffer()`` and
  ``PyObject_AsWriteBuffer()``, but support getting a ``LONG_LONG``
  length in addition to the ``void*`` pointer. These functions would
  allow extension authors to work transparently with bytes objects
  (which support ``LONG_LONG`` lengths) and most other buffer-like
  objects (which only support int lengths). These functions could be
  in lieu of, or in addition to, creating specific
  ``PyBytes_GetReadBuffer()`` and ``PyBytes_GetWriteBuffer()`` functions.

  XXX: The author thinks this is a very good idea as it paves the way
  for other objects to eventually support large (64-bit) pointers, and
  it should only affect ``abstract.c`` and ``abstract.h``. Should this
  be added above?

- It was generally agreed that abusing the segment count of the
  ``PyBufferProcs`` interface is not a good hack to work around the
  31-bit limitation of the length. If you don't know what this means,
  then you're in good company. Most code in the Python baseline, and
  presumably in many third-party extensions, punts when the segment
  count is not 1.
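
Python 3 offers a rough analogue of Guido's suggestion. A ``memoryview`` can wrap an ``mmap`` object directly, sharing the mapped memory. The view holds a reference to the ``mmap`` object for as long as the view exists, much as the note proposes passing ``mmap_object`` as the user pointer. This sketch assumes an anonymous mapping so that no file is needed; it is not the ``mmap`` method the note proposes adding::

    import mmap

    # Anonymous 4 KiB mapping (fileno -1), so no backing file is required.
    region = mmap.mmap(-1, 4096)
    view = memoryview(region)       # shares the mapped pages, no copy

    view[0:5] = b"hello"
    greeting = region[0:5]          # slicing the mmap itself returns bytes
    print(greeting)                 # b'hello'

    # An mmap refuses to close while a view is exported, which keeps the
    # pointer valid for the view's lifetime; release the view first.
    view.release()
    region.close()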

References
==========

.. [1] The buffer interface
   https://mail.python.org/pipermail/python-dev/2000-October/009974.html

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: