reSTify PEP 296 (#352)
This commit is contained in:
parent
5aea3b9792
commit
e6fe4f377f
510
pep-0296.txt
510
pep-0296.txt
|
@ -5,352 +5,364 @@ Last-Modified: $Date$
|
|||
Author: xscottg at yahoo.com (Scott Gilbert)
|
||||
Status: Withdrawn
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 12-Jul-2002
|
||||
Python-Version: 2.3
|
||||
Post-History:
|
||||
|
||||
|
||||
Notice
|
||||
=======
|
||||
|
||||
This PEP is withdrawn by the author (in favor of PEP 358).
|
||||
This PEP is withdrawn by the author (in favor of PEP 358).
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes the creation of a new standard type and builtin
|
||||
constructor called 'bytes'. The bytes object is an efficiently
|
||||
stored array of bytes with some additional characteristics that
|
||||
set it apart from several implementations that are similar.
|
||||
This PEP proposes the creation of a new standard type and builtin
|
||||
constructor called 'bytes'. The bytes object is an efficiently
|
||||
stored array of bytes with some additional characteristics that
|
||||
set it apart from several implementations that are similar.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Python currently has many objects that implement something akin to
|
||||
the bytes object of this proposal. For instance the standard
|
||||
string, buffer, array, and mmap objects are all very similar in
|
||||
some regards to the bytes object. Additionally, several
|
||||
significant third party extensions have created similar objects to
|
||||
try and fill similar needs. Frustratingly, each of these objects
|
||||
is too narrow in scope and is missing critical features to make it
|
||||
applicable to a wider category of problems.
|
||||
Python currently has many objects that implement something akin to
|
||||
the bytes object of this proposal. For instance the standard
|
||||
string, buffer, array, and mmap objects are all very similar in
|
||||
some regards to the bytes object. Additionally, several
|
||||
significant third party extensions have created similar objects to
|
||||
try and fill similar needs. Frustratingly, each of these objects
|
||||
is too narrow in scope and is missing critical features to make it
|
||||
applicable to a wider category of problems.
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
The bytes object has the following important characteristics:
|
||||
The bytes object has the following important characteristics:
|
||||
|
||||
1. Efficient underlying array storage via the standard C type "unsigned
|
||||
char". This allows fine grain control over how much memory is
|
||||
allocated. With the alignment restrictions designated in the next
|
||||
item, it is trivial for low level extensions to cast the pointer
|
||||
to a different type as needed.
|
||||
1. Efficient underlying array storage via the standard C type "unsigned
|
||||
char". This allows fine grain control over how much memory is
|
||||
allocated. With the alignment restrictions designated in the next
|
||||
item, it is trivial for low level extensions to cast the pointer
|
||||
to a different type as needed.
|
||||
|
||||
Also, since the object is implemented as an array of bytes, it is
|
||||
possible to pass the bytes object to the extensive library of
|
||||
routines already in the standard library that presently work with
|
||||
strings. For instance, the bytes object in conjunction with the
|
||||
struct module could be used to provide a complete replacement for
|
||||
the array module using only Python script.
|
||||
Also, since the object is implemented as an array of bytes, it is
|
||||
possible to pass the bytes object to the extensive library of
|
||||
routines already in the standard library that presently work with
|
||||
strings. For instance, the bytes object in conjunction with the
|
||||
struct module could be used to provide a complete replacement for
|
||||
the array module using only Python script.
|
||||
|
||||
If an unusual platform comes to light, one where there isn't a
|
||||
native unsigned 8 bit type, the object will do its best to
|
||||
represent itself at the Python script level as though it were an
|
||||
array of 8 bit unsigned values. It is doubtful whether many
|
||||
extensions would handle this correctly, but Python script could be
|
||||
portable in these cases.
|
||||
If an unusual platform comes to light, one where there isn't a
|
||||
native unsigned 8 bit type, the object will do its best to
|
||||
represent itself at the Python script level as though it were an
|
||||
array of 8 bit unsigned values. It is doubtful whether many
|
||||
extensions would handle this correctly, but Python script could be
|
||||
portable in these cases.
|
||||
|
||||
2. Alignment of the allocated byte array is whatever is promised by the
|
||||
platform implementation of malloc. A bytes object created from an
|
||||
extension can be supplied that provides any arbitrary alignment as
|
||||
the extension author sees fit.
|
||||
2. Alignment of the allocated byte array is whatever is promised by the
|
||||
platform implementation of malloc. A bytes object created from an
|
||||
extension can be supplied that provides any arbitrary alignment as
|
||||
the extension author sees fit.
|
||||
|
||||
This alignment restriction should allow the bytes object to be
|
||||
used as storage for all standard C types - including PyComplex
|
||||
objects or other structs of standard C type types. Further
|
||||
alignment restrictions can be provided by extensions as necessary.
|
||||
This alignment restriction should allow the bytes object to be
|
||||
used as storage for all standard C types - including ``PyComplex``
|
||||
objects or other structs of standard C type types. Further
|
||||
alignment restrictions can be provided by extensions as necessary.
|
||||
|
||||
3. The bytes object implements a subset of the sequence operations
|
||||
provided by string/array objects, but with slightly different
|
||||
semantics in some cases. In particular, a slice always returns a
|
||||
new bytes object, but the underlying memory is shared between the
|
||||
two objects. This type of slice behavior has been called creating
|
||||
a "view". Additionally, repetition and concatenation are
|
||||
undefined for bytes objects and will raise an exception.
|
||||
3. The bytes object implements a subset of the sequence operations
|
||||
provided by string/array objects, but with slightly different
|
||||
semantics in some cases. In particular, a slice always returns a
|
||||
new bytes object, but the underlying memory is shared between the
|
||||
two objects. This type of slice behavior has been called creating
|
||||
a "view". Additionally, repetition and concatenation are
|
||||
undefined for bytes objects and will raise an exception.
|
||||
|
||||
As these objects are likely to find use in high performance
|
||||
applications, one motivation for the decision to use view slicing
|
||||
is that copying between bytes objects should be very efficient and
|
||||
not require the creation of temporary objects. The following code
|
||||
illustrates this:
|
||||
As these objects are likely to find use in high performance
|
||||
applications, one motivation for the decision to use view slicing
|
||||
is that copying between bytes objects should be very efficient and
|
||||
not require the creation of temporary objects. The following code
|
||||
illustrates this::
|
||||
|
||||
# create two 10 Meg bytes objects
|
||||
b1 = bytes(10000000)
|
||||
b2 = bytes(10000000)
|
||||
# create two 10 Meg bytes objects
|
||||
b1 = bytes(10000000)
|
||||
b2 = bytes(10000000)
|
||||
|
||||
# copy from part of one to another with out creating a 1 Meg temporary
|
||||
b1[2000000:3000000] = b2[4000000:5000000]
|
||||
# copy from part of one to another with out creating a 1 Meg temporary
|
||||
b1[2000000:3000000] = b2[4000000:5000000]
|
||||
|
||||
Slice assignment where the rvalue is not the same length as the
|
||||
lvalue will raise an exception. However, slice assignment will
|
||||
work correctly with overlapping slices (typically implemented with
|
||||
memmove).
|
||||
Slice assignment where the rvalue is not the same length as the
|
||||
lvalue will raise an exception. However, slice assignment will
|
||||
work correctly with overlapping slices (typically implemented with
|
||||
memmove).
|
||||
|
||||
4. The bytes object will be recognized as a native type by the pickle and
|
||||
cPickle modules for efficient serialization. (In truth, this is
|
||||
the only requirement that can't be implemented via a third party
|
||||
extension.)
|
||||
4. The bytes object will be recognized as a native type by the ``pickle`` and
|
||||
``cPickle`` modules for efficient serialization. (In truth, this is
|
||||
the only requirement that can't be implemented via a third party
|
||||
extension.)
|
||||
|
||||
Partial solutions to address the need to serialize the data stored
|
||||
in a bytes-like object without creating a temporary copy of the
|
||||
data into a string have been implemented in the past. The tofile
|
||||
and fromfile methods of the array object are good examples of
|
||||
this. The bytes object will support these methods too. However,
|
||||
pickling is useful in other situations - such as in the shelve
|
||||
module, or implementing RPC of Python objects, and requiring the
|
||||
end user to use two different serialization mechanisms to get an
|
||||
efficient transfer of data is undesirable.
|
||||
Partial solutions to address the need to serialize the data stored
|
||||
in a bytes-like object without creating a temporary copy of the
|
||||
data into a string have been implemented in the past. The tofile
|
||||
and fromfile methods of the array object are good examples of
|
||||
this. The bytes object will support these methods too. However,
|
||||
pickling is useful in other situations - such as in the shelve
|
||||
module, or implementing RPC of Python objects, and requiring the
|
||||
end user to use two different serialization mechanisms to get an
|
||||
efficient transfer of data is undesirable.
|
||||
|
||||
XXX: Will try to implement pickling of the new bytes object in
|
||||
such a way that previous versions of Python will unpickle it as a
|
||||
string object.
|
||||
XXX: Will try to implement pickling of the new bytes object in
|
||||
such a way that previous versions of Python will unpickle it as a
|
||||
string object.
|
||||
|
||||
When unpickling, the bytes object will be created from memory
|
||||
allocated from Python (via malloc). As such, it will lose any
|
||||
additional properties that an extension supplied pointer might
|
||||
have provided (special alignment, or special types of memory).
|
||||
When unpickling, the bytes object will be created from memory
|
||||
allocated from Python (via ``malloc``). As such, it will lose any
|
||||
additional properties that an extension supplied pointer might
|
||||
have provided (special alignment, or special types of memory).
|
||||
|
||||
XXX: Will try to make it so that C subclasses of bytes type can
|
||||
supply the memory that will be unpickled into. For instance, a
|
||||
derived class called PageAlignedBytes would unpickle to memory
|
||||
that is also page aligned.
|
||||
XXX: Will try to make it so that C subclasses of bytes type can
|
||||
supply the memory that will be unpickled into. For instance, a
|
||||
derived class called PageAlignedBytes would unpickle to memory
|
||||
that is also page aligned.
|
||||
|
||||
On any platform where an int is 32 bits (most of them), it is
|
||||
currently impossible to create a string with a length larger than
|
||||
can be represented in 31 bits. As such, pickling to a string will
|
||||
raise an exception when the operation is not possible.
|
||||
On any platform where an int is 32 bits (most of them), it is
|
||||
currently impossible to create a string with a length larger than
|
||||
can be represented in 31 bits. As such, pickling to a string will
|
||||
raise an exception when the operation is not possible.
|
||||
|
||||
At least on platforms supporting large files (many of them),
|
||||
pickling large bytes objects to files should be possible via
|
||||
repeated calls to the file.write() method.
|
||||
At least on platforms supporting large files (many of them),
|
||||
pickling large bytes objects to files should be possible via
|
||||
repeated calls to the ``file.write()`` method.
|
||||
|
||||
5. The bytes type supports the PyBufferProcs interface, but a bytes object
|
||||
provides the additional guarantee that the pointer will not be
|
||||
deallocated or reallocated as long as a reference to the bytes
|
||||
object is held. This implies that a bytes object is not resizable
|
||||
once it is created, but allows the global interpreter lock (GIL)
|
||||
to be released while a separate thread manipulates the memory
|
||||
pointed to if the PyBytes_Check(...) test passes.
|
||||
5. The bytes type supports the ``PyBufferProcs`` interface, but a bytes object
|
||||
provides the additional guarantee that the pointer will not be
|
||||
deallocated or reallocated as long as a reference to the bytes
|
||||
object is held. This implies that a bytes object is not resizable
|
||||
once it is created, but allows the global interpreter lock (GIL)
|
||||
to be released while a separate thread manipulates the memory
|
||||
pointed to if the ``PyBytes_Check(...)`` test passes.
|
||||
|
||||
This characteristic of the bytes object allows it to be used in
|
||||
situations such as asynchronous file I/O or on multiprocessor
|
||||
machines where the pointer obtained by PyBufferProcs will be used
|
||||
independently of the global interpreter lock.
|
||||
This characteristic of the bytes object allows it to be used in
|
||||
situations such as asynchronous file I/O or on multiprocessor
|
||||
machines where the pointer obtained by ``PyBufferProcs`` will be used
|
||||
independently of the global interpreter lock.
|
||||
|
||||
Knowing that the pointer can not be reallocated or freed after the
|
||||
GIL is released gives extension authors the capability to get true
|
||||
concurrency and make use of additional processors for long running
|
||||
computations on the pointer.
|
||||
Knowing that the pointer can not be reallocated or freed after the
|
||||
GIL is released gives extension authors the capability to get true
|
||||
concurrency and make use of additional processors for long running
|
||||
computations on the pointer.
|
||||
|
||||
6. In C/C++ extensions, the bytes object can be created from a supplied
|
||||
pointer and destructor function to free the memory when the
|
||||
reference count goes to zero.
|
||||
6. In C/C++ extensions, the bytes object can be created from a supplied
|
||||
pointer and destructor function to free the memory when the
|
||||
reference count goes to zero.
|
||||
|
||||
The special implementation of slicing for the bytes object allows
|
||||
multiple bytes objects to refer to the same pointer/destructor.
|
||||
As such, a refcount will be kept on the actual
|
||||
pointer/destructor. This refcount is separate from the refcount
|
||||
typically associated with Python objects.
|
||||
The special implementation of slicing for the bytes object allows
|
||||
multiple bytes objects to refer to the same pointer/destructor.
|
||||
As such, a refcount will be kept on the actual
|
||||
pointer/destructor. This refcount is separate from the refcount
|
||||
typically associated with Python objects.
|
||||
|
||||
XXX: It may be desirable to expose the inner refcounted object as an
|
||||
actual Python object. If a good use case arises, it should be possible
|
||||
for this to be implemented later with no loss to backwards compatibility.
|
||||
XXX: It may be desirable to expose the inner refcounted object as an
|
||||
actual Python object. If a good use case arises, it should be possible
|
||||
for this to be implemented later with no loss to backwards compatibility.
|
||||
|
||||
7. It is also possible to signify the bytes object as readonly, in this
|
||||
case it isn't actually mutable, but does provide the other features of a
|
||||
bytes object.
|
||||
7. It is also possible to signify the bytes object as readonly, in this
|
||||
case it isn't actually mutable, but does provide the other features of a
|
||||
bytes object.
|
||||
|
||||
8. The bytes object keeps track of the length of its data with a Python
|
||||
LONG_LONG type. Even though the current definition for PyBufferProcs
|
||||
restricts the length to be the size of an int, this PEP does not propose
|
||||
to make any changes there. Instead, extensions can work around this limit
|
||||
by making an explicit PyBytes_Check(...) call, and if that succeeds they
|
||||
can make a PyBytes_GetReadBuffer(...) or PyBytes_GetWriteBuffer call to
|
||||
get the pointer and full length of the object as a LONG_LONG.
|
||||
8. The bytes object keeps track of the length of its data with a Python
|
||||
``LONG_LONG`` type. Even though the current definition for ``PyBufferProcs``
|
||||
restricts the length to be the size of an int, this PEP does not propose
|
||||
to make any changes there. Instead, extensions can work around this limit
|
||||
by making an explicit ``PyBytes_Check(...)`` call, and if that succeeds they
|
||||
can make a ``PyBytes_GetReadBuffer(...)`` or ``PyBytes_GetWriteBuffer``
|
||||
call to get the pointer and full length of the object as a ``LONG_LONG``.
|
||||
|
||||
The bytes object will raise an exception if the standard PyBufferProcs
|
||||
mechanism is used and the size of the bytes object is greater than can be
|
||||
represented by an integer.
|
||||
The bytes object will raise an exception if the standard ``PyBufferProcs``
|
||||
mechanism is used and the size of the bytes object is greater than can be
|
||||
represented by an integer.
|
||||
|
||||
From Python scripting, the bytes object will be subscriptable with longs
|
||||
so the 32 bit int limit can be avoided.
|
||||
From Python scripting, the bytes object will be subscriptable with longs
|
||||
so the 32 bit int limit can be avoided.
|
||||
|
||||
There is still a problem with the len() function as it is PyObject_Size()
|
||||
and this returns an int as well. As a workaround, the bytes object will
|
||||
provide a .length() method that will return a long.
|
||||
There is still a problem with the ``len()`` function as it is
|
||||
``PyObject_Size()`` and this returns an int as well. As a workaround,
|
||||
the bytes object will provide a ``.length()`` method that will return a long.
|
||||
|
||||
9. The bytes object can be constructed at the Python scripting level by
|
||||
passing an int/long to the bytes constructor with the number of bytes to
|
||||
allocate. For example:
|
||||
9. The bytes object can be constructed at the Python scripting level by
|
||||
passing an int/long to the bytes constructor with the number of bytes to
|
||||
allocate. For example::
|
||||
|
||||
b = bytes(100000) # alloc 100K bytes
|
||||
|
||||
The constructor can also take another bytes object. This will be useful
|
||||
for the implementation of unpickling, and in converting a read-write bytes
|
||||
object into a read-only one. An optional second argument will be used to
|
||||
designate creation of a readonly bytes object.
|
||||
The constructor can also take another bytes object. This will be useful
|
||||
for the implementation of unpickling, and in converting a read-write bytes
|
||||
object into a read-only one. An optional second argument will be used to
|
||||
designate creation of a readonly bytes object.
|
||||
|
||||
10. From the C API, the bytes object can be allocated using any of the
|
||||
following signatures:
|
||||
10. From the C API, the bytes object can be allocated using any of the
|
||||
following signatures::
|
||||
|
||||
PyObject* PyBytes_FromLength(LONG_LONG len, int readonly);
|
||||
PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly
|
||||
void (*dest)(void *ptr, void *user), void* user);
|
||||
PyObject* PyBytes_FromLength(LONG_LONG len, int readonly);
|
||||
PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly
|
||||
void (*dest)(void *ptr, void *user), void* user);
|
||||
|
||||
In the PyBytes_FromPointer(...) function, if the dest function pointer is
|
||||
passed in as NULL, it will not be called. This should only be used for
|
||||
creating bytes objects from statically allocated space.
|
||||
In the ``PyBytes_FromPointer(...)`` function, if the dest function pointer
|
||||
is passed in as ``NULL``, it will not be called. This should only be used
|
||||
for creating bytes objects from statically allocated space.
|
||||
|
||||
The user pointer has been called a closure in other places. It is a
|
||||
pointer that the user can use for whatever purposes. It will be passed to
|
||||
the destructor function on cleanup and can be useful for a number of
|
||||
things. If the user pointer is not needed, NULL should be passed instead.
|
||||
things. If the user pointer is not needed, ``NULL`` should be passed
|
||||
instead.
|
||||
|
||||
11. The bytes type will be a new style class as that seems to be where all
|
||||
11. The bytes type will be a new style class as that seems to be where all
|
||||
standard Python types are headed.
|
||||
|
||||
|
||||
Contrast to existing types
|
||||
==========================
|
||||
|
||||
The most common way to work around the lack of a bytes object has been to
|
||||
simply use a string object in its place. Binary files, the struct/array
|
||||
modules, and several other examples exist of this. Putting aside the
|
||||
style issue that these uses typically have nothing to do with text
|
||||
strings, there is the real problem that strings are not mutable, so direct
|
||||
manipulation of the data returned in these cases is not possible. Also,
|
||||
numerous optimizations in the string module (such as caching the hash
|
||||
value or interning the pointers) mean that extension authors are on very
|
||||
thin ice if they try to break the rules with the string object.
|
||||
The most common way to work around the lack of a bytes object has been to
|
||||
simply use a string object in its place. Binary files, the struct/array
|
||||
modules, and several other examples exist of this. Putting aside the
|
||||
style issue that these uses typically have nothing to do with text
|
||||
strings, there is the real problem that strings are not mutable, so direct
|
||||
manipulation of the data returned in these cases is not possible. Also,
|
||||
numerous optimizations in the string module (such as caching the hash
|
||||
value or interning the pointers) mean that extension authors are on very
|
||||
thin ice if they try to break the rules with the string object.
|
||||
|
||||
The buffer object seems like it was intended to address the purpose that
|
||||
the bytes object is trying fulfill, but several shortcomings in its
|
||||
implementation [1] have made it less useful in many common cases. The
|
||||
buffer object made a different choice for its slicing behavior (it returns
|
||||
new strings instead of buffers for slicing and other operations), and it
|
||||
doesn't make many of the promises on alignment or being able to release
|
||||
the GIL that the bytes object does.
|
||||
The buffer object seems like it was intended to address the purpose that
|
||||
the bytes object is trying fulfill, but several shortcomings in its
|
||||
implementation [1]_ have made it less useful in many common cases. The
|
||||
buffer object made a different choice for its slicing behavior (it returns
|
||||
new strings instead of buffers for slicing and other operations), and it
|
||||
doesn't make many of the promises on alignment or being able to release
|
||||
the GIL that the bytes object does.
|
||||
|
||||
Also in regards to the buffer object, it is not possible to simply replace
|
||||
the buffer object with the bytes object and maintain backwards
|
||||
compatibility. The buffer object provides a mechanism to take the
|
||||
PyBufferProcs supplied pointer of another object and present it as its
|
||||
own. Since the behavior of the other object can not be guaranteed to
|
||||
follow the same set of strict rules that a bytes object does, it can't be
|
||||
used in places that a bytes object could.
|
||||
Also in regards to the buffer object, it is not possible to simply replace
|
||||
the buffer object with the bytes object and maintain backwards
|
||||
compatibility. The buffer object provides a mechanism to take the
|
||||
``PyBufferProcs`` supplied pointer of another object and present it as its
|
||||
own. Since the behavior of the other object can not be guaranteed to
|
||||
follow the same set of strict rules that a bytes object does, it can't be
|
||||
used in places that a bytes object could.
|
||||
|
||||
The array module supports the creation of an array of bytes, but it does
|
||||
not provide a C API for supplying pointers and destructors to extension
|
||||
supplied memory. This makes it unusable for constructing objects out of
|
||||
shared memory, or memory that has special alignment or locking for things
|
||||
like DMA transfers. Also, the array object does not currently pickle.
|
||||
Finally since the array object allows its contents to grow, via the extend
|
||||
method, the pointer can be changed if the GIL is not held while using it.
|
||||
The array module supports the creation of an array of bytes, but it does
|
||||
not provide a C API for supplying pointers and destructors to extension
|
||||
supplied memory. This makes it unusable for constructing objects out of
|
||||
shared memory, or memory that has special alignment or locking for things
|
||||
like DMA transfers. Also, the array object does not currently pickle.
|
||||
Finally since the array object allows its contents to grow, via the extend
|
||||
method, the pointer can be changed if the GIL is not held while using it.
|
||||
|
||||
Creating a buffer object from an array object has the same problem of
|
||||
leaving an invalid pointer when the array object is resized.
|
||||
Creating a buffer object from an array object has the same problem of
|
||||
leaving an invalid pointer when the array object is resized.
|
||||
|
||||
The mmap object caters to its particular niche, but does not attempt to
|
||||
solve a wider class of problems.
|
||||
The mmap object caters to its particular niche, but does not attempt to
|
||||
solve a wider class of problems.
|
||||
|
||||
Finally, any third party extension can not implement pickling without
|
||||
creating a temporary object of a standard Python type. For example, in the
|
||||
Numeric community, it is unpleasant that a large array can't pickle
|
||||
without creating a large binary string to duplicate the array data.
|
||||
Finally, any third party extension can not implement pickling without
|
||||
creating a temporary object of a standard Python type. For example, in the
|
||||
Numeric community, it is unpleasant that a large array can't pickle
|
||||
without creating a large binary string to duplicate the array data.
|
||||
|
||||
|
||||
Backward Compatibility
|
||||
======================
|
||||
|
||||
The only possibility for backwards compatibility problems that the author
|
||||
is aware of are in previous versions of Python that try to unpickle data
|
||||
containing the new bytes type.
|
||||
The only possibility for backwards compatibility problems that the author
|
||||
is aware of are in previous versions of Python that try to unpickle data
|
||||
containing the new bytes type.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
XXX: Actual implementation is in progress, but changes are still possible
|
||||
as this PEP gets further review.
|
||||
XXX: Actual implementation is in progress, but changes are still possible
|
||||
as this PEP gets further review.
|
||||
|
||||
The following new files will be added to the Python baseline:
|
||||
The following new files will be added to the Python baseline::
|
||||
|
||||
Include/bytesobject.h # C interface
|
||||
Objects/bytesobject.c # C implementation
|
||||
Lib/test/test_bytes.py # unit testing
|
||||
Doc/lib/libbytes.tex # documentation
|
||||
Include/bytesobject.h # C interface
|
||||
Objects/bytesobject.c # C implementation
|
||||
Lib/test/test_bytes.py # unit testing
|
||||
Doc/lib/libbytes.tex # documentation
|
||||
|
||||
The following files will also be modified:
|
||||
The following files will also be modified::
|
||||
|
||||
Include/Python.h # adding bytesmodule.h include file
|
||||
Python/bltinmodule.c # adding the bytes type object
|
||||
Modules/cPickle.c # adding bytes to the standard types
|
||||
Lib/pickle.py # adding bytes to the standard types
|
||||
Include/Python.h # adding bytesmodule.h include file
|
||||
Python/bltinmodule.c # adding the bytes type object
|
||||
Modules/cPickle.c # adding bytes to the standard types
|
||||
Lib/pickle.py # adding bytes to the standard types
|
||||
|
||||
It is possible that several other modules could be cleaned up and
|
||||
implemented in terms of the bytes object. The mmap module comes to mind
|
||||
first, but as noted above it would be possible to reimplement the array
|
||||
module as a pure Python module. While it is attractive that this PEP
|
||||
could actually reduce the amount of source code by some amount, the author
|
||||
feels that this could cause unnecessary risk for breaking existing
|
||||
applications and should be avoided at this time.
|
||||
It is possible that several other modules could be cleaned up and
|
||||
implemented in terms of the bytes object. The mmap module comes to mind
|
||||
first, but as noted above it would be possible to reimplement the array
|
||||
module as a pure Python module. While it is attractive that this PEP
|
||||
could actually reduce the amount of source code by some amount, the author
|
||||
feels that this could cause unnecessary risk for breaking existing
|
||||
applications and should be avoided at this time.
|
||||
|
||||
|
||||
Additional Notes/Comments
|
||||
=========================
|
||||
|
||||
- Guido van Rossum wondered whether it would make sense to be able
|
||||
to create a bytes object from a mmap object. The mmap object
|
||||
appears to support the requirements necessary to provide memory
|
||||
for a bytes object. (It doesn't resize, and the pointer is valid
|
||||
for the lifetime of the object.) As such, a method could be added
|
||||
to the mmap module such that a bytes object could be created
|
||||
directly from a mmap object. An initial stab at how this would be
|
||||
implemented would be to use the PyBytes_FromPointer() function
|
||||
described above and pass the mmap_object as the user pointer. The
|
||||
destructor function would decref the mmap_object for cleanup.
|
||||
- Guido van Rossum wondered whether it would make sense to be able
|
||||
to create a bytes object from a mmap object. The mmap object
|
||||
appears to support the requirements necessary to provide memory
|
||||
for a bytes object. (It doesn't resize, and the pointer is valid
|
||||
for the lifetime of the object.) As such, a method could be added
|
||||
to the mmap module such that a bytes object could be created
|
||||
directly from a mmap object. An initial stab at how this would be
|
||||
implemented would be to use the ``PyBytes_FromPointer()`` function
|
||||
described above and pass the ``mmap_object`` as the user pointer. The
|
||||
destructor function would decref the ``mmap_object`` for cleanup.
|
||||
|
||||
- Todd Miller notes that it may be useful to have two new functions:
|
||||
PyObject_AsLargeReadBuffer() and PyObject_AsLargeWriteBuffer that are
|
||||
similar to PyObject_AsReadBuffer() and PyObject_AsWriteBuffer(), but
|
||||
support getting a LONG_LONG length in addition to the void* pointer.
|
||||
These functions would allow extension authors to work transparently with
|
||||
bytes object (that support LONG_LONG lengths) and most other buffer like
|
||||
objects (which only support int lengths). These functions could be in
|
||||
lieu of, or in addition to, creating a specific PyByte_GetReadBuffer() and
|
||||
PyBytes_GetWriteBuffer() functions.
|
||||
- Todd Miller notes that it may be useful to have two new functions:
|
||||
``PyObject_AsLargeReadBuffer()`` and ``PyObject_AsLargeWriteBuffer`` that are
|
||||
similar to ``PyObject_AsReadBuffer()`` and ``PyObject_AsWriteBuffer()``, but
|
||||
support getting a ``LONG_LONG`` length in addition to the ``void*`` pointer.
|
||||
These functions would allow extension authors to work transparently with
|
||||
bytes object (that support ``LONG_LONG`` lengths) and most other buffer like
|
||||
objects (which only support int lengths). These functions could be in
|
||||
lieu of, or in addition to, creating a specific ``PyByte_GetReadBuffer()`` and
|
||||
``PyBytes_GetWriteBuffer()`` functions.
|
||||
|
||||
XXX: The author thinks this is very a good idea as it paves the way for
|
||||
other objects to eventually support large (64 bit) pointers, and it should
|
||||
only affect abstract.c and abstract.h. Should this be added above?
|
||||
XXX: The author thinks this is very a good idea as it paves the way for
|
||||
other objects to eventually support large (64 bit) pointers, and it should
|
||||
only affect abstract.c and abstract.h. Should this be added above?
|
||||
|
||||
- It was generally agreed that abusing the segment count of the
|
||||
PyBufferProcs interface is not a good hack to work around the 31 bit
|
||||
limitation of the length. If you don't know what this means, then you're
|
||||
in good company. Most code in the Python baseline, and presumably in many
|
||||
third party extensions, punt when the segment count is not 1.
|
||||
- It was generally agreed that abusing the segment count of the
|
||||
``PyBufferProcs`` interface is not a good hack to work around the 31 bit
|
||||
limitation of the length. If you don't know what this means, then you're
|
||||
in good company. Most code in the Python baseline, and presumably in many
|
||||
third party extensions, punt when the segment count is not 1.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
[1] The buffer interface
|
||||
https://mail.python.org/pipermail/python-dev/2000-October/009974.html
|
||||
.. [1] The buffer interface
|
||||
https://mail.python.org/pipermail/python-dev/2000-October/009974.html
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|
||||
|
|
Loading…
Reference in New Issue