PEP 467: try to streamline the proposal
parent
4820a784db
commit
19a8de5ca0
235
pep-0467.txt
|
@ -20,163 +20,166 @@ that is now referred to as ``bytearray``. Other aspects of operating in
|
|||
the binary domain in Python have also evolved over the course of the Python
|
||||
3 series.
|
||||
|
||||
This PEP proposes a number of small adjustments to the APIs of the ``bytes``
|
||||
and ``bytearray`` types to make it easier to operate entirely in the binary
|
||||
domain.
|
||||
This PEP proposes four small adjustments to the APIs of the ``bytes``,
|
||||
``bytearray`` and ``memoryview`` types to make it easier to operate entirely
|
||||
in the binary domain:
|
||||
|
||||
|
||||
Background
|
||||
==========
|
||||
|
||||
To simplify the task of writing the Python 3 documentation, the ``bytes``
|
||||
and ``bytearray`` types were documented primarily in terms of the way they
|
||||
differed from the Unicode-based Python 3 ``str`` type. Even when I
|
||||
`heavily revised the sequence documentation
|
||||
<http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that
|
||||
simplifying shortcut.
|
||||
|
||||
However, it turns out that this approach to the documentation of these types
|
||||
had a problem: it doesn't adequately introduce users to their hybrid nature,
|
||||
where they can be manipulated *either* as a "sequence of integers" type,
|
||||
*or* as ``str``-like types that assume ASCII compatible data.
|
||||
|
||||
That oversight has now been corrected, with the binary sequence types now
|
||||
being documented entirely independently of the ``str`` documentation in
|
||||
`Python 3.4+ <https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__.
|
||||
|
||||
The confusion isn't just a documentation issue, however, as there are also
|
||||
some lingering design quirks from an earlier pre-release design where there
|
||||
was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
|
||||
was mutable (with no immutable counterpart).
|
||||
|
||||
Finally, additional experience with using the existing Python 3 binary
|
||||
sequence types in real world applications has suggested it would be
|
||||
beneficial to make it easier to convert integers to length 1 bytes objects.
|
||||
* Deprecate passing single integer values to ``bytes`` and ``bytearray``
|
||||
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
|
||||
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
|
||||
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
|
||||
``memoryview.iterbytes`` alternative iterators
|
||||
|
||||
|
||||
Proposals
|
||||
=========
|
||||
|
||||
As a "consistency improvement" proposal, this PEP is actually about a few
|
||||
smaller micro-proposals, each aimed at improving the usability of the binary
|
||||
data model in Python 3. Proposals are motivated by one of two main factors:
|
||||
Deprecation of current "zero-initialised sequence" behaviour
|
||||
------------------------------------------------------------
|
||||
|
||||
* removing remnants of the original design of ``bytes`` as a mutable type
|
||||
* allowing users to easily convert integer values to a length 1 ``bytes``
|
||||
object
|
||||
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
|
||||
argument and interpret it as meaning to create a zero-initialised sequence
|
||||
of the given size::
|
||||
|
||||
|
||||
Alternate Constructors
|
||||
----------------------
|
||||
|
||||
The ``bytes`` and ``bytearray`` constructors currently accept an integer
|
||||
argument, but interpret it to mean a zero-filled object of the given length.
|
||||
This is a legacy of the original design of ``bytes`` as a mutable type,
|
||||
rather than a particularly intuitive behaviour for users. It has become
|
||||
especially confusing now that some other ``bytes`` interfaces treat integers
|
||||
and the corresponding length 1 bytes instances as equivalent input.
|
||||
Compare::
|
||||
|
||||
>>> b"\x03" in bytes([1, 2, 3])
|
||||
True
|
||||
>>> 3 in bytes([1, 2, 3])
|
||||
True
|
||||
|
||||
>>> bytes(b"\x03")
|
||||
b'\x03'
|
||||
>>> bytes(3)
|
||||
b'\x00\x00\x00'
|
||||
>>> bytearray(3)
|
||||
bytearray(b'\x00\x00\x00')
|
||||
|
||||
This PEP proposes that the current handling of integers in the bytes and
|
||||
bytearray constructors be deprecated in Python 3.5 and targeted for
|
||||
removal in Python 3.7, being replaced by two more explicit alternate
|
||||
constructors provided as class methods. The initial python-ideas thread
|
||||
[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
|
||||
this constructor behaviour.
|
||||
This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
|
||||
entirely in Python 3.6.
|
||||
|
||||
Firstly, a ``byte`` constructor is proposed that converts integers
|
||||
in the range 0 to 255 (inclusive) to a ``bytes`` object::
|
||||
No other changes are proposed to the existing constructors.
|
||||
|
||||
>>> bytes.byte(3)
|
||||
b'\x03'
|
||||
>>> bytearray.byte(3)
|
||||
bytearray(b'\x03')
|
||||
>>> bytes.byte(512)
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
ValueError: bytes must be in range(0, 256)
|
||||
|
||||
One specific use case for this alternate constructor is to easily convert
|
||||
the result of indexing operations on ``bytes`` and other binary sequences
|
||||
from an integer to a ``bytes`` object. The documentation for this API
|
||||
should note that its counterpart for the reverse conversion is ``ord()``.
|
||||
The ``ord()`` documentation will also be updated to note that while
|
||||
``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
|
||||
``bytearray.byte`` are the counterparts for binary input.
|
||||
Addition of explicit "zero-initialised sequence" constructors
|
||||
-------------------------------------------------------------
|
||||
|
||||
Secondly, a ``zeros`` constructor is proposed that serves as a direct
|
||||
replacement for the current constructor behaviour, rather than having to use
|
||||
sequence repetition to achieve the same effect in a less intuitive way::
|
||||
To replace the deprecated behaviour, this PEP proposes the addition of an
|
||||
explicit ``zeros`` alternative constructor as a class method on both
|
||||
``bytes`` and ``bytearray``::
|
||||
|
||||
>>> bytes.zeros(3)
|
||||
b'\x00\x00\x00'
|
||||
>>> bytearray.zeros(3)
|
||||
bytearray(b'\x00\x00\x00')
|
||||
|
||||
The chosen name here is taken from the corresponding initialisation function
|
||||
in NumPy (although, as these are sequence types rather than N-dimensional
|
||||
matrices, the constructors take a length as input rather than a shape tuple).
|
||||
It will behave just as the current constructors behave when passed a single
|
||||
integer.
|
||||
|
||||
While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
|
||||
useful duo amongst the new constructors, ``bytes.zeros`` and
|
||||
``bytearray.byte`` are provided in order to maintain API consistency between
|
||||
the two types.
|
||||
The specific choice of ``zeros`` as the alternative constructor name is taken
|
||||
from the corresponding initialisation function in NumPy (although, as these
|
||||
are 1-dimensional sequence types rather than N-dimensional matrices, the
|
||||
constructors take a length as input rather than a shape tuple).
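
As a rough illustration only, the proposed ``zeros`` behaviour can be
approximated in current Python with a plain function. The name ``zeros`` and
its ``(cls, length)`` signature below are assumptions made for this sketch,
not the proposed API itself, which would be a class method:

```python
def zeros(cls, length):
    """Hypothetical stand-in for the proposed ``cls.zeros(length)``."""
    if length < 0:
        # The existing integer constructors reject negative sizes,
        # so the sketch does the same.
        raise ValueError("negative count")
    # Sequence repetition avoids relying on the integer-argument
    # constructor behaviour that the PEP proposes to deprecate.
    return cls(b"\x00") * length


print(zeros(bytes, 3))      # same value as bytes(3) today
print(zeros(bytearray, 3))  # same value as bytearray(3) today
```

The result compares equal to what ``bytes(3)`` and ``bytearray(3)`` return
today, which is the behaviour the new constructors are meant to preserve.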
|
||||
|
||||
|
||||
Iteration
|
||||
---------
|
||||
Addition of explicit "single byte" constructors
|
||||
-----------------------------------------------
|
||||
|
||||
While iteration over ``bytes`` objects and other binary sequences produces
|
||||
integers, it is sometimes desirable to iterate over length 1 bytes objects
|
||||
instead.
|
||||
As binary counterparts to the text ``chr`` function, this PEP proposes the
|
||||
addition of an explicit ``byte`` alternative constructor as a class method
|
||||
on both ``bytes`` and ``bytearray``::
|
||||
|
||||
To handle this situation more obviously (and more efficiently) than would be
|
||||
the case with the ``map(bytes.byte, data)`` construct enabled by the above
|
||||
constructor changes, this PEP proposes the addition of a new ``iterbytes``
|
||||
method to ``bytes``, ``bytearray`` and ``memoryview``::
|
||||
>>> bytes.byte(3)
|
||||
b'\x03'
|
||||
>>> bytearray.byte(3)
|
||||
bytearray(b'\x03')
|
||||
|
||||
These methods will only accept integers in the range 0 to 255 (inclusive)::
|
||||
|
||||
>>> bytes.byte(512)
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
ValueError: bytes must be in range(0, 256)
|
||||
|
||||
>>> bytes.byte(1.0)
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
TypeError: 'float' object cannot be interpreted as an integer
|
||||
|
||||
The documentation of the ``ord`` builtin will be updated to explicitly note
|
||||
that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
|
||||
is the inverse operation for text data.
|
||||
|
||||
Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
|
||||
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
|
||||
expected to be easier to discover and easier to read (especially when used
|
||||
in conjunction with indexing operations on binary sequence types).
|
||||
|
||||
As a separate method, the new spelling will also work better with higher
|
||||
order functions like ``map``.
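
The stated equivalence with ``bytes([x])`` means the proposed constructor can
be emulated today. The following is a minimal sketch, assuming a hypothetical
module-level helper named ``byte`` in place of the proposed class methods:

```python
def byte(value, cls=bytes):
    """Hypothetical stand-in for the proposed ``cls.byte(value)``."""
    # Wrapping the integer in a list selects the "iterable of integers"
    # constructor path rather than the "zero-filled of size n" path.
    return cls([value])


data = b"PEP"

# Indexing yields an integer; the helper turns it back into bytes.
first = byte(data[0])

# ``ord`` is the reverse conversion for a length 1 bytes object.
assert ord(first) == data[0]

# As a plain callable, the helper composes directly with ``map``.
pieces = list(map(byte, data))
print(first, pieces)
```

Out-of-range integers and non-integer inputs are rejected by the underlying
constructor with ``ValueError`` and ``TypeError`` respectively, matching the
examples above.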
|
||||
|
||||
|
||||
Addition of optimised iterator methods that produce ``bytes`` objects
|
||||
---------------------------------------------------------------------
|
||||
|
||||
This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an
|
||||
optimised ``iterbytes`` method that produces length 1 ``bytes`` objects
|
||||
rather than integers::
|
||||
|
||||
for x in data.iterbytes():
|
||||
# x is a length 1 ``bytes`` object, rather than an integer
|
||||
|
||||
Third party types and arbitrary containers of integers that lack the new
|
||||
method can still be handled by combining ``map`` with the new
|
||||
``bytes.byte()`` alternate constructor proposed above::
|
||||
The method can be used with arbitrary buffer exporting objects by wrapping
|
||||
them in a ``memoryview`` instance first::
|
||||
|
||||
for x in map(bytes.byte, data):
|
||||
for x in memoryview(data).iterbytes():
|
||||
# x is a length 1 ``bytes`` object, rather than an integer
|
||||
# This works with *any* container of integers in the range
|
||||
# 0 to 255 inclusive
|
||||
|
||||
For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::
|
||||
|
||||
memview.tobytes() == b''.join(memview.iterbytes())
|
||||
|
||||
This allows the raw bytes of the memory view to be iterated over without
|
||||
needing to make a copy, regardless of the defined shape and format.
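
The ``memoryview`` semantics described above can be approximated in current
Python with a generator. This is only a sketch: ``iterbytes`` here is a
hypothetical free function rather than the proposed method, and it assumes a
C-contiguous buffer so that ``memoryview.cast`` can reinterpret it as raw
bytes:

```python
import array


def iterbytes(data):
    """Hypothetical stand-in for the proposed ``iterbytes()`` method."""
    # Reinterpret the buffer as unsigned bytes, ignoring the
    # exporter's own item format.
    view = memoryview(data).cast("B")
    for i in range(len(view)):
        # Slicing (rather than indexing) keeps a length 1 view,
        # which converts to a length 1 bytes object.
        yield bytes(view[i:i + 1])


# 16-bit items: each item contributes two raw bytes.
values = array.array("H", [1, 2])
chunks = list(iterbytes(values))

# The invariant given for memoryview.iterbytes() in the text.
assert b"".join(chunks) == memoryview(values).tobytes()
```

Each step copies only a single byte, so the full buffer is never duplicated
up front, which is the property the proposed method is intended to provide.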
|
||||
|
||||
The main advantage this method offers over the ``map(bytes.byte, data)``
|
||||
approach is that it is guaranteed *not* to fail midstream with a
|
||||
``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
|
||||
approach, the type and value of the individual items in the iterable are
|
||||
only checked as they are retrieved and passed through the ``bytes.byte``
|
||||
constructor.
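
The midstream failure mode of the ``map`` based approach can be seen with a
short experiment. As before, ``byte`` is a hypothetical helper standing in for
the proposed ``bytes.byte``, and the input list is invented for illustration:

```python
def byte(value):
    """Hypothetical stand-in for the proposed ``bytes.byte(value)``."""
    return bytes([value])


# The final item is out of range, but nothing checks it in advance.
data = [1, 2, 512]

seen = []
error = None
try:
    for x in map(byte, data):
        seen.append(x)
except ValueError as exc:
    # Raised only once iteration reaches the invalid item.
    error = exc

print(seen, error)
```

The first two items are processed before the error surfaces, so any side
effects from handling them have already happened by the time the exception is
raised.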
|
||||
|
||||
|
||||
Open questions
|
||||
^^^^^^^^^^^^^^
|
||||
Design discussion
|
||||
=================
|
||||
|
||||
* The fallback case above suggests that this could perhaps be better handled
|
||||
as an ``iterbytes(data)`` *builtin*, that used ``data.__iterbytes__()``
|
||||
if defined, but otherwise fell back to ``map(bytes.byte, data)``::
|
||||
Why not rely on sequence repetition to create zero-initialised sequences?
|
||||
-------------------------------------------------------------------------
|
||||
|
||||
for x in iterbytes(data):
|
||||
# x is a length 1 ``bytes`` object, rather than an integer
|
||||
# This works with *any* container of integers in the range
|
||||
# 0 to 255 inclusive
|
||||
Zero-initialised sequences can be created via sequence repetition::
|
||||
|
||||
>>> b'\x00' * 3
|
||||
b'\x00\x00\x00'
|
||||
>>> bytearray(b'\x00') * 3
|
||||
bytearray(b'\x00\x00\x00')
|
||||
|
||||
However, this was also the case when the ``bytearray`` type was originally
|
||||
designed, and the decision was made to add explicit support for it in the
|
||||
type constructor. The immutable ``bytes`` type then inherited that feature
|
||||
when it was introduced in PEP 3137.
|
||||
|
||||
This PEP isn't revisiting that original design decision, just changing the
|
||||
spelling as users sometimes find the current behaviour of the binary sequence
|
||||
constructors surprising. In particular, there's a reasonable case to be made
|
||||
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
|
||||
``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
|
||||
class methods avoids that ambiguity.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [ideas-thread1] https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
|
||||
.. [empty-buffer-issue] http://bugs.python.org/issue20895
|
||||
.. [GvR-initial-feedback] https://mail.python.org/pipermail/python-ideas/2014-March/027376.html
|
||||
.. [1] Initial March 2014 discussion thread on python-ideas
|
||||
(https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
|
||||
.. [2] Guido's initial feedback in that thread
|
||||
(https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
|
||||
.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
|
||||
(http://bugs.python.org/issue20895)
|
||||
.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
|
||||
(http://bugs.python.org/issue21644)
|
||||
.. [5] August 2014 discussion thread on python-dev
|
||||
(https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
|
||||
|
||||
|
||||
Copyright
|
||||
|
|