PEP: 467
Title: Improved API consistency for bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30


Abstract
========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes a number of small adjustments to the APIs of the ``bytes``
and ``bytearray`` types to make their behaviour more internally consistent
and to make it easier to operate entirely in the binary domain, as well as
changes to their documentation to make it easier to grasp their dual roles
as containers of "arbitrary binary data" and "binary data with ASCII
compatible segments".


Background
==========

To simplify the task of writing the Python 3 documentation, the ``bytes``
and ``bytearray`` types were documented primarily in terms of the way they
differed from the Unicode based Python 3 ``str`` type. Even when I
`heavily revised the sequence documentation
<http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that
simplifying shortcut.

However, it turns out that this approach to the documentation of these types
has a problem: it doesn't adequately introduce users to their hybrid nature,
where they can be manipulated *either* as a "sequence of integers" type,
*or* as ``str``-like types that assume ASCII compatible data.

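The hybrid nature described above can be observed in current Python 3 using
only the builtins. The following is a small illustration rather than part
of the proposal; the variable names are arbitrary:

```python
data = b"spam eggs"

# "Sequence of integers" view: indexing and iteration yield ints
first = data[0]                      # 115, the value of ASCII "s"
as_ints = list(data[:4])             # [115, 112, 97, 109]

# Slicing, by contrast, yields another bytes object
first_slice = data[0:1]              # b"s"

# str-like view: these methods assume ASCII compatible data
shouted = data.upper()               # b"SPAM EGGS"
words = data.split()                 # [b"spam", b"eggs"]

# Bytes outside the ASCII range are left untouched by the str-like methods
mixed = b"caf\xe9".upper()           # b"CAF\xe9"
```

The last line shows why the ``str``-like methods are only meaningful for
ASCII compatible segments: the ``0xE9`` byte passes through unchanged.
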
In addition to the documentation issues, there are some lingering design
quirks from an earlier pre-release design where there was *no* separate
``bytearray`` type, and instead the core ``bytes`` type was mutable (with
no immutable counterpart).

Finally, additional experience with using the existing Python 3 binary
sequence types in real world applications has suggested it would be
beneficial to make it easier to convert integers to length 1 bytes objects.


Proposals
=========

As a "consistency improvement" proposal, this PEP is actually about a number
of smaller micro-proposals, each aimed at improving the self-consistency of
the binary data model in Python 3. Proposals are motivated by one of three
factors:

* removing remnants of the original design of ``bytes`` as a mutable type
* allowing users to easily convert integer values to a length 1 ``bytes``
  object
* consistently applying the following analogies to the type API designs
  and documentation:

  * ``bytes``: tuple of integers, with additional str-like methods
  * ``bytearray``: list of integers, with additional str-like methods

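The two analogies listed above can be checked against current Python 3
behaviour. This is only an illustrative sketch; the variable names are
arbitrary and nothing here depends on the changes proposed in this PEP:

```python
immutable = bytes([1, 2, 3])
mutable = bytearray([1, 2, 3])

# Like a tuple of integers: hashable, and item assignment is rejected
lookup = {immutable: "usable as a dict key"}
try:
    immutable[0] = 9
except TypeError:
    bytes_rejects_assignment = True
else:
    bytes_rejects_assignment = False

# Like a list of integers: item assignment, append and extend work in place
mutable[0] = 9
mutable.append(4)
mutable.extend([5, 6])

# Both types additionally carry the str-like methods
lowered = bytes([65, 66]).lower()          # b"ab"
lowered_array = bytearray([65, 66]).lower()  # bytearray(b"ab")
```
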
Alternate Constructors
----------------------

The ``bytes`` and ``bytearray`` constructors currently accept an integer
argument, but interpret it to mean a zero-filled object of the given length.
This is a legacy of the original design of ``bytes`` as a mutable type,
rather than a particularly intuitive behaviour for users. It has become
especially confusing now that other ``bytes`` interfaces treat integers
and the corresponding length 1 bytes instances as equivalent input.
Compare::

    >>> b"\x03" in bytes([1, 2, 3])
    True
    >>> 3 in bytes([1, 2, 3])
    True

    >>> bytes(b"\x03")
    b'\x03'
    >>> bytes(3)
    b'\x00\x00\x00'

This PEP proposes that the current handling of integers in the bytes and
bytearray constructors be deprecated in Python 3.5 and targeted for
removal in Python 3.7, being replaced by two more explicit alternate
constructors provided as class methods. The initial python-ideas thread
[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
this constructor behaviour.

Firstly, a ``byte`` constructor is proposed that converts integers
in the range 0 to 255 (inclusive) to a ``bytes`` object::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')
    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)

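Until such a constructor exists, its behaviour can be approximated in
current Python 3. The sketch below is only an illustration: the helper name
``byte`` and its ``cls`` parameter are hypothetical, and it relies on the
existing iterable-of-integers constructor rather than on anything proposed
in this PEP:

```python
def byte(value, cls=bytes):
    """Hypothetical stand-in for the proposed ``cls.byte(value)``."""
    # Wrapping the integer in a list selects the "iterable of integers"
    # constructor, so it is treated as data rather than as a length.
    # Values outside range(0, 256) raise ValueError, as in the PEP example.
    return cls([value])


three = byte(3)                       # b"\x03"
three_array = byte(3, bytearray)      # bytearray(b"\x03")

try:
    byte(512)
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False
```
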
One specific use case for this alternate constructor is to easily convert
the result of indexing operations on ``bytes`` and other binary sequences
from an integer to a ``bytes`` object. The documentation for this API
should note that its counterpart for the reverse conversion is ``ord()``.
The ``ord()`` documentation will also be updated to note that while
``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
``bytearray.byte`` are the counterparts for binary input.

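The round trip described above can be illustrated with current Python 3,
where ``bytes([i])`` is assumed here as the nearest existing spelling of the
proposed ``bytes.byte(i)``:

```python
data = b"\x01\x02\x03"

# Indexing a binary sequence yields an integer
first = data[0]                  # 1

# Converting that integer back to a length 1 bytes object
# (the proposed spelling would be bytes.byte(first))
as_bytes = bytes([first])        # b"\x01"

# ord() performs the reverse conversion for length 1 binary input
round_trip = ord(as_bytes)       # 1

# The str equivalents, for comparison
code_point = ord("a")            # 97
char = chr(code_point)           # "a"
```
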
Secondly, a ``zeros`` constructor is proposed that serves as a direct
replacement for the current constructor behaviour, rather than having to use
sequence repetition to achieve the same effect in a less intuitive way::

    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

The chosen name here is taken from the corresponding initialisation function
in NumPy (although, as these are sequence types rather than N-dimensional
matrices, the constructors take a length as input rather than a shape tuple).

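For reference, the spellings available in current Python 3 for producing a
zero-filled object are shown below. This is an illustrative sketch only; the
variable names are arbitrary:

```python
length = 3

# Current integer constructor behaviour, which this PEP proposes to deprecate
via_constructor = bytes(length)          # b"\x00\x00\x00"

# Sequence repetition, the less direct alternative mentioned above
via_repetition = b"\x00" * length        # b"\x00\x00\x00"

# A list containing the integer is treated as data, not as a length
single_byte = bytes([length])            # b"\x03"

# Typical use of the mutable variant: preallocate, then fill in place
buffer = bytearray(length)
buffer[0] = 0xFF                         # bytearray(b"\xff\x00\x00")
```
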
While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
useful duo amongst the new constructors, ``bytes.zeros`` and
``bytearray.byte`` are provided in order to maintain API consistency between
the two types.


Iteration
---------

While iteration over ``bytes`` objects and other binary sequences produces
integers, it is sometimes desirable to iterate over length 1 bytes objects
instead.

To handle this situation more obviously (and more efficiently) than would be
the case with the ``map(bytes.byte, data)`` construct enabled by the above
constructor changes, this PEP proposes the addition of a new ``iterbytes``
method to ``bytes``, ``bytearray`` and ``memoryview``::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

Third party types and arbitrary containers of integers that lack the new
method can still be handled by combining ``map`` with the new
``bytes.byte()`` alternate constructor proposed above::

    for x in map(bytes.byte, data):
        # x is a length 1 ``bytes`` object, rather than an integer
        # This works with *any* container of integers in the range
        # 0 to 255 inclusive

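Several ways of obtaining length 1 ``bytes`` objects already exist in
current Python 3, and comparing them shows what the proposed method would
replace. The sketch below uses only existing APIs; the lambda stands in for
the proposed ``bytes.byte`` and the variable names are arbitrary:

```python
import struct

data = b"abc"

# Plain iteration yields integers
as_ints = list(data)                                     # [97, 98, 99]

# Stand-in for map(bytes.byte, data) using the existing constructor
via_map = list(map(lambda i: bytes([i]), data))

# Length 1 slices are already bytes objects
via_slicing = [data[i:i + 1] for i in range(len(data))]

# The struct "c" format code unpacks each byte as a length 1 bytes object
via_struct = list(struct.unpack(f"{len(data)}c", data))
```

All three alternatives produce ``[b"a", b"b", b"c"]``, but none of them
states the intent as directly as ``data.iterbytes()`` would.
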
Open questions
^^^^^^^^^^^^^^

* The fallback case above suggests that this could perhaps be better handled
  as an ``iterbytes(data)`` *builtin*, that used ``data.__iterbytes__()``
  if defined, but otherwise fell back to ``map(bytes.byte, data)``::

    for x in iterbytes(data):
        # x is a length 1 ``bytes`` object, rather than an integer
        # This works with *any* container of integers in the range
        # 0 to 255 inclusive

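One possible shape for such a builtin is sketched below in pure Python.
This is speculative: looking the hook up on the type (as is done for other
special methods) is an assumption, ``bytes([i])`` stands in for the proposed
``bytes.byte(i)``, and the ``Packet`` class is a hypothetical third party
type invented for the example:

```python
def iterbytes(data):
    """Sketch of the hypothetical ``iterbytes`` builtin."""
    # Prefer the type's own hook when it defines one
    hook = getattr(type(data), "__iterbytes__", None)
    if hook is not None:
        return hook(data)
    # Fallback for any iterable of integers in range(0, 256)
    return (bytes([i]) for i in data)


class Packet:
    """Hypothetical third party type that implements the hook."""

    def __init__(self, payload):
        self._payload = bytes(payload)

    def __iterbytes__(self):
        payload = self._payload
        # Length 1 slices avoid building each object from an integer
        return (payload[i:i + 1] for i in range(len(payload)))


from_list = list(iterbytes([1, 2]))             # fallback path
from_bytearray = list(iterbytes(bytearray(b"ab")))  # fallback path
from_packet = list(iterbytes(Packet(b"hi")))    # hook path
```
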
Documentation clarifications
----------------------------

In an attempt to clarify the `documentation
<https://docs.python.org/dev/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__
of the ``bytes`` and ``bytearray`` types, the following changes are
proposed:

* the documentation of the *sequence* behaviour of each type is moved to
  the section for that individual type. These sections will be updated to
  explicitly make the ``tuple of integers`` and ``list of integers``
  analogies, as well as to make it clear that these parts of the API work
  with arbitrary binary data
* the current "Bytes and bytearray operations" section will be retitled
  "Handling binary data with ASCII compatible segments", and will explicitly
  list *all* of the methods that are included
* the documentation will clarify that, due to their origins in the API of
  the immutable ``str`` type, even the ``bytearray`` versions of these
  methods do *not* operate in place, but instead create a new object

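The last point in the list above can be demonstrated with current Python 3.
The following sketch contrasts a ``str``-derived method with the list-like
operations on ``bytearray``; the variable names are arbitrary:

```python
original = bytearray(b"spam")

# str-derived methods return a new bytearray and leave the original alone
upper = original.upper()
is_same_object = upper is original        # False
unchanged = bytes(original)               # b"spam"

# list-like operations mutate the existing object
original.extend(b"!")                     # original is now b"spam!"

# Slice assignment is one way to apply a str-derived result in place
alias = original
original[:] = original.upper()
seen_through_alias = bytes(alias)         # b"SPAM!"
```
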
A patch for at least this part of the proposal will be prepared before
submitting the PEP for approval, as writing out these docs completely may
suggest additional opportunities for API consistency improvements.


References
==========

.. [ideas-thread1] https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
.. [empty-buffer-issue] http://bugs.python.org/issue20895


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: