PEP 467: try to streamline the proposal

2014-08-16 15:05:16 +10:00 · 19a8de5ca0
parent 4820a784db
commit 19a8de5ca0
1 changed files with 119 additions and 116 deletions
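
Reviewer sketch (not part of the commit): the revised text in this diff describes ``zeros``, ``byte`` and ``iterbytes`` in terms of behaviour that can already be spelled in current Python. The helper functions below are hypothetical stand-ins written only to illustrate those stated equivalences. Their names and signatures are assumptions, and the use of ``memoryview.cast("B")`` to reach the raw bytes is one possible emulation rather than the proposed implementation.

```python
# Hypothetical emulation of the semantics described in the PEP text.
# None of these helpers exist in the standard library.

def zeros(cls, length):
    """Emulate the proposed cls.zeros(length) via sequence repetition."""
    return cls(b"\x00") * length


def byte(cls, value):
    """Emulate the proposed cls.byte(value) via the existing cls([value])."""
    # cls([value]) already raises ValueError outside range(0, 256)
    # and TypeError for non-integer input such as 1.0.
    return cls([value])


def iterbytes(data):
    """Emulate the proposed data.iterbytes(): yield length 1 bytes objects."""
    # Assumption: data exports a C-contiguous buffer. cast("B") gives a
    # flat view of the raw bytes without copying the underlying data.
    view = memoryview(data).cast("B")
    for i in range(len(view)):
        yield bytes(view[i:i + 1])


print(zeros(bytes, 3))            # b'\x00\x00\x00'
print(byte(bytearray, 3))         # bytearray(b'\x03')
print(list(iterbytes(b"abc")))    # [b'a', b'b', b'c']
```

The ``iterbytes`` stand-in is written so that ``b''.join(iterbytes(data))`` matches ``memoryview(data).tobytes()`` for contiguous buffers, which is the invariant the diff states for ``memoryview``.
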
--- a/pep-0467.txt
+++ b/pep-0467.txt
@@ -20,163 +20,166 @@ that is now referred to as ``bytearray``. Other aspects of operating in
 the binary domain in Python have also evolved over the course of the Python
 3 series.

-This PEP proposes a number of small adjustments to the APIs of the ``bytes``
-and ``bytearray`` types to make it easier to operate entirely in the binary
-domain.
+This PEP proposes four small adjustments to the APIs of the ``bytes``,
+``bytearray`` and ``memoryview`` types to make it easier to operate entirely
+in the binary domain:

-
-Background
-==========
-
-To simplify the task of writing the Python 3 documentation, the ``bytes``
-and ``bytearray`` types were documented primarily in terms of the way they
-differed from the Unicode based Python 3 ``str`` type. Even when I
-`heavily revised the sequence documentation
-<http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that
-simplifying shortcut.
-
-However, it turns out that this approach to the documentation of these types
-had a problem: it doesn't adequately introduce users to their hybrid nature,
-where they can be manipulated *either* as a "sequence of integers" type,
-*or* as ``str``-like types that assume ASCII compatible data.
-
-That oversight has now been corrected, with the binary sequence types now
-being documented entirely independently of the ``str`` documentation in
-`Python 3.4+ <https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__
-
-The confusion isn't just a documentation issue, however, as there are also
-some lingering design quirks from an earlier pre-release design where there
-was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
-was mutable (with no immutable counterpart).
-
-Finally, additional experience with using the existing Python 3 binary
-sequence types in real world applications has suggested it would be
-beneficial to make it easier to convert integers to length 1 bytes objects.
+* Deprecate passing single integer values to ``bytes`` and ``bytearray``
+* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
+* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
+* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
+  ``memoryview.iterbytes`` alternative iterators


 Proposals
 =========

-As a "consistency improvement" proposal, this PEP is actually about a few
-smaller micro-proposals, each aimed at improving the usability of the binary
-data model in Python 3. Proposals are motivated by one of two main factors:
+Deprecation of current "zero-initialised sequence" behaviour
+------------------------------------------------------------

-* removing remnants of the original design of ``bytes`` as a mutable type
-* allowing users to easily convert integer values to a length 1 ``bytes``
-  object
+Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
+argument and interpret it as meaning to create a zero-initialised sequence
+of the given size::

-
-Alternate Constructors
----------------------
-
-The ``bytes`` and ``bytearray`` constructors currently accept an integer
-argument, but interpret it to mean a zero-filled object of the given length.
-This is a legacy of the original design of ``bytes`` as a mutable type,
-rather than a particularly intuitive behaviour for users. It has become
-especially confusing now that some other ``bytes`` interfaces treat integers
-and the corresponding length 1 bytes instances as equivalent input.
-Compare::
-
-    >>> b"\x03" in bytes([1, 2, 3])
-    True
-    >>> 3 in bytes([1, 2, 3])
-    True
-
-    >>> bytes(b"\x03")
-    b'\x03'
    >>> bytes(3)
    b'\x00\x00\x00'
+    >>> bytearray(3)
+    bytearray(b'\x00\x00\x00')

-This PEP proposes that the current handling of integers in the bytes and
-bytearray constructors by deprecated in Python 3.5 and targeted for
-removal in Python 3.7, being replaced by two more explicit alternate
-constructors provided as class methods. The initial python-ideas thread
-[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
-this constructor behaviour.
+This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
+entirely in Python 3.6.

-Firstly, a ``byte`` constructor is proposed that converts integers
-in the range 0 to 255 (inclusive) to a ``bytes`` object::
+No other changes are proposed to the existing constructors.

-    >>> bytes.byte(3)
-    b'\x03'
-    >>> bytearray.byte(3)
-    bytearray(b'\x03')
-    >>> bytes.byte(512)
-    Traceback (most recent call last):
-      File "<stdin>", line 1, in <module>
-    ValueError: bytes must be in range(0, 256)

-One specific use case for this alternate constructor is to easily convert
-the result of indexing operations on ``bytes`` and other binary sequences
-from an integer to a ``bytes`` object. The documentation for this API
-should note that its counterpart for the reverse conversion is ``ord()``.
-The ``ord()`` documentation will also be updated to note that while
-``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
-``bytearray.byte`` are the counterparts for binary input.
+Addition of explicit "zero-initialised sequence" constructors
+-------------------------------------------------------------

-Secondly, a ``zeros`` constructor is proposed that serves as a direct
-replacement for the current constructor behaviour, rather than having to use
-sequence repetition to achieve the same effect in a less intuitive way::
+To replace the deprecated behaviour, this PEP proposes the addition of an
+explicit ``zeros`` alternative constructor as a class method on both
+``bytes`` and ``bytearray``::

    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

-The chosen name here is taken from the corresponding initialisation function
-in NumPy (although, as these are sequence types rather than N-dimensional
-matrices, the constructors take a length as input rather than a shape tuple)
+It will behave just as the current constructors behave when passed a single
+integer.

-While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
-useful duo amongst the new constructors, ``bytes.zeros`` and
-`bytearray.byte`` are provided in order to maintain API consistency between
-the two types.
+The specific choice of ``zeros`` as the alternative constructor name is taken
+from the corresponding initialisation function in NumPy (although, as these
+are 1-dimensional sequence types rather than N-dimensional matrices, the
+constructors take a length as input rather than a shape tuple)


-Iteration
---------
+Addition of explicit "single byte" constructors
+-----------------------------------------------

-While iteration over ``bytes`` objects and other binary sequences produces
-integers, it is sometimes desirable to iterate over length 1 bytes objects
-instead.
+As binary counterparts to the text ``chr`` function, this PEP proposes the
+addition of an explicit ``byte`` alternative constructor as a class method
+on both ``bytes`` and ``bytearray``::

-To handle this situation more obviously (and more efficiently) than would be
-the case with the ``map(bytes.byte, data)`` construct enabled by the above
-constructor changes, this PEP proposes the addition of a new ``iterbytes``
-method to ``bytes``, ``bytearray`` and ``memoryview``::
+    >>> bytes.byte(3)
+    b'\x03'
+    >>> bytearray.byte(3)
+    bytearray(b'\x03')
+
+These methods will only accept integers in the range 0 to 255 (inclusive)::
+
+    >>> bytes.byte(512)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    ValueError: bytes must be in range(0, 256)
+
+    >>> bytes.byte(1.0)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    TypeError: 'float' object cannot be interpreted as an integer
+
+The documentation of the ``ord`` builtin will be updated to explicitly note
+that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
+is the inverse operation for text data.
+
+Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
+``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
+expected to be easier to discover and easier to read (especially when used
+in conjunction with indexing operations on binary sequence types).
+
+As a separate method, the new spelling will also work better with higher
+order functions like ``map``.
+
+
+Addition of optimised iterator methods that produce ``bytes`` objects
+---------------------------------------------------------------------
+
+This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an
+optimised ``iterbytes`` method that produces length 1 ``bytes`` objects
+rather than integers::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

-Third party types and arbitrary containers of integers that lack the new
-method can still be handled by combining ``map`` with the new
-``bytes.byte()`` alternate constructor proposed above::
+The method can be used with arbitrary buffer exporting objects by wrapping
+them in a ``memoryview`` instance first::

-    for x in map(bytes.byte, data):
+    for x in memoryview(data).iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer
-        # This works with *any* container of integers in the range
-        # 0 to 255 inclusive
+
+For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::
+
+    memview.tobytes() == b''.join(memview.iterbytes())
+
+This allows the raw bytes of the memory view to be iterated over without
+needing to make a copy, regardless of the defined shape and format.
+
+The main advantage this method offers over the ``map(bytes.byte, data)``
+approach is that it is guaranteed *not* to fail midstream with a
+``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
+approach, the type and value of the individual items in the iterable are
+only checked as they are retrieved and passed through the ``bytes.byte``
+constructor.


-Open questions
-^^^^^^^^^^^^^^
+Design discussion
+=================

-* The fallback case above suggests that this could perhaps be better handled
-  as an ``iterbytes(data)`` *builtin*, that used ``data.__iterbytes__()``
-  if defined, but otherwise fell back to ``map(bytes.byte, data)``::
+Why not rely on sequence repetition to create zero-initialised sequences?
+-------------------------------------------------------------------------

-    for x in iterbytes(data):
-        # x is a length 1 ``bytes`` object, rather than an integer
-        # This works with *any* container of integers in the range
-        # 0 to 255 inclusive
+Zero-initialised sequences can be created via sequence repetition::
+
+    >>> b'\x00' * 3
+    b'\x00\x00\x00'
+    >>> bytearray(b'\x00') * 3
+    bytearray(b'\x00\x00\x00')
+
+However, this was also the case when the ``bytearray`` type was originally
+designed, and the decision was made to add explicit support for it in the
+type constructor. The immutable ``bytes`` type then inherited that feature
+when it was introduced in PEP 3137.
+
+This PEP isn't revisiting that original design decision, just changing the
+spelling as users sometimes find the current behaviour of the binary sequence
+constructors surprising. In particular, there's a reasonable case to be made
+that ``bytes(x)`` (where ``x`` is an integer) should behave like the
+``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
+class methods avoids that ambiguity.


 References
 ==========

-.. [ideas-thread1] https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
-.. [empty-buffer-issue] http://bugs.python.org/issue20895
-.. [GvR-initial-feedback] https://mail.python.org/pipermail/python-ideas/2014-March/027376.html
+.. [1] Initial March 2014 discussion thread on python-ideas
+   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
+.. [2] Guido's initial feedback in that thread
+   (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
+.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
+   (http://bugs.python.org/issue20895)
+.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
+   (http://bugs.python.org/issue21644)
+.. [5] August 2014 discussion thread on python-dev
+   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)


 Copyright