PEP 467: Add updates from last discussion

https://mail.python.org/pipermail/python-dev/2016-September/146043.html
2018-02-21 19:18:38 -08:00 · 2018-02-21 19:18:38 -08:00 · 7ec5ba7357
parent 0977d33b02
commit 7ec5ba7357
1 changed files with 93 additions and 75 deletions
--- a/pep-0467.txt
+++ b/pep-0467.txt
@@ -2,13 +2,13 @@ PEP: 467
 Title: Minor API improvements for binary sequences
 Version: $Revision$
 Last-Modified: $Date$
-Author: Nick Coghlan <ncoghlan@gmail.com>
+Author: Nick Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
 Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 2014-03-30
-Python-Version: 3.5
-Post-History: 2014-03-30 2014-08-15 2014-08-16
+Python-Version: 3.8
+Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01


 Abstract
@@ -20,22 +20,25 @@ that is now referred to as ``bytearray``. Other aspects of operating in
 the binary domain in Python have also evolved over the course of the Python
 3 series.

-This PEP proposes four small adjustments to the APIs of the ``bytes``,
-``bytearray`` and ``memoryview`` types to make it easier to operate entirely
-in the binary domain:
+This PEP proposes five small adjustments to the APIs of the ``bytes`` and
+``bytearray`` types to make it easier to operate entirely in the binary domain:

 * Deprecate passing single integer values to ``bytes`` and ``bytearray``
-* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
-* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
-* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
-  ``memoryview.iterbytes`` alternative iterators
+* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
+* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
+* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
+* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
+
+And one built-in:
+
+* ``bchr``


 Proposals
 =========

-Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------
+Deprecation of current "zero-initialised sequence" behaviour without removal
+----------------------------------------------------------------------------

 Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
 argument and interpret it as meaning to create a zero-initialised sequence
@@ -46,62 +49,75 @@ of the given size::
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

-This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
-entirely in Python 3.6.
+This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
+it in place for at least as long as Python 2.7 is supported, possibly
+indefinitely.

 No other changes are proposed to the existing constructors.


-Addition of explicit "zero-initialised sequence" constructors
-------------------------------------------------------------
+Addition of explicit "count and byte initialised sequence" constructors
+-----------------------------------------------------------------------

 To replace the deprecated behaviour, this PEP proposes the addition of an
-explicit ``zeros`` alternative constructor as a class method on both
-``bytes`` and ``bytearray``::
+explicit ``fromsize`` alternative constructor as a class method on both
+``bytes`` and ``bytearray`` whose first argument is the count, and whose
+second argument is the fill byte to use (defaults to ``\x00``)::

-    >>> bytes.zeros(3)
+    >>> bytes.fromsize(3)
    b'\x00\x00\x00'
-    >>> bytearray.zeros(3)
+    >>> bytearray.fromsize(3)
    bytearray(b'\x00\x00\x00')
+    >>> bytes.fromsize(5, b'\x0a')
+    b'\n\n\n\n\n'
+    >>> bytearray.fromsize(5, b'\x0a')
+    bytearray(b'\n\n\n\n\n')

-It will behave just as the current constructors behave when passed a single
-integer.
-
-The specific choice of ``zeros`` as the alternative constructor name is taken
-from the corresponding initialisation function in NumPy (although, as these
-are 1-dimensional sequence types rather than N-dimensional matrices, the
-constructors take a length as input rather than a shape tuple)
+When called with only a size, ``fromsize`` will behave just as the current
+constructors behave when passed a single integer; the optional second
+argument allows a non-zero fill value when needed.


-Addition of explicit "single byte" constructors
-----------------------------------------------
+Addition of "bchr" function and explicit "single byte" constructors
+-------------------------------------------------------------------

-As binary counterparts to the text ``chr`` function, this PEP proposes the
-addition of an explicit ``byte`` alternative constructor as a class method
-on both ``bytes`` and ``bytearray``::
+As binary counterparts to the text ``chr`` function, this PEP proposes
+the addition of a ``bchr`` function and an explicit ``fromord`` alternative
+constructor as a class method on both ``bytes`` and ``bytearray``::

-    >>> bytes.byte(3)
-    b'\x03'
-    >>> bytearray.byte(3)
-    bytearray(b'\x03')
+    >>> bchr(ord("A"))
+    b'A'
+    >>> bchr(ord(b"A"))
+    b'A'
+    >>> bytes.fromord(65)
+    b'A'
+    >>> bytearray.fromord(65)
+    bytearray(b'A')

 These methods will only accept integers in the range 0 to 255 (inclusive)::

-    >>> bytes.byte(512)
+    >>> bytes.fromord(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
-    ValueError: bytes must be in range(0, 256)
+    ValueError: integer must be in range(0, 256)

-    >>> bytes.byte(1.0)
+    >>> bytes.fromord(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

-The documentation of the ``ord`` builtin will be updated to explicitly note
-that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
-is the inverse operation for text data.
+While this does create some duplication, there are valid reasons for it:
 
+* the ``bchr`` builtin recreates the ``ord``/``chr``/``unichr`` trio from
+  Python 2 under a different naming scheme
+* the class method is mainly for the ``bytearray.fromord`` case, with
+  ``bytes.fromord`` added for consistency
+
+The documentation of the ``ord`` builtin will be updated to explicitly note
+that ``bchr`` is the primary inverse operation for binary data, while ``chr``
+is the inverse operation for text data, and that ``bytes.fromord`` and
+``bytearray.fromord`` also exist.
+
+Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current
 ``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
 expected to be easier to discover and easier to read (especially when used
 in conjunction with indexing operations on binary sequence types).
@@ -110,35 +126,37 @@ As a separate method, the new spelling will also work better with higher
 order functions like ``map``.


+Addition of "getbyte" method to retrieve a single byte
+------------------------------------------------------
+
+This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte``
+which will always return ``bytes``::
+
+    >>> b'abc'.getbyte(0)
+    b'a'
+
+If the requested index is out of range, ``IndexError`` is raised::
+
+    >>> b'abc'.getbyte(9)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    IndexError: index out of range
+
+
 Addition of optimised iterator methods that produce ``bytes`` objects
 ---------------------------------------------------------------------

-This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an
-optimised ``iterbytes`` method that produces length 1 ``bytes`` objects
-rather than integers::
+This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised
+``iterbytes`` method that produces length 1 ``bytes`` objects rather than
+integers::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

-The method can be used with arbitrary buffer exporting objects by wrapping
-them in a ``memoryview`` instance first::
+For example::

-    for x in memoryview(data).iterbytes():
-        # x is a length 1 ``bytes`` object, rather than an integer
-
-For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::
-
-    memview.tobytes() == b''.join(memview.iterbytes())
-
-This allows the raw bytes of the memory view to be iterated over without
-needing to make a copy, regardless of the defined shape and format.
-
-The main advantage this method offers over the ``map(bytes.byte, data)``
-approach is that it is guaranteed *not* to fail midstream with a
-``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
-approach, the type and value of the individual items in the iterable are
-only checked as they are retrieved and passed through the ``bytes.byte``
-constructor.
+    >>> tuple(b"ABC".iterbytes())
+    (b'A', b'B', b'C')


 Design discussion
@@ -163,10 +181,18 @@ This PEP isn't revisiting that original design decision, just changing the
 spelling as users sometimes find the current behaviour of the binary sequence
 constructors surprising. In particular, there's a reasonable case to be made
 that ``bytes(x)`` (where ``x`` is an integer) should behave like the
-``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
+``bytes.fromord(x)`` proposal in this PEP. Providing both behaviours as separate
 class methods avoids that ambiguity.


+Open Questions
+==============
+
+Do we add ``iterbytes`` to ``memoryview``, or modify
+``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation?  Or
+do we ignore ``memoryview`` for now and add it later?
+
+
 References
 ==========

@@ -180,19 +206,11 @@ References
   (http://bugs.python.org/issue21644)
 .. [5] August 2014 discussion thread on python-dev
   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
+.. [6] June 2016 discussion thread on python-dev
+   (https://mail.python.org/pipermail/python-dev/2016-June/144875.html)


 Copyright
 =========

 This document has been placed in the public domain.
-
-
-..
-   Local Variables:
-   mode: indented-text
-   indent-tabs-mode: nil
-   sentence-end-double-space: t
-   fill-column: 70
-   coding: utf-8
-   End:
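
For readers who want to try out the behaviour this revision describes, the following is a minimal pure-Python sketch of the five proposed operations. None of these names exist in any released Python: ``fromsize``, ``fromord``, ``bchr``, ``getbyte`` and ``iterbytes`` are written here as hypothetical module-level helper functions (taking the target class or the data as an explicit argument) rather than as real class methods or builtins. The negative-size and fill-length checks are assumptions of this sketch; the PEP text in the diff does not spell them out.

```python
def fromsize(cls, size, fill=b"\x00"):
    """Stand-in for the proposed ``cls.fromsize(size, fill)``."""
    if size < 0:
        # Assumption: mirror the existing bytes(n) behaviour for negative counts.
        raise ValueError("negative count")
    if len(fill) != 1:
        # Assumption: the fill value must be exactly one byte.
        raise ValueError("fill must be a single byte")
    return cls(fill * size)


def fromord(cls, value):
    """Stand-in for the proposed ``cls.fromord(value)``."""
    # cls([value]) already raises ValueError outside range(0, 256)
    # and TypeError for non-integers such as floats.
    return cls([value])


def bchr(value):
    """Stand-in for the proposed ``bchr`` builtin."""
    return fromord(bytes, value)


def getbyte(data, index):
    """Stand-in for the proposed ``data.getbyte(index)``."""
    # Integer indexing yields an int and raises IndexError when out of range.
    return bytes([data[index]])


def iterbytes(data):
    """Stand-in for the proposed ``data.iterbytes()``."""
    for i in range(len(data)):
        # A one-element slice keeps the item as a byte string, not an int.
        yield bytes(data[i:i + 1])
```

With these helpers, ``bchr(ord("A"))`` and ``fromord(bytes, 65)`` both give ``b'A'``, and ``tuple(iterbytes(b"ABC"))`` gives ``(b'A', b'B', b'C')``, matching the examples in the diff. A real implementation would presumably live in C and avoid the per-item slicing used here.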