PEP 467: Add updates from last discussion

https://mail.python.org/pipermail/python-dev/2016-September/146043.html
Ethan Furman 2018-02-21 19:18:38 -08:00
parent 0977d33b02
commit 7ec5ba7357
1 changed files with 93 additions and 75 deletions

@ -2,13 +2,13 @@ PEP: 467
Title: Minor API improvements for binary sequences Title: Minor API improvements for binary sequences
Version: $Revision$ Version: $Revision$
Last-Modified: $Date$ Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com> Author: Nick Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
Status: Draft Status: Draft
Type: Standards Track Type: Standards Track
Content-Type: text/x-rst Content-Type: text/x-rst
Created: 2014-03-30 Created: 2014-03-30
Python-Version: 3.5 Python-Version: 3.8
Post-History: 2014-03-30 2014-08-15 2014-08-16 Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01
Abstract Abstract
@ -20,22 +20,25 @@ that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python the binary domain in Python have also evolved over the course of the Python
3 series. 3 series.
This PEP proposes four small adjustments to the APIs of the ``bytes``, This PEP proposes five small adjustments to the APIs of the ``bytes`` and
``bytearray`` and ``memoryview`` types to make it easier to operate entirely ``bytearray`` types to make it easier to operate entirely in the binary domain:
in the binary domain:
* Deprecate passing single integer values to ``bytes`` and ``bytearray`` * Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors * Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors * Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and * Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
``memoryview.iterbytes`` alternative iterators * Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
And one built-in:
* bchr
Proposals Proposals
========= =========
Deprecation of current "zero-initialised sequence" behaviour Deprecation of current "zero-initialised sequence" behaviour without removal
------------------------------------------------------------ ----------------------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence argument and interpret it as meaning to create a zero-initialised sequence
@ -46,62 +49,75 @@ of the given size::
>>> bytearray(3) >>> bytearray(3)
bytearray(b'\x00\x00\x00') bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.5, and remove it This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
entirely in Python 3.6. it in place for at least as long as Python 2.7 is supported, possibly
indefinitely.
No other changes are proposed to the existing constructors. No other changes are proposed to the existing constructors.
Addition of explicit "zero-initialised sequence" constructors Addition of explicit "count and byte initialised sequence" constructors
------------------------------------------------------------- -----------------------------------------------------------------------
To replace the deprecated behaviour, this PEP proposes the addition of an To replace the deprecated behaviour, this PEP proposes the addition of an
explicit ``zeros`` alternative constructor as a class method on both explicit ``fromsize`` alternative constructor as a class method on both
``bytes`` and ``bytearray``:: ``bytes`` and ``bytearray`` whose first argument is the count, and whose
second argument is the fill byte to use (defaults to ``\x00``)::
>>> bytes.zeros(3) >>> bytes.fromsize(3)
b'\x00\x00\x00' b'\x00\x00\x00'
>>> bytearray.zeros(3) >>> bytearray.fromsize(3)
bytearray(b'\x00\x00\x00') bytearray(b'\x00\x00\x00')
>>> bytes.fromsize(5, b'\x0a')
b'\x0a\x0a\x0a\x0a\x0a'
>>> bytearray.fromsize(5, b'\x0a')
bytearray(b'\x0a\x0a\x0a\x0a\x0a')
It will behave just as the current constructors behave when passed a single ``fromsize`` will behave just as the current constructors behave when passed a single
integer. integer, while allowing for non-zero fill values when needed.
The specific choice of ``zeros`` as the alternative constructor name is taken
from the corresponding initialisation function in NumPy (although, as these
are 1-dimensional sequence types rather than N-dimensional matrices, the
constructors take a length as input rather than a shape tuple)
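The proposed ``fromsize`` constructor does not exist in any released Python, so the following is only a minimal sketch of how its described behaviour could be emulated today. The helper name ``fromsize`` as a free function, its ``cls`` parameter, and the requirement that the fill value be a length 1 bytes-like object are assumptions made for illustration, not part of the proposal text.

```python
def fromsize(cls, size, fill=b"\x00"):
    """Hypothetical stand-in for the proposed bytes.fromsize / bytearray.fromsize.

    cls  -- either bytes or bytearray
    size -- number of bytes in the result
    fill -- length 1 bytes-like object used for every position (assumed)
    """
    if size < 0:
        raise ValueError("negative count")
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    # Repeating a one-byte sequence gives the same result that the
    # integer form of the constructor gives for a zero fill.
    return cls(bytes(fill) * size)


zeros = fromsize(bytes, 3)
newlines = fromsize(bytearray, 5, b"\x0a")
```

With the default fill, ``fromsize(bytes, 3)`` matches what ``bytes(3)`` returns today, which is the behaviour the new spelling is meant to make explicit.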
Addition of explicit "single byte" constructors Addition of "bchr" function and explicit "single byte" constructors
----------------------------------------------- -------------------------------------------------------------------
As binary counterparts to the text ``chr`` function, this PEP proposes the As binary counterparts to the text ``chr`` function, this PEP proposes
addition of an explicit ``byte`` alternative constructor as a class method the addition of a ``bchr`` function and an explicit ``fromord`` alternative
on both ``bytes`` and ``bytearray``:: constructor as a class method on both ``bytes`` and ``bytearray``::
>>> bytes.byte(3) >>> bchr(ord("A"))
b'\x03' b'A'
>>> bytearray.byte(3) >>> bchr(ord(b"A"))
bytearray(b'\x03') b'A'
>>> bytes.fromord(65)
b'A'
>>> bytearray.fromord(65)
bytearray(b'A')
These methods will only accept integers in the range 0 to 255 (inclusive):: These methods will only accept integers in the range 0 to 255 (inclusive)::
>>> bytes.byte(512) >>> bytes.fromord(512)
Traceback (most recent call last): Traceback (most recent call last):
File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256) ValueError: integer must be in range(0, 256)
>>> bytes.byte(1.0) >>> bytes.fromord(1.0)
Traceback (most recent call last): Traceback (most recent call last):
File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <module>
TypeError: 'float' object cannot be interpreted as an integer TypeError: 'float' object cannot be interpreted as an integer
The documentation of the ``ord`` builtin will be updated to explicitly note While this does create some duplication, there are valid reasons for it:
that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
is the inverse operation for text data.
Behaviourally, ``bytes.byte(x)`` will be equivalent to the current * the ``bchr`` builtin is to recreate the ord/chr/unichr trio from Python
2 under a different naming scheme
* the class method is mainly for the ``bytearray.fromord`` case, with
``bytes.fromord`` added for consistency
The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bchr`` is the primary inverse operation for binary data, while ``chr``
is the inverse operation for text data, and that ``bytes.fromord`` and
``bytearray.fromord`` also exist.
Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is ``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types). in conjunction with indexing operations on binary sequence types).
@ -110,35 +126,37 @@ As a separate method, the new spelling will also work better with higher
order functions like ``map``. order functions like ``map``.
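Since ``bchr`` and ``fromord`` are only proposed, the sketch below emulates them with the existing ``bytes([x])`` spelling that the text names as the behavioural equivalent. The function name ``bchr`` is taken from the proposal, but this pure-Python definition is an illustrative assumption rather than the proposed implementation, and the error message raised by ``bytes([x])`` differs from the one shown in the proposal.

```python
import operator


def bchr(i):
    """Hypothetical pure-Python stand-in for the proposed bchr builtin."""
    # operator.index rejects floats with TypeError, as in the PEP examples.
    i = operator.index(i)
    # bytes([i]) raises ValueError for values outside range(0, 256).
    return bytes([i])


# Being an ordinary callable, it composes with map over a bytes object,
# whose items are integers.
pieces = list(map(bchr, b"ABC"))
```

Here ``pieces`` is a list of three length 1 ``bytes`` objects, which is the higher order usage that the ``bytes([x])`` spelling cannot express without a lambda.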
Addition of "getbyte" method to retrieve a single byte
------------------------------------------------------
This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte``
which will always return ``bytes``::
>>> b'abc'.getbyte(0)
b'a'
If an index is asked for that doesn't exist, ``IndexError`` is raised::
>>> b'abc'.getbyte(9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index out of range
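As a rough illustration of the ``getbyte`` semantics described above, the following sketch emulates the method with a free function. The function form and the acceptance of negative indices (inherited here from ordinary indexing) are assumptions, since the proposal text does not spell them out.

```python
def getbyte(data, index):
    """Hypothetical stand-in for the proposed bytes.getbyte / bytearray.getbyte."""
    # Indexing yields an integer and raises IndexError when out of range;
    # wrapping it in a list converts it to a length 1 bytes object.
    return bytes([data[index]])


first = getbyte(b"abc", 0)
from_array = getbyte(bytearray(b"abc"), 1)
sliced_past_end = b"abc"[9:10]
```

The contrast with slicing is the point of interest: ``b"abc"[9:10]`` silently evaluates to an empty ``bytes`` object, whereas ``getbyte(b"abc", 9)`` raises ``IndexError``.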
Addition of optimised iterator methods that produce ``bytes`` objects Addition of optimised iterator methods that produce ``bytes`` objects
--------------------------------------------------------------------- ---------------------------------------------------------------------
This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised
optimised ``iterbytes`` method that produces length 1 ``bytes`` objects ``iterbytes`` method that produces length 1 ``bytes`` objects rather than
rather than integers:: integers::
for x in data.iterbytes(): for x in data.iterbytes():
# x is a length 1 ``bytes`` object, rather than an integer # x is a length 1 ``bytes`` object, rather than an integer
The method can be used with arbitrary buffer exporting objects by wrapping For example::
them in a ``memoryview`` instance first::
for x in memoryview(data).iterbytes(): >>> tuple(b"ABC".iterbytes())
# x is a length 1 ``bytes`` object, rather than an integer (b'A', b'B', b'C')
For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::
memview.tobytes() == b''.join(memview.iterbytes())
This allows the raw bytes of the memory view to be iterated over without
needing to make a copy, regardless of the defined shape and format.
The main advantage this method offers over the ``map(bytes.byte, data)``
approach is that it is guaranteed *not* to fail midstream with a
``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
approach, the type and value of the individual items in the iterable are
only checked as they are retrieved and passed through the ``bytes.byte``
constructor.
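The ``iterbytes`` behaviour can be approximated today with a generator. This is a sketch only: the free function ``iterbytes`` is a hypothetical stand-in for the proposed method and makes no attempt at the optimisation the proposal has in mind.

```python
def iterbytes(data):
    """Hypothetical stand-in for the proposed iterbytes method."""
    for value in data:
        # Iterating bytes or bytearray yields integers in range(0, 256),
        # so each one converts cleanly to a length 1 bytes object.
        yield bytes([value])


items = tuple(iterbytes(b"ABC"))
joined = b"".join(iterbytes(bytearray(b"ABC")))
```

Joining the yielded items reproduces the original contents, so the iterator is a lossless view of the sequence as length 1 ``bytes`` objects.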
Design discussion Design discussion
@ -163,10 +181,18 @@ This PEP isn't revisiting that original design decision, just changing the
spelling as users sometimes find the current behaviour of the binary sequence spelling as users sometimes find the current behaviour of the binary sequence
constructors surprising. In particular, there's a reasonable case to be made constructors surprising. In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate ``bytes.fromord(x)`` proposal in this PEP. Providing both behaviours as separate
class methods avoids that ambiguity. class methods avoids that ambiguity.
Open Questions
==============
Do we add ``iterbytes`` to ``memoryview``, or modify
``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or
do we ignore ``memoryview`` for now and add it later?
References References
========== ==========
@ -180,19 +206,11 @@ References
(http://bugs.python.org/issue21644) (http://bugs.python.org/issue21644)
.. [5] August 2014 discussion thread on python-dev .. [5] August 2014 discussion thread on python-dev
(https://mail.python.org/pipermail/python-ideas/2014-March/027295.html) (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [6] June 2016 discussion thread on python-dev
(https://mail.python.org/pipermail/python-dev/2016-June/144875.html)
Copyright Copyright
========= =========
This document has been placed in the public domain. This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: