PEP 467: Add updates from last discussion
https://mail.python.org/pipermail/python-dev/2016-September/146043.html
parent 0977d33b02
commit 7ec5ba7357
168 pep-0467.txt
@@ -2,13 +2,13 @@ PEP: 467
 Title: Minor API improvements for binary sequences
 Version: $Revision$
 Last-Modified: $Date$
-Author: Nick Coghlan <ncoghlan@gmail.com>
+Author: Nick Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
 Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 2014-03-30
-Python-Version: 3.5
-Post-History: 2014-03-30 2014-08-15 2014-08-16
+Python-Version: 3.8
+Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01


 Abstract
@@ -20,22 +20,25 @@ that is now referred to as ``bytearray``. Other aspects of operating in
 the binary domain in Python have also evolved over the course of the Python
 3 series.

-This PEP proposes four small adjustments to the APIs of the ``bytes``,
-``bytearray`` and ``memoryview`` types to make it easier to operate entirely
-in the binary domain:
+This PEP proposes five small adjustments to the APIs of the ``bytes`` and
+``bytearray`` types to make it easier to operate entirely in the binary domain:

 * Deprecate passing single integer values to ``bytes`` and ``bytearray``
-* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
-* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
-* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
-  ``memoryview.iterbytes`` alternative iterators
+* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
+* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
+* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
+* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
+
+And one built-in::
+
+* bchr


 Proposals
 =========

-Deprecation of current "zero-initialised sequence" behaviour
-------------------------------------------------------------
+Deprecation of current "zero-initialised sequence" behaviour without removal
+----------------------------------------------------------------------------

 Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
 argument and interpret it as meaning to create a zero-initialised sequence
@@ -46,62 +49,75 @@ of the given size::
     >>> bytearray(3)
     bytearray(b'\x00\x00\x00')

-This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
-entirely in Python 3.6.
+This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
+it in place for at least as long as Python 2.7 is supported, possibly
+indefinitely.

 No other changes are proposed to the existing constructors.


-Addition of explicit "zero-initialised sequence" constructors
--------------------------------------------------------------
+Addition of explicit "count and byte initialised sequence" constructors
+-----------------------------------------------------------------------

 To replace the deprecated behaviour, this PEP proposes the addition of an
-explicit ``zeros`` alternative constructor as a class method on both
-``bytes`` and ``bytearray``::
+explicit ``fromsize`` alternative constructor as a class method on both
+``bytes`` and ``bytearray`` whose first argument is the count, and whose
+second argument is the fill byte to use (defaults to ``\x00``)::

-    >>> bytes.zeros(3)
+    >>> bytes.fromsize(3)
     b'\x00\x00\x00'
-    >>> bytearray.zeros(3)
+    >>> bytearray.fromsize(3)
     bytearray(b'\x00\x00\x00')
+    >>> bytes.fromsize(5, b'\x0a')
+    b'\x0a\x0a\x0a\x0a\x0a'
+    >>> bytearray.fromsize(5, b'\x0a')
+    bytearray(b'\x0a\x0a\x0a\x0a\x0a')

-It will behave just as the current constructors behave when passed a single
-integer.
-
-The specific choice of ``zeros`` as the alternative constructor name is taken
-from the corresponding initialisation function in NumPy (although, as these
-are 1-dimensional sequence types rather than N-dimensional matrices, the
-constructors take a length as input rather than a shape tuple)
+``fromsize`` will behave just as the current constructors behave when passed a single
+integer, while allowing for non-zero fill values when needed.


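Review note on the ``fromsize`` hunk: the described behaviour can be approximated on current Python with sequence repetition. The sketch below is a hypothetical free function (the function form and its ``cls`` parameter are assumptions made for illustration; the PEP proposes class methods, and this is not its implementation).

```python
def fromsize(cls, count, fill=b"\x00"):
    """Hypothetical stand-in for the proposed ``cls.fromsize(count, fill)``.

    ``cls`` is ``bytes`` or ``bytearray``; ``fill`` must be exactly one byte.
    """
    if count < 0:
        # The existing integer constructors also reject negative sizes.
        raise ValueError("negative count")
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    # Repeating a length-1 sequence gives ``count`` copies of the fill byte.
    return cls(fill * count)


zeros = fromsize(bytes, 3)                   # same result as bytes(3)
newlines = fromsize(bytearray, 5, b"\x0a")   # five copies of 0x0a
```

With the default fill, the helper returns the same value as the current single-integer constructor call, which is the equivalence the added paragraph states.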
-Addition of explicit "single byte" constructors
------------------------------------------------
+Addition of "bchr" function and explicit "single byte" constructors
+-------------------------------------------------------------------

-As binary counterparts to the text ``chr`` function, this PEP proposes the
-addition of an explicit ``byte`` alternative constructor as a class method
-on both ``bytes`` and ``bytearray``::
+As binary counterparts to the text ``chr`` function, this PEP proposes
+the addition of a ``bchr`` function and an explicit ``fromord`` alternative
+constructor as a class method on both ``bytes`` and ``bytearray``::

-    >>> bytes.byte(3)
-    b'\x03'
-    >>> bytearray.byte(3)
-    bytearray(b'\x03')
+    >>> bchr(ord("A"))
+    b'A'
+    >>> bchr(ord(b"A"))
+    b'A'
+    >>> bytes.fromord(65)
+    b'A'
+    >>> bytearray.fromord(65)
+    bytearray(b'A')

 These methods will only accept integers in the range 0 to 255 (inclusive)::

-    >>> bytes.byte(512)
+    >>> bytes.fromord(512)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
-    ValueError: bytes must be in range(0, 256)
+    ValueError: integer must be in range(0, 256)

-    >>> bytes.byte(1.0)
+    >>> bytes.fromord(1.0)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     TypeError: 'float' object cannot be interpreted as an integer

-The documentation of the ``ord`` builtin will be updated to explicitly note
-that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
-is the inverse operation for text data.
+While this does create some duplication, there are valid reasons for it::

-Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
+* the ``bchr`` builtin is to recreate the ord/chr/unichr trio from Python
+  2 under a different naming scheme
+* the class method is mainly for the ``bytearray.fromord`` case, with
+  ``bytes.fromord`` added for consistency
+
+The documentation of the ``ord`` builtin will be updated to explicitly note
+that ``bchr`` is the primary inverse operation for binary data, while ``chr``
+is the inverse operation for text data, and that ``bytes.fromord`` and
+``bytearray.fromord`` also exist.
+
+Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current
 ``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
 expected to be easier to discover and easier to read (especially when used
 in conjunction with indexing operations on binary sequence types).
@@ -110,35 +126,37 @@ As a separate method, the new spelling will also work better with higher
 order functions like ``map``.


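Review note on the ``bchr`` / ``fromord`` hunk: since the text states that ``bytes.fromord(x)`` is equivalent to the existing ``bytes([x])``, a pure-Python approximation is short. The ``bchr`` function below is a hypothetical stand-in written for illustration only, not the proposed builtin itself; it shows how a named callable composes with ``map``.

```python
def bchr(i):
    """Hypothetical pure-Python stand-in for the proposed ``bchr`` builtin."""
    # bytes([i]) already rejects values outside range(0, 256) with
    # ValueError and non-integers such as 1.0 with TypeError.
    return bytes([i])


# Iterating over bytes yields integers; mapping bchr over them gives
# length-1 bytes objects instead.
pieces = list(map(bchr, b"ABC"))
round_trip = bchr(ord("A"))
```

The error messages raised by ``bytes([i])`` may differ in wording from the ones shown in the diff; only the exception types are relied on here.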
+Addition of "getbyte" method to retrieve a single byte
+------------------------------------------------------
+
+This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte``
+which will always return ``bytes``::
+
+    >>> b'abc'.getbyte(0)
+    b'a'
+
+If an index is asked for that doesn't exist, ``IndexError`` is raised::
+
+    >>> b'abc'.getbyte(9)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    IndexError: index out of range
+
+
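Review note on the ``getbyte`` hunk: the closest existing spelling, ``data[i:i+1]``, returns an empty sequence for an out-of-range index instead of raising, and returns a ``bytearray`` when ``data`` is one. A minimal sketch of the described semantics, written as a hypothetical free function (the PEP proposes a method), could look like this:

```python
def getbyte(data, index):
    """Hypothetical stand-in for the proposed ``data.getbyte(index)``."""
    # Integer indexing raises IndexError for a missing position;
    # wrapping the result in a list gives a length-1 bytes object.
    return bytes([data[index]])


first = getbyte(b"abc", 0)
from_array = getbyte(bytearray(b"abc"), 1)   # still a bytes object
silent_slice = b"abc"[9:10]                  # slicing hides the bad index
```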
 Addition of optimised iterator methods that produce ``bytes`` objects
 ---------------------------------------------------------------------

-This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an
-optimised ``iterbytes`` method that produces length 1 ``bytes`` objects
-rather than integers::
+This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised
+``iterbytes`` method that produces length 1 ``bytes`` objects rather than
+integers::

     for x in data.iterbytes():
         # x is a length 1 ``bytes`` object, rather than an integer

-The method can be used with arbitrary buffer exporting objects by wrapping
-them in a ``memoryview`` instance first::
+For example::

-    for x in memoryview(data).iterbytes():
-        # x is a length 1 ``bytes`` object, rather than an integer
-
-For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::
-
-    memview.tobytes() == b''.join(memview.iterbytes())
-
-This allows the raw bytes of the memory view to be iterated over without
-needing to make a copy, regardless of the defined shape and format.
-
-The main advantage this method offers over the ``map(bytes.byte, data)``
-approach is that it is guaranteed *not* to fail midstream with a
-``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
-approach, the type and value of the individual items in the iterable are
-only checked as they are retrieved and passed through the ``bytes.byte``
-constructor.
+    >>> tuple(b"ABC".iterbytes())
+    (b'A', b'B', b'C')


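Review note on the ``iterbytes`` hunk: the iteration behaviour shown in the new example can be sketched today with a generator. This is a hypothetical, unoptimised free function for illustration; the PEP proposes an optimised method on ``bytes`` and ``bytearray``.

```python
def iterbytes(data):
    """Hypothetical stand-in for the proposed ``data.iterbytes()``."""
    for value in data:
        # Each item of a bytes or bytearray object is an int in range(256);
        # wrap it in a tuple to build a length-1 bytes object.
        yield bytes((value,))


as_tuple = tuple(iterbytes(b"ABC"))
from_array = list(iterbytes(bytearray(b"hi")))
```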
 Design discussion
@@ -163,10 +181,18 @@ This PEP isn't revisiting that original design decision, just changing the
 spelling as users sometimes find the current behaviour of the binary sequence
 constructors surprising. In particular, there's a reasonable case to be made
 that ``bytes(x)`` (where ``x`` is an integer) should behave like the
-``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
+``bytes.fromord(x)`` proposal in this PEP. Providing both behaviours as separate
 class methods avoids that ambiguity.


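Review note on the design discussion: the ambiguity it refers to is visible with the existing constructors alone. The snippet uses only current behaviour and introduces no new API.

```python
# An integer argument is read as a size, so this is three NUL bytes.
size_form = bytes(3)

# A one-item iterable is read as byte values, so this is the single byte 0x03.
ordinal_form = bytes([3])
```

The two calls differ only by a pair of brackets, yet produce sequences of different length and content.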
+Open Questions
+==============
+
+Do we add ``iterbytes`` to ``memoryview``, or modify
+``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or
+do we ignore memory for now and add it later?
+
+
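Review note on the open question: as background, ``memoryview.cast()`` already accepts the struct format ``'c'``, whose items are length-1 ``bytes`` objects. The sketch below shows that existing behaviour on a one-dimensional byte buffer; whether it answers the question raised in the PEP is left open here.

```python
view = memoryview(b"ABC")

# Casting to the 'c' (char) format makes each element a length-1 bytes object.
chars = view.cast("c")
items = list(chars)
```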
 References
 ==========

@@ -180,19 +206,11 @@ References
    (http://bugs.python.org/issue21644)
 .. [5] August 2014 discussion thread on python-dev
    (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
+.. [6] June 2016 discussion thread on python-dev
+   (https://mail.python.org/pipermail/python-dev/2016-June/144875.html)


 Copyright
 =========

 This document has been placed in the public domain.
-
-
-..
-   Local Variables:
-   mode: indented-text
-   indent-tabs-mode: nil
-   sentence-end-double-space: t
-   fill-column: 70
-   coding: utf-8
-   End: