PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.9
Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01


Abstract
========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes five small adjustments to the APIs of the ``bytes`` and
``bytearray`` types to make it easier to operate entirely in the binary domain:

* Discourage passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators

It also proposes one new built-in function:

* ``bchr``


PEP Deferral
============

This PEP has been deferred until Python 3.9 at the earliest, as the open
questions aren't currently expected to be resolved in time for the Python 3.8
feature addition deadline in May 2019. If you're keen to see these changes
implemented and are willing to drive that resolution process, contact the PEP
authors.


Proposals
=========

Discourage use of current "zero-initialised sequence" behaviour
---------------------------------------------------------------

Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes updating the documentation in Python 3.9 to discourage
relying on that input-type-dependent behaviour, and to recommend the new, more
explicit ``bytes.fromsize(n)`` or ``bytearray.fromsize(n)`` spelling instead
(see next section).

However, the current handling of numeric inputs in the default constructors
would remain in place indefinitely to avoid introducing a compatibility break.

No other changes are proposed to the existing constructors.


Addition of explicit "count and byte initialised sequence" constructors
-----------------------------------------------------------------------

To replace the now discouraged behaviour, this PEP proposes the addition of an
explicit ``fromsize`` alternative constructor as a class method on both
``bytes`` and ``bytearray`` whose first argument is the count, and whose
second argument is the fill byte to use (defaults to ``\x00``)::

    >>> bytes.fromsize(3)
    b'\x00\x00\x00'
    >>> bytearray.fromsize(3)
    bytearray(b'\x00\x00\x00')
    >>> bytes.fromsize(5, b'\x0a')
    b'\n\n\n\n\n'
    >>> bytearray.fromsize(5, b'\x0a')
    bytearray(b'\n\n\n\n\n')

``fromsize`` will behave just as the current constructors behave when passed a
single integer, while allowing for non-zero fill values when needed.

Similar to ``str.center``, ``str.ljust``, and ``str.rjust``, both parameters
would be positional-only with no externally visible name.
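
The intended semantics can be approximated in current Python with a small
helper. This is only a sketch: ``fromsize`` below is a hypothetical
module-level function that takes the target type as its first argument, not
the proposed class method, and the handling of invalid fill values is an
assumption, since the PEP does not specify it.

```python
def fromsize(cls, size, fill=b"\x00", /):
    """Approximate the proposed ``cls.fromsize(size, fill)`` (hypothetical helper)."""
    # Assumption: the fill value must be a single byte. The PEP does not
    # say how other values should be rejected.
    if not isinstance(fill, (bytes, bytearray)) or len(fill) != 1:
        raise ValueError("fill must be a bytes-like object of length 1")
    # Mirror the existing constructors, which reject negative sizes.
    if size < 0:
        raise ValueError("negative count")
    # Sequence repetition produces ``size`` copies of the fill byte.
    return cls(fill) * size


print(fromsize(bytes, 3))                # b'\x00\x00\x00'
print(fromsize(bytearray, 5, b"\x0a"))   # bytearray(b'\n\n\n\n\n')
```

With the default fill byte, the helper returns the same value as the current
``bytes(n)`` and ``bytearray(n)`` calls.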


Addition of "bchr" function and explicit "single byte" constructors
-------------------------------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes
the addition of a ``bchr`` function and an explicit ``fromord`` alternative
constructor as a class method on both ``bytes`` and ``bytearray``::

    >>> bchr(ord("A"))
    b'A'
    >>> bchr(ord(b"A"))
    b'A'
    >>> bytes.fromord(65)
    b'A'
    >>> bytearray.fromord(65)
    bytearray(b'A')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.fromord(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: integer must be in range(0, 256)

    >>> bytes.fromord(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

While this does create some duplication, there are valid reasons for it:

* the ``bchr`` builtin is to recreate the ``ord``/``chr``/``unichr`` trio from
  Python 2 under a different naming scheme (however, see the Open Questions
  section below)
* the class method is mainly for the ``bytearray.fromord`` case, with
  ``bytes.fromord`` added for consistency

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bchr`` is the primary inverse operation for binary data, while ``chr``
is the inverse operation for text data, and that ``bytes.fromord`` and
``bytearray.fromord`` also exist.

Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher
order functions like ``map``.
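
Because of that equivalence, the proposed behaviour can be sketched in current
Python by wrapping ``bytes([x])``. The ``bchr`` function below is a
hypothetical pure-Python stand-in for the proposed builtin; it relies on the
existing constructor to reject out-of-range and non-integer values, so the
exact error messages may differ from those shown above.

```python
def bchr(i, /):
    """Hypothetical pure-Python stand-in for the proposed ``bchr`` builtin."""
    # bytes([i]) raises ValueError outside range(0, 256) and TypeError
    # for values that cannot be interpreted as an integer.
    return bytes([i])


# A named callable composes directly with higher order functions:
# iterating over bytes yields integers, and bchr converts each one back.
print(list(map(bchr, b"ABC")))   # [b'A', b'B', b'C']
```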


Addition of "getbyte" method to retrieve a single byte
------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte``
which will always return ``bytes``::

    >>> b'abc'.getbyte(0)
    b'a'

If an index is asked for that doesn't exist, ``IndexError`` is raised::

    >>> b'abc'.getbyte(9)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: index out of range
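
A rough equivalent in current Python is sketched below. ``getbyte`` here is a
hypothetical free function rather than the proposed method, and its treatment
of negative indices (following ordinary indexing) is an assumption, as the PEP
does not address them.

```python
def getbyte(data, index, /):
    """Hypothetical stand-in for the proposed ``data.getbyte(index)``."""
    # Integer indexing raises IndexError for positions that do not exist;
    # wrapping the resulting integer in a list rebuilds a length 1 bytes
    # object, for both bytes and bytearray inputs.
    return bytes([data[index]])


print(getbyte(b"abc", 0))               # b'a'
print(getbyte(bytearray(b"abc"), 2))    # b'c'
```

Unlike the slice ``data[index:index + 1]``, which silently returns an empty
sequence when the index is out of range, this form raises ``IndexError``.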


Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised
``iterbytes`` method that produces length 1 ``bytes`` objects rather than
integers::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

For example::

    >>> tuple(b"ABC".iterbytes())
    (b'A', b'B', b'C')
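
The observable behaviour can be imitated today with a generator. The
``iterbytes`` function below is a hypothetical, unoptimised stand-in for the
proposed method; it illustrates the values produced, not the performance
characteristics the PEP is aiming for.

```python
def iterbytes(data, /):
    """Hypothetical stand-in for the proposed ``data.iterbytes()``."""
    for i in range(len(data)):
        # A length 1 slice keeps the binary type; bytes() normalises
        # bytearray slices so every item is an immutable bytes object.
        yield bytes(data[i:i + 1])


print(tuple(iterbytes(b"ABC")))   # (b'A', b'B', b'C')
```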


Design discussion
=================

Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------

Zero-initialised sequences can be created via sequence repetition::

    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')

However, this was also the case when the ``bytearray`` type was originally
designed, and the decision was made to add explicit support for it in the
type constructor. The immutable ``bytes`` type then inherited that feature
when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing the
spelling as users sometimes find the current behaviour of the binary sequence
constructors surprising. In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.fromord(x)`` proposal in this PEP. Providing both behaviours as separate
class methods avoids that ambiguity.
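
The surprise is easy to reproduce in current Python when the integer comes
from indexing a binary sequence. The snippet below uses only existing
behaviour; the variable names are illustrative.

```python
data = b"abc"
first = data[0]   # indexing yields the integer 97, not b"a"

# Current behaviour: an integer argument is treated as a size, so this
# builds 97 zero bytes rather than a single byte with the value 97.
zero_filled = bytes(first)

# Existing spelling for the single byte case, which the PEP proposes to
# expose as ``bytes.fromord(first)``.
single_byte = bytes([first])

print(len(zero_filled), single_byte)   # 97 b'a'
```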


Why use positional-only parameters?
-----------------------------------

This is for consistency with the other methods on the affected types, and to
avoid having to devise sensible names for the parameters.


Open Questions
==============

* Given the "multiple ways to do it" outcome, is the proposed ``bchr`` builtin
  actually worth adding, or is ``ord``/``bytes.fromord``/``chr`` a sufficiently
  straightforward replacement for the Python 2 ``ord``/``chr``/``unichr``
  operations?

* Do we add ``iterbytes`` to ``memoryview``, or modify
  ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or
  do we ignore ``memoryview`` for now and add it later?
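
As background for the ``memoryview`` question, ``memoryview.cast()`` in
current Python already accepts the struct format ``'c'``, whose items are
length 1 ``bytes`` objects. The snippet below shows that existing behaviour
only; it is not part of the proposal, and the ``'s'`` format mentioned above
remains hypothetical.

```python
data = b"ABC"

# Existing behaviour: with the struct format "c", each item of the view
# is a length 1 bytes object instead of an integer.
view = memoryview(data).cast("c")
items = list(view)

print(items)   # [b'A', b'B', b'C']
```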


References
==========

.. [1] Initial March 2014 discussion thread on python-ideas
   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [2] Guido's initial feedback in that thread
   (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
   (http://bugs.python.org/issue20895)
.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
   (http://bugs.python.org/issue21644)
.. [5] August 2014 discussion thread on python-dev
   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [6] June 2016 discussion thread on python-dev
   (https://mail.python.org/pipermail/python-dev/2016-June/144875.html)


Copyright
=========

This document has been placed in the public domain.