PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15 2014-08-16


Abstract
========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes four small adjustments to the APIs of the ``bytes``,
``bytearray`` and ``memoryview`` types to make it easier to operate entirely
in the binary domain:

* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
  ``memoryview.iterbytes`` alternative iterators


Proposals
=========

Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------

Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
entirely in Python 3.6.

No other changes are proposed to the existing constructors.


Addition of explicit "zero-initialised sequence" constructors
-------------------------------------------------------------

To replace the deprecated behaviour, this PEP proposes the addition of an
explicit ``zeros`` alternative constructor as a class method on both
``bytes`` and ``bytearray``::

    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

It will behave just as the current constructors behave when passed a single
integer.

The specific choice of ``zeros`` as the alternative constructor name is taken
from the corresponding initialisation function in NumPy (although, as these
are 1-dimensional sequence types rather than N-dimensional matrices, the
constructors take a length as input rather than a shape tuple).


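As a rough illustration only, the proposed behaviour can be approximated in
existing Python versions with a plain helper function. The name ``zeros``
and its ``(cls, length)`` signature below are assumptions made for this
sketch; they are not an existing API, and the proposed class methods may
differ in detail::

    import operator

    def zeros(cls, length):
        """Hypothetical stand-in for the proposed ``cls.zeros(length)``."""
        # operator.index() rejects floats and other non-integers with
        # TypeError, as the integer constructor does today.
        length = operator.index(length)
        if length < 0:
            # The current constructors also raise ValueError here.
            raise ValueError("negative count")
        # Sequence repetition avoids relying on the deprecated
        # integer-argument form of the constructor.
        return cls(b"\x00") * length

    print(zeros(bytes, 3))      # b'\x00\x00\x00'
    print(zeros(bytearray, 3))  # bytearray(b'\x00\x00\x00')
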
Addition of explicit "single byte" constructors
-----------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes the
addition of an explicit ``byte`` alternative constructor as a class method
on both ``bytes`` and ``bytearray``::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)

    >>> bytes.byte(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
is the inverse operation for text data.

Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher
order functions like ``map``.


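Since the proposed method is described as equivalent to ``bytes([x])``, its
behaviour can be sketched today with a small helper. The function name
``byte`` and its ``(cls, value)`` signature are assumptions made purely for
illustration; only ``bytes([x])``, ``ord``, ``map`` and
``functools.partial`` are existing APIs::

    from functools import partial

    def byte(cls, value):
        """Hypothetical stand-in for the proposed ``cls.byte(value)``."""
        # The list form of the constructor already enforces both rules:
        # TypeError for non-integers, ValueError outside range(0, 256).
        return cls([value])

    data = b"abc"

    # Indexing a binary sequence yields an integer, so converting back
    # to a length 1 bytes object needs an explicit step.
    first = byte(bytes, data[0])

    # ord() and the helper are inverses for single bytes.
    round_trip = byte(bytes, ord(b"a"))

    # A named callable composes directly with map().
    pieces = list(map(partial(byte, bytes), data))

    print(first, round_trip, pieces)
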
Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------

This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an
optimised ``iterbytes`` method that produces length 1 ``bytes`` objects
rather than integers::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer
        ...

The method can be used with arbitrary buffer exporting objects by wrapping
them in a ``memoryview`` instance first::

    for x in memoryview(data).iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer
        ...

For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::

    memview.tobytes() == b''.join(memview.iterbytes())

This allows the raw bytes of the memory view to be iterated over without
needing to make a copy, regardless of the defined shape and format.

The main advantage this method offers over the ``map(bytes.byte, data)``
approach is that it is guaranteed *not* to fail midstream with a
``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
approach, the type and value of the individual items in the iterable are
only checked as they are retrieved and passed through the ``bytes.byte``
constructor.


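The intended semantics can be approximated in current Python with a
generator function. This is a minimal sketch rather than the proposed
implementation: the name ``iterbytes`` as a free function is an assumption,
it is not optimised, and it assumes the buffer is C-contiguous so that
``memoryview.cast`` can flatten it to unsigned bytes::

    import array

    def iterbytes(data):
        """Hypothetical stand-in for the proposed ``data.iterbytes()``."""
        # Casting to format "B" exposes the raw bytes regardless of the
        # exporter's own item format.
        view = memoryview(data).cast("B")
        for i in range(len(view)):
            # Slicing (rather than indexing) keeps each item as a
            # length 1 buffer, which bytes() then copies.
            yield bytes(view[i:i + 1])

    chunks = list(iterbytes(b"abc"))

    # The join of the pieces matches tobytes(), even for a buffer whose
    # items are wider than one byte.
    arr = array.array("H", [1, 2])
    joined = b"".join(iterbytes(arr))
    same_as_tobytes = joined == memoryview(arr).tobytes()

    # By contrast, a map() based conversion checks each item only as it
    # is retrieved, so it can fail after yielding some results.
    items = map(lambda x: bytes([x]), [97, 300])
    first = next(items)  # succeeds
    try:
        next(items)      # 300 is outside range(0, 256)
        failed_midstream = False
    except ValueError:
        failed_midstream = True
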
Design discussion
=================

Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------

Zero-initialised sequences can be created via sequence repetition::

    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')

However, this was also the case when the ``bytearray`` type was originally
designed, and the decision was made to add explicit support for it in the
type constructor. The immutable ``bytes`` type then inherited that feature
when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing the
spelling as users sometimes find the current behaviour of the binary sequence
constructors surprising. In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
class methods avoids that ambiguity.


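The ambiguity is easiest to see when an integer obtained by indexing is
passed back to the constructor. The following sketch uses only the current
constructor behaviour; the variable names are chosen for illustration::

    data = b"abc"

    # Indexing yields an integer (97 for b"a"), not a length 1 bytes
    # object.
    item = data[0]

    # Passing that integer to the constructor creates 97 zero bytes,
    # which is rarely what was intended.
    surprising = bytes(item)

    # Wrapping it in a list gives the single byte back.
    intended = bytes([item])

    print(item, len(surprising), intended)
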
References
==========

.. [1] Initial March 2014 discussion thread on python-ideas
   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [2] Guido's initial feedback in that thread
   (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
   (http://bugs.python.org/issue20895)
.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
   (http://bugs.python.org/issue21644)
.. [5] August 2014 discussion thread on python-dev
   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: