PEP: 467
Title: Improved API consistency for bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30


Abstract
========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes a number of small adjustments to the APIs of the ``bytes``
and ``bytearray`` types to make their behaviour more internally consistent
and to make it easier to operate entirely in the binary domain, as well as
changes to their documentation to make it easier to grasp their dual roles
as containers of "arbitrary binary data" and "binary data with ASCII
compatible segments".


Background
==========

To simplify the task of writing the Python 3 documentation, the ``bytes``
and ``bytearray`` types were documented primarily in terms of the way they
differed from the Unicode based Python 3 ``str`` type. Even when I
`heavily revised the sequence documentation
<http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that
simplifying shortcut.

However, it turns out that this approach to the documentation of these types
has a problem: it doesn't adequately introduce users to their hybrid nature,
where they can be manipulated *either* as a "sequence of integers" type,
*or* as ``str``-like types that assume ASCII compatible data.
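
As a brief illustration of that hybrid nature, the following sketch applies
both views to one object. It uses only existing Python 3 behaviour; the
sample value and variable names are illustrative assumptions.

```python
data = b"GET /index.html"

# "Sequence of integers" view: indexing and iteration yield ints.
first = data[0]
as_ints = list(data[:3])

# ``str``-like view: these methods assume ASCII compatible content.
upper = data.upper()
parts = data.split(b" ")

print(first, as_ints, upper, parts)
```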

In addition to the documentation issues, there are some lingering design
quirks from an earlier pre-release design where there was *no* separate
``bytearray`` type, and instead the core ``bytes`` type was mutable (with
no immutable counterpart).

Finally, additional experience with using the existing Python 3 binary
sequence types in real world applications has suggested it would be
beneficial to make it easier to convert integers to length 1 bytes objects.


Proposals
=========

As a "consistency improvement" proposal, this PEP is actually a collection
of micro-proposals, each aimed at improving the self-consistency of
the binary data model in Python 3. Each proposal is motivated by one of three
factors:

* removing remnants of the original design of ``bytes`` as a mutable type
* allowing users to easily convert integer values to a length 1 ``bytes``
  object
* consistently applying the following analogies to the type API designs
  and documentation:

  * ``bytes``: tuple of integers, with additional str-like methods
  * ``bytearray``: list of integers, with additional str-like methods


Alternate Constructors
----------------------

The ``bytes`` and ``bytearray`` constructors currently accept an integer
argument, but interpret it to mean a zero-filled object of the given length.
This is a legacy of the original design of ``bytes`` as a mutable type,
rather than a particularly intuitive behaviour for users. It has become
especially confusing now that other ``bytes`` interfaces treat integers
and the corresponding length 1 bytes instances as equivalent input.
Compare::

    >>> b"\x03" in bytes([1, 2, 3])
    True
    >>> 3 in bytes([1, 2, 3])
    True

    >>> bytes(b"\x03")
    b'\x03'
    >>> bytes(3)
    b'\x00\x00\x00'

This PEP proposes that the current handling of integers in the ``bytes`` and
``bytearray`` constructors be deprecated in Python 3.5 and targeted for
removal in Python 3.7, being replaced by two more explicit alternate
constructors provided as class methods. The initial python-ideas thread
[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
this constructor behaviour.

Firstly, a ``byte`` constructor is proposed that converts integers
in the range 0 to 255 (inclusive) to a length 1 ``bytes`` object (or a
length 1 ``bytearray`` when called as ``bytearray.byte``)::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')
    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)

One specific use case for this alternate constructor is to easily convert
the result of indexing operations on ``bytes`` and other binary sequences
from an integer to a ``bytes`` object. The documentation for this API
should note that its counterpart for the reverse conversion is ``ord()``.
The ``ord()`` documentation will also be updated to note that while
``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
``bytearray.byte`` are the counterparts for binary input.
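
Until such a constructor exists, the same conversion can be approximated
with a single-element list. The sketch below is only an approximation of
the proposed behaviour: ``byte_from_int`` is a hypothetical helper name,
not part of this proposal or of any existing API.

```python
def byte_from_int(value):
    """Hypothetical stand-in for the proposed ``bytes.byte(value)``."""
    # bytes([n]) already raises ValueError for n outside range(0, 256).
    return bytes([value])


data = b"abc"
item = data[1]                 # indexing yields an integer
single = byte_from_int(item)   # back to a length 1 bytes object
round_trip = ord(single)       # ord() accepts a length 1 bytes object

print(item, single, round_trip)
```

The round trip through ``ord()`` is the reverse conversion described above.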

Secondly, a ``zeros`` constructor is proposed that serves as a direct
replacement for the current constructor behaviour, rather than having to use
sequence repetition to achieve the same effect in a less intuitive way::

    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

The chosen name here is taken from the corresponding initialisation function
in NumPy (although, as these are sequence types rather than N-dimensional
matrices, the constructors take a length as input rather than a shape tuple).

While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
useful duo amongst the new constructors, ``bytes.zeros`` and
``bytearray.byte`` are provided in order to maintain API consistency between
the two types.
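
For comparison, the sequence repetition alternative mentioned above can be
written with existing Python 3 behaviour only. This is a minimal sketch;
the variable names are illustrative.

```python
length = 3

via_constructor = bytes(length)               # current integer-argument behaviour
via_repetition = b"\x00" * length             # sequence repetition equivalent
mutable_buffer = bytearray(b"\x00") * length  # same idea for bytearray

print(via_constructor, via_repetition, mutable_buffer)
```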


Iteration
---------

While iteration over ``bytes`` objects and other binary sequences produces
integers, it is sometimes desirable to iterate over length 1 bytes objects
instead.

To handle this situation more obviously (and more efficiently) than would be
the case with the ``map(bytes.byte, data)`` construct enabled by the above
constructor changes, this PEP proposes the addition of a new ``iterbytes``
method to ``bytes``, ``bytearray`` and ``memoryview``::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

Third party types and arbitrary containers of integers that lack the new
method can still be handled by combining ``map`` with the new
``bytes.byte()`` alternate constructor proposed above::

    for x in map(bytes.byte, data):
        # x is a length 1 ``bytes`` object, rather than an integer
        # This works with *any* container of integers in the range
        # 0 to 255 inclusive
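
With current Python versions, a similar effect can be approximated by
slicing, since slicing a binary sequence yields a sequence where indexing
yields an integer. In the sketch below, ``iter_length1`` is a hypothetical
helper name, and the input is assumed to be a ``bytes``, ``bytearray`` or
byte-oriented ``memoryview`` object.

```python
def iter_length1(data):
    """Hypothetical approximation of the proposed ``data.iterbytes()``."""
    for i in range(len(data)):
        # A one-element slice keeps the sequence type; bytes() normalises
        # bytearray and memoryview slices to an immutable bytes object.
        yield bytes(data[i:i + 1])


from_bytes = list(iter_length1(b"abc"))
from_bytearray = list(iter_length1(bytearray(b"abc")))
from_view = list(iter_length1(memoryview(b"abc")))

print(from_bytes, from_bytearray, from_view)
```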


Open questions
^^^^^^^^^^^^^^

* The fallback case above suggests that this could perhaps be better handled
  as an ``iterbytes(data)`` *builtin*, that used ``data.__iterbytes__()``
  if defined, but otherwise fell back to ``map(bytes.byte, data)``::

    for x in iterbytes(data):
        # x is a length 1 ``bytes`` object, rather than an integer
        # This works with *any* container of integers in the range
        # 0 to 255 inclusive
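
One possible shape for such a builtin is sketched below in pure Python.
This is an assumption about how the lookup might work, not a specification:
``__iterbytes__`` is the hypothetical protocol method named above,
``bytes([n])`` stands in for the proposed ``bytes.byte(n)``, and
``ReversedPayload`` is an invented example type.

```python
def iterbytes(data):
    """Sketch of the suggested builtin; not an existing Python function."""
    # Look the method up on the type, as is done for other special methods.
    method = getattr(type(data), "__iterbytes__", None)
    if method is not None:
        return method(data)
    # Fallback: works for any iterable of integers in range(0, 256).
    return (bytes([n]) for n in data)


class ReversedPayload:
    """Hypothetical container that customises the protocol."""

    def __init__(self, payload):
        self._payload = payload

    def __iterbytes__(self):
        return (bytes([n]) for n in reversed(self._payload))


fallback = list(iterbytes([104, 105]))
custom = list(iterbytes(ReversedPayload(b"ab")))

print(fallback, custom)
```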


Documentation clarifications
----------------------------

In an attempt to clarify the `documentation
<https://docs.python.org/dev/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__
of the ``bytes`` and ``bytearray`` types, the following changes are
proposed:

* the documentation of the *sequence* behaviour of each type will be moved to
  the section for that individual type. These sections will be updated to
  explicitly make the "tuple of integers" and "list of integers"
  analogies, as well as to make it clear that these parts of the API work
  with arbitrary binary data.
* the current "Bytes and bytearray operations" section will be retitled
  "Handling binary data with ASCII compatible segments", and will explicitly
  list *all* of the methods that are included.
* the documentation will clarify that, due to their origins in the API of the
  immutable ``str`` type, even the ``bytearray`` versions of these methods do
  *not* operate
  in place, but instead create a new object.
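
The distinction in the last point can be seen with existing Python 3
behaviour. The following sketch contrasts a ``str``-like method with
list-like mutation on the same ``bytearray``; the sample value is an
illustrative assumption.

```python
buf = bytearray(b"header")

result = buf.upper()         # str-like method: returns a new bytearray
same_object = result is buf
unchanged = bytes(buf)       # the original contents are untouched

buf[0] = ord("H")            # list-like item assignment: mutates in place
buf.extend(b"!")             # list-like method: mutates in place

print(result, same_object, unchanged, buf)
```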

A patch for at least this part of the proposal will be prepared before
submitting the PEP for approval, as writing out these docs completely may
suggest additional opportunities for API consistency improvements.


References
==========

.. [ideas-thread1] https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
.. [empty-buffer-issue] http://bugs.python.org/issue20895


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: