diff --git a/pep-0467.txt b/pep-0467.txt
index ba6d42b9b..e0a615d8f 100644
--- a/pep-0467.txt
+++ b/pep-0467.txt
@@ -20,163 +20,166 @@ that is now referred to as ``bytearray``. Other aspects of operating in
 the binary domain in Python have also evolved over the course of the Python
 3 series.

-This PEP proposes a number of small adjustments to the APIs of the ``bytes``
-and ``bytearray`` types to make it easier to operate entirely in the binary
-domain.
+This PEP proposes four small adjustments to the APIs of the ``bytes``,
+``bytearray`` and ``memoryview`` types to make it easier to operate entirely
+in the binary domain:

-
-Background
-==========
-
-To simplify the task of writing the Python 3 documentation, the ``bytes``
-and ``bytearray`` types were documented primarily in terms of the way they
-differed from the Unicode based Python 3 ``str`` type. Even when I
-`heavily revised the sequence documentation
-`__ in 2012, I retained that
-simplifying shortcut.
-
-However, it turns out that this approach to the documentation of these types
-had a problem: it doesn't adequately introduce users to their hybrid nature,
-where they can be manipulated *either* as a "sequence of integers" type,
-*or* as ``str``-like types that assume ASCII compatible data.
-
-That oversight has now been corrected, with the binary sequence types now
-being documented entirely independently of the ``str`` documentation in
-`Python 3.4+ `__
-
-The confusion isn't just a documentation issue, however, as there are also
-some lingering design quirks from an earlier pre-release design where there
-was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
-was mutable (with no immutable counterpart).
-
-Finally, additional experience with using the existing Python 3 binary
-sequence types in real world applications has suggested it would be
-beneficial to make it easier to convert integers to length 1 bytes objects.
+* Deprecate passing single integer values to ``bytes`` and ``bytearray``
+* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
+* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
+* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
+  ``memoryview.iterbytes`` alternative iterators


 Proposals
 =========

-As a "consistency improvement" proposal, this PEP is actually about a few
-smaller micro-proposals, each aimed at improving the usability of the binary
-data model in Python 3. Proposals are motivated by one of two main factors:
+Deprecation of current "zero-initialised sequence" behaviour
+------------------------------------------------------------

-* removing remnants of the original design of ``bytes`` as a mutable type
-* allowing users to easily convert integer values to a length 1 ``bytes``
-  object
+Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
+argument and interpret it as meaning to create a zero-initialised sequence
+of the given size::

-
-Alternate Constructors
-----------------------
-
-The ``bytes`` and ``bytearray`` constructors currently accept an integer
-argument, but interpret it to mean a zero-filled object of the given length.
-This is a legacy of the original design of ``bytes`` as a mutable type,
-rather than a particularly intuitive behaviour for users. It has become
-especially confusing now that some other ``bytes`` interfaces treat integers
-and the corresponding length 1 bytes instances as equivalent input.
-Compare::
-
-    >>> b"\x03" in bytes([1, 2, 3])
-    True
-    >>> 3 in bytes([1, 2, 3])
-    True
-
-    >>> bytes(b"\x03")
-    b'\x03'
     >>> bytes(3)
     b'\x00\x00\x00'
+    >>> bytearray(3)
+    bytearray(b'\x00\x00\x00')

-This PEP proposes that the current handling of integers in the bytes and
-bytearray constructors by deprecated in Python 3.5 and targeted for
-removal in Python 3.7, being replaced by two more explicit alternate
-constructors provided as class methods.
-The initial python-ideas thread
-[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
-this constructor behaviour.
+This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
+entirely in Python 3.6.

-Firstly, a ``byte`` constructor is proposed that converts integers
-in the range 0 to 255 (inclusive) to a ``bytes`` object::
+No other changes are proposed to the existing constructors.

-    >>> bytes.byte(3)
-    b'\x03'
-    >>> bytearray.byte(3)
-    bytearray(b'\x03')
-    >>> bytes.byte(512)
-    Traceback (most recent call last):
-      File "<stdin>", line 1, in <module>
-    ValueError: bytes must be in range(0, 256)

-One specific use case for this alternate constructor is to easily convert
-the result of indexing operations on ``bytes`` and other binary sequences
-from an integer to a ``bytes`` object. The documentation for this API
-should note that its counterpart for the reverse conversion is ``ord()``.
-The ``ord()`` documentation will also be updated to note that while
-``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
-``bytearray.byte`` are the counterparts for binary input.
+Addition of explicit "zero-initialised sequence" constructors
+-------------------------------------------------------------

-Secondly, a ``zeros`` constructor is proposed that serves as a direct
-replacement for the current constructor behaviour, rather than having to use
-sequence repetition to achieve the same effect in a less intuitive way::
+To replace the deprecated behaviour, this PEP proposes the addition of an
+explicit ``zeros`` alternative constructor as a class method on both
+``bytes`` and ``bytearray``::

     >>> bytes.zeros(3)
     b'\x00\x00\x00'
     >>> bytearray.zeros(3)
     bytearray(b'\x00\x00\x00')

-The chosen name here is taken from the corresponding initialisation function
-in NumPy (although, as these are sequence types rather than N-dimensional
-matrices, the constructors take a length as input rather than a shape tuple)
+It will behave just as the current constructors behave when passed a single
+integer.

-While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
-useful duo amongst the new constructors, ``bytes.zeros`` and
-`bytearray.byte`` are provided in order to maintain API consistency between
-the two types.
+The specific choice of ``zeros`` as the alternative constructor name is taken
+from the corresponding initialisation function in NumPy (although, as these
+are 1-dimensional sequence types rather than N-dimensional matrices, the
+constructors take a length as input rather than a shape tuple)


-Iteration
----------
+Addition of explicit "single byte" constructors
+-----------------------------------------------

-While iteration over ``bytes`` objects and other binary sequences produces
-integers, it is sometimes desirable to iterate over length 1 bytes objects
-instead.
+As binary counterparts to the text ``chr`` function, this PEP proposes the
+addition of an explicit ``byte`` alternative constructor as a class method
+on both ``bytes`` and ``bytearray``::

-To handle this situation more obviously (and more efficiently) than would be
-the case with the ``map(bytes.byte, data)`` construct enabled by the above
-constructor changes, this PEP proposes the addition of a new ``iterbytes``
-method to ``bytes``, ``bytearray`` and ``memoryview``::
+    >>> bytes.byte(3)
+    b'\x03'
+    >>> bytearray.byte(3)
+    bytearray(b'\x03')
+
+These methods will only accept integers in the range 0 to 255 (inclusive)::
+
+    >>> bytes.byte(512)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    ValueError: bytes must be in range(0, 256)
+
+    >>> bytes.byte(1.0)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    TypeError: 'float' object cannot be interpreted as an integer
+
+The documentation of the ``ord`` builtin will be updated to explicitly note
+that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
+is the inverse operation for text data.
+
+Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
+``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
+expected to be easier to discover and easier to read (especially when used
+in conjunction with indexing operations on binary sequence types).
+
+As a separate method, the new spelling will also work better with higher
+order functions like ``map``.
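Reviewer note, not part of the patch: the semantics described for ``bytes.byte`` can be tried out on an interpreter that lacks the proposed constructor. The following is a minimal sketch that relies only on the stated equivalence to ``bytes([x])``. The helper name ``byte_from_int`` is hypothetical and is not proposed by this PEP; the exact error messages may also differ from the ones quoted in the patch.

```python
import operator


def byte_from_int(x):
    """Hypothetical stand-in for the proposed bytes.byte(x)."""
    # operator.index() rejects floats and other non-integers with TypeError.
    value = operator.index(x)
    # bytes([value]) raises ValueError for values outside range(0, 256).
    return bytes([value])


data = b"abc"

# Indexing a bytes object yields an integer; the helper converts it back
# into a length 1 bytes object.
first = byte_from_int(data[0])

# As a plain callable, the helper composes with map().
pieces = list(map(byte_from_int, data))
```

Because the range and type checks only run as each item is converted, a ``map`` based loop over arbitrary integers can still fail partway through, which is the limitation the ``iterbytes`` proposal is meant to avoid.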
+
+
+Addition of optimised iterator methods that produce ``bytes`` objects
+---------------------------------------------------------------------
+
+This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an
+optimised ``iterbytes`` method that produces length 1 ``bytes`` objects
+rather than integers::

     for x in data.iterbytes():
         # x is a length 1 ``bytes`` object, rather than an integer

-Third party types and arbitrary containers of integers that lack the new
-method can still be handled by combining ``map`` with the new
-``bytes.byte()`` alternate constructor proposed above::
+The method can be used with arbitrary buffer exporting objects by wrapping
+them in a ``memoryview`` instance first::

-    for x in map(bytes.byte, data):
+    for x in memoryview(data).iterbytes():
         # x is a length 1 ``bytes`` object, rather than an integer
-        # This works with *any* container of integers in the range
-        # 0 to 255 inclusive
+
+For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::
+
+    memview.tobytes() == b''.join(memview.iterbytes())
+
+This allows the raw bytes of the memory view to be iterated over without
+needing to make a copy, regardless of the defined shape and format.
+
+The main advantage this method offers over the ``map(bytes.byte, data)``
+approach is that it is guaranteed *not* to fail midstream with a
+``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
+approach, the type and value of the individual items in the iterable are
+only checked as they are retrieved and passed through the ``bytes.byte``
+constructor.


-Open questions
-^^^^^^^^^^^^^^
+Design discussion
+=================

-* The fallback case above suggests that this could perhaps be better handled
-  as an ``iterbytes(data)`` *builtin*, that used ``data.__iterbytes__()``
-  if defined, but otherwise fell back to ``map(bytes.byte, data)``::
+Why not rely on sequence repetition to create zero-initialised sequences?
+-------------------------------------------------------------------------

-    for x in iterbytes(data):
-        # x is a length 1 ``bytes`` object, rather than an integer
-        # This works with *any* container of integers in the range
-        # 0 to 255 inclusive
+Zero-initialised sequences can be created via sequence repetition::
+
+    >>> b'\x00' * 3
+    b'\x00\x00\x00'
+    >>> bytearray(b'\x00') * 3
+    bytearray(b'\x00\x00\x00')
+
+However, this was also the case when the ``bytearray`` type was originally
+designed, and the decision was made to add explicit support for it in the
+type constructor. The immutable ``bytes`` type then inherited that feature
+when it was introduced in PEP 3137.
+
+This PEP isn't revisiting that original design decision, just changing the
+spelling as users sometimes find the current behaviour of the binary sequence
+constructors surprising. In particular, there's a reasonable case to be made
+that ``bytes(x)`` (where ``x`` is an integer) should behave like the
+``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
+class methods avoids that ambiguity.


 References
 ==========

-.. [ideas-thread1] https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
-.. [empty-buffer-issue] http://bugs.python.org/issue20895
-.. [GvR-initial-feedback] https://mail.python.org/pipermail/python-ideas/2014-March/027376.html
+.. [1] Initial March 2014 discussion thread on python-ideas
+   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
+.. [2] Guido's initial feedback in that thread
+   (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
+.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
+   (http://bugs.python.org/issue20895)
+.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
+   (http://bugs.python.org/issue21644)
+.. [5] August 2014 discussion thread on python-dev
+   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)


 Copyright