python-peps/pep-0460.txt

169 lines
6.0 KiB
Plaintext
Raw Normal View History

PEP: 460
Title: Add binary interpolation and formatting
Version: $Revision$
Last-Modified: $Date$
2014-01-12 14:59:57 -05:00
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 6-Jan-2014
Python-Version: 3.5
Abstract
========
This PEP proposes to add minimal formatting operations to bytes and
bytearray objects. The proposed additions are:
* ``bytes % ...`` and ``bytearray % ...`` for percent-formatting,
similar in syntax to percent-formatting on ``str`` objects
(accepting a single object, a tuple or a dict).
* ``bytes.format(...)`` and ``bytearray.format(...)`` for a formatting
similar in syntax to ``str.format()`` (accepting positional as well as
keyword arguments).
2014-01-09 16:02:01 -05:00
* ``bytes.format_map(...)`` and ``bytearray.format_map(...)`` for an
API similar to ``str.format_map(...)``, with the same formatting
syntax and semantics as ``bytes.format()`` and ``bytearray.format()``.
Rationale
=========
In Python 2, ``str % args`` and ``str.format(args)`` allow the formatting
2014-01-09 16:02:01 -05:00
and interpolation of bytestrings. This feature has commonly been used
for the assembling of protocol messages when protocols are known to use
a fixed encoding.
Python 3 generally mandates that text be stored and manipulated as unicode
(i.e. ``str`` objects, not ``bytes``). In some cases, though, it makes
sense to manipulate ``bytes`` objects directly. Typical usage is binary
network protocols, where you can want to interpolate and assemble several
bytes object (some of them literals, some of them compute) to produce
complete protocol messages. For example, protocols such as HTTP or SIP
have headers with ASCII names and opaque "textual" values using a varying
and/or sometimes ill-defined encoding. Moreover, those headers can be
followed by a binary body... which can be chunked and decorated with ASCII
headers and trailers!
While there are reasonably efficient ways to accumulate binary data
(such as using a ``bytearray`` object, the ``bytes.join`` method or
even ``io.BytesIO``), none of them leads to the kind of readable and
intuitive code that is produced by a %-formatted or {}-formatted template
and a formatting operation.
Binary formatting features
==========================
Supported features
------------------
In this proposal, percent-formatting for ``bytes`` and ``bytearray``
supports the following features:
* Looking up formatting arguments by position as well as by name (i.e.,
``%s`` as well as ``%(name)s``).
* ``%s`` will try to get a ``Py_buffer`` on the given value, and fallback
on calling ``__bytes__``. The resulting binary data is inserted at
the given point in the string. This is expected to work with bytes,
bytearray and memoryview objects (as well as a couple others such
as pathlib's path objects).
* ``%c`` will accept an integer between 0 and 255, and insert a byte of the
given value.
Braces-formatting for ``bytes`` and ``bytearray`` supports the following
features:
* All the kinds of argument lookup supported by ``str.format()`` (explicit
positional lookup, auto-incremented positional lookup, keyword lookup,
attribute lookup, etc.)
* Insertion of binary data when no modifier or layout is specified
(e.g. ``{}``, ``{0}``, ``{name}``). This has the same semantics as
``%s`` for percent-formatting (see above).
* The ``c`` modifier will accept an integer between 0 and 255, and insert a
byte of the given value (same as ``%c`` above).
Unsupported features
--------------------
All other features present in formatting of ``str`` objects (either
through the percent operator or the ``str.format()`` method) are
unsupported. Those features imply treating the recipient of the
operator or method as text, which goes counter to the text / bytes
separation (for example, accepting ``%d`` as a format code would imply
that the bytes object really is a ASCII-compatible text string).
Amongst those unsupported features are not only most type-specific
format codes, but also the various layout specifiers such as padding
or alignment. Besides, ``str`` objects are not acceptable as arguments
to the formatting operations, even when using e.g. the ``%s`` format code.
``__format__`` isn't called.
Criticisms
==========
* The development cost and maintenance cost.
* In 3.3 encoding to ASCII or latin-1 is as fast as memcpy (but it still
creates a separate object).
* Developers will have to work around the lack of binary formatting anyway,
if they want to to support Python 3.4 and earlier.
* bytes.join() is consistently faster than format to join bytes strings
(XXX *is it?*).
* Formatting functions could be implemented in a third party module,
rather than added to builtin types.
Other proposals
===============
A new type datatype
-------------------
It was proposed to create a new datatype specialized for "network
programming". The authors of this PEP believe this is counter-productive.
Python 3 already has several major types dedicated to manipulation of
binary data: ``bytes``, ``bytearray``, ``memoryview``, ``io.BytesIO``.
Adding yet another type would make things more confusing for users, and
interoperability between libraries more painful (also potentially
sub-optimal, due to the necessary conversions).
Moreover, not one type would be needed, but two: one immutable type (to
allow for hashing), and one mutable type (as efficient accumulation is
often necessary when working with network messages).
References
==========
* `Issue #3982: support .format for bytes
<http://bugs.python.org/issue3982>`_
* `Mercurial project
<http://mercurial.selenic.com/>`_
* `Twisted project
<http://twistedmatrix.com/trac/>`_
* `Documentation of Python 2 formatting (str % args)
<http://docs.python.org/2/library/stdtypes.html#string-formatting>`_
* `Documentation of Python 2 formatting (str.format)
<http://docs.python.org/2/library/string.html#formatstrings>`_
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: