PEP: 460
Title: Add binary interpolation and formatting
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 6-Jan-2014
Python-Version: 3.5


Abstract
========

This PEP proposes to add minimal formatting operations to bytes and
bytearray objects. The proposed additions are:

* ``bytes % ...`` and ``bytearray % ...`` for percent-formatting,
  similar in syntax to percent-formatting on ``str`` objects
  (accepting a single object, a tuple or a dict).

* ``bytes.format(...)`` and ``bytearray.format(...)`` for formatting
  similar in syntax to ``str.format()`` (accepting positional as well as
  keyword arguments).

* ``bytes.format_map(...)`` and ``bytearray.format_map(...)`` for an
  API similar to ``str.format_map(...)``, with the same formatting
  syntax and semantics as ``bytes.format()`` and ``bytearray.format()``.

Rationale
=========

In Python 2, ``str % args`` and ``str.format(args)`` allow the formatting
and interpolation of bytestrings. This feature has commonly been used
for the assembling of protocol messages when protocols are known to use
a fixed encoding.

Python 3 generally mandates that text be stored and manipulated as Unicode
(i.e. ``str`` objects, not ``bytes``). In some cases, though, it makes
sense to manipulate ``bytes`` objects directly. Typical usage is binary
network protocols, where you may want to interpolate and assemble several
bytes objects (some of them literals, some of them computed) to produce
complete protocol messages. For example, protocols such as HTTP or SIP
have headers with ASCII names and opaque "textual" values using a varying
and/or sometimes ill-defined encoding. Moreover, those headers can be
followed by a binary body... which can be chunked and decorated with ASCII
headers and trailers!

While there are reasonably efficient ways to accumulate binary data
(such as using a ``bytearray`` object, the ``bytes.join`` method or
even ``io.BytesIO``), none of them leads to the kind of readable and
intuitive code that is produced by a %-formatted or {}-formatted template
and a formatting operation.

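As a rough illustration of the accumulation techniques named above, the
following sketch builds the same small HTTP-style request in each of the
three ways and then with a percent-formatted template. The message parts
are hypothetical values chosen for the example, and the template line
assumes an interpreter in which the percent operator on ``bytes`` is
available (Python 3.5 or later); it is not taken from this PEP.

```python
import io

# Hypothetical message parts, chosen only for illustration.
method = b"GET"
path = b"/index.html"
host = b"example.org"

# 1. Accumulate into a mutable bytearray.
buf = bytearray()
buf += method + b" " + path + b" HTTP/1.1\r\n"
buf += b"Host: " + host + b"\r\n\r\n"
via_bytearray = bytes(buf)

# 2. Collect the pieces and join them once.
via_join = b"".join(
    [method, b" ", path, b" HTTP/1.1\r\n", b"Host: ", host, b"\r\n\r\n"]
)

# 3. Write the pieces to an in-memory binary stream.
stream = io.BytesIO()
stream.write(method + b" " + path + b" HTTP/1.1\r\n")
stream.write(b"Host: " + host + b"\r\n\r\n")
via_bytesio = stream.getvalue()

# For comparison: one template and one formatting operation.
# Assumption: the percent operator on bytes exists (Python 3.5+).
via_template = b"%s %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (method, path, host)

print(via_template)
```

All four variables hold the same bytes; only the template version shows
the shape of the final message at a glance.
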
Binary formatting features
==========================

Supported features
------------------

In this proposal, percent-formatting for ``bytes`` and ``bytearray``
supports the following features:

* Looking up formatting arguments by position as well as by name (i.e.,
  ``%s`` as well as ``%(name)s``).
* ``%s`` will try to get a ``Py_buffer`` on the given value, and fall back
  on calling ``__bytes__``. The resulting binary data is inserted at
  the given point in the string. This is expected to work with bytes,
  bytearray and memoryview objects (as well as a few others such
  as pathlib's path objects).
* ``%c`` will accept an integer between 0 and 255, and insert a byte of the
  given value.

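The percent-formatting features listed above can be sketched as follows.
This is a minimal sketch, assuming Python 3.5 or later, where the percent
operator on ``bytes`` accepts ``%s`` and ``%c`` with the semantics
described here. The ``Token`` class is hypothetical, and the use of
``bytes`` keys in the mapping reflects how CPython behaves rather than
anything stated in this PEP.

```python
# Assumption: Python 3.5+, where bytes supports the percent operator.

# Positional lookup, with a tuple and with a single object.
request_line = b"%s %s HTTP/1.1" % (b"GET", b"/index.html")
single = b"Host: %s" % b"example.org"

# Lookup by name. In CPython the mapping keys are bytes, like the template.
header = b"%(name)s: %(value)s" % {b"name": b"Accept", b"value": b"*/*"}

# %s accepts objects that expose the buffer protocol...
from_buffers = b"%s|%s" % (bytearray(b"abc"), memoryview(b"def"))


class Token:
    """Hypothetical class that only provides __bytes__."""

    def __init__(self, raw):
        self.raw = raw

    def __bytes__(self):
        return self.raw


# ...and falls back on __bytes__ for other objects.
from_dunder = b"token=%s" % (Token(b"\x00\xff"),)

# %c inserts a single byte given as an integer between 0 and 255.
crlf = b"%c%c" % (13, 10)

print(request_line + crlf + single + crlf + header + crlf)
```
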
Braces-formatting for ``bytes`` and ``bytearray`` supports the following
features:

* All the kinds of argument lookup supported by ``str.format()`` (explicit
  positional lookup, auto-incremented positional lookup, keyword lookup,
  attribute lookup, etc.)
* Insertion of binary data when no modifier or layout is specified
  (e.g. ``{}``, ``{0}``, ``{name}``). This has the same semantics as
  ``%s`` for percent-formatting (see above).
* The ``c`` modifier will accept an integer between 0 and 255, and insert a
  byte of the given value (same as ``%c`` above).

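To make the braces-formatting rules above more concrete, here is a
pure-Python sketch of a subset of them. The function ``format_bytes``
and its helpers are hypothetical names invented for this example; they
are not an existing API and not the implementation this PEP would add.
The sketch covers positional, auto-incremented and keyword lookup plus
the ``c`` modifier, and deliberately leaves out attribute and index
lookup as well as error handling for stray braces.

```python
import re

# Hypothetical emulation of a subset of the proposed bytes.format().
# Matches "{{", "}}", or a field such as {}, {0}, {name}, {:c}, {0:c}.
_FIELD = re.compile(rb"\{\{|\}\}|\{([^{}:]*)(?::([^{}]*))?\}")


def _to_bytes(value):
    """Use the buffer protocol, falling back on __bytes__."""
    try:
        return bytes(memoryview(value))
    except TypeError:
        dunder = getattr(type(value), "__bytes__", None)
        if dunder is None:
            raise TypeError(
                "cannot interpolate %s" % type(value).__name__
            ) from None
        return dunder(value)


def format_bytes(template, *args, **kwargs):
    """Interpolate args and kwargs into a bytes template."""
    auto_index = 0

    def replace(match):
        nonlocal auto_index
        text = match.group(0)
        if text == b"{{":
            return b"{"
        if text == b"}}":
            return b"}"
        name = match.group(1).decode("ascii")
        spec = match.group(2) or b""
        if name == "":
            # Auto-incremented positional lookup.
            value = args[auto_index]
            auto_index += 1
        elif name.isdigit():
            # Explicit positional lookup.
            value = args[int(name)]
        else:
            # Keyword lookup.
            value = kwargs[name]
        if spec == b"":
            return _to_bytes(value)
        if spec == b"c":
            # bytes([n]) raises ValueError outside 0..255.
            return bytes([value])
        raise ValueError("unsupported format spec: %r" % spec)

    return _FIELD.sub(replace, template)


print(format_bytes(b"{} {} HTTP/1.1{:c}{:c}", b"GET", b"/", 13, 10))
```

Because the conversion goes through ``memoryview`` and ``__bytes__``
only, a ``str`` or ``int`` argument without the ``c`` modifier raises
``TypeError`` in this sketch instead of being encoded implicitly.
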
Unsupported features
--------------------

All other features present in formatting of ``str`` objects (either
through the percent operator or the ``str.format()`` method) are
unsupported. Those features imply treating the recipient of the
operator or method as text, which runs counter to the text / bytes
separation (for example, accepting ``%d`` as a format code would imply
that the bytes object really is an ASCII-compatible text string).

Amongst those unsupported features are not only most type-specific
format codes, but also the various layout specifiers such as padding
or alignment. Besides, ``str`` objects are not acceptable as arguments
to the formatting operations, even when using e.g. the ``%s`` format code.

``__format__`` is not called on the arguments.

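The consequences of these restrictions can be sketched as follows: a
``str`` argument is rejected rather than encoded implicitly, and a
number has to be converted to bytes by the caller before interpolation.
This is a minimal sketch, assuming Python 3.5 or later for the percent
operator on ``bytes``; released Python versions accept more format codes
than this draft proposes, so the example restricts itself to ``%s``. The
header names and the body are hypothetical.

```python
# Assumption: Python 3.5+, where bytes supports the percent operator.
body = b"hello"

# A str argument is rejected instead of being encoded implicitly.
try:
    b"X-Name: %s" % "text"
except TypeError:
    rejected = True
else:
    rejected = False

# Without %d, the caller converts the number to ASCII bytes explicitly.
length = str(len(body)).encode("ascii")
message = b"Content-Length: %s\r\n\r\n%s" % (length, body)

print(rejected, message)
```
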
Criticisms
==========

* The development cost and maintenance cost.
* In Python 3.3, encoding to ASCII or latin-1 is as fast as memcpy (but it
  still creates a separate object).
* Developers will have to work around the lack of binary formatting anyway,
  if they want to support Python 3.4 and earlier.
* ``bytes.join()`` is consistently faster than format to join bytes strings
  (XXX *is it?*).
* Formatting functions could be implemented in a third party module,
  rather than added to builtin types.

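The open question about ``bytes.join()`` in the list above could be
checked with a small measurement along the following lines. This is
only a sketch: the three function names and the message parts are
hypothetical, the percent variant assumes Python 3.5 or later, and the
timings depend on the interpreter and the machine, so no result is
claimed here.

```python
import timeit

# Hypothetical message parts, chosen only for illustration.
parts = (b"GET", b"/index.html", b"example.org")


def with_join():
    return b"".join(
        [parts[0], b" ", parts[1], b" HTTP/1.1\r\nHost: ", parts[2],
         b"\r\n\r\n"]
    )


def with_percent():
    # Assumption: the percent operator on bytes exists (Python 3.5+).
    return b"%s %s HTTP/1.1\r\nHost: %s\r\n\r\n" % parts


def with_str_encode():
    # Workaround without binary formatting: format text, then encode.
    decoded = tuple(p.decode("latin-1") for p in parts)
    text = "%s %s HTTP/1.1\r\nHost: %s\r\n\r\n" % decoded
    return text.encode("latin-1")


# Time each approach; the absolute numbers are machine-dependent.
timings = {
    func.__name__: timeit.timeit(func, number=10000)
    for func in (with_join, with_percent, with_str_encode)
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-16s %.4f s" % (name, seconds))
```
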
Other proposals
===============

A new datatype
--------------

It was proposed to create a new datatype specialized for "network
programming". The authors of this PEP believe this is counter-productive.
Python 3 already has several major types dedicated to manipulation of
binary data: ``bytes``, ``bytearray``, ``memoryview``, ``io.BytesIO``.

Adding yet another type would make things more confusing for users, and
interoperability between libraries more painful (also potentially
sub-optimal, due to the necessary conversions).

Moreover, not one type would be needed, but two: one immutable type (to
allow for hashing), and one mutable type (as efficient accumulation is
often necessary when working with network messages).

References
==========

* `Issue #3982: support .format for bytes
  <http://bugs.python.org/issue3982>`_
* `Mercurial project
  <http://mercurial.selenic.com/>`_
* `Twisted project
  <http://twistedmatrix.com/trac/>`_
* `Documentation of Python 2 formatting (str % args)
  <http://docs.python.org/2/library/stdtypes.html#string-formatting>`_
* `Documentation of Python 2 formatting (str.format)
  <http://docs.python.org/2/library/string.html#formatstrings>`_

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: