Overhaul PEP 460, and add myself as author
parent
e35a26608c
commit
19f33e611b
203
pep-0460.txt
|
@@ -1,8 +1,8 @@
|
||||||
PEP: 460
|
PEP: 460
|
||||||
Title: Add bytes % args and bytes.format(args) to Python 3.5
|
Title: Add binary interpolation and formatting
|
||||||
Version: $Revision$
|
Version: $Revision$
|
||||||
Last-Modified: $Date$
|
Last-Modified: $Date$
|
||||||
Author: Victor Stinner <victor.stinner@gmail.com>
|
Author: Victor Stinner <victor.stinner@gmail.com>, Antoine Pitrou <solipsis@pitrou.net>
|
||||||
Status: Draft
|
Status: Draft
|
||||||
Type: Standards Track
|
Type: Standards Track
|
||||||
Content-Type: text/x-rst
|
Content-Type: text/x-rst
|
||||||
|
@@ -13,136 +13,124 @@ Python-Version: 3.5
|
||||||
Abstract
|
Abstract
|
||||||
========
|
========
|
||||||
|
|
||||||
Add ``bytes % args`` operator and ``bytes.format(args)`` method to
|
This PEP proposes to add minimal formatting operations to bytes and
|
||||||
Python 3.5.
|
bytearray objects. The proposed additions are:
|
||||||
|
|
||||||
|
* ``bytes % ...`` and ``bytearray % ...`` for percent-formatting,
|
||||||
|
similar in syntax to percent-formatting on ``str`` objects
|
||||||
|
(accepting a single object, a tuple or a dict).
|
||||||
|
|
||||||
|
* ``bytes.format(...)`` and ``bytearray.format(...)`` for a formatting
|
||||||
|
similar in syntax to ``str.format()`` (accepting positional as well as
|
||||||
|
keyword arguments).
|
||||||
|
|
||||||
|
|
||||||
Rationale
|
Rationale
|
||||||
=========
|
=========
|
||||||
|
|
||||||
``bytes % args`` and ``bytes.format(args)`` have been removed in Python
|
In Python 2, ``str % args`` and ``str.format(args)`` allow the formatting
|
||||||
2. This operator and this method are requested by Mercurial and Twisted
|
and interpolation of 8-bit strings. This feature has commonly been used
|
||||||
developers to ease porting their projects to Python 3.
|
for the assembling of protocol messages when protocols are known to use
|
||||||
|
a fixed encoding.
|
||||||
|
|
||||||
Python 3 suggests to format text first and then encode to bytes. In
|
Python 3 generally mandates that text be stored and manipulated as unicode
|
||||||
some cases, it does not make sense because arguments are bytes strings.
|
(i.e. ``str`` objects, not ``bytes``). In some cases, though, it makes
|
||||||
Typical usage is a network protocol which is binary, since data are
|
sense to manipulate ``bytes`` objects directly. Typical usage is binary
|
||||||
sent to and received from sockets. For example, SMTP, SIP, HTTP, IMAP,
|
network protocols, where you may want to interpolate and assemble several
|
||||||
POP, FTP are ASCII commands interspersed with binary data.
|
bytes objects (some of them literals, some of them computed) to produce
|
||||||
|
complete protocol messages. For example, protocols such as HTTP or SIP
|
||||||
|
have headers with ASCII names and opaque "textual" values using a varying
|
||||||
|
and/or sometimes ill-defined encoding. Moreover, those headers can be
|
||||||
|
followed by a binary body... which can be chunked and decorated with ASCII
|
||||||
|
headers and trailers!
|
||||||
|
|
||||||
Using multiple ``bytes + bytes`` instructions is inefficient because it
|
While there are reasonably efficient ways to accumulate binary data
|
||||||
requires temporary buffers and copies which are slow and waste memory.
|
(such as using a ``bytearray`` object, the ``bytes.join`` method or
|
||||||
Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``.
|
even ``io.BytesIO``), none of them leads to the kind of readable and
|
||||||
|
intuitive code that is produced by a %-formatted or {}-formatted template
|
||||||
``bytes % args`` and ``bytes.format(args)`` have been requested since 2008, even
|
and a formatting operation.
|
||||||
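As a rough illustration of the three accumulation techniques named in the new Rationale text, the sketch below builds the same small message with a ``bytearray``, with ``bytes.join`` and with ``io.BytesIO``. The header name and body are hypothetical values chosen for the example; they do not come from the PEP.

```python
import io

# Hypothetical message parts, for illustration only.
header_name = b"Content-Length"
body = b"hello world"
length = str(len(body)).encode("ascii")  # decimal length as ASCII bytes
parts = [header_name, b": ", length, b"\r\n\r\n", body]

# 1. Accumulate into a mutable bytearray, then freeze it.
buf = bytearray()
for part in parts:
    buf += part
via_bytearray = bytes(buf)

# 2. Join all the pieces in a single call.
via_join = b"".join(parts)

# 3. Write the pieces to an in-memory binary stream.
stream = io.BytesIO()
for part in parts:
    stream.write(part)
via_bytesio = stream.getvalue()
```

All three produce the same bytes, but none of them shows the overall shape of the message the way a single template would, which is the readability argument the paragraph makes.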
before the first release of Python 3.0 (see issue #3982).
|
|
||||||
|
|
||||||
``struct.pack()`` is incomplete. For example, a number cannot be
|
|
||||||
formatted as decimal and it does not support padding bytes string.
|
|
||||||
|
|
||||||
Mercurial 2.8 still supports Python 2.4.
|
|
||||||
|
|
||||||
|
|
||||||
Needed and excluded features
|
Binary formatting features
|
||||||
============================
|
==========================
|
||||||
|
|
||||||
Needed features
|
Supported features
|
||||||
|
------------------
|
||||||
|
|
||||||
* Bytes strings: bytes, bytearray and memoryview types
|
In this proposal, percent-formatting for ``bytes`` and ``bytearray``
|
||||||
* Format integer numbers as decimal
|
supports the following features:
|
||||||
* Padding with spaces and null bytes
|
|
||||||
* "%s" should use the buffer protocol, not str()
|
|
||||||
|
|
||||||
The feature set is minimal to keep the implementation as simple as
|
* Looking up formatting arguments by position as well as by name (i.e.,
|
||||||
possible to limit the cost of the implementation. ``str % args`` and
|
``%s`` as well as ``%(name)s``).
|
||||||
``str.format(args)`` are already complex and difficult to maintain, the
|
* ``%s`` will try to get a ``Py_buffer`` on the given value, and fall back
|
||||||
code is heavily optimized.
|
on calling ``__bytes__``. The resulting binary data is inserted at
|
||||||
|
the given point in the string. This is expected to work with bytes,
|
||||||
|
bytearray and memoryview objects (as well as a couple others such
|
||||||
|
as pathlib's path objects).
|
||||||
|
* ``%c`` will accept an integer between 0 and 255, and insert a byte of the
|
||||||
|
given value.
|
||||||
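The ``%s`` and ``%c`` behaviour listed above can be tried on interpreters that ship a percent operator for bytes. The sketch below assumes Python 3.5 or later, where such an operator was eventually added (through PEP 461), so it shows released behaviour that matches these bullets and is not a reference implementation of this draft. The ``Token`` class is a hypothetical helper used only to exercise the ``__bytes__`` fallback.

```python
class Token:
    """Hypothetical object exposing only __bytes__ (no buffer protocol)."""

    def __bytes__(self):
        return b"tok"


# %s accepts any object supporting the buffer protocol.
message = b"<%s|%s|%s>" % (b"abc", bytearray(b"def"), memoryview(b"xyz"))

# %s falls back on __bytes__ when no buffer is available.
via_dunder = b"[%s]" % Token()

# %c inserts a single byte with the given integer value.
one_byte = b"%c" % 88

# The operator also works on bytearray and returns a bytearray.
mutable = bytearray(b"id=%s") % b"42"
```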
|
|
||||||
Excluded features:
|
Braces-formatting for ``bytes`` and ``bytearray`` supports the following
|
||||||
|
features:
|
||||||
|
|
||||||
* no implicit conversion from Unicode to bytes (ex: encode to ASCII or
|
* All the kinds of argument lookup supported by ``str.format()`` (explicit
|
||||||
to Latin1)
|
positional lookup, auto-incremented positional lookup, keyword lookup,
|
||||||
* Locale support (``{!n}`` format for numbers). Locales are related to
|
attribute lookup, etc.)
|
||||||
text and usually to an encoding.
|
* Insertion of binary data when no modifier or layout is specified
|
||||||
* ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}``
|
(e.g. ``{}``, ``{0}``, ``{name}``). This has the same semantics as
|
||||||
formats. ``repr()`` and ``ascii()`` are used to debug, the output is
|
``%s`` for percent-formatting (see above).
|
||||||
displayed in a terminal or a graphical widget. They are more related to
|
* The ``c`` modifier will accept an integer between 0 and 255, and insert a
|
||||||
text.
|
byte of the given value (same as ``%c`` above).
|
||||||
* Attribute access: ``{obj.attr}``
|
|
||||||
* Indexing: ``{dict[key]}``
|
|
||||||
* Features of struct.pack(). For example, format a number as 32 bit unsigned
|
|
||||||
integer in network endian. The ``struct.pack()`` can be used to prepare
|
|
||||||
arguments, the implementation should be kept simple.
|
|
||||||
* Features of int.to_bytes().
|
|
||||||
* Features of ctypes.
|
|
||||||
* New format protocol like a new ``__bformat__()`` method. Since the
|
|
||||||
list of
|
|
||||||
supported types is short, there is no need to add a new protocol.
|
|
||||||
Other types must be explicitly cast.
|
|
||||||
* Alternate format for integer. For example, ``'{|#x}'.format(0x123)``
|
|
||||||
to get ``0x123``. It is more related to debug, and the prefix can be
|
|
||||||
easily written in the format string (ex: ``0x%x``).
|
|
||||||
* Relation with format() and the __format__() protocol. bytes.format()
|
|
||||||
and str.format() are unrelated.
|
|
||||||
|
|
||||||
Unknown:
|
Unsupported features
|
||||||
|
--------------------
|
||||||
|
|
||||||
* Format integer to hexadecimal? ``%x`` and ``%X``
|
All other features present in formatting of ``str`` objects (either
|
||||||
* Format integer to octal? ``%o``
|
through the percent operator or the ``str.format()`` method) are
|
||||||
* Format integer to binary? ``{!b}``
|
unsupported. Those features imply treating the recipient of the
|
||||||
* Alignment?
|
operator or method as text, which goes counter to the text / bytes
|
||||||
* Truncating? Truncate or raise an error?
|
separation (for example, accepting ``%d`` as a format code would imply
|
||||||
* format keywords? ``b'{arg}'.format(arg=5)``
|
that the bytes object really is an ASCII-compatible text string).
|
||||||
* ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
|
|
||||||
* Floating point number?
|
|
||||||
* ``%i``, ``%u`` and ``%d`` formats for integer numbers?
|
|
||||||
* Signed number? ``%+i`` and ``%-i``
|
|
||||||
|
|
||||||
|
Amongst those unsupported features are not only most type-specific
|
||||||
bytes % args
|
format codes, but also the various layout specifiers such as padding
|
||||||
============
|
or alignment. Besides, ``str`` objects are not acceptable as arguments
|
||||||
|
to the formatting operations, even when using e.g. the ``%s`` format code.
|
||||||
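The rule that ``str`` objects are rejected even with ``%s`` can be observed on Python 3.5 or later, whose bytes percent operator agrees with this draft on that point. The sketch below is an illustration under that assumption; the host name is a made-up value, and the explicit ``encode`` call is the workaround the text / bytes separation implies.

```python
host = "example.org"  # hypothetical text value

# Passing a str to a bytes template is refused, with no implicit encoding.
try:
    b"Host: %s" % host
except TypeError:
    rejected = True
else:
    rejected = False

# The caller has to choose an encoding explicitly before interpolating.
header = b"Host: %s" % host.encode("ascii")
```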
Formatters:
|
|
||||||
|
|
||||||
* ``"%c"``: one byte
|
|
||||||
* ``"%s"``: integer or bytes strings
|
|
||||||
* ``"%20s"`` pads to 20 bytes with spaces (``b' '``)
|
|
||||||
* ``"%020s"`` pads to 20 bytes with zeros (``b'0'``)
|
|
||||||
* ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``)
|
|
||||||
|
|
||||||
|
|
||||||
bytes.format(args)
|
|
||||||
==================
|
|
||||||
|
|
||||||
Formatters:
|
|
||||||
|
|
||||||
* ``"{!c}"``: one byte
|
|
||||||
* ``"{!s}"``: integer or bytes strings
|
|
||||||
* ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``)
|
|
||||||
* ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``)
|
|
||||||
* ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``)
|
|
||||||
|
|
||||||
|
|
||||||
Examples
|
|
||||||
========
|
|
||||||
|
|
||||||
* ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'``
|
|
||||||
* ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'``
|
|
||||||
* ``b'%c' % 88`` gives ``b'X'``
|
|
||||||
* ``b'%%'`` gives ``b'%'``
|
|
||||||
|
|
||||||
|
|
||||||
Criticisms
|
Criticisms
|
||||||
==========
|
==========
|
||||||
|
|
||||||
* The development cost and maintenance cost.
|
* The development cost and maintenance cost.
|
||||||
* In 3.3 encoding to ascii or latin1 is as fast as memcpy
|
* In 3.3 encoding to ASCII or latin-1 is as fast as memcpy (but it still
|
||||||
* Developers must work around the lack of bytes%args and
|
creates a separate object).
|
||||||
bytes.format(args) anyway to support Python 3.0-3.4
|
* Developers will have to work around the lack of binary formatting anyway,
|
||||||
* bytes.join() is consistently faster than format to join bytes strings.
|
if they want to support Python 3.4 and earlier.
|
||||||
* Formatting functions can be implemented in a third party module
|
* bytes.join() is consistently faster than format to join bytes strings
|
||||||
|
(XXX *is it?*).
|
||||||
|
* Formatting functions could be implemented in a third party module,
|
||||||
|
rather than added to builtin types.
|
||||||
|
|
||||||
|
|
||||||
|
Other proposals
|
||||||
|
===============
|
||||||
|
|
||||||
|
A new datatype
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
It was proposed to create a new datatype specialized for "network
|
||||||
|
programming". The authors of this PEP believe this is counter-productive.
|
||||||
|
Python 3 already has several major types dedicated to manipulation of
|
||||||
|
binary data: ``bytes``, ``bytearray``, ``memoryview``, ``io.BytesIO``.
|
||||||
|
|
||||||
|
Adding yet another type would make things more confusing for users, and
|
||||||
|
interoperability between libraries more painful (also potentially
|
||||||
|
sub-optimal, due to the necessary conversions).
|
||||||
|
|
||||||
|
Moreover, not one type would be needed, but two: one immutable type (to
|
||||||
|
allow for hashing), and one mutable type (as efficient accumulation is
|
||||||
|
often necessary when working with network messages).
|
||||||
|
|
||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
|
||||||
|
@@ -172,4 +160,3 @@ This document has been placed in the public domain.
|
||||||
fill-column: 70
|
fill-column: 70
|
||||||
coding: utf-8
|
coding: utf-8
|
||||||
End:
|
End:
|
||||||
|
|
||||||
|
|