Create PEP 460 "Add bytes % args and bytes.format(args) to Python 3.5"
parent c77d38b4d2
commit 1a7d7a09c8

@ -0,0 +1,175 @
PEP: 460
Title: Add bytes % args and bytes.format(args) to Python 3.5
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 6-Jan-2014
Python-Version: 3.5


Abstract
========

Add the ``bytes % args`` operator and the ``bytes.format(args)`` method
to Python 3.5.


Rationale
=========

``bytes % args`` and ``bytes.format(args)`` were removed in Python 3,
whereas the Python 2 ``str`` type (a bytes string) supports both. This
operator and this method are requested by Mercurial and Twisted
developers to ease porting their projects to Python 3.

Python 3 suggests formatting text first and then encoding it to bytes.
In some cases this does not make sense, because the arguments are
already bytes strings. A typical use case is a binary network protocol,
since data is sent to and received from sockets as bytes. For example,
SMTP, SIP, HTTP, IMAP, POP and FTP use ASCII commands interspersed with
binary data.

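As an illustration of the problem described above, the following sketch
builds a small HTTP-like request with the tools available in Python
3.0-3.4. The path and host values are made up for the example; only
built-in ``str`` and ``bytes`` methods are used.

```python
# Illustrative values only; the request path and host are made up.
path = b"/index.html"
host = b"example.org"

# Pitfall: formatting text with a bytes argument embeds its repr-like
# str() form instead of the raw bytes.
wrong = "GET {} HTTP/1.1\r\n".format(path)

# Workaround 1: decode every argument, format as text, encode the result.
request = "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(
    path.decode("ascii"), host.decode("ascii")
).encode("ascii")

# Workaround 2: stay in bytes and join the pieces by hand.
request_joined = b"".join(
    [b"GET ", path, b" HTTP/1.1\r\nHost: ", host, b"\r\n\r\n"]
)
```

Both workarounds give the same bytes, but the first one decodes data
that was never text, and the second one hides the shape of the message
in a list of fragments.
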
Using multiple ``bytes + bytes`` instructions is inefficient because it
requires temporary buffers and copies, which are slow and waste memory.
Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``.

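A short sketch of the alternatives to repeated concatenation that exist
today. The SMTP-like fragments are illustrative only, and the comments
describe CPython behaviour as understood here, not a measured result.

```python
chunks = [b"MAIL FROM:<", b"alice@example.org", b">", b"\r\n"]

# Repeated concatenation: each "+" builds a new bytes object and copies
# everything accumulated so far into it.
message = b""
for chunk in chunks:
    message = message + chunk

# bytes.join() allocates the result once and copies each chunk once.
joined = b"".join(chunks)

# A bytearray is mutable, so "+=" extends the same buffer in place.
buffer = bytearray()
for chunk in chunks:
    buffer += chunk
```
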
``bytes % args`` and ``bytes.format(args)`` have been requested since
2008, even before the first release of Python 3.0 (see issue #3982).

``struct.pack()`` is incomplete. For example, it cannot format a number
as decimal, and it does not support padding bytes strings.

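The decimal limitation can be seen in a small sketch. The
``Content-Length`` header is only an illustrative use case; the calls
are the standard ``struct.pack()`` and ``str.encode()``.

```python
import struct

length = 1234

# struct.pack() produces fixed-size binary fields: here a 32-bit
# unsigned integer in network byte order.
binary = struct.pack("!I", length)

# It has no format code for the ASCII decimal form, so the number is
# formatted as text and then encoded.
decimal = str(length).encode("ascii")

header = b"Content-Length: " + decimal + b"\r\n"
```
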
Mercurial 2.8 still supports Python 2.4.


Needed and excluded features
============================

Needed features:

* Bytes strings: ``bytes``, ``bytearray`` and ``memoryview`` types
* Format integer numbers as decimal
* Padding with spaces and null bytes
* ``%s`` should use the buffer protocol, not ``str()``

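One possible reading of the buffer protocol requirement in the list
above, as a sketch. ``as_bytes()`` is a hypothetical helper name used
only for this illustration; ``memoryview()`` is the standard way to
request a buffer from an object.

```python
import array


def as_bytes(obj):
    # memoryview() only accepts objects that export the buffer protocol,
    # so a str is rejected instead of being converted implicitly.
    return memoryview(obj).tobytes()


from_bytes = as_bytes(b"abc")
from_bytearray = as_bytes(bytearray(b"abc"))
from_memoryview = as_bytes(memoryview(b"abc"))
from_array = as_bytes(array.array("B", [97, 98, 99]))

try:
    as_bytes("abc")
except TypeError:
    str_rejected = True
else:
    str_rejected = False
```
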
The feature set is kept minimal to keep the implementation simple and
to limit its cost. ``str % args`` and ``str.format(args)`` are already
complex and difficult to maintain, and their code is heavily optimized.

Excluded features:

* No implicit conversion from Unicode to bytes (e.g. encoding to ASCII
  or to Latin-1)
* Locale support (``{:n}`` format for numbers). Locales are related to
  text and usually to an encoding.
* ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}``
  formats. ``repr()`` and ``ascii()`` are used for debugging; their
  output is displayed in a terminal or a graphical widget. They are
  more related to text.
* Attribute access: ``{obj.attr}``
* Indexing: ``{dict[key]}``
* Features of ``struct.pack()``, for example formatting a number as a
  32-bit unsigned integer in network byte order. ``struct.pack()`` can
  be used to prepare arguments; the implementation should be kept
  simple.
* Features of ``int.to_bytes()``.
* Features of ``ctypes``.
* New format protocol such as a new ``__bformat__()`` method. Since the
  list of supported types is short, there is no need to add a new
  protocol. Other types must be explicitly cast.
* Alternate format for integers, for example ``'{:#x}'.format(0x123)``
  to get ``0x123``. It is more related to debugging, and the prefix can
  easily be written in the format string (e.g. ``0x%x``).
* Relation with ``format()`` and the ``__format__()`` protocol:
  ``bytes.format()`` and ``str.format()`` are unrelated.

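Since several conversions are excluded, callers would prepare such
arguments themselves. A sketch of what that could look like with
existing standard calls follows; the field values are made up, and the
pieces are combined with ``bytes.join()`` so the example does not depend
on the proposed formatting.

```python
import struct

# Each argument is converted to bytes explicitly before it is combined.
size = struct.pack("!I", 5)              # 32-bit unsigned, network order
port = (8080).to_bytes(2, "big")         # 16-bit big-endian integer
name = "caf\u00e9".encode("latin-1")     # explicit text encoding
hexa = b"0x" + format(0x123, "x").encode("ascii")  # prefix written by hand

packet = b"".join([size, port, name, hexa])
```
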
Unknown:

* Format integer to hexadecimal? ``%x`` and ``%X``
* Format integer to octal? ``%o``
* Format integer to binary? ``{!b}``
* Alignment?
* Truncating? Truncate or raise an error?
* Format keywords? ``b'{arg}'.format(arg=5)``
* ``bytes % dict``? ``b'%(arg)s' % {'arg': 5}``
* Floating point numbers?
* ``%i``, ``%u`` and ``%d`` formats for integer numbers?
* Signed numbers? ``%+i`` and ``%-i``


bytes % args
============

Formatters:

* ``"%c"``: one byte
* ``"%s"``: integer or bytes strings
* ``"%20s"`` pads to 20 bytes with spaces (``b' '``)
* ``"%020s"`` pads to 20 bytes with zeros (``b'0'``)
* ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``)


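A minimal pure-Python sketch of how the formatters listed above could
behave. ``bytes_percent()`` and ``to_bytes()`` are hypothetical names,
not part of this proposal or of any Python release. Several points are
assumptions made for the sketch: padding is added on the left as in
``str % args``, ``%c`` only accepts an integer, the null-byte padding
form is not covered, and unknown formatters are left unchanged.

```python
import re

# Matches "%%", "%c", and "%s" with an optional width such as "%20s" or
# "%020s" (a leading zero selects b"0" as the padding byte).
_SPEC = re.compile(rb"%(?:(%)|(c)|(0?)(\d*)s)")


def to_bytes(value):
    """Convert one argument: integers become ASCII decimal digits, any
    other object must export the buffer protocol (no str() fallback)."""
    if isinstance(value, int):
        return str(value).encode("ascii")
    return bytes(memoryview(value))  # raises TypeError for str


def bytes_percent(fmt, args):
    """Hypothetical emulation of ``fmt % args`` for a tuple of args."""
    remaining = iter(args)

    def replace(match):
        percent, char, zero, width = match.groups()
        if percent:
            return b"%"
        if char:
            return bytes([next(remaining)])  # one byte from an integer
        data = to_bytes(next(remaining))
        fill = b"0" if zero else b" "
        return data.rjust(int(width or 0), fill)

    return _SPEC.sub(replace, fmt)
```

For instance, ``bytes_percent(b'a%sc%s', (b'b', 4))`` returns
``b'abc4'`` and ``bytes_percent(b'%5s', (b'ab',))`` returns
``b'   ab'``.

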
bytes.format(args)
==================

Formatters:

* ``"{!c}"``: one byte
* ``"{!s}"``: integer or bytes strings
* ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``)
* ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``)
* ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``)


Examples
========

* ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'``
* ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'``
* ``b'%c' % 88`` gives ``b'X'``
* ``b'%%' % ()`` gives ``b'%'``


Criticisms
==========

* The development cost and maintenance cost.
* In Python 3.3, encoding to ASCII or Latin-1 is as fast as
  ``memcpy()``.
* Developers must work around the lack of ``bytes % args`` and
  ``bytes.format(args)`` anyway to support Python 3.0-3.4.
* ``bytes.join()`` is consistently faster than formatting to join bytes
  strings.
* Formatting functions can be implemented in a third-party module.

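The workaround alluded to in the criticisms above could look like the
following sketch, which round-trips through Latin-1. Latin-1 maps each
byte value 0-255 to exactly one code point, so arbitrary binary data
survives the decode and encode steps. ``format_via_latin1()`` is a
hypothetical helper, not an existing function.

```python
def format_via_latin1(template, *args):
    """Hypothetical helper: format bytes by round-tripping via Latin-1."""

    def to_text(arg):
        # Bytes-like arguments are decoded losslessly; other arguments
        # (such as integers) are left to str.format().
        if isinstance(arg, (bytes, bytearray, memoryview)):
            return bytes(arg).decode("latin-1")
        return arg

    text = template.decode("latin-1").format(*map(to_text, args))
    # Raises UnicodeEncodeError if a str argument contains characters
    # above U+00FF.
    return text.encode("latin-1")
```

Note that, unlike the proposal, this helper accepts ``str`` arguments
and encodes them implicitly, which is one of the excluded features.

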
References
==========

* `Issue #3982: support .format for bytes
  <http://bugs.python.org/issue3982>`_
* `Mercurial project
  <http://mercurial.selenic.com/>`_
* `Twisted project
  <http://twistedmatrix.com/trac/>`_
* `Documentation of Python 2 formatting (str % args)
  <http://docs.python.org/2/library/stdtypes.html#string-formatting>`_
* `Documentation of Python 2 formatting (str.format)
  <http://docs.python.org/2/library/string.html#formatstrings>`_

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: