PEP 461: removed .format; added markup

Ethan Furman 2014-01-17 09:07:32 -08:00
parent 6f67f62478
commit d98a48bde3
1 changed files with 53 additions and 67 deletions


@ -1,5 +1,5 @@
PEP: 461 PEP: 461
Title: Adding % and {} formatting to bytes Title: Adding % formatting to bytes
Version: $Revision$ Version: $Revision$
Last-Modified: $Date$ Last-Modified: $Date$
Author: Ethan Furman <ethan@stoneleaf.us> Author: Ethan Furman <ethan@stoneleaf.us>
@ -8,25 +8,23 @@ Type: Standards Track
Content-Type: text/x-rst Content-Type: text/x-rst
Created: 2014-01-13 Created: 2014-01-13
Python-Version: 3.5 Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15 Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution: Resolution:
Abstract Abstract
======== ========
This PEP proposes adding the % and {} formatting operations from str to bytes [1]. This PEP proposes adding % formatting operations similar to Python 2's ``str``
type to ``bytes`` [1]_ [2]_.
Overriding Principles Overriding Principles
===================== =====================
In order to avoid the problems of auto-conversion and value-generated exceptions, In order to avoid the problems of auto-conversion and Unicode exceptions that
all object checking will be done via isinstance, not by values contained in a could plague Py2 code, all object checking will be done by duck-typing, not by
Unicode representation. In other words:: values contained in a Unicode representation [3]_.
- duck-typing to allow/reject entry into a byte-stream
- no value generated errors
Proposed semantics for bytes formatting Proposed semantics for bytes formatting
@ -35,17 +33,23 @@ Proposed semantics for bytes formatting
%-interpolation %-interpolation
--------------- ---------------
All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.) All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
will be supported, and will work as they do for str, including the ``%g``, etc.) will be supported, and will work as they do for str, including
padding, justification and other related modifiers, except locale. the padding, justification and other related modifiers.
Example:: Example::
>>> b'%4x' % 10 >>> b'%4x' % 10
b'   a' b'   a'
%c will insert a single byte, either from an int in range(256), or from >>> '%#4x' % 10
a bytes argument of length 1. ' 0xa'
>>> '%04X' % 10
'000A'
``%c`` will insert a single byte, either from an ``int`` in range(256), or from
a ``bytes`` argument of length 1, not from a ``str``.
Example: Example:
@ -55,13 +59,13 @@ Example:
>>> b'%c' % b'a' >>> b'%c' % b'a'
b'a' b'a'
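The ``%c`` rule can be checked with a short sketch. It assumes an interpreter that implements bytes %-interpolation (CPython 3.5 or later); the helper name ``rejects`` is made up for illustration, and the exact exception type for a bad argument is not specified by the text above, so the helper catches several.

```python
# %c takes an int in range(256) or a bytes object of length 1.
from_int = b'%c' % 97
from_bytes = b'%c' % b'a'


def rejects(value):
    """Hypothetical helper: True if b'%c' refuses the given argument."""
    try:
        b'%c' % value
    except (TypeError, OverflowError, ValueError):
        return True
    return False


print(from_int, from_bytes)
print(rejects('a'), rejects(256), rejects(b'ab'))
```

A ``str`` argument, an out-of-range ``int``, and a multi-byte ``bytes`` value are all refused rather than converted.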
%s is restricted in what it will accept:: ``%s`` is restricted in what it will accept::
- input type supports Py_buffer? - input type supports ``Py_buffer`` [4]_?
use it to collect the necessary bytes use it to collect the necessary bytes
- input type is something else? - input type is something else?
use its __bytes__ method; if there isn't one, raise an exception [2] use its ``__bytes__`` method [5]_ ; if there isn't one, raise a ``TypeError``
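The lookup order for ``%s`` can be modelled in pure Python. This is only a sketch of the two steps listed above, not the interpreter's implementation: ``bytes_for_percent_s`` and ``Header`` are hypothetical names, and ``memoryview()`` stands in for the ``Py_buffer`` check.

```python
def bytes_for_percent_s(value):
    """Hypothetical helper: buffer protocol first, then __bytes__, else TypeError."""
    try:
        # memoryview() only accepts objects that export the buffer protocol.
        return bytes(memoryview(value))
    except TypeError:
        pass
    # Look the method up on the type, as special-method lookup does.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        raise TypeError(
            '%s requires a buffer or __bytes__, not ' + type(value).__name__)
    return to_bytes(value)


class Header:
    """Illustrative type that opts in through __bytes__."""

    def __bytes__(self):
        return b'X-Demo: 1'


print(bytes_for_percent_s(bytearray(b'abc')))
print(bytes_for_percent_s(Header()))
```

Under this model neither ``str`` nor ``int`` gets through, since neither exports a buffer or defines ``__bytes__``.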
Examples: Examples:
@ -80,89 +84,71 @@ Examples:
.. note:: .. note::
Because the str type does not have a __bytes__ method, attempts to Because the ``str`` type does not have a ``__bytes__`` method, attempts to
directly use 'a string' as a bytes interpolation value will raise an directly use ``'a string'`` as a bytes interpolation value will raise an
exception. To use 'string' values, they must be encoded or otherwise exception. To use ``'string'`` values, they must be encoded or otherwise
transformed into a bytes sequence:: transformed into a ``bytes`` sequence::
'a string'.encode('latin-1') 'a string'.encode('latin-1')
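As a quick illustration of the note, assuming an interpreter that implements bytes %-interpolation (CPython 3.5 or later), passing a ``str`` directly fails while an explicitly encoded value works. The variable names are illustrative only.

```python
name = 'a string'

# Passing the str directly is refused.
try:
    b'value: %s' % name
    direct_failed = False
except TypeError:
    direct_failed = True

# Encoding first makes the choice of codec explicit at the call site.
line = b'value: %s' % name.encode('latin-1')

print(direct_failed)
print(line)
```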
format
------
The format mini language codes, where they correspond with the %-interpolation codes,
will be used as-is, with three exceptions::
- !s is not supported, as {} can mean the default for both str and bytes, in both
Py2 and Py3.
- !b is supported, and new Py3k code can use it to be explicit.
- no other __format__ method will be called.
Numeric Format Codes Numeric Format Codes
-------------------- --------------------
To properly handle int and float subclasses, int(), index(), and float() will be called on the To properly handle ``int`` and ``float`` subclasses, ``int()``, ``index()``,
objects intended for (d, i, u), (b, o, x, X), and (e, E, f, F, g, G). and ``float()`` will be called on the objects intended for (``d``, ``i``,
``u``), (``b``, ``o``, ``x``, ``X``), and (``e``, ``E``, ``f``, ``F``, ``g``,
``G``).
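A small sketch of the subclass handling, assuming an interpreter that implements bytes %-interpolation (CPython 3.5 or later). ``Port`` and ``Celsius`` are hypothetical subclasses introduced only for this example.

```python
class Port(int):
    """Illustrative int subclass."""


class Celsius(float):
    """Illustrative float subclass."""


# Integer codes see the underlying int value of the subclass instance.
as_decimal = b'%d' % Port(8080)   # b'8080'
as_hex = b'%x' % Port(255)        # b'ff'

# Float codes see the underlying float value.
as_fixed = b'%.2f' % Celsius(21.5)  # b'21.50'

print(as_decimal, as_hex, as_fixed)
```

The results are ASCII digits, as with ``str`` formatting, not the raw byte values of the numbers.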
Unsupported codes Unsupported codes
----------------- -----------------
%r (which calls __repr__), and %a (which calls ascii() on __repr__) are not supported. ``%r`` (which calls ``__repr__``), and ``%a`` (which calls ``ascii()`` on
``__repr__``) are not supported.
!r and !a are not supported.
The n integer and float format code is not supported.
Open Questions
==============
Currently non-numeric objects go through::
- Py_buffer
- __bytes__
- failure
Do we want to add a __format_bytes__ method in there?
- Guaranteed to produce only ascii (as in b'10', not b'\x0a')
- Makes more sense than using __bytes__ to produce ascii output
- What if an object has both __bytes__ and __format_bytes__?
Do we need to support all the numeric format codes? The floating point
exponential formats seem less appropriate, for example.
Proposed variations Proposed variations
=================== ===================
It was suggested to let %s accept numbers, but since numbers have their own It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded. format codes this idea was discarded.
It has been suggested to use %b for bytes instead of %s. It has been suggested to use ``%b`` for bytes instead of ``%s``.
- Rejected as %b does not exist in Python 2.x %-interpolation, which is - Rejected as ``%b`` does not exist in Python 2.x %-interpolation, which is
why we are using %s. why we are using ``%s``.
It has been proposed to automatically use .encode('ascii','strict') for str It has been proposed to automatically use ``.encode('ascii','strict')`` for
arguments to %s. ``str`` arguments to ``%s``.
- Rejected as this would lead to intermittent failures. Better to have the - Rejected as this would lead to intermittent failures. Better to have the
operation always fail so the trouble-spot can be correctly fixed. operation always fail so the trouble-spot can be correctly fixed.
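The "intermittent failures" concern can be made concrete. The sketch below emulates the rejected behaviour with a hypothetical helper, ``auto_ascii``; it is not part of the proposal. The same line of code succeeds or fails depending only on the value it happens to receive.

```python
def auto_ascii(value):
    """Hypothetical rejected behaviour: silently encode str as strict ASCII."""
    if isinstance(value, str):
        return value.encode('ascii', 'strict')
    return value


outcomes = {}
for name in ('abc', 'caf\u00e9'):
    try:
        outcomes[name] = b'name=%s' % auto_ascii(name)
    except UnicodeEncodeError:
        # Raised only for the non-ASCII input, possibly far from where
        # the str was introduced.
        outcomes[name] = 'UnicodeEncodeError'

print(outcomes)
```

With the always-fail rule, both inputs raise ``TypeError`` at the formatting call, so the missing ``.encode()`` is found on the first run rather than on the first non-ASCII value.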
It has been proposed to have %s return the ascii-encoded repr when the value It has been proposed to have ``%s`` return the ascii-encoded repr when the
is a str (b'%s' % 'abc' --> b"'abc'"). value is a ``str`` (b'%s' % 'abc' --> b"'abc'").
- Rejected as this would lead to hard-to-debug failures far from the problem - Rejected as this would lead to hard-to-debug failures far from the problem
site. Better to have the operation always fail so the trouble-spot can be site. Better to have the operation always fail so the trouble-spot can be
easily fixed. easily fixed.
Originally this PEP also proposed adding format style formatting, but it
was decided that format and its related machinery were all strictly text
(aka ``str``) based, and it was dropped.
Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be visited again later if real-world use shows deficiencies with this solution.
Footnotes Footnotes
========= =========
.. [1] string.Template is not under consideration. .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] TypeError, ValueError, or UnicodeEncodeError? .. [2] neither string.Template, format, nor str.format are under consideration.
.. [3] %c is not an exception as neither of its possible arguments are unicode.
.. [4] http://docs.python.org/3/c-api/buffer.html
.. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
Copyright Copyright