PEP: 461
Title: Adding % formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution:


Abstract
========

This PEP proposes adding ``%`` formatting operations to ``bytes``, similar to
those of Python 2's ``str`` type [1]_ [2]_.


Overriding Principles
=====================

In order to avoid the problems of auto-conversion and Unicode exceptions that
could plague Python 2 code, all object checking will be done by duck-typing,
not by values contained in a Unicode representation [3]_.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for ``str``,
including the padding, justification and other related modifiers.

Examples::

    >>> b'%4x' % 10
    b'   a'

    >>> b'%#4x' % 10
    b' 0xa'

    >>> b'%04X' % 10
    b'000A'
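
Because every numeric conversion produces only ASCII characters (digits,
signs, ``0x`` prefixes, padding), the proposed behaviour can be pictured as
ordinary ``str`` formatting followed by an ASCII encode.  The
``format_numeric_bytes`` helper below is hypothetical and only illustrates
that equivalence, assuming ``spec`` holds a single numeric conversion; it is
not a description of how ``bytes`` formatting would be implemented.

```python
def format_numeric_bytes(spec: str, value) -> bytes:
    """Hypothetical helper: render one numeric %-conversion as bytes.

    Assumes `spec` is a single numeric conversion such as '%4x' or '%04X'.
    Numeric conversions only ever produce ASCII text, so the encode
    step below cannot fail for such a spec.
    """
    return (spec % value).encode('ascii')


# The three interpolations from the examples above.
print(format_numeric_bytes('%4x', 10))
print(format_numeric_bytes('%#4x', 10))
print(format_numeric_bytes('%04X', 10))
```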

``%c`` will insert a single byte, either from an ``int`` in ``range(256)`` or
from a ``bytes`` argument of length 1, but not from a ``str``.

Examples::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'
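
The ``%c`` rules can be sketched in pure Python as follows.  ``format_c`` is a
hypothetical name; accepting any integer-like object through
``operator.index()`` and the particular exception types raised are
assumptions of this sketch, not part of the proposal.

```python
import operator


def format_c(arg) -> bytes:
    """Hypothetical sketch of the proposed %c rules for bytes."""
    if isinstance(arg, bytes):
        # A bytes argument must be exactly one byte long.
        if len(arg) != 1:
            raise TypeError('%c requires a bytes object of length 1')
        return arg
    if isinstance(arg, str):
        # str is rejected outright; the caller must encode it first.
        raise TypeError('%c does not accept str; encode it first')
    # Anything else must be integer-like (assumed: via __index__) ...
    value = operator.index(arg)
    # ... and must fit in a single byte.
    if not 0 <= value < 256:
        raise OverflowError('%c arg not in range(256)')
    return bytes([value])


print(format_c(48))
print(format_c(b'a'))
```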

``%s`` is restricted in what it will accept:

- If the input type supports the buffer protocol (``Py_buffer``) [4]_, use it
  to collect the necessary bytes.

- Otherwise, use its ``__bytes__`` method [5]_; if there isn't one, raise a
  ``TypeError``.

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the ``str`` type does not have a ``__bytes__`` method, attempts to
   directly use ``'a string'`` as a bytes interpolation value will raise an
   exception.  To use ``'string'`` values, they must be encoded or otherwise
   transformed into a ``bytes`` sequence::

      'a string'.encode('latin-1')
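
A minimal sketch of the ``%s`` lookup order, assuming that ``memoryview()`` is
an acceptable Python-level stand-in for the C-level ``Py_buffer`` check.
``format_s`` and ``Token`` are hypothetical names, and the error messages
simply mirror the examples above.  ``Token`` illustrates the duck-typing
principle: any type that defines ``__bytes__`` is accepted, with no
inheritance required.

```python
def format_s(arg) -> bytes:
    """Hypothetical sketch of the proposed %s lookup order for bytes."""
    # 1. Objects supporting the buffer protocol (bytes, bytearray,
    #    memoryview, array.array, ...) contribute their raw bytes.
    #    memoryview() is assumed here as a stand-in for Py_buffer.
    try:
        view = memoryview(arg)
    except TypeError:
        pass
    else:
        with view:
            return view.tobytes()
    # 2. Otherwise look up __bytes__ on the type, as for special methods.
    to_bytes = getattr(type(arg), '__bytes__', None)
    if to_bytes is None:
        # str ends up here: it has neither a buffer nor __bytes__.
        hint = ', perhaps you need to encode it?' if isinstance(arg, str) else ''
        raise TypeError(f'{arg!r} has no __bytes__ method{hint}')
    result = to_bytes(arg)
    if not isinstance(result, bytes):
        raise TypeError('__bytes__ returned non-bytes')
    return result


class Token:
    """Hypothetical user type that opts in purely by defining __bytes__."""

    def __bytes__(self):
        return b'tok'


print(format_s(b'abc'))
print(format_s(bytearray(b'xy')))
print(format_s(Token()))
```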


Numeric Format Codes
--------------------

To properly handle ``int`` and ``float`` subclasses, ``int()``,
``operator.index()``, and ``float()`` will be called on the objects intended
for (``d``, ``i``, ``u``), (``b``, ``o``, ``x``, ``X``), and (``e``, ``E``,
``f``, ``F``, ``g``, ``G``) respectively.
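
The mapping from format codes to conversion functions can be sketched as a
lookup table.  This sketch takes the ``index()`` conversion to be
``operator.index()`` (which calls ``__index__``); ``coerce_numeric`` is a
hypothetical helper, and ``b`` is included only because the text above lists
it.

```python
import operator

# Conversion applied to the argument of each numeric code, per the text.
_COERCE = {}
_COERCE.update(dict.fromkeys('diu', int))
_COERCE.update(dict.fromkeys('boxX', operator.index))
_COERCE.update(dict.fromkeys('eEfFgG', float))


def coerce_numeric(code: str, value):
    """Hypothetical helper: convert `value` for the numeric format `code`."""
    try:
        convert = _COERCE[code]
    except KeyError:
        raise ValueError(f'not a numeric format code: {code!r}') from None
    return convert(value)


print(coerce_numeric('d', 3.7))
print(coerce_numeric('f', 2))
```

With these rules ``coerce_numeric('d', 3.7)`` returns ``3`` because ``int()``
truncates, whereas ``coerce_numeric('x', 3.7)`` raises ``TypeError`` because
``operator.index()`` refuses floats.  ``int()`` and ``float()`` also return
plain ``int`` and ``float`` objects for subclasses, so methods overridden on a
subclass cannot influence the formatted output.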


Unsupported codes
-----------------

``%r`` (which calls ``__repr__``) and ``%a`` (which calls ``ascii()``, i.e.
``__repr__`` with non-ASCII characters escaped) are not supported.


Proposed variations
===================

It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been suggested to use ``%b`` for bytes instead of ``%s``.

- Rejected as ``%b`` does not exist in Python 2.x %-interpolation, which is
  why we are using ``%s``.

It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%s``.

- Rejected as this would lead to intermittent failures: the encode succeeds
  for pure-ASCII strings and fails only when non-ASCII data turns up.  Better
  to have the operation always fail so the trouble-spot can be correctly
  fixed.
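
The intermittent nature of that failure is easy to see with a hypothetical
``auto_encode_s`` that implements the rejected idea: the same call succeeds or
fails depending on the characters in the string, not on its type.

```python
def auto_encode_s(arg: str) -> bytes:
    """Hypothetical version of the rejected idea: ASCII-encode str for %s."""
    # Whether this succeeds depends on the characters in `arg`,
    # not on its type, so one line of code can both pass and fail.
    return arg.encode('ascii', 'strict')


for name in ['Alice', 'José']:
    try:
        print(b'Hello, ' + auto_encode_s(name))
    except UnicodeEncodeError as exc:
        print(f'{name!r}: {exc.reason}')
```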

It has been proposed to have ``%s`` return the ASCII-encoded repr when the
value is a ``str`` (``b'%s' % 'abc'`` would give ``b"'abc'"``).

- Rejected as this would lead to hard-to-debug failures far from the problem
  site.  Better to have the operation always fail so the trouble-spot can be
  easily fixed.
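
The hypothetical ``repr_s`` below sketches that rejected idea, assuming
``ASCII-encoded repr`` means ``ascii()`` (``repr()`` with non-ASCII characters
escaped) encoded to bytes.  Nothing fails at the point of formatting; the
stray quotes only surface later, for example when a peer rejects the header.

```python
def repr_s(arg) -> bytes:
    """Hypothetical version of the rejected idea: ASCII repr for str."""
    if isinstance(arg, str):
        # ascii() never fails, and its result is pure ASCII,
        # so the quotes from the repr silently end up in the output.
        return ascii(arg).encode('ascii')
    raise TypeError('only str is handled in this sketch')


header = b'Content-Type: ' + repr_s('text/plain') + b'\r\n'
print(header)
```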

Originally this PEP also proposed adding ``format()``-style formatting, but it
was decided that ``format()`` and its related machinery were all strictly text
(aka ``str``) based, and it was dropped.

Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be revisited later if real-world use shows deficiencies with this solution.


Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] Neither ``string.Template``, ``format``, nor ``str.format`` is under
       consideration.
.. [3] ``%c`` is not an exception, as neither of its possible arguments is
       Unicode.
.. [4] http://docs.python.org/3/c-api/buffer.html
.. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: