PEP: 461
Title: Adding % formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution:


Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's ``str``
type to ``bytes`` [1]_ [2]_.


Overriding Principles
=====================

In order to avoid the problems of auto-conversion and Unicode exceptions that
could plague Py2 code, all object checking will be done by duck-typing, not by
values contained in a Unicode representation [3]_.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for ``str``,
including the padding, justification and other related modifiers.

Example::

   >>> b'%4x' % 10
   b'   a'

   >>> '%#4x' % 10
   ' 0xa'

   >>> '%04X' % 10
   '000A'

``%c`` will insert a single byte, either from an ``int`` in ``range(256)``, or
from a ``bytes`` argument of length 1, not from a ``str``.

Example::

   >>> b'%c' % 48
   b'0'

   >>> b'%c' % b'a'
   b'a'

``%s`` is restricted in what it will accept:

- input type supports ``Py_buffer`` [4]_?
  use it to collect the necessary bytes

- input type is something else?
  use its ``__bytes__`` method [5]_; if there isn't one, raise a ``TypeError``

Examples::

   >>> b'%s' % b'abc'
   b'abc'

   >>> b'%s' % 3.14
   Traceback (most recent call last):
   ...
   TypeError: 3.14 has no __bytes__ method

   >>> b'%s' % 'hello world!'
   Traceback (most recent call last):
   ...
   TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the ``str`` type does not have a ``__bytes__`` method, attempts to
   directly use ``'a string'`` as a bytes interpolation value will raise an
   exception. To use ``'string'`` values, they must be encoded or otherwise
   transformed into a ``bytes`` sequence::

      'a string'.encode('latin-1')


Numeric Format Codes
--------------------

To properly handle ``int`` and ``float`` subclasses, ``int()``, ``index()``,
and ``float()`` will be called on the objects intended for (``d``, ``i``,
``u``), (``b``, ``o``, ``x``, ``X``), and (``e``, ``E``, ``f``, ``F``, ``g``,
``G``), respectively.


Unsupported codes
-----------------

``%r`` (which calls ``__repr__``), and ``%a`` (which calls ``ascii()`` on
``__repr__``) are not supported.


Proposed variations
===================

It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been suggested to use ``%b`` for bytes instead of ``%s``.

  - Rejected as ``%b`` does not exist in Python 2.x %-interpolation, which is
    why we are using ``%s``.

It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%s``.

  - Rejected as this would lead to intermittent failures. Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have ``%s`` return the ASCII-encoded repr when the
value is a ``str`` (``b'%s' % 'abc'  --> b"'abc'"``).

  - Rejected as this would lead to hard-to-debug failures far from the problem
    site. Better to have the operation always fail so the trouble-spot can be
    easily fixed.

Originally this PEP also proposed adding ``format``-style formatting, but it
was decided that ``format`` and its related machinery were all strictly text
(aka ``str``) based, and it was dropped.

Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be visited again later if real-world use shows deficiencies with this solution.


Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] Neither ``string.Template``, ``format``, nor ``str.format`` is under
       consideration.
.. [3] ``%c`` is not an exception, as neither of its possible arguments is
       Unicode.
.. [4] http://docs.python.org/3/c-api/buffer.html
.. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: