PEP: 461
Title: Adding % formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution:
Abstract
========
This PEP proposes adding % formatting operations similar to Python 2's ``str``
type to ``bytes`` [1]_ [2]_.
Overriding Principles
=====================
In order to avoid the problems of auto-conversion and Unicode exceptions that
could plague Py2 code, all object checking will be done by duck-typing, not by
values contained in a Unicode representation [3]_.
Proposed semantics for bytes formatting
=======================================
%-interpolation
---------------
All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers.
Example::

    >>> b'%4x' % 10
    b'   a'
    >>> b'%#4x' % 10
    b' 0xa'
    >>> b'%04X' % 10
    b'000A'
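As a further illustration of the padding, justification and sign modifiers, the following sketch assumes an interpreter that implements this proposal (Python 3.5 or later). The values and the ``Content-Length`` header are illustrative only and do not come from the PEP text.

```python
# Illustrative sketch: assumes an interpreter implementing this proposal (3.5+).

# Left-justify an integer in a field of width 6.
left = b'%-6d|' % 42

# Zero-pad a float to width 8 with 3 digits after the decimal point.
padded = b'%08.3f' % 3.14159

# Force an explicit sign.
signed = b'%+d' % 5

# A typical use: building an ASCII-compatible wire-format header.
header = b'Content-Length: %d\r\n' % 1024

print(left, padded, signed, header)
```

Under those assumptions the results should be ``b'42    |'``, ``b'0003.142'``, ``b'+5'`` and ``b'Content-Length: 1024\r\n'``, matching what the same format strings produce for ``str``.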
``%c`` will insert a single byte, either from an ``int`` in range(256), or from
a ``bytes`` argument of length 1, not from a ``str``.
Example::

    >>> b'%c' % 48
    b'0'
    >>> b'%c' % b'a'
    b'a'
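The accepted and rejected ``%c`` arguments can be probed with a small helper. This is only a sketch, assuming an interpreter that implements this proposal (Python 3.5 or later). ``accepts_percent_c`` is a hypothetical name, and because the text above does not say which exception a bad argument raises, the helper catches several.

```python
# Illustrative sketch: assumes an interpreter implementing this proposal (3.5+).

def accepts_percent_c(arg):
    """Return True if b'%c' % arg succeeds, False if the argument is rejected."""
    try:
        b'%c' % arg
    except (TypeError, ValueError, OverflowError):
        # The exception type is not specified above, so several are caught.
        return False
    return True


checks = {
    'int in range(256)': accepts_percent_c(65),
    'bytes of length 1': accepts_percent_c(b'A'),
    'int out of range': accepts_percent_c(256),
    'bytes of length 2': accepts_percent_c(b'AB'),
    'str of length 1': accepts_percent_c('A'),
}

for label, accepted in checks.items():
    print(label, '->', 'accepted' if accepted else 'rejected')
```

Only the first two cases should be accepted; the ``str`` case is rejected even though it is a single character.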
``%s`` is restricted in what it will accept:

- input type supports ``Py_buffer`` [4]_?
  use it to collect the necessary bytes

- input type is something else?
  use its ``__bytes__`` method [5]_; if there isn't one, raise a ``TypeError``
Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?
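The two-step lookup for ``%s`` described above can be approximated in pure Python. This is a rough sketch and not the actual implementation: ``interpolate_s`` and ``Packet`` are hypothetical names, and ``memoryview()`` stands in for the C-level ``Py_buffer`` check.

```python
def interpolate_s(value):
    """Approximate the %s lookup order: buffer protocol first, then __bytes__."""
    try:
        # memoryview() succeeds only for objects exporting the buffer protocol.
        view = memoryview(value)
    except TypeError:
        view = None
    if view is not None:
        with view:
            return view.tobytes()

    # Special methods are looked up on the type, not on the instance.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        raise TypeError('%r has no __bytes__ method' % (value,))
    return to_bytes(value)


class Packet:
    """Hypothetical user type that opts in through __bytes__."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b'<' + self.payload + b'>'


print(interpolate_s(b'abc'))             # buffer protocol
print(interpolate_s(bytearray(b'xyz')))  # buffer protocol
print(interpolate_s(Packet(b'hi')))      # __bytes__ method
```

Passing ``3.14`` or ``'hello world!'`` to this sketch raises ``TypeError``, because neither ``float`` nor ``str`` exports a buffer or defines ``__bytes__``.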
.. note::

   Because the ``str`` type does not have a ``__bytes__`` method, attempts to
   directly use ``'a string'`` as a bytes interpolation value will raise an
   exception.  ``str`` values must first be encoded or otherwise transformed
   into a ``bytes`` sequence::

      'a string'.encode('latin-1')
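For instance, the caller chooses the encoding explicitly before interpolating. This is a minimal sketch, assuming an interpreter that implements this proposal (Python 3.5 or later); the name and both encodings are illustrative choices.

```python
# Illustrative sketch: assumes an interpreter implementing this proposal (3.5+).

name = 'Zo\u00eb'  # a str containing one non-ASCII character

# The caller decides how text becomes bytes; nothing is encoded implicitly.
latin1_greeting = b'Hello, %s!' % name.encode('latin-1')
utf8_greeting = b'Hello, %s!' % name.encode('utf-8')

print(latin1_greeting)
print(utf8_greeting)
```

The two results differ (``b'Hello, Zo\xeb!'`` versus ``b'Hello, Zo\xc3\xab!'``), which is why the choice of encoding is left to the caller.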
Numeric Format Codes
--------------------
To properly handle ``int`` and ``float`` subclasses, ``int()``,
``operator.index()``, and ``float()`` will be called, respectively, on the
objects intended for (``d``, ``i``, ``u``), (``b``, ``o``, ``x``, ``X``), and
(``e``, ``E``, ``f``, ``F``, ``g``, ``G``).
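The effect of these conversions on subclasses can be sketched as follows, assuming an interpreter that implements this proposal (Python 3.5 or later). ``Port`` and ``Celsius`` are hypothetical subclasses whose custom ``__str__`` methods should play no part in numeric formatting.

```python
# Illustrative sketch: assumes an interpreter implementing this proposal (3.5+).

class Port(int):
    """Hypothetical int subclass with a decorated string form."""

    def __str__(self):
        return 'Port(%d)' % int(self)

    __repr__ = __str__


class Celsius(float):
    """Hypothetical float subclass with a decorated string form."""

    def __str__(self):
        return '%.1f degrees' % float(self)


as_decimal = b'%d' % Port(8080)     # converted with int()
as_hex = b'%x' % Port(255)          # converted with index()
as_fixed = b'%.2f' % Celsius(21.5)  # converted with float()
truncated = b'%d' % 3.9             # int() truncates toward zero

# index() refuses floats, so the hexadecimal code rejects them.
try:
    b'%x' % 3.9
    hex_accepts_float = True
except TypeError:
    hex_accepts_float = False

print(as_decimal, as_hex, as_fixed, truncated, hex_accepts_float)
```

The numeric value is formatted in every case, so ``as_decimal`` is ``b'8080'`` rather than anything derived from ``Port.__str__``.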
Unsupported codes
-----------------
``%r`` (which calls ``__repr__``) and ``%a`` (which calls ``ascii()`` on
``__repr__``) are not supported.
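Code that wants a repr-like rendering can build it explicitly and interpolate the result with ``%s``. The following is one possible workaround, not something this PEP prescribes; the variable names are illustrative, and it assumes an interpreter that implements this proposal (Python 3.5 or later).

```python
# Illustrative sketch: assumes an interpreter implementing this proposal (3.5+).

value = 'caf\u00e9'

# ascii() returns a str containing only ASCII characters, so encoding it
# as ASCII cannot fail.
escaped = ascii(value).encode('ascii')

line = b'value=%s' % escaped
print(line)
```

Here ``line`` ends up holding the quoted, backslash-escaped form of the string.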
Proposed variations
===================
It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded.
It has been suggested to use ``%b`` for bytes instead of ``%s``.

- Rejected as ``%b`` does not exist in Python 2.x %-interpolation, which is
  why we are using ``%s``.
It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%s``.

- Rejected as this would lead to intermittent failures.  Better to have the
  operation always fail so the trouble-spot can be correctly fixed.
It has been proposed to have ``%s`` return the ascii-encoded repr when the
value is a ``str`` (``b'%s' % 'abc'  -->  b"'abc'"``).

- Rejected as this would lead to hard-to-debug failures far from the problem
  site.  Better to have the operation always fail so the trouble-spot can be
  easily fixed.
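The two rejected ``str``-handling variations above can be simulated to show why they were turned down. Both helper functions are hypothetical stand-ins for behaviour that was never adopted, and the sample values are illustrative.

```python
def hypothetical_auto_encode(value):
    """Rejected variation: implicitly ASCII-encode str arguments."""
    return value.encode('ascii', 'strict')


def hypothetical_repr_encode(value):
    """Rejected variation: interpolate the ASCII-encoded repr of a str."""
    return repr(value).encode('ascii', 'backslashreplace')


# Implicit encoding works for some inputs...
ok = b'user=%s' % hypothetical_auto_encode('alice')

# ...and fails for others, so the failure depends on the data.
try:
    b'user=%s' % hypothetical_auto_encode('Zo\u00eb')
    failed = False
except UnicodeEncodeError:
    failed = True

# The repr variation never fails, but silently embeds quote characters.
quoted = b'user=%s' % hypothetical_repr_encode('alice')

print(ok, failed, quoted)
```

The first variation fails only when non-ASCII data happens to arrive. The second produces ``b"user='alice'"``, which is well-formed but wrong, and the mistake surfaces only where the bytes are later consumed.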
Originally this PEP also proposed adding ``format``-style formatting, but it
was decided that ``format`` and its related machinery were all strictly text
(aka ``str``) based, so that part was dropped.
Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be visited again later if real-world use shows deficiencies with this solution.
Footnotes
=========
.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] neither string.Template, format, nor str.format are under consideration.
.. [3] %c is not an exception as neither of its possible arguments is Unicode.
.. [4] http://docs.python.org/3/c-api/buffer.html
.. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: