Incorporate comments from last round of emails (in late January)
parent 33340f76c2
commit 3ff3f95f32
114
pep-0461.txt
@@ -1,5 +1,5 @@
 PEP: 461
-Title: Adding % formatting to bytes
+Title: Adding % formatting to bytes and bytearray
 Version: $Revision$
 Last-Modified: $Date$
 Author: Ethan Furman <ethan@stoneleaf.us>
@@ -8,7 +8,7 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2014-01-13
 Python-Version: 3.5
-Post-History: 2014-01-14, 2014-01-15, 2014-01-17
+Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
 Resolution:


@@ -16,18 +16,40 @@ Abstract
 ========

 This PEP proposes adding % formatting operations similar to Python 2's ``str``
-type to ``bytes`` [1]_ [2]_.
+type to ``bytes`` and ``bytearray`` [1]_ [2]_.


+Rationale
+=========
+
+While interpolation is usually thought of as a string operation, there are
+cases where interpolation on ``bytes`` or ``bytearrays`` make sense, and the
+work needed to make up for this missing functionality detracts from the overall
+readability of the code.
+
+
+Motivation
+==========
+
+With Python 3 and the split between ``str`` and ``bytes``, one small but
+important area of programming became slightly more difficult, and much more
+painful -- wire format protocols [3]_.
+
+This area of programming is characterized by a mixture of binary data and
+ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a
+restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
+writing new wire format code, and in porting Python 2 wire format code.
+
+
 Overriding Principles
 =====================

-In order to avoid the problems of auto-conversion and Unicode exceptions that
-could plague Py2 code, all object checking will be done by duck-typing, not by
-values contained in a Unicode representation [3]_.
+In order to avoid the problems of auto-conversion and Unicode exceptions
+that could plague Python 2 code, :class:`str` objects will not be supported as
+interpolation values [4]_ [5]_.


-Proposed semantics for bytes formatting
+Proposed semantics for ``bytes`` and ``bytearray`` formatting
 =======================================

 %-interpolation
@@ -59,15 +81,15 @@ Example:
 >>> b'%c' % b'a'
 b'a'

-``%s`` is restricted in what it will accept:
+``%s`` is restricted in what it will accept::

-- input type supports ``Py_buffer`` [4]_?
-use it to collect the necessary bytes
+- input type supports ``Py_buffer`` [6]_?
+use it to collect the necessary bytes

-- input type is something else?
-use its ``__bytes__`` method [5]_ ; if there isn't one, raise a ``TypeError``
+- input type is something else?
+use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``

-Examples::
+Examples:

 >>> b'%s' % b'abc'
 b'abc'
@@ -75,7 +97,7 @@ Examples::
 >>> b'%s' % 3.14
 Traceback (most recent call last):
 ...
-TypeError: 3.14 has no __bytes__ method
+TypeError: 3.14 has no __bytes__ method, use a numeric code instead

 >>> b'%s' % 'hello world!'
 Traceback (most recent call last):
@@ -83,28 +105,24 @@ Examples::
 TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?
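
The ``%s`` rules in the examples above can be tried out with a short sketch. It assumes an interpreter that implements this proposal (Python 3.5 or later); the ``Packet`` class and the ``try_interpolate`` helper are hypothetical names used only for illustration, and the exception messages an implementation produces may differ from the ones shown in the PEP text.

```python
import array


class Packet:
    """Hypothetical type that opts in to bytes interpolation via __bytes__."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b'<' + self.payload + b'>'


# Objects that expose the buffer protocol contribute their raw bytes.
from_bytes = b'%s' % b'abc'
from_view = b'%s' % memoryview(b'abc')
from_array = b'%s' % array.array('B', [65, 66, 67])

# Other objects are accepted only if they define __bytes__.
from_packet = b'%s' % Packet(b'xyz')


def try_interpolate(value):
    """Return the formatted bytes, or None if %s rejects the value."""
    try:
        return b'%s' % value
    except TypeError:
        return None


# Neither numbers nor str are accepted by %s.
rejected_float = try_interpolate(3.14)
rejected_str = try_interpolate('hello world!')
```

The helper returns ``None`` rather than inspecting the message, since only the ``TypeError`` itself is relied on here.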

 .. note::

 Because the ``str`` type does not have a ``__bytes__`` method, attempts to
 directly use ``'a string'`` as a bytes interpolation value will raise an
-exception. To use ``'string'`` values, they must be encoded or otherwise
-transformed into a ``bytes`` sequence::
+exception. To use strings they must be encoded or otherwise transformed
+into a ``bytes`` sequence::

 'a string'.encode('latin-1')
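
As a small illustration of the note above, the sketch below encodes a ``str`` explicitly before interpolating it into a wire-format line. It assumes an interpreter that implements this proposal (Python 3.5 or later); the header name, the body, and the choice of ASCII are made-up example values, not part of the PEP.

```python
header_name = 'Content-Length'   # text held as str (example value)
body = b'hello'                  # payload already held as bytes

# The str is encoded by the caller; ASCII is an assumption that suits
# this particular header name.
line = b'%s: %d\r\n' % (header_name.encode('ascii'), len(body))

# The same pattern with latin-1, the codec used in the note above.
accented = b'%s' % 'caf\xe9'.encode('latin-1')
```

Choosing the codec at the call site keeps the encoding decision visible where the data enters the byte stream.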


-Numeric Format Codes
---------------------
-
-To properly handle ``int`` and ``float`` subclasses, ``int()``, ``index()``,
-and ``float()`` will be called on the objects intended for (``d``, ``i``,
-``u``), (``b``, ``o``, ``x``, ``X``), and (``e``, ``E``, ``f``, ``F``, ``g``,
-``G``).
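
A minimal sketch of numeric codes applied to ``bytes``, assuming an interpreter that implements this proposal (Python 3.5 or later). It only shows that the results are ASCII-encoded digits; it does not demonstrate which conversion function is called internally, and ``Port`` is a hypothetical ``int`` subclass used for illustration.

```python
class Port(int):
    """Hypothetical int subclass, used only for illustration."""


decimal = b'%d' % Port(8080)     # int subclass formatted as decimal digits
hexadecimal = b'%x' % 255        # lowercase hexadecimal
padded = b'%05d' % 42            # zero-padded to a width of five
fixed = b'%.2f' % 3.14159        # fixed-point with two decimals
scientific = b'%e' % 1500.0      # exponent notation
```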
+``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``.
+This is intended as a debugging aid, rather than something that should be used
+in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn``
+representation.
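
The effect of ``%a`` can be sketched as follows, assuming an interpreter that implements this proposal (Python 3.5 or later). The sample values are arbitrary; each result is compared against the built-in ``ascii()`` rather than against hand-written literals.

```python
samples = ['abc', 'caf\xe9', '\u20ac', 3.14]

# %a inserts the ASCII-only repr of each value, so non-ASCII characters
# appear as backslash escapes instead of raw bytes.
formatted = [b'%a' % value for value in samples]

# The same text produced directly with the built-in ascii().
expected = [ascii(value).encode('ascii') for value in samples]
```

Because the output is a repr, a ``str`` value keeps its surrounding quotes, which is one reason the result suits debugging output better than wire data.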


 Unsupported codes
 -----------------

-``%r`` (which calls ``__repr__``), and ``%a`` (which calls ``ascii()`` on
-``__repr__``) are not supported.
+``%r`` (which calls ``__repr__`` and returns a :class:`str`) is not supported.


 Proposed variations
@@ -113,41 +131,51 @@ Proposed variations
 It was suggested to let ``%s`` accept numbers, but since numbers have their own
 format codes this idea was discarded.

-It has been suggested to use ``%b`` for bytes instead of ``%s``.
-
-- Rejected as ``%b`` does not exist in Python 2.x %-interpolation, which is
-why we are using ``%s``.
-
 It has been proposed to automatically use ``.encode('ascii','strict')`` for
 ``str`` arguments to ``%s``.

-- Rejected as this would lead to intermittent failures. Better to have the
-operation always fail so the trouble-spot can be correctly fixed.
+- Rejected as this would lead to intermittent failures. Better to have the
+operation always fail so the trouble-spot can be correctly fixed.
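
The "intermittent failures" concern can be illustrated with a hypothetical stand-in for the rejected behaviour. ``implicit_encode`` is not part of the proposal; it merely imitates what automatic ``.encode('ascii', 'strict')`` would do, and the sample names are made up.

```python
def implicit_encode(value):
    """Hypothetical stand-in: silently ASCII-encode a str before formatting."""
    return b'%s' % value.encode('ascii', 'strict')


# Works as long as the data happens to be pure ASCII ...
ok = implicit_encode('Smith')

# ... and fails only when non-ASCII data shows up, possibly much later.
try:
    implicit_encode('M\xfcller')
    failed = False
except UnicodeEncodeError:
    failed = True
```

With the behaviour the PEP chose instead, both calls would fail immediately with a ``TypeError``, so the missing explicit encode is noticed on the first run.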

 It has been proposed to have ``%s`` return the ascii-encoded repr when the
 value is a ``str`` (b'%s' % 'abc' --> b"'abc'").

-- Rejected as this would lead to hard to debug failures far from the problem
-site. Better to have the operation always fail so the trouble-spot can be
-easily fixed.
+- Rejected as this would lead to hard to debug failures far from the problem
+site. Better to have the operation always fail so the trouble-spot can be
+easily fixed.

-Originally this PEP also proposed adding format style formatting, but it
-was decided that format and its related machinery were all strictly text
-(aka ``str``) based, and it was dropped.
+Originally this PEP also proposed adding format-style formatting, but it was
+decided that format and its related machinery were all strictly text (aka
+``str``) based, and it was dropped.

+Various new special methods were proposed, such as ``__ascii__``,
+``__format_bytes__``, etc.; such methods are not needed at this time, but can
+be visited again later if real-world use shows deficiencies with this solution.
+
+
+Open Questions
+==============
+
+It has been suggested to use ``%b`` for bytes as well as ``%s``.
+
+- Pro: clearly says 'this is bytes'; should be used for new code.
+
+- Con: does not exist in Python 2.x, so we would have two ways of doing the
+same thing, ``%s`` and ``%b``, with no difference between them.
+


 Footnotes
 =========

 .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
-.. [2] neither string.Template, format, nor str.format are under consideration.
-.. [3] %c is not an exception as neither of its possible arguments are unicode.
-.. [4] http://docs.python.org/3/c-api/buffer.html
-.. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
+.. [2] neither string.Template, format, nor str.format are under consideration
+.. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
+.. [4] to use a str object in a bytes interpolation, encode it first
+.. [5] %c is not an exception as neither of its possible arguments are str
+.. [6] http://docs.python.org/3/c-api/buffer.html
+examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
+.. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__


 Copyright