incorporated %a comments; general clean-up

Ethan Furman 2014-03-25 15:33:49 -07:00
parent 8978bad5e8
commit 68ce00d52e
1 changed files with 48 additions and 43 deletions


@ -8,8 +8,8 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
Resolution:
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25
Resolution:
Abstract
@ -40,13 +40,8 @@ ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a
restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
writing new wire format code, and in porting Python 2 wire format code.
Overriding Principles
=====================
In order to avoid the problems of auto-conversion and Unicode exceptions
that could plague Python 2 code, ``str`` objects will not be supported as
interpolation values [4]_ [5]_.
Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
formats, and ``FTP`` and ``HTTP`` communications, among many others.
Proposed semantics for ``bytes`` and ``bytearray`` formatting
@ -57,23 +52,31 @@ Proposed semantics for ``bytes`` and ``bytearray`` formatting
All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers.
the padding, justification and other related modifiers. The only difference
will be that the results from these codes will be ASCII-encoded text, not
Unicode. In other words, for any numeric formatting code ``%x``::
Example::
b"%x" % val
is equivalent to
("%x" % val).encode("ascii")
Examples::
>>> b'%4x' % 10
b'   a'
>>> '%#4x' % 10
>>> b'%#4x' % 10
' 0xa'
>>> '%04X' % 10
>>> b'%04X' % 10
'000A'
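The equivalence stated above can be checked directly on an interpreter that implements this proposal (Python 3.5 or later). The following is a minimal sketch rather than part of the PEP; the helper name ``same_as_str`` and the sample values are illustrative assumptions.

```python
def same_as_str(fmt, value):
    """Return True if bytes formatting matches ASCII-encoded str formatting."""
    return fmt.encode("ascii") % value == (fmt % value).encode("ascii")


# A handful of numeric codes with padding, justification and sign modifiers.
checks = [
    ("%4x", 10),
    ("%#4x", 10),
    ("%04X", 10),
    ("%o", 8),
    ("%e", 1.5),
    ("%+.2f", 3.14159),
    ("%g", 0.0001),
]

results = {fmt: same_as_str(fmt, value) for fmt, value in checks}
print(results)
```

Every entry in ``results`` should be ``True``; note that the bytes results carry the ``b`` prefix, for example ``b'%04X' % 10`` gives ``b'000A'``.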
``%c`` will insert a single byte, either from an ``int`` in range(256), or from
a ``bytes`` argument of length 1, not from a ``str``.
Example::
Examples::
>>> b'%c' % 48
b'0'
@ -81,7 +84,9 @@ Example::
>>> b'%c' % b'a'
b'a'
``%s`` is restricted in what it will accept::
``%s`` is included for two reasons: 1) `b` is already a format code for
``format`` numerics (binary), and 2) it will make 2/3 code easier as Python 2.x
code uses ``%s``; however, it is restricted in what it will accept::
- input type supports ``Py_buffer`` [6]_?
use it to collect the necessary bytes
@ -89,40 +94,46 @@ Example::
- input type is something else?
use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``
In particular, ``%s`` will not accept numbers (use a numeric format code for
that), nor ``str`` (encode it to ``bytes``).
Examples::
>>> b'%s' % b'abc'
b'abc'
>>> b'%s' % 'some string'.encode('utf8')
b'some string'
>>> b'%s' % 3.14
Traceback (most recent call last):
...
TypeError: 3.14 has no __bytes__ method, use a numeric code instead
TypeError: b'%s' does not accept numbers, use a numeric code instead
>>> b'%s' % 'hello world!'
Traceback (most recent call last):
...
TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?
TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`
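The acceptance rules for ``%s`` can be exercised as follows on Python 3.5 or later. This is a sketch under stated assumptions: ``Packet`` and ``try_format`` are hypothetical names introduced only for illustration, and the exception message text in a released interpreter may differ from the wording shown in this draft, so only the exception type is inspected.

```python
import array


class Packet:
    """Hypothetical type that opts in to %s by defining __bytes__."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b"<" + self.payload + b">"


def try_format(value):
    """Return the formatted bytes, or the exception class name on failure."""
    try:
        return b"%s" % value
    except TypeError as exc:
        return type(exc).__name__


# Objects supporting the buffer protocol are accepted.
accepted_buffer = try_format(memoryview(b"abc"))
accepted_array = try_format(array.array("B", [104, 105]))
# Objects with a __bytes__ method are accepted.
accepted_dunder = try_format(Packet(b"data"))
# Numbers and str are rejected.
rejected_float = try_format(3.14)
rejected_str = try_format("hello world!")
```

A ``str`` must be encoded explicitly first, for example ``b'%s' % 'hello world!'.encode('ascii')``.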
``%a`` will call ``ascii()`` on the interpolated value. This is intended
as a debugging aid, rather than something that should be used in production.
Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
representation. Use cases include developing a new protocol and writing
landmarks into the stream; debugging data going into an existing protocol
to see if the problem is the protocol itself or bad data; a fall-back for a
serialization format; or even a rudimentary serialization format when
defining ``__bytes__`` would not be appropriate [8]_.
.. note::
Because the ``str`` type does not have a ``__bytes__`` method, attempts to
directly use ``'a string'`` as a bytes interpolation value will raise an
exception. To use strings they must be encoded or otherwise transformed
into a ``bytes`` sequence::
'a string'.encode('latin-1')
``%a`` will call ``ascii()`` on the interpolated value's ``repr()``.
This is intended as a debugging aid, rather than something that should be used
in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn``
representation.
If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
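As an illustration of ``%a``, the sketch below compares the interpolated result with calling ``ascii()`` by hand and encoding the result. It assumes an interpreter that implements this proposal (Python 3.5 or later); the sample values are arbitrary.

```python
# A str containing a non-ASCII character in the Latin-1 range.
value = "caf\u00e9"
formatted = b"%a" % value
by_hand = ascii(value).encode("ascii")

# A character outside Latin-1 is escaped in the \unnnn form.
euro = b"%a" % "\u20ac"

# Non-str objects are rendered without surrounding quotes.
number = b"%a" % 3.14

print(formatted, euro, number)
```

The ``str`` results keep their surrounding quotes, which is the behaviour the note above describes.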
Unsupported codes
-----------------
``%r`` (which calls ``__repr__`` and returns a '`str`') is not supported.
``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.
Proposed variations
@ -131,6 +142,9 @@ Proposed variations
It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded.
It has been suggested to use ``%b`` for bytes as well as ``%s``. This was
rejected as not adding any value either in clarity or simplicity.
It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%s``.
@ -171,19 +185,11 @@ for mixed binary data and ASCII-compatible segments: file formats such as
``bytes`` and ``bytearray`` already have several methods which assume an ASCII
compatible encoding. ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
just a few. %-interpolation, with its very restricted mini-language, will not
be any more of a nuisance than the already existing methdods.
Open Questions
==============
It has been suggested to use ``%b`` for bytes as well as ``%s``.
- Pro: clearly says 'this is bytes'; should be used for new code.
- Con: does not exist in Python 2.x, so we would have two ways of doing the
same thing, ``%s`` and ``%b``, with no difference between them.
be any more of a nuisance than the already existing methods.
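For reference, the existing ASCII-assuming methods mentioned above behave as in this short sketch; the sample data is made up for illustration.

```python
# upper() only affects bytes in the ASCII letter range.
header = b"content-type:\ttext/plain"
upper = header.upper()

# isalpha() tests for ASCII letters.
alpha = b"HTTP".isalpha()

# expandtabs() treats byte 0x09 as an ASCII tab.
expanded = b"a\tb".expandtabs(4)

# Bytes outside the ASCII range pass through unchanged.
non_ascii = b"\xe9".upper()

print(upper, alpha, expanded, non_ascii)
```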
Some have objected to allowing the full range of numeric formatting codes with
the claim that decimal alone would be sufficient. However, at least two
formats (dbf and pdf) make use of non-decimal numbers.
Footnotes
@ -197,8 +203,7 @@ Footnotes
.. [6] http://docs.python.org/3/c-api/buffer.html
examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
.. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
.. [8] mainly implicit encode/decode, with intermittent errors when the data
was not ASCII compatible
.. [8] https://mail.python.org/pipermail/python-dev/2014-February/132750.html
Copyright