incorporated %a comments; general clean-up
parent
8978bad5e8
commit
68ce00d52e
91
pep-0461.txt
|
@ -8,8 +8,8 @@ Type: Standards Track
|
|||
Content-Type: text/x-rst
|
||||
Created: 2014-01-13
|
||||
Python-Version: 3.5
|
||||
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
|
||||
Resolution:
|
||||
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25
|
||||
Resolution:
|
||||
|
||||
|
||||
Abstract
|
||||
|
@ -40,13 +40,8 @@ ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a
|
|||
restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
|
||||
writing new wire format code, and in porting Python 2 wire format code.
|
||||
|
||||
|
||||
Overriding Principles
|
||||
=====================
|
||||
|
||||
In order to avoid the problems of auto-conversion and Unicode exceptions
|
||||
that could plague Python 2 code, ``str`` objects will not be supported as
|
||||
interpolation values [4]_ [5]_.
|
||||
Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
|
||||
formats, and ``FTP`` and ``HTTP`` communications, among many others.
|
||||
|
||||
|
||||
Proposed semantics for ``bytes`` and ``bytearray`` formatting
|
||||
|
@ -57,23 +52,31 @@ Proposed semantics for ``bytes`` and ``bytearray`` formatting
|
|||
|
||||
All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
|
||||
``%g``, etc.) will be supported, and will work as they do for str, including
|
||||
the padding, justification and other related modifiers.
|
||||
the padding, justification and other related modifiers. The only difference
|
||||
will be that the results from these codes will be ASCII-encoded text, not
|
||||
unicode. In other words, for any numeric formatting code `%x`::
|
||||
|
||||
Example::
|
||||
b"%x" % val
|
||||
|
||||
is equivalent to
|
||||
|
||||
("%x" % val).encode("ascii")
|
||||
|
||||
Examples::
|
||||
|
||||
>>> b'%4x' % 10
|
||||
b'   a'
|
||||
|
||||
>>> '%#4x' % 10
|
||||
>>> b'%#4x' % 10
|
||||
' 0xa'
|
||||
|
||||
>>> '%04X' % 10
|
||||
>>> b'%04X' % 10
|
||||
'000A'
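The equivalence stated above can be spot-checked on an interpreter that implements this PEP (assumed here: Python 3.5 or later). This is only a sketch; the helper name ``ascii_format`` and the chosen test values are illustrative and not part of the PEP.

```python
def ascii_format(spec, value):
    """Reference result: format as str, then encode the text to ASCII."""
    return (spec % value).encode("ascii")


# (format spec written as str, value to interpolate)
cases = [
    ("%x", 255),
    ("%4x", 10),
    ("%#4x", 10),
    ("%04X", 10),
    ("%o", 8),
    ("%e", 1.5),
    ("%.2f", 3.14159),
    ("%g", 0.0001),
]

# Pair the bytes-formatting result with the str-then-encode result.
results = [
    (spec.encode("ascii") % value, ascii_format(spec, value))
    for spec, value in cases
]

for got, expected in results:
    print(got, got == expected)
```

Each pair should compare equal, including the padding and justification modifiers.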
|
||||
|
||||
``%c`` will insert a single byte, either from an ``int`` in range(256), or from
|
||||
a ``bytes`` argument of length 1, not from a ``str``.
|
||||
|
||||
Example::
|
||||
Examples::
|
||||
|
||||
>>> b'%c' % 48
|
||||
b'0'
|
||||
|
@ -81,7 +84,9 @@ Example::
|
|||
>>> b'%c' % b'a'
|
||||
b'a'
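A short sketch of the ``%c`` rules described above, again assuming an interpreter that implements this PEP (Python 3.5 or later). The variable names are illustrative only.

```python
# An int in range(256) is inserted as the single byte with that value.
zero = b"%c" % 48

# A bytes argument of length 1 is inserted unchanged.
letter = b"%c" % b"a"

# A str is rejected, even if it is one character long.
try:
    b"%c" % "a"
except TypeError:
    str_rejected = True
else:
    str_rejected = False

print(zero, letter, str_rejected)
```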
|
||||
|
||||
``%s`` is restricted in what it will accept::
|
||||
``%s`` is included for two reasons: 1) `b` is already a format code for
|
||||
``format`` numerics (binary), and 2) it will make 2/3 code easier as Python 2.x
|
||||
code uses ``%s``; however, it is restricted in what it will accept::
|
||||
|
||||
- input type supports ``Py_buffer`` [6]_?
|
||||
use it to collect the necessary bytes
|
||||
|
@ -89,40 +94,46 @@ Example::
|
|||
- input type is something else?
|
||||
use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``
|
||||
|
||||
In particular, ``%s`` will not accept numbers (use a numeric format code for
|
||||
that), nor ``str`` (encode it to ``bytes``).
|
||||
|
||||
Examples::
|
||||
|
||||
>>> b'%s' % b'abc'
|
||||
b'abc'
|
||||
|
||||
>>> b'%s' % 'some string'.encode('utf8')
|
||||
b'some string'
|
||||
|
||||
>>> b'%s' % 3.14
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
TypeError: 3.14 has no __bytes__ method, use a numeric code instead
|
||||
TypeError: b'%s' does not accept numbers, use a numeric code instead
|
||||
|
||||
>>> b'%s' % 'hello world!'
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?
|
||||
TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`
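The ``%s`` lookup order can be illustrated as follows. This is a sketch under the assumption of an interpreter that implements this PEP (Python 3.5 or later); the ``Packet`` class and the ``try_format`` helper are hypothetical names invented for the example.

```python
class Packet:
    """Hypothetical type that serialises itself through __bytes__."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b"<" + self.payload + b">"


def try_format(value):
    """Return b'%s' % value, or None if the value is rejected."""
    try:
        return b"%s" % (value,)
    except TypeError:
        return None


from_buffer = try_format(memoryview(b"abc"))    # buffer protocol
from_dunder = try_format(Packet(b"abc"))        # __bytes__ method
from_encoded = try_format("abc".encode("ascii"))  # str encoded first
from_number = try_format(3.14)                  # rejected: use %f, %g, ...
from_str = try_format("abc")                    # rejected: encode it first

print(from_buffer, from_dunder, from_encoded, from_number, from_str)
```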
|
||||
|
||||
|
||||
``%a`` will call ``ascii()`` on the interpolated value. This is intended
|
||||
as a debugging aid, rather than something that should be used in production.
|
||||
Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
|
||||
representation. Use cases include developing a new protocol and writing
|
||||
landmarks into the stream; debugging data going into an existing protocol
|
||||
to see if the problem is the protocol itself or bad data; a fall-back for a
|
||||
serialization format; or even a rudimentary serialization format when
|
||||
defining ``__bytes__`` would not be appropriate [8].
|
||||
|
||||
.. note::
|
||||
|
||||
Because the ``str`` type does not have a ``__bytes__`` method, attempts to
|
||||
directly use ``'a string'`` as a bytes interpolation value will raise an
|
||||
exception. To use strings they must be encoded or otherwise transformed
|
||||
into a ``bytes`` sequence::
|
||||
|
||||
'a string'.encode('latin-1')
|
||||
|
||||
``%a`` will call ``ascii()`` on the interpolated value's ``repr()``.
|
||||
This is intended as a debugging aid, rather than something that should be used
|
||||
in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn``
|
||||
representation.
|
||||
If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
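A brief sketch of ``%a``, assuming an interpreter that implements this PEP (Python 3.5 or later). It compares the result with ``ascii()`` applied to the same value and encoded to ASCII; the variable names are illustrative.

```python
# A str is surrounded by quotes, as ascii() would render it.
quoted = b"%a" % "abc"

# Non-ASCII characters are escaped as \xnn or \unnnn.
latin = b"%a" % "caf\u00e9"
euro = b"%a" % "\u20ac"

# Non-str values go through the same ascii() rendering.
number = b"%a" % 3.14

for value, got in [("abc", quoted), ("caf\u00e9", latin),
                   ("\u20ac", euro), (3.14, number)]:
    print(got, got == ascii(value).encode("ascii"))
```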
|
||||
|
||||
|
||||
Unsupported codes
|
||||
-----------------
|
||||
|
||||
``%r`` (which calls ``__repr__`` and returns a '`str`') is not supported.
|
||||
``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.
|
||||
|
||||
|
||||
Proposed variations
|
||||
|
@ -131,6 +142,9 @@ Proposed variations
|
|||
It was suggested to let ``%s`` accept numbers, but since numbers have their own
|
||||
format codes this idea was discarded.
|
||||
|
||||
It has been suggested to use ``%b`` for bytes as well as ``%s``. This was
|
||||
rejected as not adding any value either in clarity or simplicity.
|
||||
|
||||
It has been proposed to automatically use ``.encode('ascii','strict')`` for
|
||||
``str`` arguments to ``%s``.
|
||||
|
||||
|
@ -171,19 +185,11 @@ for mixed binary data and ASCII-compatible segments: file formats such as
|
|||
``bytes`` and ``bytearray`` already have several methods which assume an ASCII
|
||||
compatible encoding. ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
|
||||
just a few. %-interpolation, with its very restricted mini-language, will not
|
||||
be any more of a nuisance than the already existing methdods.
|
||||
|
||||
|
||||
Open Questions
|
||||
==============
|
||||
|
||||
It has been suggested to use ``%b`` for bytes as well as ``%s``.
|
||||
|
||||
- Pro: clearly says 'this is bytes'; should be used for new code.
|
||||
|
||||
- Con: does not exist in Python 2.x, so we would have two ways of doing the
|
||||
same thing, ``%s`` and ``%b``, with no difference between them.
|
||||
be any more of a nuisance than the already existing methods.
|
||||
|
||||
Some have objected to allowing the full range of numeric formatting codes with
|
||||
the claim that decimal alone would be sufficient. However, at least two
|
||||
formats (dbf and pdf) make use of non-decimal numbers.
|
||||
|
||||
|
||||
Footnotes
|
||||
|
@ -197,8 +203,7 @@ Footnotes
|
|||
.. [6] http://docs.python.org/3/c-api/buffer.html
|
||||
examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
|
||||
.. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
|
||||
.. [8] mainly implicit encode/decode, with intermittent errors when the data
|
||||
was not ASCII compatible
|
||||
.. [8] https://mail.python.org/pipermail/python-dev/2014-February/132750.html
|
||||
|
||||
|
||||
Copyright
|
||||
|
|