From 68ce00d52ef602d93b424eba43d02878050e0c3b Mon Sep 17 00:00:00 2001
From: Ethan Furman
Date: Tue, 25 Mar 2014 15:33:49 -0700
Subject: [PATCH] incorporated %a comments; general clean-up

---
 pep-0461.txt | 91 +++++++++++++++++++++++++++-------------------------
 1 file changed, 48 insertions(+), 43 deletions(-)

diff --git a/pep-0461.txt b/pep-0461.txt
index 35938fd5a..1f55e6d77 100644
--- a/pep-0461.txt
+++ b/pep-0461.txt
@@ -8,8 +8,8 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2014-01-13
 Python-Version: 3.5
-Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
-Resolution:
+Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25
+Resolution:


 Abstract
@@ -40,13 +40,8 @@ ASCII compatible segments of text (aka ASCII-encoded text).
 Bringing back a restricted %-interpolation for ``bytes`` and ``bytearray`` will
 aid both in writing new wire format code, and in porting Python 2 wire format
 code.
-
-Overriding Principles
-=====================
-
-In order to avoid the problems of auto-conversion and Unicode exceptions
-that could plague Python 2 code, ``str`` objects will not be supported as
-interpolation values [4]_ [5]_.
+Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
+formats, and ``FTP`` and ``HTTP`` communications, among many others.


 Proposed semantics for ``bytes`` and ``bytearray`` formatting
@@ -57,23 +52,31 @@ Proposed semantics for ``bytes`` and ``bytearray`` formatting

 All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
 ``%g``, etc.) will be supported, and will work as they do for str, including
-the padding, justification and other related modifiers.
+the padding, justification and other related modifiers. The only difference
+will be that the results from these codes will be ASCII-encoded text, not
+unicode. In other words, for any numeric formatting code `%x`::

-Example::
+   b"%x" % val
+
+is equivalent to
+
+   ("%x" % val).encode("ascii")
+
+Examples::

    >>> b'%4x' % 10
    b'   a'

-   >>> '%#4x' % 10
+   >>> b'%#4x' % 10
    ' 0xa'

-   >>> '%04X' % 10
+   >>> b'%04X' % 10
    '000A'

 ``%c`` will insert a single byte, either from an ``int`` in range(256), or from
 a ``bytes`` argument of length 1, not from a ``str``.

-Example::
+Examples::

    >>> b'%c' % 48
    b'0'
@@ -81,7 +84,9 @@ Example::
    >>> b'%c' % b'a'
    b'a'

-``%s`` is restricted in what it will accept::
+``%s`` is included for two reasons: 1) `b` is already a format code for
+``format`` numerics (binary), and 2) it will make 2/3 code easier as Python 2.x
+code uses ``%s``; however, it is restricted in what it will accept::

   - input type supports ``Py_buffer`` [6]_?
     use it to collect the necessary bytes
@@ -89,40 +94,46 @@ Example::
   - input type is something else?
     use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``

+In particular, ``%s`` will not accept numbers (use a numeric format code for
+that), nor ``str`` (encode it to ``bytes``).
+
 Examples::

    >>> b'%s' % b'abc'
    b'abc'

+   >>> b'%s' % 'some string'.encode('utf8')
+   b'some string'
+
    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
-   TypeError: 3.14 has no __bytes__ method, use a numeric code instead
+   TypeError: b'%s' does not accept numbers, use a numeric code instead

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
-   TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?
+   TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`
+
+
+``%a`` will call ``ascii()`` on the interpolated value. This is intended
+as a debugging aid, rather than something that should be used in production.
+Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
+representation. Use cases include developing a new protocol and writing
+landmarks into the stream; debugging data going into an existing protocol
+to see if the problem is the protocol itself or bad data; a fall-back for a
+serialization format; or even a rudimentary serialization format when
+defining ``__bytes__`` would not be appropriate [8].

 .. note::

-   Because the ``str`` type does not have a ``__bytes__`` method, attempts to
-   directly use ``'a string'`` as a bytes interpolation value will raise an
-   exception. To use strings they must be encoded or otherwise transformed
-   into a ``bytes`` sequence::
-
-      'a string'.encode('latin-1')
-
-``%a`` will call ``ascii()`` on the interpolated value's ``repr()``.
-This is intended as a debugging aid, rather than something that should be used
-in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn``
-representation.
+   If a ``str`` is passed into ``%a``, it will be surrounded by quotes.


 Unsupported codes
 -----------------

-``%r`` (which calls ``__repr__`` and returns a '`str`') is not supported.
+``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.


 Proposed variations
@@ -131,6 +142,9 @@ Proposed variations
 It was suggested to let ``%s`` accept numbers, but since numbers have their
 own format codes this idea was discarded.

+It has been suggested to use ``%b`` for bytes as well as ``%s``. This was
+rejected as not adding any value either in clarity or simplicity.
+
 It has been proposed to automatically use ``.encode('ascii','strict')`` for
 ``str`` arguments to ``%s``.

@@ -171,19 +185,11 @@ for mixed binary data and ASCII-compatible segments: file formats such as
 ``bytes`` and ``bytearray`` already have several methods which assume an ASCII
 compatible encoding. ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
 just a few. %-interpolation, with its very restricted mini-language, will not
-be any more of a nuisance than the already existing methdods.
-
-
-Open Questions
-==============
-
-It has been suggested to use ``%b`` for bytes as well as ``%s``.
-
-  - Pro: clearly says 'this is bytes'; should be used for new code.
-
-  - Con: does not exist in Python 2.x, so we would have two ways of doing the
-    same thing, ``%s`` and ``%b``, with no difference between them.
+be any more of a nuisance than the already existing methods.

+Some have objected to allowing the full range of numeric formatting codes with
+the claim that decimal alone would be sufficient. However, at least two
+formats (dbf and pdf) make use of non-decimal numbers.


 Footnotes
@@ -197,8 +203,7 @@ Footnotes
 .. [6] http://docs.python.org/3/c-api/buffer.html
        examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
 .. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
-.. [8] mainly implicit encode/decode, with intermittent errors when the data
-       was not ASCII compatible
+.. [8] https://mail.python.org/pipermail/python-dev/2014-February/132750.html


 Copyright