incorporated %a comments; general clean-up

2014-03-25 15:33:49 -07:00 · 2014-03-25 15:33:49 -07:00 · 68ce00d52e
parent 8978bad5e8
commit 68ce00d52e
1 changed file with 48 additions and 43 deletions
--- a/pep-0461.txt
+++ b/pep-0461.txt
@@ -8,8 +8,8 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2014-01-13
 Python-Version: 3.5
-Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
-Resolution: 
+Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25
+Resolution:


 Abstract
@@ -40,13 +40,8 @@ ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
 restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
 writing new wire format code, and in porting Python 2 wire format code.

-
-Overriding Principles
-=====================
-
-In order to avoid the problems of auto-conversion and Unicode exceptions
-that could plague Python 2 code, ``str`` objects will not be supported as
-interpolation values [4]_ [5]_.
+Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
+formats, and ``FTP`` and ``HTTP`` communications, among many others.


 Proposed semantics for ``bytes`` and ``bytearray`` formatting
@@ -57,23 +52,31 @@ Proposed semantics for ``bytes`` and ``bytearray`` formatting

 All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
 ``%g``, etc.) will be supported, and will work as they do for str, including
-the padding, justification and other related modifiers.
+the padding, justification and other related modifiers.  The only difference
+will be that the results from these codes will be ASCII-encoded text, not
+unicode.  In other words, for any numeric formatting code `%x`::

-Example::
+   b"%x" % val
+
+is equivalent to
+
+   ("%x" % val).encode("ascii")
+
+Examples::

   >>> b'%4x' % 10
   b'   a'

-   >>> '%#4x' % 10
+   >>> b'%#4x' % 10
   ' 0xa'

-   >>> '%04X' % 10
+   >>> b'%04X' % 10
   '000A'

 ``%c`` will insert a single byte, either from an ``int`` in range(256), or from
 a ``bytes`` argument of length 1, not from a ``str``.

-Example::
+Examples::

    >>> b'%c' % 48
    b'0'
@@ -81,7 +84,9 @@ Example::
    >>> b'%c' % b'a'
    b'a'

-``%s`` is restricted in what it will accept::
+``%s`` is included for two reasons:  1) `b` is already a format code for
+``format`` numerics (binary), and 2) it will make 2/3 code easier as Python 2.x
+code uses ``%s``; however, it is restricted in what it will accept::

  - input type supports ``Py_buffer`` [6]_?
    use it to collect the necessary bytes
@@ -89,40 +94,46 @@ Example::
  - input type is something else?
    use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``

+In particular, ``%s`` will not accept numbers (use a numeric format code for
+that), nor ``str`` (encode it to ``bytes``).
+
 Examples::

    >>> b'%s' % b'abc'
    b'abc'

+    >>> b'%s' % 'some string'.encode('utf8')
+    b'some string'
+
    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
-    TypeError: 3.14 has no __bytes__ method, use a numeric code instead
+    TypeError: b'%s' does not accept numbers, use a numeric code instead

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
-    TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?
+    TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`
+
+
+``%a`` will call ``ascii()`` on the interpolated value.  This is intended
+as a debugging aid, rather than something that should be used in production.
+Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
+representation.  Use cases include developing a new protocol and writing
+landmarks into the stream; debugging data going into an existing protocol
+to see if the problem is the protocol itself or bad data; a fall-back for a
+serialization format; or even a rudimentary serialization format when
+defining ``__bytes__`` would not be appropriate [8].

 .. note::

-   Because the ``str`` type does not have a ``__bytes__`` method, attempts to
-   directly use ``'a string'`` as a bytes interpolation value will raise an
-   exception.  To use strings they must be encoded or otherwise transformed
-   into a ``bytes`` sequence::
-
-      'a string'.encode('latin-1')
-
-``%a`` will call ``ascii()`` on the interpolated value's ``repr()``.
-This is intended as a debugging aid, rather than something that should be used
-in production.  Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn``
-representation.
+    If a ``str`` is passed into ``%a``, it will be surrounded by quotes.


 Unsupported codes
 -----------------

-``%r`` (which calls ``__repr__`` and returns a '`str`') is not supported.
+``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.


 Proposed variations
@@ -131,6 +142,9 @@ Proposed variations
 It was suggested to let ``%s`` accept numbers, but since numbers have their own
 format codes this idea was discarded.

+It has been suggested to use ``%b`` for bytes as well as ``%s``.  This was
+rejected as not adding any value either in clarity or simplicity.
+
 It has been proposed to automatically use ``.encode('ascii','strict')`` for
 ``str`` arguments to ``%s``.

@@ -171,19 +185,11 @@ for mixed binary data and ASCII-compatible segments: file formats such as
 ``bytes`` and ``bytearray`` already have several methods which assume an ASCII
 compatible encoding.  ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
 just a few.  %-interpolation, with its very restricted mini-language, will not
-be any more of a nuisance than the already existing methdods.
-
-
-Open Questions
-==============
-
-It has been suggested to use ``%b`` for bytes as well as ``%s``.
-
-  - Pro: clearly says 'this is bytes'; should be used for new code.
-
-  - Con: does not exist in Python 2.x, so we would have two ways of doing the
-    same thing, ``%s`` and ``%b``, with no difference between them.
+be any more of a nuisance than the already existing methods.

+Some have objected to allowing the full range of numeric formatting codes with
+the claim that decimal alone would be sufficient.  However, at least two
+formats (dbf and pdf) make use of non-decimal numbers.


 Footnotes
@@ -197,8 +203,7 @@ Footnotes
 .. [6] http://docs.python.org/3/c-api/buffer.html
       examples:  ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
 .. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
-.. [8] mainly implicit encode/decode, with intermittent errors when the data
-       was not ASCII compatible
+.. [8] https://mail.python.org/pipermail/python-dev/2014-February/132750.html


 Copyright
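
Not part of the commit: a minimal sketch of the semantics described in the
revised text, assuming a CPython 3.5 or later interpreter where bytes
%-interpolation is available. The helper name ``raises_type_error`` is
hypothetical and exists only for this illustration. Exception messages are
deliberately not checked, since the wording quoted in the diff is draft text
and may not match what a released interpreter prints.

```python
# Sketch only: exercises bytes %-formatting on CPython 3.5+.
# `raises_type_error` is a hypothetical helper, not part of any library.

def raises_type_error(fmt, value):
    """Return True if `fmt % value` raises TypeError, else False."""
    try:
        fmt % value
    except TypeError:
        return True
    return False

# Numeric codes: the bytes result equals the str result encoded as ASCII.
for fmt, val in [("%4x", 10), ("%#4x", 10), ("%04X", 10), ("%o", 10),
                 ("%e", 10.5), ("%f", 10.5), ("%g", 10.5)]:
    assert fmt.encode("ascii") % val == (fmt % val).encode("ascii")

# %c: a single byte from an int in range(256) or a length-1 bytes object.
byte_from_int = b"%c" % 48
byte_from_bytes = b"%c" % b"a"

# %s: accepts buffer-protocol objects and objects defining __bytes__.
class Landmark:
    """Hypothetical type that knows its own wire representation."""
    def __bytes__(self):
        return b"<landmark>"

from_bytes = b"%s" % b"abc"
from_view = b"%s" % memoryview(b"abc")
from_dunder = b"%s" % Landmark()
from_encoded = b"%s" % "some string".encode("utf8")

# %s rejects numbers and unencoded str.
number_rejected = raises_type_error(b"%s", 3.14)
str_rejected = raises_type_error(b"%s", "hello world!")

# %a: ascii() of the value; a str keeps its quotes and non-ASCII is escaped.
debug_str = b"%a" % "caf\u00e9"
debug_num = b"%a" % 3.14
```

The loop compares against ``str`` formatting rather than hard-coding results,
which mirrors the equivalence stated in the diff for numeric codes.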