Mark PEP 414 as Final and incorporate feedback (both public and private)

2012-03-05 22:56:18 +10:00 · 2012-03-05 22:56:18 +10:00 · efff9e9745
parent a166e27c84
commit efff9e9745
1 changed files with 75 additions and 47 deletions
--- a/pep-0414.txt
+++ b/pep-0414.txt
@@ -4,7 +4,7 @@ Version: $Revision$
 Last-Modified: $Date$
 Author: Armin Ronacher <armin.ronacher@active-4.com>,
        Nick Coghlan <ncoghlan@gmail.com>
-Status: Accepted
+Status: Final
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 15-Feb-2012
@@ -71,21 +71,17 @@ acceptable forms for string literals.
 Author's Note
 =============

-This PEP was originally written by Armin Ronacher, and directly reflected his
-feelings regarding his personal experiences porting Unicode aware Python
-applications to Python 3. Guido's approval was given based on Armin's version
-of the PEP.
+This PEP was originally written by Armin Ronacher, and Guido's approval was
+given based on that version.

-The currently published version has been rewritten by Nick Coghlan to address
-the concerns of those who felt that Armin's experience did not accurately
-reflect the *typical* experience of porting to Python 3, but rather only
-related to a specific subset of porting activities that were not well served
-by the existing set of porting tools.
+The currently published version has been rewritten by Nick Coghlan to
+include additional historical details and rationale that were taken into
+account when Guido made his decision, but were not explicitly documented in
+Armin's version of the PEP.

 Readers should be aware that many of the arguments in this PEP are *not*
 technical ones. Instead, they relate heavily to the *social* and *personal*
-aspects of software development. After all, developers are people first,
-coders second.
+aspects of software development.


 Rationale
@@ -134,17 +130,28 @@ This PEP is not for those communities. Instead, it is designed specifically to
 help people that *don't* want to put up with those difficulties.

 However, since the proposal is for a comparatively small tweak to the language
-syntax with no semantic changes, it may be feasible to support it as a third
-party import hook. While such an import hook will impose a small import time
-overhead, and will require additional steps from each application that needs it
-to get the hook in place, it would allow applications that target Python 3.2
-to use libraries and frameworks that may otherwise only run on Python 3.3+.
-
-This approach may prove useful, for example, for applications that wish to
-target Python 3 for the Ubuntu LTS release that ships with Python 2.7 and 3.2.
+syntax with no semantic changes, it is feasible to support it as a third
+party import hook. While such an import hook imposes some import time
+overhead, and requires additional steps from each application that needs it
+to get the hook in place, it allows applications that target Python 3.2
+to use libraries and frameworks that would otherwise only run on Python 3.3+
+due to their use of unicode literal prefixes.

 One such import hook project is Vinay Sajip's ``uprefix`` [4]_.

+For those that prefer to translate their code in advance rather than
+converting it on the fly at import time, Armin Ronacher is working on a
+hook that runs at install time [5]_.
+
+Combining the two approaches is of course also possible. For example, the
+import hook could be used for rapid edit-test cycles during local
+development, while the install hook is used for continuous integration
+tasks and deployment on Python 3.2.
+
+The approaches described in this section may prove useful, for example, for
+applications that wish to target Python 3 on the Ubuntu 12.04 LTS release,
+which will ship with Python 2.7 and 3.2 as officially supported Python
+versions.

 Complaint: Python 3 shouldn't be made worse just to support porting from Python 2
 ---------------------------------------------------------------------------------
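The import and install hooks described in the hunk above rely on the change being purely lexical: removing a ``u`` or ``U`` prefix from a string literal does not change its meaning in Python 3. The following is a minimal sketch of that rewrite using the standard library ``tokenize`` module. It is not taken from ``uprefix`` or from the install hook, and the function name ``strip_unicode_prefixes`` is hypothetical.

```python
import io
import tokenize


def strip_unicode_prefixes(source):
    """Return ``source`` with u/U prefixes removed from string literals.

    Hypothetical helper illustrating the lexical rewrite an import or
    install hook could perform; it is not code from either project.
    """
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    rewritten = []
    for i, tok in enumerate(tokens):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            # Tokenizers with PEP 414 support keep the prefix inside the
            # STRING token, so drop its first character.
            tok = tok._replace(string=tok.string[1:])
        elif (tok.type == tokenize.NAME and tok.string in ("u", "U")
              and i + 1 < len(tokens)
              and tokens[i + 1].type == tokenize.STRING
              and tokens[i + 1].start == tok.end):
            # A tokenizer without PEP 414 support may instead produce a NAME
            # "u" immediately followed by a STRING; drop the NAME.
            tok = tok._replace(string="")
        rewritten.append(tok)
    # Token positions are left unchanged, so untokenize() reproduces the
    # original spacing around each literal.
    return tokenize.untokenize(rewritten)


print(strip_unicode_prefixes("update(u'key')\n"), end="")
```

An import hook could apply such a function to module source before calling ``compile()``, while an install hook could write the rewritten source to disk once; both are assumptions about how the helper would be wired in.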
@@ -155,14 +162,15 @@ purity". If we're going to impose a significant burden on third party
 developers, we should have a solid rationale for doing so.

 In most cases, the rationale for backwards incompatible Python 3 changes are
-either to improve code correctness (for example, stricter separation of binary
-and text data and integer division upgrading to floats when necessary), reduce
-typical memory usage (for example, increased usage of iterators and views over
-concrete lists), or to remove distracting nuisances that make Python code
-harder to read without increasing its expressiveness (for example, the comma
-based syntax for naming caught exceptions). Changes backed by such reasoning
-are *not* going to be reverted, regardless of objections from Python 2
-developers attempting to make the transition to Python 3.
+either to improve code correctness (for example, stricter default separation
+of binary and text data and integer division upgrading to floats when
+necessary), reduce typical memory usage (for example, increased usage of
+iterators and views over concrete lists), or to remove distracting nuisances
+that make Python code harder to read without increasing its expressiveness
+(for example, the comma based syntax for naming caught exceptions). Changes
+backed by such reasoning are *not* going to be reverted, regardless of
+objections from Python 2 developers attempting to make the transition to
+Python 3.

 In many cases, Python 2 offered two ways of doing things for historical reasons.
 For example, inequality could be tested with both ``!=`` and ``<>`` and integer
@@ -197,8 +205,11 @@ eliminated in Python 3?

 Just as support for string exceptions was eliminated from Python 2 using the
 normal deprecation process, support for redundant string prefix characters
-(specifically, ``B``, ``R``, ``u``, ``U``) may be eventually eliminated
-from Python 3, regardless of the current acceptance of this PEP.
+(specifically, ``B``, ``R``, ``u``, ``U``) may eventually be eliminated
+from Python 3, regardless of the current acceptance of this PEP. However,
+such a change will likely only occur once third party libraries supporting
+Python 2.7 are about as common as those supporting Python 2.2 or 2.3 are
+today.


 Complaint: The WSGI "native strings" concept is an ugly hack
@ -218,13 +229,27 @@ three different kinds of string:
 * native strings: handled as ``str`` in both Python 2 and Python 3
 * binary data: handled as ``str`` in Python 2 and ``bytes`` in Python 3

-Native strings are a useful concept because there are some APIs and internal
-operations that are designed primarily to work with native strings. They often
-don't support ``unicode`` in Python 2 or support ``bytes`` in Python 3 (at
-least, not without needing additional encoding information and/or imposing
-constraints that don't apply to the native string variants).
+Some developers consider WSGI's "native strings" to be an ugly hack, as they
+are *explicitly* documented as being used solely for ``latin-1`` decoded
+"text", regardless of the actual encoding of the underlying data. Using this
+approach bypasses many of the updates to Python 3's data model that are
+designed to encourage correct handling of text encodings. However, it
+generally works due to the specific details of the problem domain - web server
+and web framework developers are some of the individuals *most* aware of how
+blurry the line can get between binary data and text when working with HTTP
+and related protocols, and how important it is to understand the implications
+of the encodings in use when manipulating encoded text data. At the
+*application* level most of these details are hidden from the developer by
+the web frameworks and support libraries (both in Python 2 *and* in Python 3).

-Some example of such interfaces are:
+In practice, native strings are a useful concept because there are some APIs
+(both in the standard library and in third party frameworks and packages) and
+some internal interpreter details that are designed primarily to work with
+native strings. These components often don't support ``unicode`` in Python 2
+or ``bytes`` in Python 3, or, if they do, require additional encoding details
+and/or impose constraints that don't apply to the native string variants.
+
+Some examples of interfaces that are best handled as native strings are:

 * Python identifiers (dict keys, class names, module names, import paths)
 * URLs for the most part as well as HTTP headers in urllib/http servers
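The ``latin-1`` convention discussed in the hunk above works because ``latin-1`` maps each byte value 0x00-0xFF to the code point with the same number, so decoding arbitrary bytes with it never fails and never loses information. The sketch below shows that round trip and how code that knows the real encoding (assumed here to be UTF-8) can recover properly decoded text; ``to_native`` and ``from_native`` are hypothetical helper names, not WSGI APIs.

```python
def to_native(raw):
    """Hypothetical helper: bytes -> WSGI-style native string."""
    # latin-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, so this
    # never fails and loses nothing, whatever the data's real encoding is.
    return raw.decode("latin-1")


def from_native(native, encoding="utf-8"):
    """Hypothetical helper: recover real text once the encoding is known."""
    # Undo the latin-1 step to get the original bytes, then decode properly.
    return native.encode("latin-1").decode(encoding)


raw_path = b"/caf\xc3\xa9"                # UTF-8 bytes as received
native = to_native(raw_path)              # one character per byte: "/caf\xc3\xa9"
assert from_native(native) == "/caf\xe9"  # "/café" once decoded as UTF-8
```

The native string is "wrong" as text but exact as data, which is why the approach relies on framework authors who understand the encodings in use.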
@@ -238,16 +263,16 @@ Some example of such interfaces are:
 In Python 2.6 and 2.7, these distinctions are most naturally expressed as
 follows:

-* ``u""``: text string
-* ``""``: native string
-* ``b""``: binary data
+* ``u""``: text string (``unicode``)
+* ``""``: native string (``str``)
+* ``b""``: binary data (``str``, also aliased as ``bytes``)

-In Python 3, the native strings are not distinguished from any other text
-strings:
+In Python 3, the ``latin-1`` decoded native strings are not distinguished
+from any other text strings:

-* ``""``: text string
-* ``""``: native string
-* ``b""``: binary data
+* ``""``: text string (``str``)
+* ``""``: native string (``str``)
+* ``b""``: binary data (``bytes``)

 If ``from __future__ import unicode_literals`` is used to modify the behaviour
 of Python 2, then, along with an appropriate definition of ``n()``, the
@@ -273,9 +298,10 @@ That last approach is the only variant that supports Python 2.5 and earlier.
 Of all the alternatives, the format currently supported in Python 2.6 and 2.7
 is by far the cleanest approach that clearly distinguishes the three desired
 kinds of behaviour. With this PEP, that format will also be supported in
-Python 3.3+. If the import hook approach works out as planned, it may even be
-supported in Python 3.1 and 3.2. A bit more effort could likely adapt the hook
-to allow the use of the ``b`` prefix on Python 2.5
+Python 3.3+. It will also be supported in Python 3.1 and 3.2 through the use
+of import and install hooks. While it is significantly less likely, it is
+also conceivable that the hooks could be adapted to allow the use of the
+``b`` prefix on Python 2.5.


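With PEP 414 in place, the Python 2.6/2.7 spelling listed in the hunks above is also valid on Python 3.3+. The sketch below checks the resulting types and adds one possible ``n()`` for code that uses ``unicode_literals`` instead. This ``n()`` is an assumption (the PEP text only calls for "an appropriate definition"); encoding to ``latin-1`` is chosen here to match the WSGI convention.

```python
import sys

# Valid on Python 2.6/2.7 and, with PEP 414, on Python 3.3+:
text = u"text"      # text string: unicode on Python 2, str on Python 3
native = "native"   # native string: str on both
binary = b"binary"  # binary data: str (aliased as bytes) on Python 2, bytes on Python 3

if sys.version_info[0] >= 3:
    def n(s):
        # Hypothetical helper: Python 3 literals are already native str.
        return s
else:
    def n(s):
        # Hypothetical helper: under "from __future__ import unicode_literals"
        # a plain "" literal is unicode, so encode it back to a native str.
        return s.encode("latin-1")
```

Code written with ``unicode_literals`` would then spell a native string, such as a WSGI ``environ`` key, as ``n("PATH_INFO")``.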
 Complaint: The existing tools should be good enough for everyone
@ -369,6 +395,8 @@ References
 .. [4] uprefix import hook project
   (https://bitbucket.org/vinay.sajip/uprefix)

+.. [5] install hook to remove unicode string prefix characters
+   (https://github.com/mitsuhiko/unicode-literals-pep/tree/master/install-hook)

 Copyright
 =========