Mark PEP 414 as Final and incorporate feedback (both public and private)

Nick Coghlan 2012-03-05 22:56:18 +10:00
parent a166e27c84
commit efff9e9745
1 changed files with 75 additions and 47 deletions


@ -4,7 +4,7 @@ Version: $Revision$
Last-Modified: $Date$
Author: Armin Ronacher <armin.ronacher@active-4.com>,
Nick Coghlan <ncoghlan@gmail.com>
Status: Accepted
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Feb-2012
@ -71,21 +71,17 @@ acceptable forms for string literals.
Author's Note
=============
This PEP was originally written by Armin Ronacher, and directly reflected his
feelings regarding his personal experiences porting Unicode aware Python
applications to Python 3. Guido's approval was given based on Armin's version
of the PEP.
This PEP was originally written by Armin Ronacher, and Guido's approval was
given based on that version.
The currently published version has been rewritten by Nick Coghlan to address
the concerns of those who felt that Armin's experience did not accurately
reflect the *typical* experience of porting to Python 3, but rather only
related to a specific subset of porting activities that were not well served
by the existing set of porting tools.
The currently published version has been rewritten by Nick Coghlan to
include additional historical details and rationale that were taken into
account when Guido made his decision, but were not explicitly documented in
Armin's version of the PEP.
Readers should be aware that many of the arguments in this PEP are *not*
technical ones. Instead, they relate heavily to the *social* and *personal*
aspects of software development. After all, developers are people first,
coders second.
aspects of software development.
Rationale
@ -134,17 +130,28 @@ This PEP is not for those communities. Instead, it is designed specifically to
help people that *don't* want to put up with those difficulties.
However, since the proposal is for a comparatively small tweak to the language
syntax with no semantic changes, it may be feasible to support it as a third
party import hook. While such an import hook will impose a small import time
overhead, and will require additional steps from each application that needs it
to get the hook in place, it would allow applications that target Python 3.2
to use libraries and frameworks that may otherwise only run on Python 3.3+.
This approach may prove useful, for example, for applications that wish to
target Python 3 for the Ubuntu LTS release that ships with Python 2.7 and 3.2.
syntax with no semantic changes, it is feasible to support it as a third
party import hook. While such an import hook imposes some import time
overhead, and requires additional steps from each application that needs it
to get the hook in place, it allows applications that target Python 3.2
to use libraries and frameworks that would otherwise only run on Python 3.3+
due to their use of unicode literal prefixes.
One such import hook project is Vinay Sajip's ``uprefix`` [4]_.
For those that prefer to translate their code in advance rather than
converting on the fly at import time, Armin Ronacher is working on a hook
that runs at install time rather than during import [5]_.
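The ahead-of-time translation described above can be illustrated with a rough sketch. This is *not* the code used by ``uprefix`` or by the install hook; the function name ``strip_unicode_prefixes`` is hypothetical, and the sketch assumes it runs on Python 3.3 or later (so that the standard ``tokenize`` module treats ``u"..."`` as a single string token) on source that uses ``\n`` line endings.

```python
import io
import tokenize


def strip_unicode_prefixes(source):
    """Return ``source`` with the u/U prefix removed from string literals.

    Hypothetical helper for illustration only; a real hook would also have
    to deal with source encodings and with running on Python 3.2 itself.
    """
    # Split into lines the same way the tokenizer's readline callback will.
    lines = io.StringIO(source).readlines()

    # Record the (row, col) position of every u/U string prefix.
    positions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            positions.append(tok.start)

    # Delete right to left so earlier columns on the same line stay valid.
    for row, col in sorted(positions, reverse=True):
        line = lines[row - 1]  # token rows are 1-based
        lines[row - 1] = line[:col] + line[col + 1:]

    return "".join(lines)
```

Working from token positions rather than a regular expression means that a variable named ``u`` or the letter ``u`` inside a string body is left alone.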
Combining the two approaches is of course also possible. For example, the
import hook could be used for rapid edit-test cycles during local
development, but the install hook for continuous integration tasks and
deployment on Python 3.2.
The approaches described in this section may prove useful, for example, for
applications that wish to target Python 3 on the Ubuntu 12.04 LTS release,
which will ship with Python 2.7 and 3.2 as officially supported Python
versions.
Complaint: Python 3 shouldn't be made worse just to support porting from Python 2
---------------------------------------------------------------------------------
@ -155,14 +162,15 @@ purity". If we're going to impose a significant burden on third party
developers, we should have a solid rationale for doing so.
In most cases, the rationale for backwards incompatible Python 3 changes is
either to improve code correctness (for example, stricter separation of binary
and text data and integer division upgrading to floats when necessary), reduce
typical memory usage (for example, increased usage of iterators and views over
concrete lists), or to remove distracting nuisances that make Python code
harder to read without increasing its expressiveness (for example, the comma
based syntax for naming caught exceptions). Changes backed by such reasoning
are *not* going to be reverted, regardless of objections from Python 2
developers attempting to make the transition to Python 3.
either to improve code correctness (for example, stricter default separation
of binary and text data and integer division upgrading to floats when
necessary), reduce typical memory usage (for example, increased usage of
iterators and views over concrete lists), or to remove distracting nuisances
that make Python code harder to read without increasing its expressiveness
(for example, the comma based syntax for naming caught exceptions). Changes
backed by such reasoning are *not* going to be reverted, regardless of
objections from Python 2 developers attempting to make the transition to
Python 3.
In many cases, Python 2 offered two ways of doing things for historical reasons.
For example, inequality could be tested with both ``!=`` and ``<>`` and integer
@ -197,8 +205,11 @@ eliminated in Python 3?
Just as support for string exceptions was eliminated from Python 2 using the
normal deprecation process, support for redundant string prefix characters
(specifically, ``B``, ``R``, ``u``, ``U``) may be eventually eliminated
from Python 3, regardless of the current acceptance of this PEP.
(specifically, ``B``, ``R``, ``u``, ``U``) may eventually be eliminated
from Python 3, regardless of the current acceptance of this PEP. However,
such a change will likely only occur once third party libraries supporting
Python 2.7 are about as common as libraries supporting Python 2.2 or 2.3 are
today.
Complaint: The WSGI "native strings" concept is an ugly hack
@ -218,13 +229,27 @@ three different kinds of string:
* native strings: handled as ``str`` in both Python 2 and Python 3
* binary data: handled as ``str`` in Python 2 and ``bytes`` in Python 3
Native strings are a useful concept because there are some APIs and internal
operations that are designed primarily to work with native strings. They often
don't support ``unicode`` in Python 2 or support ``bytes`` in Python 3 (at
least, not without needing additional encoding information and/or imposing
constraints that don't apply to the native string variants).
Some developers consider WSGI's "native strings" to be an ugly hack, as they
are *explicitly* documented as being used solely for ``latin-1`` decoded
"text", regardless of the actual encoding of the underlying data. Using this
approach bypasses many of the updates to Python 3's data model that are
designed to encourage correct handling of text encodings. However, it
generally works due to the specific details of the problem domain - web server
and web framework developers are some of the individuals *most* aware of how
blurry the line can get between binary data and text when working with HTTP
and related protocols, and how important it is to understand the implications
of the encodings in use when manipulating encoded text data. At the
*application* level most of these details are hidden from the developer by
the web frameworks and support libraries (both in Python 2 *and* in Python 3).
Some examples of such interfaces are:
In practice, native strings are a useful concept because there are some APIs
(both in the standard library and in third party frameworks and packages) and
some internal interpreter details that are designed primarily to work with
native strings. These components often don't support ``unicode`` in Python 2
or ``bytes`` in Python 3, or, if they do, require additional encoding details
and/or impose constraints that don't apply to the native string variants.
Some examples of interfaces that are best handled as native strings are:
* Python identifiers (dict keys, class names, module names, import paths)
* URLs for the most part as well as HTTP headers in urllib/http servers
@ -238,16 +263,16 @@ Some example of such interfaces are:
In Python 2.6 and 2.7, these distinctions are most naturally expressed as
follows:
* ``u""``: text string
* ``""``: native string
* ``b""``: binary data
* ``u""``: text string (``unicode``)
* ``""``: native string (``str``)
* ``b""``: binary data (``str``, also aliased as ``bytes``)
In Python 3, the native strings are not distinguished from any other text
strings:
In Python 3, the ``latin-1`` decoded native strings are not distinguished
from any other text strings:
* ``""``: text string
* ``""``: native string
* ``b""``: binary data
* ``""``: text string (``str``)
* ``""``: native string (``str``)
* ``b""``: binary data (``bytes``)
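As a small illustration of the Python 3 mapping listed above, the following sketch (assuming Python 3.3 or later, where the ``u`` prefix is accepted again) checks the runtime type of each literal form. The variable names are hypothetical and chosen only for this example.

```python
# Assumes Python 3.3+; on Python 3.0-3.2 the u"" literal is a syntax error.
text = u"caf\xe9"        # explicit text literal
native = "Content-Type"  # native string literal
binary = b"\x00\x01"     # binary data literal

# Collect the type name of each value for comparison.
kinds = tuple(type(value).__name__ for value in (text, native, binary))
```

On Python 3.3+, ``kinds`` is ``('str', 'str', 'bytes')``: the text and native literals produce the same type, so the ``u`` prefix carries no semantic weight there.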
If ``from __future__ import unicode_literals`` is used to modify the behaviour
of Python 2, then, along with an appropriate definition of ``n()``, the
@ -273,9 +298,10 @@ That last approach is the only variant that supports Python 2.5 and earlier.
Of all the alternatives, the format currently supported in Python 2.6 and 2.7
is by far the cleanest approach that clearly distinguishes the three desired
kinds of behaviour. With this PEP, that format will also be supported in
Python 3.3+. If the import hook approach works out as planned, it may even be
supported in Python 3.1 and 3.2. A bit more effort could likely adapt the hook
to allow the use of the ``b`` prefix on Python 2.5
Python 3.3+. It will also be supported in Python 3.1 and 3.2 through the use
of import and install hooks. While it is significantly less likely, it is
also conceivable that the hooks could be adapted to allow the use of the
``b`` prefix on Python 2.5.
Complaint: The existing tools should be good enough for everyone
@ -369,6 +395,8 @@ References
.. [4] uprefix import hook project
(https://bitbucket.org/vinay.sajip/uprefix)
.. [5] install hook to remove unicode string prefix characters
(https://github.com/mitsuhiko/unicode-literals-pep/tree/master/install-hook)
Copyright
=========