Mark PEP 414 as Final and incorporate feedback (both public and private)
parent a166e27c84
commit efff9e9745
122
pep-0414.txt
@@ -4,7 +4,7 @@ Version: $Revision$
 Last-Modified: $Date$
 Author: Armin Ronacher <armin.ronacher@active-4.com>,
         Nick Coghlan <ncoghlan@gmail.com>
-Status: Accepted
+Status: Final
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 15-Feb-2012
@@ -71,21 +71,17 @@ acceptable forms for string literals.
 Author's Note
 =============

-This PEP was originally written by Armin Ronacher, and directly reflected his
-feelings regarding his personal experiences porting Unicode aware Python
-applications to Python 3. Guido's approval was given based on Armin's version
-of the PEP.
+This PEP was originally written by Armin Ronacher, and Guido's approval was
+given based on that version.

-The currently published version has been rewritten by Nick Coghlan to address
-the concerns of those who felt that Armin's experience did not accurately
-reflect the *typical* experience of porting to Python 3, but rather only
-related to a specific subset of porting activities that were not well served
-by the existing set of porting tools.
+The currently published version has been rewritten by Nick Coghlan to
+include additional historical details and rationale that were taken into
+account when Guido made his decision, but were not explicitly documented in
+Armin's version of the PEP.

 Readers should be aware that many of the arguments in this PEP are *not*
 technical ones. Instead, they relate heavily to the *social* and *personal*
-aspects of software development. After all, developers are people first,
-coders second.
+aspects of software development.


 Rationale
@@ -134,17 +130,28 @@ This PEP is not for those communities. Instead, it is designed specifically to
 help people that *don't* want to put up with those difficulties.

 However, since the proposal is for a comparatively small tweak to the language
-syntax with no semantic changes, it may be feasible to support it as a third
-party import hook. While such an import hook will impose a small import time
-overhead, and will require additional steps from each application that needs it
-to get the hook in place, it would allow applications that target Python 3.2
-to use libraries and frameworks that may otherwise only run on Python 3.3+.
-
-This approach may prove useful, for example, for applications that wish to
-target Python 3 for the Ubuntu LTS release that ships with Python 2.7 and 3.2.
+syntax with no semantic changes, it is feasible to support it as a third
+party import hook. While such an import hook imposes some import time
+overhead, and requires additional steps from each application that needs it
+to get the hook in place, it allows applications that target Python 3.2
+to use libraries and frameworks that would otherwise only run on Python 3.3+
+due to their use of unicode literal prefixes.
+
+One such import hook project is Vinay Sajip's ``uprefix`` [4]_.
+
+For those that prefer to translate their code in advance rather than
+converting on the fly at import time, Armin Ronacher is working on a hook
+that runs at install time rather than during import [5]_.
+
+Combining the two approaches is of course also possible. For example, the
+import hook could be used for rapid edit-test cycles during local
+development, but the install hook for continuous integration tasks and
+deployment on Python 3.2.
+
+The approaches described in this section may prove useful, for example, for
+applications that wish to target Python 3 on the Ubuntu 12.04 LTS release,
+which will ship with Python 2.7 and 3.2 as officially supported Python
+versions.

 Complaint: Python 3 shouldn't be made worse just to support porting from Python 2
 ---------------------------------------------------------------------------------
@@ -155,14 +162,15 @@ purity". If we're going to impose a significant burden on third party
 developers, we should have a solid rationale for doing so.

 In most cases, the rationale for backwards incompatible Python 3 changes is
-either to improve code correctness (for example, stricter separation of binary
-and text data and integer division upgrading to floats when necessary), reduce
-typical memory usage (for example, increased usage of iterators and views over
-concrete lists), or to remove distracting nuisances that make Python code
-harder to read without increasing its expressiveness (for example, the comma
-based syntax for naming caught exceptions). Changes backed by such reasoning
-are *not* going to be reverted, regardless of objections from Python 2
-developers attempting to make the transition to Python 3.
+either to improve code correctness (for example, stricter default separation
+of binary and text data and integer division upgrading to floats when
+necessary), reduce typical memory usage (for example, increased usage of
+iterators and views over concrete lists), or to remove distracting nuisances
+that make Python code harder to read without increasing its expressiveness
+(for example, the comma based syntax for naming caught exceptions). Changes
+backed by such reasoning are *not* going to be reverted, regardless of
+objections from Python 2 developers attempting to make the transition to
+Python 3.

 In many cases, Python 2 offered two ways of doing things for historical reasons.
 For example, inequality could be tested with both ``!=`` and ``<>`` and integer
@@ -197,8 +205,11 @@ eliminated in Python 3?

 Just as support for string exceptions was eliminated from Python 2 using the
 normal deprecation process, support for redundant string prefix characters
-(specifically, ``B``, ``R``, ``u``, ``U``) may be eventually eliminated
-from Python 3, regardless of the current acceptance of this PEP.
+(specifically, ``B``, ``R``, ``u``, ``U``) may eventually be eliminated
+from Python 3, regardless of the current acceptance of this PEP. However,
+such a change will likely only occur once third party libraries supporting
+Python 2.7 are about as common as libraries supporting Python 2.2 or 2.3 are
+today.


 Complaint: The WSGI "native strings" concept is an ugly hack
@@ -218,13 +229,27 @@ three different kinds of string:
 * native strings: handled as ``str`` in both Python 2 and Python 3
 * binary data: handled as ``str`` in Python 2 and ``bytes`` in Python 3

-Native strings are a useful concept because there are some APIs and internal
-operations that are designed primarily to work with native strings. They often
-don't support ``unicode`` in Python 2 or support ``bytes`` in Python 3 (at
-least, not without needing additional encoding information and/or imposing
-constraints that don't apply to the native string variants).
+Some developers consider WSGI's "native strings" to be an ugly hack, as they
+are *explicitly* documented as being used solely for ``latin-1`` decoded
+"text", regardless of the actual encoding of the underlying data. Using this
+approach bypasses many of the updates to Python 3's data model that are
+designed to encourage correct handling of text encodings. However, it
+generally works due to the specific details of the problem domain - web server
+and web framework developers are some of the individuals *most* aware of how
+blurry the line can get between binary data and text when working with HTTP
+and related protocols, and how important it is to understand the implications
+of the encodings in use when manipulating encoded text data. At the
+*application* level most of these details are hidden from the developer by
+the web frameworks and support libraries (both in Python 2 *and* in Python 3).

-Some example of such interfaces are:
+In practice, native strings are a useful concept because there are some APIs
+(both in the standard library and in third party frameworks and packages) and
+some internal interpreter details that are designed primarily to work with
+native strings. These components often don't support ``unicode`` in Python 2
+or ``bytes`` in Python 3, or, if they do, require additional encoding details
+and/or impose constraints that don't apply to the native string variants.
+
+Some examples of interfaces that are best handled as native strings are:

 * Python identifiers (dict keys, class names, module names, import paths)
 * URLs for the most part as well as HTTP headers in urllib/http servers
@@ -238,16 +263,16 @@ Some example of such interfaces are:
 In Python 2.6 and 2.7, these distinctions are most naturally expressed as
 follows:

-* ``u""``: text string
-* ``""``: native string
-* ``b""``: binary data
+* ``u""``: text string (``unicode``)
+* ``""``: native string (``str``)
+* ``b""``: binary data (``str``, also aliased as ``bytes``)

-In Python 3, the native strings are not distinguished from any other text
-strings:
+In Python 3, the ``latin-1`` decoded native strings are not distinguished
+from any other text strings:

-* ``""``: text string
-* ``""``: native string
-* ``b""``: binary data
+* ``""``: text string (``str``)
+* ``""``: native string (``str``)
+* ``b""``: binary data (``bytes``)

 If ``from __future__ import unicode_literals`` is used to modify the behaviour
 of Python 2, then, along with an appropriate definition of ``n()``, the
@@ -273,9 +298,10 @@ That last approach is the only variant that supports Python 2.5 and earlier.
 Of all the alternatives, the format currently supported in Python 2.6 and 2.7
 is by far the cleanest approach that clearly distinguishes the three desired
 kinds of behaviour. With this PEP, that format will also be supported in
-Python 3.3+. If the import hook approach works out as planned, it may even be
-supported in Python 3.1 and 3.2. A bit more effort could likely adapt the hook
-to allow the use of the ``b`` prefix on Python 2.5
+Python 3.3+. It will also be supported in Python 3.1 and 3.2 through the use
+of import and install hooks. While it is significantly less likely, it is
+also conceivable that the hooks could be adapted to allow the use of the
+``b`` prefix on Python 2.5.


 Complaint: The existing tools should be good enough for everyone
@@ -369,6 +395,8 @@ References
 .. [4] uprefix import hook project
    (https://bitbucket.org/vinay.sajip/uprefix)

+.. [5] install hook to remove unicode string prefix characters
+   (https://github.com/mitsuhiko/unicode-literals-pep/tree/master/install-hook)
+
 Copyright
 =========
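
Note on the literal forms listed in the diff above: as a minimal sketch (not part of this commit), the snippet below checks which type each of the three literal forms produces when run on Python 3.3 or later, where the ``u`` prefix is accepted again. The variable names and the ``describe`` helper are illustrative assumptions, not anything defined by the PEP.

```python
# Assumes Python 3.3+, where the ``u`` prefix is legal syntax again.
text = u"caf\u00e9"      # text string: ``str`` in Python 3
native = "Content-Type"  # native string: ``str`` in Python 2 and Python 3
binary = b"\x00\x01"     # binary data: ``bytes`` in Python 3


def describe(value):
    """Return the type name of the object a literal evaluates to (hypothetical helper)."""
    return type(value).__name__


# Map each kind of string to the type name its literal form yields.
kinds = {
    name: describe(value)
    for name, value in [("text", text), ("native", native), ("binary", binary)]
}

# In Python 3 the ``u`` prefix is purely cosmetic: both spellings are equal.
same_text = u"abc" == "abc"
```

On Python 2.6 and 2.7 the same source is also valid, but the type names differ as described in the list above (``unicode`` for the text string and ``str`` for the other two).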