Rewrite PEP 414 to be less passionate in its tone and better address the common objections
parent
6e7b0815b4
commit
6f949e069a
492
pep-0414.txt
|
@@ -2,7 +2,8 @@ PEP: 414
|
|||
Title: Explicit Unicode Literal for Python 3.3
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Armin Ronacher <armin.ronacher@active-4.com>
|
||||
Author: Armin Ronacher <armin.ronacher@active-4.com>,
|
||||
Nick Coghlan <ncoghlan@gmail.com>
|
||||
Status: Accepted
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
|
@@ -16,231 +17,339 @@ Abstract
|
|||
|
||||
This document proposes the reintegration of an explicit unicode literal
|
||||
from Python 2.x to the Python 3.x language specification, in order to
|
||||
enable side-by-side support of libraries for both Python 2 and Python 3
|
||||
without the need for an explicit 2to3 run.
|
||||
reduce the volume of changes needed when porting Unicode-aware
|
||||
Python 2 applications to Python 3.
|
||||
|
||||
|
||||
BDFL Pronouncement
|
||||
==================
|
||||
|
||||
This PEP has been formally accepted for Python 3.3.
|
||||
This PEP has been formally accepted for Python 3.3:
|
||||
|
||||
I'm accepting the PEP. It's about as harmless as they come. Make it so.
|
||||
|
||||
|
||||
Rationale and Goals
|
||||
===================
|
||||
Proposal
|
||||
========
|
||||
|
||||
Python 3 is a major new revision of the language, and it was decided very
|
||||
early on that breaking backwards compatibility was part of the design. The
|
||||
migration from a Python 2.x to a Python 3 codebase is to be accomplished
|
||||
with the aid of a separate translation tool that converts the Python 2.x
|
||||
source code to Python 3 syntax. With more and more libraries supporting
|
||||
Python 3, however, it has become clear that 2to3 as a tool is
|
||||
insufficient, and people are now attempting to find ways to make the same
|
||||
source work in both Python 2.x and Python 3.x, with varying levels of
|
||||
success.
|
||||
This PEP proposes that Python 3.3 restore support for Python 2's Unicode
|
||||
literal syntax, substantially increasing the number of lines of existing
|
||||
Python 2 code in Unicode-aware applications that will run without modification
|
||||
on Python 3.
|
||||
|
||||
Python 2.6 and Python 2.7 support syntax features from Python 3 which for
|
||||
the most part make a unified code base possible. Many thought that the
|
||||
``unicode_literals`` future import might make a common source possible,
|
||||
but it turns out that it's doing more harm than good.
|
||||
Specifically, the Python 3 definition for string literal prefixes will be
|
||||
expanded to allow::
|
||||
|
||||
With the design of the updated WSGI specification a few new terms for
|
||||
strings were loosely defined: unicode strings, byte strings and native
|
||||
strings. In Python 3 the native string type is unicode, in Python 2 the
|
||||
native string type is a bytestring. These native string types are used in
|
||||
a couple of places. The native string type can be interned and is
|
||||
preferably used for identifier names, filenames, source code and a few
|
||||
other low level interpreter operations such as the return value of a
|
||||
``__repr__`` or exception messages.
|
||||
"u" | "U" | "ur" | "UR" | "Ur" | "uR"
|
||||
|
||||
In Python 2.7 these string types can be defined explicitly. Without any
|
||||
future imports ``b'foo'`` means bytestring, ``u'foo'`` declares a unicode
|
||||
string and ``'foo'`` a native string which in Python 2.x means bytes.
|
||||
With the ``unicode_literals`` import the native string type is no longer
|
||||
available by syntax and has to be incorrectly labeled as bytestring. If
|
||||
such a codebase is then used in Python 3, the interpreter will start using
|
||||
byte objects in places where they are no longer accepted (such as
|
||||
identifiers). This can be solved by a module that detects 2.x and 3.x and
|
||||
provides wrapper functions that transcode literals at runtime (either by
|
||||
having a ``u`` function that marks things as unicode without future
|
||||
imports or the inverse by having a ``n`` function that marks strings as
|
||||
native). Unfortunately, this has the side effect of slowing down the
|
||||
runtime performance of Python and makes for less beautiful code.
|
||||
Considering that Python 2 and Python 3 support for most libraries will
|
||||
have to continue side by side for several more years to come, this means
|
||||
that such modules lose one of Python's key properties: easily readable and
|
||||
understandable code.
|
||||
in addition to the currently supported::
|
||||
|
||||
Additionally, the vast majority of people who maintain Python 2.x
|
||||
codebases are more familiar with Python 2.x semantics, and a per-file
|
||||
difference in literal meanings will be very annoying for them in the long
|
||||
run. A quick poll on Twitter about the use of the division future import
|
||||
supported my suspicions that people opt out of behaviour-changing future
|
||||
imports because they are a maintenance burden. Every time you review code
|
||||
you have to check the top of the file to see if the behaviour was changed.
|
||||
Obviously that was an unscientific informal poll, but it might be
|
||||
something worth considering.
|
||||
"r" | "R"
|
||||
|
||||
Proposed Solution
|
||||
=================
|
||||
The following will all denote ordinary Python 3 strings::
|
||||
|
||||
The idea is to support (with Python 3.3) an explicit ``u`` and ``U``
|
||||
prefix for native strings in addition to the prefix-less variants. These
|
||||
would stick around for the entirety of the Python 3 lifetime but might at
|
||||
some point yield deprecation warnings if deemed appropriate. This could
|
||||
be something for pyflakes or other similar libraries to support.
|
||||
'text'
|
||||
"text"
|
||||
'''text'''
|
||||
"""text"""
|
||||
u'text'
|
||||
u"text"
|
||||
u'''text'''
|
||||
u"""text"""
|
||||
U'text'
|
||||
U"text"
|
||||
U'''text'''
|
||||
U"""text"""
|
||||
|
||||
Python 3.2 and earlier
|
||||
======================
|
||||
Combination of the unicode prefix with the raw string prefix will also be
|
||||
supported, just as it was in Python 2.
|
||||
|
||||
An argument against this proposal was made on the Python-Dev mailing list,
|
||||
mentioning that Ubuntu LTS will ship Python 3.2 and 2.7 for only 5 years.
|
||||
The counterargument is that Python 2.7 is currently the Python version of
|
||||
choice for users who want LTS support. As it stands, when choosing between
|
||||
2.7 and Python 3.2, Python 3 is currently not the best choice for certain
|
||||
long-term investments, since the ecosystem is not yet properly developed,
|
||||
and libraries are still fighting with their API decisions for Python 3.
|
||||
No changes are proposed to Python 3's actual Unicode handling, only to the
|
||||
acceptable forms for string literals.
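
As a rough illustration of the proposal (a sketch, not part of the PEP text), the snippet below checks that prefixed and unprefixed literals denote the same ``str`` value on Python 3.3+, and probes which prefixes the running interpreter accepts by compiling tiny literals. The helper name ``prefix_is_accepted`` is hypothetical. The combined ``ur`` forms are probed rather than assumed, since a given interpreter release may not match this draft of the grammar.

```python
def prefix_is_accepted(prefix):
    """Return True if <prefix>'text' compiles as an expression."""
    try:
        compile(prefix + "'text'", "<prefix-check>", "eval")
    except SyntaxError:
        return False
    return True


# With this PEP, both spellings produce an ordinary Python 3 string.
same_value = u'text' == 'text'
same_type = type(U"text") is str

# Probe each candidate prefix instead of assuming the result.
accepted = {p: prefix_is_accepted(p) for p in ("", "u", "U", "r", "R", "ur", "UR")}
```

Inspecting ``accepted`` on a given interpreter shows which of the listed prefixes that release actually parses.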
|
||||
|
||||
A valid point is that this would encourage people to become dependent on
|
||||
Python 3.3 for their ports. Fortunately that is not a big problem since
|
||||
that could be fixed at installation time similar to how many projects are
|
||||
currently invoking 2to3 as part of their installation process.
|
||||
|
||||
For Python 3.1 and Python 3.2 (even 3.0 if necessary) a simple
|
||||
on-installation hook could be provided that tokenizes all source files and
|
||||
strips away the otherwise unnecessary ``u`` prefix at installation time.
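
A minimal sketch of what such a prefix-stripping step might look like, assuming the standard library ``tokenize`` module is used on Python 3.3+ source text. The function name ``strip_u_prefix`` is hypothetical, and only the plain ``u``/``U`` prefix is handled; this is not the hook the PEP refers to.

```python
import io
import tokenize


def strip_u_prefix(source):
    """Return source with the u/U prefix removed from string literals."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            # Drop only the prefix character; quotes and body stay as-is.
            tok = tok._replace(string=tok.string[1:])
        tokens.append(tok)
    # untokenize uses the recorded token positions to rebuild the layout.
    return tokenize.untokenize(tokens)


converted = strip_u_prefix("x = u'a' + U\"b\"\n")
```

Working on tokens rather than raw text means a ``u`` that merely appears inside a string body or an identifier is left alone.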
|
||||
|
||||
Who Benefits?
|
||||
Author's Note
|
||||
=============
|
||||
|
||||
There are a couple of places where decisions have to be made for or
|
||||
against unicode support almost arbitrarily. This is mostly the case for
|
||||
protocols that do not support unicode all the way down, or hide it behind
|
||||
transport encodings that might or might not be unicode themselves. HTTP,
|
||||
Email and WSGI are good examples of that. For certain ambiguous cases it
|
||||
would be possible to apply the same logic for unicode that Python 3
|
||||
applies to the Python 2 versions of the library as well but, if those
|
||||
details were exposed to the user of the API, it would mean breaking
|
||||
compatibility for existing users of the Python 2 API which is a no-go for
|
||||
many situations. The automatic upgrading of binary strings to unicode
|
||||
strings that would be enabled by this proposal would make it much easier
|
||||
to port such libraries over.
|
||||
This PEP was originally written by Armin Ronacher, and directly reflected his
|
||||
feelings regarding his personal experiences porting Unicode aware Python
|
||||
applications to Python 3. Guido's approval was given based on Armin's version
|
||||
of the PEP.
|
||||
|
||||
Not only the libraries but also the users of these APIs would benefit from
|
||||
that. For instance, the urllib module in Python 2 is using byte strings,
|
||||
and the one in Python 3 is using unicode strings. By leveraging a native
|
||||
string, users can avoid having to adjust for that.
|
||||
The currently published version has been rewritten by Nick Coghlan to address
|
||||
the concerns of those who felt that Armin's experience did not accurately
|
||||
reflect the *typical* experience of porting to Python 3, but rather only
|
||||
related to a specific subset of porting activities that were not well served
|
||||
by the existing set of porting tools.
|
||||
|
||||
Problems with 2to3
|
||||
==================
|
||||
|
||||
In practice 2to3 currently suffers from a few problems which make it
|
||||
unnecessarily difficult and/or unpleasant to use:
|
||||
|
||||
- Bad overall performance. In many cases 2to3 runs 20 times slower than
|
||||
the testsuite for the library or application it's testing. (This for
|
||||
instance is the case for the Jinja2 library).
|
||||
- Slightly different behaviour in 2to3 between different versions of
|
||||
Python cause different outcomes when paired with custom fixers.
|
||||
- Line numbers from error messages do not match up with the real source
|
||||
lines due to added/rewritten imports.
|
||||
- extending 2to3 with custom fixers is nontrivial without using
|
||||
distribute. By default 2to3 works acceptably well for upgrading
|
||||
byte-based APIs to unicode based APIs but it fails to upgrade APIs
|
||||
which already support unicode to Python 3::
|
||||
|
||||
--- test.py (original)
|
||||
+++ test.py (refactored)
|
||||
@@ -1,5 +1,5 @@
|
||||
class Foo(object):
|
||||
def __unicode__(self):
|
||||
- return u'test'
|
||||
+ return 'test'
|
||||
def __str__(self):
|
||||
- return unicode(self).encode('utf-8')
|
||||
+ return str(self).encode('utf-8')
|
||||
Readers should be aware that many of the arguments in this PEP are *not*
|
||||
technical ones. Instead, they relate heavily to the *social* and *personal*
|
||||
aspects of software development. After all, developers are people first,
|
||||
coders second.
|
||||
|
||||
|
||||
APIs and Concepts Using Native Strings
|
||||
======================================
|
||||
Rationale
|
||||
=========
|
||||
|
||||
The following is an incomplete list of APIs and general concepts that use
|
||||
native strings and need implicit upgrading to unicode in Python 3, and
|
||||
which would directly benefit from this support:
|
||||
With the release of a Python 3 compatible version of the Web Services Gateway
|
||||
Interface (WSGI) specification (PEP 3333) for Python 3.2, many parts of the
|
||||
Python web ecosystem have been making a concerted effort to support Python 3
|
||||
without adversely affecting their existing developer and user communities.
|
||||
|
||||
- Python identifiers (dict keys, class names, module names, import
|
||||
paths)
|
||||
- URLs for the most part as well as HTTP headers in urllib/http servers
|
||||
- WSGI environment keys and CGI-inherited values
|
||||
- Python source code for dynamic compilation and AST hacks
|
||||
- Exception messages
|
||||
- ``__repr__`` return value
|
||||
- preferred filesystem paths
|
||||
- preferred OS environment
|
||||
One major item of feedback from key developers in those communities, including
|
||||
Chris McDonough (WebOb, Pyramid), Armin Ronacher (Flask, Werkzeug), Jacob
|
||||
Kaplan-Moss (Django) and Kenneth Reitz (``requests``) is that the requirement
|
||||
to change the spelling of *every* Unicode literal in an application
|
||||
(regardless of how that is accomplished) is a key stumbling block for porting
|
||||
efforts.
|
||||
|
||||
In particular, unlike many of the other Python 3 changes, it isn't one that
|
||||
framework and library authors can easily handle on behalf of their users. Most
|
||||
of those users couldn't care less about the "purity" of the Python language
|
||||
specification; they just want their websites and applications to work as well
|
||||
as possible.
|
||||
|
||||
While it is the Python web community that has been most vocal in highlighting
|
||||
this concern, it is expected that other highly Unicode aware domains (such as
|
||||
GUI development) may run into similar issues as they (and their communities)
|
||||
start making concerted efforts to support Python 3.
|
||||
|
||||
|
||||
Modernizing Code
|
||||
================
|
||||
Common Objections
|
||||
=================
|
||||
|
||||
The 2to3 tool can be easily adjusted to generate code that runs on both
|
||||
Python 2 and Python 3. An experimental extension to 2to3 which only
|
||||
modernizes Python code to the extent that it runs on Python 2.7 or later
|
||||
with support for the ``six`` library is available as python-modernize
|
||||
[1]_. For most cases the runtime impact of ``six`` can be neglected (like
|
||||
a function that calls ``iteritems()`` on a passed dictionary under 2.x or
|
||||
``items()`` under 3.x), but to make strings cheap for both 2.x and 3.x it
|
||||
is nearly impossible. The way it currently works is by abusing the
|
||||
``unicode-escape`` codec on Python 2.x native strings. This is especially
|
||||
ugly if such a string literal is used in a tight loop.
|
||||
|
||||
This proposal would fix this. The modernize module could easily be
|
||||
adjusted to simply not translate unicode strings, and the runtime overhead
|
||||
would disappear.
|
||||
This PEP may harm adoption of Python 3.2
|
||||
----------------------------------------
|
||||
|
||||
Possible Downsides
|
||||
==================
|
||||
This complaint is interesting, as it carries within it a tacit admission that
|
||||
this PEP *will* make it easier to port Unicode aware Python 2 applications to
|
||||
Python 3.
|
||||
|
||||
The obvious downside for this is that potential Python 3 users would have
|
||||
to be aware of the fact that ``u`` is an optional prefix for strings.
|
||||
This is something that Python 3 in general tried to avoid. The second
|
||||
inequality comparison operator was removed, the ``L`` prefix for long
|
||||
integers etc. This PEP would propose a slight revert on that practice by
|
||||
reintroducing redundant syntax. On the other hand, Python already has
|
||||
multiple literals for strings with mostly the same behavior (single
|
||||
quoted, double quoted, single triple quoted, double triple quoted).
|
||||
There are many existing Python communities that are prepared to put up with
|
||||
the constraints imposed by the existing suite of porting tools, or to update
|
||||
their Python 2 code bases sufficiently that the problems are minimised.
|
||||
|
||||
Runtime Overhead of Wrappers
|
||||
============================
|
||||
This PEP is not for those communities. Instead, it is designed specifically to
|
||||
help people that *don't* want to put up with those difficulties.
|
||||
|
||||
I did some basic timings on the performance of a ``u()`` wrapper function
|
||||
as used by the ``six`` library. The implementation of ``u()`` is as
|
||||
follows::
|
||||
However, since the proposal is for a comparatively small tweak to the language
|
||||
syntax with no semantic changes, it may be feasible to support it as a third
|
||||
party import hook. While such an import hook will impose a small import time
|
||||
overhead, and will require additional steps from each application that needs it
|
||||
to get the hook in place, it would allow applications that target Python 3.2
|
||||
to use libraries and frameworks that may otherwise only run on Python 3.3+.
|
||||
|
||||
if sys.version_info >= (3, 0):
|
||||
def u(value):
|
||||
return value
|
||||
else:
|
||||
def u(value):
|
||||
return unicode(value, 'unicode-escape')
|
||||
This approach may prove useful, for example, for applications that wish to
|
||||
target Python 3 for the Ubuntu LTS release that ships with Python 2.7 and 3.2.
|
||||
|
||||
The intention is that ``u'foo'`` can be turned to ``u('foo')`` and that on
|
||||
Python 2.x an implicit decoding happens. In this case the wrapper will
|
||||
have a decoding overhead for Python 2.x. I did some basic timings [2]_ to
|
||||
see how bad the performance loss would be. The following examples measure
|
||||
the execution time over 10000 iterations::
|
||||
If such an import hook becomes available, this PEP will be updated to include
|
||||
a reference to it.
|
||||
|
||||
u'\N{SNOWMAN}barbaz' 1000 loops, best of 3: 295 usec per loop
|
||||
u('\N{SNOWMAN}barbaz') 10 loops, best of 3: 18.5 msec per loop
|
||||
u'foobarbaz_%d' % x 100 loops, best of 3: 8.32 msec per loop
|
||||
u('foobarbaz_%d') % x 10 loops, best of 3: 25.6 msec per loop
|
||||
u'fööbarbaz' 1000 loops, best of 3: 289 usec per loop
|
||||
u('fööbarbaz') 100 loops, best of 3: 15.1 msec per loop
|
||||
u'foobarbaz' 1000 loops, best of 3: 294 usec per loop
|
||||
u('foobarbaz') 100 loops, best of 3: 14.3 msec per loop
|
||||
|
||||
The overhead of the wrapper function in Python 3 is the price of a
|
||||
function call since the function only has to return the argument
|
||||
unchanged.
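
The Python 3 side of that comparison can be reproduced with a sketch along these lines, assuming the standard library ``timeit`` module and an arbitrary iteration count. It is not the benchmark script referenced above, and absolute numbers will vary by machine and interpreter.

```python
import timeit


def u(value):
    # Python 3 branch of the wrapper: return the argument unchanged.
    return value


ITERATIONS = 100000  # arbitrary choice for this sketch

# Time a bare prefixed literal against the same literal passed through u().
literal_time = timeit.timeit("u'foobarbaz'", number=ITERATIONS)
wrapped_time = timeit.timeit("u('foobarbaz')", globals={"u": u}, number=ITERATIONS)

print("literal: %.6f s, wrapped: %.6f s" % (literal_time, wrapped_time))
```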
|
||||
Python 3 shouldn't be made worse just to support porting from Python 2
|
||||
----------------------------------------------------------------------
|
||||
|
||||
This is indeed one of the key design principles of Python 3. However, one of
|
||||
the key design principles of Python as a whole is that "practicality beats
|
||||
purity". If we're going to impose a significant burden on third party
|
||||
developers, we should have a solid rationale for doing so.
|
||||
|
||||
In most cases, the rationale for backwards incompatible Python 3 changes is
|
||||
either to improve code correctness (for example, stricter separation of binary
|
||||
and text data and integer division upgrading to floats when necessary), reduce
|
||||
typical memory usage (for example, increased usage of iterators and views over
|
||||
concrete lists), or to remove distracting nuisances that make Python code
|
||||
harder to read without increasing its expressiveness (for example, the comma
|
||||
based syntax for naming caught exceptions). Changes backed by such reasoning
|
||||
are *not* going to be reverted, regardless of objections from Python 2
|
||||
developers attempting to make the transition to Python 3.
|
||||
|
||||
In many cases, Python 2 offered two ways of doing things for historical reasons.
|
||||
For example, inequality could be tested with both ``!=`` and ``<>`` and integer
|
||||
literals could be specified with an optional ``L`` suffix. Such redundancies
|
||||
have been eliminated in Python 3, which reduces the overall size of the
|
||||
language and improves consistency across developers.
|
||||
|
||||
In the original Python 3 design (up to and including Python 3.2), the explicit
|
||||
prefix syntax for unicode literals was deemed to fall into this category, as it
|
||||
is completely unnecessary in Python 3. However, the difference between those
|
||||
other cases and unicode literals is that the unicode literal prefix is *not*
|
||||
redundant in Python 2 code: it is a programmatically significant distinction
|
||||
that needs to be preserved in some fashion to avoid losing information.
|
||||
|
||||
While porting tools were created to help with the transition (see next section),
|
||||
the removal still creates an additional burden on heavy users of unicode strings in
|
||||
Python 2, solely so that future developers learning Python 3 don't need to be
|
||||
told "For historical reasons, string literals may have an optional ``u`` or
|
||||
``U`` prefix. Never use this yourselves, it's just there to help with porting
|
||||
from an earlier version of the language."
|
||||
|
||||
Plenty of students learning Python 2 received similar warnings regarding string
|
||||
exceptions without being confused or irreparably stunted in their growth as
|
||||
Python developers. It will be the same with this feature.
|
||||
|
||||
This point is further reinforced by the fact that Python 3 *still* allows the
|
||||
uppercase variants of the ``B`` and ``R`` prefixes for bytes literals and raw
|
||||
bytes and string literals. If the potential for confusion due to string prefix
|
||||
variants is that significant, where was the outcry asking that these
|
||||
redundant prefixes be removed along with all the other redundancies that were
|
||||
eliminated in Python 3?
|
||||
|
||||
Just as support for string exceptions was eliminated from Python 2 using the
|
||||
normal deprecation process, support for redundant string prefix characters
|
||||
(specifically, ``B``, ``R``, ``u``, ``U``) may be eventually eliminated
|
||||
from Python 3, regardless of the current acceptance of this PEP.
|
||||
|
||||
|
||||
The WSGI "native strings" concept is an ugly hack, anyway
|
||||
---------------------------------------------------------
|
||||
|
||||
One reason the removal of unicode literals has provoked such concern amongst
|
||||
the web development community is that the updated WSGI specification had to
|
||||
make a few compromises to minimise the disruption for existing web servers
|
||||
that provide a WSGI-compatible interface (this was deemed necessary in order
|
||||
to make the updated standard a viable target for web application authors and
|
||||
web framework developers).
|
||||
|
||||
One of those compromises is the concept of a "native string". WSGI defines
|
||||
three different kinds of string:
|
||||
|
||||
* text strings: handled as ``unicode`` in Python 2 and ``str`` in Python 3
|
||||
* native strings: handled as ``str`` in both Python 2 and Python 3
|
||||
* binary data: handled as ``str`` in Python 2 and ``bytes`` in Python 3
|
||||
|
||||
Native strings are a useful concept because there are some APIs and internal
|
||||
operations that are designed primarily to work with native strings. They often
|
||||
don't support ``unicode`` in Python 2 and don't support ``bytes`` in Python 3
|
||||
(at least, not without needing additional encoding information and/or imposing
|
||||
constraints that don't apply to the native string variants).
|
||||
|
||||
Some examples of such interfaces are:
|
||||
|
||||
* Python identifiers (dict keys, class names, module names, import paths)
|
||||
* URLs for the most part as well as HTTP headers in urllib/http servers
|
||||
* WSGI environment keys and CGI-inherited values
|
||||
* Python source code for dynamic compilation and AST hacks
|
||||
* Exception messages
|
||||
* ``__repr__`` return value
|
||||
* preferred filesystem paths
|
||||
* preferred OS environment
|
||||
|
||||
In Python 2.6 and 2.7, these distinctions are most naturally expressed as
|
||||
follows:
|
||||
|
||||
* ``u""``: text string
|
||||
* ``""``: native string
|
||||
* ``b""``: binary data
|
||||
|
||||
In Python 3, the native strings are not distinguished from any other text
|
||||
strings:
|
||||
|
||||
* ``""``: text string
|
||||
* ``""``: native string
|
||||
* ``b""``: binary data
|
||||
|
||||
If ``from __future__ import unicode_literals`` is used to modify the behaviour
|
||||
of Python 2, then, along with an appropriate definition of ``n()``, the
|
||||
distinction can be expressed as:
|
||||
|
||||
* ``""``: text string
|
||||
* ``n("")``: native string
|
||||
* ``b""``: binary data
|
||||
|
||||
(While ``n=str`` works for simple cases, it can sometimes have problems
|
||||
due to non-ASCII source encodings)
|
||||
|
||||
In the common subset of Python 2 and Python 3 (with appropriate
|
||||
specification of a source encoding and definitions of the ``u()`` and ``b()``
|
||||
helper functions), they can be expressed as:
|
||||
|
||||
* ``u("")``: text string
|
||||
* ``""``: native string
|
||||
* ``b("")``: binary data
|
||||
|
||||
That last approach is the only variant that supports Python 2.5 and earlier.
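
A minimal sketch of the helper functions that last variant relies on. The ``u()`` definition follows the ``unicode-escape`` approach that the earlier text attributes to ``six``; the ``b()`` definition is an assumption made for illustration.

```python
import sys

if sys.version_info >= (3, 0):
    def u(value):
        # Bare literals are already text on Python 3.
        return value

    def b(value):
        # Latin-1 maps code points 0-255 directly onto byte values.
        return value.encode("latin-1")
else:
    def u(value):
        # Decode the native (byte) string into a unicode object.
        return unicode(value, "unicode-escape")  # Python 2 only

    def b(value):
        # A bare literal is already a byte string on Python 2.
        return value

text = u("text")     # text string
native = "native"    # native string: str on both versions
data = b("data")     # binary data
```

Only the text and binary spellings need a function call here; the native string stays a bare literal on both versions.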
|
||||
|
||||
Of all the alternatives, the format currently supported in Python 2.6 and 2.7
|
||||
is by far the cleanest. With this PEP, that format will also be supported in
|
||||
Python 3.3+. If the import hook approach works out as planned, it may even be
|
||||
supported in Python 3.1 and 3.2. A bit more effort could likely adapt the hook
|
||||
to allow the use of the ``b`` prefix on Python 2.5.
|
||||
|
||||
|
||||
The existing tools should be good enough for everyone
|
||||
-----------------------------------------------------
|
||||
|
||||
A commonly expressed sentiment from developers that have already successfully
|
||||
ported applications to Python 3 is along the lines of "if you think it's hard,
|
||||
you're doing it wrong" or "it's not that hard, just try it!". While it is no
|
||||
doubt unintentional, these responses all have the effect of telling the
|
||||
people that are pointing out inadequacies in the current porting toolset
|
||||
"there's nothing wrong with the porting tools, you just suck and don't know
|
||||
how to use them properly".
|
||||
|
||||
These responses are a case of completely missing the point of what people are
|
||||
complaining about. The feedback that resulted in this PEP isn't due to people complaining that ports aren't possible. Instead, the feedback is coming from
|
||||
people that have successfully *completed* ports and are objecting that they
|
||||
found the experience thoroughly *unpleasant* for the class of application that
|
||||
they needed to port (specifically, Unicode aware web frameworks and support
|
||||
libraries).
|
||||
|
||||
This is a subjective appraisal, and it's the reason why the Python 3
|
||||
porting tools ecosystem is a case where the "one obvious way to do it"
|
||||
philosophy emphatically does *not* apply. While it was originally intended that
|
||||
"develop in Python 2, convert with ``2to3``, test both" would be the standard
|
||||
way to develop for both versions in parallel, in practice, the needs of
|
||||
different projects and developer communities have proven to be sufficiently
|
||||
diverse that a variety of approaches have been devised, allowing each group
|
||||
to select an approach that best fits their needs.
|
||||
|
||||
Lennart Regebro has produced an excellent overview of the available migration
|
||||
strategies [2]_, and a similar review is provided in the official porting
|
||||
guide [3]_. (Note that the official guidance has softened to "it depends on
|
||||
your specific situation" since Lennart wrote his overview).
|
||||
|
||||
However, both of those guides are written from the founding assumption that
|
||||
all of the developers involved are *already* committed to the idea of
|
||||
supporting Python 3. They make no allowance for the *social* aspects of such a
|
||||
change when you're interacting with a user base that may not be especially
|
||||
tolerant of disruptions without a clear benefit, or are trying to persuade
|
||||
Python 2 focused upstream developers to accept patches that are solely about
|
||||
improving Python 3 forward compatibility.
|
||||
|
||||
With the current porting toolset, *every* migration strategy will result in
|
||||
changes to *every* Unicode literal in a project. No exceptions. They will
|
||||
be converted to either an unprefixed string literal (if the project decides to
|
||||
adopt the ``unicode_literals`` import) or else to a converter call like
|
||||
``u("text")``.
|
||||
|
||||
If the ``unicode_literals`` import approach is employed, but is not adopted
|
||||
across the entire project at the same time, then the meaning of a bare string
|
||||
literal may become annoyingly ambiguous. This problem can be particularly
|
||||
pernicious for *aggregated* software, like a Django site - in such a situation,
|
||||
some files may end up using the unicode literals import and others may not,
|
||||
creating definite potential for confusion.
|
||||
|
||||
While these problems are clearly solvable at a technical level, they're a
|
||||
completely unnecessary distraction at the social level. Developer energy should
|
||||
be reserved for addressing *real* technical difficulties associated with the
|
||||
Python 3 transition (like distinguishing their 8-bit text strings from their
|
||||
binary data). They shouldn't be punished with additional code changes (even
|
||||
automated ones) solely due to the fact that they have *already* explicitly
|
||||
identified their Unicode strings in Python 2.
|
||||
|
||||
Armin Ronacher has created an experimental extension to 2to3 which only
|
||||
modernizes Python code to the extent that it runs on Python 2.7 or later with
|
||||
support from the cross-version compatibility ``six`` library. It is available as
|
||||
``python-modernize`` [1]_. Currently, the deltas generated by this tool will
|
||||
affect every Unicode literal in the converted source. This will create
|
||||
legitimate concerns amongst upstream developers asked to accept such changes.
|
||||
|
||||
However, by eliminating the noise from changes to the Unicode literal syntax,
|
||||
many projects could be cleanly and (relatively) non-controversially made
|
||||
forward compatible with Python 3.3+ just by running ``python-modernize`` and
|
||||
applying the recommended changes.
|
||||
|
||||
|
||||
References
|
||||
|
@@ -248,9 +357,12 @@ References
|
|||
|
||||
.. [1] Python-Modernize
|
||||
(http://github.com/mitsuhiko/python-modernize)
|
||||
.. [2] Benchmark
|
||||
(https://github.com/mitsuhiko/unicode-literals-pep/blob/master/timing.py)
|
||||
|
||||
.. [2] Porting to Python 3: Migration Strategies
|
||||
(http://python3porting.com/strategies.html)
|
||||
|
||||
.. [3] Porting Python 2 Code to Python 3
|
||||
(http://docs.python.org/howto/pyporting.html)
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
|