PEP: 414
Title: Explicit Unicode Literal for Python 3.3
Version: $Revision$
Last-Modified: $Date$
Author: Armin Ronacher <armin.ronacher@active-4.com>,
        Nick Coghlan <ncoghlan@gmail.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Feb-2012
Python-Version: 3.3
Post-History: 28-Feb-2012, 04-Mar-2012
Resolution: https://mail.python.org/pipermail/python-dev/2012-February/116995.html


Abstract
========

This document proposes the reintegration of an explicit unicode literal
from Python 2.x to the Python 3.x language specification, in order to
reduce the volume of changes needed when porting Unicode-aware
Python 2 applications to Python 3.


BDFL Pronouncement
==================

This PEP has been formally accepted for Python 3.3:

    I'm accepting the PEP. It's about as harmless as they come. Make it so.


Proposal
========

This PEP proposes that Python 3.3 restore support for Python 2's Unicode
literal syntax, substantially increasing the number of lines of existing
Python 2 code in Unicode aware applications that will run without modification
on Python 3.

Specifically, the Python 3 definition for string literal prefixes will be
expanded to allow::

    "u" | "U"

in addition to the currently supported::

    "r" | "R"

The following will all denote ordinary Python 3 strings::

    'text'
    "text"
    '''text'''
    """text"""
    u'text'
    u"text"
    u'''text'''
    u"""text"""
    U'text'
    U"text"
    U'''text'''
    U"""text"""

No changes are proposed to Python 3's actual Unicode handling, only to the
acceptable forms for string literals.
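
As a minimal sketch of what this means in practice (assuming an interpreter
that implements this PEP, i.e. Python 3.3 or later; the variable names are
illustrative only), the prefixed and unprefixed forms should be
indistinguishable at runtime:

```python
# Requires Python 3.3 or later: on 3.0-3.2 the prefixed forms are a SyntaxError.
plain = 'text'
prefixed = [u'text', u"text", u'''text''', u"""text""", U'text', U"text"]

# Every prefixed form produces an ordinary str equal to the unprefixed one.
all_plain_str = all(type(s) is str and s == plain for s in prefixed)
print(all_plain_str)

# Escape sequences are handled identically with or without the prefix.
same_escapes = (u'caf\u00e9' == 'caf\u00e9')
print(same_escapes)
```

The prefix carries no information in Python 3; it only lets the same source
line keep its Python 2 spelling.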


Exclusion of "Raw" Unicode Literals
===================================

Python 2 supports a concept of "raw" Unicode literals that don't meet the
conventional definition of a raw string: ``\uXXXX`` and ``\UXXXXXXXX`` escape
sequences are still processed by the compiler and converted to the
appropriate Unicode code points when creating the associated Unicode objects.

Python 3 has no corresponding concept - the compiler performs *no*
preprocessing of the contents of raw string literals. This matches the
behaviour of 8-bit raw string literals in Python 2.

Since such strings are rarely used and would be interpreted differently in
Python 3 if permitted, it was decided that leaving them out entirely was
a better choice. Code which uses them will thus still fail immediately on
Python 3 (with a ``SyntaxError``), rather than potentially producing different
output.

To get equivalent behaviour that will run on both Python 2 and Python 3,
either an ordinary Unicode literal can be used (with appropriate additional
escaping within the string), or else string concatenation or string
formatting can be used to combine the raw portions of the string with those
that require the use of Unicode escape sequences.
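
The three alternatives can be sketched as follows. The pattern used here (a
euro sign followed by the regular expression ``\d+``) is a made-up example,
not one taken from this PEP; in Python 2 it could have been written as the
raw Unicode literal ``ur'\u20ac\d+'``:

```python
import re

# 1. Ordinary Unicode literal, with the regex backslash escaped by hand.
escaped = u'\u20ac\\d+'

# 2. Concatenate a Unicode literal with a plain raw string literal.
concatenated = u'\u20ac' + r'\d+'

# 3. Use string formatting to splice the raw portion into the text.
formatted = u'\u20ac%s' % r'\d+'

# All three spellings build the same four-character pattern.
print(escaped == concatenated == formatted)
print(re.match(escaped, u'\u20ac42') is not None)
```

Options 2 and 3 rely on the raw portion being pure ASCII when run on
Python 2, since it is implicitly decoded when combined with the Unicode part.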

Note that when using ``from __future__ import unicode_literals`` in Python 2,
the nominally "raw" Unicode string literals will process ``\uXXXX`` and
``\UXXXXXXXX`` escape sequences, just like Python 2 strings explicitly marked
with the "raw Unicode" prefix.


Author's Note
=============

This PEP was originally written by Armin Ronacher, and Guido's approval was
given based on that version.

The currently published version has been rewritten by Nick Coghlan to
include additional historical details and rationale that were taken into
account when Guido made his decision, but were not explicitly documented in
Armin's version of the PEP.

Readers should be aware that many of the arguments in this PEP are *not*
technical ones. Instead, they relate heavily to the *social* and *personal*
aspects of software development.


Rationale
=========

With the release of a Python 3 compatible version of the Web Server Gateway
Interface (WSGI) specification (:pep:`3333`) for Python 3.2, many parts of the
Python web ecosystem have been making a concerted effort to support Python 3
without adversely affecting their existing developer and user communities.

One major item of feedback from key developers in those communities, including
Chris McDonough (WebOb, Pyramid), Armin Ronacher (Flask, Werkzeug), Jacob
Kaplan-Moss (Django) and Kenneth Reitz (``requests``) is that the requirement
to change the spelling of *every* Unicode literal in an application
(regardless of how that is accomplished) is a key stumbling block for porting
efforts.

In particular, unlike many of the other Python 3 changes, it isn't one that
framework and library authors can easily handle on behalf of their users. Most
of those users couldn't care less about the "purity" of the Python language
specification; they just want their websites and applications to work as well
as possible.

While it is the Python web community that has been most vocal in highlighting
this concern, it is expected that other highly Unicode aware domains (such as
GUI development) may run into similar issues as they (and their communities)
start making concerted efforts to support Python 3.


Common Objections
=================


Complaint: This PEP may harm adoption of Python 3.2
---------------------------------------------------

This complaint is interesting, as it carries within it a tacit admission that
this PEP *will* make it easier to port Unicode aware Python 2 applications to
Python 3.

There are many existing Python communities that are prepared to put up with
the constraints imposed by the existing suite of porting tools, or to update
their Python 2 code bases sufficiently that the problems are minimised.

This PEP is not for those communities. Instead, it is designed specifically to
help people that *don't* want to put up with those difficulties.

However, since the proposal is for a comparatively small tweak to the language
syntax with no semantic changes, it is feasible to support it as a third
party import hook. While such an import hook imposes some import time
overhead, and requires additional steps from each application that needs it
to get the hook in place, it allows applications that target Python 3.2
to use libraries and frameworks that would otherwise only run on Python 3.3+
due to their use of unicode literal prefixes.

One such import hook project is Vinay Sajip's ``uprefix`` [4]_.

For those that prefer to translate their code in advance rather than
converting on the fly at import time, Armin Ronacher is working on a hook
that runs at install time rather than during import [5]_.
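
Both kinds of hook rest on the same source-to-source step: rewriting each
prefixed literal to its unprefixed form before the code is compiled. The
following is only an illustrative sketch of that step, not the code used by
``uprefix`` or by the install hook. The function name
``strip_unicode_prefixes`` is hypothetical, and the sketch assumes it is run
on a current Python 3, whose ``tokenize`` module already recognises the
prefix (a hook running on Python 3.2 itself would need its own way of
recognising such literals):

```python
import io
import tokenize


def strip_unicode_prefixes(source):
    """Return ``source`` with the u/U prefix removed from string literals."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[0] in 'uU':
            # Drop only the prefix character; quotes and contents are untouched.
            tok = tok._replace(string=tok.string[1:])
        result.append(tok)
    # untokenize() rebuilds the text from the recorded token positions.
    return tokenize.untokenize(result)


converted = strip_unicode_prefixes("greeting = u'hello' + U\" world\"\n")
print(converted)
```

Working on tokens rather than on raw text means that a ``u`` appearing inside
a string, a comment, or an identifier is left alone.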

Combining the two approaches is of course also possible. For example, the
import hook could be used for rapid edit-test cycles during local
development, but the install hook for continuous integration tasks and
deployment on Python 3.2.

The approaches described in this section may prove useful, for example, for
applications that wish to target Python 3 on the Ubuntu 12.04 LTS release,
which will ship with Python 2.7 and 3.2 as officially supported Python
versions.


Complaint: Python 3 shouldn't be made worse just to support porting from Python 2
---------------------------------------------------------------------------------

This is indeed one of the key design principles of Python 3. However, one of
the key design principles of Python as a whole is that "practicality beats
purity". If we're going to impose a significant burden on third party
developers, we should have a solid rationale for doing so.

In most cases, the rationale for backwards incompatible Python 3 changes is
either to improve code correctness (for example, stricter default separation
of binary and text data and integer division upgrading to floats when
necessary), reduce typical memory usage (for example, increased usage of
iterators and views over concrete lists), or to remove distracting nuisances
that make Python code harder to read without increasing its expressiveness
(for example, the comma based syntax for naming caught exceptions). Changes
backed by such reasoning are *not* going to be reverted, regardless of
objections from Python 2 developers attempting to make the transition to
Python 3.

In many cases, Python 2 offered two ways of doing things for historical reasons.
For example, inequality could be tested with both ``!=`` and ``<>`` and integer
literals could be specified with an optional ``L`` suffix. Such redundancies
have been eliminated in Python 3, which reduces the overall size of the
language and improves consistency across developers.

In the original Python 3 design (up to and including Python 3.2), the explicit
prefix syntax for unicode literals was deemed to fall into this category, as it
is completely unnecessary in Python 3. However, the difference between those
other cases and unicode literals is that the unicode literal prefix is *not*
redundant in Python 2 code: it is a programmatically significant distinction
that needs to be preserved in some fashion to avoid losing information.

While porting tools were created to help with the transition (see next section),
the removal still creates an additional burden on heavy users of unicode
strings in Python 2, solely so that future developers learning Python 3 don't
need to be told "For historical reasons, string literals may have an optional
``u`` or ``U`` prefix. Never use this yourselves, it's just there to help with
porting from an earlier version of the language."

Plenty of students learning Python 2 received similar warnings regarding string
exceptions without being confused or irreparably stunted in their growth as
Python developers. It will be the same with this feature.

This point is further reinforced by the fact that Python 3 *still* allows the
uppercase variants of the ``B`` and ``R`` prefixes for bytes literals and raw
bytes and string literals. If the potential for confusion due to string prefix
variants is that significant, where was the outcry asking that these
redundant prefixes be removed along with all the other redundancies that were
eliminated in Python 3?

Just as support for string exceptions was eliminated from Python 2 using the
normal deprecation process, support for redundant string prefix characters
(specifically, ``B``, ``R``, ``u``, ``U``) may eventually be eliminated
from Python 3, regardless of the current acceptance of this PEP. However,
such a change will likely only occur once third party libraries supporting
Python 2.7 are about as common as libraries supporting Python 2.2 or 2.3 are
today.


Complaint: The WSGI "native strings" concept is an ugly hack
------------------------------------------------------------

One reason the removal of unicode literals has provoked such concern amongst
the web development community is that the updated WSGI specification had to
make a few compromises to minimise the disruption for existing web servers
that provide a WSGI-compatible interface (this was deemed necessary in order
to make the updated standard a viable target for web application authors and
web framework developers).

One of those compromises is the concept of a "native string". WSGI defines
three different kinds of string:

* text strings: handled as ``unicode`` in Python 2 and ``str`` in Python 3
* native strings: handled as ``str`` in both Python 2 and Python 3
* binary data: handled as ``str`` in Python 2 and ``bytes`` in Python 3

Some developers consider WSGI's "native strings" to be an ugly hack, as they
are *explicitly* documented as being used solely for ``latin-1`` decoded
"text", regardless of the actual encoding of the underlying data. Using this
approach bypasses many of the updates to Python 3's data model that are
designed to encourage correct handling of text encodings. However, it
generally works due to the specific details of the problem domain - web server
and web framework developers are some of the individuals *most* aware of how
blurry the line can get between binary data and text when working with HTTP
and related protocols, and how important it is to understand the implications
of the encodings in use when manipulating encoded text data. At the
*application* level most of these details are hidden from the developer by
the web frameworks and support libraries (both in Python 2 *and* in Python 3).

In practice, native strings are a useful concept because there are some APIs
(both in the standard library and in third party frameworks and packages) and
some internal interpreter details that are designed primarily to work with
``str``. These components often don't support ``unicode`` in Python 2
or ``bytes`` in Python 3, or, if they do, require additional encoding details
and/or impose constraints that don't apply to the ``str`` variants.

Some examples of interfaces that are best handled by using actual ``str``
instances are:

* Python identifiers (as attributes, dict keys, class names, module names,
  import references, etc)
* URLs for the most part as well as HTTP headers in urllib/http servers
* WSGI environment keys and CGI-inherited values
* Python source code for dynamic compilation and AST hacks
* Exception messages
* ``__repr__`` return value
* preferred filesystem paths
* preferred OS environment

In Python 2.6 and 2.7, these distinctions are most naturally expressed as
follows:

* ``u""``: text string (``unicode``)
* ``""``: native string (``str``)
* ``b""``: binary data (``str``, also aliased as ``bytes``)

In Python 3, the ``latin-1`` decoded native strings are not distinguished
from any other text strings:

* ``""``: text string (``str``)
* ``""``: native string (``str``)
* ``b""``: binary data (``bytes``)

If ``from __future__ import unicode_literals`` is used to modify the behaviour
of Python 2, then, along with an appropriate definition of ``n()``, the
distinction can be expressed as:

* ``""``: text string
* ``n("")``: native string
* ``b""``: binary data

(While ``n=str`` works for simple cases, it can sometimes have problems
due to non-ASCII source encodings)
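
This PEP does not define ``n()``. One possible definition, offered purely as
a sketch, encodes explicitly instead of relying on ``str()``; the ``latin-1``
default is an assumption chosen to match WSGI's description of native
strings:

```python
from __future__ import unicode_literals

import sys

if sys.version_info[0] < 3:
    def n(text, encoding='latin-1'):
        # Python 2 with unicode_literals: unprefixed literals are ``unicode``,
        # so encode them to obtain the native 8-bit ``str``.
        return text.encode(encoding)
else:
    def n(text, encoding='latin-1'):
        # Python 3: the native string type is already ``str``.
        return text

header_name = n('Content-Type')
print(type(header_name) is str)
```

Every native string then needs a function call at its point of use, which is
part of why this spelling is less convenient than distinct literal prefixes.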

In the common subset of Python 2 and Python 3 (with appropriate
specification of a source encoding and definitions of the ``u()`` and ``b()``
helper functions), they can be expressed as:

* ``u("")``: text string
* ``""``: native string
* ``b("")``: binary data

That last approach is the only variant that supports Python 2.5 and earlier.
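
The helper functions are left undefined here. A minimal sketch, similar in
spirit to the helpers offered by compatibility libraries such as ``six`` but
not copied from any of them, might look like this (it assumes the arguments
are literals restricted to ASCII characters and escape sequences):

```python
import sys

if sys.version_info[0] < 3:
    def u(literal):
        # Python 2: the literal is an 8-bit str; decode its escapes to unicode.
        return literal.decode('unicode_escape')

    def b(literal):
        # Python 2: str is already the binary type.
        return literal
else:
    def u(literal):
        # Python 3: str is already the text type.
        return literal

    def b(literal):
        # Python 3: map each code point below 256 to a single byte.
        return literal.encode('latin-1')

text = u("caf\u00e9")    # text string
native = "status"        # native string
data = b("\x00\x01")     # binary data
```

Unlike literal prefixes, these calls are evaluated at runtime on every use,
and nothing stops them being applied to non-literal values by mistake.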

Of all the alternatives, the format currently supported in Python 2.6 and 2.7
is by far the cleanest approach that clearly distinguishes the three desired
kinds of behaviour. With this PEP, that format will also be supported in
Python 3.3+. It will also be supported in Python 3.1 and 3.2 through the use
of import and install hooks. While it is significantly less likely, it is
also conceivable that the hooks could be adapted to allow the use of the
``b`` prefix on Python 2.5.
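
As an illustration of that format, the hypothetical WSGI application below
uses all three kinds of literal and is intended to run unchanged on Python
2.6, 2.7 and 3.3+. The ``fake_start_response`` function is only a stand-in
for a real WSGI server and is not part of any library:

```python
def application(environ, start_response):
    # Native strings: the status line and the header names and values.
    status = "200 OK"
    # Text string: human-readable content, explicitly marked as Unicode.
    message = u"Caf\u00e9 is open\n"
    # Binary data: the encoded response body.
    body = b"Status: " + message.encode("utf-8")
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


# Minimal stand-in for a WSGI server, for illustration only.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


chunks = application({}, fake_start_response)
```

Each literal states its intended kind at the point where it is written,
without helper calls or per-file ``__future__`` imports.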


Complaint: The existing tools should be good enough for everyone
----------------------------------------------------------------

A commonly expressed sentiment from developers that have already successfully
ported applications to Python 3 is along the lines of "if you think it's hard,
you're doing it wrong" or "it's not that hard, just try it!". While it is no
doubt unintentional, these responses all have the effect of telling the
people that are pointing out inadequacies in the current porting toolset
"there's nothing wrong with the porting tools, you just suck and don't know
how to use them properly".

These responses are a case of completely missing the point of what people are
complaining about. The feedback that resulted in this PEP isn't due to people
complaining that ports aren't possible. Instead, the feedback is coming from
people that have successfully *completed* ports and are objecting that they
found the experience thoroughly *unpleasant* for the class of application that
they needed to port (specifically, Unicode aware web frameworks and support
libraries).

This is a subjective appraisal, and it's the reason why the Python 3
porting tools ecosystem is a case where the "one obvious way to do it"
philosophy emphatically does *not* apply. While it was originally intended that
"develop in Python 2, convert with ``2to3``, test both" would be the standard
way to develop for both versions in parallel, in practice, the needs of
different projects and developer communities have proven to be sufficiently
diverse that a variety of approaches have been devised, allowing each group
to select an approach that best fits their needs.

Lennart Regebro has produced an excellent overview of the available migration
strategies [2]_, and a similar review is provided in the official porting
guide [3]_. (Note that the official guidance has softened to "it depends on
your specific situation" since Lennart wrote his overview).

However, both of those guides are written from the founding assumption that
all of the developers involved are *already* committed to the idea of
supporting Python 3. They make no allowance for the *social* aspects of such a
change when you're interacting with a user base that may not be especially
tolerant of disruptions without a clear benefit, or are trying to persuade
Python 2 focused upstream developers to accept patches that are solely about
improving Python 3 forward compatibility.

With the current porting toolset, *every* migration strategy will result in
changes to *every* Unicode literal in a project. No exceptions. They will
be converted to either an unprefixed string literal (if the project decides to
adopt the ``unicode_literals`` import) or else to a converter call like
``u("text")``.

If the ``unicode_literals`` import approach is employed, but is not adopted
across the entire project at the same time, then the meaning of a bare string
literal may become annoyingly ambiguous. This problem can be particularly
pernicious for *aggregated* software, like a Django site - in such a situation,
some files may end up using the ``unicode_literals`` import and others may not,
creating definite potential for confusion.

While these problems are clearly solvable at a technical level, they're a
completely unnecessary distraction at the social level. Developer energy should
be reserved for addressing *real* technical difficulties associated with the
Python 3 transition (like distinguishing their 8-bit text strings from their
binary data). They shouldn't be punished with additional code changes (even
automated ones) solely due to the fact that they have *already* explicitly
identified their Unicode strings in Python 2.

Armin Ronacher has created an experimental extension to 2to3 which only
modernizes Python code to the extent that it runs on Python 2.7 or later with
support from the cross-version compatibility ``six`` library. This tool is
available as ``python-modernize`` [1]_. Currently, the deltas generated by
this tool will affect every Unicode literal in the converted source. This
will create legitimate concerns amongst upstream developers asked to accept
such changes, and amongst framework *users* being asked to change their
applications.

However, by eliminating the noise from changes to the Unicode literal syntax,
many projects could be cleanly and (comparatively) non-controversially made
forward compatible with Python 3.3+ just by running ``python-modernize`` and
applying the recommended changes.


References
==========

.. [1] Python-Modernize
   (http://github.com/mitsuhiko/python-modernize)

.. [2] Porting to Python 3: Migration Strategies
   (http://python3porting.com/strategies.html)

.. [3] Porting Python 2 Code to Python 3
   (http://docs.python.org/howto/pyporting.html)

.. [4] uprefix import hook project
   (https://bitbucket.org/vinay.sajip/uprefix)

.. [5] install hook to remove unicode string prefix characters
   (https://github.com/mitsuhiko/unicode-literals-pep/tree/master/install-hook)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: