PEP: 414
Title: Explicit Unicode Literal for Python 3.3
Version: $Revision$
Last-Modified: $Date$
Author: Armin Ronacher <armin.ronacher@active-4.com>,
        Nick Coghlan <ncoghlan@gmail.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Feb-2012
Python-Version: 3.3
Post-History: 28-Feb-2012, 04-Mar-2012
Resolution: https://mail.python.org/pipermail/python-dev/2012-February/116995.html


Abstract
========

This document proposes the reintegration of an explicit unicode literal
from Python 2.x to the Python 3.x language specification, in order to
reduce the volume of changes needed when porting Unicode-aware
Python 2 applications to Python 3.


BDFL Pronouncement
==================

This PEP has been formally accepted for Python 3.3:

    I'm accepting the PEP. It's about as harmless as they come. Make it so.


Proposal
========

This PEP proposes that Python 3.3 restore support for Python 2's Unicode
literal syntax, substantially increasing the number of lines of existing
Python 2 code in Unicode aware applications that will run without modification
on Python 3.

Specifically, the Python 3 definition for string literal prefixes will be
expanded to allow::

    "u" | "U"

in addition to the currently supported::

    "r" | "R"

The following will all denote ordinary Python 3 strings::

    'text'
    "text"
    '''text'''
    """text"""
    u'text'
    u"text"
    u'''text'''
    u"""text"""
    U'text'
    U"text"
    U'''text'''
    U"""text"""

No changes are proposed to Python 3's actual Unicode handling, only to the
acceptable forms for string literals.
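
As a quick illustration of the statement above, the following sketch (which
assumes a Python 3.3 or later interpreter; the variable names are purely
illustrative) checks that prefixed and unprefixed literals produce equal
objects of the same type:

.. code-block:: python

    plain = 'text'
    lower = u'text'
    upper = U"text"

    # All three spellings create an ordinary Python 3 str.
    assert type(plain) is type(lower) is type(upper) is str
    assert plain == lower == upper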


Exclusion of "Raw" Unicode Literals
===================================

Python 2 supports a concept of "raw" Unicode literals that don't meet the
conventional definition of a raw string: ``\uXXXX`` and ``\UXXXXXXXX`` escape
sequences are still processed by the compiler and converted to the
appropriate Unicode code points when creating the associated Unicode objects.

Python 3 has no corresponding concept - the compiler performs *no*
preprocessing of the contents of raw string literals. This matches the
behaviour of 8-bit raw string literals in Python 2.

Since such strings are rarely used and would be interpreted differently in
Python 3 if permitted, it was decided that leaving them out entirely was
a better choice. Code which uses them will thus still fail immediately on
Python 3 (with a ``SyntaxError``), rather than potentially producing different
output.

To get equivalent behaviour that will run on both Python 2 and Python 3,
either an ordinary Unicode literal can be used (with appropriate additional
escaping within the string), or else string concatenation or string
formatting can be used to combine the raw portions of the string with those
that require the use of Unicode escape sequences.
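
The two workarounds can be sketched as follows. The path-like string and the
``is_syntax_error`` helper are hypothetical and exist only for illustration;
the final check assumes the code is run on Python 3.3 or later, where the
combined prefix is rejected:

.. code-block:: python

    # Option 1: an ordinary Unicode literal with the backslashes doubled by hand.
    escaped = u'C:\\temp\\caf\u00e9'

    # Option 2: a raw portion concatenated with a portion that needs the escape.
    joined = r'C:\temp\caf' + u'\u00e9'

    assert escaped == joined


    def is_syntax_error(source):
        """Return True if compiling the expression raises SyntaxError."""
        try:
            compile(source, '<example>', 'eval')
        except SyntaxError:
            return True
        return False


    # On Python 3 the "raw Unicode" prefix fails at compile time,
    # while the plain "u" prefix is accepted.
    raw_unicode_rejected = is_syntax_error("ur'abc'")
    plain_unicode_rejected = is_syntax_error("u'abc'")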

Note that when using ``from __future__ import unicode_literals`` in Python 2,
the nominally "raw" Unicode string literals will process ``\uXXXX`` and
``\UXXXXXXXX`` escape sequences, just like Python 2 strings explicitly marked
with the "raw Unicode" prefix.


Author's Note
=============

This PEP was originally written by Armin Ronacher, and Guido's approval was
given based on that version.

The currently published version has been rewritten by Nick Coghlan to
include additional historical details and rationale that were taken into
account when Guido made his decision, but were not explicitly documented in
Armin's version of the PEP.

Readers should be aware that many of the arguments in this PEP are *not*
technical ones. Instead, they relate heavily to the *social* and *personal*
aspects of software development.


Rationale
=========

With the release of a Python 3 compatible version of the Web Services Gateway
Interface (WSGI) specification (:pep:`3333`) for Python 3.2, many parts of the
Python web ecosystem have been making a concerted effort to support Python 3
without adversely affecting their existing developer and user communities.

One major item of feedback from key developers in those communities, including
Chris McDonough (WebOb, Pyramid), Armin Ronacher (Flask, Werkzeug), Jacob
Kaplan-Moss (Django) and Kenneth Reitz (``requests``) is that the requirement
to change the spelling of *every* Unicode literal in an application
(regardless of how that is accomplished) is a key stumbling block for porting
efforts.

In particular, unlike many of the other Python 3 changes, it isn't one that
framework and library authors can easily handle on behalf of their users. Most
of those users couldn't care less about the "purity" of the Python language
specification; they just want their websites and applications to work as well
as possible.

While it is the Python web community that has been most vocal in highlighting
this concern, it is expected that other highly Unicode aware domains (such as
GUI development) may run into similar issues as they (and their communities)
start making concerted efforts to support Python 3.


Common Objections
=================


Complaint: This PEP may harm adoption of Python 3.2
---------------------------------------------------

This complaint is interesting, as it carries within it a tacit admission that
this PEP *will* make it easier to port Unicode aware Python 2 applications to
Python 3.

There are many existing Python communities that are prepared to put up with
the constraints imposed by the existing suite of porting tools, or to update
their Python 2 code bases sufficiently that the problems are minimised.

This PEP is not for those communities. Instead, it is designed specifically to
help people that *don't* want to put up with those difficulties.

However, since the proposal is for a comparatively small tweak to the language
syntax with no semantic changes, it is feasible to support it as a third
party import hook. While such an import hook imposes some import time
overhead, and requires additional steps from each application that needs it
to get the hook in place, it allows applications that target Python 3.2
to use libraries and frameworks that would otherwise only run on Python 3.3+
due to their use of unicode literal prefixes.

One such import hook project is Vinay Sajip's ``uprefix`` [4]_.

For those that prefer to translate their code in advance rather than
converting on the fly at import time, Armin Ronacher is working on a hook
that runs at install time rather than during import [5]_.
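
The source translation that either kind of hook needs to perform can be
sketched with the standard ``tokenize`` module. This is only an illustration
and is not taken from either of the projects mentioned above: the function
name ``strip_unicode_prefixes`` is hypothetical, and the sketch assumes it is
run on Python 3.3 or later, where the tokenizer already recognises the prefix
(a real hook targeting Python 3.2 would need its own handling for that):

.. code-block:: python

    import io
    import tokenize


    def strip_unicode_prefixes(source):
        """Return source with the u/U prefix removed from string literals."""
        tokens = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            # A STRING token starts with either a quote or a prefix letter.
            if tok.type == tokenize.STRING and tok.string[0] in 'uU':
                tok = tok._replace(string=tok.string[1:])
            tokens.append(tok)
        return tokenize.untokenize(tokens)


    converted = strip_unicode_prefixes("x = u'caf\\xe9' + U\"!\"\n")

Working at the token level, rather than with a regular expression over the
raw text, avoids touching occurrences of ``u'`` that appear inside comments
or inside other string literals.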

Combining the two approaches is of course also possible. For example, the
import hook could be used for rapid edit-test cycles during local
development, but the install hook for continuous integration tasks and
deployment on Python 3.2.

The approaches described in this section may prove useful, for example, for
applications that wish to target Python 3 on the Ubuntu 12.04 LTS release,
which will ship with Python 2.7 and 3.2 as officially supported Python
versions.

Complaint: Python 3 shouldn't be made worse just to support porting from Python 2
---------------------------------------------------------------------------------

This is indeed one of the key design principles of Python 3. However, one of
the key design principles of Python as a whole is that "practicality beats
purity". If we're going to impose a significant burden on third party
developers, we should have a solid rationale for doing so.

In most cases, the rationale for backwards incompatible Python 3 changes is
either to improve code correctness (for example, stricter default separation
of binary and text data and integer division upgrading to floats when
necessary), reduce typical memory usage (for example, increased usage of
iterators and views over concrete lists), or to remove distracting nuisances
that make Python code harder to read without increasing its expressiveness
(for example, the comma based syntax for naming caught exceptions). Changes
backed by such reasoning are *not* going to be reverted, regardless of
objections from Python 2 developers attempting to make the transition to
Python 3.

In many cases, Python 2 offered two ways of doing things for historical reasons.
For example, inequality could be tested with both ``!=`` and ``<>`` and integer
literals could be specified with an optional ``L`` suffix. Such redundancies
have been eliminated in Python 3, which reduces the overall size of the
language and improves consistency across developers.

In the original Python 3 design (up to and including Python 3.2), the explicit
prefix syntax for unicode literals was deemed to fall into this category, as it
is completely unnecessary in Python 3. However, the difference between those
other cases and unicode literals is that the unicode literal prefix is *not*
redundant in Python 2 code: it is a programmatically significant distinction
that needs to be preserved in some fashion to avoid losing information.

While porting tools were created to help with the transition (see next
section), the removal still creates an additional burden on heavy users of
unicode strings in
Python 2, solely so that future developers learning Python 3 don't need to be
told "For historical reasons, string literals may have an optional ``u`` or
``U`` prefix. Never use this yourselves, it's just there to help with porting
from an earlier version of the language."

Plenty of students learning Python 2 received similar warnings regarding string
exceptions without being confused or irreparably stunted in their growth as
Python developers. It will be the same with this feature.

This point is further reinforced by the fact that Python 3 *still* allows the
uppercase variants of the ``B`` and ``R`` prefixes for bytes literals and raw
bytes and string literals. If the potential for confusion due to string prefix
variants is that significant, where was the outcry asking that these
redundant prefixes be removed along with all the other redundancies that were
eliminated in Python 3?

Just as support for string exceptions was eliminated from Python 2 using the
normal deprecation process, support for redundant string prefix characters
(specifically, ``B``, ``R``, ``u``, ``U``) may eventually be eliminated
from Python 3, regardless of the current acceptance of this PEP. However,
such a change will likely only occur once third party libraries supporting
Python 2.7 are about as common as libraries supporting Python 2.2 or 2.3 are
today.


Complaint: The WSGI "native strings" concept is an ugly hack
------------------------------------------------------------

One reason the removal of unicode literals has provoked such concern amongst
the web development community is that the updated WSGI specification had to
make a few compromises to minimise the disruption for existing web servers
that provide a WSGI-compatible interface (this was deemed necessary in order
to make the updated standard a viable target for web application authors and
web framework developers).

One of those compromises is the concept of a "native string". WSGI defines
three different kinds of string:

* text strings: handled as ``unicode`` in Python 2 and ``str`` in Python 3
* native strings: handled as ``str`` in both Python 2 and Python 3
* binary data: handled as ``str`` in Python 2 and ``bytes`` in Python 3

Some developers consider WSGI's "native strings" to be an ugly hack, as they
are *explicitly* documented as being used solely for ``latin-1`` decoded
"text", regardless of the actual encoding of the underlying data. Using this
approach bypasses many of the updates to Python 3's data model that are
designed to encourage correct handling of text encodings. However, it
generally works due to the specific details of the problem domain - web server
and web framework developers are some of the individuals *most* aware of how
blurry the line can get between binary data and text when working with HTTP
and related protocols, and how important it is to understand the implications
of the encodings in use when manipulating encoded text data. At the
*application* level most of these details are hidden from the developer by
the web frameworks and support libraries (both in Python 2 *and* in Python 3).
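
A small sketch of the ``latin-1`` convention described above may help. The
variable names are illustrative and the sample value is an assumption; the
point is only that decoding with ``latin-1`` never fails and is fully
reversible, so the original bytes can be recovered and decoded properly once
the real encoding is known:

.. code-block:: python

    # Bytes as they might arrive from the network, actually UTF-8 encoded.
    raw_bytes = u'caf\u00e9'.encode('utf-8')

    # The "native string" form: every byte maps to exactly one code point.
    native = raw_bytes.decode('latin-1')
    assert native != u'caf\u00e9'          # not yet meaningful text

    # Reversing the latin-1 step gives back the original bytes unchanged,
    # which can then be decoded using the real encoding.
    recovered = native.encode('latin-1').decode('utf-8')
    assert recovered == u'caf\u00e9'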

In practice, native strings are a useful concept because there are some APIs
(both in the standard library and in third party frameworks and packages) and
some internal interpreter details that are designed primarily to work with
``str``. These components often don't support ``unicode`` in Python 2
or ``bytes`` in Python 3, or, if they do, require additional encoding details
and/or impose constraints that don't apply to the ``str`` variants.

Some examples of interfaces that are best handled by using actual ``str``
instances are:

* Python identifiers (as attributes, dict keys, class names, module names,
  import references, etc)
* URLs for the most part as well as HTTP headers in urllib/http servers
* WSGI environment keys and CGI-inherited values
* Python source code for dynamic compilation and AST hacks
* Exception messages
* ``__repr__`` return value
* preferred filesystem paths
* preferred OS environment

In Python 2.6 and 2.7, these distinctions are most naturally expressed as
follows:

* ``u""``: text string (``unicode``)
* ``""``: native string (``str``)
* ``b""``: binary data (``str``, also aliased as ``bytes``)

In Python 3, the ``latin-1`` decoded native strings are not distinguished
from any other text strings:

* ``""``: text string (``str``)
* ``""``: native string (``str``)
* ``b""``: binary data (``bytes``)

If ``from __future__ import unicode_literals`` is used to modify the behaviour
of Python 2, then, along with an appropriate definition of ``n()``, the
distinction can be expressed as:

* ``""``: text string
* ``n("")``: native string
* ``b""``: binary data

(While ``n=str`` works for simple cases, it can sometimes have problems
due to non-ASCII source encodings.)
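
One possible definition of ``n()`` is sketched below. It is an assumption
made for illustration rather than a definition taken from any particular
library, and the ``latin-1`` default is chosen only to match the WSGI
convention discussed above:

.. code-block:: python

    import sys


    def n(value, encoding='latin-1'):
        """Return value as the interpreter's native str type (hypothetical)."""
        if isinstance(value, str):
            return value                   # already a native string
        if sys.version_info[0] < 3:
            return value.encode(encoding)  # Python 2: unicode -> 8-bit str
        return value.decode(encoding)      # Python 3: bytes -> text str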

In the common subset of Python 2 and Python 3 (with appropriate
specification of a source encoding and definitions of the ``u()`` and ``b()``
helper functions), they can be expressed as:

* ``u("")``: text string
* ``""``: native string
* ``b("")``: binary data

That last approach is the only variant that supports Python 2.5 and earlier.
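
The ``u()`` and ``b()`` helpers referred to above might look something like
the following sketch. The definitions are assumptions for illustration
(compatibility libraries provide their own variants), and they expect to be
passed plain native string literals:

.. code-block:: python

    import sys

    if sys.version_info[0] >= 3:
        def u(literal):
            # Native strings are already text on Python 3.
            return literal

        def b(literal):
            # Map each code point below 256 to the byte with the same value.
            return literal.encode('latin-1')
    else:
        def u(literal):
            # Interpret escape sequences and produce a unicode object.
            return literal.decode('unicode_escape')

        def b(literal):
            # Native strings are already 8-bit data on Python 2.
            return literal

Every call goes through a function at run time, which is part of the reason
this spelling is noticeably more intrusive than a literal prefix.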

Of all the alternatives, the format currently supported in Python 2.6 and 2.7
is by far the cleanest approach that clearly distinguishes the three desired
kinds of behaviour. With this PEP, that format will also be supported in
Python 3.3+. It will also be supported in Python 3.1 and 3.2 through the use
of import and install hooks. While it is significantly less likely, it is
also conceivable that the hooks could be adapted to allow the use of the
``b`` prefix on Python 2.5.


Complaint: The existing tools should be good enough for everyone
----------------------------------------------------------------

A commonly expressed sentiment from developers that have already successfully
ported applications to Python 3 is along the lines of "if you think it's hard,
you're doing it wrong" or "it's not that hard, just try it!". While it is no
doubt unintentional, these responses all have the effect of telling the
people that are pointing out inadequacies in the current porting toolset
"there's nothing wrong with the porting tools, you just suck and don't know
how to use them properly".

These responses are a case of completely missing the point of what people are
complaining about. The feedback that resulted in this PEP isn't due to people
complaining that ports aren't possible. Instead, the feedback is coming from
people that have successfully *completed* ports and are objecting that they
found the experience thoroughly *unpleasant* for the class of application that
they needed to port (specifically, Unicode aware web frameworks and support
libraries).

This is a subjective appraisal, and it's the reason why the Python 3
porting tools ecosystem is a case where the "one obvious way to do it"
philosophy emphatically does *not* apply. While it was originally intended that
"develop in Python 2, convert with ``2to3``, test both" would be the standard
way to develop for both versions in parallel, in practice, the needs of
different projects and developer communities have proven to be sufficiently
diverse that a variety of approaches have been devised, allowing each group
to select an approach that best fits their needs.

Lennart Regebro has produced an excellent overview of the available migration
strategies [2]_, and a similar review is provided in the official porting
guide [3]_. (Note that the official guidance has softened to "it depends on
your specific situation" since Lennart wrote his overview).

However, both of those guides are written from the founding assumption that
all of the developers involved are *already* committed to the idea of
supporting Python 3. They make no allowance for the *social* aspects of such a
change when you're interacting with a user base that may not be especially
tolerant of disruptions without a clear benefit, or are trying to persuade
Python 2 focused upstream developers to accept patches that are solely about
improving Python 3 forward compatibility.

With the current porting toolset, *every* migration strategy will result in
changes to *every* Unicode literal in a project. No exceptions. They will
be converted to either an unprefixed string literal (if the project decides to
adopt the ``unicode_literals`` import) or else to a converter call like
``u("text")``.

If the ``unicode_literals`` import approach is employed, but is not adopted
across the entire project at the same time, then the meaning of a bare string
literal may become annoyingly ambiguous. This problem can be particularly
pernicious for *aggregated* software, like a Django site - in such a situation,
some files may end up using the ``unicode_literals`` import and others may not,
creating definite potential for confusion.

While these problems are clearly solvable at a technical level, they're a
completely unnecessary distraction at the social level. Developer energy should
be reserved for addressing *real* technical difficulties associated with the
Python 3 transition (like distinguishing their 8-bit text strings from their
binary data). They shouldn't be punished with additional code changes (even
automated ones) solely due to the fact that they have *already* explicitly
identified their Unicode strings in Python 2.

Armin Ronacher has created an experimental extension to 2to3 which only
modernizes Python code to the extent that it runs on Python 2.7 or later with
support from the cross-version compatibility ``six`` library. This tool is
available as ``python-modernize`` [1]_. Currently, the deltas generated by
this tool will affect every Unicode literal in the converted source. This
will create legitimate concerns amongst upstream developers asked to accept
such changes, and amongst framework *users* being asked to change their
applications.

However, by eliminating the noise from changes to the Unicode literal syntax,
many projects could be cleanly and (comparatively) non-controversially made
forward compatible with Python 3.3+ just by running ``python-modernize`` and
applying the recommended changes.


References
==========

.. [1] Python-Modernize
   (http://github.com/mitsuhiko/python-modernize)

.. [2] Porting to Python 3: Migration Strategies
   (http://python3porting.com/strategies.html)

.. [3] Porting Python 2 Code to Python 3
   (http://docs.python.org/howto/pyporting.html)

.. [4] uprefix import hook project
   (https://bitbucket.org/vinay.sajip/uprefix)

.. [5] install hook to remove unicode string prefix characters
   (https://github.com/mitsuhiko/unicode-literals-pep/tree/master/install-hook)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
+								PEP: 414
 								Title: Explicit Unicode Literal for Python 3.3
 								Version: $Revision$
 								Last-Modified: $Date$
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
+								Author: Armin Ronacher <armin.ronacher@active-4.com>,
 								        Nick Coghlan <ncoghlan@gmail.com>
-												Mark PEP 414 as Final and incorporate feedback (both public and private)

											
										
										
											2012-03-05 07:56:18 -05:00
+								Status: Final
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
+								Type: Standards Track
 								Content-Type: text/x-rst
 								Created: 15-Feb-2012
-												Lint: Add missing Python-Version header in PEPs (#2757)


											
										
										
											2022-08-24 18:40:18 -04:00
+								Python-Version: 3.3
-												Update post history for PEP 414

											
										
										
											2012-03-04 02:58:04 -05:00
+								Post-History: 28-Feb-2012, 04-Mar-2012
-												promote m.p.o links to https

											
										
										
											2017-06-11 15:02:39 -04:00
+								Resolution: https://mail.python.org/pipermail/python-dev/2012-February/116995.html
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
 								Abstract
 								========
 								This document proposes the reintegration of an explicit unicode literal
 								from Python 2.x to the Python 3.x language specification, in order to
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
+								reduce the volume of changes needed when porting Unicode-aware
 								Python 2 applications to Python 3.
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
-												pep-0414 -> accepted and added two things brought up on the mailinglist

											
										
										
											2012-02-28 03:18:52 -05:00
+								BDFL Pronouncement
 								==================
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
+								This PEP has been formally accepted for Python 3.3:
 								    I'm accepting the PEP. It's about as harmless as they come. Make it so.
 								Proposal
 								========
 								This PEP proposes that Python 3.3 restore support for Python 2's Unicode
 								literal syntax, substantially increasing the number of lines of existing
 								Python 2 code in Unicode aware applications that will run without modification
 								on Python 3.
 								Specifically, the Python 3 definition for string literal prefixes will be
 								expanded to allow::
-												Update PEP 414 to record the exclusion of raw Unicode literals from the scope

											
										
										
											2012-06-20 07:45:58 -04:00
+								    "u" | "U"
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
-												#14914: fix typo.

											
										
										
											2012-03-04 09:54:24 -05:00
+								in addition to the currently supported::
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
 								    "r" | "R"
 								The following will all denote ordinary Python 3 strings::
 								    'text'
 								    "text"
 								    '''text'''
 								    """text"""
 								    u'text'
 								    u"text"
 								    u'''text'''
 								    u"""text"""
 								    U'text'
 								    U"text"
 								    U'''text'''
 								    U"""text"""
 								No changes are proposed to Python 3's actual Unicode handling, only to the
 								acceptable forms for string literals.
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
-												Update PEP 414 to record the exclusion of raw Unicode literals from the scope

											
										
										
											2012-06-20 07:45:58 -04:00
+								Exclusion of "Raw" Unicode Literals
 								===================================
 								Python 2 supports a concept of "raw" Unicode literals that don't meet the
-												Fix typos.

											
										
										
											2012-10-23 05:56:24 -04:00
+								conventional definition of a raw string: ``\uXXXX`` and ``\UXXXXXXXX`` escape
-												Update PEP 414 to record the exclusion of raw Unicode literals from the scope

											
										
										
											2012-06-20 07:45:58 -04:00
+								sequences are still processed by the compiler and converted to the
 								appropriate Unicode code points when creating the associated Unicode objects.
 								Python 3 has no corresponding concept - the compiler performs *no*
 								preprocessing of the contents of raw string literals. This matches the
 								behaviour of 8-bit raw string literals in Python 2.
 								Since such strings are rarely used and would be interpreted differently in
 								Python 3 if permitted, it was decided that leaving them out entirely was
 								a better choice. Code which uses them will thus still fail immediately on
 								Python 3 (with a Syntax Error), rather than potentially producing different
 								output.
 								To get equivalent behaviour that will run on both Python 2 and Python 3,
 								either an ordinary Unicode literal can be used (with appropriate additional
 								escaping within the string), or else string concatenation or string
 								formatting can be combine the raw portions of the string with those that
 								require the use of Unicode escape sequences.
 								Note that when using ``from __future__ import unicode_literals`` in Python 2,
 								the nominally "raw" Unicode string literals will process ``\uXXXX`` and
 								``\UXXXXXXXX`` escape sequences, just like Python 2 strings explicitly marked
 								with the "raw Unicode" prefix.
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
+								Author's Note
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
+								=============
-												Mark PEP 414 as Final and incorporate feedback (both public and private)

											
										
										
											2012-03-05 07:56:18 -05:00
+								This PEP was originally written by Armin Ronacher, and Guido's approval was
 								given based on that version.
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
-												Mark PEP 414 as Final and incorporate feedback (both public and private)

											
										
										
											2012-03-05 07:56:18 -05:00
+								The currently published version has been rewritten by Nick Coghlan to
 								include additional historical details and rationale that were taken into
 								account when Guido made his decision, but were not explicitly documented in
 								Armin's version of the PEP.
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
 								Readers should be aware that many of the arguments in this PEP are *not*
 								technical ones. Instead, they relate heavily to the *social* and *personal*
-												Mark PEP 414 as Final and incorporate feedback (both public and private)

											
										
										
											2012-03-05 07:56:18 -05:00
+								aspects of software development.
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
 								Rationale
 								=========
 								With the release of a Python 3 compatible version of the Web Services Gateway
-												Several PEPs: Use explicit `:pep:` and `:rfc:` roles (#2209)


											
										
										
											2022-01-21 06:03:51 -05:00
+								Interface (WSGI) specification (:pep:`3333`) for Python 3.2, many parts of the
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
+								Python web ecosystem have been making a concerted effort to support Python 3
 								without adversely affecting their existing developer and user communities.
 								One major item of feedback from key developers in those communities, including
 								Chris McDonough (WebOb, Pyramid), Armin Ronacher (Flask, Werkzeug), Jacob
 								Kaplan-Moss (Django) and Kenneth Reitz (``requests``) is that the requirement
 								to change the spelling of *every* Unicode literal in an application
 								(regardless of how that is accomplished) is a key stumbling block for porting
 								efforts.
-												Added unicode literals pepe

											
										
										
											2012-02-25 13:52:45 -05:00
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
+								In particular, unlike many of the other Python 3 changes, it isn't one that
 								framework and library authors can easily handle on behalf of their users. Most
 								of those users couldn't care less about the "purity" of the Python language
 								specification, they just want their websites and applications to work as well
 								as possible.
 								While it is the Python web community that has been most vocal in highlighting
 								this concern, it is expected that other highly Unicode aware domains (such as
 								GUI development) may run into similar issues as they (and their communities)
 								start making concerted efforts to support Python 3.
 								Common Objections
 								=================
-												Make it clear that the objection headings are paraphrased versions of the complaints made about the PEP

											
										
										
											2012-03-04 02:48:49 -05:00
+								Complaint: This PEP may harm adoption of Python 3.2
 								---------------------------------------------------
-												Rewrite PEP 414 to be less passionate in its tone and better address the common objections

											
										
										
											2012-03-04 02:24:43 -05:00
 								This complaint is interesting, as it carries within it a tacit admission that
 								this PEP *will* make it easier to port Unicode aware Python 2 applications to
 								Python 3.
 								There are many existing Python communities that are prepared to put up with
 								the constraints imposed by the existing suite of porting tools, or to update
 								their Python 2 code bases sufficiently that the problems are minimised.
 								This PEP is not for those communities. Instead, it is designed specifically to
 								help people that *don't* want to put up with those difficulties.
 								However, since the proposal is for a comparatively small tweak to the language
-												Mark PEP 414 as Final and incorporate feedback (both public and private)

											
										
										
											2012-03-05 07:56:18 -05:00
+								syntax with no semantic changes, it is feasible to support it as a third
 								party import hook. While such an import hook imposes some import time
 								overhead, and requires additional steps from each application that needs it
 								to get the hook in place, it allows applications that target Python 3.2
 								to use libraries and frameworks that would otherwise only run on Python 3.3+
 								due to their use of unicode literal prefixes.
 								One such import hook project is Vinay Sajip's ``uprefix`` [4]_.
 								For those that prefer to translate their code in advance rather than
 								converting on the fly at import time, Armin Ronacher is working on a hook
 								that runs at install time rather than during import [5]_.
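
 								As a rough sketch of the kind of source translation such hooks perform
 								(this is not the actual code of ``uprefix`` or of the install hook, and the
 								function name ``strip_u_prefixes`` is hypothetical), the standard
 								``tokenize`` module can drop the ``u``/``U`` prefix from string literals.
 								The sketch assumes it runs on an interpreter whose tokenizer already accepts
 								the prefix, such as Python 3.3+ at build time::

 								    import io
 								    import tokenize


 								    def strip_u_prefixes(source):
 								        """Return *source* with any u/U prefix removed from string literals."""
 								        tokens = []
 								        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
 								            text = tok.string
 								            if tok.type == tokenize.STRING and text[:1] in ("u", "U"):
 								                text = text[1:]  # u'text' -> 'text'
 								            # Two-element tuples put untokenize() in its compatibility
 								            # mode: spacing may change, but the token stream does not.
 								            tokens.append((tok.type, text))
 								        return tokenize.untokenize(tokens)

 								Working on tokens rather than on raw text means that names, comments and
 								the contents of other string literals are left untouched.
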
 								Combining the two approaches is of course also possible. For example, the
 								import hook could be used for rapid edit-test cycles during local
 								development, but the install hook for continuous integration tasks and
 								deployment on Python 3.2.
 								The approaches described in this section may prove useful, for example, for
 								applications that wish to target Python 3 on the Ubuntu 12.04 LTS release,
 								which will ship with Python 2.7 and 3.2 as officially supported Python
 								versions.
 								Complaint: Python 3 shouldn't be made worse just to support porting from Python 2
 								---------------------------------------------------------------------------------
 								This is indeed one of the key design principles of Python 3. However, one of
 								the key design principles of Python as a whole is that "practicality beats
 								purity". If we're going to impose a significant burden on third party
 								developers, we should have a solid rationale for doing so.
 								In most cases, the rationale for backwards incompatible Python 3 changes is
 								either to improve code correctness (for example, stricter default separation
 								of binary and text data and integer division upgrading to floats when
 								necessary), reduce typical memory usage (for example, increased usage of
 								iterators and views over concrete lists), or to remove distracting nuisances
 								that make Python code harder to read without increasing its expressiveness
 								(for example, the comma based syntax for naming caught exceptions). Changes
 								backed by such reasoning are *not* going to be reverted, regardless of
 								objections from Python 2 developers attempting to make the transition to
 								Python 3.
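
 								For readers less familiar with the changes mentioned above, the following
 								short snippet (written for Python 3, with purely illustrative names) shows
 								each of them in action::

 								    # Integer division upgrades to float when necessary; // still floors.
 								    assert 7 / 2 == 3.5
 								    assert 7 // 2 == 3

 								    # Binary data and text no longer mix implicitly.
 								    try:
 								        b"abc" + "def"
 								    except TypeError as exc:  # "as" replaces the comma based syntax
 								        mixing_error = exc

 								    # Views and iterators replace many concrete lists.
 								    squares = {n: n * n for n in range(3)}
 								    keys = squares.keys()     # a view, not a list
 								    squares[3] = 9
 								    assert 3 in keys          # the view reflects the later insertion
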
 								In many cases, Python 2 offered two ways of doing things for historical reasons.
 								For example, inequality could be tested with both ``!=`` and ``<>`` and integer
 								literals could be specified with an optional ``L`` suffix. Such redundancies
 								have been eliminated in Python 3, which reduces the overall size of the
 								language and improves consistency across developers.
 								In the original Python 3 design (up to and including Python 3.2), the explicit
 								prefix syntax for unicode literals was deemed to fall into this category, as it
 								is completely unnecessary in Python 3. However, the difference between those
 								other cases and unicode literals is that the unicode literal prefix is *not*
 								redundant in Python 2 code: it is a programmatically significant distinction
 								that needs to be preserved in some fashion to avoid losing information.
 								While porting tools were created to help with the transition (see the section
 								on existing tools below), removing the prefix still creates an additional
 								burden on heavy users of unicode strings in
 								Python 2, solely so that future developers learning Python 3 don't need to be
 								told "For historical reasons, string literals may have an optional ``u`` or
 								``U`` prefix. Never use this yourselves, it's just there to help with porting
 								from an earlier version of the language."
 								Plenty of students learning Python 2 received similar warnings regarding string
 								exceptions without being confused or irreparably stunted in their growth as
 								Python developers. It will be the same with this feature.
 								This point is further reinforced by the fact that Python 3 *still* allows the
 								uppercase variants of the ``B`` and ``R`` prefixes for bytes literals and raw
 								bytes and string literals. If the potential for confusion due to string prefix
 								variants is that significant, where was the outcry asking that these
 								redundant prefixes be removed along with all the other redundancies that were
 								eliminated in Python 3?
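
 								The redundancy referred to here is easy to confirm on any Python 3 release.
 								A minimal check::

 								    # Upper and lower case prefixes produce identical objects.
 								    assert B"data" == b"data"
 								    assert R"\d+" == r"\d+"
 								    assert BR"\x00" == br"\x00"
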
 								Just as support for string exceptions was eliminated from Python 2 using the
 								normal deprecation process, support for redundant string prefix characters
 								(specifically, ``B``, ``R``, ``u``, ``U``) may eventually be eliminated
 								from Python 3, regardless of the current acceptance of this PEP. However,
 								such a change will likely only occur once third party libraries supporting
 								Python 2.7 are about as common as libraries supporting Python 2.2 or 2.3 are
 								today.
 								Complaint: The WSGI "native strings" concept is an ugly hack
 								------------------------------------------------------------
 								One reason the removal of unicode literals has provoked such concern amongst
 								the web development community is that the updated WSGI specification had to
 								make a few compromises to minimise the disruption for existing web servers
 								that provide a WSGI-compatible interface (this was deemed necessary in order
 								to make the updated standard a viable target for web application authors and
 								web framework developers).
 								One of those compromises is the concept of a "native string". WSGI defines
 								three different kinds of string:
 								* text strings: handled as ``unicode`` in Python 2 and ``str`` in Python 3
 								* native strings: handled as ``str`` in both Python 2 and Python 3
 								* binary data: handled as ``str`` in Python 2 and ``bytes`` in Python 3
 								Some developers consider WSGI's "native strings" to be an ugly hack, as they
 								are *explicitly* documented as being used solely for ``latin-1`` decoded
 								"text", regardless of the actual encoding of the underlying data. Using this
 								approach bypasses many of the updates to Python 3's data model that are
 								designed to encourage correct handling of text encodings. However, it
 								generally works due to the specific details of the problem domain - web server
 								and web framework developers are some of the individuals *most* aware of how
 								blurry the line can get between binary data and text when working with HTTP
 								and related protocols, and how important it is to understand the implications
 								of the encodings in use when manipulating encoded text data. At the
 								*application* level most of these details are hidden from the developer by
 								the web frameworks and support libraries (both in Python 2 *and* in Python 3).
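
 								To make the ``latin-1`` convention concrete, the following sketch (Python 3;
 								the path value and variable names are hypothetical) shows how UTF-8 encoded
 								bytes received from a client end up as a native string, and how a framework
 								can recover the real text::

 								    # Bytes as they might arrive on the wire: a UTF-8 encoded URL path.
 								    wire_bytes = "/caf\u00e9".encode("utf-8")

 								    # A server exposes them as a native string by decoding as latin-1.
 								    # This never fails, since every byte maps to a code point, but the
 								    # result is mojibake if the real encoding was something else.
 								    native = wire_bytes.decode("latin-1")

 								    # Reversing the latin-1 step recovers the original bytes, which can
 								    # then be decoded with the encoding the application expects.
 								    text = native.encode("latin-1").decode("utf-8")

 								The native string carries the bytes losslessly, but it is only meaningful as
 								text once the correct encoding has been applied.
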
 								In practice, native strings are a useful concept because there are some APIs
 								(both in the standard library and in third party frameworks and packages) and
 								some internal interpreter details that are designed primarily to work with
 								``str``. These components often don't support ``unicode`` in Python 2
 								or ``bytes`` in Python 3, or, if they do, require additional encoding details
 								and/or impose constraints that don't apply to the ``str`` variants.
 								Some examples of interfaces that are best handled by using actual ``str``
 								instances are:
 								* Python identifiers (as attributes, dict keys, class names, module names,
 								  import references, etc.)
 								* URLs for the most part as well as HTTP headers in urllib/http servers
 								* WSGI environment keys and CGI-inherited values
 								* Python source code for dynamic compilation and AST hacks
 								* Exception messages
 								* ``__repr__`` return value
 								* preferred filesystem paths
 								* preferred OS environment
 								In Python 2.6 and 2.7, these distinctions are most naturally expressed as
 								follows:
 								* ``u""``: text string (``unicode``)
 								* ``""``: native string (``str``)
 								* ``b""``: binary data (``str``, also aliased as ``bytes``)
 								In Python 3, the ``latin-1`` decoded native strings are not distinguished
 								from any other text strings:
 								* ``""``: text string (``str``)
 								* ``""``: native string (``str``)
 								* ``b""``: binary data (``bytes``)
 								If ``from __future__ import unicode_literals`` is used to modify the behaviour
 								of Python 2, then, along with an appropriate definition of ``n()``, the
 								distinction can be expressed as:
 								* ``""``: text string
 								* ``n("")``: native string
 								* ``b""``: binary data
 								(While ``n=str`` works for simple cases, it can sometimes have problems
 								due to non-ASCII source encodings.)
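
 								This document does not spell out a definition of ``n()``. One possible
 								sketch, assuming it only ever receives literals whose characters fit in
 								``latin-1``, is::

 								    import sys

 								    if sys.version_info[0] >= 3:
 								        def n(s):
 								            # str is already the native string type.
 								            return s
 								    else:
 								        def n(s):
 								            # With unicode_literals in effect the literal is a unicode
 								            # object, so encode it back to the 8-bit native str type.
 								            return s.encode("latin-1")
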
 								In the common subset of Python 2 and Python 3 (with appropriate
 								specification of a source encoding and definitions of the ``u()`` and ``b()``
 								helper functions), they can be expressed as:
 								* ``u("")``: text string
 								* ``""``: native string
 								* ``b("")``: binary data
 								That last approach is the only variant that supports Python 2.5 and earlier.
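
 								This document does not define ``u()`` and ``b()``. The sketch below is
 								similar in spirit to the helpers found in compatibility libraries such as
 								``six``, but it is an illustration rather than any library's actual code::

 								    import sys

 								    if sys.version_info[0] >= 3:
 								        def u(s):
 								            # Unprefixed literals are already text.
 								            return s

 								        def b(s):
 								            # Unprefixed literals are text, so convert to bytes.
 								            return s.encode("latin-1")
 								    else:
 								        def u(s):
 								            # Unprefixed literals are 8-bit strings; decode to unicode,
 								            # interpreting \uXXXX escapes written in the literal.
 								            return s.decode("unicode_escape")

 								        def b(s):
 								            # Unprefixed literals are already 8-bit strings.
 								            return s

 								Every text or binary literal then costs a function call at runtime, which
 								is part of why this spelling is seen as less attractive than prefixes.
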
 								Of all the alternatives, the format currently supported in Python 2.6 and 2.7
 								is by far the cleanest approach that clearly distinguishes the three desired
 								kinds of behaviour. With this PEP, that format will also be supported in
 								Python 3.3+. It will also be supported in Python 3.1 and 3.2 through the use
 								of import and install hooks. While it is significantly less likely, it is
 								also conceivable that the hooks could be adapted to allow the use of the
 								``b`` prefix on Python 2.5.
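
 								On Python 3.3 or later, the Python 2.6/2.7 literal forms discussed above can
 								therefore be used unchanged. A small illustrative check (the variable names
 								and values are hypothetical)::

 								    text = u"caf\u00e9"        # text string
 								    native = "Content-Type"    # native string
 								    binary = b"\x00\x01"       # binary data

 								    # On Python 3 the first two are the same type; only bytes differs.
 								    assert type(text) is str
 								    assert type(native) is str
 								    assert type(binary) is bytes
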
 								Complaint: The existing tools should be good enough for everyone
 								----------------------------------------------------------------
 								A commonly expressed sentiment from developers that have already successfully
 								ported applications to Python 3 is along the lines of "if you think it's hard,
 								you're doing it wrong" or "it's not that hard, just try it!". While it is no
 								doubt unintentional, these responses all have the effect of telling the
 								people that are pointing out inadequacies in the current porting toolset
 								"there's nothing wrong with the porting tools, you just suck and don't know
 								how to use them properly".
 								These responses are a case of completely missing the point of what people are
 								complaining about. The feedback that resulted in this PEP isn't due to people
 								complaining that ports aren't possible. Instead, the feedback is coming from
 								people that have successfully *completed* ports and are objecting that they
 								found the experience thoroughly *unpleasant* for the class of application that
 								they needed to port (specifically, Unicode aware web frameworks and support
 								libraries).
 								This is a subjective appraisal, and it's the reason why the Python 3
 								porting tools ecosystem is a case where the "one obvious way to do it"
 								philosophy emphatically does *not* apply. While it was originally intended that
 								"develop in Python 2, convert with ``2to3``, test both" would be the standard
 								way to develop for both versions in parallel, in practice, the needs of
 								different projects and developer communities have proven to be sufficiently
 								diverse that a variety of approaches have been devised, allowing each group
 								to select an approach that best fits their needs.
 								Lennart Regebro has produced an excellent overview of the available migration
 								strategies [2]_, and a similar review is provided in the official porting
 								guide [3]_. (Note that the official guidance has softened to "it depends on
 								your specific situation" since Lennart wrote his overview).
 								However, both of those guides are written from the founding assumption that
 								all of the developers involved are *already* committed to the idea of
 								supporting Python 3. They make no allowance for the *social* aspects of such a
 								change when you're interacting with a user base that may not be especially
 								tolerant of disruptions without a clear benefit, or are trying to persuade
 								Python 2 focused upstream developers to accept patches that are solely about
 								improving Python 3 forward compatibility.
 								With the current porting toolset, *every* migration strategy will result in
 								changes to *every* Unicode literal in a project. No exceptions. They will
 								be converted to either an unprefixed string literal (if the project decides to
 								adopt the ``unicode_literals`` import) or else to a converter call like
 								``u("text")``.
 								If the ``unicode_literals`` import approach is employed, but is not adopted
 								across the entire project at the same time, then the meaning of a bare string
 								literal may become annoyingly ambiguous. This problem can be particularly
 								pernicious for *aggregated* software, like a Django site - in such a situation,
 								some files may end up using the ``unicode_literals`` import and others may not,
 								creating definite potential for confusion.
 								While these problems are clearly solvable at a technical level, they're a
 								completely unnecessary distraction at the social level. Developer energy should
 								be reserved for addressing *real* technical difficulties associated with the
 								Python 3 transition (like distinguishing their 8-bit text strings from their
 								binary data). They shouldn't be punished with additional code changes (even
 								automated ones) solely due to the fact that they have *already* explicitly
 								identified their Unicode strings in Python 2.
 								Armin Ronacher has created an experimental extension to 2to3 which only
 								modernizes Python code to the extent that it runs on Python 2.7 or later with
 								support from the cross-version compatibility ``six`` library. This tool is
 								available as ``python-modernize`` [1]_. Currently, the deltas generated by
 								this tool will affect every Unicode literal in the converted source. This
 								will create legitimate concerns amongst upstream developers asked to accept
 								such changes, and amongst framework *users* being asked to change their
 								applications.
 								However, by eliminating the noise from changes to the Unicode literal syntax,
 								many projects could be cleanly and (comparatively) non-controversially made
 								forward compatible with Python 3.3+ just by running ``python-modernize`` and
 								applying the recommended changes.
 								References
 								==========
 								.. [1] Python-Modernize
 								   (http://github.com/mitsuhiko/python-modernize)
 								.. [2] Porting to Python 3: Migration Strategies
 								   (http://python3porting.com/strategies.html)
 								.. [3] Porting Python 2 Code to Python 3
 								   (http://docs.python.org/howto/pyporting.html)
 								.. [4] uprefix import hook project
 								   (https://bitbucket.org/vinay.sajip/uprefix)
 								.. [5] install hook to remove unicode string prefix characters
 								   (https://github.com/mitsuhiko/unicode-literals-pep/tree/master/install-hook)
 								Copyright
 								=========
 								This document has been placed in the public domain.
 								..
 								   Local Variables:
 								   mode: indented-text
 								   indent-tabs-mode: nil
 								   sentence-end-double-space: t
 								   fill-column: 70
 								   End: