Update PEP 414 to record the exclusion of raw Unicode literals from the scope

2012-06-20 21:45:58 +10:00 · 2012-06-20 21:45:58 +10:00 · 31b656092d
parent e88d8fbb66
commit 31b656092d
2 changed files with 35 additions and 5 deletions
--- a/pep-0403.txt
+++ b/pep-0403.txt
@@ -90,6 +90,8 @@ And early binding semantics in a list comprehension could be attained via::
    def adder(i):
        return lambda x: x + i

+If a list comprehension grows to the 
+

 Proposal
 ========
--- a/pep-0414.txt
+++ b/pep-0414.txt
@@ -40,7 +40,7 @@ on Python 3.
 Specifically, the Python 3 definition for string literal prefixes will be
 expanded to allow::

-    "u" | "U" | "ur" | "UR" | "Ur" | "uR"
+    "u" | "U"

 in addition to the currently supported::

@@ -61,13 +61,40 @@ The following will all denote ordinary Python 3 strings::
    U'''text'''
    U"""text"""

-Combination of the unicode prefix with the raw string prefix will also be
-supported, just as it was in Python 2.
-
 No changes are proposed to Python 3's actual Unicode handling, only to the
 acceptable forms for string literals.


+Exclusion of "Raw" Unicode Literals
+===================================
+
+Python 2 supports a concept of "raw" Unicode literals that don't meet the
+conventional definition of a raw string: ``\uXXXX`` and ``\UXXXXXXXX`` escape
+sequences are still processed by the compiler and converted to the
+appropriate Unicode code points when creating the associated Unicode objects.
+
+Python 3 has no corresponding concept - the compiler performs *no*
+preprocessing of the contents of raw string literals. This matches the
+behaviour of 8-bit raw string literals in Python 2.
+
+Since such strings are rarely used and would be interpreted differently in
+Python 3 if permitted, it was decided that leaving them out entirely was
+a better choice. Code which uses them will thus still fail immediately on
+Python 3 (with a ``SyntaxError``), rather than potentially producing different
+output.
+
+To get equivalent behaviour that will run on both Python 2 and Python 3,
+either an ordinary Unicode literal can be used (with appropriate additional
+escaping within the string), or else string concatenation or string
+formatting can be used to combine the raw portions of the string with those
+that require the use of Unicode escape sequences.
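Reviewer's sketch (not part of the committed text): the three workarounds named in the paragraph above might look roughly like the following. The path-like string and the variable names are hypothetical, chosen only for illustration, and the ``u`` prefix assumes either Python 2 or a Python 3 version that implements this PEP.

```python
# Hypothetical example: a string with literal backslashes followed by
# U+00E9, written without the unsupported "ur" prefix.

# Option 1: ordinary Unicode literal, with the backslashes escaped by hand.
via_escaping = u"C:\\new\\table\u00e9"

# Option 2: concatenate a raw portion with an ordinary Unicode literal.
via_concat = r"C:\new\table" + u"\u00e9"

# Option 3: string formatting, with the escape sequence kept out of the
# raw portion.
via_format = r"C:\new\table%s" % u"\u00e9"

# On Python 3 a raw literal is left completely unprocessed, so the
# sequence below stays as six separate characters.
unprocessed = r"\u00e9"
```

All three options build the same text on Python 2 and Python 3, because the ``\uXXXX`` escape only ever appears inside a non-raw literal.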
+
+Note that when using ``from __future__ import unicode_literals`` in Python 2,
+the nominally "raw" Unicode string literals will process ``\uXXXX`` and
+``\UXXXXXXXX`` escape sequences, just like Python 2 strings explicitly marked
+with the "raw Unicode" prefix.
+
+
 Author's Note
 =============

@@ -318,7 +345,8 @@ people that are pointing out inadequacies in the current porting toolset
 how to use them properly".

 These responses are a case of completely missing the point of what people are
-complaining about. The feedback that resulted in this PEP isn't due to people complaining that ports aren't possible. Instead, the feedback is coming from
+complaining about. The feedback that resulted in this PEP isn't due to people
+complaining that ports aren't possible. Instead, the feedback is coming from
 people that have successfully *completed* ports and are objecting that they
 found the experience thoroughly *unpleasant* for the class of application that
 they needed to port (specifically, Unicode aware web frameworks and support