Update PEP 414 to record the exclusion of raw Unicode literals from the scope

Nick Coghlan 2012-06-20 21:45:58 +10:00
parent e88d8fbb66
commit 31b656092d
2 changed files with 35 additions and 5 deletions

@@ -90,6 +90,8 @@ And early binding semantics in a list comprehension could be attained via::
    def adder(i):
        return lambda x: x + i
If a list comprehension grows to the
Proposal
========

@@ -40,7 +40,7 @@ on Python 3.
Specifically, the Python 3 definition for string literal prefixes will be
expanded to allow::
-    "u" | "U" | "ur" | "UR" | "Ur" | "uR"
+    "u" | "U"
in addition to the currently supported::
@@ -61,13 +61,40 @@ The following will all denote ordinary Python 3 strings::
U'''text'''
U"""text"""
-Combination of the unicode prefix with the raw string prefix will also be
-supported, just as it was in Python 2.
No changes are proposed to Python 3's actual Unicode handling, only to the
acceptable forms for string literals.
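As a minimal sketch of what this means in practice (assuming an interpreter that accepts the ``u`` prefix, i.e. Python 3.3 or later; the variable names are purely illustrative), the prefixed and unprefixed forms produce the same ordinary ``str`` objects:

```python
# Illustrative only: assumes Python 3.3+, where the "u"/"U" prefix is accepted.
plain = 'text'
prefixed = u'text'
upper_prefixed = U"""text"""

# The prefix changes nothing about the resulting object on Python 3.
same_type = type(prefixed) is str
same_value = plain == prefixed == upper_prefixed
```

On Python 2 the prefixed forms would instead be ``unicode`` objects, which is the existing behaviour that the prefix is meant to stay compatible with.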
Exclusion of "Raw" Unicode Literals
===================================
Python 2 supports a concept of "raw" Unicode literals that don't meet the
conventional definition of a raw string: ``\uXXXX`` and ``\UXXXXXXXX`` escape
sequences are still processed by the compiler and converted to the
appropriate Unicode code points when creating the associated Unicode objects.
Python 3 has no corresponding concept - the compiler performs *no*
preprocessing of the contents of raw string literals. This matches the
behaviour of 8-bit raw string literals in Python 2.
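The Python 3 behaviour described above can be checked with a short sketch (the variable names are illustrative, not part of the PEP). A raw literal keeps the backslash sequence as six separate characters, while an ordinary literal converts it to a single code point:

```python
# Illustrative only: behaviour of Python 3 string literals.
raw = r'\u20ac'     # backslash, 'u', '2', '0', 'a', 'c' are kept as written
cooked = '\u20ac'   # the escape is converted to one code point (EURO SIGN)

raw_length = len(raw)
cooked_length = len(cooked)
```

In Python 2, by contrast, the literal ``ur'\u20ac'`` would have produced the single converted code point, which is the difference that motivates this section.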
Since such strings are rarely used and would be interpreted differently in
Python 3 if permitted, it was decided that leaving them out entirely was
a better choice. Code which uses them will thus still fail immediately on
Python 3 (with a ``SyntaxError``), rather than potentially producing different
output.
To get equivalent behaviour that will run on both Python 2 and Python 3,
either an ordinary Unicode literal can be used (with appropriate additional
escaping within the string), or else string concatenation or string
formatting can be used to combine the raw portions of the string with those that
require the use of Unicode escape sequences.
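The alternatives just listed might look like the following sketch. The sample content (a regular expression fragment followed by a euro sign, standing in for what Python 2 would have spelled ``ur'\d+\u20ac'``) is an assumption chosen for illustration, as are the variable names:

```python
# Illustrative only: three cross-version spellings of the same text.

# 1. Ordinary Unicode literal, with the backslash escaped by hand.
escaped = u'\\d+\u20ac'

# 2. Concatenation of a raw portion and a portion needing a Unicode escape.
concatenated = r'\d+' + u'\u20ac'

# 3. String formatting to combine the same two portions.
formatted = u'{0}{1}'.format(r'\d+', u'\u20ac')

all_equal = escaped == concatenated == formatted
```

All three spellings avoid the ``ur`` prefix, so they are accepted by both Python 2 and Python 3.3 or later. On Python 2, options 2 and 3 rely on the raw portion being pure ASCII so that it can be implicitly promoted to ``unicode``.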
Note that when using ``from __future__ import unicode_literals`` in Python 2,
the nominally "raw" Unicode string literals will process ``\uXXXX`` and
``\UXXXXXXXX`` escape sequences, just like Python 2 strings explicitly marked
with the "raw Unicode" prefix.
Author's Note
=============
@@ -318,7 +345,8 @@ people that are pointing out inadequacies in the current porting toolset
how to use them properly".
These responses are a case of completely missing the point of what people are
-complaining about. The feedback that resulted in this PEP isn't due to people complaining that ports aren't possible. Instead, the feedback is coming from
+complaining about. The feedback that resulted in this PEP isn't due to people
+complaining that ports aren't possible. Instead, the feedback is coming from
people that have successfully *completed* ports and are objecting that they
found the experience thoroughly *unpleasant* for the class of application that
they needed to port (specifically, Unicode aware web frameworks and support