Explain XID_Start and XID_Continue properly; refer to DerivedCoreProperties.
Martin v. Löwis 2007-08-15 07:50:22 +00:00
parent 8089f06be4
commit 30dcee2e0a
1 changed file with 15 additions and 4 deletions

@@ -71,16 +71,26 @@ the ``unicodedata`` module.
 The identifier syntax is ``<XID_Start> <XID_Continue>*``.
-``XID_Start`` is defined as all characters having one of the general
+The exact specification of what characters have the XID_Start or
+XID_Continue properties can be found in the DerivedCoreProperties
+file of the Unicode data in use by Python (4.1 at the time this
+PEP was written), see [6]_. For reference, the construction rules
+for these sets are given below. The XID_ properties are derived
+from ID_Start/ID_Continue, which are derived themselves.
+``ID_Start`` is defined as all characters having one of the general
 categories uppercase letters (Lu), lowercase letters (Ll), titlecase
 letters (Lt), modifier letters (Lm), other letters (Lo), letter
 numbers (Nl), the underscore, and characters carrying the
-Other_ID_Start property (XXX adjust for XID_Start).
+Other_ID_Start property. ``XID_Start`` then closes this set under
+normalization, by removing all characters whose NFKC normalization
+is not of the form ID_Start ID_Continue* anymore.
-``XID_Continue`` is defined as all characters in ``XID_Start``, plus
+``ID_Continue`` is defined as all characters in ``ID_Start``, plus
 nonspacing marks (Mn), spacing combining marks (Mc), decimal number
 (Nd), connector punctuations (Pc), and characters carryig the
-Other_ID_Continue property (XXX adjust for XID_Continue).
+Other_ID_Continue property. Again, ``XID_Continue`` closes this set
+under NFKC-normalization; it also adds U+00B7 to support Catalan.
 All identifiers are converted into the normal form NFKC while parsing;
 comparison of identifiers is based on NFKC.
@@ -251,6 +261,7 @@ References
 .. [3] http://www.unicode.org/reports/tr36/
 .. [4] http://mail.python.org/pipermail/python-3000/2007-June/008161.html
 .. [5] http://mail.python.org/pipermail/python-3000/2007-May/007925.html
+.. [6] http://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt
 
 Copyright
 =========
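The construction rules described in this commit can be approximated with Python's standard ``unicodedata`` module. The following is a minimal sketch, not CPython's implementation: ``unicodedata`` does not expose the Other_ID_Start and Other_ID_Continue properties, so those characters are missed, and the NFKC closure is checked one character at a time instead of reproducing the exact derivation recorded in DerivedCoreProperties.txt. All function and constant names below are hypothetical.

```python
import unicodedata

# General categories the text lists for ID_Start (the underscore is added separately).
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Categories that ID_Continue adds on top of ID_Start.
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}
# U+00B7 MIDDLE DOT, which the text says XID_Continue adds to support Catalan.
MIDDLE_DOT = "\u00b7"


def approx_id_start(ch: str) -> bool:
    """Category-based ID_Start test (Other_ID_Start is not available here)."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def approx_id_continue(ch: str) -> bool:
    """Category-based ID_Continue test (Other_ID_Continue is not available here)."""
    return approx_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_EXTRA


def approx_xid_start(ch: str) -> bool:
    """Keep an ID_Start character only if its NFKC form is ID_Start ID_Continue*."""
    if not approx_id_start(ch):
        return False
    norm = unicodedata.normalize("NFKC", ch)
    return (
        len(norm) > 0
        and approx_id_start(norm[0])
        and all(approx_id_continue(c) for c in norm[1:])
    )


def approx_xid_continue(ch: str) -> bool:
    """Keep an ID_Continue character only if its NFKC form is ID_Continue*."""
    if ch == MIDDLE_DOT:
        return True
    if not approx_id_continue(ch):
        return False
    return all(approx_id_continue(c) for c in unicodedata.normalize("NFKC", ch))


def approx_is_identifier(name: str) -> bool:
    """Check the <XID_Start> <XID_Continue>* shape with the helpers above."""
    return (
        len(name) > 0
        and approx_xid_start(name[0])
        and all(approx_xid_continue(c) for c in name[1:])
    )


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)
```

For example, U+037A GREEK YPOGEGRAMMENI has category Lm and so passes ``approx_id_start``, but ``approx_xid_start`` rejects it because its NFKC form starts with a space. Likewise, ``same_identifier("\ufb01le", "file")`` holds because NFKC maps the ligature U+FB01 to "fi". In real code, ``str.isidentifier()`` performs the actual check against the Unicode database bundled with the interpreter.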