Specify NFKC instead of NFC.

Martin v. Löwis 2007-08-14 16:24:05 +00:00
parent 10a3df9553
commit 512559f6a1
1 changed file with 3 additions and 7 deletions
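This commit switches identifier normalization from NFC to NFKC. The two forms differ on compatibility characters such as ligatures: NFC leaves them alone, while NFKC folds them into their plain equivalents. Below is a minimal Python sketch using the standard `unicodedata` module. The ligature identifier is a hypothetical example, and the `exec` call relies on the NFKC behaviour that Python 3 applies to identifiers at parse time.

```python
import unicodedata

# Hypothetical example identifier: U+FB01 LATIN SMALL LIGATURE FI + "le",
# which renders as "ﬁle".
LIGATURE_NAME = "\ufb01le"

# NFC only composes canonical equivalents, so the ligature survives...
nfc_form = unicodedata.normalize("NFC", LIGATURE_NAME)
# ...while NFKC also folds compatibility characters into plain "fi".
nfkc_form = unicodedata.normalize("NFKC", LIGATURE_NAME)

print(nfc_form == "file")   # False
print(nfkc_form == "file")  # True

# Python 3 normalizes identifiers to NFKC while parsing, so assigning to
# the ligature spelling binds the ASCII name "file".
namespace = {}
exec(f"{LIGATURE_NAME} = 42", namespace)
print(namespace["file"])  # 42
```

Under the NFC wording, `ﬁle` and `file` would have remained distinct identifiers. Under NFKC they compare equal.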

@@ -82,8 +82,8 @@ nonspacing marks (Mn), spacing combining marks (Mc), decimal number
 (Nd), connector punctuations (Pc), and characters carryig the
 Other_ID_Continue property.
-All identifiers are converted into the normal form NFC while parsing;
-comparison of identifiers is based on NFC.
+All identifiers are converted into the normal form NFKC while parsing;
+comparison of identifiers is based on NFKC.
 A non-normative HTML file listing all valid identifier characters for
 Unicode 4.1 can be found at
@@ -117,7 +117,7 @@ The following changes will need to be made to the parser:
 non-identifier character (e.g. a space or punctuation character)
 2. The entire UTF-8 string is passed to a function to normalize the
-string to NFC, and then verify that it follows the identifier
+string to NFKC, and then verify that it follows the identifier
 syntax. No such callout is made for pure-ASCII identifiers, which
 continue to be parsed the way they are today. The Unicode database
 must start including the Other_ID_{Start|Continue} property.
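Step 2 of the parser changes can be sketched in Python as follows. This is a minimal illustration, not CPython's tokenizer. `normalize_identifier` is a hypothetical helper name. The sketch assumes the tokenizer hands over the identifier's raw UTF-8 bytes, and it uses `str.isidentifier` (Python's own identifier rules) in place of a hand-written syntax check. `str.isascii` requires Python 3.7 or later.

```python
import unicodedata


def normalize_identifier(raw: bytes) -> str:
    """Hypothetical helper: decode, NFKC-normalize, then validate.

    Assumes `raw` is the UTF-8 byte span the tokenizer collected for one
    identifier. Illustration only, not CPython's actual implementation.
    """
    text = raw.decode("utf-8")
    # Pure-ASCII identifiers take today's path: no normalization callout.
    if text.isascii():
        if not text.isidentifier():
            raise ValueError(f"invalid identifier: {text!r}")
        return text
    # Non-ASCII: normalize to NFKC first, then check identifier syntax.
    normalized = unicodedata.normalize("NFKC", text)
    # str.isidentifier applies Python's own identifier rules.
    if not normalized.isidentifier():
        raise ValueError(f"invalid identifier: {text!r}")
    return normalized


print(normalize_identifier("\ufb01le".encode("utf-8")))  # file
print(normalize_identifier(b"spam"))                     # spam
```

Passing something like `b"2fast"` or `b"a b"` raises `ValueError`, since neither follows the identifier syntax.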
@@ -154,10 +154,6 @@ C#. It's not clear whether this would improve things (it might
 for RTL languages); if there is a need, these can be added
 later.
-Another open issue is the choice of normalization form: some
-people suggest to use NFKC instead of NFC, others suggest to
-ban compatibility characters.
 Some people would like to see an option on selecting support
 for this PEP at run-time; opinions vary on what precisely
 that option should be, and what precisely its default value