Specify NFKC instead of NFC.

parent 10a3df9553
commit 512559f6a1

pep-3131.txt (10 lines changed)

@@ -82,8 +82,8 @@ nonspacing marks (Mn), spacing combining marks (Mc), decimal number
 (Nd), connector punctuations (Pc), and characters carrying the
 Other_ID_Continue property.

-All identifiers are converted into the normal form NFC while parsing;
-comparison of identifiers is based on NFC.
+All identifiers are converted into the normal form NFKC while parsing;
+comparison of identifiers is based on NFKC.

 A non-normative HTML file listing all valid identifier characters for
 Unicode 4.1 can be found at

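The practical effect of this change: NFKC also folds compatibility characters (ligatures, fullwidth forms and similar), so two identifiers that stay distinct under NFC can compare equal under NFKC. The following is a minimal sketch using Python's standard `unicodedata` module. The helper `same_identifier` is hypothetical and only illustrates the comparison rule; it is not the parser's actual code.

```python
import unicodedata


def same_identifier(a: str, b: str, form: str = "NFKC") -> bool:
    """Hypothetical helper: compare two identifiers after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character:
# it has a compatibility decomposition to "fi" but no canonical one.
ligature = "\ufb01le"  # renders as "ﬁle"
plain = "file"

print(same_identifier(ligature, plain, "NFC"))   # NFC keeps the ligature
print(same_identifier(ligature, plain, "NFKC"))  # NFKC folds it to "fi"
```

Under the old NFC wording, `ﬁle` and `file` would be two different names; with NFKC they denote the same identifier.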
@@ -117,7 +117,7 @@ The following changes will need to be made to the parser:
 non-identifier character (e.g. a space or punctuation character)

 2. The entire UTF-8 string is passed to a function to normalize the
-string to NFC, and then verify that it follows the identifier
+string to NFKC, and then verify that it follows the identifier
 syntax. No such callout is made for pure-ASCII identifiers, which
 continue to be parsed the way they are today. The Unicode database
 must start including the Other_ID_{Start|Continue} property.

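Step 2 of the parser changes can be sketched roughly as follows. This is an illustration under stated assumptions, not CPython's implementation: `process_identifier` is a hypothetical helper, and Python's built-in `str.isidentifier()` stands in for the PEP's identifier-syntax check.

```python
import unicodedata


def process_identifier(token: str) -> str:
    """Hypothetical sketch of parser step 2 for one scanned identifier token."""
    if token.isascii():
        # Pure-ASCII identifiers are parsed as before: no normalization callout.
        # (ASCII text is already invariant under NFKC, so nothing is lost.)
        return token
    # Non-ASCII: normalize the whole string to NFKC first...
    normalized = unicodedata.normalize("NFKC", token)
    # ...then verify the identifier syntax on the normalized form.
    # str.isidentifier() is used here as a stand-in for that check.
    if not normalized.isidentifier():
        raise SyntaxError(f"invalid identifier: {token!r}")
    return normalized


print(process_identifier("spam"))
print(process_identifier("\ufb01le"))     # the "ﬁ" ligature folds to "fi"
print(process_identifier("na\u00efve"))   # precomposed "ï" is left unchanged
```

Normalizing before validating means the syntax check sees exactly the form that will later be used for comparison, so a token is accepted or rejected based on its NFKC spelling.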
@@ -154,10 +154,6 @@ C#. It's not clear whether this would improve things (it might
 for RTL languages); if there is a need, these can be added
 later.

-Another open issue is the choice of normalization form: some
-people suggest to use NFKC instead of NFC, others suggest to
-ban compatibility characters.
-
 Some people would like to see an option on selecting support
 for this PEP at run-time; opinions vary on what precisely
 that option should be, and what precisely its default value