Specify NFKC instead of NFC.

parent 10a3df9553
commit 512559f6a1

10 pep-3131.txt

@@ -82,8 +82,8 @@ nonspacing marks (Mn), spacing combining marks (Mc), decimal number
 (Nd), connector punctuations (Pc), and characters carryig the
 Other_ID_Continue property.
 
-All identifiers are converted into the normal form NFC while parsing;
-comparison of identifiers is based on NFC.
+All identifiers are converted into the normal form NFKC while parsing;
+comparison of identifiers is based on NFKC.
 
 A non-normative HTML file listing all valid identifier characters for
 Unicode 4.1 can be found at
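
A compatibility character shows the practical effect of moving comparison from NFC to NFKC. The sketch below is a minimal illustration that uses the standard `unicodedata` module. `same_identifier` is a hypothetical helper written for this example. U+FB01 (LATIN SMALL LIGATURE FI) is only one example of a character that NFC leaves unchanged and NFKC folds.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition,
# so NFC keeps it as one character while NFKC expands it to "fi".
ligature_name = "\ufb01le"   # renders as "ﬁle"
ascii_name = "file"

def same_identifier(a: str, b: str, form: str) -> bool:
    """Hypothetical helper: compare two identifiers after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_identifier(ligature_name, ascii_name, "NFC"))   # False
print(same_identifier(ligature_name, ascii_name, "NFKC"))  # True
```

Under NFC the two spellings stay distinct identifiers. Under NFKC they name the same variable.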

@@ -117,7 +117,7 @@ The following changes will need to be made to the parser:
 non-identifier character (e.g. a space or punctuation character)
 
 2. The entire UTF-8 string is passed to a function to normalize the
-string to NFC, and then verify that it follows the identifier
+string to NFKC, and then verify that it follows the identifier
 syntax. No such callout is made for pure-ASCII identifiers, which
 continue to be parsed the way they are today. The Unicode database
 must start including the Other_ID_{Start|Continue} property.
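
Parser step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the parser's actual code:

- `check_identifier` is a hypothetical name.
- The tokenizer is assumed to have already collected the identifier's UTF-8 bytes.
- `str.isidentifier()` stands in for the identifier-syntax check against the Unicode properties.

```python
import unicodedata

def check_identifier(raw: bytes) -> str:
    """Hypothetical helper sketching step 2: decode the UTF-8 bytes of a
    candidate identifier, normalize to NFKC, then verify the syntax."""
    # Assumption: the tokenizer already scanned `raw` up to the first
    # non-identifier character, as step 1 describes.
    name = raw.decode("utf-8")
    if name.isascii():
        # Pure-ASCII identifiers make no callout and are used as-is.
        return name
    normalized = unicodedata.normalize("NFKC", name)
    # str.isidentifier() stands in for the ID_Start/ID_Continue check.
    if not normalized.isidentifier():
        raise SyntaxError(f"invalid identifier: {name!r}")
    return normalized
```

For example, `check_identifier("\ufb01le".encode("utf-8"))` returns `"file"`. An input such as U+00BD (½) is rejected, because its NFKC form `1⁄2` begins with a digit.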

@@ -154,10 +154,6 @@ C#. It's not clear whether this would improve things (it might
 for RTL languages); if there is a need, these can be added
 later.
 
-Another open issue is the choice of normalization form: some
-people suggest to use NFKC instead of NFC, others suggest to
-ban compatibility characters.
-
 Some people would like to see an option on selecting support
 for this PEP at run-time; opinions vary on what precisely
 that option should be, and what precisely its default value