diff --git a/pep-3131.txt b/pep-3131.txt
index a9043ab60..8daf03ea9 100644
--- a/pep-3131.txt
+++ b/pep-3131.txt
@@ -82,8 +82,8 @@ nonspacing marks (Mn), spacing combining marks (Mc), decimal number
 (Nd), connector punctuations (Pc), and characters carryig the
 Other_ID_Continue property.
 
-All identifiers are converted into the normal form NFC while parsing;
-comparison of identifiers is based on NFC.
+All identifiers are converted into the normal form NFKC while parsing;
+comparison of identifiers is based on NFKC.
 
 A non-normative HTML file listing all valid identifier characters for Unicode
 4.1 can be found at
@@ -117,7 +117,7 @@ The following changes will need to be made to the parser:
    non-identifier character (e.g. a space or punctuation character)
 
 2. The entire UTF-8 string is passed to a function to normalize the
-   string to NFC, and then verify that it follows the identifier
+   string to NFKC, and then verify that it follows the identifier
    syntax. No such callout is made for pure-ASCII identifiers, which
    continue to be parsed the way they are today. The Unicode database
    must start including the Other_ID_{Start|Continue} property.
@@ -154,10 +154,6 @@ C#. It's not clear whether this would improve things (it might
 for RTL languages); if there is a need, these can be added
 later.
 
-Another open issue is the choice of normalization form: some
-people suggest to use NFKC instead of NFC, others suggest to
-ban compatibility characters.
-
 Some people would like to see an option on selecting support
 for this PEP at run-time; opinions vary on what precisely
 that option should be, and what precisely its default value
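
Note on the effect of this change: the practical difference between NFC and NFKC is that NFKC also folds compatibility characters (ligatures, fullwidth forms, and similar) into their plain equivalents, so more spellings compare equal as identifiers. The sketch below illustrates this with the standard library's `unicodedata.normalize`. The helper names `normalize_identifier` and `same_identifier` are hypothetical and chosen for illustration only; they are not part of the PEP or of the parser, and the snippet does not check identifier syntax.

```python
import unicodedata


def normalize_identifier(name: str, form: str = "NFKC") -> str:
    """Return `name` in the given Unicode normal form (hypothetical helper)."""
    return unicodedata.normalize(form, name)


def same_identifier(a: str, b: str, form: str = "NFKC") -> bool:
    """Compare two spellings after normalizing both (hypothetical helper)."""
    return normalize_identifier(a, form) == normalize_identifier(b, form)


# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character:
# NFC leaves it untouched, NFKC replaces it with the two letters "fi".
ligature_name = "\ufb01le"
assert normalize_identifier(ligature_name, "NFC") == "\ufb01le"
assert normalize_identifier(ligature_name, "NFKC") == "file"

# Canonical composition is performed by both forms:
# "e" followed by U+0301 COMBINING ACUTE ACCENT becomes U+00E9.
decomposed_name = "cafe\u0301"
assert normalize_identifier(decomposed_name, "NFC") == "caf\u00e9"
assert normalize_identifier(decomposed_name, "NFKC") == "caf\u00e9"

# Under NFC the ligature spelling is a distinct name; under NFKC it is not.
assert not same_identifier("\ufb01le", "file", form="NFC")
assert same_identifier("\ufb01le", "file", form="NFKC")
```

With NFC, `ﬁle` and `file` would be two different identifiers that look nearly identical on screen; with NFKC they are the same name.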