Add ?!ng's roundup.

2007-06-05 18:54:31 +00:00 · 2007-06-05 18:54:31 +00:00 · e652b9751d
parent e3ca9c9120
commit e652b9751d
1 changed files with 86 additions and 0 deletions
--- a/pep-3131.txt
+++ b/pep-3131.txt
@@ -154,6 +154,91 @@ C#. It's not clear whether this would improve things (it might
 for RTL languages); if there is a need, these can be added
 later.

+Another open issue is the choice of normalization form: some
+people suggest using NFKC instead of NFC; others suggest
+banning compatibility characters.
+
+Discussion
+==========
+
+Ka-Ping Yee summarizes the discussion and further objections
+in [4]_ as follows:
+
+A. Should identifiers be allowed to contain any Unicode letter?
+
+   Drawbacks of allowing non-ASCII identifiers wholesale:
+
+   1. Python will lose the ability to make a reliable round trip to
+      a human-readable display on screen or on paper.
+
+   2. Python will become vulnerable to a new class of security exploits;
+      code and submitted patches will be much harder to inspect.
+
+   3. Humans will no longer be able to validate Python syntax.
+
+   4. Unicode is young; its problems are not yet well understood and
+      solved; tool support is weak.
+
+   5. Languages with non-ASCII identifiers use different character sets
+      and normalization schemes; PEP 3131's choices are non-obvious.
+
+   6. The Unicode bidi algorithm yields an extremely confusing display
+      order for RTL text when digits or operators are nearby.
+
+
+B. Should the default behaviour accept only ASCII identifiers, or
+   should it accept identifiers containing non-ASCII characters?
+
+   Arguments for ASCII only by default:
+
+   1. Non-ASCII identifiers by default makes common practice/assumptions
+      subtly/unknowingly wrong; rarely wrong is worse than obviously wrong.
+
+   2. Better to raise a warning than to fail silently when encountering
+      a probably unexpected situation.
+
+   3. All of current usage is ASCII-only; the vast majority of future
+      usage will be ASCII-only.
+
+   4. It is the pockets of Unicode adoption that are parochial, not the
+      ASCII advocates.
+
+   5. Python should audit for ASCII-only identifiers for the same
+      reasons that it audits for tab-space consistency.
+
+   6. Incremental change is safer.
+
+   7. An ASCII-only default favors open-source development and sharing
+      of source code.
+
+   8. Existing projects won't have to waste any brainpower worrying
+      about the implications of Unicode identifiers.
+
+C. Should non-ASCII identifiers be optional?
+
+   Various voices in support of a flag (although there's been debate
+   over which should be the default, no one seems to be saying that
+   there shouldn't be an off switch).
+
+D. Should the identifier character set be configurable?
+
+   Various voices proposing and supporting a selectable character set,
+   so that users can get all the benefits of using their own language
+   without the drawbacks of confusable/unfamiliar characters.
+
+
+E. Which identifier characters should be allowed?
+
+   1. What to do about bidi format control characters?
+
+   2. What about other ID_Continue characters?  What about characters
+      that look like punctuation?  What about other recommendations
+      in UTS #39?  What about mixed-script identifiers?
+
+F. Which normalization form should be used, NFC or NFKC?
+
+G. Should source code be required to be in normalized form?
+

 References
 ==========
@@ -161,6 +246,7 @@ References
 .. [1] http://www.unicode.org/reports/tr31/
 .. [2] http://www.unicode.org/reports/tr39/
 .. [3] http://www.unicode.org/reports/tr36/
+.. [4] http://mail.python.org/pipermail/python-3000/2007-June/008161.html

 Copyright
 =========
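
The normalization question in the added text (NFC versus NFKC, item F) can be made concrete with a short sketch. It is not part of the commit and does not reflect any decision recorded in the PEP. It uses the standard library's `unicodedata.normalize`; the helper name `normalized_forms` and the sample identifiers are hypothetical, chosen only for illustration.

```python
import unicodedata


def normalized_forms(identifier):
    """Return the (NFC, NFKC) normalizations of an identifier string.

    Hypothetical helper for illustration; not taken from PEP 3131.
    """
    return (
        unicodedata.normalize("NFC", identifier),
        unicodedata.normalize("NFKC", identifier),
    )


# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character.
# NFC leaves it unchanged, while NFKC folds it into the two letters "fi".
nfc, nfkc = normalized_forms("\ufb01le")
print(repr(nfc))   # '\ufb01le'
print(repr(nfkc))  # 'file'

# Canonically equivalent spellings are unified by both forms:
# "e" followed by U+0301 COMBINING ACUTE ACCENT becomes U+00E9.
nfc2, nfkc2 = normalized_forms("caf" + "e\u0301")
print(nfc2 == nfkc2 == "caf\u00e9")  # True
```

Under NFKC the identifiers `ﬁle` and `file` would name the same variable, whereas under NFC they stay distinct. That difference is what the suggestion to ban compatibility characters would sidestep.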