Add ?!ng's roundup.

Martin v. Löwis 2007-06-05 18:54:31 +00:00
parent e3ca9c9120
commit e652b9751d
1 changed files with 86 additions and 0 deletions


@@ -154,6 +154,91 @@ C#. It's not clear whether this would improve things (it might
for RTL languages); if there is a need, these can be added
later.
Another open issue is the choice of normalization form: some
people suggest using NFKC instead of NFC; others suggest
banning compatibility characters.
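To make the NFC/NFKC question concrete, here is a minimal sketch using the standard ``unicodedata`` module. The sample strings are chosen for illustration only; they are not taken from the PEP or the discussion.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character.
ligature = "\ufb01le"

nfc = unicodedata.normalize("NFC", ligature)
nfkc = unicodedata.normalize("NFKC", ligature)

# NFC leaves the ligature alone; NFKC folds it into the letters "f" + "i".
print(nfc == ligature)   # True
print(nfkc)              # file

# For canonical equivalents the two forms agree:
# "e" + COMBINING ACUTE ACCENT composes to U+00E9 under both.
decomposed = "e\u0301"
composed_nfc = unicodedata.normalize("NFC", decomposed)
composed_nfkc = unicodedata.normalize("NFKC", decomposed)
print(composed_nfc == composed_nfkc == "\u00e9")  # True
```

Under NFKC the two spellings of ``file`` would name the same identifier; under NFC they would stay distinct unless compatibility characters were banned outright.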
Discussion
==========
Ka-Ping Yee summarizes the discussion and further objections
in [4]_ as follows:
A. Should identifiers be allowed to contain any Unicode letter?
Drawbacks of allowing non-ASCII identifiers wholesale:
1. Python will lose the ability to make a reliable round trip to
a human-readable display on screen or on paper.
2. Python will become vulnerable to a new class of security exploits;
code and submitted patches will be much harder to inspect.
3. Humans will no longer be able to validate Python syntax.
4. Unicode is young; its problems are not yet well understood and
solved; tool support is weak.
5. Languages with non-ASCII identifiers use different character sets
and normalization schemes; PEP 3131's choices are non-obvious.
6. The Unicode bidi algorithm yields an extremely confusing display
order for RTL text when digits or operators are nearby.
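The inspection concern in item 2 can be illustrated with a pair of look-alike letters. This is only a sketch: the two characters are an assumed example, and ``describe`` is a hypothetical helper, not something proposed in the discussion.

```python
import unicodedata

latin = "a"          # U+0061
cyrillic = "\u0430"  # U+0430, rendered almost identically in most fonts

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

same_string = latin == cyrillic
names = [describe(latin), describe(cyrillic)]

# Normalization does not unify the two letters, so identifiers
# differing only in this character remain distinct names.
still_distinct = unicodedata.normalize("NFKC", cyrillic) != latin

print(same_string)   # False
for name in names:
    print(name)
```

A reviewer reading a patch on screen would see the same glyph twice, while the interpreter would see two different identifiers.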
B. Should the default behaviour accept only ASCII identifiers, or
should it accept identifiers containing non-ASCII characters?
Arguments for ASCII only by default:
1. Non-ASCII identifiers by default makes common practice/assumptions
subtly/unknowingly wrong; rarely wrong is worse than obviously wrong.
2. Better to raise a warning than to fail silently when encountering
a probably unexpected situation.
3. All of current usage is ASCII-only; the vast majority of future
usage will be ASCII-only.
4. It is the pockets of Unicode adoption that are parochial, not the
ASCII advocates.
5. Python should audit for ASCII-only identifiers for the same
reasons that it audits for tab-space consistency.
6. Incremental change is safer.
7. An ASCII-only default favors open-source development and sharing
of source code.
8. Existing projects won't have to waste any brainpower worrying
about the implications of Unicode identifiers.
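An audit for ASCII-only identifiers, as mentioned in the list above, could look roughly like the following sketch. It assumes a Python 3 tokenizer that already accepts non-ASCII names; ``non_ascii_identifiers`` is a hypothetical helper, not an existing tool or interpreter option.

```python
import io
import tokenize

def non_ascii_identifiers(source):
    """Return (name, line, column) for every NAME token containing
    at least one non-ASCII character."""
    found = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and any(ord(c) > 127 for c in tok.string):
            found.append((tok.string, tok.start[0], tok.start[1]))
    return found

sample = "total = 1\nz\u00e4hler = total + 1\n"
report = non_ascii_identifiers(sample)
for name, line, col in report:
    print("line %d, column %d: non-ASCII identifier %r" % (line, col, name))
```

Whether such a check should warn, fail, or stay silent by default is exactly the point under debate in question B.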
C. Should non-ASCII identifiers be optional?
Various voices in support of a flag (although there's been debate
over which should be the default, no one seems to be saying that
there shouldn't be an off switch).
D. Should the identifier character set be configurable?
Various voices proposing and supporting a selectable character set,
so that users can get all the benefits of using their own language
without the drawbacks of confusable/unfamiliar characters.
E. Which identifier characters should be allowed?
1. What to do about bidi format control characters?
2. What about other ID_Continue characters? What about characters
that look like punctuation? What about other recommendations
in UTS #39? What about mixed-script identifiers?
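The questions above can be explored with a rough sketch. Format controls are found via the ``Cf`` general category. The script guess takes the first word of the Unicode character name, which is a crude stand-in and not the mixed-script detection defined in UTS #39. Both helper names are hypothetical.

```python
import unicodedata

def format_controls(text):
    """Return (index, character name) for each format-control (Cf) character."""
    return [
        (i, unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

def script_guess(text):
    """Crude heuristic: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }

candidate = "ab\u200fc"          # contains U+200F RIGHT-TO-LEFT MARK
hits = format_controls(candidate)
scripts = script_guess("p\u0430y")  # Latin "p", Cyrillic "a", Latin "y"

print(hits)
print(sorted(scripts))
```

A real implementation would need the Script property and the confusable data from the Unicode Character Database, neither of which ``unicodedata`` exposes.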
F. Which normalization form should be used, NFC or NFKC?
G. Should source code be required to be in normalized form?
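For question G, a check that source text is already normalized could be sketched as below. NFC is assumed here purely for illustration; the choice of form is the open question in F, and ``is_nfc`` is a hypothetical helper.

```python
import unicodedata

def is_nfc(text):
    """True if normalizing to NFC leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text

composed = "caf\u00e9 = 1"     # precomposed U+00E9
decomposed = "cafe\u0301 = 1"  # "e" followed by COMBINING ACUTE ACCENT

print(is_nfc(composed))    # True
print(is_nfc(decomposed))  # False
```

The two source lines look identical on screen. Requiring normalized source would reject the second one, whereas normalizing identifiers in the parser would silently treat both as the same name.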
References
==========
@@ -161,6 +246,7 @@ References
.. [1] http://www.unicode.org/reports/tr31/
.. [2] http://www.unicode.org/reports/tr39/
.. [3] http://www.unicode.org/reports/tr36/
.. [4] http://mail.python.org/pipermail/python-3000/2007-June/008161.html
Copyright
=========