Clarified the class of possible source code encodings. Added link to Martin's patch.
2002-03-07 11:14:26 +00:00 · 2002-03-07 11:14:26 +00:00 · bcc3471c33
parent a04d6be380
commit bcc3471c33
1 changed file with 16 additions and 5 deletions
--- a/pep-0263.txt
+++ b/pep-0263.txt
@@ -76,11 +76,13 @@ Concepts
       result in a decoding error during compilation of the Python
       source code.

-       Only ASCII compatible encodings are allowed as source code
-       encoding to assure that Python language elements other than
-       literals and comments remain readable by ASCII processing tools
-       and to avoid problems with wide characters encodings such as
-       UTF-16.
+       Any encoding which allows processing the first two lines in
+       the way indicated above is allowed as source code encoding;
+       this includes ASCII compatible encodings as well as certain
+       multi-byte encodings such as Shift_JIS. It does not include
+       encodings which use two or more bytes for all characters,
+       such as UTF-16. The reason for this is to keep the encoding
+       detection algorithm in the tokenizer simple.

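To illustrate why the first two lines must be processable before the encoding is known, here is a minimal Python 3 sketch; it is not the tokenizer's actual code. The helper `sniff_encoding`, the sample generator `sample`, and the pattern `CODING_RE` are hypothetical, with the pattern modeled on the PEP's coding regular expression. The helper scans raw bytes for the declaration. Shift_JIS encodes the ASCII range as plain ASCII bytes, so the declaration is found; UTF-16 prepends a byte order mark and pairs every ASCII character with a NUL byte, so the same byte-level scan finds nothing.

```python
import re
from typing import Optional

# Hypothetical pattern modeled on the PEP's coding regex. It is matched
# against raw bytes, so nothing has to be decoded before the encoding is known.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def sniff_encoding(source: bytes) -> Optional[str]:
    """Return the encoding declared in the first two lines, or None."""
    # Splitting on the ASCII newline byte is only meaningful when b"\n" in
    # the encoded file really is a line break: true for ASCII compatible
    # encodings and for Shift_JIS, whose trail bytes never equal 0x0A.
    for line in source.split(b"\n")[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def sample(encoding: str) -> str:
    """Hypothetical source text declaring the given encoding."""
    return f"# -*- coding: {encoding} -*-\ngreeting = 'こんにちは'\n"


# Shift_JIS: the ASCII declaration survives byte-for-byte.
print(sniff_encoding(sample("shift_jis").encode("shift_jis")))  # shift_jis

# UTF-16: BOM and interleaved NUL bytes hide the declaration.
print(sniff_encoding(sample("utf-16").encode("utf-16")))  # None
```

Supporting UTF-16 would require the detector to guess the byte width before it could even locate the comment, which is the complexity the text above rules out.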
    2. Handling of escape sequences should continue to work as it does 
       now, but with all possible source code encodings, that is
@@ -138,14 +140,23 @@ Implementation
       input. 8-bit string input is subject to the standard procedure
       for encoding detection as described above.

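As a rough illustration of 8-bit string input going through the same encoding detection, the sketch below assumes a present-day Python 3 interpreter, where compile() accepts bytes and honors a coding declaration inside them; it is not the phase 1 patch referenced in this commit. The sample sources and the "<...-demo>" file names are made up for the example.

```python
# Latin-1 source held as raw bytes: 0xE9 is "é" in Latin-1, but a lone
# 0xE9 byte is not valid UTF-8.
LATIN1_SOURCE = b"# -*- coding: latin-1 -*-\nword = 'caf\xe9'\n"
UNDECLARED_SOURCE = b"word = 'caf\xe9'\n"

# With the declaration, compile() decodes the bytes as Latin-1.
namespace: dict = {}
exec(compile(LATIN1_SOURCE, "<latin1-demo>", "exec"), namespace)
print(namespace["word"])  # café

# Without it, the bytes are decoded with the default source encoding
# (UTF-8 in Python 3), which fails on the lone 0xE9 byte.
try:
    compile(UNDECLARED_SOURCE, "<undeclared-demo>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```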
+    Martin v. Loewis is working on a patch which implements phase 1.
+    See [1] for details.
+
 Scope

    This PEP intends to provide an upgrade path from the current
    (more-or-less) undefined source code encoding situation to a more
    robust and portable definition.

+References
+
+    [1] Phase 1 implementation:
+        http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470
+
 History

+    1.10 and above: see CVS history
    1.8: Added '.' to the coding RE.
    1.7: Added warnings to phase 1 implementation. Replaced the
         Latin-1 default encoding with the interpreter's default