Changed Python's source code encoding default to ASCII.

Added note about handling of Unicode literals in phase 1.
2002-03-15 17:07:12 +00:00
parent 130b28e2a4
commit 4903b95001
1 changed file with 17 additions and 16 deletions
--- a/pep-0263.txt
+++ b/pep-0263.txt
@@ -40,10 +40,8 @@ Proposed Solution
 Defining the Encoding
-    Just as in coercion of strings to Unicode, Python will default to
-    the interpreter's default encoding (which is ASCII in standard
-    Python installations) as standard encoding if no other encoding
-    hints are given.
+    Python will default to ASCII as standard encoding if no other
+    encoding hints are given.
    To define a source code encoding, a magic comment must
    be placed into the source files either as first or second
@@ -76,12 +74,12 @@ Concepts
       result in a decoding error during compilation of the Python
       source code.
-       Any encoding which allows processing the first two lines in
-       the way indicated above is allowed as source code encoding,
-       this includes ASCII compatible encodings as well as certain
+       Any encoding which allows processing the first two lines in the
+       way indicated above is allowed as source code encoding, this
+       includes ASCII compatible encodings as well as certain
       multi-byte encodings such as Shift_JIS. It does not include
-       encodings which use two or more bytes for all characters
-       like e.g. UTF-16. The reason for this is to keep the encoding
+       encodings which use two or more bytes for all characters like
+       e.g. UTF-16. The reason for this is to keep the encoding
       detection algorithm in the tokenizer simple.
    2. Handling of escape sequences should continue to work as it does 
@@ -116,19 +114,22 @@ Implementation
    Since changing the Python tokenizer/parser combination will
    require major changes in the internals of the interpreter and
    enforcing the use of magic comments in source code files which
-    place non-default encoding characters in string literals, comments
+    place non-ASCII characters in string literals, comments
    and Unicode literals, the proposed solution should be implemented
    in two phases:
-    1. Implement the magic comment detection and default encoding
-       handling, but only apply the detected encoding to Unicode
-       literals in the source file.
+    1. Implement the magic comment detection, but only apply the
+       detected encoding to Unicode literals in the source file.
+
       If no magic comment is used, Python should continue to
       use the standard [raw-]unicode-escape codecs for Unicode
       literals.
       In addition to this step and to aid in the transition to
       explicit encoding declaration, the tokenizer must check the
-       complete source file for compliance with the default encoding
-       (which usually is ASCII). If the source file does not properly
-       decode, a single warning is generated per file.
+       complete source file for compliance with the declared
+       encoding. If the source file does not properly decode, a single
+       warning is generated per file.
    2. Change the tokenizer/compiler base string type from char* to
       Py_UNICODE* and apply the encoding to the complete file.
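
The phase 1 check described in this change can be sketched roughly as follows. This is only an illustration in Python, not the tokenizer implementation: the regular expression for the magic comment and the helper names `declared_encoding` and `check_source` are assumptions, since the exact comment syntax is not part of this diff. The sketch looks for an encoding hint in the first two lines, falls back to ASCII when none is given, and emits at most one warning per file if the bytes do not decode.

```python
import re
import warnings

# Assumed pattern for the magic comment; the exact syntax is not shown in
# this diff, so treat it as a placeholder.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def declared_encoding(source: bytes) -> str:
    """Return the encoding named in the first two lines, or 'ascii'."""
    for line in source.splitlines()[:2]:
        # Only comment lines may carry the encoding hint.
        if line.lstrip().startswith(b"#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1).decode("ascii")
    # No hint given: default to ASCII, as this change specifies.
    return "ascii"


def check_source(source: bytes, filename: str = "<string>") -> str:
    """Warn once if the whole file does not decode with its encoding."""
    encoding = declared_encoding(source)
    try:
        source.decode(encoding)
    except UnicodeDecodeError:
        # One decode attempt per call, so at most one warning per file.
        warnings.warn(f"{filename}: source does not decode as {encoding}")
    return encoding
```

For example, `check_source(b"s = '\xe9'\n")` would warn because the byte 0xE9 is not ASCII, while the same content preceded by `# -*- coding: latin-1 -*-` would pass silently. An unknown encoding name raises `LookupError` in this sketch; how the real tokenizer reports that case is not covered here.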