Changed Python's source code encoding default to ASCII.

Added note about handling of Unicode literals in phase 1.
Marc-André Lemburg 2002-03-15 17:07:12 +00:00
parent 130b28e2a4
commit 4903b95001
1 changed files with 17 additions and 16 deletions

@@ -40,10 +40,8 @@ Proposed Solution
 Defining the Encoding
-Just as in coercion of strings to Unicode, Python will default to
-the interpreter's default encoding (which is ASCII in standard
-Python installations) as standard encoding if no other encoding
-hints are given.
+Python will default to ASCII as standard encoding if no other
+encoding hints are given.
 To define a source code encoding, a magic comment must
 be placed into the source files either as first or second
@@ -76,12 +74,12 @@ Concepts
 result in a decoding error during compilation of the Python
 source code.
-Any encoding which allows processing the first two lines in
-the way indicated above is allowed as source code encoding,
-this includes ASCII compatible encodings as well as certain
+Any encoding which allows processing the first two lines in the
+way indicated above is allowed as source code encoding, this
+includes ASCII compatible encodings as well as certain
 multi-byte encodings such as Shift_JIS. It does not include
-encodings which use two or more bytes for all characters
-like e.g. UTF-16. The reason for this is to keep the encoding
+encodings which use two or more bytes for all characters like
+e.g. UTF-16. The reason for this is to keep the encoding
 detection algorithm in the tokenizer simple.
 2. Handling of escape sequences should continue to work as it does
@@ -116,19 +114,22 @@ Implementation
 Since changing the Python tokenizer/parser combination will
 require major changes in the internals of the interpreter and
 enforcing the use of magic comments in source code files which
-place non-default encoding characters in string literals, comments
+place non-ASCII characters in string literals, comments
 and Unicode literals, the proposed solution should be implemented
 in two phases:
-1. Implement the magic comment detection and default encoding
-handling, but only apply the detected encoding to Unicode
-literals in the source file.
+1. Implement the magic comment detection, but only apply the
+detected encoding to Unicode literals in the source file.
+If no magic comment is used, Python should continue to
+use the standard [raw-]unicode-escape codecs for Unicode
+literals.
 In addition to this step and to aid in the transition to
 explicit encoding declaration, the tokenizer must check the
-complete source file for compliance with the default encoding
-(which usually is ASCII). If the source file does not properly
-decode, a single warning is generated per file.
+complete source file for compliance with the declared
+encoding. If the source file does not properly decode, a single
+warning is generated per file.
 2. Change the tokenizer/compiler base string type from char* to
 Py_UNICODE* and apply the encoding to the complete file.
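
Illustrative sketch (not part of the commit): the phase 1 behaviour described in the changed text could look roughly like the Python below. It looks for a magic comment on the first or second line, falls back to ASCII, and issues a single warning per file if the source does not decode. The comment syntax is not visible in this diff, so the regular expression is an assumption taken from the published PEP text, and the names `MAGIC_COMMENT`, `declared_encoding` and `check_source` are hypothetical. The real check lives in the C tokenizer, not in Python code like this.

```python
import re
import warnings

# Assumption: the magic comment syntax is not shown in this diff, so this
# pattern follows the form given in the published PEP text.
MAGIC_COMMENT = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

# The default this commit settles on when no encoding hint is given.
DEFAULT_ENCODING = "ascii"


def declared_encoding(source: bytes) -> str:
    """Return the encoding named on line 1 or 2, else the ASCII default."""
    for line in source.splitlines()[:2]:
        match = MAGIC_COMMENT.match(line)
        if match:
            return match.group(1).decode("ascii")
    return DEFAULT_ENCODING


def check_source(source: bytes, filename: str = "<string>") -> bool:
    """Decode the complete file; emit one warning if that fails."""
    encoding = declared_encoding(source)
    try:
        source.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        # One warning per file, whether the bytes are invalid or the
        # declared encoding name is unknown.
        warnings.warn(f"{filename}: cannot decode source as {encoding}: {exc}")
        return False
    return True
```

With this sketch, a file containing a Latin-1 byte and no magic comment triggers the warning, while the same bytes preceded by `# -*- coding: latin-1 -*-` pass the check.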