Changed Python's source code encoding default to ASCII.
Added note about handling of Unicode literals in phase 1.
parent 130b28e2a4
commit 4903b95001
pep-0263.txt (33 lines changed)
--- a/pep-0263.txt
+++ b/pep-0263.txt
@@ -40,10 +40,8 @@ Proposed Solution
 
 Defining the Encoding
 
-Just as in coercion of strings to Unicode, Python will default to
-the interpreter's default encoding (which is ASCII in standard
-Python installations) as standard encoding if no other encoding
-hints are given.
+Python will default to ASCII as standard encoding if no other
+encoding hints are given.
 
 To define a source code encoding, a magic comment must
 be placed into the source files either as first or second
@@ -76,12 +74,12 @@ Concepts
 result in a decoding error during compilation of the Python
 source code.
 
-Any encoding which allows processing the first two lines in
-the way indicated above is allowed as source code encoding,
-this includes ASCII compatible encodings as well as certain
+Any encoding which allows processing the first two lines in the
+way indicated above is allowed as source code encoding, this
+includes ASCII compatible encodings as well as certain
 multi-byte encodings such as Shift_JIS. It does not include
-encodings which use two or more bytes for all characters
-like e.g. UTF-16. The reason for this is to keep the encoding
+encodings which use two or more bytes for all characters like
+e.g. UTF-16. The reason for this is to keep the encoding
 detection algorithm in the tokenizer simple.
 
 2. Handling of escape sequences should continue to work as it does
@@ -116,19 +114,22 @@ Implementation
 Since changing the Python tokenizer/parser combination will
 require major changes in the internals of the interpreter and
 enforcing the use of magic comments in source code files which
-place non-default encoding characters in string literals, comments
+place non-ASCII characters in string literals, comments
 and Unicode literals, the proposed solution should be implemented
 in two phases:
 
-1. Implement the magic comment detection and default encoding
-handling, but only apply the detected encoding to Unicode
-literals in the source file.
+1. Implement the magic comment detection, but only apply the
+detected encoding to Unicode literals in the source file.
+
+If no magic comment is used, Python should continue to
+use the standard [raw-]unicode-escape codecs for Unicode
+literals.
 
 In addition to this step and to aid in the transition to
 explicit encoding declaration, the tokenizer must check the
-complete source file for compliance with the default encoding
-(which usually is ASCII). If the source file does not properly
-decode, a single warning is generated per file.
+complete source file for compliance with the declared
+encoding. If the source file does not properly decode, a single
+warning is generated per file.
 
 2. Change the tokenizer/compiler base string type from char* to
 Py_UNICODE* and apply the encoding to the complete file.
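The phase 1 behaviour described in the revised text can be illustrated with a small Python sketch: look for a magic comment on the first or second line, fall back to ASCII when there is none, and emit at most one warning per file if the whole source does not decode with the resulting encoding. This is not CPython's tokenizer. The regular expression is an assumption borrowed from the later, final text of PEP 263 (this revision only says where the comment must appear), the warning category (SyntaxWarning) is an arbitrary choice, and declared_encoding and check_source are hypothetical helper names.

```python
import re
import warnings

# Assumption: magic-comment pattern taken from the final text of PEP 263,
# not from this revision. It captures the encoding name after "coding:" or "coding=".
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

DEFAULT_ENCODING = "ascii"  # default introduced by this commit


def declared_encoding(source: bytes) -> str:
    """Return the encoding named by a magic comment on line 1 or 2,
    or the ASCII default if neither line carries one."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return DEFAULT_ENCODING


def check_source(source: bytes, filename: str = "<source>") -> str:
    """Phase 1 style check: decode the complete file with the declared
    (or default) encoding and emit a single warning if that fails."""
    encoding = declared_encoding(source)
    try:
        source.decode(encoding)
    except UnicodeDecodeError:
        # One warning per call, i.e. per file, as the revised text requires.
        # The category is an illustrative choice.
        warnings.warn(
            f"{filename}: source does not decode as {encoding!r}",
            SyntaxWarning,
        )
    return encoding
```

For example, check_source(b"s = '\xe9'\n") warns once because the byte 0xE9 is not valid ASCII, while the same source preceded by a "# coding: latin-1" line decodes cleanly and produces no warning.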