Clarified the class of possible source code encodings. Added link
to Martin's patch.
Marc-André Lemburg 2002-03-07 11:14:26 +00:00
parent a04d6be380
commit bcc3471c33
1 changed files with 16 additions and 5 deletions

@@ -76,11 +76,13 @@ Concepts
result in a decoding error during compilation of the Python
source code.
-Only ASCII compatible encodings are allowed as source code
-encoding to assure that Python language elements other than
-literals and comments remain readable by ASCII processing tools
-and to avoid problems with wide characters encodings such as
-UTF-16.
+Any encoding which allows processing the first two lines in
+the way indicated above is allowed as source code encoding,
+this includes ASCII compatible encodings as well as certain
+multi-byte encodings such as Shift_JIS. It does not include
+encodings which use two or more bytes for all characters
+like e.g. UTF-16. The reason for this is to keep the encoding
+detection algorithm in the tokenizer simple.
2. Handling of escape sequences should continue to work as it does
now, but with all possible source code encodings, that is
@@ -138,14 +140,23 @@ Implementation
input. 8-bit string input is subject to the standard procedure
for encoding detection as described above.
+Martin v. Loewis is working on a patch which implements phase 1.
+See [1] for details.
Scope
This PEP intends to provide an upgrade path from the current
(more-or-less) undefined source code encoding situation to a more
robust and portable definition.
+References
+[1] Phase 1 implementation:
+http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470
History
1.10 and above: see CVS history
1.8: Added '.' to the coding RE.
1.7: Added warnings to phase 1 implementation. Replaced the
Latin-1 default encoding with the interpreter's default