Clarified the class of possible source code encodings. Added link to Martin's patch.
Marc-André Lemburg 2002-03-07 11:14:26 +00:00
parent a04d6be380
commit bcc3471c33
1 changed file with 16 additions and 5 deletions

@@ -76,11 +76,13 @@ Concepts
 result in a decoding error during compilation of the Python
 source code.

-Only ASCII compatible encodings are allowed as source code
-encoding to assure that Python language elements other than
-literals and comments remain readable by ASCII processing tools
-and to avoid problems with wide characters encodings such as
-UTF-16.
+Any encoding which allows processing the first two lines in
+the way indicated above is allowed as source code encoding,
+this includes ASCII compatible encodings as well as certain
+multi-byte encodings such as Shift_JIS. It does not include
+encodings which use two or more bytes for all characters
+like e.g. UTF-16. The reason for this is to keep the encoding
+detection algorithm in the tokenizer simple.

 2. Handling of escape sequences should continue to work as it does
 now, but with all possible source code encodings, that is
@@ -138,14 +140,23 @@ Implementation
 input. 8-bit string input is subject to the standard procedure
 for encoding detection as decsribed above.

+Martin v. Loewis is working on a patch which implements phase 1.
+See [1] for details.
+
 Scope

 This PEP intends to provide an upgrade path from th current
 (more-or-less) undefined source code encoding situation to a more
 robust and portable definition.

+References
+
+[1] Phase 1 implementation:
+http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470
+
 History

+1.10 and above: see CVS history
 1.8: Added '.' to the coding RE.
 1.7: Added warnings to phase 1 implementation. Replaced the
 Latin-1 default encoding with the interpreter's default