Add PEP 3142.

2007-04-27 05:11:00 +00:00 · 2007-04-27 05:11:00 +00:00 · 3c28c51417
parent fc960b4572
commit 3c28c51417
2 changed files with 109 additions and 0 deletions
--- a/pep-0000.txt
+++ b/pep-0000.txt
@@ -118,6 +118,7 @@ Index by Category
 S  3118  Revising the buffer protocol                 Oliphant, Banks
 S  3119  Introducing Abstract Base Classes            GvR, Talin
 S  3141  A Type Hierarchy for Numbers                 Yasskin
 S  3142  Using UTF-8 as the default source encoding   von Löwis
 Finished PEPs (done, implemented in Subversion)
@@ -475,6 +476,7 @@ Numerical Index
 S  3118  Revising the buffer protocol                 Oliphant, Banks
 S  3119  Introducing Abstract Base Classes            GvR, Talin
 S  3141  A Type Hierarchy for Numbers                 Yasskin
 S  3142  Using UTF-8 as the default source encoding   von Löwis
 Key
--- a/pep-3142.txt
+++ b/pep-3142.txt
@@ -0,0 +1,107 @@
 PEP: 3142
 Title: Using UTF-8 as the default source encoding
 Version: $Revision $
 Last-Modified: $Date $
 Author: Martin v. Löwis <martin@v.loewis.de>
 Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 15-Apr-2007
 Python-Version: 3.0
 Post-History:
 Specification
 =============
 This PEP proposes to change the default source encoding from ASCII to
 UTF-8. Support for alternative source encodings [#pep263]_ continues to
 exist; an explicit encoding declaration takes precedence over the
 default.
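As a hedged illustration of this precedence rule, the sketch below relies on
``tokenize.detect_encoding`` from the standard library of released Python 3
versions, which reports a declared PEP 263 encoding and otherwise falls back
to UTF-8. The helper name ``source_encoding`` and the two sample sources are
made up for this example and are not part of the PEP.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding used to decode source_bytes (hypothetical helper)."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# No declaration: the UTF-8 default applies.
no_declaration = b"name = 'L\xc3\xb6wis'\n"
# Explicit PEP 263 declaration: it takes precedence over the default.
with_declaration = b"# -*- coding: latin-1 -*-\nname = 'L\xf6wis'\n"

default_result = source_encoding(no_declaration)
declared_result = source_encoding(with_declaration)
```

Note that ``detect_encoding`` reports a normalized codec name, so a
``latin-1`` declaration comes back as ``iso-8859-1``.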
 A Bit of History
 ================
 In Python 1, the source encoding was unspecified, except that the
 source encoding had to be a superset of the system's basic execution
 character set (i.e. an ASCII superset, on most systems).  The source
 encoding was only relevant for the lexis itself (bytes representing
 letters for keywords, identifiers, punctuation, line breaks, etc.).
 The contents of a string literal were copied literally from the
 source file.
 In Python 2.0, the source encoding changed to Latin-1 as a side effect
 of introducing Unicode. For Unicode string literals, the characters
 were still copied literally from the source file, but widened on a
 character-by-character basis. As Unicode gives a fixed interpretation
 to code points, this algorithm effectively fixed a source encoding, at
 least for files containing non-ASCII characters in Unicode literals.
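The widening described above can be imitated in present-day Python to see
why it amounts to a Latin-1 interpretation. This is an illustrative sketch
only, not the Python 2.0 implementation, and the name ``widen`` is
hypothetical.

```python
def widen(raw):
    """Map every byte to the code point with the same number."""
    return "".join(chr(byte) for byte in raw)


latin1_bytes = b"L\xf6wis"      # U+00F6 stored as one Latin-1 byte
utf8_bytes = b"L\xc3\xb6wis"    # U+00F6 stored as two UTF-8 bytes

# Widening gives the same result as decoding the bytes as Latin-1 ...
assert widen(latin1_bytes) == latin1_bytes.decode("latin-1") == "L\u00f6wis"
# ... so a UTF-8 file yields two unintended characters instead of one.
assert widen(utf8_bytes) == "L\u00c3\u00b6wis"
```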
 PEP 263 identified the problem that you can use only those Unicode
 characters in a Unicode literal which are also in Latin-1, and
 introduced a syntax for declaring the source encoding. If no source
 encoding was given, the default should be ASCII. For compatibility
 with Python 2.0 and 2.1, files were interpreted as Latin-1 for a
 transitional period. This transition ended with Python 2.5, which
 gives an error if non-ASCII characters are encountered and no source
 encoding is declared.
 Rationale
 =========
 With PEP 263, using arbitrary non-ASCII characters in a Python file is
 possible, but tedious. One has to explicitly add an encoding
 declaration. Even though some editors (like IDLE and Emacs) support
 the declarations of PEP 263, many editors still do not (and never
 will); users have to explicitly adjust the encoding which the editor
 assumes on a file-by-file basis.
 When the default encoding is changed to UTF-8, adding non-ASCII text
 to Python files becomes easier and more portable: On some systems,
 editors will automatically choose UTF-8 when saving text (e.g. on Unix
 systems where the locale uses UTF-8). On other systems, editors will
 guess the encoding when reading the file, and UTF-8 is easy to
 guess. Yet other editors support associating a default encoding with a
 file extension, allowing users to associate .py with UTF-8.
 For Python 2, an important reason for using non-UTF-8 encodings was
 that byte string literals would be in the source encoding at run-time,
 allowing one to output them to a file or render them to the user
 as-is. With Python 3, all strings will be Unicode strings, so the
 original encoding of the source will have no impact at run-time.
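A small sketch of that last point, assuming a current Python 3 interpreter,
whose built-in ``compile`` accepts source as bytes and honors PEP 263
declarations: the same literal stored in two different source encodings
yields equal strings at run-time. The helper ``run_literal`` and the
variable ``name`` are hypothetical.

```python
def run_literal(source_bytes):
    """Compile and run source_bytes, then return its 'name' variable."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace["name"]


# The same program text, saved once as UTF-8 and once as Latin-1.
utf8_source = b"name = 'L\xc3\xb6wis'\n"
latin1_source = b"# -*- coding: latin-1 -*-\nname = 'L\xf6wis'\n"

from_utf8 = run_literal(utf8_source)
from_latin1 = run_literal(latin1_source)
```

Both calls produce the same five-character string, although the files
differ at the byte level.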
 Implementation
 ==============
 The parser needs to be changed to accept bytes > 127 if no source
 encoding is specified; instead of giving an error, it needs to check
 that the bytes are well-formed UTF-8 (decoding is not necessary,
 as the parser converts all source code to UTF-8, anyway).
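The check can be sketched as follows. Note the assumption: for brevity this
sketch validates by decoding, whereas the text above points out that the
parser only needs to verify well-formedness. ``check_default_encoding`` is
a hypothetical name, not a function of the parser.

```python
def check_default_encoding(source_bytes):
    """Raise SyntaxError unless source_bytes is well-formed UTF-8."""
    try:
        source_bytes.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        raise SyntaxError(
            "non-UTF-8 byte at offset %d and no encoding declared" % exc.start
        ) from exc


# Bytes > 127 are accepted when they form valid UTF-8 sequences.
check_default_encoding(b"name = 'L\xc3\xb6wis'\n")

# A lone Latin-1 byte is not well-formed UTF-8 and is rejected.
try:
    check_default_encoding(b"name = 'L\xf6wis'\n")
except SyntaxError as err:
    rejected = str(err)
else:
    rejected = None
```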
 IDLE needs to be changed to use UTF-8 as the default encoding.
 References
 ==========
 .. [#pep263]
   http://www.python.org/dev/peps/pep-0263/
 Copyright
 =========
 This document has been placed in the public domain.
 ..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: