Add PEP 3142.

2007-04-27 05:11:00 +00:00 · 2007-04-27 05:11:00 +00:00 · 3c28c51417
parent fc960b4572
commit 3c28c51417
2 changed files with 109 additions and 0 deletions
--- a/pep-0000.txt
+++ b/pep-0000.txt
@@ -118,6 +118,7 @@ Index by Category
 S  3118  Revising the buffer protocol                 Oliphant, Banks
 S  3119  Introducing Abstract Base Classes            GvR, Talin
 S  3141  A Type Hierarchy for Numbers                 Yasskin
+ S  3142  Using UTF-8 as the default source encoding   von Löwis

 Finished PEPs (done, implemented in Subversion)

@@ -475,6 +476,7 @@ Numerical Index
 S  3118  Revising the buffer protocol                 Oliphant, Banks
 S  3119  Introducing Abstract Base Classes            GvR, Talin
 S  3141  A Type Hierarchy for Numbers                 Yasskin
+ S  3142  Using UTF-8 as the default source encoding   von Löwis


 Key
--- a/pep-3142.txt
+++ b/pep-3142.txt
@@ -0,0 +1,107 @@
+PEP: 3142
+Title: Using UTF-8 as the default source encoding
+Version: $Revision $
+Last-Modified: $Date $
+Author: Martin v. Löwis <martin@v.loewis.de>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 15-Apr-2007
+Python-Version: 3.0
+Post-History:
+
+
+Specification
+=============
+
+This PEP proposes to change the default source encoding from ASCII to
+UTF-8. Support for alternative source encodings [#pep263]_ continues to
+exist; an explicit encoding declaration takes precedence over the
+default.
+
+
+A Bit of History
+================
+
+In Python 1, the source encoding was unspecified, except that the
+source encoding had to be a superset of the system's basic execution
+character set (i.e. an ASCII superset, on most systems).  The source
+encoding was only relevant for the lexis itself (bytes representing
+letters for keywords, identifiers, punctuation, line breaks, etc.).
+The contents of a string literal were copied literally from the
+source file.
+
+In Python 2.0, the source encoding changed to Latin-1 as a side effect
+of introducing Unicode. For Unicode string literals, the characters
+were still copied literally from the source file, but widened on a
+character-by-character basis. As Unicode gives a fixed interpretation
+to code points, this algorithm effectively fixed a source encoding, at
+least for files containing non-ASCII characters in Unicode literals.
+
+PEP 263 identified the problem that you can use only those Unicode
+characters in a Unicode literal which are also in Latin-1, and
+introduced a syntax for declaring the source encoding. If no source
+encoding was given, the default should be ASCII. For compatibility
+with Python 2.0 and 2.1, files were interpreted as Latin-1 for a
+transitional period. This transition ended with Python 2.5, which
+gives an error if non-ASCII characters are encountered and no source
+encoding is declared.
+
+Rationale
+=========
+
+With PEP 263, using arbitrary non-ASCII characters in a Python file is
+possible, but tedious. One has to explicitly add an encoding
+declaration. Even though some editors (like IDLE and Emacs) support
+the declarations of PEP 263, many editors still do not (and never
+will); users have to explicitly adjust the encoding which the editor
+assumes on a file-by-file basis.
+
+When the default encoding is changed to UTF-8, adding non-ASCII text
+to Python files becomes easier and more portable: On some systems,
+editors will automatically choose UTF-8 when saving text (e.g. on Unix
+systems where the locale uses UTF-8). On other systems, editors will
+guess the encoding when reading the file, and UTF-8 is easy to
+guess. Yet other editors support associating a default encoding with a
+file extension, allowing users to associate .py with UTF-8.
+
+For Python 2, an important reason for using non-UTF-8 encodings was
+that byte string literals would be in the source encoding at run-time,
+allowing one to output them to a file or render them to the user
+as-is. With Python 3, all strings will be Unicode strings, so the
+original encoding of the source will have no impact at run-time.
+
+Implementation
+==============
+
+The parser needs to be changed to accept bytes > 127 if no source
+encoding is specified; instead of giving an error, it needs to check
+that the bytes are well-formed UTF-8 (decoding is not necessary,
+as the parser converts all source code to UTF-8, anyway).
+
+IDLE needs to be changed to use UTF-8 as the default encoding.
+
+
+References
+==========
+
+.. [#pep263]
+   http://www.python.org/dev/peps/pep-0263/
+   
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:
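
The check described in the Implementation section of pep-3142.txt above can be illustrated with a minimal Python sketch. This is not the parser change itself, only an approximation of its behaviour under stated assumptions: the function names `declared_encoding` and `check_source` are hypothetical, and the declaration pattern is a simplified form of the one given in PEP 263, applied to the first two lines of the file. An explicit declaration takes precedence; otherwise the bytes are only validated as well-formed UTF-8.

```python
import re

# Simplified PEP 263 declaration pattern (assumption: the real tokenizer
# applies further rules that are not reproduced here).
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(source: bytes):
    """Return the encoding named in the first two lines, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def check_source(source: bytes) -> str:
    """Return the encoding to assume for `source` (hypothetical helper).

    Raises SyntaxError if no encoding is declared and the bytes are
    not well-formed UTF-8.
    """
    encoding = declared_encoding(source)
    if encoding is not None:
        # An explicit declaration takes precedence over the default.
        return encoding
    try:
        # Decoding is used here only as a well-formedness check.
        source.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise SyntaxError(f"source is not well-formed UTF-8: {exc}") from None
    return "utf-8"
```

Under these assumptions, a file containing `x = 'Löwis'` saved as UTF-8 with no declaration is accepted with the default encoding, while the same text saved as Latin-1 is rejected unless it carries a `# -*- coding: latin-1 -*-` line.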