From 3c28c51417cd45cd3f34291403add11da69c7d42 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20v=2E=20L=C3=B6wis?=
Date: Fri, 27 Apr 2007 05:11:00 +0000
Subject: [PATCH] Add PEP 3142.

---
 pep-0000.txt |   2 +
 pep-3142.txt | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)
 create mode 100644 pep-3142.txt

diff --git a/pep-0000.txt b/pep-0000.txt
index 0ecdb129b..10ad0785e 100644
--- a/pep-0000.txt
+++ b/pep-0000.txt
@@ -118,6 +118,7 @@ Index by Category
  S 3118 Revising the buffer protocol Oliphant, Banks
  S 3119 Introducing Abstract Base Classes GvR, Talin
  S 3141 A Type Hierarchy for Numbers Yasskin
+ S 3142 Using UTF-8 as the default source encoding von Löwis
 
  Finished PEPs (done, implemented in Subversion)
 
@@ -475,6 +476,7 @@ Numerical Index
  S 3118 Revising the buffer protocol Oliphant, Banks
  S 3119 Introducing Abstract Base Classes GvR, Talin
  S 3141 A Type Hierarchy for Numbers Yasskin
+ S 3142 Using UTF-8 as the default source encoding von Löwis
 
 
 Key
diff --git a/pep-3142.txt b/pep-3142.txt
new file mode 100644
index 000000000..1cd85aa9c
--- /dev/null
+++ b/pep-3142.txt
@@ -0,0 +1,107 @@
+PEP: 3142
+Title: Using UTF-8 as the default source encoding
+Version: $Revision $
+Last-Modified: $Date $
+Author: Martin v. Löwis
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 15-Apr-2007
+Python-Version: 3.0
+Post-History:
+
+
+Specification
+=============
+
+This PEP proposes to change the default source encoding from ASCII to
+UTF-8. Support for alternative source encodings [#pep263]_ continues to
+exist; an explicit encoding declaration takes precedence over the
+default.
+
+
+A Bit of History
+================
+
+In Python 1, the source encoding was unspecified, except that the
+source encoding had to be a superset of the system's basic execution
+character set (i.e. an ASCII superset, on most systems). The source
+encoding was only relevant for the lexis itself (bytes representing
+letters for keywords, identifiers, punctuation, line breaks, etc.).
+The contents of a string literal were copied literally from the
+source file.
+
+In Python 2.0, the source encoding changed to Latin-1 as a side effect
+of introducing Unicode. For Unicode string literals, the characters
+were still copied literally from the source file, but widened on a
+character-by-character basis. As Unicode gives a fixed interpretation
+to code points, this algorithm effectively fixed a source encoding, at
+least for files containing non-ASCII characters in Unicode literals.
+
+PEP 263 identified the problem that you can use only those Unicode
+characters in a Unicode literal which are also in Latin-1, and
+introduced a syntax for declaring the source encoding. If no source
+encoding was given, the default should be ASCII. For compatibility
+with Python 2.0 and 2.1, files were interpreted as Latin-1 for a
+transitional period. This transition ended with Python 2.5, which
+gives an error if non-ASCII characters are encountered and no source
+encoding is declared.
+
+Rationale
+=========
+
+With PEP 263, using arbitrary non-ASCII characters in a Python file is
+possible, but tedious. One has to explicitly add an encoding
+declaration. Even though some editors (like IDLE and Emacs) support
+the declarations of PEP 263, many editors still do not (and never
+will); users have to explicitly adjust the encoding which the editor
+assumes on a file-by-file basis.
+
+When the default encoding is changed to UTF-8, adding non-ASCII text
+to Python files becomes easier and more portable: On some systems,
+editors will automatically choose UTF-8 when saving text (e.g. on Unix
+systems where the locale uses UTF-8). On other systems, editors will
+guess the encoding when reading the file, and UTF-8 is easy to
+guess. Yet other editors support associating a default encoding with a
+file extension, allowing users to associate .py with UTF-8.
+
+For Python 2, an important reason for using non-UTF-8 encodings was
+that byte string literals would be in the source encoding at run-time,
+allowing them to be written to a file or rendered to the user
+as-is. With Python 3, all strings will be Unicode strings, so the
+original encoding of the source will have no impact at run-time.
+
+Implementation
+==============
+
+The parser needs to be changed to accept bytes > 127 if no source
+encoding is specified; instead of giving an error, it needs to check
+that the bytes are well-formed UTF-8 (decoding is not necessary,
+as the parser converts all source code to UTF-8, anyway).
+
+IDLE needs to be changed to use UTF-8 as the default encoding.
+
+
+References
+==========
+
+.. [#pep263]
+   http://www.python.org/dev/peps/pep-0263/
+
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: