Convert PEP 263 to reST (#191)

- convert the content to text/x-rst - update http://python.org/sf/* url with https://bugs.python.org/issue*
2017-02-02 09:58:20 -08:00 · 2017-02-02 09:58:20 -08:00 · 9b58a292ea
parent 05dbf8cb04
commit 9b58a292ea
1 changed files with 189 additions and 162 deletions
--- a/pep-0263.txt
+++ b/pep-0263.txt
@ -6,133 +6,144 @@ Author: mal@lemburg.com (Marc-André Lemburg),
  martin@v.loewis.de (Martin von Löwis)
 Status: Final
 Type: Standards Track
+Content-Type: text/x-rst
 Created: 06-Jun-2001
 Python-Version: 2.3
-Post-History: 
+Post-History:
+

 Abstract
+========
+
+This PEP proposes to introduce a syntax to declare the encoding of
+a Python source file. The encoding information is then used by the
+Python parser to interpret the file using the given encoding. Most
+notably this enhances the interpretation of Unicode literals in
+the source code and makes it possible to write Unicode literals
+using e.g. UTF-8 directly in an Unicode aware editor.

-    This PEP proposes to introduce a syntax to declare the encoding of
-    a Python source file. The encoding information is then used by the
-    Python parser to interpret the file using the given encoding. Most
-    notably this enhances the interpretation of Unicode literals in
-    the source code and makes it possible to write Unicode literals
-    using e.g. UTF-8 directly in an Unicode aware editor.

 Problem
+=======
+
+In Python 2.1, Unicode literals can only be written using the
+Latin-1 based encoding "unicode-escape". This makes the
+programming environment rather unfriendly to Python users who live
+and work in non-Latin-1 locales such as many of the Asian
+countries. Programmers can write their 8-bit strings using the
+favorite encoding, but are bound to the "unicode-escape" encoding
+for Unicode literals.

-    In Python 2.1, Unicode literals can only be written using the
-    Latin-1 based encoding "unicode-escape". This makes the
-    programming environment rather unfriendly to Python users who live
-    and work in non-Latin-1 locales such as many of the Asian 
-    countries. Programmers can write their 8-bit strings using the
-    favorite encoding, but are bound to the "unicode-escape" encoding
-    for Unicode literals.

 Proposed Solution
+=================

-    I propose to make the Python source code encoding both visible and
-    changeable on a per-source file basis by using a special comment
-    at the top of the file to declare the encoding.
+I propose to make the Python source code encoding both visible and
+changeable on a per-source file basis by using a special comment
+at the top of the file to declare the encoding.
+
+To make Python aware of this encoding declaration a number of
+concept changes are necessary with respect to the handling of
+Python source code data.

-    To make Python aware of this encoding declaration a number of
-    concept changes are necessary with respect to the handling of
-    Python source code data.

 Defining the Encoding
+=====================

-    Python will default to ASCII as standard encoding if no other
-    encoding hints are given.
+Python will default to ASCII as standard encoding if no other
+encoding hints are given.

-    To define a source code encoding, a magic comment must
-    be placed into the source files either as first or second
-    line in the file, such as:
+To define a source code encoding, a magic comment must
+be placed into the source files either as first or second
+line in the file, such as::

-          # coding=<encoding name>
+    # coding=<encoding name>

-    or (using formats recognized by popular editors)
+or (using formats recognized by popular editors)::

-          #!/usr/bin/python
-          # -*- coding: <encoding name> -*-
+    #!/usr/bin/python
+    # -*- coding: <encoding name> -*-

-    or
+or::

-          #!/usr/bin/python
-          # vim: set fileencoding=<encoding name> :
+    #!/usr/bin/python
+    # vim: set fileencoding=<encoding name> ::

-    More precisely, the first or second line must match the regular
-    expression "^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)".
-    The first group of this
-    expression is then interpreted as encoding name. If the encoding
-    is unknown to Python, an error is raised during compilation. There
-    must not be any Python statement on the line that contains the
-    encoding declaration.  If the first line matches the second line
-    is ignored.
+More precisely, the first or second line must match the regular
+expression "``^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)```".
+The first group of this
+expression is then interpreted as encoding name. If the encoding
+is unknown to Python, an error is raised during compilation. There
+must not be any Python statement on the line that contains the
+encoding declaration.  If the first line matches the second line
+is ignored.

-    To aid with platforms such as Windows, which add Unicode BOM marks
-    to the beginning of Unicode files, the UTF-8 signature
-    '\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well
-    (even if no magic encoding comment is given).
+To aid with platforms such as Windows, which add Unicode BOM marks
+to the beginning of Unicode files, the UTF-8 signature
+'``\xef\xbb\xbf``' will be interpreted as 'utf-8' encoding as well
+(even if no magic encoding comment is given).
+
+If a source file uses both the UTF-8 BOM mark signature and a
+magic encoding comment, the only allowed encoding for the comment
+is 'utf-8'.  Any other encoding will cause an error.

-    If a source file uses both the UTF-8 BOM mark signature and a
-    magic encoding comment, the only allowed encoding for the comment
-    is 'utf-8'.  Any other encoding will cause an error.

 Examples
+========

-    These are some examples to clarify the different styles for
-    defining the source code encoding at the top of a Python source
-    file:
+These are some examples to clarify the different styles for
+defining the source code encoding at the top of a Python source
+file:

-    1. With interpreter binary and using Emacs style file encoding
-       comment:
+1. With interpreter binary and using Emacs style file encoding
+   comment::

-          #!/usr/bin/python
-          # -*- coding: latin-1 -*-
-          import os, sys
-          ...
+       #!/usr/bin/python
+       # -*- coding: latin-1 -*-
+       import os, sys
+       ...

-          #!/usr/bin/python
-          # -*- coding: iso-8859-15 -*-
-          import os, sys
-          ...
+       #!/usr/bin/python
+       # -*- coding: iso-8859-15 -*-
+       import os, sys
+       ...

-          #!/usr/bin/python
-          # -*- coding: ascii -*-
-          import os, sys
-          ...
+       #!/usr/bin/python
+       # -*- coding: ascii -*-
+       import os, sys
+       ...

-    2. Without interpreter line, using plain text:
+2. Without interpreter line, using plain text::

-          # This Python file uses the following encoding: utf-8
-          import os, sys
-          ...
+       # This Python file uses the following encoding: utf-8
+       import os, sys
+       ...

-    3. Text editors might have different ways of defining the file's
-       encoding, e.g.
+3. Text editors might have different ways of defining the file's
+   encoding, e.g.::

-          #!/usr/local/bin/python
-          # coding: latin-1
-          import os, sys
-          ...
+       #!/usr/local/bin/python
+       # coding: latin-1
+       import os, sys
+       ...

-    4. Without encoding comment, Python's parser will assume ASCII
-       text:
+4. Without encoding comment, Python's parser will assume ASCII
+   text::

-          #!/usr/local/bin/python
-          import os, sys
-          ...
+       #!/usr/local/bin/python
+       import os, sys
+       ...

-    5. Encoding comments which don't work:
+5. Encoding comments which don't work:

-       Missing "coding:" prefix:
+   1. Missing "coding:" prefix::

          #!/usr/local/bin/python
          # latin-1
          import os, sys
          ...

-       Encoding comment not on line 1 or 2:
+   2. Encoding comment not on line 1 or 2::

          #!/usr/local/bin/python
          #
@ -140,125 +151,141 @@ Examples
          import os, sys
          ...

-       Unsupported encoding:
+   3. Unsupported encoding::

          #!/usr/local/bin/python
          # -*- coding: utf-42 -*-
          import os, sys
          ...

+
 Concepts
+========

-    The PEP is based on the following concepts which would have to be
-    implemented to enable usage of such a magic comment:
+The PEP is based on the following concepts which would have to be
+implemented to enable usage of such a magic comment:

-    1. The complete Python source file should use a single encoding.
-       Embedding of differently encoded data is not allowed and will
-       result in a decoding error during compilation of the Python
-       source code.
+1. The complete Python source file should use a single encoding.
+   Embedding of differently encoded data is not allowed and will
+   result in a decoding error during compilation of the Python
+   source code.

-       Any encoding which allows processing the first two lines in the
-       way indicated above is allowed as source code encoding, this
-       includes ASCII compatible encodings as well as certain
-       multi-byte encodings such as Shift_JIS. It does not include
-       encodings which use two or more bytes for all characters like
-       e.g. UTF-16. The reason for this is to keep the encoding
-       detection algorithm in the tokenizer simple.
+   Any encoding which allows processing the first two lines in the
+   way indicated above is allowed as source code encoding, this
+   includes ASCII compatible encodings as well as certain
+   multi-byte encodings such as Shift_JIS. It does not include
+   encodings which use two or more bytes for all characters like
+   e.g. UTF-16. The reason for this is to keep the encoding
+   detection algorithm in the tokenizer simple.

-    2. Handling of escape sequences should continue to work as it does 
-       now, but with all possible source code encodings, that is
-       standard string literals (both 8-bit and Unicode) are subject to 
-       escape sequence expansion while raw string literals only expand
-       a very small subset of escape sequences.
+2. Handling of escape sequences should continue to work as it does
+   now, but with all possible source code encodings, that is
+   standard string literals (both 8-bit and Unicode) are subject to
+   escape sequence expansion while raw string literals only expand
+   a very small subset of escape sequences.

-    3. Python's tokenizer/compiler combo will need to be updated to
-       work as follows:
+3. Python's tokenizer/compiler combo will need to be updated to
+   work as follows:

-       1. read the file
+   1. read the file

-       2. decode it into Unicode assuming a fixed per-file encoding
+   2. decode it into Unicode assuming a fixed per-file encoding

-       3. convert it into a UTF-8 byte string
+   3. convert it into a UTF-8 byte string

-       4. tokenize the UTF-8 content
+   4. tokenize the UTF-8 content

-       5. compile it, creating Unicode objects from the given Unicode data
-          and creating string objects from the Unicode literal data
-          by first reencoding the UTF-8 data into 8-bit string data
-          using the given file encoding
+   5. compile it, creating Unicode objects from the given Unicode data
+      and creating string objects from the Unicode literal data
+      by first reencoding the UTF-8 data into 8-bit string data
+      using the given file encoding
+
+Note that Python identifiers are restricted to the ASCII
+subset of the encoding, and thus need no further conversion
+after step 4.

-       Note that Python identifiers are restricted to the ASCII
-       subset of the encoding, and thus need no further conversion
-       after step 4.

 Implementation
+==============

-    For backwards-compatibility with existing code which currently
-    uses non-ASCII in string literals without declaring an encoding,
-    the implementation will be introduced in two phases:
+For backwards-compatibility with existing code which currently
+uses non-ASCII in string literals without declaring an encoding,
+the implementation will be introduced in two phases:

-    1. Allow non-ASCII in string literals and comments, by internally
-       treating a missing encoding declaration as a declaration of
-       "iso-8859-1". This will cause arbitrary byte strings to
-       correctly round-trip between step 2 and step 5 of the
-       processing, and provide compatibility with Python 2.2 for
-       Unicode literals that contain non-ASCII bytes.
+1. Allow non-ASCII in string literals and comments, by internally
+   treating a missing encoding declaration as a declaration of
+   "iso-8859-1". This will cause arbitrary byte strings to
+   correctly round-trip between step 2 and step 5 of the
+   processing, and provide compatibility with Python 2.2 for
+   Unicode literals that contain non-ASCII bytes.

-       A warning will be issued if non-ASCII bytes are found in the
-       input, once per improperly encoded input file.
+   A warning will be issued if non-ASCII bytes are found in the
+   input, once per improperly encoded input file.

-    2. Remove the warning, and change the default encoding to "ascii".
+2. Remove the warning, and change the default encoding to "ascii".

-    The builtin compile() API will be enhanced to accept Unicode as
-    input. 8-bit string input is subject to the standard procedure for
-    encoding detection as described above.
+The builtin ``compile()`` API will be enhanced to accept Unicode as
+input.  8-bit string input is subject to the standard procedure for
+encoding detection as described above.

-    If a Unicode string with a coding declaration is passed to compile(),
-    a SyntaxError will be raised.
+If a Unicode string with a coding declaration is passed to ``compile()``,
+a ``SyntaxError`` will be raised.
+
+SUZUKI Hisao is working on a patch; see [2]_ for details. A patch
+implementing only phase 1 is available at [1]_.

-    SUZUKI Hisao is working on a patch; see [2] for details. A patch
-    implementing only phase 1 is available at [1].

 Phases
-    Implementation of steps 1 and 2 above were completed in 2.3,
-    except for changing the default encoding to "ascii".
+======
+
+Implementation of steps 1 and 2 above were completed in 2.3,
+except for changing the default encoding to "ascii".
+
+The default encoding was set to "ascii" in version 2.5.
+

-    The default encoding was set to "ascii" in version 2.5.
-   
 Scope
+=====
+
+This PEP intends to provide an upgrade path from the current
+(more-or-less) undefined source code encoding situation to a more
+robust and portable definition.

-    This PEP intends to provide an upgrade path from the current
-    (more-or-less) undefined source code encoding situation to a more
-    robust and portable definition.

 References
+==========

-    [1] Phase 1 implementation:
-        http://python.org/sf/526840
-    [2] Phase 2 implementation:
-        http://python.org/sf/534304
+.. [1] Phase 1 implementation:
+       https://bugs.python.org/issue526840
+
+.. [2] Phase 2 implementation:
+       https://bugs.python.org/issue534304

 History
+=======
+
+- 1.10 and above: see CVS history
+- 1.8: Added '.' to the coding RE.
+- 1.7: Added warnings to phase 1 implementation. Replaced the
+  Latin-1 default encoding with the interpreter's default
+  encoding. Added tweaks to ``compile()``.
+- 1.4 - 1.6: Minor tweaks
+- 1.3: Worked in comments by Martin v. Loewis:
+  UTF-8 BOM mark detection, Emacs style magic comment,
+  two phase approach to the implementation

-    1.10 and above: see CVS history
-    1.8: Added '.' to the coding RE.
-    1.7: Added warnings to phase 1 implementation. Replaced the
-         Latin-1 default encoding with the interpreter's default
-         encoding. Added tweaks to compile().
-    1.4 - 1.6: Minor tweaks
-    1.3: Worked in comments by Martin v. Loewis: 
-         UTF-8 BOM mark detection, Emacs style magic comment,
-         two phase approach to the implementation

 Copyright
+=========

-    This document has been placed in the public domain.
+This document has been placed in the public domain.

-
-Local Variables:
-mode: indented-text
-indent-tabs-mode: nil
-sentence-end-double-space: t
-fill-column: 70
-coding: utf-8
-End:
+
+..
+  Local Variables:
+  mode: indented-text
+  indent-tabs-mode: nil
+  sentence-end-double-space: t
+  fill-column: 70
+  coding: utf-8
+  End: