reSTify PEP 293 (#353)

2017-08-19 03:02:48 +08:00
parent e6fe4f377f
commit 3ea921ba72
1 changed file with 328 additions and 311 deletions
--- a/pep-0293.txt
+++ b/pep-0293.txt
@@ -5,12 +5,14 @@ Last-Modified: $Date$
 Author: Walter Dörwald <walter@livinglogic.de>
 Status: Final
 Type: Standards Track
+Content-Type: text/x-rst
 Created: 18-Jun-2002
 Python-Version: 2.3
 Post-History: 19-Jun-2002


 Abstract
+========

 This PEP aims at extending Python's fixed codec error handling
 schemes with a more flexible callback based approach.
@@ -25,6 +27,7 @@ Abstract


 Specification
+=============

 Currently the set of codec error handling algorithms is fixed to
 either "strict", "replace" or "ignore" and the semantics of these
@@ -33,19 +36,19 @@ Specification
 The proposed patch will make the set of error handling algorithms
 extensible through a codec error handler registry which maps
 handler names to handler functions.  This registry consists of the
-    following two C functions:
+following two C functions::

    int PyCodec_RegisterError(const char *name, PyObject *error)

    PyObject *PyCodec_LookupError(const char *name)

-    and their Python counterparts
+and their Python counterparts::

    codecs.register_error(name, error)

    codecs.lookup_error(name)

-    PyCodec_LookupError raises a LookupError if no callback function
+``PyCodec_LookupError`` raises a ``LookupError`` if no callback function
 has been registered under this name.

 Similar to the encoding name registry there is no way of
@@ -59,7 +62,7 @@ Specification
 with this object.  The callback returns information about how to
 proceed (or raises an exception).

-    For encoding, the exception object will look like this:
+For encoding, the exception object will look like this::

    class UnicodeEncodeError(UnicodeError):
        def __init__(self, encoding, object, start, end, reason):
@@ -77,14 +80,14 @@ Specification
 getter methods for the attributes, which have the following
 meaning:

-      * encoding: The name of the encoding;
-      * object: The original unicode object for which encode() has
+* ``encoding``: The name of the encoding;
+* ``object``: The original unicode object for which ``encode()`` has
  been called;
-      * start: The position of the first unencodable character;
-      * end: (The position of the last unencodable character)+1 (or
+* ``start``: The position of the first unencodable character;
+* ``end``: (The position of the last unencodable character)+1 (or
  the length of object, if all characters from start to the end
  of object are unencodable);
-      * reason: The reason why object[start:end] couldn't be encoded.
+* ``reason``: The reason why ``object[start:end]`` couldn't be encoded.

 If object has consecutive unencodable characters, the encoder
 should collect those characters for one call to the callback if
@@ -95,18 +98,18 @@ Specification

 The callback must not modify the exception object.  If the
 callback does not raise an exception (either the one passed in, or
-    a different one), it must return a tuple:
+a different one), it must return a tuple::

    (replacement, newpos)

 replacement is a unicode object that the encoder will encode and
-    emit instead of the unencodable object[start:end] part, newpos
+emit instead of the unencodable ``object[start:end]`` part, newpos
 specifies a new position within object, where (after encoding the
 replacement) the encoder will continue encoding.

 Negative values for newpos are treated as being relative to
 the end of object. If newpos is out of bounds the encoder will raise
-    an IndexError.
+an ``IndexError``.

 If the replacement string itself contains an unencodable character
 the encoder raises the exception object (but may set a different
@@ -115,44 +118,45 @@ Specification
 Should further encoding errors occur, the encoder is allowed to
 reuse the exception object for the next call to the callback.
 Furthermore, the encoder is allowed to cache the result of
-    codecs.lookup_error.
+``codecs.lookup_error``.

 If the callback does not know how to handle the exception, it must
-    raise a TypeError.
+raise a ``TypeError``.

 Decoding works similarly to encoding with the following differences:
-    The exception class is named UnicodeDecodeError and the attribute
+
+* The exception class is named ``UnicodeDecodeError`` and the attribute
  object is the original 8bit string that the decoder is currently
  decoding.

-    The decoder will call the callback with those bytes that
+* The decoder will call the callback with those bytes that
  constitute one undecodable sequence, even if there is more than
  one undecodable sequence that is undecodable for the same reason
  directly after the first one.  E.g. for the "unicode-escape"
-    encoding, when decoding the illegal string "\\u00\\u01x", the
-    callback will be called twice (once for "\\u00" and once for
-    "\\u01").  This is done to be able to generate the correct number
+  encoding, when decoding the illegal string ``\\u00\\u01x``, the
+  callback will be called twice (once for ``\\u00`` and once for
+  ``\\u01``).  This is done to be able to generate the correct number
  of replacement characters.

-    The replacement returned from the callback is a unicode object
+* The replacement returned from the callback is a unicode object
  that will be emitted by the decoder as-is without further
-    processing instead of the undecodable object[start:end] part.
+  processing instead of the undecodable ``object[start:end]`` part.

 There is a third API that uses the old strict/ignore/replace error
-    handling scheme:
+handling scheme::

    PyUnicode_TranslateCharmap/unicode.translate

-    The proposed patch will enhance PyUnicode_TranslateCharmap, so
+The proposed patch will enhance ``PyUnicode_TranslateCharmap``, so
 that it also supports the callback registry.  This has the
-    additional side effect that PyUnicode_TranslateCharmap will
+additional side effect that ``PyUnicode_TranslateCharmap`` will
 support multi-character replacement strings (see SF feature
-    request #403100 [1]).
+request #403100 [1]_).

-    For PyUnicode_TranslateCharmap the exception class will be named
-    UnicodeTranslateError.  PyUnicode_TranslateCharmap will collect
+For ``PyUnicode_TranslateCharmap`` the exception class will be named
+``UnicodeTranslateError``.  ``PyUnicode_TranslateCharmap`` will collect
 all consecutive untranslatable characters (i.e. those that map to
-    None) and call the callback with them.  The replacement returned
+``None``) and call the callback with them.  The replacement returned
 from the callback is a unicode object that will be put in the
 translated result as-is, without further processing.

@@ -163,9 +167,9 @@ Specification
 callback names: "backslashreplace" and "xmlcharrefreplace", which
 can be used for encoding and translating and which will also be
 implemented in-place for all encoders and
-    PyUnicode_TranslateCharmap.
+``PyUnicode_TranslateCharmap``.

-    The Python equivalent of these five callbacks will look like this:
+The Python equivalent of these five callbacks will look like this::

    def strict(exc):
        raise exc
@@ -212,16 +216,17 @@ Specification
             raise TypeError("can't handle %s" % exc.__class__.__name__)

 These five callback handlers will also be accessible to Python as
-    codecs.strict_error, codecs.ignore_error, codecs.replace_error,
-    codecs.backslashreplace_error and codecs.xmlcharrefreplace_error.
+``codecs.strict_error``, ``codecs.ignore_error``, ``codecs.replace_error``,
+``codecs.backslashreplace_error`` and ``codecs.xmlcharrefreplace_error``.


 Rationale
+=========

 Most legacy encodings do not support the full range of Unicode
 characters.  For these cases many high level protocols support a
 way of escaping a Unicode character (e.g. Python itself supports
-    the \x, \u and \U convention, XML supports character references
+the ``\x``, ``\u`` and ``\U`` convention, XML supports character references
 via &#xxx; etc.).

 When implementing such an encoding algorithm, a problem with the
@@ -231,12 +236,16 @@ Rationale
 because encode does not provide any information about the location
 of the error(s), so

+::
+
    # (1)
    us = u"xxx"
    s = us.encode(encoding)

 has to be replaced by

+::
+
    # (2)
    us = u"xxx"
    v = []
@@ -257,7 +266,7 @@ Rationale
 character.

 To work around this problem, a stream writer - which keeps state
-    between calls to the encoding function - has to be used:
+between calls to the encoding function - has to be used::

    # (3)
    us = u"xxx"
@@ -274,7 +283,7 @@ Rationale
    s = v.getvalue()

 To compare the speed of (1) and (3) the following test script has
-    been used:
+been used::

    # (4)
    import time
@@ -306,7 +315,7 @@ Rationale
    print "2:", t3-t2
    print "factor:", (t3-t2)/(t2-t1)

-    On Linux this gives the following output (with Python 2.3a0):
+On Linux this gives the following output (with Python 2.3a0)::

    1: 0.274321913719
    2: 51.1284689903
@@ -316,19 +325,23 @@ Rationale

 Callbacks must be stateless, because as soon as a callback is
 registered it is available globally and can be called by multiple
-    encode() calls.  To be able to use stateful callbacks, the errors
+``encode()`` calls.  To be able to use stateful callbacks, the errors
 parameter for encode/decode/translate would have to be changed
-    from char * to PyObject *, so that the callback could be used
+from ``char *`` to ``PyObject *``, so that the callback could be used
 directly, without the need to register the callback globally.  As
 this requires changes to lots of C prototypes, this approach was
 rejected.

 Currently all encoding/decoding functions have arguments

+::
+
    const Py_UNICODE *p, int size

 or

+::
+
    const char *p, int size

 to specify the unicode characters/8bit characters to be
@@ -343,35 +356,36 @@ Rationale
 For stream readers/writers the errors attribute must be changeable
 to be able to switch between different error handling methods
 during the lifetime of the stream reader/writer. This is currently
-    the case for codecs.StreamReader and codecs.StreamWriter and
+the case for ``codecs.StreamReader`` and ``codecs.StreamWriter`` and
 all their subclasses. All core codecs and probably most of the
-    third party codecs (e.g. JapaneseCodecs) derive their stream
+third party codecs (e.g. ``JapaneseCodecs``) derive their stream
 readers/writers from these classes so this already works,
 but the attribute errors should be documented as a requirement.


 Implementation Notes
+====================

 A sample implementation is available as SourceForge patch #432401
-    [2] including a script for testing the speed of various
+[2]_ including a script for testing the speed of various
 string/encoding/error combinations and a test script.

 Currently the new exception classes are old style Python
 classes. This means that accessing attributes results
 in a dict lookup. The C API is implemented in a way
 that makes it possible to switch to new style classes
-    behind the scene, if Exception (and UnicodeError) will
+behind the scene, if ``Exception`` (and ``UnicodeError``) will
 be changed to new style classes implemented in C for
 improved performance.

-    The class codecs.StreamReaderWriter uses the errors parameter for
+The class ``codecs.StreamReaderWriter`` uses the errors parameter for
 both reading and writing.  To be more flexible this should
 probably be changed to two separate parameters for reading and
 writing.

-    The errors parameter of PyUnicode_TranslateCharmap is not
+The errors parameter of ``PyUnicode_TranslateCharmap`` is not
 available to Python, which makes testing of the new functionality
-    of PyUnicode_TranslateCharmap impossible with Python scripts.  The
+of ``PyUnicode_TranslateCharmap`` impossible with Python scripts.  The
 patch should add an optional argument errors to unicode.translate
 to expose the functionality and make testing possible.

@@ -379,11 +393,12 @@ Implementation Notes
 unicode and want to use the new machinery can define their own
 exception classes and the strict handlers will automatically work
 with it. The other predefined error handlers are unicode specific
-    and expect to get a Unicode(Encode|Decode|Translate)Error
+and expect to get a ``Unicode(Encode|Decode|Translate)Error``
 exception object so they won't work.


 Backwards Compatibility
+=======================

 The semantics of unicode.encode with errors="replace" has changed:
 The old version always stored a ? character in the output string
@@ -393,26 +408,28 @@ Backwards Compatibility
 supported encodings are ASCII based, and thus map ? to ?, this
 should not be a problem in practice.

-    Illegal values for the errors argument raised ValueError before,
-    now they will raise LookupError.
+Illegal values for the errors argument raised ``ValueError`` before,
+now they will raise ``LookupError``.


 References
+==========

-    [1] SF feature request #403100
+.. [1] SF feature request #403100
       "Multicharacter replacements in PyUnicode_TranslateCharmap"
       http://www.python.org/sf/403100

-    [2] SF patch #432401 "unicode encoding error callbacks"
+.. [2] SF patch #432401 "unicode encoding error callbacks"
       http://www.python.org/sf/432401


 Copyright
+=========

 This document has been placed in the public domain.


-
+..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil