Some typo fixes in PEP 3138; also add variables footer.

2012-09-30 08:55:27 +02:00 · 9c83bacdff
parent caf06e1290
commit 9c83bacdff
1 changed file with 146 additions and 127 deletions
--- a/pep-3138.txt
+++ b/pep-3138.txt
@@ -13,11 +13,11 @@ Post-History: 05-May-2008, 05-Jun-2008
 Abstract
 ========
-This PEP proposes a new string representation form for Python 3000. In
+This PEP proposes a new string representation form for Python 3000.
-Python prior to Python 3000, the repr() built-in function converted
+In Python prior to Python 3000, the repr() built-in function converted
-arbitrary objects to printable ASCII strings for debugging and logging.
+arbitrary objects to printable ASCII strings for debugging and
-For Python 3000, a wider range of characters, based on the Unicode
+logging.  For Python 3000, a wider range of characters, based on the
-standard, should be considered 'printable'.
+Unicode standard, should be considered 'printable'.
 Motivation
@@ -28,8 +28,8 @@ algorithm.
 - Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
-- Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
+- Convert other non-printable characters(0x00-0x1f, 0x7f) and
-  characters(>=0x80) to '\\xXX'.
+  non-ASCII characters (>= 0x80) to '\\xXX'.
 - Backslash-escape quote characters (apostrophe, ') and add the quote
  character at the beginning and the end.
@@ -41,8 +41,8 @@ For Unicode strings, the following additional conversions are done.
 - Convert 16-bit characters (>= 0x100) to '\\uXXXX'.
-- Convert 21-bit characters(>=0x10000) and surrogate pair characters to
+- Convert 21-bit characters (>= 0x10000) and surrogate pair characters
-  '\\U00xxxxxx'.
+  to '\\U00xxxxxx'.
 This algorithm converts any string to printable ASCII, and repr() is
 used as a handy and safe way to print strings for debugging or for
@@ -53,19 +53,19 @@ ASCII, this is very inconvenient.
 We can use ``print(aJapaneseString)`` to get a readable string, but we
 don't have a similar workaround for printing strings from collections
-such as lists or tuples. ``print(listOfJapaneseStrings)`` uses repr() to
+such as lists or tuples.  ``print(listOfJapaneseStrings)`` uses repr()
-build the string to be printed, so the resulting strings are always
+to build the string to be printed, so the resulting strings are always
-hex-escaped. Or when ``open(japaneseFilemame)`` raises an exception, the
+hex-escaped.  Or when ``open(japaneseFilemame)`` raises an exception,
-error message is something like ``IOError: [Errno 2] No such file or
+the error message is something like ``IOError: [Errno 2] No such file
-directory: '\u65e5\u672c\u8a9e'``, which isn't helpful.
+or directory: '\u65e5\u672c\u8a9e'``, which isn't helpful.
 Python 3000 has a lot of nice features for non-Latin users such as
-non-ASCII identifiers, so it would be helpful if Python could also progress
+non-ASCII identifiers, so it would be helpful if Python could also
-in a similar way for printable output.
+progress in a similar way for printable output.
 Some users might be concerned that such output will mess up their
-console if they print binary data like images. But this is unlikely to
+console if they print binary data like images.  But this is unlikely
-happen in practice because bytes and strings are different types in
+to happen in practice because bytes and strings are different types in
 Python 3000, so printing an image to the console won't mess it up.
 This issue was once discussed by Hye-Shik Chang [1]_, but was rejected.
@@ -75,9 +75,10 @@ Specification
 =============
 - Add a new function to the Python C API ``int Py_UNICODE_ISPRINTABLE
-  (Py_UNICODE ch)``. This function returns 0 if repr() should escape the
+  (Py_UNICODE ch)``.  This function returns 0 if repr() should escape
-  Unicode character ``ch``; otherwise it returns 1. Characters that should
+  the Unicode character ``ch``; otherwise it returns 1.  Characters
-  be escaped are defined in the Unicode character database as:
+  that should be escaped are defined in the Unicode character database
+  as:
  * Cc (Other, Control)
  * Cf (Other, Format)
@@ -85,122 +86,128 @@ Specification
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
- * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
+  * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR
- * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
+    ('\\u2029').
-   this category should be escaped to avoid ambiguity.
+  * Zs (Separator, Space) other than ASCII space ('\\x20').  Characters
+    in this category should be escaped to avoid ambiguity.
 - The algorithm to build repr() strings should be changed to:
  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
- * Convert non-printable ASCII characters(0x00-0x1f, 0x7f) to '\\xXX'.
+  * Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to
+    '\\xXX'.
- * Convert leading surrogate pair characters without trailing character
+  * Convert leading surrogate pair characters without trailing
-   (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
+    character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to
+    '\\uXXXX'.
- * Convert non-printable characters(Py_UNICODE_ISPRINTABLE() returns 0)
+  * Convert non-printable characters (Py_UNICODE_ISPRINTABLE() returns
-   to 'xXX', '\\uXXXX' or '\\U00xxxxxx'.
+    0) to 'xXX', '\\uXXXX' or '\\U00xxxxxx'.
- * Backslash-escape quote characters (apostrophe, 0x27) and add quote
+  * Backslash-escape quote characters (apostrophe, 0x27) and add a
-   character at the beginning and the end.
+    quote character at the beginning and the end.
-- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
+- Set the Unicode error-handler for sys.stderr to 'backslashreplace'
-  default.
+  by default.
 - Add a new function to the Python C API ``PyObject *PyObject_ASCII
-  (PyObject *o)``. This function converts any python object to a string
+  (PyObject *o)``.  This function converts any python object to a
-  using PyObject_Repr() and then hex-escapes all non-ASCII characters. 
+  string using PyObject_Repr() and then hex-escapes all non-ASCII
-  ``PyObject_ASCII()`` generates the same string as ``PyObject_Repr()``
+  characters.  ``PyObject_ASCII()`` generates the same string as
-  in Python 2.
+  ``PyObject_Repr()`` in Python 2.
-- Add a new built-in function, ``ascii()``. This function converts any
+- Add a new built-in function, ``ascii()``.  This function converts
-  python object to a string using repr() and then hex-escapes all non-ASCII
+  any python object to a string using repr() and then hex-escapes all
-  characters. ``ascii()`` generates the same string as ``repr()`` in
+  non-ASCII characters.  ``ascii()`` generates the same string as
-  Python 2.
+  ``repr()`` in Python 2.
-- Add ``'%a'`` string format operator. ``'%a'`` converts any python
+- Add a ``'%a'`` string format operator.  ``'%a'`` converts any python
  object to a string using repr() and then hex-escapes all non-ASCII
-  characters. The ``'%a'`` format operator generates the same string as
+  characters.  The ``'%a'`` format operator generates the same string
-  ``'%r'`` in Python 2. Also, add ``'!a'`` conversion flags to the
+  as ``'%r'`` in Python 2.  Also, add ``'!a'`` conversion flags to the
  ``string.format()`` method and add ``'%A'`` operator to the
-  PyUnicode_FromFormat(). They converts any object to an ASCII string
+  PyUnicode_FromFormat().  They convert any object to an ASCII string
  as ``'%a'`` string format operator.
-- Add an ``isprintable()`` method to the string type. ``str.isprintable()``
+- Add an ``isprintable()`` method to the string type.
-  returns False if repr() should escape any character in the string;
+  ``str.isprintable()`` returns False if repr() would escape any
-  otherwise returns True. The ``isprintable()`` method calls the
+  character in the string; otherwise returns True.  The
-  ``Py_UNICODE_ISPRINTABLE()`` function internally.
+  ``isprintable()`` method calls the ``Py_UNICODE_ISPRINTABLE()``
+  function internally.
 Rationale
 =========
-The repr() in Python 3000 should be Unicode not ASCII based, just like
+The repr() in Python 3000 should be Unicode, not ASCII based, just
-Python 3000 strings. Also, conversion should not be affected by the
+like Python 3000 strings.  Also, conversion should not be affected by
-locale setting, because the locale is not necessarily the same as the
+the locale setting, because the locale is not necessarily the same as
-output device's locale. For example, it is common for a daemon process
+the output device's locale.  For example, it is common for a daemon
-to be invoked in an ASCII setting, but writes UTF-8 to its log files.
+process to be invoked in an ASCII setting, but writes UTF-8 to its log
-Also, web applications might want to report the error information in
+files.  Also, web applications might want to report the error
-more readable form based on the HTML page's encoding.
+information in more readable form based on the HTML page's encoding.
 Characters not supported by the user's console could be hex-escaped on
-printing, by the Unicode encoder's error-handler. If the error-handler
+printing, by the Unicode encoder's error-handler.  If the
-of the output file is 'backslashreplace', such characters are
+error-handler of the output file is 'backslashreplace', such
-hex-escaped without raising UnicodeEncodeError. For example, if your default
+characters are hex-escaped without raising UnicodeEncodeError.  For
-encoding is ASCII, ``print('Hello ¢')`` will print 'Hello \\xa2'. If
+example, if the default encoding is ASCII, ``print('Hello ¢')`` will
-your encoding is ISO-8859-1, 'Hello ¢' will be printed.
+print 'Hello \\xa2'.  If the encoding is ISO-8859-1, 'Hello ¢' will be
+printed.
-The default error-handler for sys.stdout is 'strict'. Other applications
+The default error-handler for sys.stdout is 'strict'.  Other
-reading the output might not understand hex-escaped characters, so
+applications reading the output might not understand hex-escaped
-unsupported characters should be trapped when writing. If you need to
+characters, so unsupported characters should be trapped when writing.
-escape unsupported characters, you should explicitly change the
+If unsupported characters must be escaped, the error-handler should be
-error-handler. Unlike sys.stdout, sys.stderr doesn't raise
+changed explicitly.  Unlike sys.stdout, sys.stderr doesn't raise
 UnicodeEncodingError by default, because the default error-handler is
-'backslashreplace'. So printing error messeges containing non-ASCII
+'backslashreplace'.  So printing error messages containing non-ASCII
-characters to sys.stderr will not raise an exception. Also, information
+characters to sys.stderr will not raise an exception.  Also,
-about uncaught exceptions (exception object, traceback) are printed by
+information about uncaught exceptions (exception object, traceback) is
-the interpreter without raising exceptions.
+printed by the interpreter without raising exceptions.
 Alternate Solutions
 -------------------
-To help debugging in non-Latin languages without changing repr(), other
+To help debugging in non-Latin languages without changing repr(),
-suggestions were made.
+other suggestions were made.
 - Supply a tool to print lists or dicts.
-  Strings to be printed for debugging are not only contained by lists or
+  Strings to be printed for debugging are not only contained by lists
-  dicts, but also in many other types of object. File objects contain a
+  or dicts, but also in many other types of object.  File objects
-  file name in Unicode, exception objects contain a message in Unicode,
+  contain a file name in Unicode, exception objects contain a message
-  etc. These strings should be printed in readable form when repr()ed.
+  in Unicode, etc.  These strings should be printed in readable form
-  It is unlikely to be possible to implement a tool to print all
+  when repr()ed.  It is unlikely to be possible to implement a tool to
-  possible object types.
+  print all possible object types.
 - Use sys.displayhook and sys.excepthook.
  For interactive sessions, we can write hooks to restore hex escaped
-  characters to the original characters. But these hooks are called only
+  characters to the original characters.  But these hooks are called
-  when printing the result of evaluating an expression entered in an
+  only when printing the result of evaluating an expression entered in
-  interactive Python session, and doesn't work for the ``print()`` function,
+  an interactive Python session, and don't work for the ``print()``
-  for non-interactive sessions or for ``logging.debug("%r", ...)``, etc.
+  function, for non-interactive sessions or for ``logging.debug("%r",
+  ...)``, etc.
 - Subclass sys.stdout and sys.stderr.
  It is difficult to implement a subclass to restore hex-escaped
-  characters since there isn't enough information left by the time it's
+  characters since there isn't enough information left by the time
-  a string to undo the escaping correctly in all cases. For example,
+  it's a string to undo the escaping correctly in all cases.  For
-  ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
+  example, ``print("\\"+"u0041")`` should be printed as '\\u0041', not
-  there is no chance to tell file objects apart.
+  'A'. But there is no chance to tell file objects apart.
 - Make the encoding used by unicode_repr() adjustable, and make the
  existing repr() the default.
  With adjustable repr(), the result of using repr() is unpredictable
  and would make it impossible to write correct code involving repr().
-  And if current repr() is the default, then the old convention remains
+  And if current repr() is the default, then the old convention
-  intact and users may expect ASCII strings as the result of repr().
+  remains intact and users may expect ASCII strings as the result of
-  Third party applications or libraries could be confused when a custom
+  repr().  Third party applications or libraries could be confused
-  repr() function is used.
+  when a custom repr() function is used.
 Backwards Compatibility
@@ -208,8 +215,8 @@ Backwards Compatibility
 Changing repr() may break some existing code, especially testing code.
 Five of Python's regression tests fail with this modification.  If you
-need repr() strings without non-ASCII character as Python 2, you can use
+need repr() strings without non-ASCII character as Python 2, you can
-the following function. ::
+use the following function. ::
  def repr_ascii(obj):
      return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
@@ -221,25 +228,25 @@ UnicodeEncodeError. ::
  log.write(repr(data))     # UnicodeEncodeError will be raised
                            # if data contains unsupported characters.
-To avoid exceptions being raised, you can explicitly specify the error-
+To avoid exceptions being raised, you can explicitly specify the
-handler. ::
+error-handler. ::
  log = open("logfile", "w", errors="backslashreplace")
  log.write(repr(data))  # Unsupported characters will be escaped.
-For a console that uses a Unicode-based encoding, for example, en_US.
+For a console that uses a Unicode-based encoding, for example,
-utf8 or de_DE.utf8, the backslashescape trick doesn't work and all
+en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and
-printable characters are not escaped. This will cause a problem of
+all printable characters are not escaped.  This will cause a problem
-similarly drawing characters in Western, Greek and Cyrillic languages.
+of similarly drawing characters in Western, Greek and Cyrillic
-These languages use similar (but different) alphabets (descended from a
+languages.  These languages use similar (but different) alphabets
-common ancestor) and contain letters that look similar but have
+(descended from a common ancestor) and contain letters that look
-different character codes. For example, it is hard to distinguish Latin
+similar but have different character codes.  For example, it is hard
-'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'. (The visual
+to distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'.
-representation, of course, very much depends on the fonts used but
+(The visual representation, of course, very much depends on the fonts
-usually these letters are almost indistinguishable.) To avoid the
+used but usually these letters are almost indistinguishable.)  To
-problem, the user can adjust the terminal encoding to get a result
+avoid the problem, the user can adjust the terminal encoding to get a
-suitable for their environment.
+result suitable for their environment.
 Rejected Proposals
@@ -252,20 +259,21 @@ Rejected Proposals
  idea. [2]_
 - Use character names to escape characters, instead of hex character
-  codes. For example, ``repr('\u03b1')`` can be converted to ``"\N{GREEK
+  codes.  For example, ``repr('\u03b1')`` can be converted to
-  SMALL LETTER ALPHA}"``.
+  ``"\N{GREEK SMALL LETTER ALPHA}"``.
  Using character names can be very verbose compared to hex-escape.
-  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR
+  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE
-  KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
+  UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED
+  FORM}"``.
 - Default error-handler of sys.stdout should be 'backslashreplace'.
  Stuff written to stdout might be consumed by another program that
-  might misinterpret the \ escapes. For interactive session, it is
+  might misinterpret the \\ escapes.  For interactive sessions, it is
-  possible to make 'backslashreplace' error-handler to default, but may
+  possible to make the 'backslashreplace' error-handler the default,
-  add confusion of the kind "it works in interactive mode but not when
+  but this may add confusion of the kind "it works in interactive mode
-  redirecting to a file".
+  but not when redirecting to a file".
 Implementation
@@ -288,3 +296,14 @@ Copyright
 =========
 This document has been placed in the public domain.
+..
+  Local Variables:
+  mode: indented-text
+  indent-tabs-mode: nil
+  sentence-end-double-space: t
+  fill-column: 70
+  coding: utf-8
+  End:
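
Note for readers of this diff: the commit only rewraps and corrects the PEP text, but the behaviour that text specifies can be checked in a Python 3 interpreter. The following is a minimal sketch, not part of commit 9c83bacdff. The sample strings and variable names are illustrative assumptions; ``repr_ascii`` is the helper quoted in the Backwards Compatibility hunk above.

```python
# Sketch only: checks the behaviour PEP 3138 describes, on Python 3.
# The sample strings and variable names are illustrative, not from the commit.

sample = "\u65e5\u672c\u8a9e"  # same code points as the PEP's IOError example

# repr() leaves printable non-ASCII characters alone; ascii() hex-escapes them.
readable = repr(sample)
escaped = ascii(sample)

# The '%a' operator and the '!a' conversion flag are specified to match ascii().
percent_a = "%a" % sample
bang_a = "{!a}".format(sample)

# str.isprintable() is False when repr() would escape a character.
checks = {
    "ascii space": " ".isprintable(),
    "line separator (Zl)": "\u2028".isprintable(),
    "ideographic space (Zs)": "\u3000".isprintable(),
}


def repr_ascii(obj):
    """Helper quoted in the PEP's Backwards Compatibility section."""
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")


# The 'backslashreplace' error handler escapes only what the codec cannot encode.
cent_ascii = "Hello \u00a2".encode("ascii", "backslashreplace")
cent_latin1 = "Hello \u00a2".encode("iso-8859-1", "backslashreplace")

# Print only ASCII-safe values so the demo also runs on an ASCII-only console.
print(escaped, percent_a, bang_a)
print(checks)
print(cent_ascii, cent_latin1)
```

``readable`` is deliberately not printed: on a console whose encoding cannot represent the characters, ``sys.stdout`` uses the 'strict' error-handler and would raise UnicodeEncodeError, which is the case the Rationale section discusses.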