python-peps/pep-3138.txt

PEP: 3138
Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto <ishimoto--at--gembook.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 05-May-2008
Post-History: 05-May-2008, 05-Jun-2008


Abstract
========

This PEP proposes a new string representation form for Python 3000.
In Python prior to Python 3000, the repr() built-in function converted
arbitrary objects to printable ASCII strings for debugging and
logging.  For Python 3000, a wider range of characters, based on the
Unicode standard, should be considered 'printable'.


Motivation
==========

The current repr() converts 8-bit strings to ASCII using following
algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters(0x00-0x1f, 0x7f) and
  non-ASCII characters (>= 0x80) to '\\xXX'.

- Backslash-escape quote characters (apostrophe, ') and add the quote
  character at the beginning and the end.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters (>= 0x100) to '\\uXXXX'.

- Convert 21-bit characters (>= 0x10000) and surrogate pair characters
  to '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and repr() is
used as a handy and safe way to print strings for debugging or for
logging.  Although all non-ASCII characters are escaped, this does not
matter when most of the string's characters are ASCII.  But for other
languages, such as Japanese where most characters in a string are not
ASCII, this is very inconvenient.

We can use ``print(aJapaneseString)`` to get a readable string, but we
don't have a similar workaround for printing strings from collections
such as lists or tuples.  ``print(listOfJapaneseStrings)`` uses repr()
to build the string to be printed, so the resulting strings are always
hex-escaped.  Or when ``open(japaneseFilename)`` raises an exception,
the error message is something like ``IOError: [Errno 2] No such file
or directory: '\u65e5\u672c\u8a9e'``, which isn't helpful.

Python 3000 has a lot of nice features for non-Latin users such as
non-ASCII identifiers, so it would be helpful if Python could also
progress in a similar way for printable output.

Some users might be concerned that such output will mess up their
console if they print binary data like images.  But this is unlikely
to happen in practice because bytes and strings are different types in
Python 3000, so printing an image to the console won't mess it up.

This issue was once discussed by Hye-Shik Chang [1]_, but was rejected.


Specification
=============

- Add a new function to the Python C API ``int Py_UNICODE_ISPRINTABLE
  (Py_UNICODE ch)``.  This function returns 0 if repr() should escape
  the Unicode character ``ch``; otherwise it returns 1.  Characters
  that should be escaped are defined in the Unicode character database
  as:

  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
  * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR
    ('\\u2029').
  * Zs (Separator, Space) other than ASCII space ('\\x20').  Characters
    in this category should be escaped to avoid ambiguity.

- The algorithm to build repr() strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to
    '\\xXX'.

  * Convert leading surrogate pair characters without trailing
    character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to
    '\\uXXXX'.

  * Convert non-printable characters (Py_UNICODE_ISPRINTABLE() returns
    0) to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

  * Backslash-escape quote characters (apostrophe, 0x27) and add a
    quote character at the beginning and the end.

- Set the Unicode error-handler for sys.stderr to 'backslashreplace'
  by default.

- Add a new function to the Python C API ``PyObject *PyObject_ASCII
  (PyObject *o)``.  This function converts any python object to a
  string using PyObject_Repr() and then hex-escapes all non-ASCII
  characters.  ``PyObject_ASCII()`` generates the same string as
  ``PyObject_Repr()`` in Python 2.

- Add a new built-in function, ``ascii()``.  This function converts
  any python object to a string using repr() and then hex-escapes all
  non-ASCII characters.  ``ascii()`` generates the same string as
  ``repr()`` in Python 2.

- Add a ``'%a'`` string format operator.  ``'%a'`` converts any python
  object to a string using repr() and then hex-escapes all non-ASCII
  characters.  The ``'%a'`` format operator generates the same string
  as ``'%r'`` in Python 2.  Also, add ``'!a'`` conversion flags to the
  ``string.format()`` method and add ``'%A'`` operator to the
  PyUnicode_FromFormat().  They convert any object to an ASCII string
  as ``'%a'`` string format operator.

- Add an ``isprintable()`` method to the string type.
  ``str.isprintable()`` returns False if repr() would escape any
  character in the string; otherwise returns True.  The
  ``isprintable()`` method calls the ``Py_UNICODE_ISPRINTABLE()``
  function internally.


Rationale
=========

The repr() in Python 3000 should be Unicode, not ASCII based, just
like Python 3000 strings.  Also, conversion should not be affected by
the locale setting, because the locale is not necessarily the same as
the output device's locale.  For example, it is common for a daemon
process to be invoked in an ASCII setting, but writes UTF-8 to its log
files.  Also, web applications might want to report the error
information in more readable form based on the HTML page's encoding.

Characters not supported by the user's console could be hex-escaped on
printing, by the Unicode encoder's error-handler.  If the
error-handler of the output file is 'backslashreplace', such
characters are hex-escaped without raising UnicodeEncodeError.  For
example, if the default encoding is ASCII, ``print('Hello ¢')`` will
print 'Hello \\xa2'.  If the encoding is ISO-8859-1, 'Hello ¢' will be
printed.

The default error-handler for sys.stdout is 'strict'.  Other
applications reading the output might not understand hex-escaped
characters, so unsupported characters should be trapped when writing.
If unsupported characters must be escaped, the error-handler should be
changed explicitly.  Unlike sys.stdout, sys.stderr doesn't raise
UnicodeEncodingError by default, because the default error-handler is
'backslashreplace'.  So printing error messages containing non-ASCII
characters to sys.stderr will not raise an exception.  Also,
information about uncaught exceptions (exception object, traceback) is
printed by the interpreter without raising exceptions.

Alternate Solutions
-------------------

To help debugging in non-Latin languages without changing repr(),
other suggestions were made.

- Supply a tool to print lists or dicts.

  Strings to be printed for debugging are not only contained by lists
  or dicts, but also in many other types of object.  File objects
  contain a file name in Unicode, exception objects contain a message
  in Unicode, etc.  These strings should be printed in readable form
  when repr()ed.  It is unlikely to be possible to implement a tool to
  print all possible object types.

- Use sys.displayhook and sys.excepthook.

  For interactive sessions, we can write hooks to restore hex escaped
  characters to the original characters.  But these hooks are called
  only when printing the result of evaluating an expression entered in
  an interactive Python session, and don't work for the ``print()``
  function, for non-interactive sessions or for ``logging.debug("%r",
  ...)``, etc.

- Subclass sys.stdout and sys.stderr.

  It is difficult to implement a subclass to restore hex-escaped
  characters since there isn't enough information left by the time
  it's a string to undo the escaping correctly in all cases.  For
  example, ``print("\\"+"u0041")`` should be printed as '\\u0041', not
  'A'. But there is no chance to tell file objects apart.

- Make the encoding used by unicode_repr() adjustable, and make the
  existing repr() the default.

  With adjustable repr(), the result of using repr() is unpredictable
  and would make it impossible to write correct code involving repr().
  And if current repr() is the default, then the old convention
  remains intact and users may expect ASCII strings as the result of
  repr().  Third party applications or libraries could be confused
  when a custom repr() function is used.


Backwards Compatibility
=======================

Changing repr() may break some existing code, especially testing code.
Five of Python's regression tests fail with this modification.  If you
need repr() strings without non-ASCII character as Python 2, you can
use the following function. ::

  def repr_ascii(obj):
      return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

For logging or for debugging, the following code can raise
UnicodeEncodeError. ::

  log = open("logfile", "w")
  log.write(repr(data))     # UnicodeEncodeError will be raised
                            # if data contains unsupported characters.

To avoid exceptions being raised, you can explicitly specify the
error-handler. ::

  log = open("logfile", "w", errors="backslashreplace")
  log.write(repr(data))  # Unsupported characters will be escaped.


For a console that uses a Unicode-based encoding, for example,
en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and
all printable characters are not escaped.  This will cause a problem
of similarly drawing characters in Western, Greek and Cyrillic
languages.  These languages use similar (but different) alphabets
(descended from a common ancestor) and contain letters that look
similar but have different character codes.  For example, it is hard
to distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'.
(The visual representation, of course, very much depends on the fonts
used but usually these letters are almost indistinguishable.)  To
avoid the problem, the user can adjust the terminal encoding to get a
result suitable for their environment.


Rejected Proposals
==================

- Add encoding and errors arguments to the builtin print() function,
  with defaults of sys.getfilesystemencoding() and 'backslashreplace'.

  Complicated to implement, and in general, this is not seen as a good
  idea. [2]_

- Use character names to escape characters, instead of hex character
  codes.  For example, ``repr('\u03b1')`` can be converted to
  ``"\N{GREEK SMALL LETTER ALPHA}"``.

  Using character names can be very verbose compared to hex-escape.
  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE
  UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED
  FORM}"``.

- Default error-handler of sys.stdout should be 'backslashreplace'.

  Stuff written to stdout might be consumed by another program that
  might misinterpret the \\ escapes.  For interactive sessions, it is
  possible to make the 'backslashreplace' error-handler the default,
  but this may add confusion of the kind "it works in interactive mode
  but not when redirecting to a file".


Implementation
==============

The author wrote a patch in http://bugs.python.org/issue2630; this was
committed to the Python 3.0 branch in revision 64138 on 06-11-2008.


References
==========

.. [1] Multibyte string on string\::string_print
       (http://bugs.python.org/issue479898)

.. [2] [Python-3000] Displaying strings containing unicode escapes
       (https://mail.python.org/pipermail/python-3000/2008-April/013366.html)

Copyright
=========

This document has been placed in the public domain.


..
  Local Variables:
  mode: indented-text
  indent-tabs-mode: nil
  sentence-end-double-space: t
  fill-column: 70
  coding: utf-8
  End:
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								PEP: 3138
 								Title: String representation in Python 3000
 								Version: $Revision$
 								Last-Modified: $Date$
 								Author: Atsuo Ishimoto <ishimoto--at--gembook.org>
-												PEP 3138 is final.

											
										
										
											2008-06-11 14:41:49 -04:00
+								Status: Final
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								Type: Standards Track
 								Content-Type: text/x-rst
 								Created: 05-May-2008
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								Post-History: 05-May-2008, 05-Jun-2008
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								Abstract
 								========
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								This PEP proposes a new string representation form for Python 3000.
 								In Python prior to Python 3000, the repr() built-in function converted
 								arbitrary objects to printable ASCII strings for debugging and
 								logging.  For Python 3000, a wider range of characters, based on the
 								Unicode standard, should be considered 'printable'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Two spaces between each section

											
										
										
											2008-05-05 18:30:14 -04:00
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								Motivation
 								==========
 								The current repr() converts 8-bit strings to ASCII using following
 								algorithm.
 								- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								- Convert other non-printable characters(0x00-0x1f, 0x7f) and
 								  non-ASCII characters (>= 0x80) to '\\xXX'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								- Backslash-escape quote characters (apostrophe, ') and add the quote
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  character at the beginning and the end.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								For Unicode strings, the following additional conversions are done.
 								- Convert leading surrogate pair characters without trailing character
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								- Convert 16-bit characters (>= 0x100) to '\\uXXXX'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								- Convert 21-bit characters (>= 0x10000) and surrogate pair characters
 								  to '\\U00xxxxxx'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								This algorithm converts any string to printable ASCII, and repr() is
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								used as a handy and safe way to print strings for debugging or for
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								logging.  Although all non-ASCII characters are escaped, this does not
 								matter when most of the string's characters are ASCII.  But for other
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								languages, such as Japanese where most characters in a string are not
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								ASCII, this is very inconvenient.
 								We can use ``print(aJapaneseString)`` to get a readable string, but we
 								don't have a similar workaround for printing strings from collections
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								such as lists or tuples.  ``print(listOfJapaneseStrings)`` uses repr()
 								to build the string to be printed, so the resulting strings are always
-												Fix typos (#1113)


											
										
										
											2019-07-03 14:20:45 -04:00
+								hex-escaped.  Or when ``open(japaneseFilename)`` raises an exception,
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								the error message is something like ``IOError: [Errno 2] No such file
 								or directory: '\u65e5\u672c\u8a9e'``, which isn't helpful.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
 								Python 3000 has a lot of nice features for non-Latin users such as
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								non-ASCII identifiers, so it would be helpful if Python could also
 								progress in a similar way for printable output.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								Some users might be concerned that such output will mess up their
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								console if they print binary data like images.  But this is unlikely
 								to happen in practice because bytes and strings are different types in
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								Python 3000, so printing an image to the console won't mess it up.
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								This issue was once discussed by Hye-Shik Chang [1]_, but was rejected.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Two spaces between each section

											
										
										
											2008-05-05 18:30:14 -04:00
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								Specification
 								=============
-												Correct PY_UNICODE_ISPRINTABLE to Py_UNICODE_ISPRINTABLE.

											
										
										
											2008-06-02 18:23:23 -04:00
+								- Add a new function to the Python C API ``int Py_UNICODE_ISPRINTABLE
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  (Py_UNICODE ch)``.  This function returns 0 if repr() should escape
 								  the Unicode character ``ch``; otherwise it returns 1.  Characters
 								  that should be escaped are defined in the Unicode character database
 								  as:
 								  * Cc (Other, Control)
 								  * Cf (Other, Format)
 								  * Cs (Other, Surrogate)
 								  * Co (Other, Private Use)
 								  * Cn (Other, Not Assigned)
 								  * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
 								  * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR
 								    ('\\u2029').
 								  * Zs (Separator, Space) other than ASCII space ('\\x20').  Characters
 								    in this category should be escaped to avoid ambiguity.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								- The algorithm to build repr() strings should be changed to:
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  * Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to
 								    '\\xXX'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  * Convert leading surrogate pair characters without trailing
 								    character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to
 								    '\\uXXXX'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  * Convert non-printable characters (Py_UNICODE_ISPRINTABLE() returns
-												add missing backslash (#2473)


											
										
										
											2022-03-27 13:46:23 -04:00
+) to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  * Backslash-escape quote characters (apostrophe, 0x27) and add a
 								    quote character at the beginning and the end.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								- Set the Unicode error-handler for sys.stderr to 'backslashreplace'
 								  by default.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								- Add a new function to the Python C API ``PyObject *PyObject_ASCII
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  (PyObject *o)``.  This function converts any python object to a
 								  string using PyObject_Repr() and then hex-escapes all non-ASCII
 								  characters.  ``PyObject_ASCII()`` generates the same string as
 								  ``PyObject_Repr()`` in Python 2.
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								- Add a new built-in function, ``ascii()``.  This function converts
 								  any python object to a string using repr() and then hex-escapes all
 								  non-ASCII characters.  ``ascii()`` generates the same string as
 								  ``repr()`` in Python 2.
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								- Add a ``'%a'`` string format operator.  ``'%a'`` converts any python
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  object to a string using repr() and then hex-escapes all non-ASCII
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  characters.  The ``'%a'`` format operator generates the same string
 								  as ``'%r'`` in Python 2.  Also, add ``'!a'`` conversion flags to the
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								  ``string.format()`` method and add ``'%A'`` operator to the
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  PyUnicode_FromFormat().  They convert any object to an ASCII string
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								  as ``'%a'`` string format operator.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								- Add an ``isprintable()`` method to the string type.
 								  ``str.isprintable()`` returns False if repr() would escape any
 								  character in the string; otherwise returns True.  The
 								  ``isprintable()`` method calls the ``Py_UNICODE_ISPRINTABLE()``
 								  function internally.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Two spaces between each section

											
										
										
											2008-05-05 18:30:14 -04:00
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								Rationale
 								=========
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								The repr() in Python 3000 should be Unicode, not ASCII based, just
 								like Python 3000 strings.  Also, conversion should not be affected by
 								the locale setting, because the locale is not necessarily the same as
 								the output device's locale.  For example, it is common for a daemon
 								process to be invoked in an ASCII setting, but writes UTF-8 to its log
 								files.  Also, web applications might want to report the error
 								information in more readable form based on the HTML page's encoding.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
 								Characters not supported by the user's console could be hex-escaped on
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								printing, by the Unicode encoder's error-handler.  If the
 								error-handler of the output file is 'backslashreplace', such
 								characters are hex-escaped without raising UnicodeEncodeError.  For
 								example, if the default encoding is ASCII, ``print('Hello ¢')`` will
 								print 'Hello \\xa2'.  If the encoding is ISO-8859-1, 'Hello ¢' will be
 								printed.
 								The default error-handler for sys.stdout is 'strict'.  Other
 								applications reading the output might not understand hex-escaped
 								characters, so unsupported characters should be trapped when writing.
 								If unsupported characters must be escaped, the error-handler should be
 								changed explicitly.  Unlike sys.stdout, sys.stderr doesn't raise
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								UnicodeEncodingError by default, because the default error-handler is
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								'backslashreplace'.  So printing error messages containing non-ASCII
 								characters to sys.stderr will not raise an exception.  Also,
 								information about uncaught exceptions (exception object, traceback) is
 								printed by the interpreter without raising exceptions.
-												Two spaces between each section

											
										
										
											2008-05-05 18:30:14 -04:00
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								Alternate Solutions
 								-------------------
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								To help debugging in non-Latin languages without changing repr(),
 								other suggestions were made.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								- Supply a tool to print lists or dicts.
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  Strings to be printed for debugging are not only contained by lists
 								  or dicts, but also in many other types of object.  File objects
 								  contain a file name in Unicode, exception objects contain a message
 								  in Unicode, etc.  These strings should be printed in readable form
 								  when repr()ed.  It is unlikely to be possible to implement a tool to
 								  print all possible object types.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								- Use sys.displayhook and sys.excepthook.
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  For interactive sessions, we can write hooks to restore hex escaped
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  characters to the original characters.  But these hooks are called
 								  only when printing the result of evaluating an expression entered in
 								  an interactive Python session, and don't work for the ``print()``
 								  function, for non-interactive sessions or for ``logging.debug("%r",
 								  ...)``, etc.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								- Subclass sys.stdout and sys.stderr.
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  It is difficult to implement a subclass to restore hex-escaped
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  characters since there isn't enough information left by the time
 								  it's a string to undo the escaping correctly in all cases.  For
 								  example, ``print("\\"+"u0041")`` should be printed as '\\u0041', not
 								  'A'. But there is no chance to tell file objects apart.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								- Make the encoding used by unicode_repr() adjustable, and make the
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  existing repr() the default.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  With adjustable repr(), the result of using repr() is unpredictable
 								  and would make it impossible to write correct code involving repr().
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  And if current repr() is the default, then the old convention
 								  remains intact and users may expect ASCII strings as the result of
 								  repr().  Third party applications or libraries could be confused
 								  when a custom repr() function is used.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
 								Backwards Compatibility
 								=======================
 								Changing repr() may break some existing code, especially testing code.
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								Five of Python's regression tests fail with this modification.  If you
 								need repr() strings without non-ASCII character as Python 2, you can
 								use the following function. ::
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								  def repr_ascii(obj):
 								      return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
 								For logging or for debugging, the following code can raise
 								UnicodeEncodeError. ::
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								  log = open("logfile", "w")
 								  log.write(repr(data))     # UnicodeEncodeError will be raised
 								                            # if data contains unsupported characters.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								To avoid exceptions being raised, you can explicitly specify the
 								error-handler. ::
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								  log = open("logfile", "w", errors="backslashreplace")
 								  log.write(repr(data))  # Unsupported characters will be escaped.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								For a console that uses a Unicode-based encoding, for example,
 								en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and
 								all printable characters are not escaped.  This will cause a problem
 								of similarly drawing characters in Western, Greek and Cyrillic
 								languages.  These languages use similar (but different) alphabets
 								(descended from a common ancestor) and contain letters that look
 								similar but have different character codes.  For example, it is hard
 								to distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'.
 								(The visual representation, of course, very much depends on the fonts
 								used but usually these letters are almost indistinguishable.)  To
 								avoid the problem, the user can adjust the terminal encoding to get a
 								result suitable for their environment.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								Rejected Proposals
 								==================
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								- Add encoding and errors arguments to the builtin print() function,
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  with defaults of sys.getfilesystemencoding() and 'backslashreplace'.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  Complicated to implement, and in general, this is not seen as a good
 								  idea. [2]_
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
 								- Use character names to escape characters, instead of hex character
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  codes.  For example, ``repr('\u03b1')`` can be converted to
 								  ``"\N{GREEK SMALL LETTER ALPHA}"``.
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  Using character names can be very verbose compared to hex-escape.
 								  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE
 								  UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED
 								  FORM}"``.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								- Default error-handler of sys.stdout should be 'backslashreplace'.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Fix lay-out glitches and remove gmail turd.

											
										
										
											2008-06-02 18:26:21 -04:00
+								  Stuff written to stdout might be consumed by another program that
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
+								  might misinterpret the \\ escapes.  For interactive sessions, it is
 								  possible to make the 'backslashreplace' error-handler the default,
 								  but this may add confusion of the kind "it works in interactive mode
 								  but not when redirecting to a file".
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												Two spaces between each section

											
										
										
											2008-05-05 18:30:14 -04:00
-												PEP 3138 is final.

											
										
										
											2008-06-11 14:41:49 -04:00
+								Implementation
 								==============
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
-												PEP 3138 is final.

											
										
										
											2008-06-11 14:41:49 -04:00
+								The author wrote a patch in http://bugs.python.org/issue2630; this was
 								committed to the Python 3.0 branch in revision 64138 on 06-11-2008.
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								References
 								==========
-												New cleaned-up version from Atsuo.

											
										
										
											2008-06-05 13:28:44 -04:00
+								.. [1] Multibyte string on string\::string_print
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
+								       (http://bugs.python.org/issue479898)
-												New version from Atsuo.

											
										
										
											2008-06-02 18:19:25 -04:00
+								.. [2] [Python-3000] Displaying strings containing unicode escapes
-												promote m.p.o links to https

											
										
										
											2017-06-11 15:02:39 -04:00
+								       (https://mail.python.org/pipermail/python-3000/2008-April/013366.html)
-												Add PEP 3138 by Atsuo Ishimoto, String representation in Python 3000.

											
										
										
											2008-05-05 17:50:12 -04:00
 								Copyright
 								=========
 								This document has been placed in the public domain.
-												Some typo fixes in PEP 3138; also add variables footer.

											
										
										
											2012-09-30 02:55:27 -04:00
 								..
 								  Local Variables:
 								  mode: indented-text
 								  indent-tabs-mode: nil
 								  sentence-end-double-space: t
 								  fill-column: 70
 								  coding: utf-8
 								  End: