Fix lay-out glitches and remove gmail turd.

Guido van Rossum 2008-06-02 22:26:21 +00:00
parent 564e85c33f
commit 9934e842de
1 changed files with 64 additions and 65 deletions


@@ -29,20 +29,20 @@ algorithm.
- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
- Convert other non-printable characters (0x00-0x1f, 0x7f) and non-ASCII
characters (>=0x80) to '\\xXX'.
- Backslash-escape quote characters (apostrophe, ') and add the quote
character at the beginning and the end.
For Unicode strings, the following additional conversions are done.
- Convert leading surrogate pair characters without trailing character
(0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
- Convert 16-bit characters (>=0x100) to '\\uXXXX'.
- Convert 21-bit characters (>=0x10000) and surrogate pair characters to
'\\U00xxxxxx'.
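As a rough illustration, the Python 2 rules above can be sketched in Python 3. The helper `py2_style_repr` is hypothetical and works on a `str` of code points, so it does not model narrow-build surrogate pairs. For simplicity it always uses apostrophes as quotes, whereas Python 2's repr() switched to double quotes for strings containing an apostrophe but no double quote.

```python
def py2_style_repr(s: str) -> str:
    """Hypothetical helper: escape s following the Python 2 repr() rules."""
    # CR, LF, TAB, backslash and the apostrophe get short escapes.
    special = {"\r": "\\r", "\n": "\\n", "\t": "\\t",
               "\\": "\\\\", "'": "\\'"}
    out = ["'"]
    for ch in s:
        cp = ord(ch)
        if ch in special:
            out.append(special[ch])
        elif 0x20 <= cp < 0x7f:
            # Printable ASCII is copied unchanged.
            out.append(ch)
        elif cp < 0x100:
            # Control characters, DEL and 0x80-0xff become \xXX.
            out.append("\\x%02x" % cp)
        elif cp < 0x10000:
            # 16-bit characters (including lone surrogates) become \uXXXX.
            out.append("\\u%04x" % cp)
        else:
            # Characters outside the BMP become \U00XXXXXX.
            out.append("\\U%08x" % cp)
    out.append("'")
    return "".join(out)


print(py2_style_repr("caf\xe9\t\u03b1\U0001f600"))  # 'caf\xe9\t\u03b1\U0001f600'
```

For input without quote characters, the result should match Python 3's ascii(), which keeps this escaping style.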
This algorithm converts any string to printable ASCII, and repr() is
used as a handy and safe way to print strings for debugging or for
@@ -75,19 +75,19 @@ Specification
=============
- Add a new function to the Python C API ``int Py_UNICODE_ISPRINTABLE
(Py_UNICODE ch)``. This function returns 0 if repr() should escape the
Unicode character ``ch``; otherwise it returns 1. Characters that should
be escaped are defined in the Unicode character database as:
* Cc (Other, Control)
* Cf (Other, Format)
* Cs (Other, Surrogate)
* Co (Other, Private Use)
* Cn (Other, Not Assigned)
* Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
* Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
* Zs (Separator, Space) other than ASCII space ('\\x20'). Characters in
this category should be escaped to avoid ambiguity.
- The algorithm to build repr() strings should be changed to:
@@ -105,22 +105,22 @@ Specification
character at the beginning and the end.
- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
default.
- Add a ``'%a'`` string format operator. ``'%a'`` converts any Python
object to a string using repr() and then hex-escapes all non-ASCII
characters. The ``'%a'`` format operator generates the same string as
``'%r'`` in Python 2.
- Add a new built-in function, ``ascii()``. This function converts any
Python object to a string using repr() and then hex-escapes all
non-ASCII characters. ``ascii()`` generates the same string as ``repr()``
in Python 2.
- Add an ``isprintable()`` method to the string type. ``str.isprintable()``
returns False if repr() should escape any character in the string;
otherwise it returns True. The ``isprintable()`` method calls the
``Py_UNICODE_ISPRINTABLE()`` function internally.
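A Python-level sketch of the specified behaviour, assuming CPython 3 and its `unicodedata` module: the hypothetical `is_printable_char` tests the same category list as `Py_UNICODE_ISPRINTABLE()`, and the hypothetical `py3_style_repr` keeps printable characters while hex-escaping the rest. Quote selection is not modeled, so the comparison with repr() only holds for strings without quote characters.

```python
import unicodedata

# Categories that repr() should escape (the Other and Separator classes).
NON_PRINTABLE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}


def is_printable_char(ch: str) -> bool:
    """Hypothetical Python-level mirror of Py_UNICODE_ISPRINTABLE()."""
    if ch == " ":
        return True  # ASCII space is the one separator that stays printable
    return unicodedata.category(ch) not in NON_PRINTABLE


def py3_style_repr(s: str) -> str:
    """Hypothetical sketch of the new repr(): keep printable characters."""
    special = {"\r": "\\r", "\n": "\\n", "\t": "\\t",
               "\\": "\\\\", "'": "\\'"}
    out = ["'"]
    for ch in s:
        cp = ord(ch)
        if ch in special:
            out.append(special[ch])
        elif is_printable_char(ch):
            out.append(ch)  # printable, including non-ASCII letters
        elif cp < 0x100:
            out.append("\\x%02x" % cp)
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)
        else:
            out.append("\\U%08x" % cp)
    out.append("'")
    return "".join(out)


sample = "caf\xe9 \u03b1\xa0\u2028\x00"
print(py3_style_repr(sample) == repr(sample))  # True
print("%a" % sample == ascii(sample))          # True: old-style escaping
print(sample.isprintable())                    # False
```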
Rationale
@@ -157,38 +157,38 @@ suggestions were made.
- Supply a tool to print lists or dicts.
Strings to be printed for debugging are contained not only in lists
and dicts but also in many other types of objects. File objects contain a
file name in Unicode, exception objects contain a message in Unicode,
etc. These strings should be printed in readable form when repr()ed.
A tool that prints every possible object type is unlikely to be
feasible.
- Use sys.displayhook and sys.excepthook.
For interactive sessions, we can write hooks to restore hex-escaped
characters to the original characters. But these hooks are called only
when printing the result of evaluating an expression entered in an
interactive Python session; they don't work for the print() function,
for non-interactive sessions, or for logging.debug("%r", ...), etc.
- Subclass sys.stdout and sys.stderr.
It is difficult to implement a subclass to restore hex-escaped
characters, since by the time the output is a string there isn't enough
information left to undo the escaping correctly in all cases. For example,
``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
a file object has no way to tell such literal text apart from an escape
produced by repr().
- Make the encoding used by unicode_repr() adjustable, and make the
existing repr() the default.
With an adjustable repr(), the result of repr() would be unpredictable,
making it impossible to write correct code involving repr(). And if the
current repr() is the default, then the old convention remains intact
and users may expect ASCII strings as the result of repr().
Third-party applications or libraries could be confused when a custom
repr() function is used.
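To make the information-loss argument concrete, the sketch below models the old escaping with Python 3's ascii() and uses a hypothetical `naive_unescape` (built on the standard `unicode_escape` codec) as a stand-in for what an unescaping stdout subclass would do. Text the user printed literally and an escape produced by repr() reach the file object as identical strings.

```python
def naive_unescape(text: str) -> str:
    """Hypothetical filter a stdout subclass might apply to written text."""
    # Interpret backslash escapes in ASCII text as Python string escapes.
    return text.encode("ascii").decode("unicode_escape")


from_repr = ascii("\u03b1")[1:-1]  # escape produced by old-style repr()
literal = "\\" + "u03b1"           # six characters the user meant to print

print(from_repr == literal)                 # True: indistinguishable
print(naive_unescape(literal) == "\u03b1")  # True, although \u03b1 should stay
```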
Backwards Compatibility
@@ -234,37 +234,36 @@ Open Issues
===========
- Is the ``ascii()`` function necessary, or is it sufficient to document
how to do it? If necessary, should ``ascii()`` belong to the builtin
namespace?
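One way to document the recipe rather than add a builtin: in CPython 3, ascii() is equivalent to calling repr() and then encoding the result to ASCII with the 'backslashreplace' error handler. The name `ascii_via_repr` is hypothetical; the loop compares it with ascii() and the ``'%a'`` operator.

```python
def ascii_via_repr(obj: object) -> str:
    """Hypothetical recipe: repr() the object, then hex-escape non-ASCII."""
    return repr(obj).encode("ascii", "backslashreplace").decode("ascii")


samples = ["caf\xe9", {"key": "\u03b1"}, ValueError("\U0001f600")]
for obj in samples:
    # All three spellings should produce the same ASCII-only string.
    print(ascii_via_repr(obj), ascii(obj), "%a" % (obj,))
```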
Rejected Proposals
==================
- Add encoding and errors arguments to the builtin print() function,
with defaults of sys.getfilesystemencoding() and 'backslashreplace'.
This is complicated to implement and is generally not seen as a good
idea. [2]_
- Use character names to escape characters, instead of hex character
codes. For example, ``repr('\u03b1')`` can be converted to
``"\N{GREEK SMALL LETTER ALPHA}"``.
Using character names can be very verbose compared to hex escapes.
For example, ``repr("\ufbf9")`` would be converted to ``"\N{ARABIC LIGATURE UIGHUR
KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
- Make 'backslashreplace' the default error handler of sys.stdout.
Output written to stdout might be consumed by another program that
could misinterpret the backslash escapes. For interactive sessions,
'backslashreplace' could be made the default error handler, but this may
add confusion of the kind "it works in interactive mode but not when
redirecting to a file".
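The verbosity argument against character-name escapes can be checked with the `unicodedata` module. The helper `name_escape` is hypothetical; because some characters (control characters, for example) have no name, it falls back to the hex escape that ascii() would produce.

```python
import unicodedata


def name_escape(ch: str) -> str:
    """Hypothetical escaper: a \\N{NAME} escape if the character has a name."""
    name = unicodedata.name(ch, None)
    if name is None:
        return ascii(ch)[1:-1]  # e.g. control characters have no name
    return "\\N{%s}" % name


for ch in ["\u03b1", "\ufbf9", "\x00"]:
    named = name_escape(ch)
    hexed = ascii(ch)[1:-1]
    # Compare the length of the name-based and hex-based escapes.
    print(len(named), named)
    print(len(hexed), hexed)
```

Both forms decode back to the same character through the `unicode_escape` codec, so the trade-off is length and readability rather than correctness.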
Reference Implementation
========================