Fix lay-out glitches and remove gmail turd.
parent 564e85c33f
commit 9934e842de

pep-3138.txt | 129
@@ -29,20 +29,20 @@ algorithm.
 - Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

 - Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
-characters(>=0x80) to '\\xXX'.
+characters(>=0x80) to '\\xXX'.

 - Backslash-escape quote characters (apostrophe, ') and add the quote
-character at the beginning and the end.
+character at the beginning and the end.

 For Unicode strings, the following additional conversions are done.

 - Convert leading surrogate pair characters without trailing character
-(0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
+(0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

 - Convert 16-bit characters(>=0x100) to '\\uXXXX'.

 - Convert 21-bit characters(>=0x10000) and surrogate pair characters to
-'\\U00xxxxxx'.
+'\\U00xxxxxx'.

 This algorithm converts any string to printable ASCII, and repr() is
 used as a handy and safe way to print strings for debugging or for
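
The escaping steps listed in the hunk above (the ASCII-only algorithm) can be sketched in Python roughly as follows. This is an illustration only: ``legacy_repr`` is a hypothetical helper name, not part of the PEP or of CPython. It assumes apostrophes are always used as the quote character, as the text describes, and it does not model lone-surrogate handling.

```python
def legacy_repr(s):
    """Hypothetical sketch of the ASCII-only escaping described above."""
    # Characters with a dedicated short escape, plus the quote character.
    simple = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\", "'": "\\'"}
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif cp < 0x20 or cp == 0x7F or 0x80 <= cp < 0x100:
            # Remaining control characters and 8-bit non-ASCII characters.
            out.append("\\x%02x" % cp)
        elif 0x100 <= cp < 0x10000:
            # 16-bit characters.
            out.append("\\u%04x" % cp)
        elif cp >= 0x10000:
            # Characters beyond the Basic Multilingual Plane.
            out.append("\\U%08x" % cp)
        else:
            # Printable ASCII passes through unchanged.
            out.append(ch)
    return "'" + "".join(out) + "'"


print(legacy_repr("caf\u00e9\t\u03b1"))
```

For strings that contain no quote characters, the result should match what the built-in ``ascii()`` returns in Python 3. The two can differ for strings containing an apostrophe, because ``repr()`` may switch to double quotes.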
@@ -75,19 +75,19 @@ Specification
 =============

 - Add a new function to the Python C API ``int Py_UNICODE_ISPRINTABLE
-(Py_UNICODE ch)``. This function returns 0 if repr() should escape the
-Unicode character ``ch``; otherwise it returns 1. Characters that should
-be escaped are defined in the Unicode character database as:
+(Py_UNICODE ch)``. This function returns 0 if repr() should escape the
+Unicode character ``ch``; otherwise it returns 1. Characters that should
+be escaped are defined in the Unicode character database as:

-* Cc (Other, Control)
-* Cf (Other, Format)
-* Cs (Other, Surrogate)
-* Co (Other, Private Use)
-* Cn (Other, Not Assigned)
-* Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
-* Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
-* Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
-this category should be escaped to avoid ambiguity.
+* Cc (Other, Control)
+* Cf (Other, Format)
+* Cs (Other, Surrogate)
+* Co (Other, Private Use)
+* Cn (Other, Not Assigned)
+* Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
+* Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
+* Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
+this category should be escaped to avoid ambiguity.

 - The algorithm to build repr() strings should be changed to:
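
The category list in the hunk above can be checked from Python with the standard ``unicodedata`` module. The following is a minimal sketch, not the C implementation: ``should_escape`` and ``ESCAPED_CATEGORIES`` are hypothetical names chosen for illustration, and the result depends on the Unicode database version bundled with the interpreter.

```python
import unicodedata

# General categories that the specification says repr() should escape.
ESCAPED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}


def should_escape(ch):
    """Hypothetical Python-level counterpart of the proposed C check."""
    if ch == " ":
        # ASCII space is in Zs but is explicitly exempted.
        return False
    return unicodedata.category(ch) in ESCAPED_CATEGORIES


for ch in ["a", " ", "\u00a0", "\u2028", "\u200b", "\u03b1"]:
    print("U+%04X" % ord(ch), unicodedata.category(ch), should_escape(ch))
```

In current Python 3 releases, ``not ch.isprintable()`` is expected to give the same answer for a single character, so the helper is mainly useful for seeing which category triggers the escape.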
@@ -105,22 +105,22 @@ Specification
 character at the beginning and the end.

 - Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
-default.
+default.

 - Add ``'%a'`` string format operator. ``'%a'`` converts any python
-object to a string using repr() and then hex-escapes all non-ASCII
-characters. The ``'%a'`` format operator generates the same string as
-``'%r'`` in Python 2.
+object to a string using repr() and then hex-escapes all non-ASCII
+characters. The ``'%a'`` format operator generates the same string as
+``'%r'`` in Python 2.

 - Add a new built-in function, ``ascii()``. This function converts any
-python object to a string using repr() and then hex-escapes all non-
-ASCII characters. ``ascii()`` generates the same string as ``repr()``
-in Python 2.
+python object to a string using repr() and then hex-escapes all non-
+ASCII characters. ``ascii()`` generates the same string as ``repr()``
+in Python 2.

 - Add an ``isprintable()`` method to the string type. ``str.isprintable()``
-returns False if repr() should escape any character in the string;
-otherwise returns True. The ``isprintable()`` method calls the
-`` Py_UNICODE_ISPRINTABLE()`` function internally.
+returns False if repr() should escape any character in the string;
+otherwise returns True. The ``isprintable()`` method calls the
+`` Py_UNICODE_ISPRINTABLE()`` function internally.


 Rationale
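
The items specified in the hunk above (``ascii()``, the ``'%a'`` operator, ``str.isprintable()`` and the 'backslashreplace' error handler) can be tried out in a Python 3 interpreter. The snippet below is a small demonstration; the sample string ``s`` is an arbitrary choice, and the ``encode()`` call only imitates what an ASCII-encoded stream with the 'backslashreplace' handler would write.

```python
s = "caf\u00e9 \u03b1"

# repr() keeps printable non-ASCII characters as they are.
readable = repr(s)

# ascii() and '%a' produce the ASCII-only, hex-escaped form.
print(ascii(s))
print("%a" % s)
print("{!a}".format(s))

# isprintable() reports whether repr() would leave every character alone.
print(s.isprintable())
print("a\u2028b".isprintable())

# 'backslashreplace' escapes whatever the target encoding cannot represent.
print("\u03b1".encode("ascii", "backslashreplace"))
```

Only escaped forms are printed here so that the snippet behaves the same on terminals that cannot display the original characters.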
@@ -157,38 +157,38 @@ suggestions were made.

 - Supply a tool to print lists or dicts.

-Strings to be printed for debugging are not only contained by lists or
-dicts, but also in many other types of object. File objects contain a
-file name in Unicode, exception objects contain a message in Unicode,
-etc. These strings should be printed in readable form when repr()ed.
-It is unlikely to be possible to implement a tool to print all
-possible object types.
+Strings to be printed for debugging are not only contained by lists or
+dicts, but also in many other types of object. File objects contain a
+file name in Unicode, exception objects contain a message in Unicode,
+etc. These strings should be printed in readable form when repr()ed.
+It is unlikely to be possible to implement a tool to print all
+possible object types.

 - Use sys.displayhook and sys.excepthook.

-For interactive sessions, we can write hooks to restore hex escaped
-characters to the original characters. But these hooks are called only
-when printing the result of evaluating an expression entered in an
-interactive Python session, and doesn't work for the print() function,
-for non-interactive sessions or for logging.debug("%r", ...), etc.
+For interactive sessions, we can write hooks to restore hex escaped
+characters to the original characters. But these hooks are called only
+when printing the result of evaluating an expression entered in an
+interactive Python session, and doesn't work for the print() function,
+for non-interactive sessions or for logging.debug("%r", ...), etc.

 - Subclass sys.stdout and sys.stderr.

-It is difficult to implement a subclass to restore hex-escaped
-characters since there isn't enough information left by the time it's
-a string to undo the escaping correctly in all cases. For example, ``
-print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
-there is no chance to tell file objects apart.
+It is difficult to implement a subclass to restore hex-escaped
+characters since there isn't enough information left by the time it's
+a string to undo the escaping correctly in all cases. For example, ``
+print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
+there is no chance to tell file objects apart.

 - Make the encoding used by unicode_repr() adjustable, and make the
-existing repr() the default.
+existing repr() the default.

-With adjustable repr(), the result of using repr() is unpredictable
-and would make it impossible to write correct code involving repr().
-And if current repr() is the default, then the old convention remains
-intact and users may expect ASCII strings as the result of repr().
-Third party applications or libraries could be confused when a custom
-repr() function is used.
+With adjustable repr(), the result of using repr() is unpredictable
+and would make it impossible to write correct code involving repr().
+And if current repr() is the default, then the old convention remains
+intact and users may expect ASCII strings as the result of repr().
+Third party applications or libraries could be confused when a custom
+repr() function is used.


 Backwards Compatibility
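
The objection to subclassing sys.stdout and sys.stderr in the hunk above can be reproduced in a few lines. This is a sketch under assumptions: ``naive_unescape`` is a hypothetical stand-in for what such a stream subclass would have to do, and it uses the standard 'unicode_escape' codec to reverse the escapes.

```python
def naive_unescape(text):
    """Hypothetical stream filter: turn every backslash escape back into a character."""
    return text.encode("ascii").decode("unicode_escape")


# Six characters produced by escaping GREEK SMALL LETTER ALPHA.
escaped_alpha = ascii("\u03b1")[1:-1]

# Six characters the program really meant to print: a backslash, then "u0041".
literal_text = "\\" + "u0041"

# Restoring the escaped alpha is what the filter is for ...
print(ascii(naive_unescape(escaped_alpha)))

# ... but the literal text is rewritten too, which changes the program's output.
print(ascii(naive_unescape(literal_text)))
```

Both inputs are plain ASCII by the time they reach the stream, so the filter has no way to know that only the first one came from an escaped repr().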
@@ -234,37 +234,36 @@ Open Issues
 ===========

 - Is the ``ascii()`` function necessary, or is it sufficient to document
-how to do it? If necessary, should ``ascii()`` belong to the builtin
-namespace?
+how to do it? If necessary, should ``ascii()`` belong to the builtin
+namespace?


 Rejected Proposals
 ==================

 - Add encoding and errors arguments to the builtin print() function,
-with defaults of sys.getfilesystemencoding() and 'backslashreplace'.
+with defaults of sys.getfilesystemencoding() and 'backslashreplace'.

-Complicated to implement, and in general, this is not seen as a good
-idea. [2]_
+Complicated to implement, and in general, this is not seen as a good
+idea. [2]_

 - Use character names to escape characters, instead of hex character
-codes. For example, ``repr('\u03b1')`` can be converted to
-``"\N{GREEK SMALL LETTER ALPHA}"``.
+codes. For example, ``repr('\u03b1')`` can be converted to
+``"\N{GREEK SMALL LETTER ALPHA}"``.

-Using character names can be very verbose compared to hex-escape.
-e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR
-KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
+Using character names can be very verbose compared to hex-escape.
+e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR
+KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.

 - Default error-handler of sys.stdout should be 'backslashreplace'.

-Stuff written to stdout might be consumed by another program that
-might misinterpret the \ escapes. For interactive session, it is
-possible to make 'backslashreplace' error-handler to default, but may
-add confusion of the kind "it works in interactive mode but not when
-redirecting to a file".
+Stuff written to stdout might be consumed by another program that
+might misinterpret the \ escapes. For interactive session, it is
+possible to make 'backslashreplace' error-handler to default, but may
+add confusion of the kind "it works in interactive mode but not when
+redirecting to a file".


-- Hide quoted text -
 Reference Implementation
 ========================
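
The verbosity argument against name-based escapes in the hunk above can be illustrated with the standard ``unicodedata`` module. The helpers ``name_escape`` and ``hex_escape`` are hypothetical names introduced only for this comparison; neither is proposed by the PEP.

```python
import unicodedata


def name_escape(ch):
    """Hypothetical name-based escape, in the style of the rejected proposal."""
    return "\\N{%s}" % unicodedata.name(ch)


def hex_escape(ch):
    """Hex escape of a single non-quote character, taken from ascii()."""
    return ascii(ch)[1:-1]


for ch in ["\u03b1", "\ufbf9"]:
    named = name_escape(ch)
    print(hex_escape(ch), "->", named, "(%d characters)" % len(named))
```

The hex form stays at six characters for any character in the Basic Multilingual Plane, while the named form grows with the length of the character's official name.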