Add examples for the "List a directory into stdout" use case.
Victor Stinner 2017-01-11 22:32:24 +01:00
parent 1b6b889ed6
commit 0e107f280c
1 changed files with 47 additions and 13 deletions

@ -278,17 +278,20 @@ handler, instead using the locale encoding (with ``strict`` or
Basically, the UTF-8 mode behaves as Python 2: it "just works" and doesn't
bother users with encodings, but it can produce mojibake. It can be
configured as strict to prevent mojibake: the UTF-8 encoding is used
with the ``strict`` error handler in this case.
with the ``strict`` error handler for inputs and outputs, but the
``surrogateescape`` error handler is still used for `operating system
data`_.
New ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable are added to control the UTF-8 mode. The UTF-8 mode is enabled
by ``-X utf8`` or ``PYTHONUTF8=1``. The UTF-8 mode is configured as strict
by ``-X utf8=strict`` or ``PYTHONUTF8=strict``.
by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. Other option values fail
with an error.
The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
The ``-X utf8`` has the priority on the ``PYTHONUTF8`` environment
The ``-X utf8`` has the priority over the ``PYTHONUTF8`` environment
variable. For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the
UTF-8 mode.
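
The rules above (explicit values, the POSIX locale default, and the priority of ``-X utf8`` over ``PYTHONUTF8``) can be sketched as a small resolution function. This is only an illustration, not the PEP's implementation: the names ``parse_utf8_setting`` and ``resolve_utf8_mode`` are hypothetical, the bare ``-X utf8`` option is represented as ``True``, and accepting ``"1"`` for the command line option is an assumption.

```python
def parse_utf8_setting(value):
    """Map one raw setting to 'on', 'off' or 'strict' (None means unset).

    Hypothetical helper: True stands for a bare ``-X utf8`` option.
    """
    if value is None:
        return None
    if value is True or value == "1":
        return "on"
    if value == "0":
        return "off"
    if value == "strict":
        return "strict"
    # "Other option values fail with an error."
    raise ValueError(f"invalid UTF-8 mode value: {value!r}")


def resolve_utf8_mode(xoption=None, env=None, posix_locale=False):
    """Return the effective mode: 'on', 'off' or 'strict'."""
    # The -X utf8 option is checked first, so it wins over PYTHONUTF8.
    for setting in (xoption, env):
        mode = parse_utf8_setting(setting)
        if mode is not None:
            return mode
    # Nothing set explicitly: only the POSIX locale enables the mode.
    return "on" if posix_locale else "off"
```

With this sketch, ``resolve_utf8_mode(xoption=True, env="0")`` mirrors ``PYTHONUTF8=0 python3 -X utf8`` and resolves to the enabled mode.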
@ -389,7 +392,7 @@ code.
The UTF-8 mode can produce mojibake since Python and external code don't
both of invalid bytes, but it's a deliberate choice. The UTF-8 mode can
be configured as strict to prevent mojibake and to fail early when data
is not decodable from UTF-8.
is not decodable from UTF-8 or not encodable to UTF-8.
External code using text
^^^^^^^^^^^^^^^^^^^^^^^^
@ -441,6 +444,38 @@ To be able to always work, the program must be able to produce mojibake.
Mojibake is more user friendly than an error with a truncated or empty
output.
Example with a directory which contains the file called ``b'xxx\xff'``
(the byte ``0xFF`` is invalid in UTF-8).
The default and UTF-8 Strict modes fail on ``print()`` with an encode error::
$ python3.7 ../ls.py
Traceback (most recent call last):
File "../ls.py", line 5, in <module>
print(name)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...
$ python3.7 -X utf8=strict ../ls.py
Traceback (most recent call last):
File "../ls.py", line 5, in <module>
print(name)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...
The UTF-8 mode, POSIX locale, Python 2 and the UNIX ``ls`` command work
but display mojibake::
$ python3.7 -X utf8 ../ls.py
xxx<78>
$ LC_ALL=C /python3.6 ../ls.py
xxx<78>
$ python2 ../ls.py
xxx<78>
$ ls
'xxx'$'\377'
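
The difference between the failing and the working cases above comes down to the error handler used when the name is encoded for stdout. The following is a minimal sketch of that mechanism, independent of the ``ls.py`` script (whose source is not shown here); the helper ``try_encode`` is hypothetical and the byte string is the file name from the example.

```python
# File name from the example; the byte 0xFF is invalid in UTF-8.
raw = b"xxx\xff"

# Operating system data is decoded with surrogateescape: the undecodable
# byte 0xFF becomes the lone surrogate U+DCFF instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")


def try_encode(text, errors):
    """Encode to UTF-8 and return the bytes, or the exception on failure."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return exc


# The strict handler refuses the surrogate, as in the tracebacks above.
strict_result = try_encode(name, "strict")

# surrogateescape turns U+DCFF back into the original byte, so the output
# is the raw (mojibake) file name rather than an error.
escaped_result = try_encode(name, "surrogateescape")
```

Here ``strict_result`` holds a ``UnicodeEncodeError`` while ``escaped_result`` equals the original bytes, which is why the non-strict UTF-8 mode can print the name.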
List a directory into a text file
---------------------------------
@ -647,9 +682,9 @@ Backward Compatibility
======================
The main backward incompatible change is that the UTF-8 encoding is now
used if the locale is POSIX. Since the UTF-8 encoding is used with the
``surrogateescape`` error handler, ecoding errors should not occur and
so the change should not break applications.
used by default if the locale is POSIX. Since the UTF-8 encoding is used
with the ``surrogateescape`` error handler, encoding errors should not
occur and so the change should not break applications.
The more likely source of trouble comes from external libraries. Python
can successfully decode data from UTF-8, but a library using the locale
@ -658,9 +693,8 @@ encoding text in a library is a rare operation. Very few libraries
expect text, most libraries expect bytes and even manipulate bytes
internally.
If the locale is not POSIX, the PEP has no impact on the backward
compatibility since the UTF-8 mode is disabled by default in this case,
it must be enabled explicitly.
The PEP only changes the default behaviour if the locale is POSIX. For
other locales, the *default* behaviour is unchanged.
Alternatives
@ -672,9 +706,9 @@ Don't modify the encoding of the POSIX locale
A first version of the PEP did not change the encoding and error handler
used for the POSIX locale.
The problem is that adding a command line option or setting an
environment variable is not possible in some cases, or at least not
convenient.
The problem is that adding the ``-X utf8`` command line option or
setting the ``PYTHONUTF8`` environment variable is not possible in some
cases, or at least not convenient.
Moreover, many users simply expect that Python 3 behaves as Python 2:
don't bother them with encodings and "just works" in all cases. These