PEP 540
Add examples for the "List a directory into stdout" use case.
parent 1b6b889ed6
commit 0e107f280c
60
pep-0540.txt
@@ -278,17 +278,20 @@ handler, instead using the locale encoding (with ``strict`` or
 Basically, the UTF-8 mode behaves as Python 2: it "just works" and don't
 bother users with encodings, but it can produce mojibake. It can be
 configured as strict to prevent mojibake: the UTF-8 encoding is used
-with the ``strict`` error handler in this case.
+with the ``strict`` error handler for inputs and outputs, but the
+``surrogateescape`` error handler is still used for `operating system
+data`_.
 
 New ``-X utf8`` command line option and ``PYTHONUTF8`` environment
 variable are added to control the UTF-8 mode. The UTF-8 mode is enabled
 by ``-X utf8`` or ``PYTHONUTF8=1``. The UTF-8 is configured as strict
-by ``-X utf8=strict`` or ``PYTHONUTF8=strict``.
+by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. Other option values fail
+with an error.
 
 The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
 can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
 
-The ``-X utf8`` has the priority on the ``PYTHONUTF8`` environment
+The ``-X utf8`` has the priority over the ``PYTHONUTF8`` environment
 variable. For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the
 UTF-8 mode.
 
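Reviewer note: the precedence rules described in the hunk above can be summarised as a small decision function. This is only a sketch of the documented behaviour, not the interpreter's implementation. The names ``Utf8Mode``, ``resolve_utf8_mode`` and ``_parse`` are hypothetical, and it assumes that an unset option is passed as ``None`` and a bare ``-X utf8`` as ``True``.

```python
import enum
from typing import Optional, Union


class Utf8Mode(enum.Enum):
    OFF = "off"
    ON = "on"
    STRICT = "strict"


def _parse(value: str, source: str) -> Utf8Mode:
    """Map an option value to a mode; other values fail with an error."""
    table = {"0": Utf8Mode.OFF, "1": Utf8Mode.ON, "strict": Utf8Mode.STRICT}
    if value not in table:
        raise ValueError(f"invalid {source} value: {value!r}")
    return table[value]


def resolve_utf8_mode(
    xoption: Union[None, bool, str],
    envvar: Optional[str],
    locale_is_posix: bool,
) -> Utf8Mode:
    """Hypothetical helper: pick the UTF-8 mode from the three inputs.

    xoption: None if -X utf8 is absent, True for a bare "-X utf8",
             or the text after "=" for "-X utf8=VALUE" (assumed encoding).
    envvar:  value of PYTHONUTF8, or None if unset (assumed encoding).
    """
    # 1. The -X utf8 command line option has priority over PYTHONUTF8.
    if xoption is not None:
        if xoption is True:
            return Utf8Mode.ON
        return _parse(str(xoption), "-X utf8")
    # 2. Otherwise the environment variable decides, if it is set.
    if envvar is not None:
        return _parse(envvar, "PYTHONUTF8")
    # 3. With neither set, the POSIX locale enables the UTF-8 mode.
    return Utf8Mode.ON if locale_is_posix else Utf8Mode.OFF
```

With this sketch, ``resolve_utf8_mode(True, "0", False)`` models ``PYTHONUTF8=0 python3 -X utf8`` and yields ``Utf8Mode.ON``, matching the example in the text.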
@@ -389,7 +392,7 @@ code.
 The UTF-8 mode can produce mojibake since Python and external code don't
 both of invalid bytes, but it's a deliberate choice. The UTF-8 mode can
 be configured as strict to prevent mojibake and be fail early when data
-is not decodable from UTF-8.
+is not decodable from UTF-8 or not encodable to UTF-8.
 
 External code using text
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -441,6 +444,38 @@ To be able to always work, the program must be able to produce mojibake.
 Mojibake is more user friendly than an error with a truncated or empty
 output.
 
+Example with a directory which contains the file called ``b'xxx\xff'``
+(the byte ``0xFF`` is invalid in UTF-8).
+
+Default and UTF-8 Strict mode fail on ``print()`` with an encode error::
+
+    $ python3.7 ../ls.py
+    Traceback (most recent call last):
+      File "../ls.py", line 5, in <module>
+        print(name)
+    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...
+
+    $ python3.7 -X utf8=strict ../ls.py
+    Traceback (most recent call last):
+      File "../ls.py", line 5, in <module>
+        print(name)
+    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...
+
+The UTF-8 mode, POSIX locale, Python 2 and the UNIX ``ls`` command work
+but display mojibake::
+
+    $ python3.7 -X utf8 ../ls.py
+    xxx<78>
+
+    $ LC_ALL=C /python3.6 ../ls.py
+    xxx<78>
+
+    $ python2 ../ls.py
+    xxx<78>
+
+    $ ls
+    'xxx'$'\377'
+
 
 List a directory into a text file
 ---------------------------------
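Reviewer note: the contents of ``ls.py`` are not part of this diff, so the following does not reproduce that script. It is a minimal sketch of the mechanism behind the two outcomes in the hunk above, using only the standard ``surrogateescape`` and ``strict`` error handlers. The helper name ``encode_for_stdout`` is hypothetical.

```python
# File name from the example: the byte 0xFF is invalid in UTF-8.
raw_name = b"xxx\xff"

# Decoding operating system data with surrogateescape maps the
# undecodable byte 0xFF to the lone surrogate U+DCFF.
name = raw_name.decode("utf-8", "surrogateescape")


def encode_for_stdout(text: str, errors: str) -> bytes:
    """Hypothetical stand-in for what print() does when writing to stdout."""
    return text.encode("utf-8", errors)


# With the strict error handler the lone surrogate cannot be encoded,
# which is the UnicodeEncodeError shown in the tracebacks.
try:
    encode_for_stdout(name, "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With surrogateescape the original byte is written back unchanged,
# so the output is produced but may display as mojibake.
written = encode_for_stdout(name, "surrogateescape")
```

Here ``written`` equals ``raw_name``: the bytes round-trip losslessly, and whether they render readably is left to the terminal.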
@@ -647,9 +682,9 @@ Backward Compatibility
 ======================
 
 The main backward incompatible change is that the UTF-8 encoding is now
-used if the locale is POSIX. Since the UTF-8 encoding is used with the
-``surrogateescape`` error handler, ecoding errors should not occur and
-so the change should not break applications.
+used by default if the locale is POSIX. Since the UTF-8 encoding is used
+with the ``surrogateescape`` error handler, encoding errors should not
+occur and so the change should not break applications.
 
 The more likely source of trouble comes from external libraries. Python
 can decode successfully data from UTF-8, but a library using the locale
@@ -658,9 +693,8 @@ encoding text in a library is a rare operation. Very few libraries
 expect text, most libraries expect bytes and even manipulate bytes
 internally.
 
-If the locale is not POSIX, the PEP has no impact on the backward
-compatibility since the UTF-8 mode is disabled by default in this case,
-it must be enabled explicitly.
+The PEP only changes the default behaviour if the locale is POSIX. For
+other locales, the *default* behaviour is unchanged.
 
 
 Alternatives
@@ -672,9 +706,9 @@ Don't modify the encoding of the POSIX locale
 A first version of the PEP did not change the encoding and error handler
 used of the POSIX locale.
 
-The problem is that adding a command line option or setting an
-environment variable is not possible in some cases, or at least not
-convenient.
+The problem is that adding the ``-X utf8`` command line option or
+setting the ``PYTHONUTF8`` environment variable is not possible in some
+cases, or at least not convenient.
 
 Moreover, many users simply expect that Python 3 behaves as Python 2:
 don't bother them with encodings and "just works" in all cases. These