PEP 538: document core design principles
Also provides a bit more background on the rationale for using "strict" by default on stdin and stdout when coercing the locale to one based on UTF-8
parent 927e704d7e
commit 1f1abb3b6a
55 pep-0538.txt
@@ -292,6 +292,34 @@ locale category, rather than overriding all the locale categories::
     ℙƴ☂ℌøἤ


+Design Principles
+=================
+
+The above motivation leads to the following core design principles for the
+proposed solution:
+
+* if a locale other than the default C locale is explicitly configured, we'll
+  continue to respect it
+* if we're changing the locale setting without an explicit config option, we'll
+  emit a warning on stderr that we're doing so rather than silently changing
+  the process configuration. This will alert application and system integrators
+  to the change, even if they don't closely follow the PEP process or Python
+  release announcements. However, to minimize the chance of introducing new
+  problems for end users, we'll do this *without* using the warnings system, so
+  even running with ``-Werror`` won't turn it into a runtime exception
+
+The general design principle of Python 3 to prefer raising an exception over
+incorrectly encoding or decoding data then leads to the following additional
+design guideline:
+
+* if a UTF-8 based Linux container is run on a host that is explicitly
+  configured to use a non-UTF-8 encoding, and tries to exchange locally
+  encoded data with that host rather than exchanging explicitly UTF-8 encoded
+  data, this will ideally lead to an immediate runtime exception rather than
+  to silent data corruption
+
+
+
 Specification
 =============

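The second principle in the hunk above (warn on stderr, but not through the warnings system) can be illustrated with a minimal Python sketch. This is an illustration only, not the interpreter's implementation: the function names ``notify_locale_coercion`` and ``demo`` and the message text are hypothetical. It shows that a plain write to a stream is unaffected by an "error" warnings filter (the effect of ``-Werror``), while ``warnings.warn`` is turned into an exception.

```python
import io
import sys
import warnings


def notify_locale_coercion(target_locale, stream=None):
    """Write a plain notice to a stream, bypassing the warnings machinery."""
    stream = sys.stderr if stream is None else stream
    # Hypothetical message text, for illustration only.
    stream.write(f"notice: LC_CTYPE coerced to {target_locale}\n")


def demo():
    """Compare a direct stream write with warnings.warn under an 'error' filter."""
    buffer = io.StringIO()
    with warnings.catch_warnings():
        # Same effect as -Werror: every warning is raised as an exception.
        warnings.simplefilter("error")
        # A direct write is not a warning, so the filter cannot touch it.
        notify_locale_coercion("C.UTF-8", stream=buffer)
        try:
            warnings.warn("LC_CTYPE coerced", RuntimeWarning)
            raised = False
        except RuntimeWarning:
            raised = True
    return buffer.getvalue(), raised
```

Calling ``demo()`` returns the text written to the buffer together with a flag saying whether the ``warnings.warn`` call was converted into an exception.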
@@ -489,17 +517,25 @@ are valid:
 default encoding of ASCII the way CPython currently does


-Using "strict" error handling by default
-----------------------------------------
+Defaulting to "strict" error handling on the standard IO streams
+----------------------------------------------------------------

 By coercing the locale away from the legacy C default and its assumption of
 ASCII as the preferred text encoding, this PEP also disables the implicit use
 of the "surrogateescape" error handler on the standard IO streams that was
-introduced in Python 3.5.
+introduced in Python 3.5 ([15_]).

-This is deliberate, as while UTF-8 as the preferred text encoding is a good
-working assumption for network service development and for more recent releases
-of client operating systems, it still isn't a universally valid assumption.
+This is deliberate, as that change was primarily aimed at handling the case
+where the correct system encoding was the ASCII-compatible UTF-8 (or another
+ASCII-compatible encoding), but the nominal encoding used for operating system
+interfaces in the current process was ASCII.
+
+With this PEP, that assumption is being narrowed a step further, such that
+rather than assuming "an ASCII-compatible encoding", we instead assume UTF-8
+specifically. If that assumption is genuinely wrong, then it continues to be
+friendlier to users of other encodings to alert them to the runtime's mistaken
+assumption, rather than continuing on and potentially corrupting their data
+permanently.

 In particular, GB 18030 [12_] is a Chinese national text encoding standard
 that handles all Unicode code points, but is incompatible with both ASCII and
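The difference between the two error handlers discussed in the hunk above can be sketched as follows. This is a minimal illustration using plain ``bytes.decode`` rather than the standard streams; the sample text and the helper name ``decode_as_utf8`` are assumptions made for this sketch. GB 18030 encoded bytes are decoded under the (wrong) assumption that they are UTF-8: "strict" fails immediately, while "surrogateescape" accepts the bytes without complaint and yields lone surrogate code points instead of the original text.

```python
SAMPLE_TEXT = "中文"  # arbitrary sample text chosen for this sketch
gb_bytes = SAMPLE_TEXT.encode("gb18030")


def decode_as_utf8(data, errors):
    """Decode bytes as UTF-8, returning a description instead of raising."""
    try:
        return data.decode("utf-8", errors=errors)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


# "strict" reports the mismatch straight away.
strict_result = decode_as_utf8(gb_bytes, "strict")

# "surrogateescape" silently maps each undecodable byte to a lone surrogate,
# so the mismatch only surfaces later, if at all.
lenient_result = decode_as_utf8(gb_bytes, "surrogateescape")
```

The escaped string still round-trips back to the original bytes when re-encoded with the same handler, which is why the mistaken encoding assumption can go unnoticed until the text is processed or combined with correctly decoded data.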
@@ -514,6 +550,10 @@ container application that is assuming the use of UTF-8 or vice-versa is likely
 to cause an immediate Unicode encoding or decoding error, rather than
 potentially causing silent data corruption.

+For users that would prefer more permissive behaviour, setting
+``PYTHONIOENCODING=:surrogateescape`` will continue to be supported, as this
+PEP makes no changes to that feature.
+

 Dropping official support for Unicode handling in the legacy C locale
 ---------------------------------------------------------------------
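The ``PYTHONIOENCODING=:surrogateescape`` setting mentioned in the hunk above can be checked with a short sketch along these lines. The helper name ``child_stdout_errors`` is hypothetical; the sketch starts a child interpreter with the variable set and asks it which error handler ``sys.stdout`` ended up with. Leaving the encoding part before the colon empty overrides only the error handler.

```python
import os
import subprocess
import sys


def child_stdout_errors(io_encoding=None):
    """Return sys.stdout.errors as reported by a child interpreter."""
    env = dict(os.environ)
    # Start from a clean slate so an inherited setting does not skew the result.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.errors)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Explicit opt-in to the permissive handler on the child's stdout.
permissive = child_stdout_errors(":surrogateescape")
```

Calling ``child_stdout_errors()`` without an argument reports the default handler instead, which depends on the Python version and the locale of the calling environment.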
@@ -722,6 +762,9 @@ References
 .. [14] ISO-2022
    (https://en.wikipedia.org/wiki/ISO/IEC_2022)

+.. [15] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale
+   (https://bugs.python.org/issue19977)
+
 Copyright
 =========
