PEP 538: document core design principles

Also provides a bit more background on the rationale for
using "strict" by default on stdin and stdout when coercing
the locale to one based on UTF-8.
Nick Coghlan 2017-01-08 11:54:24 +10:00
parent 927e704d7e
commit 1f1abb3b6a
1 changed files with 49 additions and 6 deletions


@ -292,6 +292,34 @@ locale category, rather than overriding all the locale categories::
ℙƴ☂ℌøἤ
Design Principles
=================
The above motivation leads to the following core design principles for the
proposed solution:
* if a locale other than the default C locale is explicitly configured, we'll
  continue to respect it
* if we're changing the locale setting without an explicit config option, we'll
  emit a warning on stderr that we're doing so rather than silently changing
  the process configuration. This will alert application and system integrators
  to the change, even if they don't closely follow the PEP process or Python
  release announcements. However, to minimize the chance of introducing new
  problems for end users, we'll do this *without* using the warnings system, so
  even running with ``-Werror`` won't turn it into a runtime exception
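The difference between writing to stderr directly and going through the
warnings system can be illustrated with a minimal Python sketch. The helper
name ``emit_coercion_notice`` and the message wording are hypothetical and
chosen only for illustration; the interpreter itself would emit its notice
during startup rather than from Python code.

```python
import io
import sys
import warnings
from contextlib import redirect_stderr


def emit_coercion_notice(target_locale):
    # Hypothetical helper: writes straight to stderr, bypassing the
    # warnings module, so warning filters cannot turn it into an exception.
    print(f"Python: coercing LC_CTYPE to {target_locale}", file=sys.stderr)


def demo():
    with warnings.catch_warnings():
        # Similar in effect to running the interpreter with -Werror.
        warnings.simplefilter("error")
        captured = io.StringIO()
        with redirect_stderr(captured):
            emit_coercion_notice("C.UTF-8")  # plain write, never raises
        try:
            warnings.warn("locale coerced", RuntimeWarning)
            raised = False
        except RuntimeWarning:
            # The same notice routed through the warnings system would
            # abort the program under an "error" filter.
            raised = True
    return captured.getvalue(), raised


message, warning_raised = demo()
```

Under an "error" filter the direct write still reaches stderr, while the
equivalent ``warnings.warn`` call raises.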
The general design principle of Python 3 to prefer raising an exception over
incorrectly encoding or decoding data then leads to the following additional
design guideline:
* if a UTF-8 based Linux container is run on a host that is explicitly
  configured to use a non-UTF-8 encoding, and tries to exchange locally
  encoded data with that host rather than exchanging explicitly UTF-8 encoded
  data, this will ideally lead to an immediate runtime exception rather than
  to silent data corruption
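As a rough illustration of this guideline, the sketch below decodes bytes
produced under a non-UTF-8 host encoding as if they were UTF-8. GB 18030 is
used as the host encoding purely as an example, and the sample text and
variable names are assumptions made for this sketch.

```python
# Bytes as a host configured for GB 18030 might produce them.
host_bytes = "中文".encode("gb18030")

# Strict decoding (the default) fails immediately on the mismatch.
try:
    host_bytes.decode("utf-8", errors="strict")
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "error"

# A lossy handler hides the mismatch and substitutes U+FFFD,
# which is the kind of silent corruption the guideline aims to avoid.
lossy = host_bytes.decode("utf-8", errors="replace")
```

With strict handling the mismatch surfaces at the point of exchange, whereas
the lossy variant returns text that no longer matches the original data.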
Specification
=============
@ -489,17 +517,25 @@ are valid:
default encoding of ASCII the way CPython currently does
Using "strict" error handling by default
----------------------------------------
Defaulting to "strict" error handling on the standard IO streams
----------------------------------------------------------------
By coercing the locale away from the legacy C default and its assumption of
ASCII as the preferred text encoding, this PEP also disables the implicit use
of the "surrogateescape" error handler on the standard IO streams that was
introduced in Python 3.5.
introduced in Python 3.5 ([15_]).
This is deliberate, as while UTF-8 as the preferred text encoding is a good
working assumption for network service development and for more recent releases
of client operating systems, it still isn't a universally valid assumption.
This is deliberate, as that change was primarily aimed at handling the case
where the correct system encoding was the ASCII-compatible UTF-8 (or another
ASCII-compatible encoding), but the nominal encoding used for operating system
interfaces in the current process was ASCII.
With this PEP, that assumption is being narrowed a step further, such that
rather than assuming "an ASCII-compatible encoding", we instead assume UTF-8
specifically. If that assumption is genuinely wrong, then it continues to be
friendlier to users of other encodings to alert them to the runtime's mistaken
assumption, rather than continuing on and potentially corrupting their data
permanently.
In particular, GB 18030 [12_] is a Chinese national text encoding standard
that handles all Unicode code points, but is incompatible with both ASCII and
@ -514,6 +550,10 @@ container application that is assuming the use of UTF-8 or vice-versa is likely
to cause an immediate Unicode encoding or decoding error, rather than
potentially causing silent data corruption.
For users that would prefer more permissive behaviour, setting
``PYTHONIOENCODING=:surrogateescape`` will continue to be supported, as this
PEP makes no changes to that feature.
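A minimal sketch of opting back in to the permissive behaviour follows. It
launches a child interpreter with ``PYTHONIOENCODING=:surrogateescape`` and
reports the error handler in effect on its ``sys.stdout``; the helper name
``child_stdout_errors`` is hypothetical. The final lines show in-process why
the handler is considered permissive: undecodable bytes survive a round trip.

```python
import os
import subprocess
import sys


def child_stdout_errors(io_encoding):
    """Run a child interpreter and return the error handler on its sys.stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.errors)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Leaving the encoding part empty keeps the default encoding and
# only overrides the error handler.
handler = child_stdout_errors(":surrogateescape")

# surrogateescape maps undecodable bytes to lone surrogates on decode
# and restores the original bytes on encode.
raw = b"caf\xe9"  # Latin-1 encoded text, not valid UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")
```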
Dropping official support for Unicode handling in the legacy C locale
---------------------------------------------------------------------
@ -722,6 +762,9 @@ References
.. [14] ISO-2022
   (https://en.wikipedia.org/wiki/ISO/IEC_2022)
.. [15] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale
   (https://bugs.python.org/issue19977)
Copyright
=========