diff --git a/pep-0538.txt b/pep-0538.txt
index 6376ce16b..1a6a1fa6d 100644
--- a/pep-0538.txt
+++ b/pep-0538.txt
@@ -292,6 +292,34 @@ locale category, rather than overriding all the locale categories::
     ℙƴ☂ℌøἤ
 
 
+Design Principles
+=================
+
+The above motivation leads to the following core design principles for the
+proposed solution:
+
+* if a locale other than the default C locale is explicitly configured, we'll
+  continue to respect it
+* if we're changing the locale setting without an explicit config option, we'll
+  emit a warning on stderr that we're doing so rather than silently changing
+  the process configuration. This will alert application and system integrators
+  to the change, even if they don't closely follow the PEP process or Python
+  release announcements. However, to minimize the chance of introducing new
+  problems for end users, we'll do this *without* using the warnings system, so
+  even running with ``-Werror`` won't turn it into a runtime exception
+
+The general design principle of Python 3 to prefer raising an exception over
+incorrectly encoding or decoding data then leads to the following additional
+design guideline:
+
+* if a UTF-8 based Linux container is run on a host that is explicitly
+  configured to use a non-UTF-8 encoding, and tries to exchange locally
+  encoded data with that host rather than exchanging explicitly UTF-8 encoded
+  data, this will ideally lead to an immediate runtime exception rather than
+  to silent data corruption
+
+
+
 Specification
 =============
 
@@ -489,17 +517,25 @@ are valid:
   default encoding of ASCII the way CPython currently does
 
 
-Using "strict" error handling by default
-----------------------------------------
+Defaulting to "strict" error handling on the standard IO streams
+----------------------------------------------------------------
 
 By coercing the locale away from the legacy C default and its assumption of
 ASCII as the preferred text encoding, this PEP also disables the implicit use
 of the "surrogateescape" error handler on the standard IO streams that was
-introduced in Python 3.5.
+introduced in Python 3.5 ([15_]).
 
-This is deliberate, as while UTF-8 as the preferred text encoding is a good
-working assumption for network service development and for more recent releases
-of client operating systems, it still isn't a universally valid assumption.
+This is deliberate, as that change was primarily aimed at handling the case
+where the correct system encoding was the ASCII-compatible UTF-8 (or another
+ASCII-compatible encoding), but the nominal encoding used for operating system
+interfaces in the current process was ASCII.
+
+With this PEP, that assumption is being narrowed a step further, such that
+rather than assuming "an ASCII-compatible encoding", we instead assume UTF-8
+specifically. If that assumption is genuinely wrong, then it continues to be
+friendlier to users of other encodings to alert them to the runtime's mistaken
+assumption, rather than continuing on and potentially corrupting their data
+permanently.
 
 In particular, GB 18030 [12_] is a Chinese national text encoding standard that
 handles all Unicode code points, but is incompatible with both ASCII and
@@ -514,6 +550,10 @@ container application that is assuming the use of UTF-8 or vice-versa is likely
 to cause an immediate Unicode encoding or decoding error, rather than
 potentially causing silent data corruption.
 
+For users that would prefer more permissive behaviour, setting
+``PYTHONIOENCODING=:surrogateescape`` will continue to be supported, as this
+PEP makes no changes to that feature.
+
 Dropping official support for Unicode handling in the legacy C locale
 ---------------------------------------------------------------------
 
@@ -722,6 +762,9 @@ References
 .. [14] ISO-2022
    (https://en.wikipedia.org/wiki/ISO/IEC_2022)
 
+.. [15] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale
+   (https://bugs.python.org/issue19977)
+
 Copyright
 =========
 