PEP 538: document core design principles

Also provides more background on the rationale for using "strict" error handling by default on stdin and stdout when coercing the locale to one based on UTF-8.
2017-01-08 11:54:24 +10:00 · 2017-01-08 11:54:24 +10:00 · 1f1abb3b6a
parent 927e704d7e
commit 1f1abb3b6a
1 changed file with 49 additions and 6 deletions
--- a/pep-0538.txt
+++ b/pep-0538.txt
@@ -292,6 +292,34 @@ locale category, rather than overriding all the locale categories::
    ℙƴ☂ℌøἤ


+Design Principles
+=================
+
+The above motivation leads to the following core design principles for the
+proposed solution:
+
+* if a locale other than the default C locale is explicitly configured, we'll
+  continue to respect it
+* if we're changing the locale setting without an explicit config option, we'll
+  emit a warning on stderr that we're doing so rather than silently changing
+  the process configuration. This will alert application and system integrators
+  to the change, even if they don't closely follow the PEP process or Python
+  release announcements. However, to minimize the chance of introducing new
+  problems for end users, we'll do this *without* using the warnings system, so
+  even running with ``-Werror`` won't turn it into a runtime exception
+
+The general design principle of Python 3 to prefer raising an exception over
+incorrectly encoding or decoding data then leads to the following additional
+design guideline:
+
+* if a UTF-8 based Linux container is run on a host that is explicitly
+  configured to use a non-UTF-8 encoding, and tries to exchange locally
+  encoded data with that host rather than exchanging explicitly UTF-8 encoded
+  data, this will ideally lead to an immediate runtime exception rather than
+  to silent data corruption
+
+
+
 Specification
 =============

@@ -489,17 +517,25 @@ are valid:
  default encoding of ASCII the way CPython currently does


-Using "strict" error handling by default
----------------------------------------
+Defaulting to "strict" error handling on the standard IO streams
+----------------------------------------------------------------

 By coercing the locale away from the legacy C default and its assumption of
 ASCII as the preferred text encoding, this PEP also disables the implicit use
 of the "surrogateescape" error handler on the standard IO streams that was
-introduced in Python 3.5.
+introduced in Python 3.5 ([15_]).

-This is deliberate, as while UTF-8 as the preferred text encoding is a good
-working assumption for network service development and for more recent releases
-of client operating systems, it still isn't a universally valid assumption.
+This is deliberate, as that change was primarily aimed at handling the case
+where the correct system encoding was the ASCII-compatible UTF-8 (or another
+ASCII-compatible encoding), but the nominal encoding used for operating system
+interfaces in the current process was ASCII.
+
+With this PEP, that assumption is being narrowed a step further, such that
+rather than assuming "an ASCII-compatible encoding", we instead assume UTF-8
+specifically. If that assumption is genuinely wrong, then it continues to be
+friendlier to users of other encodings to alert them to the runtime's mistaken
+assumption, rather than continuing on and potentially corrupting their data
+permanently.

 In particular, GB 18030 [12_] is a Chinese national text encoding standard
 that handles all Unicode code points, but is incompatible with both ASCII and
@@ -514,6 +550,10 @@ container application that is assuming the use of UTF-8 or vice-versa is likely
 to cause an immediate Unicode encoding or decoding error, rather than
 potentially causing silent data corruption.

+For users that would prefer more permissive behaviour, setting
+``PYTHONIOENCODING=:surrogateescape`` will continue to be supported, as this
+PEP makes no changes to that feature.
+

 Dropping official support for Unicode handling in the legacy C locale
 ---------------------------------------------------------------------
@@ -722,6 +762,9 @@ References
 .. [14] ISO-2022
   (https://en.wikipedia.org/wiki/ISO/IEC_2022)

+.. [15] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale
+   (https://bugs.python.org/issue19977)
+
 Copyright
 =========
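
Illustration only, not part of the patch above: a minimal Python sketch of the difference between the "strict" and "surrogateescape" error handlers that the revised section discusses. The sample bytes and helper names are hypothetical, and the sketch decodes data directly rather than going through ``sys.stdin`` or ``sys.stdout``.

```python
# Hypothetical sample: "café" encoded as Latin-1, which is not valid UTF-8.
raw = b"caf\xe9"


def decode_strict(data: bytes) -> str:
    """Decode as UTF-8, raising on any byte sequence that is not valid UTF-8."""
    return data.decode("utf-8", errors="strict")


def decode_permissive(data: bytes) -> str:
    """Decode as UTF-8, smuggling undecodable bytes through as lone surrogates."""
    return data.decode("utf-8", errors="surrogateescape")


# "strict" surfaces the encoding mismatch immediately as an exception.
try:
    decode_strict(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "surrogateescape" accepts the data; the stray byte 0xE9 becomes U+DCE9.
escaped = decode_permissive(raw)

# Encoding with the same handler restores the original bytes unchanged.
round_tripped = escaped.encode("utf-8", errors="surrogateescape")
```

With "strict", the mismatch between the assumed and actual encoding is reported at once. With "surrogateescape", the bytes pass through unchanged, which is convenient for pass-through processing but can hide the fact that the text was never interpreted correctly.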