PEP 538: update based on implementation progress

- using PYTHONIOENCODING poses a compatibility problem for
  Python 2 subprocesses, so use Py_SetStandardStreamEncoding
  instead
- note that components checking for "no output on stderr
  means success" will either need to avoid the warning or
  switch to checking return codes instead
- Docker, Inc. ends with a full stop, not a comma (noted by
  Jan Pokorný)
- explicitly acknowledge Charalampos Stratakis's work on the
  Fedora 26 backport
Nick Coghlan 2017-03-17 18:27:53 +10:00
parent 1085515c33
commit 0789423c46
1 changed file with 37 additions and 17 deletions


@ -48,9 +48,10 @@ changed such that:
the standalone CPython binary will automatically attempt to coerce the ``C``
locale to the first available locale out of ``C.UTF-8``, ``C.utf8``, or
``UTF-8``
* if the locale is successfully coerced, and PEP 540 is not accepted, then
``PYTHONIOENCODING`` (if not otherwise set) will be set to
``utf-8:surrogateescape``.
* if the locale is successfully coerced, PEP 540 is not accepted, and the
``PYTHONIOENCODING`` environment variable is not set, then
``Py_SetStandardStreamEncoding`` will be called with ``"utf-8"`` and
``"surrogateescape"`` as arguments.
* if the locale is successfully coerced, and PEP 540 *is* accepted, then
``PYTHONUTF8`` (if not otherwise set) will be set to ``1``
* if the subsequent runtime initialization process detects that the legacy
@ -279,7 +280,7 @@ locale that both distros provide::
LC_CTYPE="C.UTF-8"
LC_ALL=
The Alpine Linux based Python images provided by Docker, Inc, already use the
The Alpine Linux based Python images provided by Docker, Inc. already use the
C.UTF-8 locale by default::
$ docker run --rm python:3 python3 -c 'print("ℙƴ☂ℌøἤ")'
@ -303,8 +304,8 @@ this unfortunately doesn't help on platforms that ship older versions of glibc
without that feature, and also don't provide C.UTF-8 as an on-disk locale the
way Debian and Fedora do. For these platforms, the mechanism proposed in
PEP 540 at least allows CPython itself to behave sensibly, albeit without any
mechanism to get other C/C++ components that decode binary streams as text to
do the same.
common mechanism to get other C/C++ components that decode binary streams as
text to do the same.
Design Principles
@ -347,9 +348,9 @@ run as a standalone command line application.
It further proposes to emit a warning on stderr if the legacy ``C`` locale
is in effect at the point where the language runtime itself is initialized,
the explicit environmental flag to disable locale coercion is not set, and
the PEP 540 UTF-8 encoding override is also disabled, in order to warn
system and application integrators that they're running CPython in an
unsupported configuration.
the PEP 540 UTF-8 encoding override is also disabled (or not implemented), in
order to warn system and application integrators that they're running CPython
in an unsupported configuration.
Legacy C locale coercion in the standalone Python interpreter binary
@ -404,8 +405,10 @@ will be implemented at runtime on all platforms other than Mac OS X and Windows,
rather than attempting to determine which locales to try at compile time.
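The runtime probing described here can be approximated at the Python level. The following is only a sketch, not the C implementation: the helper name ``first_available_locale`` is hypothetical, and probing the ``LC_CTYPE`` category is an assumption made for illustration. It tries each candidate in order and reports the first one the platform accepts:

```python
import locale

# Candidate names taken from the PEP text; they are tried in this order.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def first_available_locale(candidates=CANDIDATES):
    """Return the first candidate the platform accepts for LC_CTYPE, or None.

    Hypothetical helper: the real coercion happens in C during startup.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not available on this platform, try the next one
            return name
        return None
    finally:
        # Restore the original setting so the probe has no lasting effect.
        locale.setlocale(locale.LC_CTYPE, saved)


chosen = first_available_locale()
```

On a platform that provides none of the candidates, ``chosen`` is ``None``, which corresponds to the case where initialization continues in the C locale.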
If the locale settings are changed successfully, and the ``PYTHONIOENCODING``
environment variable is currently unset, then it will be forced to
``PYTHONIOENCODING=utf-8:surrogateescape``.
environment variable is currently unset, then ``Py_SetStandardStreamEncoding``
will be called to force the standard IO streams to ``utf-8`` as the nominal
text encoding and ``surrogateescape`` as the error handler (``stderr`` will
continue to use ``backslashreplace`` as its error handler, as usual).
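As a rough illustration of what the two error handlers do, the sketch below uses in-memory streams rather than the real standard streams, and the helper name ``write_through`` is hypothetical. Undecodable bytes survive a round trip under ``surrogateescape``, while ``backslashreplace`` renders them as visible escapes:

```python
import io


def write_through(text, errors):
    """Encode text as UTF-8 via a TextIOWrapper using the given error handler."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # avoid closing `raw` when the wrapper is discarded
    return data


# b"\x81" is not valid UTF-8; surrogateescape decodes it to a lone
# surrogate code point (U+DC81) instead of raising an error.
original = b"caf\x81"
text = original.decode("utf-8", errors="surrogateescape")

stdout_like = write_through(text, "surrogateescape")   # original bytes restored
stderr_like = write_through(text, "backslashreplace")  # surrogate shown as an escape
```

Here ``stdout_like`` equals the original byte string, whereas ``stderr_like`` contains the literal characters ``\udc81`` in place of the invalid byte.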
When this locale coercion is activated, the following warning will be
printed on stderr, with the warning containing whichever locale was
@ -427,6 +430,12 @@ settings, SSH forwarding of unknown locales, and developers explicitly
requesting ``LANG=C``), as long as the target platform provides at least one
of the candidate UTF-8 based environments.
The one case where failures may still occur is when ``stderr`` is specifically
being checked for no output, which can be resolved either by configuring
a locale other than the C locale, or else by using a mechanism other than
"there was no output on stderr" to check for subprocess errors (e.g. checking
process return codes).
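The difference between the two checking strategies can be sketched as follows. This is a minimal example, assuming a child process that writes a warning-like message to ``stderr`` while still exiting successfully; the helper name ``run_child`` and the message text are stand-ins, not the actual warning:

```python
import subprocess
import sys


def run_child(code):
    """Run a Python snippet in a child interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# The child emits a stand-in warning on stderr but exits with status 0.
noisy = run_child("import sys; sys.stderr.write('warning: stand-in message\\n')")
# The child produces no output at all but exits with a failure status.
failing = run_child("import sys; sys.exit(3)")

# Fragile check: any stderr output is treated as a failure.
fragile_ok = noisy.stderr == b""
# Robust check: only the exit status decides success.
robust_ok = noisy.returncode == 0
```

The fragile check misreports the first child as failed and would report the second as successful, while the return code check classifies both correctly.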
If none of the candidate locales are successfully configured, then
initialization will continue in the C locale and the Unicode compatibility
warning described in the next section will be emitted just as it would for
@ -571,9 +580,10 @@ introduced in Python 3.5 ([15_]), as well as the automatic use of
``surrogateescape`` when operating in PEP 540's UTF-8 mode.
Rather than introducing yet another configuration option to address that,
this PEP proposes to use the existing ``PYTHONIOENCODING`` setting to ensure
that the ``surrogateescape`` handler is enabled when the interpreter is
required to make assumptions regarding the expected filesystem encoding.
this PEP proposes to use the existing ``Py_SetStandardStreamEncoding``
interface to ensure that the ``surrogateescape`` handler is enabled when
the interpreter is required to make assumptions regarding the expected
filesystem encoding.
The aim of this behaviour is to attempt to ensure that operating system
provided text values are typically able to be transparently passed through a
@ -682,8 +692,14 @@ explicitly configured::
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
Note: an alternative to setting ``PYTHONIOENCODING`` as the PEP currently
proposes would be to instead *always* default to ``surrogateescape`` on the
Note: in order to also affect subprocesses running Python 3, earlier versions
of this PEP proposed setting ``PYTHONIOENCODING`` to ``utf-8:surrogateescape``
rather than calling ``Py_SetStandardStreamEncoding`` when the locale coercion
triggered. Unfortunately, this approach proved to have undesirable side
effects when Python 2 applications were invoked in subprocesses (as there is
no ``surrogateescape`` error handler available in Python 2).
Another design option would be to *always* default to ``surrogateescape`` on the
standard streams, and require the use of ``PYTHONIOENCODING=:strict`` to request
text encoding validation during stream processing. Adopting such an approach
would bring Python 3 more into line with typical C/C++ tools that pass along
@ -697,7 +713,8 @@ and would hence also make the last example display the desired output::
However, such a change would have broader implications than the C locale
specific changes currently proposed, so it is considered out of scope for this
PEP.
PEP. Instead, an improved solution is left to the combination of this PEP with
PEP 540, by automatically setting ``PYTHONUTF8=1`` when locale coercion occurs.
Dropping official support for ASCII based text handling in the legacy C locale
@ -869,6 +886,9 @@ utility development framework [2_]::
The change was originally proposed as a downstream patch for Fedora's
system Python 3.6 package [3_], and then reformulated as a PEP for Python 3.7
with a section allowing for backports to earlier versions by redistributors.
In parallel with the development of the upstream patch, Charalampos Stratakis
has been working on the Fedora 26 backport and providing feedback on the
practical viability of the proposed changes.
The initial draft was posted to the Python Linux SIG for discussion [10_] and
then amended based on both that discussion and Victor Stinner's work in