PEP 538: update based on implementation progress

- using PYTHONIOENCODING poses a compatibility problem for Python 2
  subprocesses, so use Py_SetStandardStreamEncoding instead
- note that components checking for "no output on stderr means success"
  will either need to avoid the warning or switch to checking return codes
  instead
- Docker, Inc. ends with a full stop, not a comma (noted by Jan Pokorný)
- explicitly acknowledge Charalampos Stratakis's work on the Fedora 26
  backport

parent 1085515c33
commit 0789423c46

 pep-0538.txt | 54
 1 file changed, 37 insertions(+), 17 deletions(-)

--- a/pep-0538.txt
+++ b/pep-0538.txt
@@ -48,9 +48,10 @@ changed such that:
   the standalone CPython binary will automatically attempt to coerce the ``C``
   locale to the first available locale out of ``C.UTF-8``, ``C.utf8``, or
   ``UTF-8``
-* if the locale is successfully coerced, and PEP 540 is not accepted, then
-  ``PYTHONIOENCODING`` (if not otherwise set) will be set to
-  ``utf-8:surrogateescape``.
+* if the locale is successfully coerced, PEP 540 is not accepted, and the
+  ``PYTHONIOENCODING`` environment variable is not set, then
+  ``Py_SetStandardStreamEncoding`` will be called with ``"utf-8"`` and
+  ``"surrogateescape"`` as arguments.
 * if the locale is successfully coerced, and PEP 540 *is* accepted, then
   ``PYTHONUTF8`` (if not otherwise set) will be set to ``1``
 * if the subsequent runtime initialization process detects that the legacy
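The first bullet in the hunk above describes the coercion step. For illustration only, the candidate search can be approximated from Python with the standard ``locale`` module. This is a sketch under stated assumptions. The helper name ``coerce_c_locale`` is hypothetical. The real logic runs in C during interpreter startup, before any Python code executes. The sketch targets Linux-like platforms, in line with the PEP's exclusion of Mac OS X and Windows.

```python
import locale

# Candidate target locales, in the order listed in the PEP text above.
CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def coerce_c_locale(candidates=CANDIDATE_LOCALES):
    """Hypothetical helper: move LC_CTYPE off the legacy C locale.

    Returns the name of the first candidate that the C library accepts, or
    None if LC_CTYPE is not currently "C" or no candidate is available.
    """
    # Passing None queries the current setting without changing it.
    if locale.setlocale(locale.LC_CTYPE, None) != "C":
        return None
    for name in candidates:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # locale not available on this system; try the next one
        return name
    return None


if __name__ == "__main__":
    locale.setlocale(locale.LC_CTYPE, "C")  # simulate the legacy C locale
    print("coerced to:", coerce_c_locale())
```

The result depends on which locales the system provides, so ``None`` is a legitimate outcome on platforms that ship none of the candidates.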
@@ -279,7 +280,7 @@ locale that both distros provide::
     LC_CTYPE="C.UTF-8"
     LC_ALL=
 
-The Alpine Linux based Python images provided by Docker, Inc, already use the
+The Alpine Linux based Python images provided by Docker, Inc. already use the
 C.UTF-8 locale by default::
 
     $ docker run --rm python:3 python3 -c 'print("ℙƴ☂ℌøἤ")'
@@ -303,8 +304,8 @@ this unfortunately doesn't help on platforms that ship older versions of glibc
 without that feature, and also don't provide C.UTF-8 as an on-disk locale the
 way Debian and Fedora do. For these platforms, the mechanism proposed in
 PEP 540 at least allows CPython itself to behave sensibly, albeit without any
-mechanism to get other C/C++ components that decode binary streams as text to
-do the same.
+common mechanism to get other C/C++ components that decode binary streams as
+text to do the same.
 
 
 Design Principles
@@ -347,9 +348,9 @@ run as a standalone command line application.
 It further proposes to emit a warning on stderr if the legacy ``C`` locale
 is in effect at the point where the language runtime itself is initialized,
 the explicit environmental flag to disable locale coercion is not set, and
-the PEP 540 UTF-8 encoding override is also disabled, in order to warn
-system and application integrators that they're running CPython in an
-unsupported configuration.
+the PEP 540 UTF-8 encoding override is also disabled (or not implemented), in
+order to warn system and application integrators that they're running CPython
+in an unsupported configuration.
 
 
 Legacy C locale coercion in the standalone Python interpreter binary
@@ -404,8 +405,10 @@ will be implemented at runtime on all platforms other than Mac OS X and Windows,
 rather than attempting to determine which locales to try at compile time.
 
 If the locale settings are changed successfully, and the ``PYTHONIOENCODING``
-environment variable is currently unset, then it will be forced to
-``PYTHONIOENCODING=utf-8:surrogateescape``.
+environment variable is currently unset, then ``Py_SetStandardStreamEncoding``
+will be called to force the standard IO streams to ``utf-8`` as the nominal
+text encoding and ``surrogateescape`` as the error handler (``stderr`` will
+continue to use ``backslashreplace`` as its error handler as usual).
 
 When this locale coercion is activated, the following warning will be
 printed on stderr, with the warning containing whichever locale was
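``Py_SetStandardStreamEncoding`` is a C API function that an embedding application calls before the interpreter is initialized. Python code therefore cannot exercise it directly. The sketch below checks the resulting stream configuration a different way. It starts a child interpreter with ``PYTHONIOENCODING=utf-8:surrogateescape``, the earlier mechanism, which is assumed here to yield the same stream settings, and reports each standard stream's codec and error handler. The helper name ``stream_settings`` is hypothetical.

```python
import codecs
import os
import subprocess
import sys

# Child script: print "<stream> <encoding> <errors>" for each standard stream.
REPORT = (
    "import sys\n"
    "for name in ('stdin', 'stdout', 'stderr'):\n"
    "    stream = getattr(sys, name)\n"
    "    print(name, stream.encoding, stream.errors)\n"
)


def stream_settings(extra_env):
    """Hypothetical helper: report (codec name, error handler) per stream in a child."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", REPORT],
        env=env,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=True,
    )
    settings = {}
    for line in result.stdout.splitlines():
        name, encoding, errors = line.split()
        # Normalise spellings such as "UTF-8" or "utf8" to the canonical codec name.
        settings[name] = (codecs.lookup(encoding).name, errors)
    return settings


if __name__ == "__main__":
    for name, value in stream_settings({"PYTHONIOENCODING": "utf-8:surrogateescape"}).items():
        print(name, value)
```

The report should show ``surrogateescape`` for ``stdin`` and ``stdout`` and ``backslashreplace`` for ``stderr``. This matches the parenthetical note in the hunk above.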
@@ -427,6 +430,12 @@ settings, SSH forwarding of unknown locales, and developers explicitly
 requesting ``LANG=C``), as long as the target platform provides at least one
 of the candidate UTF-8 based environments.
 
+The one case where failures may still occur is when ``stderr`` is specifically
+being checked for no output, which can be resolved either by configuring
+a locale other than the C locale, or else by using a mechanism other than
+"there was no output on stderr" to check for subprocess errors (e.g. checking
+process return codes).
+
 If none of the candidate locales are successfully configured, then
 initialization will continue in the C locale and the Unicode compatibility
 warning described in the next section will be emitted just as it would for
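The paragraph added in the hunk above recommends checking process return codes instead of treating "no output on stderr" as success. Below is a minimal sketch of that recommendation using the standard ``subprocess`` module. ``command_succeeded`` is a hypothetical helper name. The noisy child command only stands in for a process that emits a warning such as the locale coercion notice.

```python
import subprocess
import sys


def command_succeeded(args):
    """Hypothetical helper: judge success by exit status, not by stderr output."""
    result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Fragile alternative: `return not result.stderr`, which reports failure
    # whenever the child prints any warning, even on an otherwise clean run.
    return result.returncode == 0


if __name__ == "__main__":
    # A child that writes a warning to stderr but still exits with status 0.
    noisy = [sys.executable, "-c", "import sys; sys.stderr.write('warning\\n')"]
    # A child that exits with a non-zero status.
    failing = [sys.executable, "-c", "raise SystemExit(3)"]
    print(command_succeeded(noisy), command_succeeded(failing))  # prints "True False"
```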
@@ -571,9 +580,10 @@ introduced in Python 3.5 ([15_]), as well as the automatic use of
 ``surrogateescape`` when operating in PEP 540's UTF-8 mode.
 
 Rather than introducing yet another configuration option to address that,
-this PEP proposes to use the existing ``PYTHONIOENCODING`` setting to ensure
-that the ``surrogateescape`` handler is enabled when the interpreter is
-required to make assumptions regarding the expected filesystem encoding.
+this PEP proposes to use the existing ``Py_SetStandardStreamEncoding``
+interface to ensure that the ``surrogateescape`` handler is enabled when
+the interpreter is required to make assumptions regarding the expected
+filesystem encoding.
 
 The aim of this behaviour is to attempt to ensure that operating system
 provided text values are typically able to be transparently passed through a
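The ``surrogateescape`` handler discussed in the hunk above is what lets undecodable bytes pass through text processing unchanged. The sketch below shows that round trip with plain ``bytes.decode`` and ``str.encode`` calls. The input byte ``0x81`` was chosen to match the ``UnicodeDecodeError`` in the PEP's later example and is otherwise arbitrary.

```python
# 0x81 is a UTF-8 continuation byte, so it is invalid at the start of a sequence.
raw = b"\x81abc"

# With the default "strict" handler, decoding raises an exception.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # prints "strict: invalid start byte"

# "surrogateescape" maps each undecodable byte (0x80-0xFF) to a lone surrogate
# in the range U+DC80-U+DCFF instead of raising, so the data can be handled as str.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # prints '\udc81abc'

# Encoding with the same handler turns the surrogates back into the original
# bytes, giving a lossless round trip.
assert text.encode("utf-8", "surrogateescape") == raw

# Encoding the same str with "strict" fails, so the handler matters on output
# streams as well as input streams.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode:", exc.reason)
```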
@@ -682,8 +692,14 @@ explicitly configured::
         (result, consumed) = self._buffer_decode(data, self.errors, final)
     UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
 
-Note: an alternative to setting ``PYTHONIOENCODING`` as the PEP currently
-proposes would be to instead *always* default to ``surrogateescape`` on the
+Note: in order to also affect subprocesses running Python 3, earlier versions
+of this PEP proposed setting ``PYTHONIOENCODING`` to ``utf-8:surrogateescape``
+rather than calling ``Py_SetStandardStreamEncoding`` when the locale coercion
+was triggered. Unfortunately, this approach proved to have undesirable side
+effects when Python 2 applications were invoked in subprocesses (as there is
+no ``surrogateescape`` error handler available in Python 2).
+
+Another design option would be to *always* default to ``surrogateescape`` on the
 standard streams, and require the use of ``PYTHONIOENCODING=:strict`` to request
 text encoding validation during stream processing. Adopting such an approach
 would bring Python 3 more into line with typical C/C++ tools that pass along
@@ -697,7 +713,8 @@ and would hence also make the last example display the desired output::
 
 However, such a change would have broader implications than the C locale
 specific changes currently proposed, so it is considered out of scope for this
-PEP.
+PEP. Instead, an improved solution is left to the combination of this PEP with
+PEP 540, by automatically setting ``PYTHONUTF8=1`` when locale coercion occurs.
 
 
 Dropping official support for ASCII based text handling in the legacy C locale
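The hunk above leaves the improved behaviour to PEP 540's ``PYTHONUTF8=1`` setting. On Python 3.7 and later, where that UTF-8 mode is available, the pass-through effect can be checked end to end. The sketch pipes an invalid byte through a child interpreter that copies stdin to stdout as text. ``echo_through_python`` is a hypothetical helper. ``PYTHONIOENCODING`` is removed from the child's environment so that it cannot override the mode under test, except where the strict comparison sets it explicitly.

```python
import os
import subprocess
import sys

# Child script: read stdin as text and write it back to stdout as text.
ECHO = "import sys; sys.stdout.write(sys.stdin.read())"


def echo_through_python(data, extra_env):
    """Hypothetical helper: pipe bytes through a child Python's text streams."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONIOENCODING"}
    env.update(extra_env)
    return subprocess.run(
        [sys.executable, "-c", ECHO], input=data, env=env, capture_output=True
    )


if __name__ == "__main__":
    payload = b"\x81abc"  # not valid UTF-8
    utf8_mode = echo_through_python(payload, {"PYTHONUTF8": "1"})
    print(utf8_mode.returncode, utf8_mode.stdout)  # prints 0 b'\x81abc'
    strict = echo_through_python(
        payload, {"PYTHONUTF8": "0", "PYTHONIOENCODING": "utf-8:strict"}
    )
    print(strict.returncode, b"UnicodeDecodeError" in strict.stderr)  # prints 1 True
```

The strict run reproduces the kind of ``UnicodeDecodeError`` shown in the PEP's example, while UTF-8 mode passes the byte through unchanged.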
@@ -869,6 +886,9 @@ utility development framework [2_]::
 The change was originally proposed as a downstream patch for Fedora's
 system Python 3.6 package [3_], and then reformulated as a PEP for Python 3.7
 with a section allowing for backports to earlier versions by redistributors.
+In parallel with the development of the upstream patch, Charalampos Stratakis
+has been working on the Fedora 26 backport and providing feedback on the
+practical viability of the proposed changes.
 
 The initial draft was posted to the Python Linux SIG for discussion [10_] and
 then amended based on both that discussion and Victor Stinner's work in