PEP 538: Update reference implementation (#219)
- updates reference implementation to use PYTHONCOERCECLOCALE
- removes hard dependency on PEP 540
- still notes PEP 540 covers case where no relevant C-with-UTF-8 locale is available
- clarifies that these settings are still recommended over the legacy C locale settings for older Python 3 versions, even if we don't recommend backporting the automatic coercion
parent 5f82542ec4
commit a20a56ceb5
93
pep-0538.txt
@@ -6,7 +6,6 @@ Author: Nick Coghlan <ncoghlan@gmail.com>
 Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
-Requires: 540
 Created: 28-Dec-2016
 Python-Version: 3.7
 Post-History: 03-Jan-2017 (linux-sig),
@@ -28,36 +27,49 @@ PEP 540 proposes a change to CPython's handling of the legacy C locale such
 that CPython will assume the use of UTF-8 in such environments, rather than
 persisting with the demonstrably problematic assumption of ASCII as an
 appropriate encoding for communicating with operating system interfaces.
+This is a good approach for cases where network encoding interoperability
+is a more important concern than local encoding interoperability.
 
 However, it comes at the cost of making CPython's encoding assumptions diverge
 from those of other C and C++ components in the same process, as well as those
 of components running in subprocesses that share the same environment.
 
-Accordingly, this PEP further proposes that the way the CPython implementation
-handles the default C locale be changed such that:
+It also requires changes to the internals of how CPython itself works, rather
+than using existing configuration settings that are supported by Python
+versions prior to Python 3.7.
 
-* the standalone CPython binary will automatically attempt to coerce the ``C``
-  locale to ``C.UTF-8``, ``C.utf8``, or ``UTF-8`` (depending on the system),
-  unless the new ``PYTHONCOERCECLOCALE`` environment variable is set to ``0``
+Accordingly, this PEP proposes that independently of the UTF-8 mode proposed
+in PEP 540, the way the CPython implementation handles the default C locale be
+changed such that:
+
+* unless the new ``PYTHONCOERCECLOCALE`` environment variable is set to ``0``,
+  the standalone CPython binary will automatically attempt to coerce the ``C``
+  locale to the first available locale out of ``C.UTF-8``, ``C.utf8``, or
+  ``UTF-8``
+* if the locale is successfully coerced, and PEP 540 is not accepted, then
+  ``PYTHONIOENCODING`` (if not otherwise set) will be set to
+  ``utf-8:surrogateescape``.
+* if the locale is successfully coerced, and PEP 540 *is* accepted, then
+  ``PYTHONUTF8`` (if not otherwise set) will be set to ``1``
 * if the subsequent runtime initialization process detects that the legacy
   ``C`` locale remains active (e.g. none of ``C.UTF-8``, ``C.utf8`` or ``UTF-8``
   are available, locale coercion is disabled, or the runtime is embedded in an
   application other than the main CPython binary), and the ``PYTHONUTF8``
-  feature defined in PEP 540 is also disabled, it will emit a warning on
-  stderr that use of the legacy ``C`` locale's default ASCII text encoding
-  may cause various Unicode compatibility issues
+  feature defined in PEP 540 is also disabled (or not implemented), it will
+  emit a warning on stderr that use of the legacy ``C`` locale's default ASCII
+  text encoding may cause various Unicode compatibility issues
 
 With this change, any \*nix platform that does *not* offer at least one of the
 ``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
 configuration would only be considered a fully supported platform for CPython
-3.7+ deployments when either the new ``PYTHONUTF8`` defined in PEP 540 is used,
-or else a suitable locale other than the default ``C`` locale is configured
-explicitly (e.g. ``zh_CN.gb18030``).
+3.7+ deployments when either the new ``PYTHONUTF8`` mode defined in PEP 540 is
+used, or else a suitable locale other than the default ``C`` locale is
+configured explicitly (e.g. ``en_AU.UTF-8``, ``zh_CN.gb18030``).
 
 Redistributors (such as Linux distributions) with a narrower target audience
 than the upstream CPython development team may also choose to opt in to this
-behaviour for the Python 3.6.x series by applying the necessary changes as a
-downstream patch when first introducing Python 3.6.0.
+locale coercion behaviour for the Python 3.6.x series by applying the necessary
+changes as a downstream patch when first introducing Python 3.6.0.
 
 
 Background
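As a reading aid for the hunk above, the startup rules it adds can be modelled roughly as follows. This is a minimal sketch, not the reference implementation: the function name ``plan_c_locale_coercion``, the ``is_locale_available`` callback, and the ``pep540_accepted`` flag are hypothetical, and it assumes the process is already known to be running in the legacy ``C`` locale and that coercion is expressed by setting ``LC_CTYPE``. The warning text is taken from the updated message later in this diff.

```python
# Illustrative model only: names below are hypothetical, not CPython APIs.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")

LEGACY_C_LOCALE_WARNING = (
    "Python runtime initialized with LC_CTYPE=C (a locale with default ASCII "
    "encoding), which may cause Unicode compatibility problems. Using C.UTF-8, "
    "C.utf8, or UTF-8 (if available) as alternative Unicode-compatible "
    "locales is recommended."
)


def plan_c_locale_coercion(environ, is_locale_available, pep540_accepted=False):
    """Model the decisions for a process starting in the legacy C locale.

    Returns (new_environ, warning); warning is None when none is due.
    """
    env = dict(environ)  # work on a copy, never mutate the caller's mapping

    target = None
    if env.get("PYTHONCOERCECLOCALE") != "0":
        # The first available candidate wins, in the order the PEP lists them.
        target = next(
            (name for name in COERCION_TARGETS if is_locale_available(name)),
            None,
        )

    if target is not None:
        # Assumption: coercion is modelled as setting LC_CTYPE.
        env["LC_CTYPE"] = target
        if pep540_accepted:
            env.setdefault("PYTHONUTF8", "1")
        else:
            env.setdefault("PYTHONIOENCODING", "utf-8:surrogateescape")
        return env, None

    # The legacy C locale remains active: warn unless the PEP 540 mode is on.
    utf8_mode = pep540_accepted and env.get("PYTHONUTF8") == "1"
    warning = None if utf8_mode else LEGACY_C_LOCALE_WARNING
    return env, warning
```

For example, ``plan_c_locale_coercion({}, lambda name: name == "C.utf8")`` selects ``C.utf8`` and fills in ``PYTHONIOENCODING``, while passing ``{"PYTHONCOERCECLOCALE": "0"}`` leaves the environment alone and returns the warning text.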
@@ -120,7 +132,7 @@ still fail in the following cases:
 
 * SSH environment forwarding means that SSH clients may sometimes forward
   client locale settings to servers that don't have that locale installed. This
-  leads to CPython running in the default ASCII-based C locale.
+  leads to CPython running in the default ASCII-based C locale
 * some process environments (such as Linux containers) may not have any
   explicit locale configured at all. As with unknown locales, this leads to
   CPython running in the default ASCII-based C locale
@@ -156,7 +168,7 @@ Relationship with other PEPs
 ============================
 
 This PEP shares a common problem statement with PEP 540 (improving Python 3's
-behaviour in the default C locale), but diverged markedly in the proposed
+behaviour in the default C locale), but diverges markedly in the proposed
 solution:
 
 * PEP 540 proposes to entirely decouple CPython's default text encoding from
@@ -174,7 +186,7 @@ solution:
   traditional strong support for integration with other components written
   in C and C++, while actively helping to push forward the adoption and
   standardisation of the C.UTF-8 locale as a Unicode-aware replacement for
-  the legacy C locale in the wider Linux ecosystem
+  the legacy C locale in the wider C/C++ ecosystem
 
 After reviewing both PEPs, it became clear that they didn't actually conflict
 at a technical level, and the proposal in PEP 540 offered a superior option in
@@ -183,14 +195,18 @@ reference behaviour for platforms where the notion of a "locale encoding"
 doesn't make sense (for example, embedded systems running MicroPython rather
 than the CPython reference interpreter).
 
-As a result, this PEP was amended to specify PEP 540 as a pre-requisite, with
-the aim being to coerce other C/C++ components into behaving consistently with
-CPython's assumption of UTF-8 as the system encoding, rather than CPython itself
-relying on that setting change.
+Meanwhile, this PEP offered improved compatibility with other C/C++ components,
+and an approach more amenable to being backported to Python 3.6 by downstream
+redistributors.
 
-As a result of that change, the ``LC_CTYPE=en_US.UTF-8`` legacy fallback was
-removed from the list of UTF-8 locales tried as a coercion target, with CPython
-instead relying solely on the C locale text encoding bypass in such cases.
+As a result, this PEP was amended to refer to PEP 540 as a complementary
+solution that offered improved behaviour both when locale coercion triggered,
+as well as when none of the standard UTF-8 based locales were available.
+
+The availability of PEP 540 also meant that the ``LC_CTYPE=en_US.UTF-8`` legacy
+fallback was removed from the list of UTF-8 locales tried as a coercion target,
+with CPython instead relying solely on the proposed PYTHONUTF8 mode in such
+cases.
 
 
 Motivation
@@ -203,9 +219,8 @@ application development. Technologies like Gnome Flatpak [7_] and
 Ubuntu Snappy [8_] further aim to bring these same techniques to Linux GUI
 application development.
 
-When using Python 3 for application development in
-these contexts, it isn't uncommon to see text encoding related errors akin to
-the following::
+When using Python 3 for application development in these contexts, it isn't
+uncommon to see text encoding related errors akin to the following::
 
     $ docker run --rm fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
     Unable to decode the command from the command line:
@@ -304,6 +319,7 @@ proposed solution:
   release announcements. However, to minimize the chance of introducing new
   problems for end users, we'll do this *without* using the warnings system, so
   even running with ``-Werror`` won't turn it into a runtime exception
+* any changes made will use *existing* configuration options
 
 To minimize the negative impact on systems currently correctly configured to
 use GB-18030 or another partially ASCII compatible universal encoding leads to
@@ -434,7 +450,8 @@ be issued::
 
     Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
     encoding), which may cause Unicode compatibility problems. Using C.UTF-8
-    (if available) as an alternative Unicode-compatible locale is recommended.
+    C.utf8, or UTF-8 (if available) as alternative Unicode-compatible
+    locales is recommended.
 
 In this case, no actual change will be made to the locale settings.
 
@@ -754,14 +771,15 @@ runtimes even when running a version with this change applied.
 Implementation
 ==============
 
-A draft implementation of the change (including test cases) has been
-posted to issue 28180 [1_], which is an end user request that
+A draft implementation of the change (including test cases and documentation)
+is linked from issue 28180 [1_], which is an end user request that
 ``sys.getfilesystemencoding()`` default to ``utf-8`` rather than ``ascii``.
 
-NOTE: The currently posted draft implementation is for a previous iteration
-of the PEP prior to the incorporation of the feedback noted in [11_]. It was
-broadly the same in concept (i.e. coercing the legacy C locale to one based on
-UTF-8), but differs in several details.
+This patch is now being maintained as the ``pep538-coerce-c-locale`` feature
+branch [18_] in Nick Coghlan's fork of the CPython repository on GitHub.
+
+NOTE: As discussed in [1_], the currently posted draft implementation has some
+known issues on Android.
 
 
 Backporting to earlier Python 3 releases
@@ -789,6 +807,10 @@ backport it to even earlier Python 3.x releases based on the needs and
 interests of their particular user base, this wouldn't be encouraged as a
 general practice.
 
+However, configuring Python 3 *environments* (such as base container
+images) to use these configuration settings by default is both allowed
+and recommended.
+
 
 Acknowledgements
 ================
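The paragraph added in the hunk above recommends configuring existing Python 3 environments with "these configuration settings". One way to picture that, assuming the settings meant are a UTF-8 based ``LC_CTYPE`` plus ``PYTHONIOENCODING=utf-8:surrogateescape`` as described earlier in this diff, is the sketch below. The helper name ``with_utf8_c_locale`` is hypothetical, and whether ``C.UTF-8`` exists depends on the platform.

```python
import os
import subprocess
import sys


def with_utf8_c_locale(base_env, locale_name="C.UTF-8"):
    """Return a copy of base_env using a UTF-8 based locale for LC_CTYPE.

    Assumption: locale_name is installed on the target system. An LC_ALL
    value already present in base_env takes precedence over LC_CTYPE and
    is left untouched by this sketch.
    """
    env = dict(base_env)
    env["LC_CTYPE"] = locale_name
    # Respect an explicit user choice, mirroring "if not otherwise set".
    env.setdefault("PYTHONIOENCODING", "utf-8:surrogateescape")
    return env


if __name__ == "__main__":
    # Launch a child interpreter with the adjusted environment and show
    # which encoding and error handler it selected for stdout.
    child_env = with_utf8_c_locale(os.environ)
    subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.stdout.encoding, sys.stdout.errors)"],
        env=child_env,
        check=True,
    )
```

In a container image the same two variables would more typically be set declaratively in the image definition; the Python form is used here only to keep the example self-contained.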
@@ -882,6 +904,9 @@ References
 .. [17] UTF-8 locale discussion on "locale.getdefaultlocale() fails on Mac OS X with default language set to English"
    (http://bugs.python.org/issue18378#msg215215)
 
+.. [18] GitHub branch diff for ``ncoghlan:pep538-coerce-c-locale``
+   (https://github.com/python/cpython/compare/master...ncoghlan:pep538-coerce-c-locale)
+
 Copyright
 =========
 