PEP 538: Update reference implementation (#219)
- updates reference implementation to use PYTHONCOERCECLOCALE
- removes hard dependency on PEP 540
- still notes PEP 540 covers case where no relevant C-with-UTF-8 locale is available
- clarifies that these settings are still recommended over the legacy C locale settings for older Python 3 versions, even if we don't recommend backporting the automatic coercion
parent 5f82542ec4
commit a20a56ceb5
93
pep-0538.txt
|
@@ -6,7 +6,6 @@ Author: Nick Coghlan <ncoghlan@gmail.com>
|
||||||
Status: Draft
|
Status: Draft
|
||||||
Type: Standards Track
|
Type: Standards Track
|
||||||
Content-Type: text/x-rst
|
Content-Type: text/x-rst
|
||||||
Requires: 540
|
|
||||||
Created: 28-Dec-2016
|
Created: 28-Dec-2016
|
||||||
Python-Version: 3.7
|
Python-Version: 3.7
|
||||||
Post-History: 03-Jan-2017 (linux-sig),
|
Post-History: 03-Jan-2017 (linux-sig),
|
||||||
|
@@ -28,36 +27,49 @@ PEP 540 proposes a change to CPython's handling of the legacy C locale such
|
||||||
that CPython will assume the use of UTF-8 in such environments, rather than
|
that CPython will assume the use of UTF-8 in such environments, rather than
|
||||||
persisting with the demonstrably problematic assumption of ASCII as an
|
persisting with the demonstrably problematic assumption of ASCII as an
|
||||||
appropriate encoding for communicating with operating system interfaces.
|
appropriate encoding for communicating with operating system interfaces.
|
||||||
|
This is a good approach for cases where network encoding interoperability
|
||||||
|
is a more important concern than local encoding interoperability.
|
||||||
|
|
||||||
However, it comes at the cost of making CPython's encoding assumptions diverge
|
However, it comes at the cost of making CPython's encoding assumptions diverge
|
||||||
from those of other C and C++ components in the same process, as well as those
|
from those of other C and C++ components in the same process, as well as those
|
||||||
of components running in subprocesses that share the same environment.
|
of components running in subprocesses that share the same environment.
|
||||||
|
|
||||||
Accordingly, this PEP further proposes that the way the CPython implementation
|
It also requires changes to the internals of how CPython itself works, rather
|
||||||
handles the default C locale be changed such that:
|
than using existing configuration settings that are supported by Python
|
||||||
|
versions prior to Python 3.7.
|
||||||
|
|
||||||
* the standalone CPython binary will automatically attempt to coerce the ``C``
|
Accordingly, this PEP proposes that independently of the UTF-8 mode proposed
|
||||||
locale to ``C.UTF-8``, ``C.utf8``, or ``UTF-8`` (depending on the system),
|
in PEP 540, the way the CPython implementation handles the default C locale be
|
||||||
unless the new ``PYTHONCOERCECLOCALE`` environment variable is set to ``0``
|
changed such that:
|
||||||
|
|
||||||
|
* unless the new ``PYTHONCOERCECLOCALE`` environment variable is set to ``0``,
|
||||||
|
the standalone CPython binary will automatically attempt to coerce the ``C``
|
||||||
|
locale to the first available locale out of ``C.UTF-8``, ``C.utf8``, or
|
||||||
|
``UTF-8``
|
||||||
|
* if the locale is successfully coerced, and PEP 540 is not accepted, then
|
||||||
|
``PYTHONIOENCODING`` (if not otherwise set) will be set to
|
||||||
|
``utf-8:surrogateescape``.
|
||||||
|
* if the locale is successfully coerced, and PEP 540 *is* accepted, then
|
||||||
|
``PYTHONUTF8`` (if not otherwise set) will be set to ``1``
|
||||||
* if the subsequent runtime initialization process detects that the legacy
|
* if the subsequent runtime initialization process detects that the legacy
|
||||||
``C`` locale remains active (e.g. none of ``C.UTF-8``, ``C.utf8`` or ``UTF-8``
|
``C`` locale remains active (e.g. none of ``C.UTF-8``, ``C.utf8`` or ``UTF-8``
|
||||||
are available, locale coercion is disabled, or the runtime is embedded in an
|
are available, locale coercion is disabled, or the runtime is embedded in an
|
||||||
application other than the main CPython binary), and the ``PYTHONUTF8``
|
application other than the main CPython binary), and the ``PYTHONUTF8``
|
||||||
feature defined in PEP 540 is also disabled, it will emit a warning on
|
feature defined in PEP 540 is also disabled (or not implemented), it will
|
||||||
stderr that use of the legacy ``C`` locale's default ASCII text encoding
|
emit a warning on stderr that use of the legacy ``C`` locale's default ASCII
|
||||||
may cause various Unicode compatibility issues
|
text encoding may cause various Unicode compatibility issues
|
||||||
|
|
||||||
With this change, any \*nix platform that does *not* offer at least one of the
|
With this change, any \*nix platform that does *not* offer at least one of the
|
||||||
``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
|
``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
|
||||||
configuration would only be considered a fully supported platform for CPython
|
configuration would only be considered a fully supported platform for CPython
|
||||||
3.7+ deployments when either the new ``PYTHONUTF8`` defined in PEP 540 is used,
|
3.7+ deployments when either the new ``PYTHONUTF8`` mode defined in PEP 540 is
|
||||||
or else a suitable locale other than the default ``C`` locale is configured
|
used, or else a suitable locale other than the default ``C`` locale is
|
||||||
explicitly (e.g. ``zh_CN.gb18030``).
|
configured explicitly (e.g. `en_AU.UTF-8`, ``zh_CN.gb18030``).
|
||||||
|
|
||||||
Redistributors (such as Linux distributions) with a narrower target audience
|
Redistributors (such as Linux distributions) with a narrower target audience
|
||||||
than the upstream CPython development team may also choose to opt in to this
|
than the upstream CPython development team may also choose to opt in to this
|
||||||
behaviour for the Python 3.6.x series by applying the necessary changes as a
|
locale coercion behaviour for the Python 3.6.x series by applying the necessary
|
||||||
downstream patch when first introducing Python 3.6.0.
|
changes as a downstream patch when first introducing Python 3.6.0.
|
||||||
|
|
||||||
|
|
||||||
Background
|
Background
|
||||||
|
@@ -120,7 +132,7 @@ still fail in the following cases:
|
||||||
|
|
||||||
* SSH environment forwarding means that SSH clients may sometimes forward
|
* SSH environment forwarding means that SSH clients may sometimes forward
|
||||||
client locale settings to servers that don't have that locale installed. This
|
client locale settings to servers that don't have that locale installed. This
|
||||||
leads to CPython running in the default ASCII-based C locale.
|
leads to CPython running in the default ASCII-based C locale
|
||||||
* some process environments (such as Linux containers) may not have any
|
* some process environments (such as Linux containers) may not have any
|
||||||
explicit locale configured at all. As with unknown locales, this leads to
|
explicit locale configured at all. As with unknown locales, this leads to
|
||||||
CPython running in the default ASCII-based C locale
|
CPython running in the default ASCII-based C locale
|
||||||
|
@@ -156,7 +168,7 @@ Relationship with other PEPs
|
||||||
============================
|
============================
|
||||||
|
|
||||||
This PEP shares a common problem statement with PEP 540 (improving Python 3's
|
This PEP shares a common problem statement with PEP 540 (improving Python 3's
|
||||||
behaviour in the default C locale), but diverged markedly in the proposed
|
behaviour in the default C locale), but diverges markedly in the proposed
|
||||||
solution:
|
solution:
|
||||||
|
|
||||||
* PEP 540 proposes to entirely decouple CPython's default text encoding from
|
* PEP 540 proposes to entirely decouple CPython's default text encoding from
|
||||||
|
@@ -174,7 +186,7 @@ solution:
|
||||||
traditional strong support for integration with other components written
|
traditional strong support for integration with other components written
|
||||||
in C and C++, while actively helping to push forward the adoption and
|
in C and C++, while actively helping to push forward the adoption and
|
||||||
standardisation of the C.UTF-8 locale as a Unicode-aware replacement for
|
standardisation of the C.UTF-8 locale as a Unicode-aware replacement for
|
||||||
the legacy C locale in the wider Linux ecosystem
|
the legacy C locale in the wider C/C++ ecosystem
|
||||||
|
|
||||||
After reviewing both PEPs, it became clear that they didn't actually conflict
|
After reviewing both PEPs, it became clear that they didn't actually conflict
|
||||||
at a technical level, and the proposal in PEP 540 offered a superior option in
|
at a technical level, and the proposal in PEP 540 offered a superior option in
|
||||||
|
@@ -183,14 +195,18 @@ reference behaviour for platforms where the notion of a "locale encoding"
|
||||||
doesn't make sense (for example, embedded systems running MicroPython rather
|
doesn't make sense (for example, embedded systems running MicroPython rather
|
||||||
than the CPython reference interpreter).
|
than the CPython reference interpreter).
|
||||||
|
|
||||||
As a result, this PEP was amended to specify PEP 540 as a pre-requisite, with
|
Meanwhile, this PEP offered improved compatibility with other C/C++ components,
|
||||||
the aim being to coerce other C/C++ components into behaving consistently with
|
and an approach more amenable to being backported to Python 3.6 by downstream
|
||||||
CPython's assumption of UTF-8 as the system encoding, rather than CPython itself
|
redistributors.
|
||||||
relying on that setting change.
|
|
||||||
|
|
||||||
As a result of that change, the ``LC_CTYPE=en_US.UTF-8`` legacy fallback was
|
As a result, this PEP was amended to refer to PEP 540 as a complementary
|
||||||
removed from the list of UTF-8 locales tried as a coercion target, with CPython
|
solution that offered improved behaviour both when locale coercion triggered,
|
||||||
instead relying solely on the C locale text encoding bypass in such cases.
|
as well as when none of the standard UTF-8 based locales were available.
|
||||||
|
|
||||||
|
The availability of PEP 540 also meant that the ``LC_CTYPE=en_US.UTF-8`` legacy
|
||||||
|
fallback was removed from the list of UTF-8 locales tried as a coercion target,
|
||||||
|
with CPython instead relying solely on the proposed PYTHONUTF8 mode in such
|
||||||
|
cases.
|
||||||
|
|
||||||
|
|
||||||
Motivation
|
Motivation
|
||||||
|
@@ -203,9 +219,8 @@ application development. Technologies like Gnome Flatpak [7_] and
|
||||||
Ubunty Snappy [8_] further aim to bring these same techniques to Linux GUI
|
Ubunty Snappy [8_] further aim to bring these same techniques to Linux GUI
|
||||||
application development.
|
application development.
|
||||||
|
|
||||||
When using Python 3 for application development in
|
When using Python 3 for application development in these contexts, it isn't
|
||||||
these contexts, it isn't uncommon to see text encoding related errors akin to
|
uncommon to see text encoding related errors akin to the following::
|
||||||
the following::
|
|
||||||
|
|
||||||
$ docker run --rm fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
|
$ docker run --rm fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
|
||||||
Unable to decode the command from the command line:
|
Unable to decode the command from the command line:
|
||||||
|
@@ -304,6 +319,7 @@ proposed solution:
|
||||||
release announcements. However, to minimize the chance of introducing new
|
release announcements. However, to minimize the chance of introducing new
|
||||||
problems for end users, we'll do this *without* using the warnings system, so
|
problems for end users, we'll do this *without* using the warnings system, so
|
||||||
even running with ``-Werror`` won't turn it into a runtime exception
|
even running with ``-Werror`` won't turn it into a runtime exception
|
||||||
|
* any changes made will use *existing* configuration options
|
||||||
|
|
||||||
To minimize the negative impact on systems currently correctly configured to
|
To minimize the negative impact on systems currently correctly configured to
|
||||||
use GB-18030 or another partially ASCII compatible universal encoding leads to
|
use GB-18030 or another partially ASCII compatible universal encoding leads to
|
||||||
|
@@ -434,7 +450,8 @@ be issued::
|
||||||
|
|
||||||
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
|
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
|
||||||
encoding), which may cause Unicode compatibility problems. Using C.UTF-8
|
encoding), which may cause Unicode compatibility problems. Using C.UTF-8
|
||||||
(if available) as an alternative Unicode-compatible locale is recommended.
|
C.utf8, or UTF-8 (if available) as alternative Unicode-compatible
|
||||||
|
locales is recommended.
|
||||||
|
|
||||||
In this case, no actual change will be made to the locale settings.
|
In this case, no actual change will be made to the locale settings.
|
||||||
|
|
||||||
|
@@ -754,14 +771,15 @@ runtimes even when running a version with this change applied.
|
||||||
Implementation
|
Implementation
|
||||||
==============
|
==============
|
||||||
|
|
||||||
A draft implementation of the change (including test cases) has been
|
A draft implementation of the change (including test cases and documentation)
|
||||||
posted to issue 28180 [1_], which is an end user request that
|
is linked from issue 28180 [1_], which is an end user request that
|
||||||
``sys.getfilesystemencoding()`` default to ``utf-8`` rather than ``ascii``.
|
``sys.getfilesystemencoding()`` default to ``utf-8`` rather than ``ascii``.
|
||||||
|
|
||||||
NOTE: The currently posted draft implementation is for a previous iteration
|
This patch is now being maintained as the ``pep538-coerce-c-locale`` feature
|
||||||
of the PEP prior to the incorporation of the feedback noted in [11_]. It was
|
branch [18_] in Nick Coghlan's fork of the CPython repository on GitHub.
|
||||||
broadly the same in concept (i.e. coercing the legacy C locale to one based on
|
|
||||||
UTF-8), but differs in several details.
|
NOTE: As discussed in [1_], the currently posted draft implementation has some
|
||||||
|
known issues on Android.
|
||||||
|
|
||||||
|
|
||||||
Backporting to earlier Python 3 releases
|
Backporting to earlier Python 3 releases
|
||||||
|
@@ -789,6 +807,10 @@ backport it to even earlier Python 3.x releases based on the needs and
|
||||||
interests of their particular user base, this wouldn't be encouraged as a
|
interests of their particular user base, this wouldn't be encouraged as a
|
||||||
general practice.
|
general practice.
|
||||||
|
|
||||||
|
However, configuring Python 3 *environments* (such as base container
|
||||||
|
images) to use these configuration settings by default is both allowed
|
||||||
|
and recommended.
|
||||||
|
|
||||||
|
|
||||||
Acknowledgements
|
Acknowledgements
|
||||||
================
|
================
|
||||||
|
@@ -882,6 +904,9 @@ References
|
||||||
.. [17] UTF-8 locale discussion on "locale.getdefaultlocale() fails on Mac OS X with default language set to English"
|
.. [17] UTF-8 locale discussion on "locale.getdefaultlocale() fails on Mac OS X with default language set to English"
|
||||||
(http://bugs.python.org/issue18378#msg215215)
|
(http://bugs.python.org/issue18378#msg215215)
|
||||||
|
|
||||||
|
.. [18] GitHub branch diff for ``ncoghlan:pep538-coerce-c-locale``
|
||||||
|
(https://github.com/python/cpython/compare/master...ncoghlan:pep538-coerce-c-locale)
|
||||||
|
|
||||||
Copyright
|
Copyright
|
||||||
=========
|
=========
|
||||||
|
|
||||||
|
|
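The coercion rules added to the abstract in this commit can be read as a small decision procedure over the process environment. The following is only an illustrative sketch of that reading, not the reference implementation (which lives in CPython's C startup code). The function names, the ``is_available`` callback, the ``pep540_accepted`` flag, and the choice of ``LC_CTYPE`` as the variable that gets rewritten are all assumptions made for the example.

```python
# Hypothetical model of the rules in the revised abstract; none of these
# names come from the PEP or from CPython itself.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def effective_ctype(env):
    """Return the locale name that would apply to LC_CTYPE.

    Simplification: the first non-empty value of LC_ALL, LC_CTYPE and LANG
    wins, and an unconfigured environment counts as the legacy C locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def coerce_c_locale(env, is_available, pep540_accepted=False):
    """Return a copy of ``env`` with the described coercion applied.

    ``is_available`` is a caller-supplied callable that reports whether a
    given locale name can be set on the current system.
    """
    result = dict(env)
    if result.get("PYTHONCOERCECLOCALE") == "0":
        return result  # coercion explicitly disabled
    if effective_ctype(result) != "C":
        return result  # only the legacy C locale is coerced
    # Pick the first available target, in the order the PEP lists them.
    target = next((t for t in COERCION_TARGETS if is_available(t)), None)
    if target is None:
        return result  # no suitable locale; the PEP's warning case applies
    # Assumption: coercion is expressed by setting LC_CTYPE. A real
    # implementation would also have to handle LC_ALL overriding it.
    result["LC_CTYPE"] = target
    if pep540_accepted:
        result.setdefault("PYTHONUTF8", "1")
    else:
        result.setdefault("PYTHONIOENCODING", "utf-8:surrogateescape")
    return result


if __name__ == "__main__":
    # Pretend only the "C.utf8" spelling exists on this system.
    print(coerce_c_locale({}, lambda name: name == "C.utf8"))
```

Using ``setdefault`` mirrors the "if not otherwise set" wording in the new bullets: an explicit ``PYTHONIOENCODING`` or ``PYTHONUTF8`` supplied by the user is left alone even when the locale itself is coerced.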