PEP 538: Only set LC_CTYPE, never LANG
It looks like setting LANG may have undesirable side effects in some cases, and all the issues the PEP aims to handle are resolved by setting LC_CTYPE. The proposal and implementation have thus been updated to only set LC_CTYPE, even when the target coercion locale is a full locale.
parent db50b27755
commit 12cecb0548
pep-0538.txt (108 lines changed)
@ -51,7 +51,6 @@ changed to be roughly equivalent to the following existing configuration
 settings (supported since Python 3.1)::
 
 LC_CTYPE=C.UTF-8
-LANG=C.UTF-8
 PYTHONIOENCODING=utf-8:surrogateescape
 
 The exact target locale for coercion will be chosen from a predefined list at
@ -153,7 +152,7 @@ The simplest way to deal with this problem for currently released versions of
 CPython is to explicitly set a more sensible locale when launching the
 application. For example::
 
-LANG=C.UTF-8 python3 ...
+LC_CTYPE=C.UTF-8 python3 ...
 
 The ``C.UTF-8`` locale is a full locale definition that uses ``UTF-8`` for the
 ``LC_CTYPE`` category, and the same settings as the ``C`` locale for all other
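The revised workaround can also be applied from a Python parent process. This is a minimal sketch, not part of the PEP: the helper name ``env_with_ctype`` is hypothetical, and ``ja_JP.UTF-8`` is only an illustrative ``LANG`` value. It overrides ``LC_CTYPE`` for a child interpreter while passing ``LANG`` and every other variable through unchanged.

```python
import os
import subprocess
import sys


def env_with_ctype(base, ctype="C.UTF-8"):
    """Return a copy of ``base`` with only LC_CTYPE overridden.

    LANG, LC_ALL and the other LC_* variables are left untouched, mirroring
    ``LC_CTYPE=C.UTF-8 python3 ...`` rather than ``LANG=C.UTF-8 python3 ...``.
    """
    env = dict(base)  # never mutate the caller's mapping
    env["LC_CTYPE"] = ctype
    return env


if __name__ == "__main__":
    # Hypothetical starting point: date/number conventions come from LANG.
    parent_env = dict(os.environ, LANG="ja_JP.UTF-8")
    child_env = env_with_ctype(parent_env)
    # Ask the child interpreter which LANG it actually inherited.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LANG', ''))"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
```

Run as a script, it should print ``ja_JP.UTF-8``: the child sees the corrected ``LC_CTYPE`` but the caller's ``LANG`` is preserved.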
@ -276,19 +275,19 @@ The simplest way to get Python 3 (regardless of the exact version) to behave
 sensibly in Fedora and Debian based containers is to run it in the ``C.UTF-8``
 locale that both distros provide::
 
-$ docker run --rm -e LANG=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
+$ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
 ℙƴ☂ℌøἤ
-$ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
+$ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
 ℙƴ☂ℌøἤ
 
-$ docker run --rm -e LANG=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
-LANG=C.UTF-8
-LC_CTYPE="C.UTF-8"
+$ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=
+LC_CTYPE=C.UTF-8
 LC_ALL=
-$ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
-LANG=C.UTF-8
+$ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=
 LANGUAGE=
-LC_CTYPE="C.UTF-8"
+LC_CTYPE=C.UTF-8
 LC_ALL=
 
 The Alpine Linux based Python images provided by Docker, Inc. already use the
@ -358,8 +357,9 @@ use an explicit locale category like ``LC_TIME``, ``LC_MONETARY`` or
 ``LC_NUMERIC`` while otherwise running in the legacy C locale gives the
 following design principles:
 
-* don't make any environmental changes that would override explicit settings for
-locale categories other than ``LC_CTYPE`` (most notably: don't set ``LC_ALL``)
+* don't make any environmental changes that would alter any existing settings
+for locale categories other than ``LC_CTYPE`` (most notably: don't set
+``LC_ALL`` or ``LANG``)
 
 Finally, maintaining compatibility with running arbitrary subprocesses in
 orchestration use cases leads to the following design principle:
@ -374,11 +374,12 @@ Specification
 
 To better handle the cases where CPython would otherwise end up attempting
 to operate in the ``C`` locale, this PEP proposes that CPython automatically
-attempt to coerce the legacy ``C`` locale to a UTF-8 based locale when it is
-run as a standalone command line application.
+attempt to coerce the legacy ``C`` locale to a UTF-8 based locale for the
+``LC_CTYPE`` category when it is run as a standalone command line application.
 
 It further proposes to emit a warning on stderr if the legacy ``C`` locale
-is in effect at the point where the language runtime itself is initialized,
+is in effect for the ``LC_CTYPE`` category at the point where the language
+runtime itself is initialized,
 and the explicit environmental flag to disable locale coercion is not set, in
 order to warn system and application integrators that they're running CPython
 in an unsupported configuration.
@ -423,17 +424,13 @@ Three such locales will be tried:
 * ``C.UTF-8`` (available at least in Debian, Ubuntu, Alpine, and Fedora 25+, and
 expected to be available by default in a future version of glibc)
 * ``C.utf8`` (available at least in HP-UX)
-* ``UTF-8`` (available in at least some \*BSD variants)
+* ``UTF-8`` (available in at least some \*BSD variants, including Mac OS X)
 
-For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by setting
-both the ``LC_CTYPE`` and ``LANG`` environment variables to the candidate
-locale name, such that future calls to ``setlocale()`` will see them, as will
-other components looking for those settings (such as GUI development
-frameworks).
-
-For the platforms where it is defined, ``UTF-8`` is a partial locale that only
-defines the ``LC_CTYPE`` category. Accordingly, only the ``LC_CTYPE``
-environment variable would be set when using this fallback option.
+The coercion will be implemented by setting the ``LC_CTYPE`` environment
+variable to the candidate locale name, such that future calls to
+``setlocale()`` will see it, as will other components looking for those
+settings (such as GUI development frameworks and Python's own ``locale``
+module).
 
 To allow for better cross-platform binary portability and to adjust
 automatically to future changes in locale availability, these checks will be
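The revised coercion rule can be sketched in Python for illustration. This is only a sketch under stated assumptions: the function name ``coerce_c_locale`` and its ``env`` parameter are hypothetical, and treating ``POSIX`` like ``C`` and skipping coercion whenever ``LC_ALL`` is set are assumptions of the sketch. The candidate list and the ``PYTHONCOERCECLOCALE=0`` opt-out come from the PEP text in this diff. Note that only ``LC_CTYPE`` is ever written.

```python
import locale

# Candidate target locales, in the order the PEP lists them.
CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def coerce_c_locale(env, candidates=CANDIDATE_LOCALES):
    """Sketch: coerce a legacy C locale by setting LC_CTYPE only.

    ``env`` is a mutable mapping such as ``os.environ``. Returns the locale
    name written to ``env["LC_CTYPE"]``, or None if nothing was changed.
    LANG and LC_ALL are never modified.
    """
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return None  # explicit opt-out
    if env.get("LC_ALL"):
        # Assumption: a non-empty LC_ALL overrides LC_CTYPE, so setting
        # LC_CTYPE could not take effect; leave the environment alone.
        return None
    # Effective LC_CTYPE by POSIX precedence; empty values count as unset.
    current = env.get("LC_CTYPE") or env.get("LANG") or "C"
    if current not in ("C", "POSIX"):
        return None
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        for name in candidates:
            try:
                # Probe whether the C library accepts this locale name.
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # candidate not available on this platform
            env["LC_CTYPE"] = name
            return name
        return None
    finally:
        # Restore this process's LC_CTYPE; the sketch only edits ``env``.
        locale.setlocale(locale.LC_CTYPE, saved)
```

Passing a plain dict instead of ``os.environ`` makes it easy to check that ``LANG`` and any ``LC_TIME`` style settings survive untouched.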
@ -444,15 +441,9 @@ When this locale coercion is activated, the following warning will be
 printed on stderr, with the warning containing whichever locale was
 successfully configured::
 
-Python detected LC_CTYPE=C: LC_CTYPE & LANG coerced to C.UTF-8 (set another
+Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another
 locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
-
-When falling back to the ``UTF-8`` locale, the message would be slightly
-different::
-
-Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale
-or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
 
 As long as the current platform provides at least one of the candidate UTF-8
 based environments, this locale coercion will mean that the standard
 Python binary *and* locale-aware extensions should once again "just work"
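Because ``LANG`` is no longer modified, a single message template now covers every candidate locale, which is why the separate ``UTF-8`` fallback wording could be deleted. A minimal Python sketch of that template follows; the ``coercion_warning`` name is hypothetical, and the wording is copied from the warning text in the PEP.

```python
import sys


def coercion_warning(target):
    """Build the stderr message for a successful LC_CTYPE coercion to ``target``."""
    return (
        f"Python detected LC_CTYPE=C: LC_CTYPE coerced to {target} (set another "
        "locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour)."
    )


if __name__ == "__main__":
    # The same template serves C.UTF-8, C.utf8 and the UTF-8 fallback.
    for name in ("C.UTF-8", "C.utf8", "UTF-8"):
        print(coercion_warning(name), file=sys.stderr)
```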
@ -489,9 +480,9 @@ Legacy C locale warning during runtime initialization
 
 By the time that ``Py_Initialize`` is called, arbitrary locale-dependent
 operations may have taken place in the current process. This means that
-by the time it is called, it is *too late* to switch to a different locale -
-doing so would introduce inconsistencies in decoded text, even in the context
-of the standalone Python interpreter binary.
+by the time it is called, it is *too late* to reliably switch to a different
+locale - doing so would introduce inconsistencies in decoded text, even in the
+context of the standalone Python interpreter binary.
 
 Accordingly, when ``Py_Initialize`` is called and CPython detects that the
 configured locale is still the default ``C`` locale and
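A small illustration of the "inconsistencies in decoded text" mentioned in the hunk above: the same bytes decode to different strings depending on which locale was active when they were decoded. The codecs stand in for the two locales here; modelling the legacy ``C`` locale as ASCII with ``surrogateescape`` is an assumption of the sketch, approximating how Python 3 decodes OS data in that locale.

```python
# The PEP's sample text "ℙƴ☂ℌøἤ", written with escapes so the byte count is exact.
raw = "\u2119\u01b4\u2602\u210c\u00f8\u1f24".encode("utf-8")

# Decoded while the legacy C locale was active (assumption: modelled as
# ASCII with surrogateescape, one lone surrogate per non-ASCII byte).
before_switch = raw.decode("ascii", "surrogateescape")

# Decoded after a hypothetical switch to a UTF-8 based locale.
after_switch = raw.decode("utf-8")

print(before_switch == after_switch)                     # False
print(len(raw), len(before_switch), len(after_switch))   # 16 16 6
print(ascii(before_switch[:3]))                          # '\udce2\udc84\udc99'
```

Any string decoded before such a switch would compare unequal to the same data decoded afterwards, which is why the PEP warns rather than switching locales inside ``Py_Initialize``.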
@ -860,8 +851,8 @@ whether or not the current locale configuration is likely to cause Unicode
 handling problems.
 
 
-Setting both LC_CTYPE & LANG for UTF-8 locale coercion
-------------------------------------------------------
+Explicitly setting LC_CTYPE for UTF-8 locale coercion
+-----------------------------------------------------
 
 Python is often used as a glue language, integrating other C/C++ ABI compatible
 components in the current process, and components written in arbitrary
@ -872,19 +863,46 @@ problem has arisen from a setting like ``LC_CTYPE=UTF-8`` being provided on a
 system where no ``UTF-8`` locale is defined (e.g. when a Mac OS X ssh client is
 configured to forward locale settings, and the user logs into a Linux server).
 
-Setting ``LANG`` to ``C.UTF-8`` ensures that even components that only check
-the ``LANG`` fallback for their locale settings will still use ``C.UTF-8``.
+This should be sufficient to ensure that when the locale coercion is activated,
+the switch to the UTF-8 based locale will be applied consistently across the
+current process and any subprocesses that inherit the current environment.
 
-Together, these should ensure that when the locale coercion is activated, the
-switch to the UTF-8 based locale will be applied consistently across the current
-process and any subprocesses that inherit the current environment.
 
+Avoiding setting LANG for UTF-8 locale coercion
+-----------------------------------------------
+
+Earlier versions of this PEP proposed setting the ``LANG`` category-independent
+default locale, in addition to setting ``LC_CTYPE``.
+
+This was later removed on the grounds that setting only ``LC_CTYPE`` is
+sufficient to handle all of the problematic scenarios that the PEP aimed
+to resolve, while setting ``LANG`` as well would break cases where ``LANG``
+was set correctly, and the locale problems were solely due to an incorrect
+``LC_CTYPE`` setting ([22_]).
+
+For example, consider a Python application that called the Linux ``date``
+utility in a subprocess rather than doing its own date formatting::
+
+$ LANG=ja_JP.UTF-8 LC_CTYPE=C date
+2017年 5月 23日 火曜日 17:31:03 JST
+
+$ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing only LC_CTYPE
+2017年 5月 23日 火曜日 17:32:58 JST
+
+$ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date # Coercing both of LC_CTYPE and LANG
+Tue May 23 17:31:10 JST 2017
+
+With only ``LC_CTYPE`` updated in the Python process, the subprocess would
+continue to behave as expected. However, if ``LANG`` was updated as well,
+that would effectively override the ``LC_TIME`` setting and use the wrong
+date formatting conventions.
+
 
 Avoiding setting LC_ALL for UTF-8 locale coercion
 -------------------------------------------------
 
 Earlier versions of this PEP proposed setting the ``LC_ALL`` locale override,
-rather than just setting ``LC_CTYPE`` and ``LANG``.
+in addition to setting ``LC_CTYPE``.
 
 This was changed after it was determined that just setting ``LC_CTYPE`` and
 ``LANG`` should be sufficient to handle all the scenarios the PEP aims to
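The ``date`` example turns on the POSIX lookup order for each locale category: a non-empty ``LC_ALL`` wins, otherwise the category's own ``LC_*`` variable, otherwise ``LANG``, with empty values treated as unset. A minimal Python sketch of that lookup follows. The ``effective_locale`` helper is hypothetical, and it models only the environment variables, not which locales are actually installed. It shows why rewriting ``LANG`` changes the ``LC_TIME`` a subprocess sees, while coercing ``LC_CTYPE`` alone does not.

```python
def effective_locale(env, category):
    """Return the locale name that ``category`` (e.g. "LC_TIME") resolves to.

    POSIX order: a non-empty LC_ALL wins, then the category's own variable,
    then LANG; with none of them set, fall back to the C locale.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # empty strings count as unset
            return value
    return "C"


if __name__ == "__main__":
    original = {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C"}
    ctype_only = dict(original, LC_CTYPE="C.UTF-8")     # what the PEP now does
    ctype_and_lang = dict(ctype_only, LANG="C.UTF-8")   # the earlier proposal
    for label, env in [("original", original),
                       ("LC_CTYPE only", ctype_only),
                       ("LC_CTYPE and LANG", ctype_and_lang)]:
        print(label,
              "LC_CTYPE=" + effective_locale(env, "LC_CTYPE"),
              "LC_TIME=" + effective_locale(env, "LC_TIME"))
```

``LC_TIME`` resolves to ``ja_JP.UTF-8`` in the first two cases and only flips to ``C.UTF-8`` when ``LANG`` is rewritten, matching the Japanese versus English ``date`` output above.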
@ -1198,6 +1216,10 @@ References
 .. [21] GNU readline misbehaviour on Mac OS X with ``LANG=C``
 (https://mail.python.org/pipermail/python-dev/2017-May/147897.html)
 
+.. [22] Potential problems when setting LANG in addition to setting LC_CTYPE
+(https://mail.python.org/pipermail/python-dev/2017-May/147968.html)
+
+
 
 Copyright
 =========