PEP 538: Update for latest python-dev discussion
* default standard stream error handler is always "surrogateescape" for the
  potential coercion target locales
* PEP 540 is now a purely optional follow-on PEP that improves the handling of
  cases where none of these locales are available, but doesn't require
  revisiting the changes made for this PEP
* the locale coercion and warning behaviours are now enabled by default for all
  \*nix platforms, even Mac OS X
* covered the Android-specific changes to the use of `setlocale`
* state explicitly that we're aware this makes the behaviour of standalone
  CPython and embedded CPython diverge, we just think the potential benefits
  are sufficient to accept that downside
* note the reference implementation has yet to be updated with these changes
parent dc175c5902
commit 2fb53e7c1b
219
pep-0538.txt
|
@ -36,9 +36,9 @@ However, it comes at the cost of making CPython's encoding assumptions diverge
|
||||||
from those of other locale-aware components in the same process, as well as
|
from those of other locale-aware components in the same process, as well as
|
||||||
those of components running in subprocesses that share the same environment.
|
those of components running in subprocesses that share the same environment.
|
||||||
|
|
||||||
It also requires changes to the internals of how CPython itself works, rather
|
It also requires non-trivial changes to the internals of how CPython itself
|
||||||
than using existing configuration settings that are supported by Python
|
works, rather than relying primarily on existing configuration settings that
|
||||||
versions prior to Python 3.7.
|
are supported by Python versions prior to Python 3.7.
|
||||||
|
|
||||||
Accordingly, this PEP proposes that independently of the UTF-8 mode proposed
|
Accordingly, this PEP proposes that independently of the UTF-8 mode proposed
|
||||||
in PEP 540, the way the CPython implementation handles the default C locale be
|
in PEP 540, the way the CPython implementation handles the default C locale be
|
||||||
|
@ -48,27 +48,25 @@ changed such that:
|
||||||
the standalone CPython binary will automatically attempt to coerce the ``C``
|
the standalone CPython binary will automatically attempt to coerce the ``C``
|
||||||
locale to the first available locale out of ``C.UTF-8``, ``C.utf8``, or
|
locale to the first available locale out of ``C.UTF-8``, ``C.utf8``, or
|
||||||
``UTF-8``
|
``UTF-8``
|
||||||
* if the locale is successfully coerced, PEP 540 is not accepted, and the
|
* ``Py_Initialize`` will be updated to treat these potential coercion target
|
||||||
``PYTHONIOENCODING`` environment variable is not set, then
|
locales the same way it already treats the ``C`` locale: the default standard
|
||||||
``Py_SetStandardStreamEncoding`` will be called with ``"utf-8"`` and
|
stream error handler for these locales will become ``surrogateescape`` (this
|
||||||
``"surrogateescape"`` as arguments.
|
default can be overridden through ``PYTHONIOENCODING`` and
|
||||||
* if the locale is successfully coerced, and PEP 540 *is* accepted, then
|
``Py_SetStandardStreamEncoding`` as usual)
|
||||||
``PYTHONUTF8`` (if not otherwise set) will be set to ``1``
|
* if ``Py_Initialize`` detects that the legacy ``C`` locale remains active
|
||||||
* if the subsequent runtime initialization process detects that the legacy
|
(e.g. none of ``C.UTF-8``, ``C.utf8`` or ``UTF-8``
|
||||||
``C`` locale remains active (e.g. none of ``C.UTF-8``, ``C.utf8`` or ``UTF-8``
|
|
||||||
are available, or the runtime is embedded in an application other than the
|
are available, or the runtime is embedded in an application other than the
|
||||||
main CPython binary), locale coercion is not explicitly disabled, and the
|
main CPython binary), and locale coercion is not explicitly disabled, it will
|
||||||
``PYTHONUTF8`` feature defined in PEP 540 is also disabled (or not
|
emit a warning on stderr that use of the legacy ``C`` locale's default ASCII
|
||||||
implemented), it will emit a warning on stderr that use of the legacy
|
text encoding may cause various Unicode compatibility issues
|
||||||
``C`` locale's default ASCII text encoding may cause various Unicode
|
|
||||||
compatibility issues
|
|
||||||
|
|
||||||
With this change, any \*nix platform that does *not* offer at least one of the
|
With this change, any \*nix platform that does *not* offer at least one of the
|
||||||
``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
|
``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
|
||||||
configuration would only be considered a fully supported platform for CPython
|
configuration would only be considered a fully supported platform for CPython
|
||||||
3.7+ deployments when either the new ``PYTHONUTF8`` mode defined in PEP 540 is
|
3.7+ deployments when a suitable locale other than the default ``C`` locale is
|
||||||
used, or else a suitable locale other than the default ``C`` locale is
|
configured explicitly (e.g. ``en_AU.UTF-8``, ``zh_CN.gb18030``). If PEP 540 is
|
||||||
configured explicitly (e.g. ``en_AU.UTF-8``, ``zh_CN.gb18030``).
|
accepted in addition to this PEP, then such platforms would also be supported
|
||||||
|
when using the proposed ``PYTHONUTF8`` mode.
|
||||||
|
|
||||||
Redistributors (such as Linux distributions) with a narrower target audience
|
Redistributors (such as Linux distributions) with a narrower target audience
|
||||||
than the upstream CPython development team may also choose to opt in to this
|
than the upstream CPython development team may also choose to opt in to this
|
||||||
|
@ -140,6 +138,9 @@ still fail in the following cases:
|
||||||
* some process environments (such as Linux containers) may not have any
|
* some process environments (such as Linux containers) may not have any
|
||||||
explicit locale configured at all. As with unknown locales, this leads to
|
explicit locale configured at all. As with unknown locales, this leads to
|
||||||
CPython running in the default ASCII-based C locale
|
CPython running in the default ASCII-based C locale
|
||||||
|
* on Android, rather than configuring the locale based on environment variables,
|
||||||
|
the empty locale ``""`` is treated as specifically requesting the ``"C"``
|
||||||
|
locale
|
||||||
|
|
||||||
The simplest way to deal with this problem for currently released versions of
|
The simplest way to deal with this problem for currently released versions of
|
||||||
CPython is to explicitly set a more sensible locale when launching the
|
CPython is to explicitly set a more sensible locale when launching the
|
||||||
|
@ -204,13 +205,13 @@ components, and an approach more amenable to being backported to Python 3.6
|
||||||
by downstream redistributors.
|
by downstream redistributors.
|
||||||
|
|
||||||
As a result, this PEP was amended to refer to PEP 540 as a complementary
|
As a result, this PEP was amended to refer to PEP 540 as a complementary
|
||||||
solution that offered improved behaviour both when locale coercion triggered,
|
solution that offered improved behaviour when none of the standard UTF-8 based
|
||||||
as well as when none of the standard UTF-8 based locales were available.
|
locales were available.
|
||||||
|
|
||||||
The availability of PEP 540 also meant that the ``LC_CTYPE=en_US.UTF-8`` legacy
|
The availability of PEP 540 also meant that the ``LC_CTYPE=en_US.UTF-8`` legacy
|
||||||
fallback was removed from the list of UTF-8 locales tried as a coercion target,
|
fallback was removed from the list of UTF-8 locales tried as a coercion target,
|
||||||
with CPython instead relying solely on the proposed PYTHONUTF8 mode in such
|
with the expectation being that CPython will instead rely solely on the
|
||||||
cases.
|
proposed PYTHONUTF8 mode in such cases.
|
||||||
|
|
||||||
|
|
||||||
Motivation
|
Motivation
|
||||||
|
@ -323,7 +324,11 @@ proposed solution:
|
||||||
release announcements. However, to minimize the chance of introducing new
|
release announcements. However, to minimize the chance of introducing new
|
||||||
problems for end users, we'll do this *without* using the warnings system, so
|
problems for end users, we'll do this *without* using the warnings system, so
|
||||||
even running with ``-Werror`` won't turn it into a runtime exception
|
even running with ``-Werror`` won't turn it into a runtime exception
|
||||||
* any changes made will use *existing* configuration options
|
* as far as is feasible, any changes made will use *existing* configuration
|
||||||
|
options
|
||||||
|
* Python's runtime behaviour in potential coercion target locales should be
|
||||||
|
identical regardless of whether the locale was set explicitly in the
|
||||||
|
environment or implicitly as a locale coercion target
|
||||||
|
|
||||||
Minimizing the negative impact on systems currently correctly configured to
|
Minimizing the negative impact on systems currently correctly configured to
|
||||||
use GB-18030 or another partially ASCII compatible universal encoding leads to
|
use GB-18030 or another partially ASCII compatible universal encoding leads to
|
||||||
|
@ -347,11 +352,14 @@ run as a standalone command line application.
|
||||||
|
|
||||||
It further proposes to emit a warning on stderr if the legacy ``C`` locale
|
It further proposes to emit a warning on stderr if the legacy ``C`` locale
|
||||||
is in effect at the point where the language runtime itself is initialized,
|
is in effect at the point where the language runtime itself is initialized,
|
||||||
the explicit environmental flag to disable locale coercion is not set, and
|
and the explicit environmental flag to disable locale coercion is not set, in
|
||||||
the PEP 540 UTF-8 encoding override is also disabled (or not implemented), in
|
|
||||||
order to warn system and application integrators that they're running CPython
|
order to warn system and application integrators that they're running CPython
|
||||||
in an unsupported configuration.
|
in an unsupported configuration.
|
||||||
|
|
||||||
|
In addition to these general changes, some additional Android-specific changes
|
||||||
|
are proposed to handle the differences in the behaviour of ``setlocale`` on that
|
||||||
|
platform.
|
||||||
|
|
||||||
|
|
||||||
Legacy C locale coercion in the standalone Python interpreter binary
|
Legacy C locale coercion in the standalone Python interpreter binary
|
||||||
--------------------------------------------------------------------
|
--------------------------------------------------------------------
|
||||||
|
@ -401,14 +409,8 @@ defines the ``LC_CTYPE`` category. Accordingly, only the ``LC_CTYPE``
|
||||||
environment variable would be set when using this fallback option.
|
environment variable would be set when using this fallback option.
|
||||||
|
|
||||||
To adjust automatically to future changes in locale availability, these checks
|
To adjust automatically to future changes in locale availability, these checks
|
||||||
will be implemented at runtime on all platforms other than Mac OS X and Windows,
|
will be implemented at runtime on all platforms other than Windows, rather
|
||||||
rather than attempting to determine which locales to try at compile time.
|
than attempting to determine which locales to try at compile time.
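
The runtime probing described here can be sketched in Python. This is only an illustration of the idea, not the C code the PEP proposes: the helper name ``first_available_target`` is hypothetical, and it probes ``LC_CTYPE`` via the standard ``locale`` module rather than changing any environment variables.

```python
import locale

# Candidate coercion targets, in the order the PEP lists them.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def first_available_target(candidates=COERCION_TARGETS):
    """Return the first candidate that setlocale() accepts, or None.

    Hypothetical helper: it only probes LC_CTYPE and restores the
    original setting afterwards, so it leaves the process unchanged.
    """
    original = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not available on this system; try the next one
            return name
        return None
    finally:
        # Always put the original LC_CTYPE setting back.
        locale.setlocale(locale.LC_CTYPE, original)
```

Because the check runs at call time, the result reflects whatever locales are installed on the running system, which is the property the paragraph above is after.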
|
||||||
|
|
||||||
If the locale settings are changed successfully, and the ``PYTHONIOENCODING``
|
|
||||||
environment variable is currently unset, then ``Py_SetStandardStreamEncoding``
|
|
||||||
will be called to force the standard IO streams to ``utf-8`` as the nominal
|
|
||||||
text encoding and ``surrogateescape`` as the error handler (``stderr`` will
|
|
||||||
continue to use ``backslashreplace`` as its error handler as usual).
|
|
||||||
|
|
||||||
When this locale coercion is activated, the following warning will be
|
When this locale coercion is activated, the following warning will be
|
||||||
printed on stderr, with the warning containing whichever locale was
|
printed on stderr, with the warning containing whichever locale was
|
||||||
|
@ -423,7 +425,7 @@ different::
|
||||||
Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale
|
Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale
|
||||||
or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
|
or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
|
||||||
|
|
||||||
In combination with PEP 540, this locale coercion will mean that the standard
|
This locale coercion will mean that the standard
|
||||||
Python binary *and* locale-aware extensions should once again "just work"
|
Python binary *and* locale-aware extensions should once again "just work"
|
||||||
in the three main failure cases we're aware of (missing locale
|
in the three main failure cases we're aware of (missing locale
|
||||||
settings, SSH forwarding of unknown locales, and developers explicitly
|
settings, SSH forwarding of unknown locales, and developers explicitly
|
||||||
|
@ -453,8 +455,8 @@ or not to suppress the locale compatibility warning will be similarly
|
||||||
independent of these settings.
|
independent of these settings.
|
||||||
|
|
||||||
|
|
||||||
Changes to the runtime initialization process
|
Legacy C locale warning during runtime initialization
|
||||||
---------------------------------------------
|
-----------------------------------------------------
|
||||||
|
|
||||||
By the time that ``Py_Initialize`` is called, arbitrary locale-dependent
|
By the time that ``Py_Initialize`` is called, arbitrary locale-dependent
|
||||||
operations may have taken place in the current process. This means that
|
operations may have taken place in the current process. This means that
|
||||||
|
@ -463,9 +465,8 @@ doing so would introduce inconsistencies in decoded text, even in the context
|
||||||
of the standalone Python interpreter binary.
|
of the standalone Python interpreter binary.
|
||||||
|
|
||||||
Accordingly, when ``Py_Initialize`` is called and CPython detects that the
|
Accordingly, when ``Py_Initialize`` is called and CPython detects that the
|
||||||
configured locale is still the default ``C`` locale, ``PYTHONCOERCECLOCALE=0``
|
configured locale is still the default ``C`` locale and
|
||||||
is set, *and* the ``PYTHONUTF8`` feature from PEP 540 is disabled (or not
|
``PYTHONCOERCECLOCALE=0`` is not set, the following warning will be issued::
|
||||||
implemented), the following warning will be issued::
|
|
||||||
|
|
||||||
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
|
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
|
||||||
encoding), which may cause Unicode compatibility problems. Using C.UTF-8,
|
encoding), which may cause Unicode compatibility problems. Using C.UTF-8,
|
||||||
|
@ -499,10 +500,42 @@ The locale warning behaviour would be controlled by the flag
|
||||||
``--with[out]-c-locale-warning``, which would set the ``PY_WARN_ON_C_LOCALE``
|
``--with[out]-c-locale-warning``, which would set the ``PY_WARN_ON_C_LOCALE``
|
||||||
preprocessor definition.
|
preprocessor definition.
|
||||||
|
|
||||||
On platforms where they would have no effect (e.g. Mac OS X, iOS, Android,
|
On platforms which don't use the ``autotools`` based build system (i.e.
|
||||||
Windows) these preprocessor variables would always be undefined.
|
Windows) these preprocessor variables would always be undefined.
|
||||||
|
|
||||||
|
|
||||||
|
Changes to the default error handling on the standard streams
|
||||||
|
-------------------------------------------------------------
|
||||||
|
|
||||||
|
Since Python 3.5, CPython has defaulted to using ``surrogateescape`` on the
|
||||||
|
standard streams (``sys.stdin``, ``sys.stdout``, ``sys.stderr``) when it
|
||||||
|
detects that the current locale is ``C`` and no specific error handler has
|
||||||
|
been set using either the ``PYTHONIOENCODING`` environment variable or the
|
||||||
|
``Py_SetStandardStreamEncoding`` API. For other locales, the default error
|
||||||
|
handler for the standard streams is ``strict``.
|
||||||
|
|
||||||
|
In order to preserve this behaviour without introducing any behavioural
|
||||||
|
discrepancies between locale coercion and explicitly configuring a locale, the
|
||||||
|
coercion target locales (``C.UTF-8``, ``C.utf8``, and ``UTF-8``) will be added
|
||||||
|
to the list of locales that use ``surrogateescape`` as their default error
|
||||||
|
handler for the standard streams.
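
As a rough sketch of the selection rule described in this section, the following Python function models how the default error handler for ``sys.stdin`` and ``sys.stdout`` would be chosen. The function and constant names are hypothetical, and ``explicit_errors`` is an assumed stand-in for a handler requested through ``PYTHONIOENCODING`` or ``Py_SetStandardStreamEncoding``; the actual logic lives in CPython's C initialization code.

```python
# Locales whose standard streams default to "surrogateescape": the legacy
# C locale plus the three potential coercion targets.
SURROGATEESCAPE_LOCALES = frozenset({"C", "C.UTF-8", "C.utf8", "UTF-8"})


def default_stream_errors(lc_ctype, explicit_errors=None):
    """Pick the stdin/stdout error handler (illustrative model only).

    ``explicit_errors`` models a handler set via PYTHONIOENCODING or
    Py_SetStandardStreamEncoding; when given, it always wins.
    """
    if explicit_errors:
        return explicit_errors
    if lc_ctype in SURROGATEESCAPE_LOCALES:
        return "surrogateescape"
    return "strict"
```

Under this model an explicitly configured ``C.UTF-8`` and a coerced ``C.UTF-8`` are indistinguishable, since only the resulting locale name is consulted.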
|
||||||
|
|
||||||
|
|
||||||
|
Changes to locale settings on Android
|
||||||
|
-------------------------------------
|
||||||
|
|
||||||
|
Independently of the other changes in this PEP, CPython on Android systems
|
||||||
|
will be updated to call ``setlocale(LC_ALL, "C.UTF-8")`` where it currently
|
||||||
|
calls ``setlocale(LC_ALL, "")`` and ``setlocale(LC_CTYPE, "C.UTF-8")`` where
|
||||||
|
it currently calls ``setlocale(LC_CTYPE, "")``.
|
||||||
|
|
||||||
|
This Android-specific behaviour is being introduced due to the following
|
||||||
|
Android-specific details:
|
||||||
|
|
||||||
|
* on Android, passing ``""`` to ``setlocale`` is equivalent to passing ``"C"``
|
||||||
|
* the ``C.UTF-8`` locale is always available
|
||||||
|
|
||||||
|
|
||||||
Platform Support Changes
|
Platform Support Changes
|
||||||
========================
|
========================
|
||||||
|
|
||||||
|
@ -515,19 +548,6 @@ A new "Legacy C Locale" section will be added to PEP 11 that states:
|
||||||
cannot be reproduced in an appropriately configured non-ASCII locale will be
|
cannot be reproduced in an appropriately configured non-ASCII locale will be
|
||||||
closed as "won't fix".
|
closed as "won't fix".
|
||||||
|
|
||||||
If PEP 540 is also implemented, then this section would instead state:
|
|
||||||
|
|
||||||
* as of CPython 3.7, the legacy C locale is only supported when operating in
|
|
||||||
"UTF-8" mode. Any Unicode handling issues that occur only in that locale
|
|
||||||
and cannot be reproduced in an appropriately configured non-ASCII locale will
|
|
||||||
be closed as "won't fix"
|
|
||||||
* as of CPython 3.7, \*nix platforms are expected to provide at least one of
|
|
||||||
``C.UTF-8`` (full locale), ``C.utf8`` (full locale) or ``UTF-8`` (
|
|
||||||
``LC_CTYPE``-only locale) as an alternative to the legacy ``C`` locale.
|
|
||||||
Any Unicode related integration problems with other locale-aware components
|
|
||||||
that occur only in that locale and cannot be reproduced in an appropriately
|
|
||||||
configured non-ASCII locale will be closed as "won't fix".
|
|
||||||
|
|
||||||
|
|
||||||
Rationale
|
Rationale
|
||||||
=========
|
=========
|
||||||
|
@ -580,10 +600,8 @@ introduced in Python 3.5 ([15_]), as well as the automatic use of
|
||||||
``surrogateescape`` when operating in PEP 540's UTF-8 mode.
|
``surrogateescape`` when operating in PEP 540's UTF-8 mode.
|
||||||
|
|
||||||
Rather than introducing yet another configuration option to address that,
|
Rather than introducing yet another configuration option to address that,
|
||||||
this PEP proposes to use the existing ``Py_SetStandardStreamEncoding``
|
this PEP proposes to extend the "surrogateescape" default to also apply to
|
||||||
interface to ensure that the ``surrogateescape`` handler is enabled when
|
the three potential coercion target locales.
|
||||||
the interpreter is required to make assumptions regarding the expected
|
|
||||||
filesystem encoding.
|
|
||||||
|
|
||||||
The aim of this behaviour is to attempt to ensure that operating system
|
The aim of this behaviour is to attempt to ensure that operating system
|
||||||
provided text values are typically able to be transparently passed through a
|
provided text values are typically able to be transparently passed through a
|
||||||
|
@ -673,14 +691,14 @@ now displays both files as originally intended::
|
||||||
GB18030: ℙƴ☂ℌøἤ
|
GB18030: ℙƴ☂ℌøἤ
|
||||||
|
|
||||||
The rationale for retaining ``surrogateescape`` as the default IO error handler is
|
The rationale for retaining ``surrogateescape`` as the default IO error handler is
|
||||||
that it will preserve the following helpful behaviour in the C locale::
|
that it will preserve the following helpful behaviour in the ``C`` locale::
|
||||||
|
|
||||||
$ cat gb18030.txt \
|
$ cat gb18030.txt \
|
||||||
| LANG=C python3 -c "import sys; print(sys.stdin.read())" \
|
| LANG=C python3 -c "import sys; print(sys.stdin.read())" \
|
||||||
| iconv -f GB18030 -t UTF-8
|
| iconv -f GB18030 -t UTF-8
|
||||||
ℙƴ☂ℌøἤ
|
ℙƴ☂ℌøἤ
|
||||||
|
|
||||||
Rather than reverting to the exception seen when a UTF-8 based locale is
|
Rather than reverting to the exception currently seen when a UTF-8 based locale is
|
||||||
explicitly configured::
|
explicitly configured::
|
||||||
|
|
||||||
$ cat gb18030.txt \
|
$ cat gb18030.txt \
|
||||||
|
@ -692,29 +710,9 @@ explicitly configured::
|
||||||
(result, consumed) = self._buffer_decode(data, self.errors, final)
|
(result, consumed) = self._buffer_decode(data, self.errors, final)
|
||||||
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
|
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
|
||||||
|
|
||||||
Note: in order to also affect subprocesses running Python 3, earlier versions
|
As an added benefit, environments explicitly configured to use one of the
|
||||||
of this PEP proposed setting ``PYTHONIOENCODING`` to ``utf-8:surrogateescape``
|
coercion target locales will implicitly gain the encoding transparency behaviour
|
||||||
rather than calling ``Py_SetStandardStreamEncoding`` when the locale coercion
|
currently enabled by default in the ``C`` locale.
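
The encoding transparency mentioned here comes from the round-trip property of the ``surrogateescape`` error handler. The following sketch demonstrates it directly on bytes rather than through the standard streams; the helper names are hypothetical, and the sample string is the one used in the shell examples.

```python
SAMPLE = "ℙƴ☂ℌøἤ"
raw = SAMPLE.encode("gb18030")  # bytes as a GB18030 producer would emit them


def passes_through(data, encoding):
    """Decode then re-encode with surrogateescape; True if bytes survive."""
    text = data.decode(encoding, errors="surrogateescape")
    return text.encode(encoding, errors="surrogateescape") == data


def strict_decode_fails(data, encoding):
    """True if the default ``strict`` error handler rejects the data."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False
```

With ``surrogateescape``, undecodable bytes are smuggled through as lone surrogates and restored on output, so the data reaches the next tool in the pipeline unchanged, whereas ``strict`` raises as soon as it meets the first invalid byte.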
|
||||||
triggered. Unfortunately, this approach proved to have undesirable side
|
|
||||||
effects when Python 2 applications were invoked in subprocesses (as there is
|
|
||||||
no ``surrogateescape`` error handler available in Python 2).
|
|
||||||
|
|
||||||
Another design option would be to *always* default to ``surrogateescape`` on the
|
|
||||||
standard streams, and require the use of ``PYTHONIOENCODING=:strict`` to request
|
|
||||||
text encoding validation during stream processing. Adopting such an approach
|
|
||||||
would bring Python 3 more into line with typical C/C++ tools that pass along
|
|
||||||
the raw bytes without checking them for conformance to their nominal encoding,
|
|
||||||
and would hence also make the last example display the desired output::
|
|
||||||
|
|
||||||
$ cat gb18030.txt \
|
|
||||||
| PYTHONIOENCODING=:surrogateescape python3 -c "import sys; print(sys.stdin.read())" \
|
|
||||||
| iconv -f GB18030 -t UTF-8
|
|
||||||
ℙƴ☂ℌøἤ
|
|
||||||
|
|
||||||
However, such a change would have broader implications than the C locale
|
|
||||||
specific changes currently proposed, so it is considered out of scope for this
|
|
||||||
PEP. Instead, an improved solution is left to the combination of this PEP with
|
|
||||||
PEP 540, by automatically setting ``PYTHONUTF8=1`` when locale coercion occurs.
|
|
||||||
|
|
||||||
|
|
||||||
Dropping official support for ASCII based text handling in the legacy C locale
|
Dropping official support for ASCII based text handling in the legacy C locale
|
||||||
|
@ -724,8 +722,8 @@ We've been trying to get strict bytes/text separation to work reliably in the
|
||||||
legacy C locale for over a decade at this point. Not only haven't we been able
|
legacy C locale for over a decade at this point. Not only haven't we been able
|
||||||
to get it to work, neither has anyone else - the only viable alternatives
|
to get it to work, neither has anyone else - the only viable alternatives
|
||||||
identified have been to pass the bytes along verbatim without eagerly decoding
|
identified have been to pass the bytes along verbatim without eagerly decoding
|
||||||
them to text (C/C++, Python 2.x, Ruby, etc), or else to ignore the nominal
|
them to text (C/C++, Python 2.x, Ruby, etc), or else to largely ignore the
|
||||||
C/C++ locale encoding entirely and assume the use of either UTF-8 (PEP 540,
|
nominal C/C++ locale encoding and assume the use of either UTF-8 (PEP 540,
|
||||||
Rust, Go, Node.js, etc) or UTF-16-LE (JVM, .NET CLR).
|
Rust, Go, Node.js, etc) or UTF-16-LE (JVM, .NET CLR).
|
||||||
|
|
||||||
While this PEP ensures that developers that genuinely need to do so can still
|
While this PEP ensures that developers that genuinely need to do so can still
|
||||||
|
@ -740,6 +738,11 @@ locale setting (or PEP 540's UTF-8 mode, if that is available).
|
||||||
Providing implicit locale coercion only when running standalone
|
Providing implicit locale coercion only when running standalone
|
||||||
---------------------------------------------------------------
|
---------------------------------------------------------------
|
||||||
|
|
||||||
|
The major downside of the proposed design in this PEP is that it introduces a
|
||||||
|
potential discrepancy between the behaviour of the CPython runtime when it is
|
||||||
|
run as a standalone application and when it is run as an embedded component
|
||||||
|
inside a larger system (e.g. ``mod_wsgi`` running inside Apache ``httpd``).
|
||||||
|
|
||||||
Over the course of Python 3.x development, multiple attempts have been made
|
Over the course of Python 3.x development, multiple attempts have been made
|
||||||
to improve the handling of incorrect locale settings at the point where the
|
to improve the handling of incorrect locale settings at the point where the
|
||||||
Python interpreter is initialised. The problem that emerged is that this is
|
Python interpreter is initialised. The problem that emerged is that this is
|
||||||
|
@ -765,6 +768,19 @@ The ``Py_Initialize`` API then only gains an explicit warning (emitted on
|
||||||
embedding application to specify something more reasonable.
|
embedding application to specify something more reasonable.
|
||||||
|
|
||||||
|
|
||||||
|
Allowing restoration of the legacy behaviour
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
|
The CPython command line interpreter is often used to investigate faults that
|
||||||
|
occur in other applications that embed CPython, and those applications may still
|
||||||
|
be using the C locale even after this PEP is implemented.
|
||||||
|
|
||||||
|
Providing a simple on/off switch for the locale coercion behaviour makes it
|
||||||
|
much easier to reproduce the behaviour of such applications for debugging
|
||||||
|
purposes, as well as making it easier to reproduce the behaviour of older 3.x
|
||||||
|
runtimes even when running a version with this change applied.
|
||||||
|
|
||||||
|
|
||||||
Querying LC_CTYPE for C locale detection
|
Querying LC_CTYPE for C locale detection
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
|
||||||
|
@ -777,8 +793,8 @@ whether or not the current locale configuration is likely to cause Unicode
|
||||||
handling problems.
|
handling problems.
|
||||||
|
|
||||||
|
|
||||||
Setting both LANG & LC_ALL for C.UTF-8 locale coercion
|
Setting both LANG & LC_ALL for UTF-8 locale coercion
|
||||||
------------------------------------------------------
|
----------------------------------------------------
|
||||||
|
|
||||||
Python is often used as a glue language, integrating other C/C++ ABI compatible
|
Python is often used as a glue language, integrating other C/C++ ABI compatible
|
||||||
components in the current process, and components written in arbitrary
|
components in the current process, and components written in arbitrary
|
||||||
|
@ -795,21 +811,23 @@ Setting ``LANG`` to ``C.UTF-8`` ensures that even components that only check
|
||||||
the ``LANG`` fallback for their locale settings will still use ``C.UTF-8``.
|
the ``LANG`` fallback for their locale settings will still use ``C.UTF-8``.
|
||||||
|
|
||||||
Together, these should ensure that when the locale coercion is activated, the
|
Together, these should ensure that when the locale coercion is activated, the
|
||||||
switch to the C.UTF-8 locale will be applied consistently across the current
|
switch to the UTF-8 based locale will be applied consistently across the current
|
||||||
process and any subprocesses that inherit the current environment.
|
process and any subprocesses that inherit the current environment.
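
To illustrate why setting both variables matters for child processes, here is a small Python sketch that builds the environment a subprocess would inherit. The helper ``coerced_environment`` is hypothetical; the PEP describes the coercion happening inside the standalone interpreter binary, not in Python code.

```python
import os


def coerced_environment(target, environ=None):
    """Return a copy of ``environ`` with LC_ALL and LANG set to ``target``.

    Hypothetical helper for illustration only; it never modifies
    ``os.environ`` or the mapping passed in.
    """
    env = dict(os.environ if environ is None else environ)
    env["LC_ALL"] = target  # takes precedence over the other LC_* variables
    env["LANG"] = target    # covers components that only check the fallback
    return env
```

Such a mapping could be passed as the ``env`` argument of ``subprocess.run`` to give a child process the same locale settings.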
|
||||||
|
|
||||||
|
|
||||||
Allowing restoration of the legacy behaviour
|
Enabling C locale coercion and warnings on Mac OS X
|
||||||
--------------------------------------------
|
---------------------------------------------------
|
||||||
|
|
||||||
The CPython command line interpreter is often used to investigate faults that
|
On Mac OS X, CPython already assumes the use of UTF-8 for system interfaces,
|
||||||
occur in other applications that embed CPython, and those applications may still
|
and we expect most other locale-aware components to do the same.
|
||||||
be using the C locale even after this PEP is implemented.
|
|
||||||
|
|
||||||
Providing a simple on/off switch for the locale coercion behaviour makes it
|
However, Mac OS X is also frequently used as a development and testing platform
|
||||||
much easier to reproduce the behaviour of such applications for debugging
|
for Python software intended for deployment to other \*nix environments (such as
|
||||||
purposes, as well as making it easier to reproduce the behaviour of older 3.x
|
Linux).
|
||||||
runtimes even when running a version with this change applied.
|
|
||||||
|
Accordingly, this PEP enables the locale coercion and warning features on
|
||||||
|
Mac OS X in the name of cross platform consistency, even though they're expected
|
||||||
|
to be almost entirely redundant on Mac OS X itself.
|
||||||
|
|
||||||
|
|
||||||
Implementation
|
Implementation
|
||||||
|
@ -823,9 +841,10 @@ This reference implementation covers not only the enhancement request in
|
||||||
issue 28180 [1_], but also the Android compatibility fixes needed to resolve
|
issue 28180 [1_], but also the Android compatibility fixes needed to resolve
|
||||||
issue 28997 [16_].
|
issue 28997 [16_].
|
||||||
|
|
||||||
NOTE: The reference implementation is currently missing the ``configure.ac``
|
.. note::
|
||||||
checks that are needed to ensure that ``PY_COERCE_C_LOCALE`` and
|
|
||||||
``PY_WARN_ON_C_LOCALE`` are always undefined on Mac OS X.
|
The reference implementation has not yet been updated for the 2017-05-06
|
||||||
|
amendments to the PEP
|
||||||
|
|
||||||
|
|
||||||
Backporting to earlier Python 3 releases
|
Backporting to earlier Python 3 releases
|
||||||
|
|