PEP 540: Update based on PEP 538 experience (#493)
* PEP 540: Update based on PEP 538 experience

  - CentOS 7 is a better example of problems than Alpine Linux
    (default locale is C/POSIX, C.UTF-8 isn't available for coercion)
  - reword the annex discussing the design trade-offs between PEP 538's
    locale coercion and PEP 540's introduction of a UTF-8 specific mode

* Tweak wording of PEP 538 comparison
parent 6e93c8d2e6
commit 690d32cd84
41
pep-0540.txt
@@ -55,6 +55,13 @@ strings.
 When all data are stored as UTF-8 but the locale is often misconfigured,
 an obvious solution is to ignore the locale and use UTF-8.
 
+PEP 538 attempts to mitigate this problem by coercing the C locale
+to a UTF-8 based locale when one is available, but that isn't a
+universal solution. For example, CentOS 7's container images default
+to the POSIX locale, and don't include the C.UTF-8 locale, so PEP 538's
+locale coercion is ineffective.
+
+
 Passthough undecodable bytes: surrogateescape
 ---------------------------------------------
 
@@ -203,23 +210,29 @@ encoding.
 locale encoding, and this code page never uses the ASCII encoding.
 
 
-Annex: Differences between the PEP 538 and the PEP 540
-======================================================
+Annex: Differences between PEP 538 and PEP 540
+==============================================
 
-The PEP 538 uses the "C.UTF-8" locale which is quite new and only
-supported by a few Linux distributions; this locale is not currently
-supported by FreeBSD or macOS for example. This PEP 540 supports all
-operating systems.
+PEP 538's locale coercion is only effective if a suitable UTF-8
+based locale is available as a coercion target. PEP 540's
+UTF-8 mode can be enabled even for operating systems that don't
+provide a suitable platform locale (such as CentOS 7).
 
-The PEP 538 only changes the behaviour for the POSIX locale. While the
-new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can
-be enabled manually for any other locale.
+PEP 538 only changes the interpreter's behaviour for the C locale. While the
+new UTF-8 mode of this PEP is only enabled by default in the C locale, it can
+also be enabled manually for any other locale.
 
-The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any
-non-Python code running in the process is impacted by this change. This
-PEP is implemented in Python internals and ignores the locale:
-non-Python running in the same process is not aware of the "Python UTF-8
-mode".
+PEP 538 is implemented with ``setlocale(LC_CTYPE, "<coercion target>")`` and
+``setenv("LC_CTYPE", "<coercion target>")``, so any non-Python code running
+in the process and any subprocesses that inherit the environment is impacted
+by the change. PEP 540 is implemented in Python internals and ignores the
+locale: non-Python running in the same process is not aware of the
+"Python UTF-8 mode". The benefit of the PEP 538 approach is that it helps
+ensure that encoding handling in binary extension modules and subprocesses
+is consistent with CPython's encoding handling. The upside of the PEP 540
+approach is that it allows an embedding application to change the
+interpreter's behaviour without having to change the process global
+locale settings.
 
 
 Links
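
Reviewer note: the reworded annex says the UTF-8 mode can be switched on without a suitable platform locale. Here is a minimal sketch of how that could be checked. It assumes the interface that later shipped in CPython 3.7 (the ``-X utf8`` option and ``sys.flags.utf8_mode``), neither of which appears in this diff. The helper name ``utf8_mode_in_child`` is hypothetical.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(extra_args=(), locale="C"):
    """Start a child interpreter under `locale` and report its UTF-8 mode.

    Returns (utf8_mode_flag, filesystem_encoding) as seen by the child.
    Assumes CPython 3.7+, where sys.flags.utf8_mode exists.
    """
    # Drop variables that would otherwise influence locale or UTF-8 mode,
    # then force the requested locale for every category.
    skip = ("LANG", "LANGUAGE", "PYTHONUTF8", "PYTHONCOERCECLOCALE")
    env = {
        key: value
        for key, value in os.environ.items()
        if not key.startswith("LC_") and key not in skip
    }
    env["LC_ALL"] = locale

    code = "import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())"
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    flag, fs_encoding = result.stdout.split()
    return int(flag), fs_encoding


if __name__ == "__main__":
    # UTF-8 mode requested explicitly: no UTF-8 locale has to be installed.
    print(utf8_mode_in_child(("-X", "utf8")))
    # UTF-8 mode explicitly disabled, still under the C locale.
    print(utf8_mode_in_child(("-X", "utf8=0")))
```

The child runs with ``LC_ALL=C``, so an enabled flag in the first call comes from the interpreter option alone, not from any locale coercion.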