PEP 540: Update based on PEP 538 experience (#493)
* PEP 540: Update based on PEP 538 experience

  - CentOS 7 is a better example of problems than Alpine Linux
    (default locale is C/POSIX, C.UTF-8 isn't available for coercion)
  - reword the annex discussing the design trade-offs between PEP 538's
    locale coercion and PEP 540's introduction of a UTF-8 specific mode

* Tweak wording of PEP 538 comparison
parent 6e93c8d2e6
commit 690d32cd84
pep-0540.txt (41 lines changed)
@@ -55,6 +55,13 @@ strings.
 When all data are stored as UTF-8 but the locale is often misconfigured,
 an obvious solution is to ignore the locale and use UTF-8.
 
+PEP 538 attempts to mitigate this problem by coercing the C locale
+to a UTF-8 based locale when one is available, but that isn't a
+universal solution. For example, CentOS 7's container images default
+to the POSIX locale, and don't include the C.UTF-8 locale, so PEP 538's
+locale coercion is ineffective.
+
+
 Passthough undecodable bytes: surrogateescape
 ---------------------------------------------
 
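Whether PEP 538's coercion can help on a given system comes down to whether the C library accepts one of its target locales. The sketch below probes for that, assuming Python 3 on a POSIX system. The candidate tuple is assumed to mirror the coercion targets listed in PEP 538, and `find_coercion_target` is a hypothetical helper, not a CPython API.

```python
import locale

# Assumption: these mirror the candidate coercion targets listed in PEP 538.
CANDIDATE_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def find_coercion_target(candidates=CANDIDATE_TARGETS):
    """Return the first candidate locale the C library accepts, or None.

    Hypothetical helper: it temporarily switches LC_CTYPE to test each
    candidate, then restores the original process-global setting.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query without changing
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # locale not installed on this system
            return name
        return None
    finally:
        # Always restore, since LC_CTYPE is shared by the whole process.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    target = find_coercion_target()
    if target is None:
        print("no coercion target available; PEP 538 coercion would be ineffective")
    else:
        print(f"coercion target available: {target}")
```

When none of the candidates is installed, as in the CentOS 7 container case above, the helper returns `None` and coercion has no locale to switch to.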
@@ -203,23 +210,29 @@ encoding.
 locale encoding, and this code page never uses the ASCII encoding.
 
 
-Annex: Differences between the PEP 538 and the PEP 540
-======================================================
+Annex: Differences between PEP 538 and PEP 540
+==============================================
 
-The PEP 538 uses the "C.UTF-8" locale which is quite new and only
-supported by a few Linux distributions; this locale is not currently
-supported by FreeBSD or macOS for example. This PEP 540 supports all
-operating systems.
+PEP 538's locale coercion is only effective if a suitable UTF-8
+based locale is available as a coercion target. PEP 540's
+UTF-8 mode can be enabled even for operating systems that don't
+provide a suitable platform locale (such as CentOS 7).
 
-The PEP 538 only changes the behaviour for the POSIX locale. While the
-new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can
-be enabled manually for any other locale.
+PEP 538 only changes the interpreter's behaviour for the C locale. While the
+new UTF-8 mode of this PEP is only enabled by default in the C locale, it can
+also be enabled manually for any other locale.
 
-The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any
-non-Python code running in the process is impacted by this change. This
-PEP is implemented in Python internals and ignores the locale:
-non-Python running in the same process is not aware of the "Python UTF-8
-mode".
+PEP 538 is implemented with ``setlocale(LC_CTYPE, "<coercion target>")`` and
+``setenv("LC_CTYPE", "<coercion target>")``, so any non-Python code running
+in the process and any subprocesses that inherit the environment is impacted
+by the change. PEP 540 is implemented in Python internals and ignores the
+locale: non-Python running in the same process is not aware of the
+"Python UTF-8 mode". The benefit of the PEP 538 approach is that it helps
+ensure that encoding handling in binary extension modules and subprocesses
+is consistent with CPython's encoding handling. The upside of the PEP 540
+approach is that it allows an embedding application to change the
+interpreter's behaviour without having to change the process global
+locale settings.
 
 
 Links
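The trade-off described in the revised annex can be observed from outside the interpreter. The sketch below assumes Python 3.7 or later (where the `-X utf8` option and `sys.flags.utf8_mode` exist) on a POSIX system. It starts a child interpreter with `LC_ALL=C`, which prevents PEP 538 coercion, and forces UTF-8 mode with `-X utf8`. The `probe` helper is hypothetical. It reports the mode flag, the interpreter's filesystem encoding, and the C library's `LC_CTYPE` setting as seen by the child.

```python
import os
import subprocess
import sys

# Child program: compare what the interpreter uses with what the
# process-global C locale reports.
CHILD = (
    "import codecs, locale, sys\n"
    "print(sys.flags.utf8_mode)\n"
    "print(codecs.lookup(sys.getfilesystemencoding()).name)\n"
    "print(locale.setlocale(locale.LC_CTYPE))\n"
)


def probe(extra_args=(), env_overrides=None):
    """Run a child interpreter; return (utf8_mode, fs_encoding, lc_ctype)."""
    env = dict(os.environ)
    env.update(env_overrides or {})
    out = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return int(out[0]), out[1], out[2]


if __name__ == "__main__":
    # LC_ALL=C disables PEP 538 coercion; -X utf8 forces PEP 540's UTF-8 mode.
    mode, fs_enc, lc_ctype = probe(["-X", "utf8"], {"LC_ALL": "C"})
    print(f"utf8_mode={mode} fs_encoding={fs_enc} LC_CTYPE={lc_ctype}")
```

With UTF-8 mode forced, the flag is 1 and the filesystem encoding is `utf-8`. `LC_CTYPE` is expected to stay at the C locale, because UTF-8 mode lives in Python internals rather than in `setlocale()`, so non-Python code in the child still sees the C locale.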