PEP 540: Update based on PEP 538 experience (#493)

* PEP 540: Update based on PEP 538 experience

- CentOS 7 is a better example of problems than Alpine Linux
  (default locale is C/POSIX, C.UTF-8 isn't available for coercion)
- reword the annex discussing the design trade-offs between
  PEP 538's locale coercion and PEP 540's introduction of a
  UTF-8 specific mode

* Tweak wording of PEP 538 comparison
Nick Coghlan 2017-12-06 19:48:45 +10:00 committed by Victor Stinner
parent 6e93c8d2e6
commit 690d32cd84
1 changed file with 27 additions and 14 deletions


@@ -55,6 +55,13 @@ strings.
When all data are stored as UTF-8 but the locale is often misconfigured,
an obvious solution is to ignore the locale and use UTF-8.
PEP 538 attempts to mitigate this problem by coercing the C locale
to a UTF-8 based locale when one is available, but that isn't a
universal solution. For example, CentOS 7's container images default
to the POSIX locale, and don't include the C.UTF-8 locale, so PEP 538's
locale coercion is ineffective.
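Whether coercion has anything to coerce to can be checked from Python itself. The following is a minimal sketch, not CPython's implementation: the helper name ``available_coercion_targets`` is hypothetical, and the candidate list is assumed to match the coercion targets named in PEP 538.

```python
import locale

# Assumption: the coercion targets PEP 538 tries, in order.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def available_coercion_targets(candidates=COERCION_TARGETS):
    """Return the candidate locales that setlocale() accepts on this system.

    Hypothetical helper for illustration only. The LC_CTYPE setting of the
    current process is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query without changing
    found = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this locale is not installed
            found.append(name)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return found


if __name__ == "__main__":
    targets = available_coercion_targets()
    if targets:
        print("usable coercion targets:", ", ".join(targets))
    else:
        print("no UTF-8 coercion target available; coercion would be a no-op")
```

On a system without any of these locales the helper returns an empty list, which is the situation described above for CentOS 7 container images.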
Passthrough undecodable bytes: surrogateescape
----------------------------------------------
@@ -203,23 +210,29 @@ encoding.
locale encoding, and this code page never uses the ASCII encoding.
Annex: Differences between the PEP 538 and the PEP 540
======================================================
Annex: Differences between PEP 538 and PEP 540
==============================================
The PEP 538 uses the "C.UTF-8" locale which is quite new and only
supported by a few Linux distributions; this locale is not currently
supported by FreeBSD or macOS for example. This PEP 540 supports all
operating systems.
PEP 538's locale coercion is only effective if a suitable UTF-8
based locale is available as a coercion target. PEP 540's
UTF-8 mode can be enabled even for operating systems that don't
provide a suitable platform locale (such as CentOS 7).
The PEP 538 only changes the behaviour for the POSIX locale. While the
new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can
be enabled manually for any other locale.
PEP 538 only changes the interpreter's behaviour for the C locale. While the
new UTF-8 mode of this PEP is only enabled by default in the C locale, it can
also be enabled manually for any other locale.
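As a rough illustration of enabling the mode manually, the sketch below starts a child interpreter and reports its ``sys.flags.utf8_mode`` value. It assumes the ``-X utf8`` option and the ``PYTHONUTF8`` environment variable as shipped in CPython 3.7 and later; the helper name ``child_utf8_mode`` is hypothetical.

```python
import os
import subprocess
import sys


def child_utf8_mode(extra_args=(), extra_env=None):
    """Run a child interpreter and return its sys.flags.utf8_mode value.

    Hypothetical helper for illustration. PYTHONUTF8 is removed from the
    inherited environment so that only the explicit settings apply.
    """
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)
    if extra_env:
        env.update(extra_env)
    cmd = [sys.executable, *extra_args,
           "-c", "import sys; print(sys.flags.utf8_mode)"]
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Enabled from the command line, whatever the locale is.
    print(child_utf8_mode(("-X", "utf8")))
    # Enabled through the environment.
    print(child_utf8_mode(extra_env={"PYTHONUTF8": "1"}))
    # Explicitly disabled.
    print(child_utf8_mode(("-X", "utf8=0")))
```

Neither switch depends on which locales are installed, which is the point of the comparison with locale coercion.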
The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any
non-Python code running in the process is impacted by this change. This
PEP is implemented in Python internals and ignores the locale:
non-Python running in the same process is not aware of the "Python UTF-8
mode".
PEP 538 is implemented with ``setlocale(LC_CTYPE, "<coercion target>")`` and
``setenv("LC_CTYPE", "<coercion target>")``, so any non-Python code running
in the process and any subprocesses that inherit the environment are impacted
by the change. PEP 540 is implemented in Python internals and ignores the
locale: non-Python code running in the same process is not aware of the
"Python UTF-8 mode". The benefit of the PEP 538 approach is that it helps
ensure that encoding handling in binary extension modules and subprocesses
is consistent with CPython's encoding handling. The upside of the PEP 540
approach is that it allows an embedding application to change the
interpreter's behaviour without having to change the process-global
locale settings.
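The "ignores the locale" half of this trade-off can be observed from the outside. The sketch below is an illustration under stated assumptions rather than a test from either PEP: it runs a child interpreter in the C locale with locale coercion switched off (``PYTHONCOERCECLOCALE=0``) and UTF-8 mode switched on (``-X utf8``, as in CPython 3.7 and later), then reports the child's filesystem encoding and its ``LC_CTYPE`` environment variable. The helper name ``probe_utf8_mode_in_c_locale`` is hypothetical.

```python
import json
import os
import subprocess
import sys

# Code executed in the child: report the filesystem encoding and the
# LC_CTYPE environment variable as a JSON list.
CHILD_CODE = (
    "import json, os, sys; "
    "print(json.dumps([sys.getfilesystemencoding(), "
    "os.environ.get('LC_CTYPE')]))"
)


def probe_utf8_mode_in_c_locale():
    """Return (filesystem_encoding, LC_CTYPE) seen by a UTF-8 mode child.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    for name in ("LC_CTYPE", "LANG", "PYTHONUTF8"):
        env.pop(name, None)
    env["LC_ALL"] = "C"                # force the C locale
    env["PYTHONCOERCECLOCALE"] = "0"   # disable PEP 538 locale coercion
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    encoding, lc_ctype = json.loads(result.stdout)
    return encoding, lc_ctype


if __name__ == "__main__":
    encoding, lc_ctype = probe_utf8_mode_in_c_locale()
    print("filesystem encoding:", encoding)
    print("LC_CTYPE in child environment:", lc_ctype)
```

The child decodes file names as UTF-8 while its environment carries no ``LC_CTYPE`` override, so libraries and subprocesses that consult the locale still see the C locale. With locale coercion, by contrast, they would see the coerced value.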
Links