PEP: 538
Title: Coercing the legacy C locale to a UTF-8 based locale
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate: INADA Naoki
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2016
Python-Version: 3.7
Post-History: 03-Jan-2017,
              07-Jan-2017,
              05-Mar-2017,
              09-May-2017
Resolution: https://mail.python.org/pipermail/python-dev/2017-May/148035.html

Abstract
========

An ongoing challenge with Python 3 on \*nix systems is the conflict between
needing to use the configured locale encoding by default for consistency with
other locale-aware components in the same process or subprocesses,
and the fact that the standard C locale (as defined in POSIX:2001) typically
implies a default text encoding of ASCII, which is entirely inadequate for the
development of networked services and client applications in a multilingual
world.

:pep:`540` proposes a change to CPython's handling of the legacy C locale such
that CPython will assume the use of UTF-8 in such environments, rather than
persisting with the demonstrably problematic assumption of ASCII as an
appropriate encoding for communicating with operating system interfaces.
This is a good approach for cases where network encoding interoperability
is a more important concern than local encoding interoperability.

However, it comes at the cost of making CPython's encoding assumptions diverge
from those of other locale-aware components in the same process, as well as
those of components running in subprocesses that share the same environment.

This can cause interoperability problems with some extension modules (such as
GNU readline's command line history editing), as well as with components
running in subprocesses (such as older Python runtimes).

It also requires non-trivial changes to the internals of how CPython itself
works, rather than relying primarily on existing configuration settings that
are supported by Python versions prior to Python 3.7.

Accordingly, this PEP proposes that independently of the UTF-8 mode proposed
in :pep:`540`, the way the CPython implementation handles the default C locale be
changed to be roughly equivalent to the following existing configuration
settings (supported since Python 3.1)::

    LC_CTYPE=C.UTF-8
    PYTHONIOENCODING=utf-8:surrogateescape

The exact target locale for coercion will be chosen from a predefined list at
runtime based on the actually available locales.

The reinterpreted locale settings will be written back to the environment so
they're visible to other components in the same process and in subprocesses,
but the changed ``PYTHONIOENCODING`` default will be made implicit in order to
avoid causing compatibility problems with Python 2 subprocesses that don't
provide the ``surrogateescape`` error handler.

The new legacy locale coercion behavior can be disabled either by setting
``LC_ALL`` (which may still lead to a Unicode compatibility warning) or by
setting the new ``PYTHONCOERCECLOCALE`` environment variable to ``0``.

With this change, any \*nix platform that does *not* offer at least one of the
``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
configuration would only be considered a fully supported platform for CPython
3.7+ deployments when a suitable locale other than the default ``C`` locale is
configured explicitly (e.g. ``en_AU.UTF-8``, ``zh_CN.gb18030``). If :pep:`540` is
accepted in addition to this PEP, then pure Python modules would also be
supported when using the proposed ``PYTHONUTF8`` mode, but expectations for
full Unicode compatibility in extension modules would continue to be limited
to the platforms covered by this PEP.

As it only reflects a change in default settings rather than a fundamentally
new capability, redistributors (such as Linux distributions) with a narrower
target audience than the upstream CPython development team may also choose to
opt in to this locale coercion behaviour for the Python 3.6.x series by
applying the necessary changes as a downstream patch.


Implementation Notes
====================

Attempting to implement the PEP as originally accepted showed that the
proposal to emit locale coercion and compatibility warnings by default
simply wasn't practical (there were too many cases where previously working
code failed *because of the warnings*, rather than because of latent locale
handling defects in the affected code).

As a result, the ``PY_WARN_ON_C_LOCALE`` config flag was removed, and replaced
with a runtime ``PYTHONCOERCECLOCALE=warn`` environment variable setting
that allows developers and system integrators to opt in to receiving locale
coercion and compatibility warnings, without emitting them by default.

The output examples in the PEP itself have also been updated to remove
the warnings and make them easier to read.


Background
==========

While the CPython interpreter is starting up, it may need to convert from
the ``char *`` format to the ``wchar_t *`` format, or from one of those formats
to ``PyUnicodeObject *``, in a way that's consistent with the locale settings
of the overall system. It handles these cases by relying on the operating
system to do the conversion and then ensuring that the text encoding name
reported by ``sys.getfilesystemencoding()`` matches the encoding used during
this early bootstrapping process.

On Windows, the limitations of the ``mbcs`` format used by default in these
conversions proved sufficiently problematic that :pep:`528` and :pep:`529` were
implemented to bypass the operating system supplied interfaces for binary data
handling and force the use of UTF-8 instead.

On Mac OS X, iOS, and Android, many components, including CPython, already
assume the use of UTF-8 as the system encoding, regardless of the locale
setting. However, this isn't the case for all components, and the discrepancy
can cause problems in some situations (for example, when using the GNU readline
module [16_]).

On non-Apple and non-Android \*nix systems, these operations are handled using
the C locale system in glibc, which has the following characteristics [4_]:

* by default, all processes start in the ``C`` locale, which uses ``ASCII``
  for these conversions. This is almost never what anyone doing multilingual
  text processing actually wants (including CPython and C/C++ GUI frameworks).
* calling ``setlocale(LC_ALL, "")`` reconfigures the active locale based on
  the locale categories configured in the current process environment
* if the locale requested by the current environment is unknown, or no specific
  locale is configured, then the default ``C`` locale will remain active
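
The precedence rules behind the second bullet can be modelled in a few lines
of Python. This is a simplified sketch, not glibc's actual lookup code. It
assumes the POSIX precedence of ``LC_ALL`` over ``LC_CTYPE`` over ``LANG``,
with empty values treated as unset. The function name ``requested_ctype`` is
hypothetical. The sketch also deliberately ignores whether the named locale
is installed, which is exactly the gap the third bullet describes::

    def requested_ctype(environ):
        """Return the LC_CTYPE locale name that setlocale(LC_ALL, "") would
        look up, using the precedence LC_ALL > LC_CTYPE > LANG.

        Unset or empty variables are skipped. Whether the named locale is
        actually installed is NOT checked here.
        """
        for name in ("LC_ALL", "LC_CTYPE", "LANG"):
            value = environ.get(name)
            if value:
                return value
        # Nothing configured: the process stays in the default C locale
        return "C"


    # LC_CTYPE wins over LANG, but LC_ALL wins over both
    print(requested_ctype({"LANG": "en_AU.UTF-8", "LC_CTYPE": "C.UTF-8"}))
    print(requested_ctype({"LC_ALL": "C", "LC_CTYPE": "C.UTF-8"}))
    print(requested_ctype({}))

The three calls print ``C.UTF-8``, ``C`` and ``C`` respectively.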

The specific locale category that covers the APIs that CPython depends on is
``LC_CTYPE``, which applies to "classification and conversion of characters,
and to multibyte and wide characters" [5_]. Accordingly, CPython includes the
following key calls to ``setlocale``:

* in the main ``python`` binary, CPython calls ``setlocale(LC_ALL, "")`` to
  configure the entire C locale subsystem according to the process environment.
  It does this prior to making any calls into the shared CPython library
* in ``Py_Initialize``, CPython calls ``setlocale(LC_CTYPE, "")``, such that
  the configured locale settings for that category *always* match those set in
  the environment. It does this unconditionally, and it *doesn't* revert the
  process state change in ``Py_Finalize``

(This summary of the locale handling omits several technical details related
to exactly where and when the text encoding declared as part of the locale
settings is used - see :pep:`540` for further discussion, as these particular
details matter more when decoupling CPython from the declared C locale than
they do when overriding the locale with one based on UTF-8.)

These calls are usually sufficient to provide sensible behaviour, but they can
still fail in the following cases:

* SSH environment forwarding means that SSH clients may sometimes forward
  client locale settings to servers that don't have that locale installed. This
  leads to CPython running in the default ASCII-based C locale
* some process environments (such as Linux containers) may not have any
  explicit locale configured at all. As with unknown locales, this leads to
  CPython running in the default ASCII-based C locale
* on Android, rather than configuring the locale based on environment variables,
  the empty locale ``""`` is treated as specifically requesting the ``"C"``
  locale

The simplest way to deal with this problem for currently released versions of
CPython is to explicitly set a more sensible locale when launching the
application. For example::

    LC_CTYPE=C.UTF-8 python3 ...
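
The same override can be applied when one Python process launches another.
The following minimal sketch uses only the standard library ``subprocess``
module, and the helper name ``run_with_ctype`` is hypothetical. It sets only
``LC_CTYPE``, so the other locale categories inherited from the parent are
left alone. It also assumes ``LC_ALL`` is not set, since a non-empty
``LC_ALL`` would take precedence::

    import os
    import subprocess
    import sys


    def run_with_ctype(args, ctype="C.UTF-8"):
        """Hypothetical helper: run *args* with only LC_CTYPE overridden."""
        # Copy the parent environment and change a single locale category
        env = dict(os.environ, LC_CTYPE=ctype)
        return subprocess.run(args, env=env, capture_output=True, text=True)


    result = run_with_ctype(
        [sys.executable, "-c", "import os; print(os.environ['LC_CTYPE'])"]
    )
    print(result.returncode, result.stdout.strip())

If ``C.UTF-8`` is not installed, a Python 3.7+ child may coerce the setting to
one of the other candidate locales listed in the Specification. The printed
name is therefore not guaranteed to be ``C.UTF-8``.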

The ``C.UTF-8`` locale is a full locale definition that uses ``UTF-8`` for the
``LC_CTYPE`` category, and the same settings as the ``C`` locale for all other
categories (including ``LC_COLLATE``). It is offered by a number of Linux
distributions (including Debian, Ubuntu, Fedora, Alpine and Android) as an
alternative to the ASCII-based C locale. Some other platforms (such as
``HP-UX``) offer an equivalent locale definition under the name ``C.utf8``.

Mac OS X and other \*BSD systems have taken a different approach: instead of
offering a ``C.UTF-8`` locale, they offer a partial ``UTF-8`` locale that only
defines the ``LC_CTYPE`` category. On such systems, the preferred
environmental locale adjustment is to set ``LC_CTYPE=UTF-8`` rather than to set
``LC_ALL`` or ``LANG``. [17_]

In the specific case of Docker containers and similar technologies, the
appropriate locale setting can be specified directly in the container image
definition.

Another common failure case is developers specifying ``LANG=C`` in order to
see otherwise translated user interface messages in English, rather than the
more narrowly scoped ``LC_MESSAGES=C`` or ``LANGUAGE=en``.


Relationship with other PEPs
============================

This PEP shares a common problem statement with :pep:`540` (improving Python 3's
behaviour in the default C locale), but diverges markedly in the proposed
solution:

* :pep:`540` proposes to entirely decouple CPython's default text encoding from
  the C locale system in that case, allowing text handling inconsistencies to
  arise between CPython and other locale-aware components running in the same
  process and in subprocesses. This approach aims to make CPython behave less
  like a locale-aware application, and more like locale-independent language
  runtimes like those for Go, Node.js (V8), and Rust
* this PEP proposes to override the legacy C locale with a more recently
  defined locale that uses UTF-8 as its default text encoding. This means that
  the text encoding override will apply not only to CPython, but also to any
  locale-aware extension modules loaded into the current process, as well as to
  locale-aware applications invoked in subprocesses that inherit their
  environment from the parent process. This approach aims to retain CPython's
  traditional strong support for integration with other locale-aware components
  while also actively helping to push forward the adoption and standardisation
  of the C.UTF-8 locale as a Unicode-aware replacement for the legacy C locale
  in the wider C/C++ ecosystem

After reviewing both PEPs, it became clear that they didn't actually conflict
at a technical level, and the proposal in :pep:`540` offered a superior option in
cases where no suitable locale was available, as well as offering a better
reference behaviour for platforms where the notion of a "locale encoding"
doesn't make sense (for example, embedded systems running MicroPython rather
than the CPython reference interpreter).

Meanwhile, this PEP offered improved compatibility with other locale-aware
components, and an approach more amenable to being backported to Python 3.6
by downstream redistributors.

As a result, this PEP was amended to refer to :pep:`540` as a complementary
solution that offered improved behaviour when none of the standard UTF-8 based
locales were available, as well as extending the changes in the default
settings to APIs that aren't currently independently configurable (such as
the default encoding and error handler for ``open()``).

The availability of :pep:`540` also meant that the ``LC_CTYPE=en_US.UTF-8`` legacy
fallback was removed from the list of UTF-8 locales tried as a coercion target,
with the expectation being that CPython will instead rely solely on the
proposed ``PYTHONUTF8`` mode in such cases.


Motivation
==========

While Linux container technologies like Docker, Kubernetes, and OpenShift are
best known for their use in web service development, the related container
formats and execution models are also being adopted for Linux command line
application development. Technologies like Gnome Flatpak [7_] and
Ubuntu Snappy [8_] further aim to bring these same techniques to Linux GUI
application development.

When using Python 3 for application development in these contexts, it isn't
uncommon to see text encoding related errors akin to the following::

    $ docker run --rm fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
    Unable to decode the command from the command line:
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed
    $ docker run --rm ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
    Unable to decode the command from the command line:
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed

Even though the same command is likely to work fine when run locally::

    $ python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ

The source of the problem can be seen by instead running the ``locale`` command
in the three environments::

    $ locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=en_AU.UTF-8
    LC_CTYPE="en_AU.UTF-8"
    LC_ALL=
    $ docker run --rm fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LC_CTYPE="POSIX"
    LC_ALL=
    $ docker run --rm ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LANGUAGE=
    LC_CTYPE="POSIX"
    LC_ALL=

In this particular example, we can see that the host system locale is set to
"en_AU.UTF-8", so CPython uses UTF-8 as the default text encoding. By contrast,
the base Docker images for Fedora and Debian don't have any specific locale
set, so they use the POSIX locale by default, which is an alias for the
ASCII-based default C locale.

The simplest way to get Python 3 (regardless of the exact version) to behave
sensibly in Fedora and Debian based containers is to run it in the ``C.UTF-8``
locale that both distros provide::

    $ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ
    $ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ

    $ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LC_CTYPE=C.UTF-8
    LC_ALL=
    $ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LANGUAGE=
    LC_CTYPE=C.UTF-8
    LC_ALL=

The Alpine Linux based Python images provided by Docker, Inc. already use the
C.UTF-8 locale by default::

    $ docker run --rm python:3 python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ
    $ docker run --rm python:3 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=C.UTF-8
    LANGUAGE=
    LC_CTYPE="C.UTF-8"
    LC_ALL=

Similarly, for custom container images (i.e. those adding additional content on
top of a base distro image), a more suitable locale can be set in the image
definition so everything just works by default. However, it would provide a much
nicer and more consistent user experience if CPython were able to just deal
with this problem automatically rather than relying on redistributors or end
users to handle it through system configuration changes.

While the glibc developers are working towards making the C.UTF-8 locale
universally available for use by glibc based applications like CPython [6_],
this unfortunately doesn't help on platforms that ship older versions of glibc
without that feature, and also don't provide C.UTF-8 (or an equivalent) as an
on-disk locale the way Debian and Fedora do. These platforms are considered
out of scope for this PEP - see :pep:`540` for further discussion of possible
options for improving CPython's default behaviour in such environments.


Design Principles
=================

The above motivation leads to the following core design principles for the
proposed solution:

* if a locale other than the default C locale is explicitly configured, we'll
  continue to respect it
* as far as is feasible, any changes made will use *existing* configuration
  options
* Python's runtime behaviour in potential coercion target locales should be
  identical regardless of whether the locale was set explicitly in the
  environment or implicitly as a locale coercion target
* for Python 3.7, if we're changing the locale setting without an explicit
  config option, we'll emit a warning on stderr that we're doing so rather
  than silently changing the process configuration. This will alert application
  and system integrators to the change, even if they don't closely follow the
  PEP process or Python release announcements. However, to minimize the chance
  of introducing new problems for end users, we'll do this *without* using the
  warnings system, so even running with ``-Werror`` won't turn it into a runtime
  exception. (Note: these warnings ended up being silenced by default. See the
  Implementation Note above for more details)
* for Python 3.7, any changed defaults will offer some form of explicit "off"
  switch at build time, runtime, or both


Minimizing the negative impact on systems currently correctly configured to
use GB-18030 or another partially ASCII compatible universal encoding leads to
the following design principle:

* if a UTF-8 based Linux container is run on a host that is explicitly
  configured to use a non-UTF-8 encoding, and tries to exchange locally
  encoded data with that host rather than exchanging explicitly UTF-8 encoded
  data, CPython will endeavour to correctly round-trip host provided data that
  is concatenated or split solely at common ASCII compatible code points, but
  may otherwise emit nonsensical results.

Minimizing the negative impact on systems and programs correctly configured to
use an explicit locale category like ``LC_TIME``, ``LC_MONETARY`` or
``LC_NUMERIC`` while otherwise running in the legacy C locale gives the
following design principle:

* don't make any environmental changes that would alter any existing settings
  for locale categories other than ``LC_CTYPE`` (most notably: don't set
  ``LC_ALL`` or ``LANG``)

Finally, maintaining compatibility with running arbitrary subprocesses in
orchestration use cases leads to the following design principle:

* don't make any Python-specific environmental changes that might be
  incompatible with any still supported version of CPython (including
  CPython 2.7)


Specification
=============

To better handle the cases where CPython would otherwise end up attempting
to operate in the ``C`` locale, this PEP proposes that CPython automatically
attempt to coerce the legacy ``C`` locale to a UTF-8 based locale for the
``LC_CTYPE`` category when it is run as a standalone command line application.

It further proposes to emit a warning on stderr if the legacy ``C`` locale
is in effect for the ``LC_CTYPE`` category at the point where the language
runtime itself is initialized,
and the explicit environmental flag to disable locale coercion is not set, in
order to warn system and application integrators that they're running CPython
in an unsupported configuration.

In addition to these general changes, some additional Android-specific changes
are proposed to handle the differences in the behaviour of ``setlocale`` on that
platform.


Legacy C locale coercion in the standalone Python interpreter binary
--------------------------------------------------------------------

When run as a standalone application, CPython has the opportunity to
reconfigure the C locale before any locale dependent operations are executed
in the process.

This means that it can change the locale settings not only for the CPython
runtime, but also for any other locale-aware components running in the current
process (e.g. as part of extension modules), as well as in subprocesses that
inherit their environment from the current process.

After calling ``setlocale(LC_ALL, "")`` to initialize the locale settings in
the current process, the main interpreter binary will be updated to include
the following call::

    const char *ctype_loc = setlocale(LC_CTYPE, NULL);

This cryptic invocation is the API that C provides to query the current locale
setting without changing it. Given that query, it is possible to check for
exactly the ``C`` locale with ``strcmp``::

    ctype_loc != NULL && strcmp(ctype_loc, "C") == 0 /* true only in the C locale */

This call also returns ``"C"`` when either no particular locale is set, or the
nominal locale is set to an alias for the ``C`` locale (such as ``POSIX``).
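
The same query is available from Python code through the standard ``locale``
module. There, passing ``None`` as the locale corresponds to passing ``NULL``
in C. A minimal sketch follows; the function name ``in_legacy_c_locale`` is
hypothetical::

    import locale


    def in_legacy_c_locale():
        """Return True if LC_CTYPE is currently the legacy C locale."""
        # None queries the current setting without changing it, like
        # setlocale(LC_CTYPE, NULL) in C
        current = locale.setlocale(locale.LC_CTYPE, None)
        return current == "C"


    print(in_legacy_c_locale())

In a running interpreter this reports the locale actually in effect after
startup, including the result of any coercion.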

Given this information, CPython can then attempt to coerce the locale to one
that uses UTF-8 rather than ASCII as the default encoding.

Three such locales will be tried:

* ``C.UTF-8`` (available at least in Debian, Ubuntu, Alpine, and Fedora 25+, and
  expected to be available by default in a future version of glibc)
* ``C.utf8`` (available at least in HP-UX)
* ``UTF-8`` (available in at least some \*BSD variants, including Mac OS X)

The coercion will be implemented by setting the ``LC_CTYPE`` environment
variable to the candidate locale name, such that future calls to
``setlocale()`` will see it, as will other components looking for those
settings (such as GUI development frameworks and Python's own ``locale``
module).

To allow for better cross-platform binary portability and to adjust
automatically to future changes in locale availability, these checks will be
implemented at runtime on all platforms other than Windows, rather than
attempting to determine which locales to try at compile time.
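
The coercion steps described above can be sketched in Python using the
``locale`` module. This is an illustrative approximation rather than the
actual C implementation. It checks availability by passing each candidate
name straight to ``setlocale``, which raises ``locale.Error`` for unknown
locales, instead of setting the environment variable first. The names
``CANDIDATE_LOCALES`` and ``coerce_legacy_c_locale`` are hypothetical::

    import locale
    import os

    # Candidate coercion targets, tried in the order listed above
    CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


    def coerce_legacy_c_locale(environ=os.environ):
        """Sketch of the coercion step; returns the chosen locale or None.

        Assumes setlocale(LC_ALL, "") has already been called, as the main
        python binary does before this check.
        """
        if environ.get("PYTHONCOERCECLOCALE") == "0":
            return None  # coercion explicitly disabled
        if environ.get("LC_ALL"):
            return None  # LC_ALL overrides LC_CTYPE, so leave things alone
        if locale.setlocale(locale.LC_CTYPE, None) != "C":
            return None  # some other locale is configured: respect it
        for candidate in CANDIDATE_LOCALES:
            try:
                # Runtime availability check; raises locale.Error if missing
                locale.setlocale(locale.LC_CTYPE, candidate)
            except locale.Error:
                continue
            # Publish the choice so extension modules and subprocesses see it
            environ["LC_CTYPE"] = candidate
            return candidate
        return None  # no candidate available: stay in the C locale

Passing a plain dictionary as *environ* lets you exercise the environment
checks without touching ``os.environ``. The ``setlocale`` calls still change
the locale of the current process.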

When this locale coercion is activated, the following warning will be
printed on stderr, with the warning containing whichever locale was
successfully configured::

    Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another
    locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).

(Note: this warning ended up being silenced by default. See the
Implementation Note above for more details)

As long as the current platform provides at least one of the candidate UTF-8
based environments, this locale coercion will mean that the standard
Python binary *and* locale-aware extensions should once again "just work"
in the three main failure cases we're aware of (missing locale
settings, SSH forwarding of unknown locales via ``LANG`` or ``LC_CTYPE``, and
developers explicitly requesting ``LANG=C``).

The one case where failures may still occur is when ``stderr`` is specifically
being checked for no output, which can be resolved either by configuring
a locale other than the C locale, or else by using a mechanism other than
"there was no output on stderr" to check for subprocess errors (e.g. checking
process return codes).
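
For the subprocess case, a minimal sketch of checking the exit status instead
of stderr might look like the following. The helper name ``child_succeeded``
is hypothetical::

    import subprocess
    import sys


    def child_succeeded(args):
        """Judge a subprocess by its exit status, not by an empty stderr."""
        # A healthy Python child may still print a locale warning on stderr
        result = subprocess.run(args, capture_output=True)
        return result.returncode == 0


    # Writing to stderr alone does not make this child a failure
    print(child_succeeded([sys.executable, "-c",
                           "import sys; sys.stderr.write('note')"]))
    # A non-zero exit status does
    print(child_succeeded([sys.executable, "-c", "raise SystemExit(3)"]))

The two calls print ``True`` and ``False``.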

If none of the candidate locales are successfully configured, or the ``LC_ALL``
locale override is defined in the current process environment, then
initialization will continue in the C locale and the Unicode compatibility
warning described in the next section will be emitted just as it would for
any other application.

If ``PYTHONCOERCECLOCALE=0`` is explicitly set, initialization will continue in
the C locale and the Unicode compatibility warning described in the next
section will be automatically suppressed.

The interpreter will always check for the ``PYTHONCOERCECLOCALE`` environment
variable at startup (even when running under the ``-E`` or ``-I`` switches),
as the locale coercion check necessarily takes place before any command line
argument processing. For consistency, the runtime check to determine whether
or not to suppress the locale compatibility warning will be similarly
independent of these settings.
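
The Implementation Notes describe a later change: warnings became opt-in via
``PYTHONCOERCECLOCALE=warn``. Taking that into account, the handling of this
variable can be summarised with a small sketch. The function name
``coercion_settings`` is hypothetical. Treating every value other than ``0``
and ``warn`` as "coerce silently" is an assumption of this sketch::

    import os


    def coercion_settings(environ=os.environ):
        """Return (attempt_coercion, emit_warnings) for PYTHONCOERCECLOCALE.

        The variable is read straight from the environment, mirroring the
        way the real check ignores the -E and -I switches.
        """
        value = environ.get("PYTHONCOERCECLOCALE")
        if value == "0":
            # Coercion disabled; the compatibility warning is suppressed too
            return False, False
        # "warn" opts in to the warnings; anything else coerces silently
        return True, value == "warn"


    print(coercion_settings({}))
    print(coercion_settings({"PYTHONCOERCECLOCALE": "0"}))
    print(coercion_settings({"PYTHONCOERCECLOCALE": "warn"}))

These calls print ``(True, False)``, ``(False, False)`` and ``(True, True)``.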


Legacy C locale warning during runtime initialization
-----------------------------------------------------

By the time that ``Py_Initialize`` is called, arbitrary locale-dependent
operations may have taken place in the current process. This means that
by the time it is called, it is *too late* to reliably switch to a different
locale - doing so would introduce inconsistencies in decoded text, even in the
context of the standalone Python interpreter binary.

Accordingly, when ``Py_Initialize`` is called and CPython detects that the
configured locale is still the default ``C`` locale and
``PYTHONCOERCECLOCALE=0`` is not set, the following warning will be issued::

   Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
   encoding), which may cause Unicode compatibility problems. Using C.UTF-8,
   C.utf8, or UTF-8 (if available) as alternative Unicode-compatible
   locales is recommended.

(Note: this warning ended up being silenced by default. See the
Implementation Note above for more details)

In this case, no actual change will be made to the locale settings.

Instead, the warning informs both system and application integrators that
they're running Python 3 in a configuration that we don't expect to work
properly.

The second sentence providing recommendations may eventually be conditionally
compiled based on the operating system (e.g. recommending ``LC_CTYPE=UTF-8``
on \*BSD systems), but the initial implementation will just use the common
generic message shown above.


New build-time configuration options
------------------------------------

While both of the above behaviours would be enabled by default, they would
also have new associated configuration options and preprocessor definitions
for the benefit of redistributors that want to override those default settings.

The locale coercion behaviour would be controlled by the flag
``--with[out]-c-locale-coercion``, which would set the ``PY_COERCE_C_LOCALE``
preprocessor definition.

The locale warning behaviour would be controlled by the flag
``--with[out]-c-locale-warning``, which would set the ``PY_WARN_ON_C_LOCALE``
preprocessor definition.

(Note: this compile time warning option ended up being replaced by a runtime
``PYTHONCOERCECLOCALE=warn`` option. See the Implementation Note above for
more details)

On platforms which don't use the ``autotools`` based build system (i.e.
Windows) these preprocessor variables would always be undefined.


Changes to the default error handling on the standard streams
-------------------------------------------------------------

Since Python 3.5, CPython has defaulted to using ``surrogateescape`` on the
standard streams (``sys.stdin``, ``sys.stdout``) when it detects that the
current locale is ``C`` and no specific error handler has been set using
either the ``PYTHONIOENCODING`` environment variable or the
``Py_setStandardStreamEncoding`` API. For other locales, the default error
handler for the standard streams is ``strict``.

In order to preserve this behaviour without introducing any behavioural
discrepancies between locale coercion and explicitly configuring a locale, the
coercion target locales (``C.UTF-8``, ``C.utf8``, and ``UTF-8``) will be added
to the list of locales that use ``surrogateescape`` as their default error
handler for the standard streams.

No changes are proposed to the default error handler for ``sys.stderr``: that
will continue to be ``backslashreplace``.
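
The resulting defaults can be summarised as a small lookup function. This
sketch assumes that neither ``PYTHONIOENCODING`` nor
``Py_setStandardStreamEncoding`` has been used to override the error handler.
The names ``SURROGATEESCAPE_LOCALES`` and ``default_stream_errors`` are
hypothetical::

    # Locales whose stdin/stdout default to surrogateescape under this PEP:
    # the legacy C locale plus the three coercion targets
    SURROGATEESCAPE_LOCALES = frozenset({"C", "C.UTF-8", "C.utf8", "UTF-8"})


    def default_stream_errors(stream, ctype_locale):
        """Return the default error handler for 'stdin', 'stdout' or 'stderr'."""
        if stream == "stderr":
            return "backslashreplace"  # unchanged by this PEP
        if ctype_locale in SURROGATEESCAPE_LOCALES:
            return "surrogateescape"
        return "strict"


    print(default_stream_errors("stdout", "C.UTF-8"))      # surrogateescape
    print(default_stream_errors("stdout", "en_AU.UTF-8"))  # strict
    print(default_stream_errors("stderr", "C.UTF-8"))      # backslashreplace

In a running interpreter, you can inspect the handlers actually in use through
``sys.stdin.errors``, ``sys.stdout.errors`` and ``sys.stderr.errors``.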


Changes to locale settings on Android
-------------------------------------

Independently of the other changes in this PEP, CPython on Android systems
will be updated to call ``setlocale(LC_ALL, "C.UTF-8")`` where it currently
calls ``setlocale(LC_ALL, "")`` and ``setlocale(LC_CTYPE, "C.UTF-8")`` where
it currently calls ``setlocale(LC_CTYPE, "")``.

This Android-specific behaviour is being introduced due to the following
Android-specific details:

* on Android, passing ``""`` to ``setlocale`` is equivalent to passing ``"C"``
* the ``C.UTF-8`` locale is always available


Platform Support Changes
========================

A new "Legacy C Locale" section will be added to :pep:`11` that states:

* as of CPython 3.7, \*nix platforms are expected to provide at least one of
  ``C.UTF-8`` (full locale), ``C.utf8`` (full locale) or ``UTF-8``
  (``LC_CTYPE``-only locale) as an alternative to the legacy ``C`` locale.
  Any Unicode related integration problems that occur only in the legacy ``C``
  locale and cannot be reproduced in an appropriately configured non-ASCII
  locale will be closed as "won't fix".


Rationale
=========


Improving the handling of the C locale
--------------------------------------

It has been clear for some time that the C locale's default encoding of
``ASCII`` is entirely the wrong choice for development of modern networked
services. Newer languages like Rust and Go have eschewed that default entirely,
and instead made it a deployment requirement that systems be configured to use
UTF-8 as the text encoding for operating system interfaces. Similarly, Node.js
assumes UTF-8 by default (a behaviour inherited from the V8 JavaScript engine)
and requires custom build settings to indicate it should use the system
locale settings for locale-aware operations. Both the JVM and the .NET CLR
use UTF-16-LE as their primary encoding for passing text between applications
and the application runtime (i.e. the JVM/CLR, not the host operating system).

The challenge for CPython has been the fact that in addition to being used for
network service development, it is also extensively used as an embedded
scripting language in larger applications, and as a desktop application
development language, where it is more important to be consistent with other
locale-aware components sharing the same process, as well as with the user's
desktop locale settings, than it is with the emergent conventions of modern
network service development.

The core premise of this PEP is that for *all* of these use cases, the
assumption of ASCII implied by the default "C" locale is the wrong choice,
and furthermore that the following assumptions are valid:

* in desktop application use cases, the process locale will *already* be
  configured appropriately, and if it isn't, then that is an operating system
  or embedding application level problem that needs to be reported to and
  resolved by the operating system provider or application developer
* in network service development use cases (especially those based on Linux
  containers), the process locale may not be configured *at all*, and if it
  isn't, then the expectation is that components will impose their own default
  encoding the way Rust, Go and Node.js do, rather than trusting the legacy C
  default encoding of ASCII the way CPython currently does


Defaulting to "surrogateescape" error handling on the standard IO streams
-------------------------------------------------------------------------

By coercing the locale away from the legacy C default and its assumption of
ASCII as the preferred text encoding, this PEP also disables the implicit use
of the "surrogateescape" error handler on the standard IO streams that was
introduced in Python 3.5 ([15_]), as well as the automatic use of
``surrogateescape`` when operating in :pep:`540`'s proposed UTF-8 mode.

Rather than introducing yet another configuration option to adjust that
behaviour, this PEP instead proposes to extend the "surrogateescape" default
for ``stdin`` and ``stdout`` error handling to also apply to the three
potential coercion target locales.

The aim of this behaviour is to ensure that text values provided by the
operating system can typically be passed transparently through a Python 3
application, even if the application is incorrect in assuming that the text
has been encoded as UTF-8.
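
As a minimal sketch of that mechanism (it relies only on the standard
``surrogateescape`` codec error handler; the ``passthrough`` helper is
hypothetical and not part of CPython), bytes that are not valid UTF-8 survive a
decode/encode round trip unchanged, with each undecodable byte carried through
the ``str`` value as a lone surrogate in the range U+DC80 to U+DCFF::

    # Sample data: GB 18030 encoded text is generally not valid UTF-8.
    raw = "ℙƴ☂ℌøἤ\n".encode("gb18030")


    def passthrough(data: bytes) -> bytes:
        """Decode as UTF-8 and re-encode, as a naive Python 3 filter might."""
        # Undecodable bytes become lone surrogates instead of raising
        # UnicodeDecodeError ...
        text = data.decode("utf-8", errors="surrogateescape")
        # ... and encoding with the same handler restores the original bytes.
        return text.encode("utf-8", errors="surrogateescape")


    escaped = raw.decode("utf-8", errors="surrogateescape")
    # Every non-ASCII code point in the decoded text is an escaped byte.
    assert all("\udc80" <= ch <= "\udcff" for ch in escaped if ord(ch) > 0x7F)
    assert passthrough(raw) == raw  # byte-for-byte identical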

In particular, GB 18030 [12_] is a Chinese national text encoding standard
that covers all Unicode code points. It is formally incompatible with both
ASCII and UTF-8, but will nevertheless often tolerate processing as
surrogate-escaped data: the points where GB 18030 reuses ASCII byte values in
an incompatible way are likely to be invalid in UTF-8, and will therefore be
escaped and opaque to string processing operations that split on or search
for the relevant ASCII code points. Operations that don't involve splitting on
or searching for particular ASCII or Unicode code point values are almost
certain to work correctly.

Similarly, Shift-JIS [13_] and ISO-2022-JP [14_] remain in widespread use in
Japan, and are incompatible with both ASCII and UTF-8, but will tolerate text
processing operations that don't involve splitting on or searching for
particular ASCII or Unicode code point values.
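
As a minimal sketch of that caveat (using Python's bundled ``shift_jis``
codec; the sample text is invented for illustration), the kanji 表 is encoded
in Shift-JIS as ``b"\x95\x5c"``, and ``0x5C`` is also the ASCII backslash.
Splitting the surrogate-escaped text on newlines still round-trips cleanly,
since Shift-JIS never uses ``0x0A`` inside a multi-byte character, but
searching it for a backslash finds a false match::

    # "表" is b"\x95\x5c" in Shift-JIS; 0x5C is also the ASCII backslash.
    data = "表\nC:\\temp\n".encode("shift_jis")
    text = data.decode("utf-8", errors="surrogateescape")

    # Splitting on "\n" is safe: 0x0A never appears inside a Shift-JIS
    # multi-byte character, so each line re-encodes to its original bytes.
    lines = text.split("\n")
    assert [ln.encode("utf-8", "surrogateescape") for ln in lines] == data.split(b"\n")

    # Searching for a backslash is not: the trail byte of "表" matches too.
    print(text.count("\\"))  # 2, although the source text has one backslash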

As an example, consider two files, one encoded with UTF-8 (the default encoding
for ``en_AU.UTF-8``), and one encoded with GB-18030 (the default encoding for
``zh_CN.gb18030``)::

    $ python3 -c 'open("utf8.txt", "wb").write("ℙƴ☂ℌøἤ\n".encode("utf-8"))'
    $ python3 -c 'open("gb18030.txt", "wb").write("ℙƴ☂ℌøἤ\n".encode("gb18030"))'

On disk, we can see that these are two very different files::

    $ python3 -c 'print("UTF-8:  ", open("utf8.txt", "rb").read().strip()); \
                  print("GB18030:", open("gb18030.txt", "rb").read().strip())'
    UTF-8:   b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'
    GB18030: b'\x816\xbd6\x810\x9d0\x817\xa29\x816\xbc4\x810\x8b3\x816\x8d6'

Nevertheless, both can be rendered correctly to the terminal as long as
they're decoded prior to printing::

    $ python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
                  print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())'
    UTF-8:   ℙƴ☂ℌøἤ
    GB18030: ℙƴ☂ℌøἤ

By contrast, if we just pass along the raw bytes, as ``cat`` and similar C/C++
utilities will tend to do::

    $ LANG=en_AU.UTF-8 cat utf8.txt gb18030.txt
    ℙƴ☂ℌøἤ
    �6�6�0�0�7�9�6�4�0�3�6�6

Even setting a specifically Chinese locale won't help in getting the
GB-18030 encoded file rendered correctly::

    $ LANG=zh_CN.gb18030 cat utf8.txt gb18030.txt
    ℙƴ☂ℌøἤ
    �6�6�0�0�7�9�6�4�0�3�6�6

The problem is that the *terminal* encoding setting remains UTF-8, regardless
of the nominal locale. A GB18030 terminal can be emulated using the ``iconv``
utility::

    $ cat utf8.txt gb18030.txt | iconv -f GB18030 -t UTF-8
    鈩櫰粹槀鈩屆羔激
    ℙƴ☂ℌøἤ

This reverses the problem, such that the GB18030 file is rendered correctly,
but the UTF-8 file has been converted to unrelated hanzi characters, rather than
the expected rendering of "Python" as non-ASCII characters.

With the emulated GB18030 terminal encoding, assuming UTF-8 in Python results
in *both* files being displayed incorrectly::

    $ python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
                  print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())' \
      | iconv -f GB18030 -t UTF-8
    UTF-8:   鈩櫰粹槀鈩屆羔激
    GB18030: 鈩櫰粹槀鈩屆羔激

However, setting the locale correctly means that the emulated GB18030 terminal
now displays both files as originally intended::

    $ LANG=zh_CN.gb18030 \
      python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
                  print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())' \
      | iconv -f GB18030 -t UTF-8
    UTF-8:   ℙƴ☂ℌøἤ
    GB18030: ℙƴ☂ℌøἤ

The rationale for retaining ``surrogateescape`` as the default error handler
on the standard streams is that it will preserve the following helpful
behaviour in the ``C`` locale::

    $ cat gb18030.txt \
      | LANG=C python3 -c "import sys; print(sys.stdin.read())" \
      | iconv -f GB18030 -t UTF-8
    ℙƴ☂ℌøἤ

Rather than reverting to the exception currently seen when a UTF-8 based locale is
explicitly configured::

    $ cat gb18030.txt \
      | python3 -c "import sys; print(sys.stdin.read())" \
      | iconv -f GB18030 -t UTF-8
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib64/python3.5/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte

As an added benefit, environments explicitly configured to use one of the
coercion target locales will implicitly gain the encoding transparency behaviour
currently enabled by default in the ``C`` locale.
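
The stream behaviour described above can be reproduced in-process as a
minimal sketch, with ``io.BytesIO`` standing in for the real pipes and the
hypothetical ``filter_stream`` helper playing the role of the ``python3 -c``
one-liner. With ``surrogateescape`` the GB18030 bytes pass through unchanged;
with ``strict`` the read fails much like the traceback shown above::

    import io

    gb18030_bytes = "ℙƴ☂ℌøἤ\n".encode("gb18030")


    def filter_stream(data: bytes, errors: str) -> bytes:
        """Copy ``data`` through UTF-8 text streams using the given handler."""
        # newline="" disables newline translation, so output bytes match input.
        fake_stdin = io.TextIOWrapper(
            io.BytesIO(data), encoding="utf-8", errors=errors, newline="")
        out = io.BytesIO()
        fake_stdout = io.TextIOWrapper(
            out, encoding="utf-8", errors=errors, newline="")
        fake_stdout.write(fake_stdin.read())
        fake_stdout.flush()
        return out.getvalue()


    # surrogateescape: the undecodable bytes are carried through unchanged.
    assert filter_stream(gb18030_bytes, "surrogateescape") == gb18030_bytes

    # strict: reading the "stdin" stream raises UnicodeDecodeError.
    try:
        filter_stream(gb18030_bytes, "strict")
    except UnicodeDecodeError as exc:
        print("strict handler failed:", exc.reason)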


Avoiding setting PYTHONIOENCODING during UTF-8 locale coercion
--------------------------------------------------------------

Rather than changing the default handling of the standard streams during
interpreter initialization, earlier versions of this PEP proposed setting
``PYTHONIOENCODING`` to ``utf-8:surrogateescape``. This turned out to create
a significant compatibility problem: since the ``surrogateescape`` handler
only exists in Python 3.1+, running Python 2.7 processes in subprocesses could
potentially break in a confusing way with that configuration.

The current design means that earlier Python versions will instead retain their
default ``strict`` error handling on the standard streams, while Python 3.7+
will consistently use the more permissive ``surrogateescape`` handler even
when these locales are explicitly configured (rather than being reached through
locale coercion).


Dropping official support for ASCII based text handling in the legacy C locale
------------------------------------------------------------------------------

We've been trying to get strict bytes/text separation to work reliably in the
legacy C locale for over a decade at this point. Not only haven't we been able
to get it to work, neither has anyone else - the only viable alternatives
identified have been to pass the bytes along verbatim without eagerly decoding
them to text (C/C++, Python 2.x, Ruby, etc.), or else to largely ignore the
nominal C/C++ locale encoding and assume the use of either UTF-8 (:pep:`540`,
Rust, Go, Node.js, etc.) or UTF-16-LE (JVM, .NET CLR).

While this PEP ensures that developers who genuinely need to do so can still
opt in to running their Python code in the legacy C locale (by setting
``LC_ALL=C``, ``PYTHONCOERCECLOCALE=0``, or running a custom build that sets
``--without-c-locale-coercion``), it also makes it clear that we *don't*
expect Python 3's Unicode handling to be completely reliable in that
configuration, and the recommended alternative is to use a more appropriate
locale setting (potentially in combination with :pep:`540`'s UTF-8 mode, if that
is available).
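
For debugging, those opt-outs can be applied to a child interpreter. The
following is a minimal sketch (the ``legacy_c_locale_env`` helper is
hypothetical) that sets ``LC_ALL=C`` and ``PYTHONCOERCECLOCALE=0`` in a copy of
the environment and reports which ``LC_CTYPE`` locale the child ends up using.
Other startup behaviour of the child, such as whether :pep:`540`'s UTF-8 mode
is active, may still vary by Python version::

    import os
    import subprocess
    import sys


    def legacy_c_locale_env(base=None):
        """Return a copy of ``base`` (default: os.environ) using the C locale."""
        env = dict(os.environ if base is None else base)
        env["LC_ALL"] = "C"               # explicit C locale; coercion is skipped
        env["PYTHONCOERCECLOCALE"] = "0"  # disable locale coercion
        return env


    if __name__ == "__main__":
        # Ask a child interpreter which LC_CTYPE locale it actually uses.
        child = subprocess.run(
            [sys.executable, "-c",
             "import locale; print(locale.setlocale(locale.LC_CTYPE, None))"],
            env=legacy_c_locale_env(), capture_output=True, text=True, check=True,
        )
        print(child.stdout.strip())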


Providing implicit locale coercion only when running standalone
---------------------------------------------------------------

The major downside of the proposed design in this PEP is that it introduces a
potential discrepancy between the behaviour of the CPython runtime when it is
run as a standalone application and when it is run as an embedded component
inside a larger system (e.g. ``mod_wsgi`` running inside Apache ``httpd``).

Over the course of Python 3.x development, multiple attempts have been made
to improve the handling of incorrect locale settings at the point where the
Python interpreter is initialised. The problem that emerged is that this is
ultimately *too late* in the interpreter startup process - data such as command
line arguments and the contents of environment variables may have already been
retrieved from the operating system and processed under the incorrect ASCII
text encoding assumption well before ``Py_Initialize`` is called.

The problems created by those inconsistencies were then even harder to diagnose
and debug than those created by believing the operating system's claim that
ASCII was a suitable encoding to use for operating system interfaces. This was
the case even for the default CPython binary, let alone larger C/C++
applications that embed CPython as a scripting engine.

The approach proposed in this PEP handles that problem by moving the locale
coercion as early as possible in the interpreter startup sequence when running
standalone: it takes place directly in the C-level ``main()`` function, even
before calling into the ``Py_Main()`` library function that implements the
features of the CPython interpreter CLI.

The ``Py_Initialize`` API then only gains an explicit warning (emitted on
``stderr``) when it detects use of the ``C`` locale, and relies on the
embedding application to specify something more reasonable.

That said, the reference implementation for this PEP adds most of the
functionality to the shared library, with the CLI being updated to
unconditionally call two new private APIs::

    if (_Py_LegacyLocaleDetected()) {
        _Py_CoerceLegacyLocale();
    }

These are similar to other "pre-configuration" APIs intended for embedding
applications: they're designed to be called *before* ``Py_Initialize``, and
hence change the way the interpreter gets initialized.

If these were made public (either as part of this PEP or in a subsequent RFE),
then it would be straightforward for other embedding applications to recreate
the same behaviour as is proposed for the CPython CLI.


Allowing restoration of the legacy behaviour
--------------------------------------------

The CPython command line interpreter is often used to investigate faults that
occur in other applications that embed CPython, and those applications may still
be using the C locale even after this PEP is implemented.

Providing a simple on/off switch for the locale coercion behaviour makes it
much easier to reproduce the behaviour of such applications for debugging
purposes, as well as making it easier to reproduce the behaviour of older 3.x
runtimes even when running a version with this change applied.


Querying LC_CTYPE for C locale detection
----------------------------------------

``LC_CTYPE`` is the actual locale category that CPython relies on to drive the
implicit decoding of environment variables, command line arguments, and other
text values received from the operating system.

As such, it makes sense to check it specifically when attempting to determine
whether or not the current locale configuration is likely to cause Unicode
handling problems.
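
At the Python level, the same query can be sketched with
``locale.setlocale()``: passing ``None`` as the locale only reads the current
``LC_CTYPE`` setting without changing it. The ``legacy_locale_detected``
function below is a hypothetical analogue of the C-level check, not CPython's
actual implementation::

    import locale


    def legacy_locale_detected() -> bool:
        """Return True if LC_CTYPE is the legacy "C" (or "POSIX") locale."""
        # setlocale(category, None) queries the category without changing it.
        current = locale.setlocale(locale.LC_CTYPE, None)
        return current in ("C", "POSIX")


    print(locale.setlocale(locale.LC_CTYPE, None), legacy_locale_detected())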


Explicitly setting LC_CTYPE for UTF-8 locale coercion
-----------------------------------------------------

Python is often used as a glue language, integrating other C/C++ ABI compatible
components in the current process, and components written in arbitrary
languages in subprocesses.

Setting ``LC_CTYPE`` to ``C.UTF-8`` is important to handle cases where the
problem has arisen from a setting like ``LC_CTYPE=UTF-8`` being provided on a
system where no ``UTF-8`` locale is defined (e.g. when a Mac OS X ssh client is
configured to forward locale settings, and the user logs into a Linux server).

This should be sufficient to ensure that when the locale coercion is activated,
the switch to the UTF-8 based locale will be applied consistently across the
current process and any subprocesses that inherit the current environment.


Avoiding setting LANG for UTF-8 locale coercion
-----------------------------------------------

Earlier versions of this PEP proposed setting ``LANG`` (the
category-independent default locale), in addition to setting ``LC_CTYPE``.

This was later removed on the grounds that setting only ``LC_CTYPE`` is
sufficient to handle all of the problematic scenarios that the PEP aimed
to resolve, while setting ``LANG`` as well would break cases where ``LANG``
was set correctly, and the locale problems were solely due to an incorrect
``LC_CTYPE`` setting ([22_]).

For example, consider a Python application that called the Linux ``date``
utility in a subprocess rather than doing its own date formatting::

    $ LANG=ja_JP.UTF-8 LC_CTYPE=C date
    2017年  5月 23日 火曜日 17:31:03 JST

    $ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing only LC_CTYPE
    2017年  5月 23日 火曜日 17:32:58 JST

    $ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing both of LC_CTYPE and LANG
    Tue May 23 17:31:10 JST 2017

With only ``LC_CTYPE`` updated in the Python process, the subprocess would
continue to behave as expected. However, if ``LANG`` was updated as well,
that would also change the ``LC_TIME`` setting that the subprocess derives
from ``LANG``, and the wrong date formatting conventions would be used.


Avoiding setting LC_ALL for UTF-8 locale coercion
-------------------------------------------------

Earlier versions of this PEP proposed setting the ``LC_ALL`` locale override,
in addition to setting ``LC_CTYPE``.

This was changed after it was determined that just setting ``LC_CTYPE``
should be sufficient to handle all the scenarios the PEP aims to cover, and
that not setting ``LC_ALL`` avoids causing problems in cases like the
following::

    $ LANG=C LC_MONETARY=ja_JP.utf8 ./python -c \
      "from locale import setlocale, LC_ALL, currency; setlocale(LC_ALL, ''); print(currency(1e6))"
    ￥1000000


Skipping locale coercion if LC_ALL is set in the current environment
--------------------------------------------------------------------

With locale coercion now only setting ``LC_CTYPE``, it will have no effect if
``LC_ALL`` is also set. To avoid emitting a spurious locale
coercion notice in that case, coercion is instead skipped entirely.
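
Taken together, the rules in the preceding sections can be summarised in a
minimal Python sketch. The ``coerce_legacy_locale`` function is hypothetical
(CPython applies these rules in C, before the interpreter starts), and
``TARGET_LOCALES`` assumes the three coercion target locales are ``C.UTF-8``,
``C.utf8`` and ``UTF-8``, tried in that order::

    import locale
    import os

    # Assumed candidate targets; availability depends on the installed locales.
    TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


    def coerce_legacy_locale(environ):
        """Apply the first available target locale; return its name or None."""
        if environ.get("LC_ALL"):
            return None  # LC_ALL would override LC_CTYPE anyway: skip entirely
        if locale.setlocale(locale.LC_CTYPE, None) not in ("C", "POSIX"):
            return None  # not the legacy C locale: nothing to do
        for name in TARGET_LOCALES:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this locale is not available on the system
            # Export only LC_CTYPE so subprocesses inherit the coerced setting;
            # LANG and LC_ALL are deliberately left untouched.
            environ["LC_CTYPE"] = name
            return name
        return None


    if __name__ == "__main__":
        print(coerce_legacy_locale(dict(os.environ)))

Because ``setlocale()`` changes process-wide state, a real caller would do
this once, early in startup, before any other threads are running.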


Considering locale coercion independently of "UTF-8 mode"
---------------------------------------------------------

With both this PEP's locale coercion and :pep:`540`'s UTF-8 mode under
consideration for Python 3.7, it makes sense to ask whether or not we can
limit ourselves to only doing one or the other, rather than making both
changes.

The UTF-8 mode proposed in :pep:`540` has two major limitations that make it a
potential complement to this PEP rather than a potential replacement.

First, unlike this PEP, :pep:`540`'s UTF-8 mode makes it possible to change default
behaviours that are not currently configurable at all. While that's exactly
what makes the proposal interesting, it's also what makes it an entirely
unproven approach. By contrast, the approach proposed in this PEP builds
directly atop existing configuration settings for the C locale system
(``LC_CTYPE``, ``LANG``) and Python's standard streams (``PYTHONIOENCODING``)
that have already been in use for years to handle the kinds of compatibility
problems discussed in this PEP.

Secondly, one of the things we know based on that experience is that the
proposed locale coercion can resolve problems not only in CPython itself,
but also in extension modules that interact with the standard streams, like
GNU readline. As an example, consider the following interactive session
from a :pep:`538` enabled CPython build, where each line after the first is
executed by doing "up-arrow, left-arrow x4, delete, enter"::

    $ LANG=C ./python
    Python 3.7.0a0 (heads/pep538-coerce-c-locale:188e780, May  7 2017, 00:21:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌἤ")
    ℙƴ☂ℌἤ
    >>> print("ℙƴ☂ἤ")
    ℙƴ☂ἤ
    >>> print("ℙƴἤ")
    ℙƴἤ
    >>> print("ℙἤ")
    ℙἤ
    >>> print("ἤ")
    ἤ
    >>>

This is exactly what we'd expect from a well-behaved command history editor.

By contrast, the following is what currently happens on an older release if
you only change the Python level stream encoding settings without updating the
locale settings::

    $ LANG=C PYTHONIOENCODING=utf-8:surrogateescape python3
    Python 3.5.3 (default, Apr 24 2017, 13:32:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌ<E29882>")
     File "<stdin>", line 0

       ^
    SyntaxError: 'utf-8' codec can't decode bytes in position 20-21:
    invalid continuation byte

That particular misbehaviour is coming from GNU readline, *not* CPython -
because the command history editing wasn't UTF-8 aware, it corrupted the history
buffer and fed such nonsense to stdin that even the surrogateescape error
handler was bypassed. While :pep:`540`'s UTF-8 mode could technically be updated
to also reconfigure readline, that's just *one* extension module that might
be interacting with the standard streams without going through the CPython
C API, and any change made by CPython would only apply when readline is running
directly as part of Python 3.7 rather than in a separate subprocess.

However, if we actually change the configured locale, GNU readline starts
behaving itself, without requiring any changes to the embedding application::

    $ LANG=C.UTF-8 python3
    Python 3.5.3 (default, Apr 24 2017, 13:32:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌἤ")
    ℙƴ☂ℌἤ
    >>> print("ℙƴ☂ἤ")
    ℙƴ☂ἤ
    >>> print("ℙƴἤ")
    ℙƴἤ
    >>> print("ℙἤ")
    ℙἤ
    >>> print("ἤ")
    ἤ
    >>>
    $ LC_CTYPE=C.UTF-8 python3
    Python 3.5.3 (default, Apr 24 2017, 13:32:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌἤ")
    ℙƴ☂ℌἤ
    >>> print("ℙƴ☂ἤ")
    ℙƴ☂ἤ
    >>> print("ℙƴἤ")
    ℙƴἤ
    >>> print("ℙἤ")
    ℙἤ
    >>> print("ἤ")
    ἤ
    >>>


Enabling C locale coercion and warnings on Mac OS X, iOS and Android
--------------------------------------------------------------------

On Mac OS X, iOS, and Android, CPython already assumes the use of UTF-8 for
system interfaces, and we expect most other locale-aware components to do the
same.

Accordingly, this PEP originally proposed to disable locale coercion and
warnings at build time for these platforms, on the assumption that it would
be entirely redundant.

However, that assumption turned out to be incorrect, as subsequent
investigations showed that if you explicitly configure ``LANG=C`` on
these platforms, extension modules like GNU readline will misbehave in much the
same way as they do on other \*nix systems. [21_]

In addition, Mac OS X is also frequently used as a development and testing
platform for Python software intended for deployment to other \*nix environments
(such as Linux or Android), and Linux is similarly often used as a development
and testing platform for mobile and Mac OS X applications.

Accordingly, this PEP enables the locale coercion and warning features by
default on all platforms that use CPython's ``autotools`` based build toolchain
(i.e. everywhere other than Windows).


Implementation
==============

The reference implementation is being developed in the
``pep538-coerce-c-locale`` feature branch [18_] in Nick Coghlan's fork of the
CPython repository on GitHub. A work-in-progress PR is available at [20_].

This reference implementation covers not only the enhancement request in
issue 28180 [1_], but also the Android compatibility fixes needed to resolve
issue 28997 [16_].


Backporting to earlier Python 3 releases
========================================

Backporting to Python 3.6.x
---------------------------

If this PEP is accepted for Python 3.7, redistributors backporting the change
specifically to their initial Python 3.6.x release will be both allowed and
encouraged. However, such backports should only be undertaken either in
conjunction with the changes needed to also provide a suitable locale by
default, or else specifically for platforms where such a locale is already
consistently available.

At least the Fedora project is planning to pursue this approach for the
upcoming Fedora 26 release [19_].


Backporting to other 3.x releases
---------------------------------

While the proposed behavioural change is seen primarily as a bug fix addressing
Python 3's current misbehaviour in the default ASCII-based C locale, it still
represents a reasonably significant change in the way CPython interacts with
the C locale system. As such, while some redistributors may still choose to
backport it to even earlier Python 3.x releases based on the needs and
interests of their particular user base, this wouldn't be encouraged as a
general practice.

However, configuring Python 3 *environments* (such as base container
images) to use these configuration settings by default is both allowed
and recommended.


Acknowledgements
================

The locale coercion approach proposed in this PEP is inspired directly by
Armin Ronacher's handling of this problem in the ``click`` command line
utility development framework [2_]::

    $ LANG=C python3 -c 'import click; cli = click.command()(lambda:None); cli()'
    Traceback (most recent call last):
      ...
    RuntimeError: Click will abort further execution because Python 3 was
    configured to use ASCII as encoding for the environment.  Either run this
    under Python 2 or consult http://click.pocoo.org/python3/ for mitigation
    steps.

    This system supports the C.UTF-8 locale which is recommended.
    You might be able to resolve your issue by exporting the
    following environment variables:

        export LC_ALL=C.UTF-8
        export LANG=C.UTF-8

The change was originally proposed as a downstream patch for Fedora's
system Python 3.6 package [3_], and then reformulated as a PEP for Python 3.7
with a section allowing for backports to earlier versions by redistributors.
In parallel with the development of the upstream patch, Charalampos Stratakis
has been working on the Fedora 26 backport and providing feedback on the
practical viability of the proposed changes.

The initial draft was posted to the Python Linux SIG for discussion [10_] and
then amended based on both that discussion and Victor Stinner's work in
:pep:`540` [11_].

The "ℙƴ☂ℌøἤ" string used in the Unicode handling examples throughout this PEP
is taken from Ned Batchelder's excellent "Pragmatic Unicode" presentation [9_].

Stephen Turnbull has long provided valuable insight into the text encoding
handling challenges he regularly encounters at the University of Tsukuba
(筑波大学).


References
==========

.. [1] CPython: sys.getfilesystemencoding() should default to utf-8
   (http://bugs.python.org/issue28180)

.. [2] Locale configuration required for click applications under Python 3
   (http://click.pocoo.org/5/python3/#python-3-surrogate-handling)

.. [3] Fedora: force C.UTF-8 when Python 3 is run under the C locale
   (https://bugzilla.redhat.com/show_bug.cgi?id=1404918)

.. [4] GNU C: How Programs Set the Locale
   (https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html)

.. [5] GNU C: Locale Categories
   (https://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html)

.. [6] glibc C.UTF-8 locale proposal
   (https://sourceware.org/glibc/wiki/Proposals/C.UTF-8)

.. [7] GNOME Flatpak
   (http://flatpak.org/)

.. [8] Ubuntu Snappy
   (https://www.ubuntu.com/desktop/snappy)

.. [9] Pragmatic Unicode
   (http://nedbatchelder.com/text/unipain.html)

.. [10] linux-sig discussion of initial PEP draft
   (https://mail.python.org/pipermail/linux-sig/2017-January/000014.html)

.. [11] Feedback notes from linux-sig discussion and PEP 540
   (https://github.com/python/peps/issues/171)

.. [12] GB 18030
   (https://en.wikipedia.org/wiki/GB_18030)

.. [13] Shift-JIS
   (https://en.wikipedia.org/wiki/Shift_JIS)

.. [14] ISO-2022
   (https://en.wikipedia.org/wiki/ISO/IEC_2022)

.. [15] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale
   (https://bugs.python.org/issue19977)

.. [16] test_readline.test_nonascii fails on Android
   (http://bugs.python.org/issue28997)

.. [17] UTF-8 locale discussion on "locale.getdefaultlocale() fails on Mac OS X with default language set to English"
   (http://bugs.python.org/issue18378#msg215215)

.. [18] GitHub branch diff for ``ncoghlan:pep538-coerce-c-locale``
   (https://github.com/python/cpython/compare/master...ncoghlan:pep538-coerce-c-locale)

.. [19] Fedora 26 change proposal for locale coercion backport
   (https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale)

.. [20] GitHub pull request for the reference implementation
   (https://github.com/python/cpython/pull/659)

.. [21] GNU readline misbehaviour on Mac OS X with ``LANG=C``
   (https://mail.python.org/pipermail/python-dev/2017-May/147897.html)

.. [22] Potential problems when setting LANG in addition to setting LC_CTYPE
   (https://mail.python.org/pipermail/python-dev/2017-May/147968.html)


Copyright
=========

This document has been placed in the public domain under the terms of the
CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
-												PEP 538: coerce legacy C locale to C.UTF-8

											
										
										
											2016-12-27 21:31:21 -05:00
+								PEP: 538
-												PEP 538: Fix title and add new post date

											
										
										
											2017-03-05 02:35:19 -05:00
+								Title: Coercing the legacy C locale to a UTF-8 based locale
-												PEP 538: coerce legacy C locale to C.UTF-8

											
										
										
											2016-12-27 21:31:21 -05:00
+								Version: $Revision$
 								Last-Modified: $Date$
 								Author: Nick Coghlan <ncoghlan@gmail.com>
-												Update BDFL delegations for PEPs 538 & 540

- Barry stepping down due to lack of time
- Naoki Inada will handle both PEPs due to
  the significant overlap between them

											
										
										
											2017-04-24 00:33:34 -04:00
+								BDFL-Delegate: INADA Naoki
-												PEP 538 implementation has been merged

											
										
										
											2017-06-10 23:17:59 -04:00
+								Status: Final
-												PEP 538: coerce legacy C locale to C.UTF-8

											
										
										
											2016-12-27 21:31:21 -05:00
+								Type: Standards Track
 								Content-Type: text/x-rst
 								Created: 28-Dec-2016
 								Python-Version: 3.7
-												Several PEPs: Normalise `Post-History` (#2375)


											
										
										
											2022-03-09 11:04:44 -05:00
+								Post-History: 03-Jan-2017,
 -Jan-2017,
 -Mar-2017,
 -May-2017
-												Mark PEP 538 as Accepted

											
										
										
											2017-05-28 02:53:44 -04:00
+								Resolution: https://mail.python.org/pipermail/python-dev/2017-May/148035.html
-												PEP 538: coerce legacy C locale to C.UTF-8

											
										
										
											2016-12-27 21:31:21 -05:00
 								Abstract
 								========
 								An ongoing challenge with Python 3 on \*nix systems is the conflict between
 								needing to use the configured locale encoding by default for consistency with
-												pep-0538: rephrase UTF-8 locale description (#249)


											
										
										
											2017-05-01 02:26:50 -04:00
+								other locale-aware components in the same process or subprocesses,
-												PEP 538: Update to depend on PEP 540

- relies entirely on PEP 540 when no appropriate locale
  is available
- uses surrogateescape on standard streams by default
- accounts for BSD-style UTF-8 locales
- avoids any reliance on the en_US-UTF-8 locale
- makes note of related GNU readline issue on Android

											
										
										
											2017-01-20 09:13:24 -05:00
+								and the fact that the standard C locale (as defined in POSIX:2001) typically
 								implies a default text encoding of ASCII, which is entirely inadequate for the
-												PEP 538: update for PEP 540 & linux-sig feedback

- PYTHONALLOWCLOCALE=1 -> PYTHONCOERCECLOCALE=0
- reword the proposed library warning
- try all of C.UTF-8, c.utf8 and en_US.UTF-8
- compare and contrast with PEP 540
- new Motivation section showing specific Docker problems
- discuss implications of "strict" error handling
- define configure options to turn the new behaviour off

											
										
										
											2017-01-07 02:04:39 -05:00
+								development of networked services and client applications in a multilingual
 								world.
-												Several PEPs: Use explicit `:pep:` and `:rfc:` roles (#2209)


											
										
										
											2022-01-21 06:03:51 -05:00
+								:pep:`540` proposes a change to CPython's handling of the legacy C locale such
-												PEP 538: Update to depend on PEP 540

- relies entirely on PEP 540 when no appropriate locale
  is available
- uses surrogateescape on standard streams by default
- accounts for BSD-style UTF-8 locales
- avoids any reliance on the en_US-UTF-8 locale
- makes note of related GNU readline issue on Android

											
										
										
											2017-01-20 09:13:24 -05:00
+								that CPython will assume the use of UTF-8 in such environments, rather than
 								persisting with the demonstrably problematic assumption of ASCII as an
 								appropriate encoding for communicating with operating system interfaces.
-												PEP 538: Update reference implementation (#219)

- updates reference implementation to use PYTHONCOERCECLOCALE
- removes hard dependency on PEP 540
- still notes PEP 540 covers case where no relevant C-with-UTF-8
  locale is available
- clarifies that these settings are still recommended over the
  legacy C locale settings for older Python 3 versions, even if
  we don't recommend backporting the automatic coercion
											
										
										
											2017-03-05 02:29:54 -05:00
+								This is a good approach for cases where network encoding interoperability
 								is a more important concern than local encoding interoperability.
-												PEP 538: Update to depend on PEP 540

- relies entirely on PEP 540 when no appropriate locale
  is available
- uses surrogateescape on standard streams by default
- accounts for BSD-style UTF-8 locales
- avoids any reliance on the en_US-UTF-8 locale
- makes note of related GNU readline issue on Android

											
										
										
											2017-01-20 09:13:24 -05:00
 								However, it comes at the cost of making CPython's encoding assumptions diverge
-												PEP 538: update for python-dev & implementation feedback

- PYTHONCOERCECLOCALE=0 now also disables the library warning
- PEP just refers to locale-aware/locale-independent components,
  without specifically limiting that to C/C++ components

											
										
										
											2017-03-13 01:06:48 -04:00
+								from those of other locale-aware components in the same process, as well as
 								those of components running in subprocesses that share the same environment.
-												PEP 538: Update to depend on PEP 540

This can cause interoperability problems with some extension modules (such as
GNU readline's command line history editing), as well as with components
running in subprocesses (such as older Python runtimes).

It also requires non-trivial changes to the internals of how CPython itself
works, rather than relying primarily on existing configuration settings that
are supported by Python versions prior to Python 3.7.

Accordingly, this PEP proposes that independently of the UTF-8 mode proposed
in :pep:`540`, the way the CPython implementation handles the default C locale be
changed to be roughly equivalent to the following existing configuration
settings (supported since Python 3.1)::

    LC_CTYPE=C.UTF-8
    PYTHONIOENCODING=utf-8:surrogateescape

The exact target locale for coercion will be chosen from a predefined list at
runtime based on the actually available locales.

The reinterpreted locale settings will be written back to the environment so
they're visible to other components in the same process and in subprocesses,
but the changed ``PYTHONIOENCODING`` default will be made implicit in order to
avoid causing compatibility problems with Python 2 subprocesses that don't
provide the ``surrogateescape`` error handler.

The new legacy locale coercion behavior can be disabled either by setting
``LC_ALL`` (which may still lead to a Unicode compatibility warning) or by
setting the new ``PYTHONCOERCECLOCALE`` environment variable to ``0``.
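The pre-3.7 equivalent described above can be reproduced by launching a child interpreter with those two settings. The sketch below is an illustration, not part of the reference implementation. ``run_with_legacy_equivalent`` is a hypothetical helper name. The sketch does not assume that ``C.UTF-8`` is installed: if it is missing, the child's ``setlocale`` call simply fails and ``LC_CTYPE`` stays ``C``, while ``PYTHONIOENCODING`` still applies. It also removes ``LC_ALL`` from the inherited environment, because that variable would otherwise take precedence over ``LC_CTYPE``.

```python
import os
import subprocess
import sys

# The two pre-3.7 settings this PEP treats as roughly equivalent to coercion.
LEGACY_EQUIVALENT = {
    "LC_CTYPE": "C.UTF-8",
    "PYTHONIOENCODING": "utf-8:surrogateescape",
}


def run_with_legacy_equivalent(code):
    """Run `code` in a child interpreter using the settings above.

    Hypothetical helper, for illustration only.
    """
    env = dict(os.environ)
    env.pop("LC_ALL", None)  # LC_ALL would take precedence over LC_CTYPE
    env.update(LEGACY_EQUIVALENT)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # The child only prints ASCII text here, so decoding is unambiguous.
    return result.stdout.decode("ascii")


if __name__ == "__main__":
    print(run_with_legacy_equivalent(
        "import sys; print(sys.stdout.encoding, sys.stdout.errors)"
    ))
```

The child should report a UTF-8 standard stream encoding with the ``surrogateescape`` error handler, whatever the parent's own locale happens to be.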

With this change, any \*nix platform that does *not* offer at least one of the
``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
configuration would only be considered a fully supported platform for CPython
3.7+ deployments when a suitable locale other than the default ``C`` locale is
configured explicitly (e.g. ``en_AU.UTF-8``, ``zh_CN.gb18030``). If :pep:`540` is
accepted in addition to this PEP, then pure Python modules would also be
supported when using the proposed ``PYTHONUTF8`` mode, but expectations for
full Unicode compatibility in extension modules would continue to be limited
to the platforms covered by this PEP.
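:pep:`540`'s UTF-8 mode did ship in Python 3.7, controlled by the ``PYTHONUTF8`` environment variable (and the ``-X utf8`` option). The hedged sketch below assumes a Python 3.7+ interpreter. It asks a child process to report ``sys.flags.utf8_mode`` and ``sys.getfilesystemencoding()``. It says nothing about the C-level ``LC_CTYPE`` locale that extension modules see, and that locale is not changed by UTF-8 mode alone. ``utf8_mode_report`` is a hypothetical helper.

```python
import os
import subprocess
import sys

# Code run in the child: report the UTF-8 mode flag and filesystem encoding.
REPORT = "import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())"


def utf8_mode_report(enabled):
    """Return (utf8_mode flag, filesystem encoding) from a child interpreter."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1" if enabled else "0"
    result = subprocess.run(
        [sys.executable, "-c", REPORT],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    flag, fs_encoding = result.stdout.decode("ascii").split()
    return int(flag), fs_encoding


if __name__ == "__main__":
    print(utf8_mode_report(True))
    print(utf8_mode_report(False))
```

With UTF-8 mode enabled, the filesystem encoding is ``utf-8`` on every platform. With it disabled, the result depends on the platform and the active locale.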

As it only reflects a change in default settings rather than a fundamentally
new capability, redistributors (such as Linux distributions) with a narrower
target audience than the upstream CPython development team may also choose to
opt in to this locale coercion behaviour for the Python 3.6.x series by
applying the necessary changes as a downstream patch.

Implementation Notes
====================

Attempting to implement the PEP as originally accepted showed that the
proposal to emit locale coercion and compatibility warnings by default
simply wasn't practical (there were too many cases where previously working
code failed *because of the warnings*, rather than because of latent locale
handling defects in the affected code).

As a result, the ``PY_WARN_ON_C_LOCALE`` config flag was removed, and replaced
with a runtime ``PYTHONCOERCECLOCALE=warn`` environment variable setting
that allows developers and system integrators to opt in to receiving locale
coercion and compatibility warnings, without emitting them by default.
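The resulting interpretation of the environment can be summarised as a small decision function. This is a simplified model written for illustration, not CPython's C implementation. It follows the documented rules: coercion is skipped when ``PYTHONCOERCECLOCALE`` is ``0`` or when ``LC_ALL`` is set, only the ``C`` and ``POSIX`` ``LC_CTYPE`` locales count as legacy, and the value ``warn`` opts in to warnings. It does not model exactly which warning text is printed. ``decide_coercion`` and ``CoercionDecision`` are hypothetical names.

```python
from typing import Mapping, NamedTuple


class CoercionDecision(NamedTuple):
    coerce: bool  # switch LC_CTYPE to a UTF-8 based target locale
    warn: bool    # emit locale coercion/compatibility warnings on stderr


# LC_CTYPE locale names treated as the legacy ASCII-based C locale.
LEGACY_LOCALES = ("C", "POSIX")


def decide_coercion(env: Mapping[str, str], ctype_locale: str) -> CoercionDecision:
    """Simplified model of how PYTHONCOERCECLOCALE is interpreted."""
    setting = env.get("PYTHONCOERCECLOCALE", "")
    if setting == "0" or ctype_locale not in LEGACY_LOCALES:
        # Explicit opt-out, or the environment already selects a real locale.
        return CoercionDecision(coerce=False, warn=False)
    # Warnings are opt-in ("warn"); they are no longer emitted by default.
    warn = setting == "warn"
    if env.get("LC_ALL"):
        # LC_ALL overrides LC_CTYPE, so changing LC_CTYPE would have no effect.
        return CoercionDecision(coerce=False, warn=warn)
    return CoercionDecision(coerce=True, warn=warn)
```

For example, ``decide_coercion({}, "C")`` requests coercion without warnings. Adding ``LC_ALL=C`` to the mapping turns coercion off, but still honours ``warn``.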

The output examples in the PEP itself have also been updated to remove
the warnings and make them easier to read.

Background
==========

While the CPython interpreter is starting up, it may need to convert from
the ``char *`` format to the ``wchar_t *`` format, or from one of those formats
to ``PyUnicodeObject *``, in a way that's consistent with the locale settings
of the overall system. It handles these cases by relying on the operating
system to do the conversion and then ensuring that the text encoding name
reported by ``sys.getfilesystemencoding()`` matches the encoding used during
this early bootstrapping process.
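The outcome of that bootstrapping step is visible from Python code. The sketch below is shown for illustration only, and its helper names are hypothetical. It reports the encoding and error handler CPython settled on. It then uses ``os.fsdecode`` and ``os.fsencode``, which apply that same pair, to show that an undecodable byte survives a decode/encode round trip on POSIX systems, where the filesystem error handler is ``surrogateescape``.

```python
import os
import sys


def filesystem_codec():
    """Return the (encoding, errors) pair chosen during interpreter startup."""
    return sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()


def fs_roundtrip(raw: bytes) -> bytes:
    """Decode then re-encode `raw` with the filesystem codec."""
    # os.fsdecode/os.fsencode use the same encoding and error handler
    # that filesystem_codec() reports.
    return os.fsencode(os.fsdecode(raw))


if __name__ == "__main__":
    print(filesystem_codec())
    if os.name == "posix":
        raw = b"caf\xff"  # 0xff is invalid in both ASCII and UTF-8
        # ascii() avoids printing a lone surrogate to a strict stdout.
        print(ascii(os.fsdecode(raw)), fs_roundtrip(raw) == raw)
```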
										
On Windows, the limitations of the ``mbcs`` format used by default in these
conversions proved sufficiently problematic that :pep:`528` and :pep:`529` were
implemented to bypass the operating system supplied interfaces for binary data
handling and force the use of UTF-8 instead.

On Mac OS X, iOS, and Android, many components, including CPython, already
assume the use of UTF-8 as the system encoding, regardless of the locale
setting. However, this isn't the case for all components, and the discrepancy
can cause problems in some situations (for example, when using the GNU readline
module [16_]).

On non-Apple and non-Android \*nix systems, these operations are handled using
the C locale system in glibc, which has the following characteristics [4_]:

* by default, all processes start in the ``C`` locale, which uses ``ASCII``
  for these conversions. This is almost never what anyone doing multilingual
  text processing actually wants (including CPython and C/C++ GUI frameworks).
* calling ``setlocale(LC_ALL, "")`` reconfigures the active locale based on
  the locale categories configured in the current process environment
* if the locale requested by the current environment is unknown, or no specific
  locale is configured, then the default ``C`` locale will remain active

The specific locale category that covers the APIs that CPython depends on is
``LC_CTYPE``, which applies to "classification and conversion of characters,
and to multibyte and wide characters" [5_]. Accordingly, CPython includes the
following key calls to ``setlocale``:

* in the main ``python`` binary, CPython calls ``setlocale(LC_ALL, "")`` to
  configure the entire C locale subsystem according to the process environment.
  It does this prior to making any calls into the shared CPython library
* in ``Py_Initialize``, CPython calls ``setlocale(LC_CTYPE, "")``, such that
  the configured locale settings for that category *always* match those set in
  the environment. It does this unconditionally, and it *doesn't* revert the
  process state change in ``Py_Finalize``
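Python's ``locale`` module wraps the same ``setlocale`` and ``nl_langinfo`` C functions, so the behaviour described above can be observed directly. In this illustrative sketch, which uses hypothetical helper names:

* ``codeset_for`` temporarily switches ``LC_CTYPE`` and reports the codeset that the C library associates with it.
* ``category_snapshot`` shows that, after interpreter startup, ``LC_CTYPE`` reflects the environment while other categories are typically left at ``C``.

The sketch assumes a Unix-like system, because ``nl_langinfo`` is not available on Windows. Note also that where the C function would return ``NULL`` for an unknown locale, Python raises ``locale.Error`` instead.

```python
import locale


def codeset_for(name):
    """Return the C library codeset for LC_CTYPE locale `name`.

    Passing "" asks setlocale() to read the process environment, mirroring
    setlocale(LC_CTYPE, ""). The previous LC_CTYPE setting is restored.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        # Raises locale.Error if the C library does not know `name`.
        locale.setlocale(locale.LC_CTYPE, name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def category_snapshot():
    """Report the current locale for a few categories without changing them."""
    categories = {
        "LC_CTYPE": locale.LC_CTYPE,
        "LC_COLLATE": locale.LC_COLLATE,
        "LC_NUMERIC": locale.LC_NUMERIC,
    }
    return {label: locale.setlocale(cat) for label, cat in categories.items()}


if __name__ == "__main__":
    # On glibc, the "C" locale typically reports "ANSI_X3.4-1968" (ASCII).
    print(codeset_for("C"))
    print(category_snapshot())
```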

(This summary of the locale handling omits several technical details related
to exactly where and when the text encoding declared as part of the locale
settings is used - see :pep:`540` for further discussion, as these particular
details matter more when decoupling CPython from the declared C locale than
they do when overriding the locale with one based on UTF-8)

These calls are usually sufficient to provide sensible behaviour, but they can
still fail in the following cases:

* SSH environment forwarding means that SSH clients may sometimes forward
  client locale settings to servers that don't have that locale installed. This
  leads to CPython running in the default ASCII-based C locale
* some process environments (such as Linux containers) may not have any
  explicit locale configured at all. As with unknown locales, this leads to
  CPython running in the default ASCII-based C locale
* on Android, rather than configuring the locale based on environment variables,
  the empty locale ``""`` is treated as specifically requesting the ``"C"``
  locale

The simplest way to deal with this problem for currently released versions of
CPython is to explicitly set a more sensible locale when launching the
application. For example::

    LC_CTYPE=C.UTF-8 python3 ...

The ``C.UTF-8`` locale is a full locale definition that uses ``UTF-8`` for the
``LC_CTYPE`` category, and the same settings as the ``C`` locale for all other
categories (including ``LC_COLLATE``). It is offered by a number of Linux
distributions (including Debian, Ubuntu, Fedora, Alpine and Android) as an
alternative to the ASCII-based C locale. Some other platforms (such as
``HP-UX``) offer an equivalent locale definition under the name ``C.utf8``.

Mac OS X and other \*BSD systems have taken a different approach: instead of
offering a ``C.UTF-8`` locale, they offer a partial ``UTF-8`` locale that only
defines the ``LC_CTYPE`` category. On such systems, the preferred
environmental locale adjustment is to set ``LC_CTYPE=UTF-8`` rather than to set
``LC_ALL`` or ``LANG``. [17_]
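Because the candidate names differ between platforms, a runtime probe is a natural way to pick one. The sketch below is an illustration under stated assumptions rather than CPython's own selection code:

* It tries the three names mentioned in this PEP (``C.UTF-8``, ``C.utf8`` and ``UTF-8``) in order with ``locale.setlocale``.
* It treats ``locale.Error`` as meaning the locale is not installed.
* It restores the previous ``LC_CTYPE`` setting before returning.

Some C libraries may accept locale names liberally, so a successful call is best read as a hint. ``first_available_locale`` is a hypothetical helper.

```python
import locale
from typing import Optional, Sequence

# Names mentioned in the PEP: C.UTF-8 (many Linux distributions), C.utf8
# (e.g. HP-UX) and the partial UTF-8 locale of Mac OS X and other *BSDs.
TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def first_available_locale(candidates: Sequence[str] = TARGET_LOCALES) -> Optional[str]:
    """Return the first candidate accepted for LC_CTYPE, or None."""
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not available on this system
            return name
        return None
    finally:
        # Leave the process locale exactly as we found it.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print(first_available_locale())
```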

In the specific case of Docker containers and similar technologies, the
appropriate locale setting can be specified directly in the container image
definition.

Another common failure case is developers specifying ``LANG=C`` in order to
see otherwise translated user interface messages in English, rather than the
more narrowly scoped ``LC_MESSAGES=C`` or ``LANGUAGE=en``.
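When a child process is launched from Python, the narrower setting can be applied to just that child's environment. This is a minimal sketch with a hypothetical helper name. It assumes conventional POSIX locale handling, in which ``LC_MESSAGES`` only affects message translation. It leaves ``LANG`` and ``LC_CTYPE`` untouched, and with them the text encoding. Because an ``LC_ALL`` setting would still override ``LC_MESSAGES``, the helper reports that case rather than silently changing it.

```python
from typing import Dict, Mapping


def english_messages_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `env` that requests untranslated messages.

    Unlike LANG=C, setting LC_MESSAGES=C leaves LC_CTYPE (and hence the
    text encoding used by locale-aware components) unchanged.
    """
    if env.get("LC_ALL"):
        # LC_ALL takes precedence over every LC_* category, including
        # LC_MESSAGES, so the caller has to decide how to handle it.
        raise ValueError("LC_ALL is set and would override LC_MESSAGES")
    result = dict(env)
    result["LC_MESSAGES"] = "C"
    return result
```

The returned mapping can be passed as the ``env`` argument of ``subprocess.run``. The caller's own environment is not modified.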

- PYTHONALLOWCLOCALE=1 -> PYTHONCOERCECLOCALE=0
- reword the proposed library warning
- try all of C.UTF-8, c.utf8 and en_US.UTF-8
- compare and contrast with PEP 540
- new Motivation section showing specific Docker problems
- discuss implications of "strict" error handling
- define configure options to turn the new behaviour off

											
										
										
											2017-01-07 02:04:39 -05:00
 								Relationship with other PEPs
 								============================
-												Several PEPs: Use explicit `:pep:` and `:rfc:` roles (#2209)


											
										
										
											2022-01-21 06:03:51 -05:00
+								This PEP shares a common problem statement with :pep:`540` (improving Python 3's
-												PEP 538: Update reference implementation (#219)

- updates reference implementation to use PYTHONCOERCECLOCALE
- removes hard dependency on PEP 540
- still notes PEP 540 covers case where no relevant C-with-UTF-8
  locale is available
- clarifies that these settings are still recommended over the
  legacy C locale settings for older Python 3 versions, even if
  we don't recommend backporting the automatic coercion
											
										
										
											2017-03-05 02:29:54 -05:00
+								behaviour in the default C locale), but diverges markedly in the proposed
-												PEP 538: update for PEP 540 & linux-sig feedback

- PYTHONALLOWCLOCALE=1 -> PYTHONCOERCECLOCALE=0
- reword the proposed library warning
- try all of C.UTF-8, c.utf8 and en_US.UTF-8
- compare and contrast with PEP 540
- new Motivation section showing specific Docker problems
- discuss implications of "strict" error handling
- define configure options to turn the new behaviour off

											
										
										
											2017-01-07 02:04:39 -05:00
+								solution:
-												Several PEPs: Use explicit `:pep:` and `:rfc:` roles (#2209)


											
										
										
											2022-01-21 06:03:51 -05:00
+								* :pep:`540` proposes to entirely decouple CPython's default text encoding from
-												PEP 538: update for PEP 540 & linux-sig feedback

- PYTHONALLOWCLOCALE=1 -> PYTHONCOERCECLOCALE=0
- reword the proposed library warning
- try all of C.UTF-8, c.utf8 and en_US.UTF-8
- compare and contrast with PEP 540
- new Motivation section showing specific Docker problems
- discuss implications of "strict" error handling
- define configure options to turn the new behaviour off

											
										
										
											2017-01-07 02:04:39 -05:00
+								  the C locale system in that case, allowing text handling inconsistencies to
-												Fix a couple of typos.

											
										
										
											2017-03-28 18:16:44 -04:00
+								  arise between CPython and other locale-aware components running in the same
  process and in subprocesses. This approach aims to make CPython behave less
  like a locale-aware application, and more like locale-independent language
  runtimes like those for Go, Node.js (V8), and Rust.
* this PEP proposes to override the legacy C locale with a more recently
  defined locale that uses UTF-8 as its default text encoding. This means that
  the text encoding override will apply not only to CPython, but also to any
  locale-aware extension modules loaded into the current process, as well as to
  locale-aware applications invoked in subprocesses that inherit their
  environment from the parent process. This approach aims to retain CPython's
  traditional strong support for integration with other locale-aware components
  while also actively helping to push forward the adoption and standardisation
  of the C.UTF-8 locale as a Unicode-aware replacement for the legacy C locale
  in the wider C/C++ ecosystem.

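
The practical difference between the two approaches shows up in subprocesses. A
minimal sketch, assuming Python 3.7 or later on a \*nix system (the release in
which locale coercion landed): start a child interpreter in the legacy C locale
and report the ``LC_CTYPE`` value in its environment, which is what that child's
own subprocesses would inherit. The helper name ``inherited_lc_ctype`` is
hypothetical and is not part of any CPython API.

```python
import os
import subprocess
import sys


def inherited_lc_ctype() -> str:
    """Run a child interpreter in the legacy C locale and return the LC_CTYPE
    value in its environment, i.e. what its own subprocesses would inherit."""
    env = dict(os.environ)
    # Drop settings that would override the C locale or disable coercion.
    for name in ("LC_ALL", "LANG", "PYTHONCOERCECLOCALE", "PYTHONUTF8"):
        env.pop(name, None)
    env["LC_CTYPE"] = "C"
    child_code = "import os; print(os.environ.get('LC_CTYPE', ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(inherited_lc_ctype())
```

When coercion succeeds, the reported value is typically the coercion target
(such as ``C.UTF-8``) rather than ``C``; when no suitable UTF-8 based locale is
available, it remains ``C``.
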
After reviewing both PEPs, it became clear that they didn't actually conflict
at a technical level, and the proposal in :pep:`540` offered a superior option in
cases where no suitable locale was available, as well as offering a better
reference behaviour for platforms where the notion of a "locale encoding"
doesn't make sense (for example, embedded systems running MicroPython rather
than the CPython reference interpreter).

Meanwhile, this PEP offered improved compatibility with other locale-aware
components, and an approach more amenable to being backported to Python 3.6
by downstream redistributors.

As a result, this PEP was amended to refer to :pep:`540` as a complementary
solution that offered improved behaviour when none of the standard UTF-8 based
locales were available, as well as extending the changes in the default
settings to APIs that aren't currently independently configurable (such as
the default encoding and error handler for ``open()``).

The availability of :pep:`540` also meant that the ``LC_CTYPE=en_US.UTF-8`` legacy
fallback was removed from the list of UTF-8 locales tried as a coercion target,
with the expectation being that CPython will instead rely solely on the
proposed PYTHONUTF8 mode in such cases.

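
For comparison, the UTF-8 mode proposed in :pep:`540` shipped in Python 3.7 and
can be enabled with either ``PYTHONUTF8=1`` or the ``-X utf8`` command line
option. A rough sketch of checking it from Python, assuming a 3.7+ interpreter:
the hypothetical helper ``utf8_mode_report`` starts a child interpreter and
reports ``sys.flags.utf8_mode`` together with the encoding ``open()`` picks when
none is given.

```python
import subprocess
import sys
from typing import Tuple

# Code run in the child: report the UTF-8 mode flag and open()'s default encoding.
CHILD_CODE = (
    "import os, sys\n"
    "with open(os.devnull) as f:\n"
    "    print(sys.flags.utf8_mode, f.encoding)\n"
)


def utf8_mode_report(enable: bool) -> Tuple[int, str]:
    """Return (sys.flags.utf8_mode, default open() encoding) as reported by a
    child interpreter, optionally started with -X utf8."""
    args = [sys.executable]
    if enable:
        args += ["-X", "utf8"]  # command line equivalent of PYTHONUTF8=1
    args += ["-c", CHILD_CODE]
    out = subprocess.run(args, stdout=subprocess.PIPE, check=True).stdout
    flag, encoding = out.decode("ascii").split()
    return int(flag), encoding


if __name__ == "__main__":
    print(utf8_mode_report(enable=True))
```

Without the option, the result depends on the inherited locale settings, so
only the explicitly enabled case has a predictable answer.
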
Motivation
==========

While Linux container technologies like Docker, Kubernetes, and OpenShift are
best known for their use in web service development, the related container
formats and execution models are also being adopted for Linux command line
application development. Technologies like Gnome Flatpak [7_] and
Ubuntu Snappy [8_] further aim to bring these same techniques to Linux GUI
application development.

When using Python 3 for application development in these contexts, it isn't
uncommon to see text encoding related errors akin to the following::

    $ docker run --rm fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
    Unable to decode the command from the command line:
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed
    $ docker run --rm ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
    Unable to decode the command from the command line:
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed

Even though the same command is likely to work fine when run locally::

    $ python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ

The source of the problem can be seen by instead running the ``locale`` command
in the three environments::

    $ locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=en_AU.UTF-8
    LC_CTYPE="en_AU.UTF-8"
    LC_ALL=
    $ docker run --rm fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LC_CTYPE="POSIX"
    LC_ALL=
    $ docker run --rm ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LANGUAGE=
    LC_CTYPE="POSIX"
    LC_ALL=

In this particular example, we can see that the host system locale is set to
"en_AU.UTF-8", so CPython uses UTF-8 as the default text encoding. By contrast,
the base Docker images for Fedora and Debian don't have any specific locale
set, so they use the POSIX locale by default, which is an alias for the
ASCII-based default C locale.

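
The claim that the POSIX locale is an alias for the C locale can be checked from
Python on a \*nix system. The sketch below (the helper name ``ctype_codeset`` is
hypothetical) temporarily switches ``LC_CTYPE`` using the standard ``locale``
module and asks the C library for the codeset of each locale.

```python
import locale


def ctype_codeset(name: str) -> str:
    """Temporarily switch LC_CTYPE to ``name`` and return its codeset."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only; changes nothing
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore process-wide state


if __name__ == "__main__":
    for name in ("C", "POSIX"):
        print(name, ctype_codeset(name))
```

On glibc based systems both names typically report ``ANSI_X3.4-1968``, the
formal name for ASCII. Because ``setlocale()`` changes process-wide state, the
helper restores the previous setting before returning.
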
The simplest way to get Python 3 (regardless of the exact version) to behave
sensibly in Fedora and Debian based containers is to run it in the ``C.UTF-8``
locale that both distros provide::

    $ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ
    $ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ
    $ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LC_CTYPE=C.UTF-8
    LC_ALL=
    $ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=
    LANGUAGE=
    LC_CTYPE=C.UTF-8
    LC_ALL=

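
The same workaround can be applied when launching Python from Python, for
example in test suites. A minimal sketch, assuming a \*nix host and Python 3.7
or later for the parent process: the hypothetical helper ``echo_in_locale``
mirrors the ``-e LC_CTYPE=C.UTF-8`` option used with ``docker run`` above,
clearing the variables that would otherwise take precedence so the child really
runs with the requested ``LC_CTYPE``.

```python
import os
import subprocess
import sys

SAMPLE = "ℙƴ☂ℌøἤ"  # the test string from the docker examples above


def echo_in_locale(lc_ctype: str) -> str:
    """Print SAMPLE from a child interpreter running with the given LC_CTYPE
    and return what it wrote, decoded as UTF-8."""
    env = dict(os.environ)
    # Remove settings that would take precedence over LC_CTYPE or bypass it.
    for name in ("LC_ALL", "LANG", "PYTHONIOENCODING", "PYTHONUTF8"):
        env.pop(name, None)
    env["LC_CTYPE"] = lc_ctype  # same effect as docker run -e LC_CTYPE=...
    result = subprocess.run(
        [sys.executable, "-c", "print(%r)" % SAMPLE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    print(echo_in_locale("C.UTF-8") == SAMPLE)
```

On Python 3.7 and later, passing ``"C"`` will usually succeed as well, since
that is exactly the case locale coercion (or, failing that, the :pep:`540`
UTF-8 mode) is designed to handle.
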
The Alpine Linux based Python images provided by Docker, Inc. already use the
C.UTF-8 locale by default::

    $ docker run --rm python:3 python3 -c 'print("ℙƴ☂ℌøἤ")'
    ℙƴ☂ℌøἤ
    $ docker run --rm python:3 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
    LANG=C.UTF-8
    LANGUAGE=
    LC_CTYPE="C.UTF-8"
    LC_ALL=

Similarly, for custom container images (i.e. those adding additional content on
top of a base distro image), a more suitable locale can be set in the image
definition so everything just works by default. However, it would provide a much
nicer and more consistent user experience if CPython were able to just deal
with this problem automatically rather than relying on redistributors or end
users to handle it through system configuration changes.

While the glibc developers are working towards making the C.UTF-8 locale
universally available for use by glibc based applications like CPython [6_],
this unfortunately doesn't help on platforms that ship older versions of glibc
without that feature, and also don't provide C.UTF-8 (or an equivalent) as an
on-disk locale the way Debian and Fedora do. These platforms are considered
out of scope for this PEP - see :pep:`540` for further discussion of possible
options for improving CPython's default behaviour in such environments.

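
Whether a given platform falls into that category can be probed from Python.
The sketch below tries each name with ``locale.setlocale()`` and restores the
previous setting afterwards. The candidate list is the one documented for
``PYTHONCOERCECLOCALE`` in Python 3.7 (``C.UTF-8``, ``C.utf8``, then
``UTF-8``); the helper name ``available_targets`` is hypothetical.

```python
import locale
from typing import Iterable, List

# Coercion targets documented for PYTHONCOERCECLOCALE in Python 3.7, in the
# order the interpreter tries them.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def available_targets(candidates: Iterable[str] = CANDIDATES) -> List[str]:
    """Return the candidate locale names that setlocale(LC_CTYPE) accepts."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    found = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not provided by this platform's C library
            found.append(name)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore process-wide state
    return found


if __name__ == "__main__":
    print(available_targets() or "no UTF-8 coercion target available")
```

An empty result suggests the platform is in the out of scope category described
above.
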
Design Principles
=================

The above motivation leads to the following core design principles for the
proposed solution:

* if a locale other than the default C locale is explicitly configured, we'll
  continue to respect it
* as far as is feasible, any changes made will use *existing* configuration
  options
* Python's runtime behaviour in potential coercion target locales should be
  identical regardless of whether the locale was set explicitly in the
  environment or implicitly as a locale coercion target
* for Python 3.7, if we're changing the locale setting without an explicit
  config option, we'll emit a warning on stderr that we're doing so rather
  than silently changing the process configuration. This will alert application
  and system integrators to the change, even if they don't closely follow the
  PEP process or Python release announcements. However, to minimize the chance
  of introducing new problems for end users, we'll do this *without* using the
  warnings system, so even running with ``-Werror`` won't turn it into a runtime
  exception. (Note: these warnings ended up being silenced by default. See the
  Implementation Note above for more details)
* for Python 3.7, any changed defaults will offer some form of explicit "off"
  switch at build time, runtime, or both

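
The first and last principles can be summarised as a small decision function.
This is a simplified Python model, not CPython's actual implementation (which
lives in the interpreter's C startup code and asks the C library via
``setlocale()`` rather than consulting a set of names). Here ``available`` is a
hypothetical stand-in for "``setlocale()`` accepted this name", while the target
list and the ``PYTHONCOERCECLOCALE=0`` off switch follow the Python 3.7
documentation.

```python
from typing import Container, Mapping, Optional

# Coercion targets documented for Python 3.7, tried in this order.
TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def effective_lc_ctype(env: Mapping[str, str]) -> str:
    """Resolve the LC_CTYPE category using the usual POSIX precedence:
    LC_ALL, then LC_CTYPE, then LANG, falling back to the C locale."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both ignored
            return value
    return "C"


def coercion_target(
    env: Mapping[str, str], available: Container[str]
) -> Optional[str]:
    """Return the locale LC_CTYPE would be coerced to, or None to leave it."""
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return None  # explicit runtime "off" switch
    if env.get("LC_ALL"):
        return None  # an LC_ALL override is always respected
    if effective_lc_ctype(env) not in ("C", "POSIX"):
        return None  # an explicitly configured locale is respected
    for name in TARGETS:
        if name in available:
            return name
    return None  # no suitable target: the case PEP 540's UTF-8 mode covers
```

For example, ``coercion_target({"LANG": "en_AU.UTF-8"}, {"C.UTF-8"})`` returns
``None`` because an explicitly configured locale is respected, while
``coercion_target({}, {"C.UTF-8"})`` returns ``"C.UTF-8"``.
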
Minimizing the negative impact on systems currently correctly configured to
+								use GB-18030 or another partially ASCII compatible universal encoding leads to
-												PEP 538 updates for python-dev review

* Tidy up the abstract and emphasise the equivalence between
  this proposal and long supported configuration settings
* Don't set LC_ALL (set LC_CTYPE instead)
* Add a rationale for that change
* Use GNU readline misbehaviour as a specific example of the
  benefits of reconfiguring the locale
* Clarify rationale for enabling the changes by default on all
  autotools-using platforms
* Mention the possibility of exposing a public API for use by
  embedding platforms

											
										
										
											2017-05-09 06:46:59 -04:00
+								the following design principle:
* if a UTF-8 based Linux container is run on a host that is explicitly
  configured to use a non-UTF-8 encoding, and tries to exchange locally
  encoded data with that host rather than exchanging explicitly UTF-8 encoded
  data, CPython will endeavour to correctly round-trip host provided data that
  is concatenated or split solely at common ASCII compatible code points, but
  may otherwise emit nonsensical results.

Minimizing the negative impact on systems and programs correctly configured to
use an explicit locale category like ``LC_TIME``, ``LC_MONETARY`` or
``LC_NUMERIC`` while otherwise running in the legacy C locale gives the
following design principle:

* don't make any environmental changes that would alter any existing settings
  for locale categories other than ``LC_CTYPE`` (most notably: don't set
  ``LC_ALL`` or ``LANG``)

Finally, maintaining compatibility with running arbitrary subprocesses in
orchestration use cases leads to the following design principle:

* don't make any Python-specific environmental changes that might be
  incompatible with any still supported version of CPython (including
  CPython 2.7)


Specification
=============

To better handle the cases where CPython would otherwise end up attempting
to operate in the ``C`` locale, this PEP proposes that CPython automatically
attempt to coerce the legacy ``C`` locale to a UTF-8 based locale for the
``LC_CTYPE`` category when it is run as a standalone command line application.

It further proposes to emit a warning on stderr if the legacy ``C`` locale
is in effect for the ``LC_CTYPE`` category at the point where the language
runtime itself is initialized, and the explicit environmental flag to disable
locale coercion is not set, in order to warn system and application
integrators that they're running CPython in an unsupported configuration.

In addition to these general changes, some additional Android-specific changes
are proposed to handle the differences in the behaviour of ``setlocale`` on that
platform.


Legacy C locale coercion in the standalone Python interpreter binary
--------------------------------------------------------------------

When run as a standalone application, CPython has the opportunity to
reconfigure the C locale before any locale dependent operations are executed
in the process.

This means that it can change the locale settings not only for the CPython
runtime, but also for any other locale-aware components running in the current
process (e.g. as part of extension modules), as well as in subprocesses that
inherit their environment from the current process.

After calling ``setlocale(LC_ALL, "")`` to initialize the locale settings in
the current process, the main interpreter binary will be updated to include
the following call::

    const char *ctype_loc = setlocale(LC_CTYPE, NULL);

This cryptic invocation is the API that C provides to query the current locale
setting without changing it. Given that query, it is possible to check for
exactly the ``C`` locale with ``strcmp``::

    ctype_loc != NULL && strcmp(ctype_loc, "C") == 0 /* true only in the C locale */

This call also returns ``"C"`` when either no particular locale is set, or the
nominal locale is set to an alias for the ``C`` locale (such as ``POSIX``).

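
As a minimal sketch, the detection step can be wrapped in a self-contained C
function. It assumes ``setlocale(LC_ALL, "")`` has already been called, and the
name ``legacy_c_locale_detected`` is hypothetical rather than taken from the
reference implementation:

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns nonzero when the LC_CTYPE category is exactly
   the legacy "C" locale. Passing NULL to setlocale() only queries the current
   setting; it does not change it. */
static int legacy_c_locale_detected(void)
{
    const char *ctype_loc = setlocale(LC_CTYPE, NULL);
    return ctype_loc != NULL && strcmp(ctype_loc, "C") == 0;
}
```

On C libraries that report the ``POSIX`` alias as ``"C"`` (as described above),
the same check also covers an environment that explicitly requests ``POSIX``.
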
Given this information, CPython can then attempt to coerce the locale to one
that uses UTF-8 rather than ASCII as the default encoding.

Three such locales will be tried:

* ``C.UTF-8`` (available at least in Debian, Ubuntu, Alpine, and Fedora 25+, and
  expected to be available by default in a future version of glibc)
* ``C.utf8`` (available at least in HP-UX)
* ``UTF-8`` (available in at least some \*BSD variants, including Mac OS X)

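
The search over these candidates might look like the following C sketch, which
probes each name with ``setlocale(LC_CTYPE, ...)`` in the order listed above.
It is an illustration under stated assumptions, not the reference
implementation: the identifiers ``candidate_locales`` and
``find_utf8_candidate`` are hypothetical, and the caller is assumed to have
already detected the legacy ``C`` locale.

```c
#include <locale.h>
#include <stddef.h>

/* Candidate coercion targets, tried in the order listed above. */
static const char *const candidate_locales[] = {"C.UTF-8", "C.utf8", "UTF-8"};

/* Hypothetical helper: returns the first candidate that setlocale() accepts
   for LC_CTYPE (leaving LC_CTYPE set to it), or NULL if none is available.
   A failed setlocale() call leaves the current setting unchanged. */
static const char *find_utf8_candidate(void)
{
    size_t count = sizeof candidate_locales / sizeof candidate_locales[0];
    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, candidate_locales[i]) != NULL) {
            return candidate_locales[i];
        }
    }
    return NULL;
}
```

Because a failed probe does not alter the locale, the function simply returns
``NULL`` when no candidate is installed, leaving the process in the ``C``
locale.
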
The coercion will be implemented by setting the ``LC_CTYPE`` environment
variable to the candidate locale name, such that future calls to
``setlocale()`` will see it, as will other components looking for those
settings (such as GUI development frameworks and Python's own ``locale``
module).

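
A minimal C sketch of that environment-based step, assuming a POSIX
``setenv()`` and a hypothetical helper name ``coerce_lc_ctype``. It exports
only ``LC_CTYPE`` (never ``LC_ALL`` or ``LANG``) and then asks ``setlocale`` to
re-read that single category from the environment; child processes started
afterwards inherit the exported variable.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX declarations such as setenv() */
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: exports LC_CTYPE=<candidate> and re-initializes only the
   LC_CTYPE category from the environment. Returns the resulting locale name,
   or NULL if the variable could not be set or the locale is unavailable.
   A non-empty LC_ALL in the environment would take precedence over LC_CTYPE. */
static const char *coerce_lc_ctype(const char *candidate)
{
    if (setenv("LC_CTYPE", candidate, 1) != 0) {
        return NULL;
    }
    return setlocale(LC_CTYPE, "");
}
```

Because only ``LC_CTYPE`` is exported, explicit settings such as ``LC_TIME`` or
``LC_NUMERIC`` are left untouched, in line with the design principles above.
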
To allow for better cross-platform binary portability and to adjust
automatically to future changes in locale availability, these checks will be
implemented at runtime on all platforms other than Windows, rather than
attempting to determine which locales to try at compile time.

When this locale coercion is activated, the following warning will be
printed on stderr, with the warning containing whichever locale was
successfully configured::

    Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another
    locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).

(Note: this warning ended up being silenced by default. See the
Implementation Note above for more details.)

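
For illustration, the notice could be assembled once the configured locale is
known, as in this hypothetical C helper. The message text is copied from the
example above, treating its line break as display wrapping (an assumption),
and the name ``format_coercion_warning`` is not taken from the reference
implementation.

```c
#include <stdio.h>

/* Hypothetical helper: writes the coercion notice for locale_name into buf.
   Returns the length snprintf() reports (excluding the trailing NUL), or a
   negative value on error; a result >= size means the text was truncated. */
static int format_coercion_warning(char *buf, size_t size, const char *locale_name)
{
    return snprintf(buf, size,
                    "Python detected LC_CTYPE=C: LC_CTYPE coerced to %s "
                    "(set another locale or PYTHONCOERCECLOCALE=0 to disable "
                    "this locale coercion behaviour).\n",
                    locale_name);
}
```

A caller would then emit the buffer with ``fputs(buf, stderr)``, or skip that
call entirely when the warning is silenced.
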
As long as the current platform provides at least one of the candidate UTF-8
based environments, this locale coercion will mean that the standard
Python binary *and* locale-aware extensions should once again "just work"
in the three main failure cases we're aware of (missing locale
settings, SSH forwarding of unknown locales via ``LANG`` or ``LC_CTYPE``, and
developers explicitly requesting ``LANG=C``).

The one case where failures may still occur is when ``stderr`` is specifically
being checked for no output, which can be resolved either by configuring
a locale other than the C locale, or else by using a mechanism other than
"there was no output on stderr" to check for subprocess errors (e.g. checking
process return codes).

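
A POSIX-only C sketch of the second remedy: judge a child process by its exit
status rather than by whether it wrote anything to ``stderr``. The helper name
``run_succeeded`` is hypothetical, and it calls ``fork``, ``execvp`` and
``waitpid`` directly for brevity.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX process-control declarations */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (looked up on PATH) with argv and reports
   success based solely on the exit status; anything the child writes to
   stderr is ignored. Returns 1 for exit status 0, otherwise 0. */
static int run_succeeded(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return 0;               /* could not create the child process */
    }
    if (pid == 0) {
        execvp(argv[0], argv);  /* only returns if exec failed */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return 0;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A test harness or orchestration tool applying the same rule will not mistake
the locale coercion notice (or any other diagnostic on ``stderr``) for a
failure.
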
If none of the candidate locales are successfully configured, or the ``LC_ALL``
locale override is defined in the current process environment, then
initialization will continue in the C locale and the Unicode compatibility
warning described in the next section will be emitted just as it would for
any other application.

If ``PYTHONCOERCECLOCALE=0`` is explicitly set, initialization will continue in
the C locale and the Unicode compatibility warning described in the next
section will be automatically suppressed.

The interpreter will always check for the ``PYTHONCOERCECLOCALE`` environment
variable at startup (even when running under the ``-E`` or ``-I`` switches),
as the locale coercion check necessarily takes place before any command line
argument processing. For consistency, the runtime check to determine whether
or not to suppress the locale compatibility warning will be similarly
independent of these settings.

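
The startup decision described in the last few paragraphs could be sketched in
C as follows. The enum and function names are hypothetical; the sketch assumes
it runs only after the legacy ``C`` locale has been detected, and treats an
empty ``LC_ALL`` as unset (as POSIX does when resolving locale variables). It
reads the environment directly with ``getenv``, which is why ``-E`` and ``-I``
cannot influence it: no command line options have been parsed yet.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX declarations (setenv etc.) */
#include <stdlib.h>
#include <string.h>

/* Hypothetical outcomes of the startup check described above. */
enum coercion_decision {
    COERCION_TRY,            /* try the candidate UTF-8 locales */
    COERCION_BLOCKED_LC_ALL, /* stay in C; the C locale warning is still emitted */
    COERCION_DISABLED        /* PYTHONCOERCECLOCALE=0: stay in C, no warning */
};

/* Called once LC_CTYPE is known to be the legacy C locale. Reads the
   environment before any command line processing, so -E and -I have no
   effect on the result. */
static enum coercion_decision decide_coercion(void)
{
    const char *opt = getenv("PYTHONCOERCECLOCALE");
    if (opt != NULL && strcmp(opt, "0") == 0) {
        return COERCION_DISABLED;
    }
    const char *lc_all = getenv("LC_ALL");
    if (lc_all != NULL && lc_all[0] != '\0') {
        return COERCION_BLOCKED_LC_ALL;
    }
    return COERCION_TRY;
}
```

A ``COERCION_TRY`` result can still end in the ``C`` locale when none of the
candidate locales is available, in which case the warning described in the
next section is emitted.
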

Legacy C locale warning during runtime initialization
-----------------------------------------------------

By the time that ``Py_Initialize`` is called, arbitrary locale-dependent
operations may have taken place in the current process. This means that
by the time it is called, it is *too late* to reliably switch to a different
locale: doing so would introduce inconsistencies in decoded text, even in the
context of the standalone Python interpreter binary.
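
The following small illustration shows the kind of inconsistency a late locale switch would cause. For the sake of the example, it assumes that text decoded before the switch used ASCII with ``surrogateescape`` (as CPython does for operating system data in the legacy ``C`` locale), and that text decoded afterwards used UTF-8. The byte string itself is arbitrary.

```python
# Bytes as they might arrive from the operating system (e.g. a file name).
raw = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Decoded while the process is still in the legacy C locale.
before_switch = raw.decode("ascii", "surrogateescape")

# Decoded after a hypothetical late switch to a UTF-8 based locale.
after_switch = raw.decode("utf-8")

print(ascii(before_switch))  # 'caf\udcc3\udca9'
print(ascii(after_switch))   # 'caf\xe9'

# Same bytes, unequal strings: anything computed before the switch (dict
# keys, cached paths, comparisons) silently disagrees with later results.
print(before_switch == after_switch)  # False
```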

Accordingly, when ``Py_Initialize`` is called and CPython detects that the
configured locale is still the default ``C`` locale and
``PYTHONCOERCECLOCALE=0`` is not set, the following warning will be issued::

    Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
    encoding), which may cause Unicode compatibility problems. Using C.UTF-8,
    C.utf8, or UTF-8 (if available) as alternative Unicode-compatible
    locales is recommended.

(Note: this warning ended up being silenced by default. See the
Implementation Note above for more details.)

In this case, no actual change will be made to the locale settings.
Instead, the warning informs both system and application integrators that
they're running Python 3 in a configuration that we don't expect to work
properly.

The second sentence providing recommendations may eventually be conditionally
compiled based on the operating system (e.g. recommending ``LC_CTYPE=UTF-8``
on \*BSD systems), but the initial implementation will just use the common
generic message shown above.
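
Read literally, the check in this section reduces to the predicate below. It is a hypothetical model rather than CPython's implementation. ``ctype_locale`` stands for the ``LC_CTYPE`` locale name observed during ``Py_Initialize``, and the later decision to silence the warning by default is deliberately not modelled.

```python
from typing import Mapping, Optional

# Warning text as specified in this section (joined onto one logical line).
LEGACY_C_LOCALE_WARNING = (
    "Python runtime initialized with LC_CTYPE=C (a locale with default ASCII "
    "encoding), which may cause Unicode compatibility problems. Using C.UTF-8, "
    "C.utf8, or UTF-8 (if available) as alternative Unicode-compatible "
    "locales is recommended."
)


def legacy_locale_warning(ctype_locale: str,
                          environ: Mapping[str, str]) -> Optional[str]:
    """Hypothetical model of the Py_Initialize check described above."""
    if ctype_locale != "C":
        return None  # some other locale is configured: nothing to report
    if environ.get("PYTHONCOERCECLOCALE") == "0":
        return None  # explicitly opted out of coercion and the warning
    # No locale change is made here; the integrator is only informed.
    return LEGACY_C_LOCALE_WARNING


print(legacy_locale_warning("C", {}) is not None)                   # True
print(legacy_locale_warning("C", {"PYTHONCOERCECLOCALE": "0"}))     # None
print(legacy_locale_warning("C.UTF-8", {}))                         # None
```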

New build-time configuration options
------------------------------------

While both of the above behaviours would be enabled by default, they would
also have new associated configuration options and preprocessor definitions
for the benefit of redistributors that want to override those default settings.

The locale coercion behaviour would be controlled by the flag
``--with[out]-c-locale-coercion``, which would set the ``PY_COERCE_C_LOCALE``
preprocessor definition.

The locale warning behaviour would be controlled by the flag
``--with[out]-c-locale-warning``, which would set the ``PY_WARN_ON_C_LOCALE``
preprocessor definition.

(Note: this compile time warning option ended up being replaced by a runtime
``PYTHONCOERCECLOCALE=warn`` option. See the Implementation Note above for
more details.)

On platforms which don't use the ``autotools`` based build system (i.e.
Windows), these preprocessor variables would always be undefined.
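
On ``autotools`` based builds, ``#define`` values from ``pyconfig.h`` are usually exposed through the standard library's ``sysconfig.get_config_var()``, which returns ``None`` for names the build did not record. Assuming that holds for the definitions named above, a best-effort check of how the running interpreter was configured might look like this. ``PY_WARN_ON_C_LOCALE`` may well be absent, given the note above about it being replaced by a runtime option.

```python
import sysconfig
from typing import Dict, Optional

# Preprocessor definitions named in this section.
LOCALE_BUILD_FLAGS = ("PY_COERCE_C_LOCALE", "PY_WARN_ON_C_LOCALE")


def c_locale_build_flags() -> Dict[str, Optional[int]]:
    """Report the recorded value of each flag (best effort).

    get_config_var() returns None for names the build did not record,
    which is the expected result on non-autotools builds such as Windows.
    """
    return {name: sysconfig.get_config_var(name) for name in LOCALE_BUILD_FLAGS}


for name, value in c_locale_build_flags().items():
    print(f"{name}: {value!r}")
```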

Changes to the default error handling on the standard streams
-------------------------------------------------------------

Since Python 3.5, CPython has defaulted to using ``surrogateescape`` on the
standard streams (``sys.stdin``, ``sys.stdout``) when it detects that the
current locale is ``C`` and no specific error handler has been set using
either the ``PYTHONIOENCODING`` environment variable or the
``Py_SetStandardStreamEncoding`` API. For other locales, the default error
handler for the standard streams is ``strict``.

In order to preserve this behaviour without introducing any behavioural
discrepancies between locale coercion and explicitly configuring a locale, the
coercion target locales (``C.UTF-8``, ``C.utf8``, and ``UTF-8``) will be added
to the list of locales that use ``surrogateescape`` as their default error
handler for the standard streams.

No changes are proposed to the default error handler for ``sys.stderr``: that
will continue to be ``backslashreplace``.
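
The resulting default error handler selection for the standard streams can be summarised as a small lookup. This is a hypothetical model of the rules stated in this section. The names ``SURROGATEESCAPE_LOCALES`` and ``default_stream_errors`` are illustrative only, and ``explicit_errors`` stands in for an error handler set through ``PYTHONIOENCODING`` or ``Py_SetStandardStreamEncoding``.

```python
from typing import Optional

# Under this PEP, stdin/stdout default to "surrogateescape" in the legacy
# C locale and in each of the three coercion target locales.
SURROGATEESCAPE_LOCALES = frozenset({"C", "C.UTF-8", "C.utf8", "UTF-8"})


def default_stream_errors(stream: str, ctype_locale: str,
                          explicit_errors: Optional[str] = None) -> str:
    """Hypothetical model of the default error handler for a standard stream.

    ``stream`` is "stdin", "stdout" or "stderr".
    """
    if stream == "stderr":
        return "backslashreplace"  # unchanged by this PEP
    if explicit_errors is not None:
        return explicit_errors  # an explicit setting always wins
    if ctype_locale in SURROGATEESCAPE_LOCALES:
        return "surrogateescape"
    return "strict"


for loc in ("C", "C.utf8", "en_AU.UTF-8"):
    print(loc, default_stream_errors("stdout", loc))
```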

Changes to locale settings on Android
-------------------------------------

Independently of the other changes in this PEP, CPython on Android systems
will be updated to call ``setlocale(LC_ALL, "C.UTF-8")`` where it currently
calls ``setlocale(LC_ALL, "")`` and ``setlocale(LC_CTYPE, "C.UTF-8")`` where
it currently calls ``setlocale(LC_CTYPE, "")``.

This Android-specific behaviour is being introduced due to the following
Android-specific details:

* on Android, passing ``""`` to ``setlocale`` is equivalent to passing ``"C"``
* the ``C.UTF-8`` locale is always available

Platform Support Changes
========================

A new "Legacy C Locale" section will be added to :pep:`11` that states:

* as of CPython 3.7, \*nix platforms are expected to provide at least one of
  ``C.UTF-8`` (full locale), ``C.utf8`` (full locale) or ``UTF-8``
  (``LC_CTYPE``-only locale) as an alternative to the legacy ``C`` locale.
  Any Unicode related integration problems that occur only in the legacy ``C``
  locale and cannot be reproduced in an appropriately configured non-ASCII
  locale will be closed as "won't fix".
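
You can check from Python whether a given system meets this requirement by using ``locale.setlocale``, which raises ``locale.Error`` for a locale the platform does not recognise. The helper below is a sketch and is not part of any :pep:`11` tooling. Because ``setlocale`` changes process-wide state, the helper restores the original ``LC_CTYPE`` setting afterwards and should not be run while other threads depend on the locale.

```python
import locale
from typing import List

# Coercion targets listed in the requirement above, in preference order.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def available_coercion_targets() -> List[str]:
    """Return the targets that setlocale(LC_CTYPE, ...) accepts here."""
    found = []
    saved = locale.setlocale(locale.LC_CTYPE)  # query without changing
    try:
        for name in COERCION_TARGETS:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed / not recognised on this platform
            found.append(name)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the original
    return found


print(available_coercion_targets())
```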

Rationale
=========

Improving the handling of the C locale
--------------------------------------

It has been clear for some time that the C locale's default encoding of
``ASCII`` is entirely the wrong choice for development of modern networked
services. Newer languages like Rust and Go have eschewed that default entirely,
and instead made it a deployment requirement that systems be configured to use
UTF-8 as the text encoding for operating system interfaces. Similarly, Node.js
assumes UTF-8 by default (a behaviour inherited from the V8 JavaScript engine)
and requires custom build settings to indicate it should use the system
locale settings for locale-aware operations. Both the JVM and the .NET CLR
use UTF-16-LE as their primary encoding for passing text between applications
and the application runtime (i.e. the JVM/CLR, not the host operating system).

The challenge for CPython has been the fact that in addition to being used for
network service development, it is also extensively used as an embedded
scripting language in larger applications, and as a desktop application
development language, where it is more important to be consistent with other
locale-aware components sharing the same process, as well as with the user's
desktop locale settings, than it is with the emergent conventions of modern
network service development.

The core premise of this PEP is that for *all* of these use cases, the
assumption of ASCII implied by the default "C" locale is the wrong choice,
and furthermore that the following assumptions are valid:

* in desktop application use cases, the process locale will *already* be
  configured appropriately, and if it isn't, then that is an operating system
  or embedding application level problem that needs to be reported to and
  resolved by the operating system provider or application developer
* in network service development use cases (especially those based on Linux
  containers), the process locale may not be configured *at all*, and if it
  isn't, then the expectation is that components will impose their own default
  encoding the way Rust, Go and Node.js do, rather than trusting the legacy C
  default encoding of ASCII the way CPython currently does

Defaulting to "surrogateescape" error handling on the standard IO streams
-------------------------------------------------------------------------

By coercing the locale away from the legacy C default and its assumption of
ASCII as the preferred text encoding, this PEP also disables the implicit use
of the "surrogateescape" error handler on the standard IO streams that was
introduced in Python 3.5 ([15_]), as well as the automatic use of
``surrogateescape`` when operating in :pep:`540`'s proposed UTF-8 mode.

Rather than introducing yet another configuration option to adjust that
behaviour, this PEP instead proposes to extend the "surrogateescape" default
for ``stdin`` and ``stdout`` error handling to also apply to the three
potential coercion target locales.

The aim of this behaviour is to ensure that operating system provided text
values can typically be passed transparently through a Python 3 application,
even if that application incorrectly assumes the text has been encoded as
UTF-8.

In particular, GB 18030 [12_] is a Chinese national text encoding standard
that handles all Unicode code points. It is formally incompatible with both
ASCII and UTF-8, but will nevertheless often tolerate processing as surrogate
escaped data: the points where GB 18030 reuses ASCII byte values in an
incompatible way are likely to be invalid in UTF-8, and will therefore be
escaped and opaque to string processing operations that split on or search for
the relevant ASCII code points. Operations that don't involve splitting on or
searching for particular ASCII or Unicode code point values are almost
certain to work correctly.
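
The standard codecs can check the GB 18030 case directly. The sketch below (the sample text is arbitrary) misreads GB 18030 bytes as UTF-8 with ``surrogateescape``, splits on newlines, and confirms that each piece still decodes as GB 18030. Splitting on ``\n`` is safe here because byte ``0x0A`` never appears inside a multi-byte GB 18030 sequence. Splitting on other ASCII values, such as the digits used in its four-byte sequences, would not be safe.

```python
sample = "ℙƴ☂ℌøἤ\n第二行\n"  # arbitrary two-line sample
data = sample.encode("gb18030")

# Misread as UTF-8 with surrogateescape, as an application assuming a
# UTF-8 locale would do; this never raises.
text = data.decode("utf-8", "surrogateescape")

# Split on newlines and map each piece back to the original encoding.
pieces = [line.encode("utf-8", "surrogateescape").decode("gb18030")
          for line in text.split("\n")]
print(pieces == ["ℙƴ☂ℌøἤ", "第二行", ""])  # True

# Passing the whole text through unmodified reproduces the original bytes.
print(text.encode("utf-8", "surrogateescape") == data)  # True
```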

Similarly, Shift-JIS [13_] and ISO-2022-JP [14_] remain in widespread use in
Japan, and are incompatible with both ASCII and UTF-8, but will tolerate text
processing operations that don't involve splitting on or searching for
particular ASCII or Unicode code point values.
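
The caveat about searching for ASCII values is easy to demonstrate with Shift-JIS, where the second byte of some characters falls in the ASCII range. In the sketch below, ``ソ`` is encoded with ``0x5C`` (ASCII backslash) as its trail byte. Partitioning the surrogate-escaped text on a backslash therefore cuts that character in half, while a plain pass-through still round-trips. The helper ``decodes_as_shift_jis`` is illustrative only.

```python
def decodes_as_shift_jis(piece: str) -> bool:
    """Hypothetical helper: does this escaped text map back to valid Shift-JIS?"""
    try:
        piece.encode("utf-8", "surrogateescape").decode("shift_jis")
    except UnicodeDecodeError:
        return False
    return True


data = "ソフト".encode("shift_jis")  # katakana for "software"

# The lead byte of "ソ" is invalid UTF-8 and becomes a surrogate escape,
# but its trail byte (0x5C) decodes as an ordinary ASCII backslash.
text = data.decode("utf-8", "surrogateescape")
print("\\" in text)  # True, although "ソフト" contains no backslash

# Searching for that ASCII value splits "ソ" in half ...
head, _, _ = text.partition("\\")
print(decodes_as_shift_jis(head))  # False: a lone lead byte remains

# ... while passing the text through unmodified still round-trips.
print(text.encode("utf-8", "surrogateescape") == data)  # True
print(decodes_as_shift_jis(text))  # True
```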

As an example, consider two files, one encoded with UTF-8 (the default encoding
for ``en_AU.UTF-8``), and one encoded with GB-18030 (the default encoding for
``zh_CN.gb18030``)::

    $ python3 -c 'open("utf8.txt", "wb").write("ℙƴ☂ℌøἤ\n".encode("utf-8"))'
    $ python3 -c 'open("gb18030.txt", "wb").write("ℙƴ☂ℌøἤ\n".encode("gb18030"))'

On disk, we can see that these are two very different files::

    $ python3 -c 'print("UTF-8:  ", open("utf8.txt", "rb").read()); \
                  print("GB18030:", open("gb18030.txt", "rb").read())'
    UTF-8:   b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4\n'
    GB18030: b'\x816\xbd6\x810\x9d0\x817\xa29\x816\xbc4\x810\x8b3\x816\x8d6\n'

Nevertheless, both files can be rendered correctly to the terminal as long as
they're decoded prior to printing::

    $ python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
                  print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())'
    UTF-8:   ℙƴ☂ℌøἤ
    GB18030: ℙƴ☂ℌøἤ

By contrast, passing along the raw bytes, as ``cat`` and similar C/C++
utilities tend to do, leaves the GB18030 file unreadable::

    $ LANG=en_AU.UTF-8 cat utf8.txt gb18030.txt
    ℙƴ☂ℌøἤ
    �6�6�0�0�7�9�6�4�0�3�6�6

Even setting a specifically Chinese locale won't help in getting the
GB-18030 encoded file rendered correctly::

    $ LANG=zh_CN.gb18030 cat utf8.txt gb18030.txt
    ℙƴ☂ℌøἤ
    �6�6�0�0�7�9�6�4�0�3�6�6

The problem is that the *terminal* encoding setting remains UTF-8, regardless
of the nominal locale. A GB18030 terminal can be emulated using the ``iconv``
utility::

    $ cat utf8.txt gb18030.txt | iconv -f GB18030 -t UTF-8
    鈩櫰粹槀鈩屆羔激
    ℙƴ☂ℌøἤ

This reverses the problem, such that the GB18030 file is rendered correctly,
but the UTF-8 file has been converted to unrelated hanzi characters, rather than
the expected rendering of "Python" as non-ASCII characters.

With the emulated GB18030 terminal encoding, assuming UTF-8 in Python results
in *both* files being displayed incorrectly::

    $ python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
                  print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())' \
      | iconv -f GB18030 -t UTF-8
    UTF-8:   鈩櫰粹槀鈩屆羔激
    GB18030: 鈩櫰粹槀鈩屆羔激

However, setting the locale correctly means that the emulated GB18030 terminal
now displays both files as originally intended::

    $ LANG=zh_CN.gb18030 \
      python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
                  print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())' \
      | iconv -f GB18030 -t UTF-8
    UTF-8:   ℙƴ☂ℌøἤ
    GB18030: ℙƴ☂ℌøἤ
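
The two rendering failures above can be reproduced in-process by decoding each
file's bytes with the other encoding. A minimal sketch using only the standard
``utf-8`` and ``gb18030`` codecs (the variable names are illustrative, not part
of any API):

```python
# Sample text from the shell examples above.
TEXT = "ℙƴ☂ℌøἤ\n"

utf8_bytes = TEXT.encode("utf-8")
gb18030_bytes = TEXT.encode("gb18030")

# Decoding with the codec that produced the bytes recovers the text.
assert utf8_bytes.decode("utf-8") == TEXT
assert gb18030_bytes.decode("gb18030") == TEXT

# UTF-8 terminal, GB18030 file: bytes that are not valid UTF-8 are shown
# as U+FFFD REPLACEMENT CHARACTER, while the ASCII digits pass through.
shown_on_utf8_terminal = gb18030_bytes.decode("utf-8", errors="replace")

# GB18030 terminal, UTF-8 file: the bytes decode without error, but as
# unrelated hanzi rather than the original characters.
shown_on_gb18030_terminal = utf8_bytes.decode("gb18030")

print(repr(shown_on_utf8_terminal))
print(repr(shown_on_gb18030_terminal))
```

The ``errors="replace"`` decode mirrors what a UTF-8 terminal does with invalid
bytes. The strict ``gb18030`` decode only succeeds because, for this particular
text, every pair of UTF-8 bytes happens to form a valid GB18030 code; for other
input it could just as well raise an error.
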

The rationale for retaining ``surrogateescape`` as the default IO error handler is
that it will preserve the following helpful behaviour in the ``C`` locale::

    $ cat gb18030.txt \
      | LANG=C python3 -c "import sys; print(sys.stdin.read())" \
      | iconv -f GB18030 -t UTF-8
    ℙƴ☂ℌøἤ

Rather than reverting to the exception currently seen when a UTF-8 based locale is
explicitly configured::

    $ cat gb18030.txt \
      | python3 -c "import sys; print(sys.stdin.read())" \
      | iconv -f GB18030 -t UTF-8
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib64/python3.5/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte

As an added benefit, environments explicitly configured to use one of the
coercion target locales will implicitly gain the encoding transparency behaviour
currently enabled by default in the ``C`` locale.
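
The behavioural difference between ``strict`` and ``surrogateescape`` can be
modelled in-process by wrapping in-memory buffers in ``io.TextIOWrapper``,
standing in for the standard streams of the ``cat | python3 | iconv`` pipelines
above. This is a simplified sketch that assumes a UTF-8 stream encoding; it is
not a description of how CPython itself sets up ``sys.stdin`` and
``sys.stdout``, and ``echo`` is a hypothetical helper:

```python
import io

# GB18030-encoded input, as produced by `cat gb18030.txt` above.
data = "ℙƴ☂ℌøἤ\n".encode("gb18030")


def echo(raw: bytes, errors: str) -> bytes:
    """Read *raw* as UTF-8 text and write it back out as UTF-8, using the
    given error handler on both sides (newline="" keeps line endings as-is)."""
    stdin = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8",
                             errors=errors, newline="")
    out = io.BytesIO()
    stdout = io.TextIOWrapper(out, encoding="utf-8",
                              errors=errors, newline="")
    stdout.write(stdin.read())
    stdout.flush()
    return out.getvalue()


# "strict" fails on the very first byte, like the traceback above.
try:
    echo(data, "strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason, "at position", exc.start)

# "surrogateescape" maps each undecodable byte to a lone surrogate on input
# and back to the same byte on output, so the data passes through unchanged.
print(echo(data, "surrogateescape") == data)
```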

Avoiding setting PYTHONIOENCODING during UTF-8 locale coercion
--------------------------------------------------------------

Rather than changing the default handling of the standard streams during
interpreter initialization, earlier versions of this PEP proposed setting
``PYTHONIOENCODING`` to ``utf-8:surrogateescape``. This turned out to create
a significant compatibility problem: since the ``surrogateescape`` handler
only exists in Python 3.1+, running Python 2.7 processes in subprocesses could
potentially break in a confusing way with that configuration.

The current design means that earlier Python versions will instead retain their
default ``strict`` error handling on the standard streams, while Python 3.7+
will consistently use the more permissive ``surrogateescape`` handler even
when these locales are explicitly configured (rather than being reached through
locale coercion).
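
``PYTHONIOENCODING`` takes the form ``encoding[:errors]``, and a runtime can
only honour it if it knows both the codec and the error handler. The sketch
below uses the standard ``codecs`` lookup functions to show where such a value
is rejected by an interpreter that lacks the named handler, as Python 2.7 lacks
``surrogateescape``. ``split_pythonioencoding`` is a hypothetical helper, not a
CPython API:

```python
import codecs


def split_pythonioencoding(value: str):
    """Hypothetical helper: split an "encoding[:errors]" setting and check
    that the current interpreter knows both parts (LookupError if not)."""
    encoding, _, errors = value.partition(":")
    if encoding:
        codecs.lookup(encoding)      # unknown codec -> LookupError
    if errors:
        codecs.lookup_error(errors)  # unknown error handler -> LookupError
    return encoding or None, errors or None


# Accepted on Python 3.1+, where "surrogateescape" is a registered handler.
print(split_pythonioencoding("utf-8:surrogateescape"))

# A handler this interpreter does not know is rejected, which is the
# situation a runtime without "surrogateescape" would be in.
try:
    split_pythonioencoding("utf-8:not-a-real-handler")
except LookupError as exc:
    print("rejected:", exc)
```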

Dropping official support for ASCII based text handling in the legacy C locale
------------------------------------------------------------------------------

We've been trying to get strict bytes/text separation to work reliably in the
legacy C locale for over a decade at this point. Not only haven't we been able
to get it to work, neither has anyone else - the only viable alternatives
identified have been to pass the bytes along verbatim without eagerly decoding
them to text (C/C++, Python 2.x, Ruby, etc.), or else to largely ignore the
nominal C/C++ locale encoding and assume the use of either UTF-8 (:pep:`540`,
Rust, Go, Node.js, etc.) or UTF-16-LE (JVM, .NET CLR).

While this PEP ensures that developers who genuinely need to do so can still
opt in to running their Python code in the legacy C locale (by setting
``LC_ALL=C``, ``PYTHONCOERCECLOCALE=0``, or running a custom build configured with
``--without-c-locale-coercion``), it also makes it clear that we *don't*
expect Python 3's Unicode handling to be completely reliable in that
configuration, and the recommended alternative is to use a more appropriate
locale setting (potentially in combination with :pep:`540`'s UTF-8 mode, if that
is available).

Providing implicit locale coercion only when running standalone
---------------------------------------------------------------

The major downside of the proposed design in this PEP is that it introduces a
potential discrepancy between the behaviour of the CPython runtime when it is
run as a standalone application and when it is run as an embedded component
inside a larger system (e.g. ``mod_wsgi`` running inside Apache ``httpd``).

Over the course of Python 3.x development, multiple attempts have been made
to improve the handling of incorrect locale settings at the point where the
Python interpreter is initialised. The problem that emerged is that this is
ultimately *too late* in the interpreter startup process - data such as command
line arguments and the contents of environment variables may have already been
retrieved from the operating system and processed under the incorrect ASCII
text encoding assumption well before ``Py_Initialize`` is called.

The problems created by those inconsistencies were then even harder to diagnose
and debug than those created by believing the operating system's claim that
ASCII was a suitable encoding to use for operating system interfaces. This was
the case even for the default CPython binary, let alone larger C/C++
applications that embed CPython as a scripting engine.

The approach proposed in this PEP handles that problem by moving the locale
coercion as early as possible in the interpreter startup sequence when running
standalone: it takes place directly in the C-level ``main()`` function, even
before calling into the ``Py_Main()`` library function that implements the
features of the CPython interpreter CLI.

The ``Py_Initialize`` API then only gains an explicit warning (emitted on
``stderr``) when it detects use of the ``C`` locale, and relies on the
embedding application to specify something more reasonable.

That said, the reference implementation for this PEP adds most of the
functionality to the shared library, with the CLI being updated to
unconditionally call two new private APIs::

    if (_Py_LegacyLocaleDetected()) {
        _Py_CoerceLegacyLocale();
    }

These are similar to other "pre-configuration" APIs intended for embedding
applications: they're designed to be called *before* ``Py_Initialize``, and
hence change the way the interpreter gets initialized.

If these were made public (either as part of this PEP or in a subsequent RFE),
then it would be straightforward for other embedding applications to recreate
the same behaviour as is proposed for the CPython CLI.

Allowing restoration of the legacy behaviour
--------------------------------------------

The CPython command line interpreter is often used to investigate faults that
occur in other applications that embed CPython, and those applications may still
be using the C locale even after this PEP is implemented.

Providing a simple on/off switch for the locale coercion behaviour makes it
much easier to reproduce the behaviour of such applications for debugging
purposes, as well as making it easier to reproduce the behaviour of older 3.x
runtimes even when running a version with this change applied.
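
For debugging, the switch can be flipped per child process through its
environment. The sketch below assumes a Python 3.7+ child interpreter, and
``legacy_locale_env`` is a hypothetical helper. Setting ``PYTHONUTF8=0`` is an
extra step beyond this PEP: it also turns off :pep:`540`'s UTF-8 mode, which
Python 3.7+ otherwise enables automatically when the ``LC_CTYPE`` locale is
``C`` or ``POSIX`` at startup:

```python
import os
import subprocess
import sys


def legacy_locale_env(base=None):
    """Hypothetical helper: an environment asking a Python 3.7+ child
    process to keep the legacy C locale instead of coercing it."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"               # use the C locale for every category
    env["PYTHONCOERCECLOCALE"] = "0"  # disable locale coercion and its warning
    env["PYTHONUTF8"] = "0"           # also opt out of PEP 540's UTF-8 mode
    return env


if __name__ == "__main__":
    # What the child reports depends on the platform and Python version,
    # so treat the output as diagnostic rather than as a fixed result.
    child = subprocess.run(
        [sys.executable, "-c",
         "import locale, sys; "
         "print(locale.setlocale(locale.LC_CTYPE, None), sys.stdout.encoding)"],
        env=legacy_locale_env(), capture_output=True, text=True)
    print(child.stdout.strip())
```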

Querying LC_CTYPE for C locale detection
----------------------------------------

``LC_CTYPE`` is the actual locale category that CPython relies on to drive the
implicit decoding of environment variables, command line arguments, and other
text values received from the operating system.

As such, it makes sense to check it specifically when attempting to determine
whether or not the current locale configuration is likely to cause Unicode
handling problems.
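
From Python code, the same category can be inspected with
``locale.setlocale(locale.LC_CTYPE, None)``, which queries the setting without
changing it. Inside an already running interpreter this reports the locale as
it stands after startup (and after any coercion), so the sketch below only
illustrates the check; it does not reproduce CPython's own pre-initialization
C code, and both function names are hypothetical:

```python
import locale


def is_legacy_c_locale(name: str) -> bool:
    """True for the C locale and for POSIX, its standard alias."""
    return name in ("C", "POSIX")


def lc_ctype_is_legacy() -> bool:
    """Hypothetical helper: check the current LC_CTYPE setting."""
    # Passing None as the locale makes setlocale() a pure query.
    return is_legacy_c_locale(locale.setlocale(locale.LC_CTYPE, None))


print(lc_ctype_is_legacy())
```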

Explicitly setting LC_CTYPE for UTF-8 locale coercion
-----------------------------------------------------

Python is often used as a glue language, integrating other C/C++ ABI compatible
components in the current process, and components written in arbitrary
languages in subprocesses.

Setting ``LC_CTYPE`` to ``C.UTF-8`` is important to handle cases where the
problem has arisen from a setting like ``LC_CTYPE=UTF-8`` being provided on a
system where no ``UTF-8`` locale is defined (e.g. when a Mac OS X ssh client is
configured to forward locale settings, and the user logs into a Linux server).

This should be sufficient to ensure that when the locale coercion is activated,
the switch to the UTF-8 based locale will be applied consistently across the
current process and any subprocesses that inherit the current environment.
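
A Python-level sketch of the same idea: change ``LC_CTYPE`` for the running
process with ``locale.setlocale()`` and export it through the environment so
that child processes inherit it, leaving ``LANG`` and the other ``LC_*``
variables alone. ``coerce_lc_ctype`` is a hypothetical helper, and the
``C.UTF-8`` default is an assumption for illustration; if that locale is not
installed, nothing is changed:

```python
import locale
import os


def coerce_lc_ctype(environ=os.environ, target="C.UTF-8") -> bool:
    """Hypothetical sketch: switch LC_CTYPE to *target* in this process and
    export it so that subprocesses inheriting *environ* follow suit.
    Returns False, changing nothing, if *target* is not available."""
    try:
        # Applies to the current process, including loaded C extensions.
        locale.setlocale(locale.LC_CTYPE, target)
    except locale.Error:
        return False
    # Only LC_CTYPE is exported; LANG, LC_ALL and the other LC_* are untouched.
    environ["LC_CTYPE"] = target
    return True
```

With the default ``environ=os.environ``, later ``subprocess`` calls that do not
pass an explicit ``env`` inherit the coerced setting automatically.
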

Avoiding setting LANG for UTF-8 locale coercion
-----------------------------------------------

Earlier versions of this PEP proposed setting the ``LANG`` category-independent
default locale, in addition to setting ``LC_CTYPE``.

This was later removed on the grounds that setting only ``LC_CTYPE`` is
sufficient to handle all of the problematic scenarios that the PEP aimed
to resolve, while setting ``LANG`` as well would break cases where ``LANG``
was set correctly, and the locale problems were solely due to an incorrect
``LC_CTYPE`` setting ([22_]).

For example, consider a Python application that calls the Linux ``date``
utility in a subprocess rather than doing its own date formatting::

    $ LANG=ja_JP.UTF-8 LC_CTYPE=C date
    2017年  5月 23日 火曜日 17:31:03 JST

    $ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing only LC_CTYPE
    2017年  5月 23日 火曜日 17:32:58 JST

    $ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing both LC_CTYPE and LANG
    Tue May 23 17:31:10 JST 2017

With only ``LC_CTYPE`` updated in the Python process, the subprocess would
continue to behave as expected. However, if ``LANG`` was updated as well,
that would effectively override the ``LC_TIME`` setting (which falls back to
``LANG`` when ``LC_TIME`` itself is unset) and use the wrong date formatting
conventions.
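
The ``date`` example follows from the POSIX rules for resolving a locale
category from the environment: a non-empty ``LC_ALL`` wins, then the category's
own variable (here ``LC_TIME``), then ``LANG``. A small model of those rules;
``effective_locale`` is a hypothetical helper, and the final ``"C"`` fallback is
the usual, but implementation-defined, default:

```python
def effective_locale(environ, category: str) -> str:
    """Resolve the locale a POSIX program uses for *category* (e.g. "LC_TIME"):
    LC_ALL first, then the category's own variable, then LANG.
    Unset and empty values are skipped."""
    for name in ("LC_ALL", category, "LANG"):
        value = environ.get(name)
        if value:
            return value
    # POSIX leaves this fallback implementation-defined; "C" is typical.
    return "C"


only_ctype = {"LANG": "ja_JP.UTF-8", "LC_CTYPE": "C.UTF-8"}
ctype_and_lang = {"LANG": "C.UTF-8", "LC_CTYPE": "C.UTF-8"}

print(effective_locale(only_ctype, "LC_TIME"))      # ja_JP.UTF-8
print(effective_locale(ctype_and_lang, "LC_TIME"))  # C.UTF-8
```

The same precedence is why setting ``LC_ALL`` would be more disruptive still:
it overrides every category, including an explicitly set ``LC_TIME``.
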

Avoiding setting LC_ALL for UTF-8 locale coercion
-------------------------------------------------

Earlier versions of this PEP proposed setting the ``LC_ALL`` locale override,
in addition to setting ``LC_CTYPE``.

This was changed after it was determined that just setting ``LC_CTYPE`` and
``LANG`` should be sufficient to handle all the scenarios the PEP aims to
cover, as it avoids causing any problems in cases like the following::

    $ LANG=C LC_MONETARY=ja_JP.utf8 ./python -c \
      "from locale import setlocale, LC_ALL, currency; setlocale(LC_ALL, ''); print(currency(1e6))"
    ￥1000000
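
The behaviour in that example follows from the POSIX precedence rules for
locale categories: a non-empty ``LC_ALL`` overrides every individual ``LC_*``
variable, which in turn overrides ``LANG``, with ``C`` as the default when
none of them is set. The following Python sketch models those rules with a
hypothetical ``effective_locale()`` helper that works on a plain dictionary
rather than querying the C library. It illustrates why coercing only
``LC_CTYPE`` leaves a user's ``LC_MONETARY`` setting intact, while also
setting ``LC_ALL`` would override it:

.. code-block:: python

    def effective_locale(environ, category):
        """Resolve the locale for one category using POSIX precedence rules.

        Hypothetical helper: it only inspects a mapping of environment
        variables and never calls into the C library.
        """
        # A non-empty LC_ALL wins, then the category's own variable, then LANG.
        for name in ("LC_ALL", category, "LANG"):
            value = environ.get(name)
            if value:
                return value
        return "C"  # POSIX default when none of the variables is set


    # The environment from the example above.
    base = {"LANG": "C", "LC_MONETARY": "ja_JP.utf8"}

    # Coercion via LC_CTYPE only: the LC_MONETARY preference survives.
    ctype_only = dict(base, LC_CTYPE="C.UTF-8")
    # Coercion via LC_ALL as well: every category is forced to C.UTF-8.
    with_lc_all = dict(ctype_only, LC_ALL="C.UTF-8")

    for label, env in (("LC_CTYPE only", ctype_only), ("with LC_ALL", with_lc_all)):
        print(label, effective_locale(env, "LC_CTYPE"), effective_locale(env, "LC_MONETARY"))

Under these assumptions the loop prints ``LC_CTYPE only C.UTF-8 ja_JP.utf8``
followed by ``with LC_ALL C.UTF-8 C.UTF-8``.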

Skipping locale coercion if LC_ALL is set in the current environment
--------------------------------------------------------------------

With locale coercion now only setting ``LC_CTYPE`` and ``LANG``, it will have
no effect if ``LC_ALL`` is also set. To avoid emitting a spurious locale
coercion notice in that case, coercion is instead skipped entirely.
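
The coercion decision described above can be sketched in Python as follows.
This is an illustration only, not CPython's implementation (which runs in C
during interpreter startup). The helper names ``plan_coercion()``,
``configured_ctype()`` and ``locale_is_available()`` are hypothetical, the
candidate list ``C.UTF-8``, ``C.utf8``, ``UTF-8`` is an assumption of the
sketch, the "legacy C locale" check is approximated from environment variables
rather than queried from the C library, and for simplicity only ``LC_CTYPE``
is updated:

.. code-block:: python

    import locale

    # Candidate coercion targets, tried in order (an assumption of this sketch).
    CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


    def configured_ctype(environ):
        """Approximate the configured LC_CTYPE from the environment alone."""
        for name in ("LC_ALL", "LC_CTYPE", "LANG"):
            if environ.get(name):
                return environ[name]
        return "C"


    def plan_coercion(environ, is_available):
        """Return a copy of *environ* with LC_CTYPE coerced, or None to skip."""
        if environ.get("LC_ALL"):
            # LC_ALL would override the new LC_CTYPE anyway, so skip coercion
            # entirely rather than emit a misleading coercion notice.
            return None
        if configured_ctype(environ) not in ("C", "POSIX"):
            return None  # not the legacy C locale: leave the configuration alone
        for candidate in CANDIDATE_LOCALES:
            if is_available(candidate):
                # Only LC_CTYPE is changed; other categories keep their values.
                return dict(environ, LC_CTYPE=candidate)
        return None  # no suitable UTF-8 based locale is installed


    def locale_is_available(name):
        """Probe a locale with setlocale(), then restore the previous LC_CTYPE."""
        previous = locale.setlocale(locale.LC_CTYPE)
        try:
            locale.setlocale(locale.LC_CTYPE, name)
            return True
        except locale.Error:
            return False
        finally:
            locale.setlocale(locale.LC_CTYPE, previous)

Passing the availability check in as a callable keeps the decision logic
independent of which locales happen to be installed on a given system.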

Considering locale coercion independently of "UTF-8 mode"
---------------------------------------------------------

With both this PEP's locale coercion and :pep:`540`'s UTF-8 mode under
consideration for Python 3.7, it makes sense to ask whether or not we can
limit ourselves to only doing one or the other, rather than making both
changes.

The UTF-8 mode proposed in :pep:`540` has two major limitations that make it a
potential complement to this PEP rather than a potential replacement.

First, unlike this PEP, :pep:`540`'s UTF-8 mode makes it possible to change default
behaviours that are not currently configurable at all. While that's exactly
what makes the proposal interesting, it's also what makes it an entirely
unproven approach. By contrast, the approach proposed in this PEP builds
directly atop existing configuration settings for the C locale system
(``LC_CTYPE``, ``LANG``) and Python's standard streams (``PYTHONIOENCODING``)
that have already been in use for years to handle the kinds of compatibility
problems discussed in this PEP.

Secondly, one of the things we know based on that experience is that the
proposed locale coercion can resolve problems not only in CPython itself,
but also in extension modules that interact with the standard streams, like
GNU readline. As an example, consider the following interactive session
from a :pep:`538`-enabled CPython build, where each line after the first is
executed by doing "up-arrow, left-arrow x4, delete, enter"::

    $ LANG=C ./python
    Python 3.7.0a0 (heads/pep538-coerce-c-locale:188e780, May  7 2017, 00:21:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌἤ")
    ℙƴ☂ℌἤ
    >>> print("ℙƴ☂ἤ")
    ℙƴ☂ἤ
    >>> print("ℙƴἤ")
    ℙƴἤ
    >>> print("ℙἤ")
    ℙἤ
    >>> print("ἤ")
    ἤ
    >>>

This is exactly what we'd expect from a well-behaved command history editor.

By contrast, the following is what currently happens on an older release if
you only change the Python-level stream encoding settings without updating the
locale settings::

    $ LANG=C PYTHONIOENCODING=utf-8:surrogateescape python3
    Python 3.5.3 (default, Apr 24 2017, 13:32:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌ<E29882>")
     File "<stdin>", line 0
       ^
    SyntaxError: 'utf-8' codec can't decode bytes in position 20-21:
    invalid continuation byte

That particular misbehaviour is coming from GNU readline, *not* CPython:
because the command history editing wasn't UTF-8 aware, it corrupted the history
buffer and fed such nonsense to stdin that even the surrogateescape error
handler was bypassed. While :pep:`540`'s UTF-8 mode could technically be updated
to also reconfigure readline, that's just *one* extension module that might
be interacting with the standard streams without going through the CPython
C API, and any change made by CPython would only apply when readline is running
directly as part of Python 3.7 rather than in a separate subprocess.

However, if we actually change the configured locale, GNU readline starts
behaving itself, without requiring any changes to the embedding application::

    $ LANG=C.UTF-8 python3
    Python 3.5.3 (default, Apr 24 2017, 13:32:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌἤ")
    ℙƴ☂ℌἤ
    >>> print("ℙƴ☂ἤ")
    ℙƴ☂ἤ
    >>> print("ℙƴἤ")
    ℙƴἤ
    >>> print("ℙἤ")
    ℙἤ
    >>> print("ἤ")
    ἤ
    >>>

    $ LC_CTYPE=C.UTF-8 python3
    Python 3.5.3 (default, Apr 24 2017, 13:32:13)
    [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("ℙƴ☂ℌøἤ")
    ℙƴ☂ℌøἤ
    >>> print("ℙƴ☂ℌἤ")
    ℙƴ☂ℌἤ
    >>> print("ℙƴ☂ἤ")
    ℙƴ☂ἤ
    >>> print("ℙƴἤ")
    ℙƴἤ
    >>> print("ℙἤ")
    ℙἤ
    >>> print("ἤ")
    ἤ
    >>>
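
The difference between these sessions can also be checked without an
interactive terminal. The sketch below is a hypothetical diagnostic helper
(``probe()`` is not part of this PEP's implementation): it starts a child
interpreter (``sys.executable``) with a cleaned environment and reports both
the C library's ``LC_CTYPE`` setting, which locale-aware extension modules
such as GNU readline rely on, and Python's own ``sys.stdout.encoding``.
Because Python 3.7 and later would otherwise coerce the locale or enable
:pep:`540`'s UTF-8 mode here, the sketch sets ``PYTHONCOERCECLOCALE=0`` and
``PYTHONUTF8=0`` to emulate the older "fix the streams only" workaround on a
current interpreter:

.. code-block:: python

    import os
    import subprocess
    import sys

    # Child script: report the C-level LC_CTYPE setting and the stdout encoding.
    PROBE = (
        "import locale, sys; "
        "print(locale.setlocale(locale.LC_CTYPE)); "
        "print(sys.stdout.encoding)"
    )


    def probe(overrides):
        """Run a child interpreter with a cleaned environment plus *overrides*."""
        env = {
            key: value
            for key, value in os.environ.items()
            # Drop inherited locale and Python settings so only *overrides* apply.
            if not key.startswith(("LC_", "LANG", "PYTHON"))
        }
        env.update(overrides)
        result = subprocess.run(
            [sys.executable, "-c", PROBE],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        ctype, stream_encoding = result.stdout.split()
        return ctype, stream_encoding


    # The older "fix the streams only" workaround, emulated on Python 3.7+ by
    # turning off both locale coercion and PEP 540's UTF-8 mode.
    streams_only = {
        "LANG": "C",
        "PYTHONCOERCECLOCALE": "0",
        "PYTHONUTF8": "0",
        "PYTHONIOENCODING": "utf-8:surrogateescape",
    }
    print(probe(streams_only))

With the streams-only settings, the child should report ``C`` for
``LC_CTYPE`` even though its stream encoding is UTF-8, which is the mismatch
that leaves readline operating in ASCII. Calling
``probe({"LC_CTYPE": "C.UTF-8"})`` instead, on a system where that locale is
installed, should report ``C.UTF-8`` as the first value.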

Enabling C locale coercion and warnings on Mac OS X, iOS and Android
--------------------------------------------------------------------

On Mac OS X, iOS, and Android, CPython already assumes the use of UTF-8 for
system interfaces, and we expect most other locale-aware components to do the
same.

Accordingly, this PEP originally proposed to disable locale coercion and
warnings at build time for these platforms, on the assumption that it would
be entirely redundant.

However, that assumption turned out to be incorrect, as subsequent
investigations showed that if you explicitly configure ``LANG=C`` on
these platforms, extension modules like GNU readline will misbehave in much the
same way as they do on other \*nix systems. [21_]

In addition, Mac OS X is also frequently used as a development and testing
platform for Python software intended for deployment to other \*nix environments
(such as Linux or Android), and Linux is similarly often used as a development
and testing platform for mobile and Mac OS X applications.

Accordingly, this PEP enables the locale coercion and warning features by
default on all platforms that use CPython's ``autotools``-based build toolchain
(i.e. everywhere other than Windows).

Implementation
==============

The reference implementation is being developed in the
``pep538-coerce-c-locale`` feature branch [18_] in Nick Coghlan's fork of the
CPython repository on GitHub. A work-in-progress PR is available at [20_].

This reference implementation covers not only the enhancement request in
issue 28180 [1_], but also the Android compatibility fixes needed to resolve
issue 28997 [16_].

Backporting to earlier Python 3 releases
========================================

Backporting to Python 3.6.x
---------------------------

If this PEP is accepted for Python 3.7, redistributors backporting the change
specifically to their initial Python 3.6.x release will be both allowed and
encouraged. However, such backports should only be undertaken either in
conjunction with the changes needed to also provide a suitable locale by
default, or else specifically for platforms where such a locale is already
consistently available.

At least the Fedora project is planning to pursue this approach for the
upcoming Fedora 26 release [19_].

Backporting to other 3.x releases
---------------------------------

While the proposed behavioural change is seen primarily as a bug fix addressing
Python 3's current misbehaviour in the default ASCII-based C locale, it still
represents a reasonably significant change in the way CPython interacts with
the C locale system. As such, while some redistributors may still choose to
backport it to even earlier Python 3.x releases based on the needs and
interests of their particular user base, this wouldn't be encouraged as a
general practice.

However, configuring Python 3 *environments* (such as base container
images) to use a UTF-8 based locale such as ``C.UTF-8`` by default is both
allowed and recommended.

Acknowledgements
================

The locale coercion approach proposed in this PEP is inspired directly by
Armin Ronacher's handling of this problem in the ``click`` command line
utility development framework [2_]::

    $ LANG=C python3 -c 'import click; cli = click.command()(lambda:None); cli()'
    Traceback (most recent call last):
      ...
    RuntimeError: Click will abort further execution because Python 3 was
    configured to use ASCII as encoding for the environment.  Either run this
    under Python 2 or consult http://click.pocoo.org/python3/ for mitigation
    steps.

    This system supports the C.UTF-8 locale which is recommended.
    You might be able to resolve your issue by exporting the
    following environment variables:

        export LC_ALL=C.UTF-8
        export LANG=C.UTF-8

The change was originally proposed as a downstream patch for Fedora's
system Python 3.6 package [3_], and then reformulated as a PEP for Python 3.7
with a section allowing for backports to earlier versions by redistributors.

In parallel with the development of the upstream patch, Charalampos Stratakis
has been working on the Fedora 26 backport and providing feedback on the
practical viability of the proposed changes.

The initial draft was posted to the Python Linux SIG for discussion [10_] and
then amended based on both that discussion and Victor Stinner's work in
:pep:`540` [11_].

The "ℙƴ☂ℌøἤ" string used in the Unicode handling examples throughout this PEP
is taken from Ned Batchelder's excellent "Pragmatic Unicode" presentation [9_].

Stephen Turnbull has long provided valuable insight into the text encoding
handling challenges he regularly encounters at the University of Tsukuba
(筑波大学).

References
==========

.. [1] CPython: sys.getfilesystemencoding() should default to utf-8
   (http://bugs.python.org/issue28180)

.. [2] Locale configuration required for click applications under Python 3
   (http://click.pocoo.org/5/python3/#python-3-surrogate-handling)

.. [3] Fedora: force C.UTF-8 when Python 3 is run under the C locale
   (https://bugzilla.redhat.com/show_bug.cgi?id=1404918)

.. [4] GNU C: How Programs Set the Locale
   (https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html)

.. [5] GNU C: Locale Categories
   (https://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html)

.. [6] glibc C.UTF-8 locale proposal
   (https://sourceware.org/glibc/wiki/Proposals/C.UTF-8)

.. [7] GNOME Flatpak
   (http://flatpak.org/)

.. [8] Ubuntu Snappy
   (https://www.ubuntu.com/desktop/snappy)

.. [9] Pragmatic Unicode
   (http://nedbatchelder.com/text/unipain.html)

.. [10] linux-sig discussion of initial PEP draft
   (https://mail.python.org/pipermail/linux-sig/2017-January/000014.html)

.. [11] Feedback notes from linux-sig discussion and PEP 540
   (https://github.com/python/peps/issues/171)

.. [12] GB 18030
   (https://en.wikipedia.org/wiki/GB_18030)

.. [13] Shift-JIS
   (https://en.wikipedia.org/wiki/Shift_JIS)

.. [14] ISO-2022
   (https://en.wikipedia.org/wiki/ISO/IEC_2022)

.. [15] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale
   (https://bugs.python.org/issue19977)

.. [16] test_readline.test_nonascii fails on Android
   (http://bugs.python.org/issue28997)

.. [17] UTF-8 locale discussion on "locale.getdefaultlocale() fails on Mac OS X with default language set to English"
   (http://bugs.python.org/issue18378#msg215215)

.. [18] GitHub branch diff for ``ncoghlan:pep538-coerce-c-locale``
   (https://github.com/python/cpython/compare/master...ncoghlan:pep538-coerce-c-locale)

.. [19] Fedora 26 change proposal for locale coercion backport
   (https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale)

.. [20] GitHub pull request for the reference implementation
   (https://github.com/python/cpython/pull/659)

.. [21] GNU readline misbehaviour on Mac OS X with ``LANG=C``
   (https://mail.python.org/pipermail/python-dev/2017-May/147897.html)

.. [22] Potential problems when setting LANG in addition to setting LC_CTYPE
   (https://mail.python.org/pipermail/python-dev/2017-May/147968.html)

Copyright
=========

This document has been placed in the public domain under the terms of the
CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: