python-peps/pep-0597.rst

247 lines
7.2 KiB
ReStructuredText

PEP: 597
Title: Soft deprecation of default encoding
Last-Modified: 23-Jun-2020
Author: Inada Naoki <songofacandy@gmail.com>
Discussions-To: https://discuss.python.org/t/3880
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2019
Python-Version: 3.10
Abstract
========
This PEP proposes:
* ``TextIOWrapper`` raises a ``PendingDeprecationWarning`` when the
``encoding`` option is not specified and dev mode is enabled.
* Add ``encoding="locale"`` option to ``TextIOWrapper``. It behaves
like ``encoding=None`` but don't raise a warning.
* Add ``io.LOCALE_ENCODING = "locale"`` constant to avoid confusing
``LookupError``.
Motivation
==========
Using the default encoding is a common mistake
----------------------------------------------
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.
For example, ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users can not install
the package if there is at least one non-ASCII character (e.g. emoji)
in the ``README.md`` file which is encoded in UTF-8.
For example, 489 packages of the 4000 most downloaded packages from
PyPI used non-ASCII characters in README. And 82 packages of them
can not be installed from source package when locale encoding is
ASCII. [1_] They used the default encoding to read README or TOML
file.
Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 is used by default, but locale encoding is
used actually. [2_]
Even Python experts assume that default encoding is UTF-8.
It creates bugs that happen only on Windows. See [3_] and [4_].
Raising a warning when the ``encoding`` option is omitted will
help to find such mistakes.
Prepare to change the default encoding to UTF-8
-----------------------------------------------
We chose to use locale encoding for the default text encoding
in Python 3.0. But UTF-8 has been adopted very widely since then.
We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries.
Many ``DeprecationWarning`` will be raised if we start raising
the warning by default. It will be too noisy.
While this PEP doesn't cover the change, this PEP will help to reduce
the number of ``DeprecationWarning`` in the future.
Specification
=============
Raising a PendingDeprecationWarning
---------------------------------------
``TextIOWrapper`` raises the ``PendingDeprecationWarning`` when the
``encoding`` option is omitted and dev mode is enabled.
``encoding="locale"`` option
----------------------------
When ``encoding="locale"`` is specified to the ``TextIOWrapper``, it
behaves same to ``encoding=None`` except it doesn't raise warning.
In detail, the encoding is chosen by this order:
1. ``os.device_encoding(buffer.fileno())``
2. ``locale.getpreferredencoding(False)``
This option can be used to use the locale encoding explicitly and
suppress the ``PendingDeprecationWarning``.
``io.LOCALE_ENCODING``
----------------------
``io`` module has ``io.LOCALE_ENCODING = "locale"`` constant. This
constant can be used to avoid confusing ``LookupError: unknown
encoding: locale`` error when the code is run in Python older than
3.10 accidentally.
The constant can be used to test that ``encoding="locale"`` option
is supported too.
::
# Want to suppress the Warning in dev mode but still need support
# old Python versions.
locale_encoding = getattr(io, "LOCALE_ENCODING", None)
with open(filename, encoding=locale_encoding) as f:
...
``io.text_encoding``
--------------------
``TextIOWrapper`` is used indirectly in most cases. For example,
``open``, and ``pathlib.Path.read_text()`` use it. Warning to these
functions doesn't make sense. Callers of these functions should be
warned instead.
``io.text_encoding(encoding, stacklevel=1)`` is a helper function for
it. Pure Python implementation will be like this::
def text_encoding(encoding, stacklevel=1):
"""
Helper function to choose the text encoding.
When encoding is not None, just return it.
Otherwise, return the default text encoding ("locale" for now),
and raise a PendingDeprecationWarning in dev mode.
This function can be used in APIs having encoding=None option.
But please consider encoding="utf-8" for new APIs.
"""
if encoding is None:
if sys.flags.dev_mode:
import warnings
warnings.warn(
"'encoding' option is not specified. The default encoding "
"might be changed to 'utf-8' in the future",
PendingDeprecationWarning, stacklevel + 2)
encoding = LOCALE_ENCODING
return encoding
``pathlib.Path.read_text()`` can use this function like this::
def read_text(self, encoding=None, errors=None):
"""
Open the file in text mode, read it, and close the file.
"""
encoding = io.text_encoding(encoding)
with self.open(mode='r', encoding=encoding, errors=errors) as f:
return f.read()
subprocess module doesn't warn
------------------------------
While the subprocess module uses TextIOWrapper, it doesn't raise
``PendingDeprecationWarning``. It uses the ``io.LOCALE_ENCODING``
by default.
Rationale
=========
"locale" is not a codec alias
-----------------------------
We don't add the "locale" to the codec alias because locale can be
changed in runtime.
Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
when ``encoding=None``. This behavior can not be implemented in
the codec.
Use a PendingDeprecationWarning
-------------------------------
This PEP doesn't cover changing the default encoding to UTF-8.
So we use ``PendingDeprecationWarning`` instead of
``DeprecationWarning`` for now.
Raise warning only in dev mode
------------------------------
This PEP will produce a huge amount of ``PendingDeprecationWarning``.
It will be too noisy for most Python developers.
We need to fix all warnings in the standard library. We need to wait
pip and major dev tools like ``pytest`` fix warnings before raising
this warning by default.
subprocess module doesn't warn
------------------------------
The default encoding for PIPE is relating to the encoding of the
stdio than the default encoding of ``TextIOWrapper``. So this PEP
doesn't propose to raise warning from the subprocess module.
Reference Implementation
========================
https://github.com/python/cpython/pull/19481
References
==========
.. [1] "Packages can't be installed when encoding is not UTF-8"
(https://github.com/methane/pep597-pypi-ascii)
.. [2] "Logging - Inconsistent behaviour when handling unicode"
(https://bugs.python.org/issue37111)
.. [3] Packaging tutorial in packaging.python.org didn't specify
encoding to read a ``README.md``
(https://github.com/pypa/packaging.python.org/pull/682)
.. [4] ``json.tool`` had used locale encoding to read JSON files.
(https://bugs.python.org/issue33684)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: