PEP: 597
Title: Soft deprecation of default encoding
Last-Modified: 23-Jun-2020
Author: Inada Naoki <songofacandy@gmail.com>
Discussions-To: https://discuss.python.org/t/3880
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2019
Python-Version: 3.10


Abstract
========

This PEP proposes:

* ``TextIOWrapper`` raises a ``PendingDeprecationWarning`` when the
  ``encoding`` option is not specified and dev mode is enabled.

* Add an ``encoding="locale"`` option to ``TextIOWrapper``.  It behaves
  like ``encoding=None`` but doesn't raise a warning.

* Add an ``io.LOCALE_ENCODING = "locale"`` constant to avoid a
  confusing ``LookupError``.


Motivation
==========

Using the default encoding is a common mistake
----------------------------------------------

Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

For example, ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake.  Many Windows users cannot install
the package if the UTF-8 encoded ``README.md`` file contains at least
one non-ASCII character (e.g. an emoji).

For example, 489 of the 4000 most downloaded packages on PyPI used
non-ASCII characters in their README, and 82 of those could not be
installed from the source package when the locale encoding was
ASCII. [1_]  They used the default encoding to read the README or a
TOML file.
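
As an illustration of the README failure described above, the
following sketch writes a UTF-8 file containing an emoji and then
reads it back twice.  Reading with ``encoding="ascii"`` only stands in
for a machine whose locale encoding is ASCII; the file contents and
variable names are made up for the example::

    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmpdir:
        readme = os.path.join(tmpdir, "README.md")

        # Write a README that contains one non-ASCII character, as UTF-8.
        with open(readme, "w", encoding="utf-8") as f:
            f.write("My package \N{PARTY POPPER}\n")

        # Simulate an ASCII locale by decoding the file as ASCII.
        try:
            with open(readme, encoding="ascii") as f:
                f.read()
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True

        # Passing the encoding explicitly does not depend on the locale.
        with open(readme, encoding="utf-8") as f:
            long_description = f.read()

The first read fails with ``UnicodeDecodeError``, while the second one
succeeds regardless of the platform default.
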

Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 to be used by default, but the locale
encoding is actually used. [2_]

Even Python experts assume that the default encoding is UTF-8.
This creates bugs that happen only on Windows.  See [3_] and [4_].

Raising a warning when the ``encoding`` option is omitted will
help to find such mistakes.


Prepare to change the default encoding to UTF-8
-----------------------------------------------

We chose to use the locale encoding as the default text encoding
in Python 3.0.  But UTF-8 has been adopted very widely since then.

We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries.
A large number of ``DeprecationWarning`` warnings would be raised if
we started raising the warning by default.  It would be too noisy.

While this PEP doesn't cover the change, it will help to reduce
the number of ``DeprecationWarning`` warnings in the future.


Specification
=============

Raising a PendingDeprecationWarning
-----------------------------------

``TextIOWrapper`` raises a ``PendingDeprecationWarning`` when the
``encoding`` option is omitted and dev mode is enabled.


``encoding="locale"`` option
----------------------------

When ``encoding="locale"`` is passed to ``TextIOWrapper``, it
behaves the same as ``encoding=None`` except that it doesn't raise a
warning.  In detail, the encoding is chosen in this order:

1. ``os.device_encoding(buffer.fileno())``
2. ``locale.getpreferredencoding(False)``

This option can be used to select the locale encoding explicitly and
suppress the ``PendingDeprecationWarning``.
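
The lookup order listed above can be sketched in pure Python.  The
helper name ``resolve_locale_encoding`` is hypothetical and is not part
of this proposal; it only mirrors the two steps using existing
standard library calls::

    import locale
    import os
    import tempfile

    def resolve_locale_encoding(fd):
        """Hypothetical helper mirroring the two-step lookup order."""
        # 1. os.device_encoding() returns None unless fd is a terminal.
        # 2. Otherwise fall back to the locale's preferred encoding.
        return os.device_encoding(fd) or locale.getpreferredencoding(False)

    # A regular file is not a terminal, so step 2 decides the result.
    with tempfile.TemporaryFile() as tmp:
        chosen = resolve_locale_encoding(tmp.fileno())
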


``io.LOCALE_ENCODING``
----------------------

The ``io`` module has an ``io.LOCALE_ENCODING = "locale"`` constant.
This constant can be used to avoid a confusing ``LookupError: unknown
encoding: locale`` error when the code is accidentally run on a Python
older than 3.10.

The constant can also be used to test whether the
``encoding="locale"`` option is supported.

::

    import io

    # Want to suppress the warning in dev mode but still need to
    # support old Python versions.
    locale_encoding = getattr(io, "LOCALE_ENCODING", None)
    with open(filename, encoding=locale_encoding) as f:
        ...


``io.text_encoding``
--------------------

``TextIOWrapper`` is used indirectly in most cases.  For example,
``open()`` and ``pathlib.Path.read_text()`` use it.  Warning about
these functions themselves doesn't make sense.  Their callers should
be warned instead.

``io.text_encoding(encoding, stacklevel=1)`` is a helper function for
this purpose.  A pure Python implementation would look like this::

    def text_encoding(encoding, stacklevel=1):
        """
        Helper function to choose the text encoding.

        When encoding is not None, just return it.
        Otherwise, return the default text encoding ("locale" for now),
        and raise a PendingDeprecationWarning in dev mode.

        This function can be used in APIs having encoding=None option.
        But please consider encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.dev_mode:
                import warnings
                warnings.warn(
                    "'encoding' option is not specified. The default encoding "
                    "might be changed to 'utf-8' in the future",
                    PendingDeprecationWarning, stacklevel + 2)
            encoding = LOCALE_ENCODING
        return encoding

``pathlib.Path.read_text()`` can use this function like this::

    def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()
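
The following self-contained sketch shows how a library function could
use such a helper and how the warning surfaces for its caller.  It is
not the reference implementation: ``LOCALE_ENCODING`` is a local
stand-in for the proposed constant, the ``dev_mode`` parameter replaces
the ``sys.flags.dev_mode`` check so the example runs without ``-X dev``,
and ``read_config`` is a hypothetical API::

    import warnings

    LOCALE_ENCODING = "locale"  # stand-in for the proposed io.LOCALE_ENCODING

    def text_encoding(encoding, stacklevel=1, dev_mode=True):
        # dev_mode is an assumption of this sketch; the PEP checks
        # sys.flags.dev_mode instead.
        if encoding is None:
            if dev_mode:
                warnings.warn(
                    "'encoding' option is not specified.",
                    PendingDeprecationWarning, stacklevel + 2)
            encoding = LOCALE_ENCODING
        return encoding

    def read_config(path, encoding=None):
        # Hypothetical library function with an encoding=None option.
        # It only resolves the encoding; no file is opened here.
        return text_encoding(encoding)

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        chosen = read_config("app.cfg")                      # warns
        explicit = read_config("app.cfg", encoding="utf-8")  # silent

Only the call that omits ``encoding`` records a warning.  Because the
helper adds 2 to ``stacklevel``, the warning is attributed to the code
that called ``read_config`` rather than to the helper or to
``read_config`` itself.
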


subprocess module doesn't warn
------------------------------

While the ``subprocess`` module uses ``TextIOWrapper``, it doesn't
raise a ``PendingDeprecationWarning``.  It uses
``io.LOCALE_ENCODING`` by default.


Rationale
=========

"locale" is not a codec alias
-----------------------------

We don't add "locale" as a codec alias because the locale can be
changed at runtime.

Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
when ``encoding=None``.  This behavior cannot be implemented in
a codec.


Use a PendingDeprecationWarning
-------------------------------

This PEP doesn't cover changing the default encoding to UTF-8.
So we use ``PendingDeprecationWarning`` instead of
``DeprecationWarning`` for now.


Raise warning only in dev mode
------------------------------

This PEP will produce a huge number of ``PendingDeprecationWarning``
warnings.  They would be too noisy for most Python developers.

We need to fix all warnings in the standard library.  We also need
to wait for pip and major dev tools like ``pytest`` to fix their
warnings before raising this warning by default.


subprocess module doesn't warn
------------------------------

The default encoding for PIPE is more closely related to the encoding
of the stdio than to the default encoding of ``TextIOWrapper``.  So
this PEP doesn't propose to raise a warning from the subprocess
module.


Reference Implementation
========================

https://github.com/python/cpython/pull/19481


References
==========

.. [1] "Packages can't be installed when encoding is not UTF-8"
       (https://github.com/methane/pep597-pypi-ascii)

.. [2] "Logging - Inconsistent behaviour when handling unicode"
       (https://bugs.python.org/issue37111)

.. [3] Packaging tutorial in packaging.python.org didn't specify
       encoding to read a ``README.md``
       (https://github.com/pypa/packaging.python.org/pull/682)

.. [4] ``json.tool`` had used locale encoding to read JSON files.
       (https://bugs.python.org/issue33684)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: