PEP: 597
Title: Add optional EncodingWarning
Last-Modified: 30-Jan-2021
Author: Inada Naoki <songofacandy@gmail.com>
Discussions-To: https://discuss.python.org/t/3880
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2019
Python-Version: 3.10

Abstract
========

Add a new warning category ``EncodingWarning``. It is emitted when the
``encoding`` option is omitted and the default encoding is a locale
encoding.

The warning is disabled by default. A new ``-X warn_encoding``
command-line option and a new ``PYTHONWARNENCODING`` environment
variable are used to enable the warnings.

Motivation
==========

Using the default encoding is a common mistake
----------------------------------------------

Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

For example, ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users cannot install
the package if the UTF-8 encoded ``README.md`` file contains at least
one non-ASCII character (e.g. emoji).
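
A minimal sketch of the safer pattern for ``setup.py``, assuming the
README is stored as UTF-8. The helper name ``read_long_description``
and the temporary file are hypothetical, and the ``setup()`` call
itself is omitted.

.. code-block:: python

    import os
    import tempfile


    def read_long_description(path):
        # Hypothetical helper: name the encoding explicitly, because the
        # README is UTF-8 no matter which locale builds the package.
        with open(path, encoding="utf-8") as f:
            return f.read()


    # Usage: create a UTF-8 README containing an emoji, then read it back.
    readme = os.path.join(tempfile.mkdtemp(), "README.md")
    with open(readme, "w", encoding="utf-8") as f:
        f.write("# Demo \U0001F40D\n")

    long_description = read_long_description(readme)

Without ``encoding="utf-8"``, the same read on a machine whose locale
encoding is, for example, cp1252 can raise ``UnicodeDecodeError`` or
silently produce mojibake.
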

To quantify this, 489 of the 4000 most downloaded packages on PyPI
used non-ASCII characters in their README, and 82 of them could not be
installed from the source package when the locale encoding was ASCII.
[1]_ They used the default encoding to read the README or a TOML file.

Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 to be used by default, but the locale encoding
is actually used. [2]_
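
Since Python 3.9, ``logging.basicConfig()`` accepts an ``encoding``
argument, so the log file encoding can be stated explicitly. The
sketch below assumes Python 3.9 or later; the helper
``log_to_utf8_file()`` and the temporary path are hypothetical.

.. code-block:: python

    import logging
    import os
    import tempfile


    def log_to_utf8_file(path, message):
        # Hypothetical helper. encoding="utf-8" avoids the locale-dependent
        # default; force=True (Python 3.8+) replaces handlers already
        # attached to the root logger.
        logging.basicConfig(filename=path, encoding="utf-8", force=True)
        logging.getLogger().warning(message)
        # Read the file back with the same explicit encoding.
        with open(path, encoding="utf-8") as f:
            return f.read()


    log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
    logged = log_to_utf8_file(log_path, "caf\u00e9 \u2615")  # "café ☕"
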

Even Python experts assume that the default encoding is UTF-8. This
creates bugs that happen only on Windows; see [3]_ and [4]_.

Emitting a warning when the ``encoding`` option is omitted will help
find such mistakes.

Prepare to change the default encoding to UTF-8
-----------------------------------------------

We chose to use the locale encoding as the default text encoding in
Python 3.0. Since then, UTF-8 has been adopted very widely.

We might change the default text encoding to UTF-8 in the future, but
this change would affect many applications and libraries. If we
started emitting ``DeprecationWarning`` by default, so many warnings
would be emitted that they would be too noisy.

Although this PEP does not propose changing the default encoding, it
will help reduce the number of warnings in the future if we decide to
change the default encoding.

Specification
=============

``EncodingWarning``
-------------------

Add a new ``EncodingWarning`` warning class, which is a subclass of
``Warning``. It is used to warn when the ``encoding`` option is
omitted and the default encoding is locale-specific.
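
Because ``EncodingWarning`` is an ordinary ``Warning`` subclass, the
standard ``warnings`` filters apply to it. The sketch below escalates
it to an exception inside a ``catch_warnings()`` block. It assumes
Python 3.10 or later, where ``EncodingWarning`` is a builtin, and
defines a stand-in class otherwise; ``legacy_api()`` is a hypothetical
function that emits the warning by hand.

.. code-block:: python

    import warnings

    try:
        EncodingWarning
    except NameError:
        # Stand-in for interpreters older than Python 3.10.
        class EncodingWarning(Warning):
            pass


    def emits_encoding_warning(func, *args, **kwargs):
        """Return True if calling func(*args, **kwargs) emits EncodingWarning."""
        with warnings.catch_warnings():
            # Turn EncodingWarning into an exception for this block only.
            warnings.simplefilter("error", EncodingWarning)
            try:
                func(*args, **kwargs)
            except EncodingWarning:
                return True
        return False


    def legacy_api(encoding=None):
        # Hypothetical API that warns when encoding is omitted.
        if encoding is None:
            warnings.warn("'encoding' option is omitted",
                          EncodingWarning, stacklevel=2)
            encoding = "locale"
        return encoding
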

Options to enable the warning
-----------------------------

A new ``-X warn_encoding`` option and a new ``PYTHONWARNENCODING``
environment variable are added. They are used to enable
``EncodingWarning``.

``sys.flags.encoding_warning`` is also added. The flag indicates
whether ``EncodingWarning`` is enabled.

When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
other modules using them will emit ``EncodingWarning`` when the
``encoding`` option is omitted.
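
A test runner or CI job could combine the proposed option with a
``-W error::EncodingWarning`` filter so that every omitted
``encoding`` fails loudly. The sketch below only builds the command
line; ``encoding_warning_command()`` and the ``setup.py`` target are
hypothetical, and ``-X warn_encoding`` is the option name proposed in
this PEP, so an interpreter must implement it for warnings to appear.

.. code-block:: python

    import sys


    def encoding_warning_command(script, *args):
        """Hypothetical helper: run *script* with EncodingWarning as an error."""
        return [
            sys.executable,
            "-X", "warn_encoding",           # option proposed by this PEP
            "-W", "error::EncodingWarning",  # error action, any message
            script,
            *args,
        ]


    cmd = encoding_warning_command("setup.py", "sdist")

The resulting list can then be passed to
``subprocess.run(cmd, check=True)``.
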

``encoding="locale"`` option
----------------------------

``io.TextIOWrapper`` accepts the ``encoding="locale"`` option. It
means the same as the current ``encoding=None``, but
``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
``encoding="locale"`` is specified.

An ``io.LOCALE_ENCODING = "locale"`` constant is added too. Referring
to this constant on an older Python fails with an ``AttributeError``
instead of the confusing ``LookupError: unknown encoding: locale``
error that a literal ``"locale"`` would cause.

The constant can also be used to test whether the
``encoding="locale"`` option is supported. For example:

.. code-block::

    import io

    # Suppress EncodingWarning while still supporting old Python
    # versions: where io.LOCALE_ENCODING is missing, fall back to
    # encoding=None, which also means the locale encoding.
    locale_encoding = getattr(io, "LOCALE_ENCODING", None)
    with open(filename, encoding=locale_encoding) as f:
        ...

``io.text_encoding()``
----------------------

``io.text_encoding()`` is a helper function for functions that have
an ``encoding=None`` option and pass it to ``io.TextIOWrapper()`` or
``open()``.

A pure Python implementation would look like this::

    import sys

    LOCALE_ENCODING = "locale"  # io.LOCALE_ENCODING, described above

    def text_encoding(encoding, stacklevel=1):
        """Helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e., "locale").

        This function emits EncodingWarning if *encoding* is None and
        sys.flags.encoding_warning is true.

        This function can be used in APIs that have an encoding=None
        option and pass it to TextIOWrapper or open.
        But please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.encoding_warning:
                import warnings
                # stacklevel + 2 skips this helper and the API that
                # called it, so the warning points at the API's caller.
                warnings.warn("'encoding' option is omitted",
                              EncodingWarning, stacklevel + 2)
            encoding = LOCALE_ENCODING
        return encoding

For example, ``pathlib.Path.read_text()`` can use the function like
this:

.. code-block::

    def read_text(self, encoding=None, errors=None):
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
the caller of ``read_text()`` instead of ``read_text()`` itself.
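
The ``stacklevel + 2`` arithmetic can be checked with a self-contained
copy of the ``text_encoding()`` sketch. Here the module-level
``WARN_ENCODING`` constant is a hypothetical stand-in for the proposed
``sys.flags.encoding_warning``, and ``load_text()`` is a hypothetical
API playing the role of ``read_text()``; it returns the resolved
encoding instead of opening a file. The recorded warning's line number
should be the line in ``caller()`` that calls ``load_text()``.

.. code-block:: python

    import inspect
    import warnings

    try:
        EncodingWarning
    except NameError:
        class EncodingWarning(Warning):  # stand-in before Python 3.10
            pass

    WARN_ENCODING = True  # hypothetical stand-in for sys.flags.encoding_warning
    LOCALE_ENCODING = "locale"


    def text_encoding(encoding, stacklevel=1):
        if encoding is None:
            if WARN_ENCODING:
                # 1 = this function, 2 = load_text(), 3 = load_text()'s caller
                warnings.warn("'encoding' option is omitted",
                              EncodingWarning, stacklevel + 2)
            encoding = LOCALE_ENCODING
        return encoding


    def load_text(path, encoding=None):
        # Hypothetical API; a real one would pass the result to open().
        return text_encoding(encoding)


    def caller():
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            result = load_text("settings.ini")
            call_line = inspect.currentframe().f_lineno - 1
        return result, caught, call_line


    result, caught, call_line = caller()
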

Affected stdlibs
----------------

Many stdlib modules will be affected by this change. Most APIs
accepting ``encoding=None`` will use ``io.text_encoding()`` as
described in the previous section.

Where using the locale encoding as the default encoding is reasonable,
``encoding=io.LOCALE_ENCODING`` will be used instead. For example, the
``subprocess`` module will use the locale encoding as the default
encoding for pipes.

Many tests use ``open()`` without specifying ``encoding`` to read
ASCII text files. They should be rewritten with ``encoding="ascii"``.
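
A minimal sketch of that rewrite for a test helper;
``read_ascii_fixture()`` and the temporary fixture files are
hypothetical. Besides avoiding the warning, ``encoding="ascii"`` makes
a fixture that unexpectedly contains non-ASCII bytes fail with
``UnicodeDecodeError`` on every platform, instead of decoding
differently depending on the locale.

.. code-block:: python

    import os
    import tempfile


    def read_ascii_fixture(path):
        # Hypothetical test helper with an explicit encoding: non-ASCII
        # bytes are rejected on every platform.
        with open(path, encoding="ascii") as f:
            return f.read()


    fixture_dir = tempfile.mkdtemp()
    good_fixture = os.path.join(fixture_dir, "good.txt")
    bad_fixture = os.path.join(fixture_dir, "bad.txt")
    with open(good_fixture, "w", encoding="ascii") as f:
        f.write("hello\n")
    with open(bad_fixture, "wb") as f:
        f.write("h\u00e9llo\n".encode("utf-8"))  # not ASCII
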

Rationale
=========

Opt-in warning
--------------

Although ``DeprecationWarning`` is suppressed by default, emitting it
every time the ``encoding`` option is omitted would be too noisy.

Noisy warnings may lead developers to ignore ``DeprecationWarning``
altogether.

"locale" is not a codec alias
-----------------------------

We don't add "locale" as a codec alias because the locale can be
changed at runtime.

Additionally, ``TextIOWrapper`` checks ``os.device_encoding()`` when
``encoding=None``. This behavior cannot be implemented in a codec.
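
The distinction can be observed with standard APIs. In the sketch
below, ``codecs.lookup("locale")`` raises ``LookupError`` because
"locale" is not a registered codec, and ``os.device_encoding()``
returns ``None`` for a regular file, since it only reports an encoding
for terminals. The helper ``is_codec_name()`` and the temporary file
are hypothetical.

.. code-block:: python

    import codecs
    import os
    import tempfile


    def is_codec_name(name):
        """Hypothetical helper: True if *name* is known to the codecs registry."""
        try:
            codecs.lookup(name)
        except LookupError:
            return False
        return True


    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    with open(path, "w", encoding="ascii") as f:
        # A regular file is not a terminal, so there is no device
        # encoding for TextIOWrapper to consult.
        device_encoding = os.device_encoding(f.fileno())
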

Reference Implementation
========================

https://github.com/python/cpython/pull/19481

References
==========

.. [1] "Packages can't be installed when encoding is not UTF-8"
   (https://github.com/methane/pep597-pypi-ascii)

.. [2] "Logging - Inconsistent behaviour when handling unicode"
   (https://bugs.python.org/issue37111)

.. [3] Packaging tutorial in packaging.python.org didn't specify
   encoding to read a ``README.md``
   (https://github.com/pypa/packaging.python.org/pull/682)

.. [4] ``json.tool`` had used locale encoding to read JSON files.
   (https://bugs.python.org/issue33684)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: