PEP: 597
Title: Add optional EncodingWarning
Last-Modified: 30-Jan-2021
Author: Inada Naoki <songofacandy@gmail.com>
Discussions-To: https://discuss.python.org/t/3880
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2019
Python-Version: 3.10


Abstract
========

Add a new warning category ``EncodingWarning``. It is emitted when
the ``encoding`` option is omitted and the default encoding is a
locale encoding.

The warning is disabled by default. A new ``-X warn_encoding``
command-line option and a new ``PYTHONWARNENCODING`` environment
variable can be used to enable it.


Motivation
==========

Using the default encoding is a common mistake
----------------------------------------------

Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

For example, ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users cannot install
the package if the UTF-8 encoded ``README.md`` file contains at least
one non-ASCII character (e.g. an emoji).

Of the 4000 most downloaded packages on PyPI, 489 used non-ASCII
characters in their README, and 82 of those could not be installed
from the source package when the locale encoding was ASCII. [1]_
They used the default encoding to read the README or a TOML file.

Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 to be used by default, but the locale
encoding is actually used. [2]_

Even Python experts assume that the default encoding is UTF-8.
This creates bugs that happen only on Windows. See [3]_ and [4]_.

Emitting a warning when the ``encoding`` option is omitted will help
to find such mistakes.

Prepare to change the default encoding to UTF-8
-----------------------------------------------

We chose the locale encoding as the default text encoding in
Python 3.0. But UTF-8 has been adopted very widely since then.

We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries. If we
start emitting a ``DeprecationWarning`` by default, many warnings
will be emitted, and that will be too noisy.

Although this PEP doesn't propose to change the default encoding,
it will help to reduce the number of warnings in the future if we
decide to change the default encoding.


Specification
=============

``EncodingWarning``
--------------------

Add a new ``EncodingWarning`` warning class, a subclass of
``Warning``. It is used to warn when the ``encoding`` option is
omitted and the default encoding is locale-specific.
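
A minimal sketch of the new category follows. Where the running
interpreter does not already provide ``EncodingWarning``, the sketch
defines a local stand-in subclass of ``Warning`` (an assumption of the
sketch). ``warn_missing_encoding()`` and ``missing_encoding_is_error()``
are hypothetical helpers. Because the category derives directly from
``Warning`` rather than from ``DeprecationWarning``, it can be filtered
on its own. For example, it can be escalated to an error:

.. code-block:: python

    import builtins
    import warnings

    # Use the built-in category if this interpreter provides it; otherwise
    # define a stand-in with the same base class (an assumption of this sketch).
    EncodingWarning = getattr(builtins, "EncodingWarning", None)
    if EncodingWarning is None:
        class EncodingWarning(Warning):
            """Stand-in for the proposed built-in warning category."""


    def warn_missing_encoding():
        """Hypothetical helper that reports an omitted ``encoding`` argument."""
        # stacklevel=2 attributes the warning to the code that called us.
        warnings.warn("'encoding' option is omitted", EncodingWarning,
                      stacklevel=2)


    def missing_encoding_is_error():
        """Return True if an "error" filter turns the warning into an exception."""
        with warnings.catch_warnings():
            # Only EncodingWarning is escalated; other categories are untouched.
            warnings.simplefilter("error", EncodingWarning)
            try:
                warn_missing_encoding()
            except EncodingWarning:
                return True
        return False


    print(missing_encoding_is_error())  # True

Escalating only this category lets a test suite fail on omitted
``encoding`` arguments without changing how other warnings are
reported.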

Options to enable the warning
------------------------------

The ``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
environment variable are added. They are used to enable
``EncodingWarning``.

``sys.flags.encoding_warning`` is also added. The flag indicates
whether ``EncodingWarning`` is enabled.

When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
other modules using them will emit ``EncodingWarning`` when
``encoding`` is omitted.

``encoding="locale"`` option
----------------------------

``io.TextIOWrapper`` accepts the ``encoding="locale"`` option. It
means the same as the current ``encoding=None``, except that
``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
``encoding="locale"`` is specified.
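
The sketch below models these semantics. ``resolve_text_encoding()`` is
a hypothetical helper, not part of the proposal. It maps both ``None``
and ``"locale"`` to the same codec name. It checks
``os.device_encoding()`` for an optional file descriptor first, then
falls back to ``locale.getpreferredencoding(False)``. It ignores other
factors, such as the UTF-8 Mode, that can influence the real default.
Under the proposal, the only difference between the two values is
whether ``EncodingWarning`` is emitted:

.. code-block:: python

    import locale
    import os

    LOCALE_ENCODING = "locale"  # sentinel value proposed by this PEP


    def resolve_text_encoding(encoding, fd=None):
        """Hypothetical helper: turn an ``encoding`` argument into a codec name.

        ``None`` and ``"locale"`` resolve identically; *fd* is an optional
        file descriptor whose terminal encoding takes precedence.
        """
        if encoding is None or encoding == LOCALE_ENCODING:
            if fd is not None:
                # A terminal may report its own encoding; otherwise None.
                device = os.device_encoding(fd)
                if device is not None:
                    return device
            # Read at call time, so the current locale is used.
            return locale.getpreferredencoding(False)
        # An explicit encoding is passed through unchanged.
        return encoding


    print(resolve_text_encoding("utf-8"))  # utf-8
    print(resolve_text_encoding("locale") == resolve_text_encoding(None))  # True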

Also add an ``io.LOCALE_ENCODING = "locale"`` constant. Using this
constant avoids a confusing ``LookupError: unknown encoding: locale``
error when the code is accidentally run on an older Python version.

The constant can also be used to test whether the
``encoding="locale"`` option is supported. For example,

.. code-block::

    import io

    # Suppress EncodingWarning, but still support old Python versions:
    # there io.LOCALE_ENCODING is missing, so None (the old default,
    # i.e. the locale encoding) is passed instead.
    locale_encoding = getattr(io, "LOCALE_ENCODING", None)
    with open(filename, encoding=locale_encoding) as f:
        ...

``io.text_encoding()``
-----------------------

``io.text_encoding()`` is a helper function for functions that have
an ``encoding=None`` option and pass it to ``io.TextIOWrapper()`` or
``open()``.

A pure Python implementation will look like this::

    import sys

    LOCALE_ENCODING = "locale"  # the io.LOCALE_ENCODING constant

    def text_encoding(encoding, stacklevel=1):
        """Helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e., "locale").

        This function emits EncodingWarning if *encoding* is None and
        sys.flags.encoding_warning is true.

        This function can be used in APIs that have an encoding=None
        option and pass it to TextIOWrapper or open.
        But please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.encoding_warning:
                # Imported lazily: only needed when a warning is emitted.
                import warnings
                # "+ 2" skips this helper and the API that called it, so
                # the warning points at the API's caller.
                warnings.warn("'encoding' option is omitted",
                              EncodingWarning, stacklevel + 2)
            encoding = LOCALE_ENCODING
        return encoding

For example, ``pathlib.Path.read_text()`` can use the function like
this:

.. code-block::

    def read_text(self, encoding=None, errors=None):
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
the caller of ``read_text()`` instead of ``read_text()`` itself.
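
The following self-contained sketch checks the effect of
``stacklevel + 2``. It copies the logic of the pure Python
``text_encoding()`` shown earlier, with two changes that are
assumptions of the sketch. First, ``sys.flags.encoding_warning`` is
replaced by a module-level ``WARN_ENABLED`` switch. Second, a stand-in
``EncodingWarning`` is defined where the interpreter lacks one.
``read_text()`` and ``caller()`` are hypothetical. The recorded warning
points at the line in ``caller()`` that omitted ``encoding``, not at a
line inside ``read_text()``:

.. code-block:: python

    import builtins
    import inspect
    import warnings

    EncodingWarning = getattr(builtins, "EncodingWarning", None)
    if EncodingWarning is None:  # interpreters without the built-in category
        class EncodingWarning(Warning):
            pass

    WARN_ENABLED = True  # hypothetical stand-in for sys.flags.encoding_warning


    def text_encoding(encoding, stacklevel=1):
        if encoding is None:
            if WARN_ENABLED:
                # Level 1 is this frame, 2 is the API calling us
                # (read_text), 3 is the API's caller.
                warnings.warn("'encoding' option is omitted",
                              EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding


    def read_text(encoding=None):
        """Hypothetical API with an ``encoding=None`` default."""
        return text_encoding(encoding)


    def caller():
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            expected_line = inspect.currentframe().f_lineno + 1
            read_text()  # encoding omitted: the warning should point here
        return caught, expected_line


    caught, expected_line = caller()
    print(caught[0].lineno == expected_line)  # True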

Affected stdlib modules
-----------------------

Many stdlib modules will be affected by this change.

Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
as described in the previous section.

Where using the locale encoding as the default encoding is
reasonable, ``encoding=io.LOCALE_ENCODING`` will be used instead. For
example, the ``subprocess`` module will use the locale encoding as
the default encoding for pipes.

Many tests use ``open()`` without specifying ``encoding`` to read
ASCII text files. They should be rewritten with ``encoding="ascii"``.
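
As a sketch of such a rewrite, the hypothetical
``read_ascii_fixture()`` helper below opens its file with an explicit
``encoding="ascii"``. This silences ``EncodingWarning`` and makes the
test independent of the locale. A non-ASCII byte that slips into a
fixture then fails loudly with ``UnicodeDecodeError`` instead of being
decoded differently on each platform. The fixture file names are
assumptions:

.. code-block:: python

    import os
    import tempfile


    def read_ascii_fixture(path):
        """Hypothetical test helper: read a fixture that must be pure ASCII."""
        with open(path, encoding="ascii") as f:
            return f.read()


    def check_fixtures():
        """Write one ASCII and one non-ASCII fixture, then read both back."""
        with tempfile.TemporaryDirectory() as tmp:
            good = os.path.join(tmp, "good.txt")
            bad = os.path.join(tmp, "bad.txt")
            with open(good, "w", encoding="ascii") as f:
                f.write("hello\n")
            with open(bad, "w", encoding="utf-8") as f:
                f.write("caf\u00e9\n")  # U+00E9 is two bytes in UTF-8
            text = read_ascii_fixture(good)
            try:
                read_ascii_fixture(bad)
                rejected = False
            except UnicodeDecodeError:
                rejected = True
        return text, rejected


    print(check_fixtures())  # ('hello\n', True)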


Rationale
=========

Opt-in warning
---------------

Although ``DeprecationWarning`` is suppressed by default, always
emitting ``DeprecationWarning`` when the ``encoding`` option is
omitted would be too noisy.

Noisy warnings may lead developers to dismiss the
``DeprecationWarning``.

"locale" is not a codec alias
-----------------------------

We don't add "locale" as a codec alias because the locale can be
changed at runtime.

Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
when ``encoding=None``. This behavior cannot be implemented in
a codec.
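
A short sketch illustrates the first point. ``"locale"`` is not a
registered codec name, so ``codecs.lookup()`` rejects it. A helper
that honors the sentinel therefore has to resolve it each time it is
called. On POSIX systems, that is also when a locale changed at
runtime (for example by ``locale.setlocale()``) takes effect.
``lookup_text_codec()`` is a hypothetical helper. It leaves out the
``os.device_encoding()`` check, which needs a file descriptor:

.. code-block:: python

    import codecs
    import locale


    def lookup_text_codec(encoding):
        """Hypothetical helper: resolve "locale" on every call, then look up."""
        if encoding == "locale":
            # Re-read the locale each time instead of caching a fixed alias:
            # the process locale may have changed since startup.
            encoding = locale.getpreferredencoding(False)
        return codecs.lookup(encoding)


    try:
        codecs.lookup("locale")
    except LookupError as exc:
        print(exc)  # unknown encoding: locale

    print(lookup_text_codec("locale").name)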


Reference Implementation
========================

https://github.com/python/cpython/pull/19481


References
==========

.. [1] "Packages can't be installed when encoding is not UTF-8"
   (https://github.com/methane/pep597-pypi-ascii)

.. [2] "Logging - Inconsistent behaviour when handling unicode"
   (https://bugs.python.org/issue37111)

.. [3] Packaging tutorial in packaging.python.org didn't specify
   encoding to read a ``README.md``
   (https://github.com/pypa/packaging.python.org/pull/682)

.. [4] ``json.tool`` had used locale encoding to read JSON files.
   (https://bugs.python.org/issue33684)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: