PEP: 597
Title: Add optional EncodingWarning
Last-Modified: 07-Aug-2021
Author: Inada Naoki <songofacandy@gmail.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2019
Python-Version: 3.10


Abstract
========

Add a new warning category ``EncodingWarning``. It is emitted when the
``encoding`` argument to ``open()`` is omitted and the default
locale-specific encoding is used.

The warning is disabled by default. A new ``-X warn_default_encoding``
command-line option and a new ``PYTHONWARNDEFAULTENCODING`` environment
variable can be used to enable it.

A ``"locale"`` argument value for ``encoding`` is added too. It
explicitly specifies that the locale encoding should be used, silencing
the warning.


Motivation
==========
Using the default encoding is a common mistake
----------------------------------------------
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

For example, using ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users cannot install
such packages if there is at least one non-ASCII character
(e.g. emoji, author names, copyright symbols, and the like)
in their UTF-8-encoded ``README.md`` file.
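
A minimal sketch of the fix for the ``setup.py`` case above, assuming
the README is stored as UTF-8. The helper name
``read_long_description`` is hypothetical; it is not part of setuptools
or the standard library:

.. code-block:: python

    def read_long_description(path="README.md"):
        """Return the README text, decoded as UTF-8 on every platform."""
        # Passing encoding explicitly keeps the result independent of
        # the locale encoding of the machine running setup.py.
        with open(path, encoding="utf-8") as f:
            return f.read()

The result can then be passed as ``long_description`` without relying
on the installer's locale.
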

Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII
characters in their README, and 82 fail to install from source on
non-UTF-8 locales due to not specifying an encoding for a non-ASCII
file. [1]_

Another example is ``logging.basicConfig(filename="log.txt")``.
Some users might expect it to use UTF-8 by default, but the locale
encoding is actually what is used. [2]_

Even Python experts may assume that the default encoding is UTF-8.
This creates bugs that only happen on Windows; see [3]_, [4]_, [5]_,
and [6]_ for example.

Emitting a warning when the ``encoding`` argument is omitted will help
find such mistakes.


Explicit way to use locale-specific encoding
--------------------------------------------
``open(filename)`` isn't explicit about which encoding is expected:

* If ASCII is assumed, this isn't a bug, but may result in decreased
  performance on Windows, particularly with non-Latin-1 locale encodings
* If UTF-8 is assumed, this may be a bug or a platform-specific script
* If the locale encoding is assumed, the behavior is as expected
  (but could change if future versions of Python modify the default)

From this point of view, ``open(filename)`` is not readable code.

``encoding=locale.getpreferredencoding(False)`` can be used to
specify the locale encoding explicitly, but it is too long and easy
to misuse (e.g. one can forget to pass ``False`` as its argument).

This PEP provides an explicit way to specify the locale encoding.


Prepare to change the default encoding to UTF-8
-----------------------------------------------
Since UTF-8 has become the de-facto standard text encoding,
we might default to it for opening files in the future.

However, such a change will affect many applications and libraries.
If we start emitting ``DeprecationWarning`` everywhere the ``encoding``
argument is omitted, it will be too noisy and painful.

Although this PEP doesn't propose changing the default encoding,
it will help enable that change by:

* Reducing the number of omitted ``encoding`` arguments in libraries
  before we start emitting a ``DeprecationWarning`` by default.

* Allowing users to pass ``encoding="locale"`` to suppress
  the current warning and any ``DeprecationWarning`` added in the future,
  as well as retaining consistent behavior if later Python versions
  change the default, ensuring support for any Python version >=3.10.


Specification
=============

``EncodingWarning``
-------------------
Add a new ``EncodingWarning`` warning class as a subclass of
``Warning``. It is emitted when the ``encoding`` argument is omitted and
the default locale-specific encoding is used.


Options to enable the warning
-----------------------------
The ``-X warn_default_encoding`` option and the
``PYTHONWARNDEFAULTENCODING`` environment variable are added. They
are used to enable ``EncodingWarning``.

``sys.flags.warn_default_encoding`` is also added. The flag is true when
``EncodingWarning`` is enabled.

When the flag is set, ``io.TextIOWrapper()``, ``open()`` and other
modules using them will emit ``EncodingWarning`` when the ``encoding``
argument is omitted.

Since ``EncodingWarning`` is a subclass of ``Warning``, it is
shown by default (if the ``warn_default_encoding`` flag is set), unlike
``DeprecationWarning``.
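
The effect of the option can be observed with a short script. This is
only an illustrative sketch for Python 3.10 or later: ``CHILD_CODE``
and ``run_child`` are hypothetical names, and ``os.devnull`` merely
stands in for any text file opened without ``encoding``:

.. code-block:: python

    import subprocess
    import sys

    # Child program: report the flag, then open a file without ``encoding``.
    CHILD_CODE = (
        "import os, sys\n"
        "print(bool(sys.flags.warn_default_encoding))\n"
        "open(os.devnull).close()\n"
    )


    def run_child(*interpreter_options):
        """Run CHILD_CODE in a fresh interpreter; return (flag, stderr)."""
        result = subprocess.run(
            # -I (isolated mode) ignores PYTHON* environment variables, so
            # only the options passed here decide whether the warning is on.
            [sys.executable, "-I", *interpreter_options, "-c", CHILD_CODE],
            capture_output=True,
            encoding="utf-8",
            errors="replace",
            check=True,
        )
        return result.stdout.strip(), result.stderr


    flag_off, stderr_off = run_child()
    flag_on, stderr_on = run_child("-X", "warn_default_encoding")

With the option, the child reports the flag as true and its standard
error contains an ``EncodingWarning`` message; without it, the flag is
false and nothing is reported.
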

``encoding="locale"``
---------------------

``io.TextIOWrapper`` will accept ``"locale"`` as a valid argument to
``encoding``. It has the same meaning as the current ``encoding=None``,
except that ``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
``encoding="locale"`` is specified.
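
As a small sketch of how this reads in user code (Python 3.10 or
later), assuming a file that is meant to be consumed by other tools
using the locale encoding. The function name ``write_for_local_tools``
is hypothetical:

.. code-block:: python

    def write_for_local_tools(path, text):
        """Write *text* using the locale encoding, stated explicitly."""
        # Requires Python 3.10+; older versions raise LookupError for "locale".
        with open(path, "w", encoding="locale") as f:
            f.write(text)
            # f.encoding reports the encoding the stream uses.
            return f.encoding

Compared with omitting the argument, the intent is visible to readers,
and no ``EncodingWarning`` is emitted when the warning is enabled.
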


``io.text_encoding()``
----------------------

``io.text_encoding()`` is a helper for functions with an
``encoding=None`` parameter that pass it to ``io.TextIOWrapper()`` or
``open()``.

A pure Python implementation will look like this::

    import sys

    def text_encoding(encoding, stacklevel=1):
        """A helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e. "locale").

        This function emits an EncodingWarning if *encoding* is None and
        sys.flags.warn_default_encoding is true.

        This function can be used in APIs with an encoding=None parameter
        that pass it to TextIOWrapper or open.
        However, please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.warn_default_encoding:
                import warnings
                warnings.warn(
                    "'encoding' argument not specified.",
                    EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding

For example, ``pathlib.Path.read_text()`` can use it like this:

.. code-block::

    def read_text(self, encoding=None, errors=None):
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
the caller of ``read_text()`` instead of ``read_text()`` itself.
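
The attribution to the caller can be checked with a sketch like the
following (Python 3.10 or later). ``load_text`` is a hypothetical
wrapper around ``open()``, and ``os.devnull`` stands in for a real
text file; the child program is passed with ``-c``, so the warning is
reported against ``<string>``:

.. code-block:: python

    import subprocess
    import sys

    # Line 4 is inside the wrapper; line 8 is the call that omits ``encoding``.
    CHILD_CODE = """\
    import io, os

    def load_text(path, encoding=None):
        encoding = io.text_encoding(encoding)
        with open(path, encoding=encoding) as f:
            return f.read()

    load_text(os.devnull)
    """.replace("\n    ", "\n")

    result = subprocess.run(
        [sys.executable, "-I", "-X", "warn_default_encoding", "-c", CHILD_CODE],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    # Keep only the lines of stderr that carry the warning header.
    warning_lines = [
        line for line in result.stderr.splitlines()
        if "EncodingWarning" in line
    ]

The warning header names the line of the ``load_text(os.devnull)``
call rather than the line inside ``load_text()``, which is what lets
users of a library find their own omitted ``encoding`` arguments.
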


Affected standard library modules
---------------------------------

Many standard library modules will be affected by this change.

Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
as written in the previous section.

Where using the locale encoding as the default encoding is reasonable,
``encoding="locale"`` will be used instead. For example,
the ``subprocess`` module will use the locale encoding as the default
for pipes.

Many tests use ``open()`` without ``encoding`` specified to read
ASCII text files. They should be rewritten with ``encoding="ascii"``.


Rationale
=========

Opt-in warning
--------------
Although ``DeprecationWarning`` is suppressed by default, always
emitting ``DeprecationWarning`` when the ``encoding`` argument is
omitted would be too noisy.

Noisy warnings may lead developers to dismiss the
``DeprecationWarning``.


"locale" is not a codec alias
-----------------------------
We don't add "locale" as a codec alias because the locale can be
changed at runtime.

Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
when ``encoding=None``. This behavior cannot be implemented in
a codec.


Backward Compatibility
======================

The new warning is not emitted by default, so this PEP is 100%
backwards-compatible.


Forward Compatibility
=====================
Passing ``"locale"`` as the argument to ``encoding`` is not
forward-compatible. Code using it will not work on Python older than
3.10, and will instead raise ``LookupError: unknown encoding: locale``.

Until developers can drop Python 3.9 support, ``EncodingWarning``
can only be used for finding missing ``encoding="utf-8"`` arguments.
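
Code that must run on both older and newer versions could guard the
value by version. This is only a sketch of one possible workaround,
not something this PEP specifies; ``locale_encoding_argument`` is a
hypothetical helper:

.. code-block:: python

    import sys


    def locale_encoding_argument():
        """Return an ``encoding`` value that selects the locale encoding."""
        if sys.version_info >= (3, 10):
            # Explicit, and never triggers EncodingWarning.
            return "locale"
        # Older versions reject "locale"; None falls back to the platform
        # default there, and EncodingWarning does not exist before 3.10.
        return None

A call such as ``open(path, encoding=locale_encoding_argument())``
then works on either side of the 3.10 boundary.
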


How to Teach This
=================

For new users
-------------
Since ``EncodingWarning`` is used to write cross-platform code,
there is no need to teach it to new users.

We can just recommend using UTF-8 for text files and using
``encoding="utf-8"`` when opening them.


For experienced users
---------------------
Using ``open(filename)`` to read text files encoded in UTF-8 is a
common mistake. It may not work on Windows because UTF-8 is not the
default encoding.

You can use ``-X warn_default_encoding`` or
``PYTHONWARNDEFAULTENCODING=1`` to find this type of mistake.

Omitting the ``encoding`` argument is not a bug when opening text files
encoded in the locale encoding, but ``encoding="locale"`` is recommended
in Python 3.10 and later because it is more explicit.
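
One way to apply this in a test run, sketched here under the
assumption of Python 3.10 or later, is to combine the environment
variable with the standard ``-W`` option so that an omitted
``encoding`` becomes an error. ``exit_code_with_strict_encoding`` is a
hypothetical helper name:

.. code-block:: python

    import os
    import subprocess
    import sys


    def exit_code_with_strict_encoding(code):
        """Run *code* with EncodingWarning enabled and promoted to an error."""
        env = dict(os.environ, PYTHONWARNDEFAULTENCODING="1")
        result = subprocess.run(
            [sys.executable, "-W", "error::EncodingWarning", "-c", code],
            env=env,
            capture_output=True,
        )
        return result.returncode


    implicit = exit_code_with_strict_encoding(
        "import os; open(os.devnull).close()")
    explicit = exit_code_with_strict_encoding(
        "import os; open(os.devnull, encoding='utf-8').close()")

The child that omits ``encoding`` exits with a non-zero status, while
the one that passes ``encoding='utf-8'`` exits normally.
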


Reference Implementation
========================

https://github.com/python/cpython/pull/19481


Discussions
===========
The latest discussion thread is:
https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5KDLVSTZ44GWKVY4YNCV/


* Why not implement this in linters?

  * ``encoding="locale"`` and ``io.text_encoding()`` must be implemented
    in Python.

  * It is difficult to find all callers of functions wrapping
    ``open()`` or ``TextIOWrapper()`` (see the ``io.text_encoding()``
    section).

* Many developers will not use the option.

  * Some will, and report the warnings to libraries they use,
    so the option is worth it even if many developers don't enable it.

  * For example, I found [7]_ and [8]_ by running
    ``pip install -U pip``, and [9]_ by running ``tox``
    with the reference implementation. This demonstrates how this
    option can be used to find potential issues.


References
==========
.. [1] "Packages can't be installed when encoding is not UTF-8"
       (https://github.com/methane/pep597-pypi-ascii)

.. [2] "Logging - Inconsistent behaviour when handling unicode"
       (https://bugs.python.org/issue37111)

.. [3] Packaging tutorial in packaging.python.org didn't specify
       encoding to read a ``README.md``
       (https://github.com/pypa/packaging.python.org/pull/682)

.. [4] ``json.tool`` had used locale encoding to read JSON files.
       (https://bugs.python.org/issue33684)

.. [5] site: Potential UnicodeDecodeError when handling pth file
       (https://bugs.python.org/issue33684)

.. [6] pypa/pip: "Installing packages fails if Python 3 installed
       into path with non-ASCII characters"
       (https://github.com/pypa/pip/issues/9054)

.. [7] "site: Potential UnicodeDecodeError when handling pth file"
       (https://bugs.python.org/issue43214)

.. [8] "[pypa/pip] Use ``encoding`` option or binary mode for open()"
       (https://github.com/pypa/pip/pull/9608)

.. [9] "Possible UnicodeError caused by missing encoding="utf-8""
       (https://github.com/tox-dev/tox/issues/1908)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   fill-column: 70
   coding: utf-8
   End: