PEP: 597
Title: Add optional EncodingWarning
Last-Modified: 07-Aug-2021
Author: Inada Naoki <songofacandy@gmail.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2019
Python-Version: 3.10


Abstract
========

Add a new warning category ``EncodingWarning``. It is emitted when the
``encoding`` argument to ``open()`` is omitted and the default
locale-specific encoding is used.

The warning is disabled by default. A new ``-X warn_default_encoding``
command-line option and a new ``PYTHONWARNDEFAULTENCODING`` environment
variable can be used to enable it.

A ``"locale"`` argument value for ``encoding`` is added too. It
explicitly specifies that the locale encoding should be used, silencing
the warning.


Motivation
==========

Using the default encoding is a common mistake
----------------------------------------------

Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

For example, using ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users cannot install
such packages if there is at least one non-ASCII character
(e.g. emoji, author names, copyright symbols, and the like)
in their UTF-8-encoded ``README.md`` file.

Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII
characters in their README, and 82 fail to install from source on
non-UTF-8 locales due to not specifying an encoding for a non-ASCII
file. [1]_

Another example is ``logging.basicConfig(filename="log.txt")``.
Some users might expect it to use UTF-8 by default, but the locale
encoding is actually what is used. [2]_

Even Python experts may assume that the default encoding is UTF-8.
This creates bugs that only happen on Windows; see [3]_, [4]_, [5]_,
and [6]_ for example.

Emitting a warning when the ``encoding`` argument is omitted will help
find such mistakes.


Explicit way to use locale-specific encoding
--------------------------------------------

``open(filename)`` isn't explicit about which encoding is expected:

* If ASCII is assumed, this isn't a bug, but may result in decreased
  performance on Windows, particularly with non-Latin-1 locale encodings
* If UTF-8 is assumed, this may be a bug or a platform-specific script
* If the locale encoding is assumed, the behavior is as expected
  (but could change if future versions of Python modify the default)

From this point of view, ``open(filename)`` is not readable code.

``encoding=locale.getpreferredencoding(False)`` can be used to
specify the locale encoding explicitly, but it is too long and easy
to misuse (e.g. one can forget to pass ``False`` as its argument).

This PEP provides an explicit way to specify the locale encoding.


Prepare to change the default encoding to UTF-8
-----------------------------------------------

Since UTF-8 has become the de-facto standard text encoding,
we might default to it for opening files in the future.

However, such a change will affect many applications and libraries.
If we start emitting ``DeprecationWarning`` everywhere the ``encoding``
argument is omitted, it will be too noisy and painful.

Although this PEP doesn't propose changing the default encoding,
it will help enable that change by:

* Reducing the number of omitted ``encoding`` arguments in libraries
  before we start emitting a ``DeprecationWarning`` by default.

* Allowing users to pass ``encoding="locale"`` to suppress
  the current warning and any ``DeprecationWarning`` added in the future,
  as well as retaining consistent behavior if later Python versions
  change the default, ensuring support for any Python version >=3.10.


Specification
=============

``EncodingWarning``
-------------------

Add a new ``EncodingWarning`` warning class as a subclass of
``Warning``. It is emitted when the ``encoding`` argument is omitted and
the default locale-specific encoding is used.


Options to enable the warning
-----------------------------

The ``-X warn_default_encoding`` option and the
``PYTHONWARNDEFAULTENCODING`` environment variable are added. They
are used to enable ``EncodingWarning``.

``sys.flags.warn_default_encoding`` is also added. The flag is true when
``EncodingWarning`` is enabled.

When the flag is set, ``io.TextIOWrapper()``, ``open()`` and other
modules using them will emit ``EncodingWarning`` when the ``encoding``
argument is omitted.

Since ``EncodingWarning`` is a subclass of ``Warning``, it is
shown by default (if the ``warn_default_encoding`` flag is set), unlike
``DeprecationWarning``.

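As a rough illustration of the two switches described above, the
following sketch runs a child interpreter with and without them and
checks whether ``EncodingWarning`` appears on its standard error. The
helper name ``emits_encoding_warning``, its parameters and the child
snippet are assumptions made for this example, not part of this PEP. On
Python older than 3.10 the switches are expected to have no effect.

```python
import os
import subprocess
import sys

# Child program: open a file in text mode without an ``encoding`` argument.
CHILD_CODE = "import os\nwith open(os.devnull) as f:\n    pass\n"


def emits_encoding_warning(*interpreter_args, env_value=None):
    """Run CHILD_CODE in a child interpreter and report whether
    EncodingWarning was printed to its stderr.

    *interpreter_args* are extra command-line options for the child.
    *env_value*, if given, is used for PYTHONWARNDEFAULTENCODING.
    (Both names are hypothetical, chosen for this sketch.)
    """
    env = dict(os.environ)
    # Start from a clean state so the parent's setting does not leak in.
    env.pop("PYTHONWARNDEFAULTENCODING", None)
    if env_value is not None:
        env["PYTHONWARNDEFAULTENCODING"] = env_value
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return "EncodingWarning" in result.stderr


default_run = emits_encoding_warning()                              # no switch
option_run = emits_encoding_warning("-X", "warn_default_encoding")  # -X option
env_run = emits_encoding_warning(env_value="1")                     # env variable
```

On Python 3.10 or later, ``option_run`` and ``env_run`` should be true
while ``default_run`` stays false, reflecting that the warning is opt-in.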

``encoding="locale"``
---------------------

``io.TextIOWrapper`` will accept ``"locale"`` as a valid argument to
``encoding``. It has the same meaning as the current ``encoding=None``,
except that ``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
``encoding="locale"`` is specified.


``io.text_encoding()``
----------------------

``io.text_encoding()`` is a helper for functions with an
``encoding=None`` parameter that pass it to ``io.TextIOWrapper()`` or
``open()``.

A pure Python implementation will look like this::

    import sys

    def text_encoding(encoding, stacklevel=1):
        """A helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e. "locale").

        This function emits an EncodingWarning if *encoding* is None and
        sys.flags.warn_default_encoding is true.

        This function can be used in APIs with an encoding=None parameter
        that pass it to TextIOWrapper or open.
        However, please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.warn_default_encoding:
                import warnings
                warnings.warn(
                    "'encoding' argument not specified.",
                    EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding

For example, ``pathlib.Path.read_text()`` can use it like this:

.. code-block::

   def read_text(self, encoding=None, errors=None):
       encoding = io.text_encoding(encoding)
       with self.open(mode='r', encoding=encoding, errors=errors) as f:
           return f.read()

By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
the caller of ``read_text()`` instead of ``read_text()`` itself.

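To make the stack-level arithmetic concrete, the sketch below copies the
helper but takes the flag as an explicit argument, since ``sys.flags``
cannot be changed at runtime. It then records where the warning is
attributed. ``text_encoding_sketch``, ``warn_enabled``, ``read_text`` and
``caller`` are hypothetical names invented for this example, and
``UserWarning`` stands in for ``EncodingWarning`` on interpreters older
than 3.10.

```python
import builtins
import warnings

# EncodingWarning only exists on Python 3.10+; UserWarning is a stand-in.
WARNING_CATEGORY = getattr(builtins, "EncodingWarning", UserWarning)


def text_encoding_sketch(encoding, stacklevel=1, warn_enabled=True):
    """Copy of the helper above with the flag passed in explicitly."""
    if encoding is None:
        if warn_enabled:
            # +2 skips this helper and the wrapper that called it.
            warnings.warn(
                "'encoding' argument not specified.",
                WARNING_CATEGORY, stacklevel + 2)
        encoding = "locale"
    return encoding


def read_text(path, encoding=None):
    """Hypothetical library wrapper; a real one would open *path*."""
    return text_encoding_sketch(encoding)


def caller():
    return read_text("example.txt")  # the warning is attributed to this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = caller()

warning = caught[0]
```

The recorded warning's line number points into ``caller()``, not into
``read_text()`` or the helper, which is what lets a library user locate
the call that omitted ``encoding``.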

Affected standard library modules
---------------------------------

Many standard library modules will be affected by this change.

Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
as written in the previous section.

Where using the locale encoding as the default encoding is reasonable,
``encoding="locale"`` will be used instead. For example,
the ``subprocess`` module will use the locale encoding as the default
for pipes.

Many tests use ``open()`` without ``encoding`` specified to read
ASCII text files. They should be rewritten with ``encoding="ascii"``.


Rationale
=========

Opt-in warning
--------------

Although ``DeprecationWarning`` is suppressed by default, always
emitting ``DeprecationWarning`` when the ``encoding`` argument is
omitted would be too noisy.

Noisy warnings may lead developers to dismiss the
``DeprecationWarning``.


"locale" is not a codec alias
-----------------------------

We don't add "locale" as a codec alias because the locale can be
changed at runtime.

Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
when ``encoding=None``. This behavior cannot be implemented in
a codec.

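The distinction can be observed directly: ``"locale"`` is resolved by
``io.TextIOWrapper`` itself rather than by the codec registry, so looking
it up as a codec fails. A small sketch follows. The function names are
illustrative, and the ``TextIOWrapper`` call with ``"locale"`` is guarded
because it assumes Python 3.10 or later.

```python
import codecs
import io
import sys


def is_registered_codec(name):
    """Return True if *name* can be found in the codec registry."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def decode_with_text_wrapper(data, encoding):
    """Decode *data* (bytes) by reading it through io.TextIOWrapper."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as stream:
        return stream.read()


locale_is_codec = is_registered_codec("locale")  # "locale" is not a codec
utf8_is_codec = is_registered_codec("utf-8")     # a real codec, for contrast

if sys.version_info >= (3, 10):
    # TextIOWrapper special-cases "locale". ASCII bytes decode the same
    # way under any ASCII-compatible locale encoding.
    decoded = decode_with_text_wrapper(b"plain ASCII", "locale")
else:
    # Older versions only offer the implicit default.
    decoded = decode_with_text_wrapper(b"plain ASCII", None)
```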

Backward Compatibility
======================

The new warning is not emitted by default, so this PEP is 100%
backwards-compatible.


Forward Compatibility
=====================

Passing ``"locale"`` as the argument to ``encoding`` is not
forward-compatible. Code using it will not work on Python older than
3.10, and will instead raise ``LookupError: unknown encoding: locale``.

Until developers can drop Python 3.9 support, ``EncodingWarning``
can only be used for finding missing ``encoding="utf-8"`` arguments.

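One possible way to live with this constraint, offered only as a sketch
and not something this PEP prescribes, is to choose the argument once
based on the running version. ``LOCALE_ENCODING`` and
``read_locale_text`` are hypothetical names.

```python
import os
import sys
import tempfile

# "locale" is only understood by Python 3.10+. On older versions, None
# selects the locale-specific default, and EncodingWarning does not exist.
LOCALE_ENCODING = "locale" if sys.version_info >= (3, 10) else None


def read_locale_text(path):
    """Read *path* using the locale encoding on old and new versions."""
    with open(path, encoding=LOCALE_ENCODING) as f:
        return f.read()


# Round-trip a small ASCII file to show the helper in use.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "note.txt")
    with open(path, "w", encoding=LOCALE_ENCODING) as f:
        f.write("plain ASCII text")
    content = read_locale_text(path)
```

Code written this way stays importable on Python 3.9 and earlier while
remaining explicit, and warning-free, on 3.10 and later.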

How to Teach This
=================

For new users
-------------

Since ``EncodingWarning`` is used to write cross-platform code,
there is no need to teach it to new users.

We can just recommend using UTF-8 for text files and using
``encoding="utf-8"`` when opening them.


For experienced users
---------------------

Using ``open(filename)`` to read text files encoded in UTF-8 is a
common mistake. It may not work on Windows because UTF-8 is not the
default encoding.

You can use ``-X warn_default_encoding`` or
``PYTHONWARNDEFAULTENCODING=1`` to find this type of mistake.

Omitting the ``encoding`` argument is not a bug when opening text files
encoded in the locale encoding, but ``encoding="locale"`` is recommended
in Python 3.10 and later because it is more explicit.


Reference Implementation
========================

https://github.com/python/cpython/pull/19481


Discussions
===========

The latest discussion thread is:
https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5KDLVSTZ44GWKVY4YNCV/


* Why not implement this in linters?

  * ``encoding="locale"`` and ``io.text_encoding()`` must be implemented
    in Python.

  * It is difficult to find all callers of functions wrapping
    ``open()`` or ``TextIOWrapper()`` (see the ``io.text_encoding()``
    section).

* Many developers will not use the option.

  * Some will, and report the warnings to libraries they use,
    so the option is worth it even if many developers don't enable it.

  * For example, I found [7]_ and [8]_ by running
    ``pip install -U pip``, and [9]_ by running ``tox``
    with the reference implementation. This demonstrates how this
    option can be used to find potential issues.


References
==========

.. [1] "Packages can't be installed when encoding is not UTF-8"
       (https://github.com/methane/pep597-pypi-ascii)

.. [2] "Logging - Inconsistent behaviour when handling unicode"
       (https://bugs.python.org/issue37111)

.. [3] Packaging tutorial in packaging.python.org didn't specify
       encoding to read a ``README.md``
       (https://github.com/pypa/packaging.python.org/pull/682)

.. [4] ``json.tool`` had used locale encoding to read JSON files.
       (https://bugs.python.org/issue33684)

.. [5] site: Potential UnicodeDecodeError when handling pth file
       (https://bugs.python.org/issue33684)

.. [6] pypa/pip: "Installing packages fails if Python 3 installed
       into path with non-ASCII characters"
       (https://github.com/pypa/pip/issues/9054)

.. [7] "site: Potential UnicodeDecodeError when handling pth file"
       (https://bugs.python.org/issue43214)

.. [8] "[pypa/pip] Use ``encoding`` option or binary mode for open()"
       (https://github.com/pypa/pip/pull/9608)

.. [9] "Possible UnicodeError caused by missing encoding="utf-8""
       (https://github.com/tox-dev/tox/issues/1908)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   fill-column: 70
   coding: utf-8
   End: