PEP 597: Update (#1799)

This commit is contained in:
Inada Naoki 2021-02-14 23:06:57 +09:00 committed by GitHub
parent 7d7965bf2d
commit 1653e14d8b
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 135 additions and 34 deletions

View File

@ -21,6 +21,9 @@ The warning is disabled by default. New ``-X warn_encoding``
command-line option and ``PYTHONWARNENCODING`` environment variable
are used to enable the warnings.
``encoding="locale"`` option is added too. It is used to specify
locale encoding explicitly.
Motivation
==========
@ -39,34 +42,57 @@ in the ``README.md`` file which is encoded in UTF-8.
For example, 489 packages of the 4000 most downloaded packages from
PyPI used non-ASCII characters in README. And 82 packages of them
can not be installed from source package when locale encoding is
ASCII. [1_] They used the default encoding to read README or TOML
ASCII. [1]_ They used the default encoding to read README or TOML
file.
Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 is used by default, but locale encoding is
used actually. [2_]
used actually. [2]_
Even Python experts assume that default encoding is UTF-8.
It creates bugs that happen only on Windows. See [3_] and [4_].
It creates bugs that happen only on Windows. See [3]_, [4]_, [5]_,
and [6]_ for example.
Emitting a warning when the ``encoding`` option is omitted will help
to find such mistakes.
Explicit way to use locale-specific encoding
--------------------------------------------
``open(filename)`` isn't explicit about which encoding is expected:
* Expects ASCII (not a bug, but inefficient on Windows)
* Expects UTF-8 (bug or platform specific script)
* Expects the locale encoding.
In this point of view, ``open(filename)`` is not readable.
``encoding=locale.getpreferredencoding(False)`` can be used to
specify the locale encoding explicitly. But it is too long and easy
to misuse. (e.g. forget to pass ``False`` to its parameter)
This PEP provides an explicit way to specify the locale encoding.
Prepare to change the default encoding to UTF-8
-----------------------------------------------
We had chosen to use locale encoding for the default text encoding in
Python 3.0. But UTF-8 has been adopted very widely since then.
Since UTF-8 becomes de-facto standard text encoding, we might change
the default text encoding to UTF-8 in the future.
We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries.
Many ``DeprecationWarning`` will be emitted if we start emitting the
warning by default. It will be too noisy.
But this change will affect many applications and libraries. If we
start emitting ``DeprecationWarning`` everywhere ``encoding`` option
is omitted by default, it will be too noisy and painful.
Although this PEP doesn't propose to change the default encoding,
this PEP will help to reduce the warning in the future if we decide
to change the default encoding.
this PEP will the change:
* Reduce the number of omitted ``encoding`` option in many libraries
before emitting the warning by default.
* Users will be able to use ``encoding="locale"`` option to suppress
the warning without dropping Python 3.10 support.
Specification
@ -75,7 +101,7 @@ Specification
``EncodingWarning``
--------------------
Add new ``EncodingWarning`` warning class which is a subclass of
Add a new ``EncodingWarning`` warning class which is a subclass of
``Warning``. It is used to warn when the ``encoding`` option is
omitted and the default encoding is locale-specific.
@ -94,6 +120,9 @@ When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
other modules using them will emit ``EncodingWarning`` when
``encoding`` is omitted.
Since ``EncodingWarning`` is a subclass of ``Warning``, they are
shown by default, unlike ``DeprecationWarning``.
``encoding="locale"`` option
----------------------------
@ -102,21 +131,6 @@ other modules using them will emit ``EncodingWarning`` when
same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
be used to avoid confusing ``LookupError: unknown encoding: locale``
error when the code is run in old Python accidentally.
The constant can be used to test that ``encoding="locale"`` option is
supported too. For example,
.. code-block::
# Want to suppress an EncodingWarning but still need support
# old Python versions.
locale_encoding = getattr(io, "LOCALE_ENCODING", None)
with open(filename, encoding=locale_encoding) as f:
...
``io.text_encoding()``
-----------------------
@ -145,7 +159,7 @@ Pure Python implementation will be like this::
import warnings
warnings.warn("'encoding' option is omitted",
EncodingWarning, stacklevel + 2)
encoding = LOCALE_ENCODING
encoding = "locale"
return encoding
For example, ``pathlib.Path.read_text()`` can use the function like:
@ -158,11 +172,11 @@ For example, ``pathlib.Path.read_text()`` can use the function like:
return f.read()
By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
the caller of ``read_text()`` instead of ``read_text()``.
the caller of ``read_text()`` instead of ``read_text()`` itself.
Affected stdlibs
-------------------
-----------------
Many stdlibs will be affected by this change.
@ -170,8 +184,8 @@ Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
as written in the previous section.
Where using locale encoding as the default encoding is reasonable,
``encoding=io.LOCALE_ENCODING`` will be used instead. For example,
``subprocess`` module will use locale encoding for the default
``encoding="locale"`` will be used instead. For example,
the ``subprocess`` module will use locale encoding for the default
encoding of the pipes.
Many tests use ``open()`` without ``encoding`` specified to read
@ -185,7 +199,7 @@ Opt-in warning
---------------
Although ``DeprecationWarning`` is suppressed by default, emitting
``DeprecationWarning`` always when ``encoding`` option is omitted
``DeprecationWarning`` always when the ``encoding`` option is omitted
would be too noisy.
Noisy warnings may lead developers to dismiss the
@ -203,12 +217,82 @@ when ``encoding=None``. This behavior can not be implemented in
the codec.
Backward Compatibility
======================
The new warning is not emitted by default. So this PEP is 100%
backward compatible.
Forward Compatibility
=====================
``encoding="locale"`` option is not forward compatible. Codes
using the option will not work on Python older than 3.10. It will
raise ``LookupError: unknown encoding: locale``.
Until developers can drop Python 3.9 support, ``EncodingWarning``
can be used only for finding missing ``encoding="utf-8"`` options.
How to teach this
=================
For new users
-------------
Since ``EncodingWarning`` is used to write a cross-platform code,
no need to teach it to new users.
We can just recommend using UTF-8 for text files and use
``encoding="utf-8"`` when opening test files.
For experienced users
---------------------
Using ``open(filename)`` to read text files encoded in UTF-8 is a
common mistake. It may not work on Windows because UTF-8 is not the
default encoding.
You can use ``-X warn_encoding`` or ``PYTHONWARNENCODING=1`` to find
this type of mistake.
Omitting ``encoding`` option is not a bug when opening text files
encoded in locale encoding. But ``encoding="locale"`` is recommended
after Python 3.10 because it is more explicit.
Reference Implementation
========================
https://github.com/python/cpython/pull/19481
Discussions
===========
* Why not implement this in linters?
* ``encoding="locale"`` and ``io.text_encoding()`` must be in
Python.
* It is difficult to find all caller of functions wrapping
``open()`` or ``TextIOWrapper()``. (See ``io.text_encoding()``
section.)
* Many developers will not use the option.
* Some developers use the option and report the warnings to
libraries they use. So the option is worth enough even though
many developers won't use it.
* For example, I find [7]_ and [8]_ by running
``pip install -U pip`` and find [9]_ by running ``tox``
with the reference implementation. It demonstrates how this
option find potential issues.
References
==========
@ -225,11 +309,28 @@ References
.. [4] ``json.tool`` had used locale encoding to read JSON files.
(https://bugs.python.org/issue33684)
.. [5] site: Potential UnicodeDecodeError when handling pth file
(https://bugs.python.org/issue33684)
.. [6] pypa/pip: "Installing packages fails if Python 3 installed
into path with non-ASCII characters"
(https://github.com/pypa/pip/issues/9054)
.. [7] "site: Potential UnicodeDecodeError when handling pth file"
(https://bugs.python.org/issue43214)
.. [8] "[pypa/pip] Use ``encoding`` option or binary mode for open()"
(https://github.com/pypa/pip/pull/9608)
.. [9] "Possible UnicodeError caused by missing encoding="utf-8""
(https://github.com/tox-dev/tox/issues/1908)
Copyright
=========
This document has been placed in the public domain.
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..