PEP 597: Update (#1799)
parent
7d7965bf2d
commit
1653e14d8b
169
pep-0597.rst
|
@ -21,6 +21,9 @@ The warning is disabled by default. New ``-X warn_encoding``
|
|||
command-line option and ``PYTHONWARNENCODING`` environment variable
|
||||
are used to enable the warnings.
|
||||
|
||||
An ``encoding="locale"`` option is added too. It is used to specify
the locale encoding explicitly.
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
@ -39,34 +42,57 @@ in the ``README.md`` file which is encoded in UTF-8.
|
|||
For example, 489 packages of the 4000 most downloaded packages from
|
||||
PyPI used non-ASCII characters in their README. And 82 of those packages
cannot be installed from a source package when the locale encoding is
|
||||
ASCII. [1_] They used the default encoding to read README or TOML
|
||||
ASCII. [1]_ They used the default encoding to read README or TOML
|
||||
file.
|
||||
|
||||
Another example is ``logging.basicConfig(filename="log.txt")``.
|
||||
Some users expect that UTF-8 is used by default, but the locale encoding is
|
||||
used actually. [2_]
|
||||
used actually. [2]_
|
||||
|
||||
Even Python experts assume that the default encoding is UTF-8.
|
||||
It creates bugs that happen only on Windows. See [3_] and [4_].
|
||||
It creates bugs that happen only on Windows. See [3]_, [4]_, [5]_,
|
||||
and [6]_ for example.
|
||||
|
||||
Emitting a warning when the ``encoding`` option is omitted will help
|
||||
to find such mistakes.
|
||||
|
||||
|
||||
Explicit way to use locale-specific encoding
|
||||
--------------------------------------------
|
||||
|
||||
``open(filename)`` isn't explicit about which encoding is expected:
|
||||
|
||||
* Expects ASCII (not a bug, but inefficient on Windows)
|
||||
* Expects UTF-8 (bug or platform specific script)
|
||||
* Expects the locale encoding.
|
||||
|
||||
From this point of view, ``open(filename)`` is not readable.
|
||||
|
||||
``encoding=locale.getpreferredencoding(False)`` can be used to
|
||||
specify the locale encoding explicitly. But it is too long and easy
to misuse (e.g. forgetting to pass ``False`` as its argument).
|
||||
|
||||
This PEP provides an explicit way to specify the locale encoding.
|
||||
|
||||
|
||||
Prepare to change the default encoding to UTF-8
|
||||
-----------------------------------------------
|
||||
|
||||
We had chosen to use locale encoding for the default text encoding in
|
||||
Python 3.0. But UTF-8 has been adopted very widely since then.
|
||||
Since UTF-8 becomes de-facto standard text encoding, we might change
|
||||
the default text encoding to UTF-8 in the future.
|
||||
|
||||
We might change the default text encoding to UTF-8 in the future.
|
||||
But this change will affect many applications and libraries.
|
||||
Many ``DeprecationWarning`` will be emitted if we start emitting the
|
||||
warning by default. It will be too noisy.
|
||||
But this change will affect many applications and libraries. If we
|
||||
start emitting ``DeprecationWarning`` by default everywhere the ``encoding``
option is omitted, it will be too noisy and painful.
|
||||
|
||||
Although this PEP doesn't propose to change the default encoding,
|
||||
this PEP will help to reduce the warning in the future if we decide
|
||||
to change the default encoding.
|
||||
this PEP will help with that change:

* Libraries can reduce the number of places where the ``encoding``
  option is omitted before the warning is emitted by default.

* Users will be able to use the ``encoding="locale"`` option to suppress
  the warning without dropping Python 3.10 support.
|
||||
|
||||
|
||||
Specification
|
||||
|
@ -75,7 +101,7 @@ Specification
|
|||
``EncodingWarning``
|
||||
--------------------
|
||||
|
||||
Add new ``EncodingWarning`` warning class which is a subclass of
|
||||
Add a new ``EncodingWarning`` warning class which is a subclass of
|
||||
``Warning``. It is used to warn when the ``encoding`` option is
|
||||
omitted and the default encoding is locale-specific.
|
||||
|
||||
|
@ -94,6 +120,9 @@ When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
|
|||
other modules using them will emit ``EncodingWarning`` when
|
||||
``encoding`` is omitted.
|
||||
|
||||
Since ``EncodingWarning`` is a subclass of ``Warning``, these warnings are
shown by default, unlike ``DeprecationWarning``.
|
||||
|
||||
|
||||
``encoding="locale"`` option
|
||||
----------------------------
|
||||
|
@ -102,21 +131,6 @@ other modules using them will emit ``EncodingWarning`` when
|
|||
same as the current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
|
||||
emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
|
||||
|
||||
Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
|
||||
be used to avoid confusing ``LookupError: unknown encoding: locale``
|
||||
error when the code is accidentally run on an older Python.
|
||||
|
||||
The constant can also be used to test whether the ``encoding="locale"``
option is supported. For example,
|
||||
|
||||
.. code-block::

    import io

    # Want to suppress an EncodingWarning but still need to support
    # old Python versions.
    locale_encoding = getattr(io, "LOCALE_ENCODING", None)
    with open(filename, encoding=locale_encoding) as f:
        ...
|
||||
|
||||
|
||||
``io.text_encoding()``
|
||||
-----------------------
|
||||
|
@ -145,7 +159,7 @@ Pure Python implementation will be like this::
|
|||
import warnings
|
||||
warnings.warn("'encoding' option is omitted",
|
||||
EncodingWarning, stacklevel + 2)
|
||||
encoding = LOCALE_ENCODING
|
||||
encoding = "locale"
|
||||
return encoding
|
||||
|
||||
For example, ``pathlib.Path.read_text()`` can use the function like:
|
||||
|
@ -158,11 +172,11 @@ For example, ``pathlib.Path.read_text()`` can use the function like:
|
|||
return f.read()
|
||||
|
||||
By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
|
||||
the caller of ``read_text()`` instead of ``read_text()``.
|
||||
the caller of ``read_text()`` instead of ``read_text()`` itself.
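
The effect of the ``stacklevel`` adjustment can be sketched in isolation. The code below is not the reference implementation: ``SketchEncodingWarning`` is a hypothetical stand-in for the proposed warning class, the default ``stacklevel=1`` is an assumption, and the warning is emitted unconditionally rather than only when the opt-in option is enabled.

```python
import warnings


class SketchEncodingWarning(Warning):
    """Hypothetical stand-in for the proposed EncodingWarning class."""


def text_encoding(encoding, stacklevel=1):
    # Assumed default of stacklevel=1. The "+ 2" skips this helper and the
    # wrapper that called it, so the warning points at the wrapper's caller.
    if encoding is None:
        warnings.warn("'encoding' option is omitted",
                      SketchEncodingWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def read_text(path, encoding=None):
    # A real wrapper would go on to open the file; this sketch only
    # resolves the encoding argument.
    return text_encoding(encoding)


def caller():
    return read_text("example.txt")  # the warning is reported for this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = caller()

reported_line = caught[0].lineno
```

Here ``reported_line`` is the line inside ``caller()``, not a line inside ``read_text()`` or ``text_encoding()``, which is what lets a library user see where the omitted ``encoding`` originates.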
|
||||
|
||||
|
||||
Affected stdlibs
|
||||
-------------------
|
||||
-----------------
|
||||
|
||||
Many standard library modules will be affected by this change.
|
||||
|
||||
|
@ -170,8 +184,8 @@ Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
|
|||
as written in the previous section.
|
||||
|
||||
Where using locale encoding as the default encoding is reasonable,
|
||||
``encoding=io.LOCALE_ENCODING`` will be used instead. For example,
|
||||
``subprocess`` module will use locale encoding for the default
|
||||
``encoding="locale"`` will be used instead. For example,
|
||||
the ``subprocess`` module will use locale encoding for the default
|
||||
encoding of the pipes.
|
||||
|
||||
Many tests use ``open()`` without ``encoding`` specified to read
|
||||
|
@ -185,7 +199,7 @@ Opt-in warning
|
|||
---------------
|
||||
|
||||
Although ``DeprecationWarning`` is suppressed by default, emitting
|
||||
``DeprecationWarning`` always when ``encoding`` option is omitted
|
||||
``DeprecationWarning`` always when the ``encoding`` option is omitted
|
||||
would be too noisy.
|
||||
|
||||
Noisy warnings may lead developers to dismiss the
|
||||
|
@ -203,12 +217,82 @@ when ``encoding=None``. This behavior can not be implemented in
|
|||
the codec.
|
||||
|
||||
|
||||
Backward Compatibility
|
||||
======================
|
||||
|
||||
The new warning is not emitted by default. So this PEP is 100%
|
||||
backward compatible.
|
||||
|
||||
|
||||
Forward Compatibility
|
||||
=====================
|
||||
|
||||
The ``encoding="locale"`` option is not forward compatible. Code
using the option will not work on Python older than 3.10. It will
raise ``LookupError: unknown encoding: locale``.
|
||||
|
||||
Until developers can drop Python 3.9 support, ``EncodingWarning``
|
||||
can be used only for finding missing ``encoding="utf-8"`` options.
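
One possible workaround for code that must still run on older interpreters is a version check. This is only a sketch and not something this PEP specifies: ``locale_encoding_argument()`` is a hypothetical helper, and it relies on the statement above that ``"locale"`` is understood from Python 3.10 onward.

```python
import os
import sys
import tempfile


def locale_encoding_argument():
    """Hypothetical helper: choose a value for encoding= that this interpreter accepts."""
    # Assumption: "locale" is accepted from Python 3.10 onward. Older
    # interpreters fall back to None, which also selects the locale encoding.
    return "locale" if sys.version_info >= (3, 10) else None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # placeholder file name
    with open(path, "w", encoding=locale_encoding_argument()) as f:
        f.write("hello\n")
    with open(path, encoding=locale_encoding_argument()) as f:
        text = f.read()
```

The helper keeps the intent explicit on new interpreters while avoiding the ``LookupError`` on old ones.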
|
||||
|
||||
|
||||
How to teach this
|
||||
=================
|
||||
|
||||
For new users
|
||||
-------------
|
||||
|
||||
Since ``EncodingWarning`` is used to write cross-platform code,
there is no need to teach it to new users.
|
||||
|
||||
We can just recommend using UTF-8 for text files and using
``encoding="utf-8"`` when opening text files.
|
||||
|
||||
|
||||
For experienced users
|
||||
---------------------
|
||||
|
||||
Using ``open(filename)`` to read text files encoded in UTF-8 is a
|
||||
common mistake. It may not work on Windows because UTF-8 is not the
|
||||
default encoding.
|
||||
|
||||
You can use ``-X warn_encoding`` or ``PYTHONWARNENCODING=1`` to find
|
||||
this type of mistake.
|
||||
|
||||
Omitting the ``encoding`` option is not a bug when opening text files
encoded in the locale encoding. But ``encoding="locale"`` is recommended
after Python 3.10 because it is more explicit.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
https://github.com/python/cpython/pull/19481
|
||||
|
||||
|
||||
Discussions
|
||||
===========
|
||||
|
||||
* Why not implement this in linters?

  * ``encoding="locale"`` and ``io.text_encoding()`` must be in
    Python.

  * It is difficult to find all callers of functions wrapping
    ``open()`` or ``TextIOWrapper()``. (See the ``io.text_encoding()``
    section.)

* Many developers will not use the option.

  * Some developers use the option and report the warnings to
    libraries they use. So the option is worthwhile even though
    many developers won't use it.

  * For example, I found [7]_ and [8]_ by running
    ``pip install -U pip`` and found [9]_ by running ``tox``
    with the reference implementation. It demonstrates how this
    option can find potential issues.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
|
@ -225,11 +309,28 @@ References
|
|||
.. [4] ``json.tool`` had used locale encoding to read JSON files.
|
||||
(https://bugs.python.org/issue33684)
|
||||
|
||||
.. [5] site: Potential UnicodeDecodeError when handling pth file
|
||||
(https://bugs.python.org/issue33684)
|
||||
|
||||
.. [6] pypa/pip: "Installing packages fails if Python 3 installed
|
||||
into path with non-ASCII characters"
|
||||
(https://github.com/pypa/pip/issues/9054)
|
||||
|
||||
.. [7] "site: Potential UnicodeDecodeError when handling pth file"
|
||||
(https://bugs.python.org/issue43214)
|
||||
|
||||
.. [8] "[pypa/pip] Use ``encoding`` option or binary mode for open()"
|
||||
(https://github.com/pypa/pip/pull/9608)
|
||||
|
||||
.. [9] "Possible UnicodeError caused by missing encoding="utf-8""
|
||||
(https://github.com/tox-dev/tox/issues/1908)
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
This document is placed in the public domain or under the
|
||||
CC0-1.0-Universal license, whichever is more permissive.
|
||||
|
||||
|
||||
..
|
||||
|
|