PEP 597: Update (#1799)
parent 7d7965bf2d
commit 1653e14d8b
pep-0597.rst | 169

@@ -21,6 +21,9 @@ The warning is disabled by default. New ``-X warn_encoding``
command-line option and ``PYTHONWARNENCODING`` environment variable
are used to enable the warnings.

A new ``encoding="locale"`` option is also added. It is used to
specify the locale encoding explicitly.


Motivation
==========

@@ -39,34 +42,57 @@ in the ``README.md`` file which is encoded in UTF-8.
For example, 489 of the 4000 most downloaded packages on PyPI used
non-ASCII characters in their README, and 82 of them cannot be
installed from the source package when the locale encoding is
ASCII. [1]_ They used the default encoding to read the README or TOML
file.

Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 to be used by default, but the locale encoding
is actually used. [2]_

Even Python experts assume that the default encoding is UTF-8.
This creates bugs that happen only on Windows. See [3]_, [4]_, [5]_,
and [6]_ for examples.

Emitting a warning when the ``encoding`` option is omitted will help
find such mistakes.


Explicit way to use locale-specific encoding
--------------------------------------------

``open(filename)`` isn't explicit about which encoding is expected:

* ASCII (not a bug, but inefficient on Windows)
* UTF-8 (a bug, or a platform-specific script)
* the locale encoding.

From this point of view, ``open(filename)`` is not readable.

``encoding=locale.getpreferredencoding(False)`` can be used to
specify the locale encoding explicitly, but it is too long and easy
to misuse (e.g. forgetting to pass ``False`` as its argument).

This PEP provides an explicit way to specify the locale encoding.
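The sketch below illustrates the long spelling described above. It is not part of this PEP. It compares the codec that ``open()`` picks when ``encoding`` is omitted with the codec named by ``locale.getpreferredencoding(False)``. The helper name ``default_and_explicit_codecs`` is hypothetical. ``codecs.lookup(...).name`` is used only to normalize aliases such as ``UTF-8`` and ``utf-8``.

.. code-block:: python

    import codecs
    import locale


    def default_and_explicit_codecs(path):
        """Return (codec picked by open(path), codec from the long spelling).

        Hypothetical helper for illustration; *path* must name an existing
        text file.
        """
        # Encoding omitted: Python falls back to the locale-dependent default.
        with open(path) as f:
            implicit = f.encoding
        # The explicit long spelling. With False, getpreferredencoding()
        # does not call setlocale(); forgetting it is the easy mistake.
        with open(path, encoding=locale.getpreferredencoding(False)) as f:
            explicit = f.encoding
        # Normalize aliases ("UTF-8" vs "utf-8", "cp1252" vs "windows-1252").
        return codecs.lookup(implicit).name, codecs.lookup(explicit).name

On a typical Linux system with a UTF-8 locale, both values usually come out as ``utf-8``. On Windows, they are usually the ANSI code page (for example ``cp1252``). In both cases the long spelling only restates the implicit default, at much greater length.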


Prepare to change the default encoding to UTF-8
-----------------------------------------------

Since UTF-8 has become the de facto standard text encoding, we might
change the default text encoding to UTF-8 in the future.

But this change will affect many applications and libraries. If we
start emitting ``DeprecationWarning`` by default everywhere the
``encoding`` option is omitted, it will be too noisy and painful.

Although this PEP doesn't propose changing the default encoding, it
will help prepare for that change:

* It will reduce the number of omitted ``encoding`` options in many
  libraries before the warning is emitted by default.

* Users will be able to use the ``encoding="locale"`` option to
  suppress the warning without dropping Python 3.10 support.


Specification

@@ -75,7 +101,7 @@ Specification
``EncodingWarning``
--------------------

Add a new ``EncodingWarning`` warning class which is a subclass of
``Warning``. It is used to warn when the ``encoding`` option is
omitted and the default encoding is locale-specific.

@@ -94,6 +120,9 @@ When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
other modules using them will emit ``EncodingWarning`` when
``encoding`` is omitted.

Since ``EncodingWarning`` is a subclass of ``Warning``, these warnings
are shown by default, unlike ``DeprecationWarning``.
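The following sketch shows the class relationship that this paragraph relies on. It also shows how a test suite might escalate the warning into an error. It assumes Python 3.10 or later for the builtin ``EncodingWarning``; on older versions the code checks for the class and does nothing. ``encoding_warnings_as_errors`` and ``warning_bases`` are hypothetical helpers and are not part of the proposal.

.. code-block:: python

    import builtins
    import contextlib
    import warnings

    # EncodingWarning is a builtin only on Python 3.10+; None elsewhere.
    EncodingWarning = getattr(builtins, "EncodingWarning", None)


    def warning_bases():
        """Return EncodingWarning's base classes, or None if unavailable."""
        return None if EncodingWarning is None else EncodingWarning.__bases__


    @contextlib.contextmanager
    def encoding_warnings_as_errors():
        """Raise instead of printing any EncodingWarning inside the block.

        Hypothetical helper: the previous filters are restored on exit, and
        on interpreters without EncodingWarning the block runs unchanged.
        """
        with warnings.catch_warnings():
            if EncodingWarning is not None:
                warnings.simplefilter("error", EncodingWarning)
            yield

The class derives from ``Warning``, not from ``DeprecationWarning``. As a result, the default ``ignore::DeprecationWarning`` filter does not apply to it, which is why the warnings become visible once the option is enabled.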


``encoding="locale"`` option
----------------------------

@@ -102,21 +131,6 @@ other modules using them will emit ``EncodingWarning`` when
the same as the current ``encoding=None``. But ``io.TextIOWrapper``
doesn't emit ``EncodingWarning`` when ``encoding="locale"`` is
specified.
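The difference can be observed from a child interpreter. The sketch below runs under two assumptions. First, it requires Python 3.10 or later. Second, it spells the opt-in switch ``-X warn_default_encoding``, the name that shipped in CPython 3.10, instead of the ``-X warn_encoding`` name used in this draft. ``warns_in_child`` is a hypothetical helper.

.. code-block:: python

    import subprocess
    import sys

    # Child snippets: wrap an in-memory buffer with and without ``encoding``.
    OMITTED = "import io; io.TextIOWrapper(io.BytesIO())"
    LOCALE = "import io; io.TextIOWrapper(io.BytesIO(), encoding='locale')"


    def warns_in_child(snippet):
        """Run *snippet* in a fresh interpreter with the opt-in switch enabled.

        Returns True if the child printed an EncodingWarning to stderr.
        -I and -S keep environment variables and site-packages out of the
        run, and -W always::EncodingWarning makes sure it is displayed.
        """
        proc = subprocess.run(
            [sys.executable, "-I", "-S",
             "-X", "warn_default_encoding",
             "-W", "always::EncodingWarning",
             "-c", snippet],
            capture_output=True, text=True,
            encoding="utf-8", errors="replace", check=True,
        )
        return "EncodingWarning" in proc.stderr


    if __name__ == "__main__" and sys.version_info >= (3, 10):
        print("encoding omitted:", warns_in_child(OMITTED))
        print('encoding="locale":', warns_in_child(LOCALE))

On CPython 3.10 or later, only the first snippet should report a warning, even though both wrappers end up using the locale encoding.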


``io.text_encoding()``
-----------------------

@@ -145,7 +159,7 @@ Pure Python implementation will be like this::
                import warnings
                warnings.warn("'encoding' option is omitted",
                              EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding

For example, ``pathlib.Path.read_text()`` can use the function like:

@@ -158,11 +172,11 @@ For example, ``pathlib.Path.read_text()`` can use the function like:
            return f.read()

By using ``io.text_encoding()``, ``EncodingWarning`` is attributed to
the caller of ``read_text()`` instead of ``read_text()`` itself.
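Below is a self-contained sketch of the mechanism this section describes. It rests on stated assumptions:

* The module-level ``WARN_ENCODING`` switch is a hypothetical stand-in for the interpreter flag that ``-X warn_encoding`` or ``PYTHONWARNENCODING`` would set.
* ``read_text`` is a hypothetical wrapper in the style of ``pathlib.Path.read_text()``, not the pathlib code.
* The builtin ``EncodingWarning`` is reused when present; otherwise a placeholder class is defined.

.. code-block:: python

    import builtins
    import locale
    import sys
    import warnings

    # Hypothetical stand-in for the interpreter flag that the opt-in switch
    # (-X warn_encoding / PYTHONWARNENCODING in this PEP) would set.
    WARN_ENCODING = True

    # Reuse the builtin class on Python 3.10+; define a placeholder otherwise.
    EncodingWarning = getattr(builtins, "EncodingWarning", None)
    if EncodingWarning is None:
        class EncodingWarning(Warning):
            """Placeholder for interpreters older than Python 3.10."""


    def text_encoding(encoding, stacklevel=1):
        """Return *encoding*, or "locale" when it is None.

        Sketch only. When the switch is on and *encoding* is None, warn
        on behalf of the caller of whichever function called us.
        """
        if encoding is None:
            if WARN_ENCODING:
                # With stacklevel=1 this passes 3 to warnings.warn: frame 1
                # is this function, 2 the wrapper, 3 the wrapper's caller.
                warnings.warn("'encoding' option is omitted",
                              EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding


    def read_text(path, encoding=None, errors=None):
        """Hypothetical wrapper in the style of pathlib.Path.read_text()."""
        encoding = text_encoding(encoding)
        if encoding == "locale" and sys.version_info < (3, 10):
            # open() understands "locale" only from Python 3.10 on.
            encoding = locale.getpreferredencoding(False)
        with open(path, encoding=encoding, errors=errors) as f:
            return f.read()

Calling ``read_text(path)`` without ``encoding`` reports the warning at the calling line, which points the developer at the code that needs an explicit ``encoding``. Passing ``encoding="utf-8"`` produces no warning.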


Affected stdlibs
-----------------

Many stdlib modules will be affected by this change.

@@ -170,8 +184,8 @@ Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
as described in the previous section.

Where using the locale encoding as the default is reasonable,
``encoding="locale"`` will be used instead. For example, the
``subprocess`` module will use the locale encoding as the default
encoding of its pipes.

Many tests use ``open()`` without ``encoding`` specified to read

@@ -185,7 +199,7 @@ Opt-in warning
---------------

Although ``DeprecationWarning`` is suppressed by default, always
emitting ``DeprecationWarning`` when the ``encoding`` option is
omitted would be too noisy.

Noisy warnings may lead developers to dismiss the

@@ -203,12 +217,82 @@ when ``encoding=None``. This behavior can not be implemented in
the codec.


Backward Compatibility
======================

The new warning is not emitted by default, so this PEP is 100%
backward compatible.


Forward Compatibility
=====================

The ``encoding="locale"`` option is not forward compatible: code
using it will not work on Python versions older than 3.10, where it
raises ``LookupError: unknown encoding: locale``.

Until developers can drop Python 3.9 support, ``EncodingWarning``
can be used only for finding missing ``encoding="utf-8"`` options.
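The following minimal probe makes the forward-compatibility problem concrete using only the standard library. It tries ``encoding="locale"`` on an in-memory stream and reports whether the running interpreter accepts it. ``supports_locale_encoding`` is a hypothetical name.

.. code-block:: python

    import io


    def supports_locale_encoding():
        """Return True if io.TextIOWrapper accepts encoding="locale".

        Interpreters older than Python 3.10 treat "locale" as an unknown
        codec name and raise LookupError.
        """
        try:
            # An in-memory buffer keeps the probe free of file-system effects.
            io.TextIOWrapper(io.BytesIO(), encoding="locale")
        except LookupError:
            return False
        return True

On CPython, a plain ``sys.version_info >= (3, 10)`` check answers the same question. The probe differs only in that it asks the running interpreter directly.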


How to teach this
=================

For new users
-------------

Since ``EncodingWarning`` is meant for writing cross-platform code,
there is no need to teach it to new users.

We can just recommend using UTF-8 for text files and passing
``encoding="utf-8"`` when opening text files.
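For illustration, here is a minimal round trip with the recommended explicit ``encoding="utf-8"``. The sample text and the helper name ``roundtrip_utf8`` are arbitrary. Because the encoding is spelled out, the result does not depend on the platform's locale encoding.

.. code-block:: python

    import os
    import tempfile


    def roundtrip_utf8(text):
        """Write *text* to a temporary file as UTF-8 and read it back."""
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "notes.txt")
            # Explicit UTF-8: the same bytes on every platform and locale.
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)
            with open(path, encoding="utf-8") as f:
                return f.read()


    # Non-ASCII sample that a legacy locale encoding such as ASCII or
    # cp1252 could not store in full.
    SAMPLE = "naïve café, 日本語\n"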


For experienced users
---------------------

Using ``open(filename)`` to read text files encoded in UTF-8 is a
common mistake. It may not work on Windows because UTF-8 is not the
default encoding there.

You can use ``-X warn_encoding`` or ``PYTHONWARNENCODING=1`` to find
this type of mistake.

Omitting the ``encoding`` option is not a bug when opening text files
encoded in the locale encoding, but ``encoding="locale"`` is
recommended from Python 3.10 on because it is more explicit.


Reference Implementation
========================

https://github.com/python/cpython/pull/19481


Discussions
===========

* Why not implement this in linters?

  * ``encoding="locale"`` and ``io.text_encoding()`` must be
    implemented in Python itself.

  * It is difficult to find all callers of functions wrapping
    ``open()`` or ``TextIOWrapper()``. (See the ``io.text_encoding()``
    section.)

* Many developers will not use the option.

  * Some developers will use the option and report the warnings to
    the libraries they use, so the option is worthwhile even though
    many developers won't use it.

  * For example, I found [7]_ and [8]_ by running
    ``pip install -U pip``, and [9]_ by running ``tox``, with the
    reference implementation. This demonstrates how the option finds
    potential issues.


References
==========

@@ -225,11 +309,28 @@ References
.. [4] ``json.tool`` had used locale encoding to read JSON files.
   (https://bugs.python.org/issue33684)

.. [5] "site: Potential UnicodeDecodeError when handling pth file"
   (https://bugs.python.org/issue43214)

.. [6] pypa/pip: "Installing packages fails if Python 3 installed
   into path with non-ASCII characters"
   (https://github.com/pypa/pip/issues/9054)

.. [7] "site: Potential UnicodeDecodeError when handling pth file"
   (https://bugs.python.org/issue43214)

.. [8] pypa/pip: "Use ``encoding`` option or binary mode for open()"
   (https://github.com/pypa/pip/pull/9608)

.. [9] tox-dev/tox: "Possible UnicodeError caused by missing encoding="utf-8""
   (https://github.com/tox-dev/tox/issues/1908)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.



..