PEP 597: Apply grammar, syntax and polish fixes, and clarify phrasing and terminology (#1887)
* PEP 597: Apply straightforward grammar and syntax fixes
* PEP 597: Copyedit prose for clarity, polish and to avoid repetition
* PEP 597: Use accurate terminology for options, params and arguments
* PEP 597: Add statements in several places to clarify unclear meaning
* PEP 597: Revise docstring and warning text in text_encoding function
* PEP 597: Revise and clarify points based on author feedback

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
parent e7698a7dbe
commit 50913ae2b3
pep-0597.rst (225 lines changed)
@@ -12,16 +12,17 @@ Python-Version: 3.10
 Abstract
 ========
 
-Add a new warning category ``EncodingWarning``. It is emitted when
-``encoding`` option is omitted and the default encoding is a locale
-encoding.
+Add a new warning category ``EncodingWarning``. It is emitted when the
+``encoding`` argument to ``open()`` is omitted and the default
+locale-specific encoding is used.
 
-The warning is disabled by default. New ``-X warn_default_encoding``
-command-line option and ``PYTHONWARNDEFAULTENCODING`` environment
-variable are used to enable the warnings.
+The warning is disabled by default. A new ``-X warn_default_encoding``
+command-line option and a new ``PYTHONWARNDEFAULTENCODING`` environment
+variable can be used to enable it.
 
-``encoding="locale"`` option is added too. It is used to specify
-locale encoding explicitly.
+A ``"locale"`` argument value for ``encoding`` is added too. It
+explicitly specifies that the locale encoding should be used, silencing
+the warning.
 
 
 Motivation
@@ -33,27 +34,27 @@ Using the default encoding is a common mistake
 Developers using macOS or Linux may forget that the default encoding
 is not always UTF-8.
 
-For example, ``long_description = open("README.md").read()`` in
-``setup.py`` is a common mistake. Many Windows users can not install
-the package if there is at least one non-ASCII character (e.g. emoji)
-in the ``README.md`` file which is encoded in UTF-8.
+For example, using ``long_description = open("README.md").read()`` in
+``setup.py`` is a common mistake. Many Windows users cannot install
+such packages if there is at least one non-ASCII character
+(e.g. emoji, author names, copyright symbols, and the like)
+in their UTF-8-encoded ``README.md`` file.
 
-For example, 489 packages of the 4000 most downloaded packages from
-PyPI used non-ASCII characters in README. And 82 packages of them
-can not be installed from source package when locale encoding is
-ASCII. [1]_ They used the default encoding to read README or TOML
-file.
+Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII
+characters in their README, and 82 fail to install from source on
+non-UTF-8 locales due to not specifying an encoding for a non-ASCII
+file. [1]_
 
 Another example is ``logging.basicConfig(filename="log.txt")``.
-Some users expect UTF-8 is used by default, but locale encoding is
-used actually. [2]_
+Some users might expect it to use UTF-8 by default, but the locale
+encoding is actually what is used. [2]_
 
-Even Python experts assume that default encoding is UTF-8.
-It creates bugs that happen only on Windows. See [3]_, [4]_, [5]_,
+Even Python experts may assume that the default encoding is UTF-8.
+This creates bugs that only happen on Windows; see [3]_, [4]_, [5]_,
 and [6]_ for example.
 
-Emitting a warning when the ``encoding`` option is omitted will help
-to find such mistakes.
+Emitting a warning when the ``encoding`` argument is omitted will help
+find such mistakes.
 
 
 Explicit way to use locale-specific encoding
@@ -61,15 +62,17 @@ Explicit way to use locale-specific encoding
 
 ``open(filename)`` isn't explicit about which encoding is expected:
 
-* Expects ASCII (not a bug, but inefficient on Windows)
-* Expects UTF-8 (bug or platform-specific script)
-* Expects the locale encoding.
+* If ASCII is assumed, this isn't a bug, but may result in decreased
+  performance on Windows, particularly with non-Latin-1 locale encodings
+* If UTF-8 is assumed, this may be a bug or a platform-specific script
+* If the locale encoding is assumed, the behavior is as expected
+  (but could change if future versions of Python modify the default)
 
-In this point of view, ``open(filename)`` is not readable.
+From this point of view, ``open(filename)`` is not readable code.
 
 ``encoding=locale.getpreferredencoding(False)`` can be used to
-specify the locale encoding explicitly. But it is too long and easy
-to misuse. (e.g. forget to pass ``False`` to its parameter)
+specify the locale encoding explicitly, but it is too long and easy
+to misuse (e.g. one can forget to pass ``False`` as its argument).
 
 This PEP provides an explicit way to specify the locale encoding.
 
@@ -77,91 +80,96 @@ This PEP provides an explicit way to specify the locale encoding.
 Prepare to change the default encoding to UTF-8
 -----------------------------------------------
 
-Since UTF-8 becomes de-facto standard text encoding, we might change
-the default text encoding to UTF-8 in the future.
+Since UTF-8 has become the de-facto standard text encoding,
+we might default to it for opening files in the future.
 
-But this change will affect many applications and libraries. If we
-start emitting ``DeprecationWarning`` everywhere ``encoding`` option
-is omitted, it will be too noisy and painful.
+However, such a change will affect many applications and libraries.
+If we start emitting ``DeprecationWarning`` everywhere the ``encoding``
+argument is omitted, it will be too noisy and painful.
 
-Although this PEP doesn't propose to change the default encoding,
-this PEP will help the change:
+Although this PEP doesn't propose changing the default encoding,
+it will help enable that change by:
 
-* Reduce the number of omitted ``encoding`` options in many libraries
-  before we start emitting the ``DeprecationWarning`` by default.
+* Reducing the number of omitted ``encoding`` arguments in libraries
+  before we start emitting a ``DeprecationWarning`` by default.
 
-* Users will be able to use ``encoding="locale"`` option to suppress
-  the warning without dropping Python 3.10 support.
+* Allowing users to pass ``encoding="locale"`` to suppress
+  the current warning and any ``DeprecationWarning`` added in the future,
+  as well as retaining consistent behavior if later Python versions
+  change the default, ensuring support for any Python version >=3.10.
 
 
 Specification
 =============
 
 ``EncodingWarning``
---------------------
+-------------------
 
-Add a new ``EncodingWarning`` warning class which is a subclass of
-``Warning``. It is used to warn when the ``encoding`` option is
-omitted and the default encoding is locale-specific.
+Add a new ``EncodingWarning`` warning class as a subclass of
+``Warning``. It is emitted when the ``encoding`` argument is omitted and
+the default locale-specific encoding is used.
 
 
 Options to enable the warning
-------------------------------
+-----------------------------
 
-``-X warn_default_encoding`` option and the
+The ``-X warn_default_encoding`` option and the
 ``PYTHONWARNDEFAULTENCODING`` environment variable are added. They
 are used to enable ``EncodingWarning``.
 
-``sys.flags.encoding_warning`` is also added. The flag represents
+``sys.flags.warn_default_encoding`` is also added. The flag is true when
 ``EncodingWarning`` is enabled.
 
-When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
-other modules using them will emit ``EncodingWarning`` when the
-``encoding`` is omitted.
+When the flag is set, ``io.TextIOWrapper()``, ``open()`` and other
+modules using them will emit ``EncodingWarning`` when the ``encoding``
+argument is omitted.
 
 Since ``EncodingWarning`` is a subclass of ``Warning``, they are
-shown by default, unlike ``DeprecationWarning``.
+shown by default (if the ``warn_default_encoding`` flag is set), unlike
+``DeprecationWarning``.
 
 
-``encoding="locale"`` option
-----------------------------
+``encoding="locale"``
+---------------------
 
-``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means
-same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
-emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
+``io.TextIOWrapper`` will accept ``"locale"`` as a valid argument to
+``encoding``. It has the same meaning as the current ``encoding=None``,
+except that ``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
+``encoding="locale"`` is specified.
 
 
 ``io.text_encoding()``
------------------------
+----------------------
 
-``io.text_encoding()`` is a helper function for functions having
-``encoding=None`` option and passing it to ``io.TextIOWrapper()`` or
+``io.text_encoding()`` is a helper for functions with an
+``encoding=None`` parameter that pass it to ``io.TextIOWrapper()`` or
 ``open()``.
 
-Pure Python implementation will be like this::
+A pure Python implementation will look like this::
 
     def text_encoding(encoding, stacklevel=1):
-        """Helper function to choose the text encoding.
+        """A helper function to choose the text encoding.
 
         When *encoding* is not None, just return it.
-        Otherwise, return the default text encoding (i.e., "locale").
+        Otherwise, return the default text encoding (i.e. "locale").
 
-        This function emits EncodingWarning if *encoding* is None and
-        sys.flags.encoding_warning is true.
+        This function emits an EncodingWarning if *encoding* is None and
+        sys.flags.warn_default_encoding is true.
 
-        This function can be used in APIs having encoding=None option
-        and pass it to TextIOWrapper or open.
-        But please consider using encoding="utf-8" for new APIs.
+        This function can be used in APIs with an encoding=None parameter
+        that pass it to TextIOWrapper or open.
+        However, please consider using encoding="utf-8" for new APIs.
         """
         if encoding is None:
-            if sys.flags.encoding_warning:
+            if sys.flags.warn_default_encoding:
                 import warnings
-                warnings.warn("'encoding' option is omitted",
+                warnings.warn(
+                    "'encoding' argument not specified.",
                     EncodingWarning, stacklevel + 2)
             encoding = "locale"
         return encoding
 
-For example, ``pathlib.Path.read_text()`` can use the function like:
+For example, ``pathlib.Path.read_text()`` can use it like this:
 
 .. code-block::
 
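Illustrative note, not part of the commit: the diff elides the body of the ``pathlib.Path.read_text()`` example, so the following is only a minimal sketch of how a wrapper around ``open()`` might use ``io.text_encoding()``. The function name ``read_settings`` is hypothetical, and the ``getattr`` fallback is an assumption added so the sketch also runs on Python versions that lack ``io.text_encoding()``.

```python
import io


def read_settings(path, encoding=None):
    """Read a text file, deferring the default-encoding choice to io.text_encoding().

    Hypothetical wrapper for illustration only.
    """
    # io.text_encoding() only exists on Python 3.10+; on older versions
    # the encoding argument is passed through unchanged.
    text_encoding = getattr(io, "text_encoding", None)
    if text_encoding is not None:
        # May emit EncodingWarning, attributed to the caller of
        # read_settings(), when encoding is None and the warning is enabled.
        encoding = text_encoding(encoding)
    with open(path, encoding=encoding) as f:
        return f.read()
```

Callers that pass an explicit encoding, for example ``read_settings(path, encoding="utf-8")``, are unaffected; only calls that omit ``encoding`` can trigger the warning.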
@@ -174,18 +182,18 @@ By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
 the caller of ``read_text()`` instead of ``read_text()`` itself.
 
 
-Affected stdlibs
------------------
+Affected standard library modules
+---------------------------------
 
-Many stdlibs will be affected by this change.
+Many standard library modules will be affected by this change.
 
 Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
 as written in the previous section.
 
-Where using locale encoding as the default encoding is reasonable,
+Where using the locale encoding as the default encoding is reasonable,
 ``encoding="locale"`` will be used instead. For example,
-the ``subprocess`` module will use locale encoding for the default
-encoding of the pipes.
+the ``subprocess`` module will use the locale encoding as the default
+for pipes.
 
 Many tests use ``open()`` without ``encoding`` specified to read
 ASCII text files. They should be rewritten with ``encoding="ascii"``.
@@ -195,11 +203,11 @@ Rationale
 =========
 
 Opt-in warning
----------------
+--------------
 
-Although ``DeprecationWarning`` is suppressed by default, emitting
-``DeprecationWarning`` always when the ``encoding`` option is omitted
-would be too noisy.
+Although ``DeprecationWarning`` is suppressed by default, always
+emitting ``DeprecationWarning`` when the ``encoding`` argument is
+omitted would be too noisy.
 
 Noisy warnings may lead developers to dismiss the
 ``DeprecationWarning``.
@@ -208,43 +216,43 @@ Noisy warnings may lead developers to dismiss the
 "locale" is not a codec alias
 -----------------------------
 
-We don't add the "locale" to the codec alias because locale can be
-changed in runtime.
+We don't add "locale" as a codec alias because the locale can be
+changed at runtime.
 
 Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
-when ``encoding=None``. This behavior can not be implemented in
-the codec.
+when ``encoding=None``. This behavior cannot be implemented in
+a codec.
 
 
 Backward Compatibility
 ======================
 
-The new warning is not emitted by default. So this PEP is 100%
-backward compatible.
+The new warning is not emitted by default, so this PEP is 100%
+backwards-compatible.
 
 
 Forward Compatibility
 =====================
 
-``encoding="locale"`` option is not forward compatible. Codes
-using the option will not work on Python older than 3.10. It will
-raise ``LookupError: unknown encoding: locale``.
+Passing ``"locale"`` as the argument to ``encoding`` is not
+forward-compatible. Code using it will not work on Python older than
+3.10, and will instead raise ``LookupError: unknown encoding: locale``.
 
 Until developers can drop Python 3.9 support, ``EncodingWarning``
-can be used only for finding missing ``encoding="utf-8"`` options.
+can only be used for finding missing ``encoding="utf-8"`` arguments.
 
 
-How to teach this
+How to Teach This
 =================
 
 For new users
 -------------
 
-Since ``EncodingWarning`` is used to write a cross-platform code,
-no need to teach it to new users.
+Since ``EncodingWarning`` is used to write cross-platform code,
+there is no need to teach it to new users.
 
-We can just recommend using UTF-8 for text files and use
-``encoding="utf-8"`` when opening test files.
+We can just recommend using UTF-8 for text files and using
+``encoding="utf-8"`` when opening them.
 
 
 For experienced users
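Illustrative note, not part of the commit: the Forward Compatibility text says that ``encoding="locale"`` raises ``LookupError`` on Python older than 3.10. Code that must still run on those versions could, as one possible workaround, choose the argument at runtime. The helper name ``locale_encoding_argument`` is hypothetical and is not proposed by the PEP.

```python
import sys


def locale_encoding_argument():
    """Return an ``encoding`` value that selects the locale encoding.

    Hypothetical helper for illustration only.
    """
    if sys.version_info >= (3, 10):
        # Explicit request for the locale encoding; does not trigger
        # EncodingWarning.
        return "locale"
    # Older versions do not know "locale"; None selects the same
    # locale-dependent default there.
    return None
```

A call site would then read ``open(path, encoding=locale_encoding_argument())``. This trades readability for compatibility, which is why the PEP text suggests waiting until Python 3.9 support can be dropped.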
@@ -257,9 +265,9 @@ default encoding.
 You can use ``-X warn_default_encoding`` or
 ``PYTHONWARNDEFAULTENCODING=1`` to find this type of mistake.
 
-Omitting ``encoding`` option is not a bug when opening text files
-encoded in locale encoding. But ``encoding="locale"`` is recommended
-after Python 3.10 because it is more explicit.
+Omitting the ``encoding`` argument is not a bug when opening text files
+encoded in the locale encoding, but ``encoding="locale"`` is recommended
+in Python 3.10 and later because it is more explicit.
 
 
 Reference Implementation
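Illustrative note, not part of the commit: a minimal sketch of how the ``-X warn_default_encoding`` option mentioned above might be exercised from a script, by running a snippet in a child interpreter and collecting its stderr. The helper name ``stderr_with_encoding_warning`` is hypothetical. On Python 3.10 and later the captured text is expected to mention ``EncodingWarning``; the sketch assumes the environment does not override warning filters (for example through ``PYTHONWARNINGS``).

```python
import subprocess
import sys


def stderr_with_encoding_warning(code):
    """Run *code* in a child interpreter with -X warn_default_encoding.

    Hypothetical helper for illustration; returns the child's stderr text.
    """
    result = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        errors="replace",
    )
    return result.stderr


# open() without an explicit encoding argument; os.devnull is used so the
# snippet does not depend on any particular file existing.
snippet = "import os; open(os.devnull).close()"
output = stderr_with_encoding_warning(snippet)
```

Setting ``PYTHONWARNDEFAULTENCODING=1`` in the child's environment is the other way the text describes to enable the same warning.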
@@ -277,22 +285,21 @@ https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5
 
 * Why not implement this in linters?
 
-  * ``encoding="locale"`` and ``io.text_encoding()`` must be in
-    Python.
+  * ``encoding="locale"`` and ``io.text_encoding()`` must be implemented
+    in Python.
 
-  * It is difficult to find all caller of functions wrapping
-    ``open()`` or ``TextIOWrapper()``. (See ``io.text_encoding()``
-    section.)
+  * It is difficult to find all callers of functions wrapping
+    ``open()`` or ``TextIOWrapper()`` (see the ``io.text_encoding()``
+    section).
 
 * Many developers will not use the option.
 
-  * Some developers use the option and report the warnings to
-    libraries they use. So the option is worth enough even though
-    many developers won't use it.
+  * Some will, and report the warnings to libraries they use,
+    so the option is worth it even if many developers don't enable it.
 
-  * For example, I find [7]_ and [8]_ by running
-    ``pip install -U pip`` and find [9]_ by running ``tox``
-    with the reference implementation. It demonstrates how this
+  * For example, I found [7]_ and [8]_ by running
+    ``pip install -U pip``, and [9]_ by running ``tox``
+    with the reference implementation. This demonstrates how this
     option can be used to find potential issues.
 
 