PEP 597: Apply grammar, syntax and polish fixes, and clarify phrasing and terminology (#1887)

* PEP 597: Apply straightforward grammar and syntax fixes

* PEP 597: Copyedit prose for clarity, polish and to avoid repetition

* PEP 597: Use accurate terminology for options, params and arguments

* PEP 597: Add statements in several places to clarify unclear meaning

* PEP 597: Revise docstring and warning text in text_encoding function

* PEP 597: Revise and clarify points based on author feedback

Co-authored-by: Inada Naoki <songofacandy@gmail.com>

CAM Gerlach 2021-04-03 21:29:50 -05:00 committed by GitHub
parent e7698a7dbe
commit 50913ae2b3
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 117 additions and 110 deletions

@@ -12,16 +12,17 @@ Python-Version: 3.10
Abstract Abstract
======== ========
Add a new warning category ``EncodingWarning``. It is emitted when Add a new warning category ``EncodingWarning``. It is emitted when the
``encoding`` option is omitted and the default encoding is a locale ``encoding`` argument to ``open()`` is omitted and the default
encoding. locale-specific encoding is used.
The warning is disabled by default. New ``-X warn_default_encoding`` The warning is disabled by default. A new ``-X warn_default_encoding``
command-line option and ``PYTHONWARNDEFAULTENCODING`` environment command-line option and a new ``PYTHONWARNDEFAULTENCODING`` environment
variable are used to enable the warnings. variable can be used to enable it.
``encoding="locale"`` option is added too. It is used to specify A ``"locale"`` argument value for ``encoding`` is added too. It
locale encoding explicitly. explicitly specifies that the locale encoding should be used, silencing
the warning.
Motivation Motivation
@@ -33,27 +34,27 @@ Using the default encoding is a common mistake
Developers using macOS or Linux may forget that the default encoding Developers using macOS or Linux may forget that the default encoding
is not always UTF-8. is not always UTF-8.
For example, ``long_description = open("README.md").read()`` in For example, using ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users can not install ``setup.py`` is a common mistake. Many Windows users cannot install
the package if there is at least one non-ASCII character (e.g. emoji) such packages if there is at least one non-ASCII character
in the ``README.md`` file which is encoded in UTF-8. (e.g. emoji, author names, copyright symbols, and the like)
in their UTF-8-encoded ``README.md`` file.
For example, 489 packages of the 4000 most downloaded packages from Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII
PyPI used non-ASCII characters in README. And 82 packages of them characters in their README, and 82 fail to install from source on
can not be installed from source package when locale encoding is non-UTF-8 locales due to not specifying an encoding for a non-ASCII
ASCII. [1]_ They used the default encoding to read README or TOML file. [1]_
file.
Another example is ``logging.basicConfig(filename="log.txt")``. Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 is used by default, but locale encoding is Some users might expect it to use UTF-8 by default, but the locale
used actually. [2]_ encoding is actually what is used. [2]_
Even Python experts assume that default encoding is UTF-8. Even Python experts may assume that the default encoding is UTF-8.
It creates bugs that happen only on Windows. See [3]_, [4]_, [5]_, This creates bugs that only happen on Windows; see [3]_, [4]_, [5]_,
and [6]_ for example. and [6]_ for example.
Emitting a warning when the ``encoding`` option is omitted will help Emitting a warning when the ``encoding`` argument is omitted will help
to find such mistakes. find such mistakes.
Explicit way to use locale-specific encoding Explicit way to use locale-specific encoding
@@ -61,15 +62,17 @@ Explicit way to use locale-specific encoding
``open(filename)`` isn't explicit about which encoding is expected: ``open(filename)`` isn't explicit about which encoding is expected:
* Expects ASCII (not a bug, but inefficient on Windows) * If ASCII is assumed, this isn't a bug, but may result in decreased
* Expects UTF-8 (bug or platform-specific script) performance on Windows, particularly with non-Latin-1 locale encodings
* Expects the locale encoding. * If UTF-8 is assumed, this may be a bug or a platform-specific script
* If the locale encoding is assumed, the behavior is as expected
(but could change if future versions of Python modify the default)
In this point of view, ``open(filename)`` is not readable. From this point of view, ``open(filename)`` is not readable code.
``encoding=locale.getpreferredencoding(False)`` can be used to ``encoding=locale.getpreferredencoding(False)`` can be used to
specify the locale encoding explicitly. But it is too long and easy specify the locale encoding explicitly, but it is too long and easy
to misuse. (e.g. forget to pass ``False`` to its parameter) to misuse (e.g. one can forget to pass ``False`` as its argument).
This PEP provides an explicit way to specify the locale encoding. This PEP provides an explicit way to specify the locale encoding.
@@ -77,91 +80,96 @@ This PEP provides an explicit way to specify the locale encoding.
Prepare to change the default encoding to UTF-8 Prepare to change the default encoding to UTF-8
----------------------------------------------- -----------------------------------------------
Since UTF-8 becomes de-facto standard text encoding, we might change Since UTF-8 has become the de-facto standard text encoding,
the default text encoding to UTF-8 in the future. we might default to it for opening files in the future.
But this change will affect many applications and libraries. If we However, such a change will affect many applications and libraries.
start emitting ``DeprecationWarning`` everywhere ``encoding`` option If we start emitting ``DeprecationWarning`` everywhere the ``encoding``
is omitted, it will be too noisy and painful. argument is omitted, it will be too noisy and painful.
Although this PEP doesn't propose to change the default encoding, Although this PEP doesn't propose changing the default encoding,
this PEP will help the change: it will help enable that change by:
* Reduce the number of omitted ``encoding`` options in many libraries * Reducing the number of omitted ``encoding`` arguments in libraries
before we start emitting the ``DeprecationWarning`` by default. before we start emitting a ``DeprecationWarning`` by default.
* Users will be able to use ``encoding="locale"`` option to suppress * Allowing users to pass ``encoding="locale"`` to suppress
the warning without dropping Python 3.10 support. the current warning and any ``DeprecationWarning`` added in the future,
as well as retaining consistent behavior if later Python versions
change the default, ensuring support for any Python version >=3.10.
Specification Specification
============= =============
``EncodingWarning`` ``EncodingWarning``
-------------------- -------------------
Add a new ``EncodingWarning`` warning class which is a subclass of Add a new ``EncodingWarning`` warning class as a subclass of
``Warning``. It is used to warn when the ``encoding`` option is ``Warning``. It is emitted when the ``encoding`` argument is omitted and
omitted and the default encoding is locale-specific. the default locale-specific encoding is used.
Options to enable the warning Options to enable the warning
------------------------------ -----------------------------
``-X warn_default_encoding`` option and the The ``-X warn_default_encoding`` option and the
``PYTHONWARNDEFAULTENCODING`` environment variable are added. They ``PYTHONWARNDEFAULTENCODING`` environment variable are added. They
are used to enable ``EncodingWarning``. are used to enable ``EncodingWarning``.
``sys.flags.encoding_warning`` is also added. The flag represents ``sys.flags.warn_default_encoding`` is also added. The flag is true when
``EncodingWarning`` is enabled. ``EncodingWarning`` is enabled.
When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and When the flag is set, ``io.TextIOWrapper()``, ``open()`` and other
other modules using them will emit ``EncodingWarning`` when the modules using them will emit ``EncodingWarning`` when the ``encoding``
``encoding`` is omitted. argument is omitted.
Since ``EncodingWarning`` is a subclass of ``Warning``, they are Since ``EncodingWarning`` is a subclass of ``Warning``, they are
shown by default, unlike ``DeprecationWarning``. shown by default (if the ``warn_default_encoding`` flag is set), unlike
``DeprecationWarning``.
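The opt-in behavior described above can be sketched by running the same small program in two fresh interpreters, once without and once with ``-X warn_default_encoding``. This is only an illustration: ``CHILD_SOURCE`` and ``run_child`` are names made up for this sketch, and the warning is expected only on Python 3.10 or later (older interpreters are assumed to ignore the unknown ``-X`` option).

```python
import subprocess
import sys

# Child program: opens a file without passing an encoding argument.
CHILD_SOURCE = """
import os, tempfile
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path) as f:  # encoding omitted on purpose
    f.read()
os.remove(path)
"""


def run_child(extra_args=()):
    """Run CHILD_SOURCE in a fresh interpreter and return its stderr text."""
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_SOURCE],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stderr


# Without the option, the omitted encoding goes unreported.
quiet = run_child()
# With the option (Python 3.10+), stderr should mention EncodingWarning.
noisy = run_child(["-X", "warn_default_encoding"])
```

Setting ``PYTHONWARNDEFAULTENCODING=1`` in the child's environment should have the same effect as passing the ``-X`` option.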
``encoding="locale"`` option ``encoding="locale"``
---------------------------- ---------------------
``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means ``io.TextIOWrapper`` will accept ``"locale"`` as a valid argument to
same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't ``encoding``. It has the same meaning as the current ``encoding=None``,
emit ``EncodingWarning`` when ``encoding="locale"`` is specified. except that ``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
``encoding="locale"`` is specified.
``io.text_encoding()`` ``io.text_encoding()``
----------------------- ----------------------
``io.text_encoding()`` is a helper function for functions having ``io.text_encoding()`` is a helper for functions with an
``encoding=None`` option and passing it to ``io.TextIOWrapper()`` or ``encoding=None`` parameter that pass it to ``io.TextIOWrapper()`` or
``open()``. ``open()``.
Pure Python implementation will be like this:: A pure Python implementation will look like this::
def text_encoding(encoding, stacklevel=1): def text_encoding(encoding, stacklevel=1):
"""Helper function to choose the text encoding. """A helper function to choose the text encoding.
When *encoding* is not None, just return it. When *encoding* is not None, just return it.
Otherwise, return the default text encoding (i.e., "locale"). Otherwise, return the default text encoding (i.e. "locale").
This function emits EncodingWarning if *encoding* is None and This function emits an EncodingWarning if *encoding* is None and
sys.flags.encoding_warning is true. sys.flags.warn_default_encoding is true.
This function can be used in APIs having encoding=None option This function can be used in APIs with an encoding=None parameter
and pass it to TextIOWrapper or open. that pass it to TextIOWrapper or open.
But please consider using encoding="utf-8" for new APIs. However, please consider using encoding="utf-8" for new APIs.
""" """
if encoding is None: if encoding is None:
if sys.flags.encoding_warning: if sys.flags.warn_default_encoding:
import warnings import warnings
warnings.warn("'encoding' option is omitted", warnings.warn(
EncodingWarning, stacklevel + 2) "'encoding' argument not specified.",
EncodingWarning, stacklevel + 2)
encoding = "locale" encoding = "locale"
return encoding return encoding
For example, ``pathlib.Path.read_text()`` can use the function like: For example, ``pathlib.Path.read_text()`` can use it like this:
.. code-block:: .. code-block::
@@ -174,18 +182,18 @@ By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
the caller of ``read_text()`` instead of ``read_text()`` itself. the caller of ``read_text()`` instead of ``read_text()`` itself.
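As a rough sketch of the same pattern in third-party code, a wrapper with an ``encoding=None`` parameter could route the argument through ``io.text_encoding()`` before calling ``open()``. The ``read_text_file`` function below is hypothetical and not part of the PEP or the standard library; the ``hasattr`` guard is an assumption added so that the sketch also runs on Python versions before 3.10, where ``io.text_encoding()`` does not exist.

```python
import io
import os
import tempfile


def read_text_file(path, encoding=None, errors=None):
    """Hypothetical wrapper around open() with an encoding=None parameter."""
    # io.text_encoding() is only available on Python 3.10+; on older
    # versions the argument is passed through unchanged.
    if hasattr(io, "text_encoding"):
        # A non-None encoding is returned as-is. For None, and with the
        # default stacklevel, any EncodingWarning is attributed to the
        # caller of read_text_file() rather than to this function.
        encoding = io.text_encoding(encoding)
    with open(path, encoding=encoding, errors=errors) as f:
        return f.read()


# Usage: a caller that passes an explicit encoding never triggers the warning.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9")
content = read_text_file(demo_path, encoding="utf-8")
os.remove(demo_path)
```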
Affected stdlibs Affected standard library modules
----------------- ---------------------------------
Many stdlibs will be affected by this change. Many standard library modules will be affected by this change.
Most APIs accepting ``encoding=None`` will use ``io.text_encoding()`` Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
as written in the previous section. as written in the previous section.
Where using locale encoding as the default encoding is reasonable, Where using the locale encoding as the default encoding is reasonable,
``encoding="locale"`` will be used instead. For example, ``encoding="locale"`` will be used instead. For example,
the ``subprocess`` module will use locale encoding for the default the ``subprocess`` module will use the locale encoding as the default
encoding of the pipes. for pipes.
Many tests use ``open()`` without ``encoding`` specified to read Many tests use ``open()`` without ``encoding`` specified to read
ASCII text files. They should be rewritten with ``encoding="ascii"``. ASCII text files. They should be rewritten with ``encoding="ascii"``.
@@ -195,11 +203,11 @@ Rationale
========= =========
Opt-in warning Opt-in warning
--------------- --------------
Although ``DeprecationWarning`` is suppressed by default, emitting Although ``DeprecationWarning`` is suppressed by default, always
``DeprecationWarning`` always when the ``encoding`` option is omitted emitting ``DeprecationWarning`` when the ``encoding`` argument is
would be too noisy. omitted would be too noisy.
Noisy warnings may lead developers to dismiss the Noisy warnings may lead developers to dismiss the
``DeprecationWarning``. ``DeprecationWarning``.
@@ -208,43 +216,43 @@ Noisy warnings may lead developers to dismiss the
"locale" is not a codec alias "locale" is not a codec alias
----------------------------- -----------------------------
We don't add the "locale" to the codec alias because locale can be We don't add "locale" as a codec alias because the locale can be
changed in runtime. changed at runtime.
Additionally, ``TextIOWrapper`` checks ``os.device_encoding()`` Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
when ``encoding=None``. This behavior can not be implemented in when ``encoding=None``. This behavior cannot be implemented in
the codec. a codec.
Backward Compatibility Backward Compatibility
====================== ======================
The new warning is not emitted by default. So this PEP is 100% The new warning is not emitted by default, so this PEP is 100%
backward compatible. backwards-compatible.
Forward Compatibility Forward Compatibility
===================== =====================
``encoding="locale"`` option is not forward compatible. Codes Passing ``"locale"`` as the argument to ``encoding`` is not
using the option will not work on Python older than 3.10. It will forward-compatible. Code using it will not work on Python older than
raise ``LookupError: unknown encoding: locale``. 3.10, and will instead raise ``LookupError: unknown encoding: locale``.
Until developers can drop Python 3.9 support, ``EncodingWarning`` Until developers can drop Python 3.9 support, ``EncodingWarning``
can be used only for finding missing ``encoding="utf-8"`` options. can only be used for finding missing ``encoding="utf-8"`` arguments.
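Code that must still run on Python 3.9 or older but wants to request the locale encoding explicitly could guard on the interpreter version. The helper below is a hypothetical sketch, not something the PEP proposes. Its fallback to ``locale.getpreferredencoding(False)`` is also only an approximation of ``encoding=None``, since ``TextIOWrapper`` may additionally consult ``os.device_encoding()``.

```python
import locale
import os
import sys
import tempfile


def explicit_locale_encoding():
    """Return an encoding argument that explicitly selects the locale encoding.

    Hypothetical helper: "locale" is only accepted on Python 3.10+, so
    older versions fall back to the longer spelling.
    """
    if sys.version_info >= (3, 10):
        return "locale"
    # Pass False so that the call does not change the process locale.
    return locale.getpreferredencoding(False)


# Usage: write and read back an ASCII-only file with the explicit argument.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w", encoding=explicit_locale_encoding()) as f:
    f.write("plain ASCII text")
with open(demo_path, encoding=explicit_locale_encoding()) as f:
    round_trip = f.read()
os.remove(demo_path)
```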
How to teach this How to Teach This
================= =================
For new users For new users
------------- -------------
Since ``EncodingWarning`` is used to write a cross-platform code, Since ``EncodingWarning`` is used to write cross-platform code,
no need to teach it to new users. there is no need to teach it to new users.
We can just recommend using UTF-8 for text files and use We can just recommend using UTF-8 for text files and using
``encoding="utf-8"`` when opening test files. ``encoding="utf-8"`` when opening them.
For experienced users For experienced users
@@ -257,9 +265,9 @@ default encoding.
You can use ``-X warn_default_encoding`` or You can use ``-X warn_default_encoding`` or
``PYTHONWARNDEFAULTENCODING=1`` to find this type of mistake. ``PYTHONWARNDEFAULTENCODING=1`` to find this type of mistake.
Omitting ``encoding`` option is not a bug when opening text files Omitting the ``encoding`` argument is not a bug when opening text files
encoded in locale encoding. But ``encoding="locale"`` is recommended encoded in the locale encoding, but ``encoding="locale"`` is recommended
after Python 3.10 because it is more explicit. in Python 3.10 and later because it is more explicit.
Reference Implementation Reference Implementation
@@ -277,22 +285,21 @@ https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5
* Why not implement this in linters? * Why not implement this in linters?
* ``encoding="locale"`` and ``io.text_encoding()`` must be in * ``encoding="locale"`` and ``io.text_encoding()`` must be implemented
Python. in Python.
* It is difficult to find all caller of functions wrapping * It is difficult to find all callers of functions wrapping
``open()`` or ``TextIOWrapper()``. (See ``io.text_encoding()`` ``open()`` or ``TextIOWrapper()`` (see the ``io.text_encoding()``
section.) section).
* Many developers will not use the option. * Many developers will not use the option.
* Some developers use the option and report the warnings to * Some will, and report the warnings to libraries they use,
libraries they use. So the option is worth enough even though so the option is worth it even if many developers don't enable it.
many developers won't use it.
* For example, I find [7]_ and [8]_ by running * For example, I found [7]_ and [8]_ by running
``pip install -U pip`` and find [9]_ by running ``tox`` ``pip install -U pip``, and [9]_ by running ``tox``
with the reference implementation. It demonstrates how this with the reference implementation. This demonstrates how this
option can be used to find potential issues. option can be used to find potential issues.