PEP 597 3rd edition. (#1368)

Inada Naoki 2020-04-17 08:34:21 +09:00 committed by GitHub
parent f0fe7c4730
commit 9c0fb2a445
1 changed files with 149 additions and 69 deletions


@ -1,124 +1,203 @@
PEP: 597 PEP: 597
Title: Enable UTF-8 mode by default on Windows Title: Soft deprecation of omitting encoding
Last-Modified: 14-Apr-2020
Author: Inada Naoki <songofacandy@gmail.com> Author: Inada Naoki <songofacandy@gmail.com>
Discussions-To: https://discuss.python.org/t/3880
Status: Draft Status: Draft
Type: Standards Track Type: Standards Track
Content-Type: text/x-rst Content-Type: text/x-rst
Created: 05-Jun-2019 Created: 05-Jun-2019
Python-Version: 3.10 Python-Version: 3.9
Abstract Abstract
======== ========
This PEP proposes to make UTF-8 mode [#]_ enabled by default on This PEP proposes:
Windows.
The goal of this PEP is providing "UTF-8 by default" experience to * ``TextIOWrapper`` raises a ``PendingDeprecationWarning`` when the
Windows users like Unix users. ``encoding`` option is not specified, and dev mode is enabled.
* Add ``encoding="locale"`` option to ``TextIOWrapper``. It behaves
like ``encoding=None`` but doesn't raise a warning.
Motivation Motivation
========== ==========
UTF-8 is the best encoding nowadays Omitting encoding is a common mistake
----------------------------------- -------------------------------------
Popular text editors like VS Code use UTF-8 by default.
Even Microsoft Notepad uses UTF-8 by default since the Windows 10
May 2019 Update.
Additionally, the default encoding of Python source files is UTF-8.
We can assume that most Python programmers use UTF-8 for most text
files.
Python is one of the most popular first programming languages.
New programmers may not know about encoding. If the default encoding
for text files is UTF-8, they can learn about encoding when they need
to handle legacy encoding.
People assume the default encoding is UTF-8 already
---------------------------------------------------
Developers using macOS or Linux may forget that the default encoding Developers using macOS or Linux may forget that the default encoding
is not always UTF-8. is not always UTF-8.
For example, ``long_description = open("README.md").read()`` in For example, ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users can not install ``setup.py`` is a common mistake. Many Windows users can not install
the package if there is at least one emoji or any other non-ASCII the package if there is at least one non-ASCII character (e.g. emoji)
character in the ``README.md`` file. in the ``README.md`` file.
For example, 489 of the 4000 most downloaded packages on PyPI
used non-ASCII characters in their README, and 82 of them
cannot be installed from the source package when the locale encoding is
ASCII. [1_] They used the default encoding to read a README or TOML file.
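The failure mode described above can be reproduced without changing the system locale. The following is a minimal sketch, not taken from the PEP: ``read_readme`` and ``demo_readme_roundtrip`` are hypothetical names, and passing ``encoding="ascii"`` explicitly stands in for a machine whose locale encoding is ASCII.

```python
import os
import tempfile


def read_readme(path, encoding):
    """Read a README with an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo_readme_roundtrip():
    """Write a README containing an emoji, then read it back two ways."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "README.md")
        # The author saves the file as UTF-8, as most editors do.
        with open(path, "w", encoding="utf-8") as f:
            f.write("Fast parser \U0001F680\n")

        # Reading with the encoding the file was written in works.
        text = read_readme(path, "utf-8")

        # "ascii" simulates a default encoding that cannot decode the file.
        try:
            read_readme(path, "ascii")
        except UnicodeDecodeError:
            failed = True
        else:
            failed = False
    return text, failed
```

Passing ``encoding="utf-8"`` in ``setup.py`` makes the read independent of the installing user's locale.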
Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 to be used by default, but the locale encoding
is actually used. [2_]
Even Python experts assume that default encoding is UTF-8. Even Python experts assume that default encoding is UTF-8.
It creates bugs that happen only on Windows. See [#]_ [#]_. It creates bugs that happen only on Windows. See [3_] and [4_].
Changing the default text encoding to UTF-8 will help many Windows Raising a warning when the ``encoding`` option is omitted will
users. help to find such mistakes.
Prepare to change the default encoding to UTF-8
-----------------------------------------------
We chose to use the locale encoding as the default text encoding
in Python 3.0. But UTF-8 has been adopted very widely since then.
We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries.
If we start raising ``DeprecationWarning`` by default, a large number
of warnings will be raised. It will be too noisy.
While this PEP doesn't cover the change itself, it will help to reduce
the number of ``DeprecationWarning`` in the future.
Specification Specification
============= =============
Enable UTF-8 mode on Windows unless it is disabled explicitly. Raising a PendingDeprecationWarning
---------------------------------------
UTF-8 mode affects these areas: ``TextIOWrapper`` raises the ``PendingDeprecationWarning`` when the
``encoding`` option is omitted, and dev mode is enabled.
* ``locale.getpreferredencoding`` returns "UTF-8".
* ``open``, ``subprocess.Popen``, ``pathlib.Path.read_text``,
``ZipFile.open``, and many other functions use UTF-8 when
the ``encoding`` option is omitted.
* The stdio uses "UTF-8" always.
* Console I/O uses "UTF-8" already [#]_. So this affects
only when the stdio are redirected.
On the other hand, UTF-8 mode doesn't affect the "mbcs" encoding.
Users can still use the system encoding by choosing the "mbcs" encoding
explicitly.
Backwards Compatibility ``encoding="locale"`` option
======================= ----------------------------
Some existing applications assuming the default text encoding is the When ``encoding="locale"`` is specified to the ``TextIOWrapper``, it
system encoding (a.k.a. ANSI encoding) will be broken by this change. behaves the same as ``encoding=None``. In detail, the encoding is
chosen by:
Users can disable the UTF-8 mode by environment variable 1. ``os.device_encoding(buffer.fileno())``
(``PYTHONUTF8=0``) or command line option (``-Xutf8=0``) for backward 2. ``locale.getpreferredencoding(False)``
compatibility.
This option can be used to suppress the ``PendingDeprecationWarning``.
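The two-step lookup listed above can be sketched in pure Python. This is an illustration only, not the ``TextIOWrapper`` implementation: ``resolve_locale_encoding`` and ``demo_resolve`` are hypothetical names, and only ``os.device_encoding`` and ``locale.getpreferredencoding`` come from the text.

```python
import locale
import os
import tempfile


def resolve_locale_encoding(fd):
    """Pick an encoding for a file descriptor in the order listed above."""
    # 1. Ask the device; this returns None unless fd is a terminal.
    encoding = os.device_encoding(fd)
    if encoding is None:
        # 2. Fall back to the locale's preferred encoding, without
        #    calling setlocale() (the False argument).
        encoding = locale.getpreferredencoding(False)
    return encoding


def demo_resolve():
    """Resolve the encoding for a regular (non-terminal) temporary file."""
    with tempfile.TemporaryFile() as f:
        return resolve_locale_encoding(f.fileno())
```

For a regular file the first step yields ``None``, so the result depends only on the locale of the running process.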
Rejected Ideas ``io.text_encoding``
=============== --------------------
Change the default encoding of TextIOWrapper to "UTF-8" ``TextIOWrapper`` is used indirectly in most cases. For example, ``open`` and ``pathlib.Path.read_text()`` use it. Attributing the warning to these
------------------------------------------------------- functions doesn't make sense. Callers of these functions should be warned instead.
This idea changed the default encoding to UTF-8 always, regardless of ``io.text_encoding(encoding, stacklevel=1)`` is a helper function for it.
platform, locale, and environment variables. A pure Python implementation would look like this::
While this idea looks ideal in terms of consistency, it will cause def text_encoding(encoding, stacklevel=1):
backward compatibility problems. """
Helper function to choose the text encoding.
Utilizing the UTF-8 mode seems better than adding one more backward When encoding is not None, just return it.
compatibility option like ``PYTHONLEGACYWINDOWSSTDIO``. Otherwise, return the default text encoding ("locale" for now),
and raise a PendingDeprecationWarning in dev mode.
This function can be used in APIs having encoding=None option.
But please consider encoding="utf-8" for new APIs.
"""
        if encoding is None:
            if sys.flags.dev_mode:
                import warnings
                warnings.warn(
                    "'encoding' option is not specified. The default encoding "
                    "will be changed to 'utf-8' in the future",
                    PendingDeprecationWarning, stacklevel + 2)
            encoding = "locale"
        return encoding
``pathlib.Path.read_text()`` can use this function like this::
    def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()
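To see how the helper and ``stacklevel + 2`` fit together, here is a runnable sketch under stated assumptions. The ``dev_mode`` keyword and the ``load_text_settings`` and ``demo_warning_location`` functions are hypothetical and exist only so the warning can be triggered without starting Python with ``-X dev``; they are not part of the proposed API.

```python
import sys
import warnings


def text_encoding(encoding, stacklevel=1, *, dev_mode=None):
    """Sketch of the helper; dev_mode is a hypothetical test hook."""
    if dev_mode is None:
        dev_mode = sys.flags.dev_mode
    if encoding is None:
        if dev_mode:
            # stacklevel + 2 skips this function and the API that called
            # it, so the warning points at the API's caller.
            warnings.warn(
                "'encoding' option is not specified. The default encoding "
                "will be changed to 'utf-8' in the future",
                PendingDeprecationWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def load_text_settings(encoding=None, *, dev_mode=None):
    """Hypothetical API with an encoding=None option."""
    return text_encoding(encoding, dev_mode=dev_mode)


def demo_warning_location():
    """Call the API with and without an explicit encoding."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        explicit = load_text_settings(encoding="utf-8", dev_mode=True)
        implicit = load_text_settings(dev_mode=True)
    return explicit, implicit, caught
```

Only the call that omits ``encoding`` records a warning; passing ``encoding="utf-8"`` or ``encoding="locale"`` keeps the call silent.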
subprocess module doesn't warn
------------------------------
While the subprocess module uses TextIOWrapper, it doesn't raise
``PendingDeprecationWarning``. It uses the "locale" encoding
by default.
Rationale
=========
"locale" is not a codec alias
-----------------------------
We don't add "locale" as a codec alias because the locale can be
changed at runtime.
Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
when ``encoding=None``. This behavior can not be implemented in
the codec.
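A consequence of this choice is that "locale" is a value understood by ``TextIOWrapper`` rather than a name the codec registry can resolve. The sketch below checks this with ``codecs.lookup``; ``is_registered_codec`` is a hypothetical helper, and the result for "locale" assumes no third-party code has registered a codec under that name.

```python
import codecs


def is_registered_codec(name):
    """Return True if the codec registry can resolve the given name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# "utf-8" is a real codec; "locale" is expected not to be one.
utf8_known = is_registered_codec("utf-8")
locale_known = is_registered_codec("locale")
```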
Use a PendingDeprecationWarning
-------------------------------
This PEP doesn't make a decision about changing the default text encoding.
So we use ``PendingDeprecationWarning`` instead of ``DeprecationWarning`` for now.
Raise warning only in dev mode
------------------------------
This PEP will produce a huge number of ``PendingDeprecationWarning``.
It will be too noisy for most Python developers.
We need to fix warnings in the standard library, pip, and major dev tools
like ``pytest`` before raising this warning by default.
subprocess module doesn't warn
------------------------------
The default encoding for PIPE is related to the encoding of the stdio.
It should be discussed later.
Reference Implementation Reference Implementation
======================== ========================
To be written. https://github.com/python/cpython/pull/19481
References References
========== ==========
.. [#] `PEP 540 -- Add a new UTF-8 Mode <https://www.python.org/dev/peps/pep-0540/>`_ .. [1] "Packages can't be installed when encoding is not UTF-8"
.. [#] https://github.com/pypa/packaging.python.org/pull/682 (https://github.com/methane/pep597-pypi-ascii)
.. [#] https://bugs.python.org/issue33684
.. [#] `PEP 528 -- Change Windows console encoding to UTF-8 <https://www.python.org/dev/peps/pep-0528/>`_ .. [2] "Logging - Inconsistent behaviour when handling unicode"
(https://bugs.python.org/issue37111)
.. [3] Packaging tutorial in packaging.python.org didn't specify
encoding to read a ``README.md``
(https://github.com/pypa/packaging.python.org/pull/682)
.. [4] ``json.tool`` had used locale encoding to read JSON files.
(https://bugs.python.org/issue33684)
Copyright Copyright
@ -126,6 +205,7 @@ Copyright
This document has been placed in the public domain. This document has been placed in the public domain.
.. ..
Local Variables: Local Variables:
mode: indented-text mode: indented-text