PEP 597 3rd edition. (#1368)
parent
f0fe7c4730
commit
9c0fb2a445
218
pep-0597.rst
|
@@ -1,124 +1,203 @@
|
|||
PEP: 597
|
||||
Title: Enable UTF-8 mode by default on Windows
|
||||
Title: Soft deprecation of omitting encoding
|
||||
Last-Modified: 14-Apr-2020
|
||||
Author: Inada Naoki <songofacandy@gmail.com>
|
||||
Discussions-To: https://discuss.python.org/t/3880
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 05-Jun-2019
|
||||
Python-Version: 3.10
|
||||
Python-Version: 3.9
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes to make UTF-8 mode [#]_ enabled by default on
|
||||
Windows.
|
||||
This PEP proposes:
|
||||
|
||||
The goal of this PEP is to provide a "UTF-8 by default" experience to
|
||||
Windows users like Unix users.
|
||||
* ``TextIOWrapper`` raises a ``PendingDeprecationWarning`` when the
|
||||
``encoding`` option is not specified, and dev mode is enabled.
|
||||
|
||||
* Add ``encoding="locale"`` option to ``TextIOWrapper``. It behaves
|
||||
like ``encoding=None`` but doesn't raise a warning.
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
UTF-8 is the best encoding nowadays
-----------------------------------
|
||||
|
||||
Popular text editors like VS Code use UTF-8 by default.
|
||||
Even Microsoft Notepad uses UTF-8 by default since the Windows 10
|
||||
May 2019 Update.
|
||||
Additionally, the default encoding of Python source files is UTF-8.
|
||||
|
||||
We can assume that most Python programmers use UTF-8 for most text
|
||||
files.
|
||||
|
||||
Python is one of the most popular first programming languages.
|
||||
New programmers may not know about encoding. If the default encoding
|
||||
for text files is UTF-8, they can learn about encoding when they need
|
||||
to handle legacy encoding.
|
||||
|
||||
|
||||
People assume the default encoding is UTF-8 already
|
||||
---------------------------------------------------
|
||||
Omitting encoding is a common mistake
-------------------------------------
|
||||
|
||||
Developers using macOS or Linux may forget that the default encoding
|
||||
is not always UTF-8.
|
||||
|
||||
For example, ``long_description = open("README.md").read()`` in
|
||||
``setup.py`` is a common mistake. Many Windows users cannot install
|
||||
the package if there is at least one emoji or any other non-ASCII
|
||||
character in the ``README.md`` file.
|
||||
the package if there is at least one non-ASCII character (e.g. emoji)
|
||||
in the ``README.md`` file.
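
A minimal sketch of the failure and the fix. It assumes a temporary file stands in for ``README.md``, and decoding as ASCII only simulates a non-UTF-8 locale; it is not what any particular platform does:

```python
import os
import tempfile

# Hypothetical README content containing a non-ASCII character.
readme_text = "My package \N{SPARKLES}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(readme_text)

    # Simulate a non-UTF-8 locale by decoding the file as ASCII.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Passing the encoding explicitly works regardless of the locale.
    with open(path, encoding="utf-8") as f:
        long_description = f.read()
```

Writing ``open("README.md", encoding="utf-8")`` in ``setup.py`` avoids depending on the locale of the machine that installs the package.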
|
||||
|
||||
For example, 489 of the 4000 most downloaded packages on PyPI used
non-ASCII characters in their README, and 82 of them cannot be
installed from the source package when the locale encoding is ASCII.
[1_] They used the default encoding to read a README or TOML file.
|
||||
|
||||
Another example is ``logging.basicConfig(filename="log.txt")``.
|
||||
Some users expect UTF-8 to be used by default, but the locale encoding
is actually used. [2_]
|
||||
|
||||
Even Python experts assume that the default encoding is UTF-8.
|
||||
It creates bugs that happen only on Windows. See [#]_ [#]_.
|
||||
It creates bugs that happen only on Windows. See [3_] and [4_].
|
||||
|
||||
Changing the default text encoding to UTF-8 will help many Windows
|
||||
users.
|
||||
Raising a warning when the ``encoding`` option is omitted will
|
||||
help to find such mistakes.
|
||||
|
||||
|
||||
Prepare to change the default encoding to UTF-8
|
||||
-----------------------------------------------
|
||||
|
||||
We chose to use locale encoding for the default text encoding
|
||||
in Python 3.0. But UTF-8 has been adopted very widely since then.
|
||||
|
||||
We might change the default text encoding to UTF-8 in the future.
|
||||
But this change will affect many applications and libraries.
|
||||
Many ``DeprecationWarning`` will be raised if we start raising
|
||||
the warning by default. It will be too noisy.
|
||||
|
||||
While this PEP doesn't cover the change, this PEP will help to reduce
|
||||
the number of ``DeprecationWarning`` in the future.
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
Enable UTF-8 mode on Windows unless it is disabled explicitly.
|
||||
Raising a PendingDeprecationWarning
|
||||
---------------------------------------
|
||||
|
||||
UTF-8 mode affects these areas:
|
||||
|
||||
* ``locale.getpreferredencoding`` returns "UTF-8".
|
||||
|
||||
* ``open``, ``subprocess.Popen``, ``pathlib.Path.read_text``,
|
||||
``ZipFile.open``, and many other functions use UTF-8 when
|
||||
the ``encoding`` option is omitted.
|
||||
|
||||
* The stdio uses "UTF-8" always.
|
||||
|
||||
* Console I/O uses "UTF-8" already [#]_. So this affects the stdio
  only when it is redirected.
|
||||
|
||||
On the other hand, UTF-8 mode doesn't affect the "mbcs" encoding.
Users can still use the system encoding by choosing the "mbcs" encoding
explicitly.
|
||||
``TextIOWrapper`` raises the ``PendingDeprecationWarning`` when the
|
||||
``encoding`` option is omitted, and dev mode is enabled.
|
||||
|
||||
|
||||
Backwards Compatibility
|
||||
=======================
|
||||
``encoding="locale"`` option
|
||||
----------------------------
|
||||
|
||||
Some existing applications assuming the default text encoding is the
|
||||
system encoding (a.k.a. ANSI encoding) will be broken by this change.
|
||||
When ``encoding="locale"`` is specified to the ``TextIOWrapper``, it
|
||||
behaves the same as ``encoding=None``. In detail, the encoding is
|
||||
chosen by:
|
||||
|
||||
Users can disable the UTF-8 mode by environment variable
|
||||
(``PYTHONUTF8=0``) or command line option (``-Xutf8=0``) for backward
|
||||
compatibility.
|
||||
1. ``os.device_encoding(buffer.fileno())``
|
||||
2. ``locale.getpreferredencoding(False)``
|
||||
|
||||
This option can be used to suppress the ``PendingDeprecationWarning``.
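
The two-step lookup listed above can be sketched as follows. ``resolve_locale_encoding`` is a hypothetical name used only for illustration; it is not part of this PEP or of the standard library:

```python
import locale
import os


def resolve_locale_encoding(fd):
    """Return the encoding the two steps above would choose for fd.

    Illustrative helper, not part of the PEP or the standard library.
    """
    # Step 1: the encoding of the terminal attached to fd, or None.
    encoding = os.device_encoding(fd)
    if encoding is None:
        # Step 2: the locale's preferred encoding, without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    return encoding


# A pipe is not a terminal, so step 1 yields None and step 2 decides.
read_fd, write_fd = os.pipe()
try:
    chosen = resolve_locale_encoding(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)
```

For a file descriptor that is attached to a terminal, the first step may return a different encoding than the locale's preferred one.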
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
===============
|
||||
``io.text_encoding``
|
||||
--------------------
|
||||
|
||||
Change the default encoding of TextIOWrapper to "UTF-8"
|
||||
-------------------------------------------------------
|
||||
``TextIOWrapper`` is used indirectly in most cases. For example, ``open`` and ``pathlib.Path.read_text()`` use it. Warning about these
functions themselves doesn't make sense; their callers should be warned instead.
|
||||
|
||||
This idea changed the default encoding to UTF-8 always, regardless of
|
||||
platform, locale, and environment variables.
|
||||
``io.text_encoding(encoding, stacklevel=1)`` is a helper function for it.
|
||||
A pure Python implementation would look like this::
|
||||
|
||||
While this idea looks ideal in terms of consistency, it will cause
|
||||
backward compatibility problems.
|
||||
Utilizing the UTF-8 mode seems better than adding one more backward
compatibility option like ``PYTHONLEGACYWINDOWSSTDIO``.

    import sys  # needed for sys.flags.dev_mode

    def text_encoding(encoding, stacklevel=1):
        """
        Helper function to choose the text encoding.

        When encoding is not None, just return it.
        Otherwise, return the default text encoding ("locale" for now),
        and raise a PendingDeprecationWarning in dev mode.

        This function can be used in APIs having encoding=None option.
        But please consider encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.dev_mode:
                import warnings
                warnings.warn(
                    "'encoding' option is not specified. The default encoding "
                    "will be changed to 'utf-8' in the future",
                    PendingDeprecationWarning, stacklevel + 2)
            encoding = "locale"
        return encoding
|
||||
|
||||
``pathlib.Path.read_text()`` can use this function like this::
|
||||
|
||||
    def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()
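
To illustrate how a caller would see the warning, here is a hedged, self-contained sketch. It uses a simplified copy of the helper with a ``dev_mode`` keyword in place of ``sys.flags.dev_mode`` (an assumption made so the example runs without ``-X dev``), and ``load_config`` is a hypothetical API invented for the example:

```python
import warnings


def text_encoding(encoding, stacklevel=1, *, dev_mode=True):
    # Simplified copy of the helper; dev_mode replaces sys.flags.dev_mode
    # (an assumption for this sketch) so the example runs without -X dev.
    if encoding is None:
        if dev_mode:
            warnings.warn(
                "'encoding' option is not specified. The default encoding "
                "will be changed to 'utf-8' in the future",
                PendingDeprecationWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def load_config(path, encoding=None):
    # Hypothetical API with an encoding=None option; it only resolves
    # the encoding and does not open the file.
    return text_encoding(encoding)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = load_config("app.cfg")                    # warns
    explicit = load_config("app.cfg", encoding="utf-8")  # no warning
```

Because the helper passes ``stacklevel + 2`` to ``warnings.warn``, the warning is attributed to the code that called ``load_config`` rather than to ``load_config`` or the helper itself.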
|
||||
|
||||
|
||||
subprocess module doesn't warn
|
||||
------------------------------
|
||||
|
||||
While the subprocess module uses TextIOWrapper, it doesn't raise
|
||||
``PendingDeprecationWarning``. It uses the "locale" encoding
|
||||
by default.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
"locale" is not a codec alias
|
||||
-----------------------------
|
||||
|
||||
We don't add "locale" as a codec alias because the locale can be
changed at runtime.
|
||||
|
||||
Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
|
||||
when ``encoding=None``. This behavior can not be implemented in
|
||||
the codec.
|
||||
|
||||
|
||||
Use a PendingDeprecationWarning
|
||||
-------------------------------
|
||||
|
||||
This PEP doesn't make a decision about changing the default text encoding.
So we use ``PendingDeprecationWarning`` instead of ``DeprecationWarning`` for now.
|
||||
|
||||
|
||||
Raise warning only in dev mode
|
||||
------------------------------
|
||||
|
||||
This PEP will produce a huge amount of ``PendingDeprecationWarning``.
|
||||
It will be too noisy for most Python developers.
|
||||
|
||||
We need to fix warnings in the standard library, pip, and major dev tools
like ``pytest`` before raising this warning by default.
|
||||
|
||||
|
||||
subprocess module doesn't warn
|
||||
------------------------------
|
||||
|
||||
The default encoding for PIPE is related to the encoding of the stdio.
|
||||
It should be discussed later.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
To be written.
|
||||
https://github.com/python/cpython/pull/19481
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [#] `PEP 540 -- Add a new UTF-8 Mode <https://www.python.org/dev/peps/pep-0540/>`_
|
||||
.. [#] https://github.com/pypa/packaging.python.org/pull/682
|
||||
.. [#] https://bugs.python.org/issue33684
|
||||
.. [#] `PEP 528 -- Change Windows console encoding to UTF-8 <https://www.python.org/dev/peps/pep-0528/>`_
|
||||
.. [1] "Packages can't be installed when encoding is not UTF-8"
|
||||
(https://github.com/methane/pep597-pypi-ascii)
|
||||
|
||||
.. [2] "Logging - Inconsistent behaviour when handling unicode"
|
||||
(https://bugs.python.org/issue37111)
|
||||
|
||||
.. [3] Packaging tutorial in packaging.python.org didn't specify
|
||||
encoding to read a ``README.md``
|
||||
(https://github.com/pypa/packaging.python.org/pull/682)
|
||||
|
||||
.. [4] ``json.tool`` had used locale encoding to read JSON files.
|
||||
(https://bugs.python.org/issue33684)
|
||||
|
||||
|
||||
Copyright
|
||||
|
@@ -126,6 +205,7 @@ Copyright
|
|||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
|
|