224 lines
6.5 KiB
ReStructuredText
224 lines
6.5 KiB
ReStructuredText
PEP: 597
|
|
Title: Soft deprecation of default encoding
|
|
Last-Modified: 23-Jun-2020
|
|
Author: Inada Naoki <songofacandy@gmail.com>
|
|
Discussions-To: https://discuss.python.org/t/3880
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 05-Jun-2019
|
|
Python-Version: 3.10
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes:
|
|
|
|
* ``TextIOWrapper`` raises a ``PendingDeprecationWarning`` when the
|
|
``encoding`` option is not specified and dev mode is enabled.
|
|
|
|
* Add ``encoding="locale"`` option to ``TextIOWrapper``. It behaves
|
|
like ``encoding=None`` but don't raise a warning.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
Using the default encoding is a common mistake
|
|
----------------------------------------------
|
|
|
|
Developers using macOS or Linux may forget that the default encoding
|
|
is not always UTF-8.
|
|
|
|
For example, ``long_description = open("README.md").read()`` in
|
|
``setup.py`` is a common mistake. Many Windows users can not install
|
|
the package if there is at least one non-ASCII character (e.g. emoji)
|
|
in the ``README.md`` file which is encoded in UTF-8.
|
|
|
|
For example, 489 packages of the 4000 most downloaded packages from
|
|
PyPI used non-ASCII characters in README. And 82 packages of them
|
|
can not be installed from source package when locale encoding is
|
|
ASCII. [1_] They used the default encoding to read README or TOML
|
|
file.
|
|
|
|
Another example is ``logging.basicConfig(filename="log.txt")``.
|
|
Some users expect UTF-8 is used by default, but locale encoding is
|
|
used actually. [2_]
|
|
|
|
Even Python experts assume that default encoding is UTF-8.
|
|
It creates bugs that happen only on Windows. See [3_] and [4_].
|
|
|
|
Raising a warning when the ``encoding`` option is omitted will
|
|
help to find such mistakes.
|
|
|
|
|
|
Prepare to change the default encoding to UTF-8
|
|
-----------------------------------------------
|
|
|
|
We chose to use locale encoding for the default text encoding
|
|
in Python 3.0. But UTF-8 has been adopted very widely since then.
|
|
|
|
We might change the default text encoding to UTF-8 in the future.
|
|
But this change will affect many applications and libraries.
|
|
Many ``DeprecationWarning`` will be raised if we start raising
|
|
the warning by default. It will be too noisy.
|
|
|
|
While this PEP doesn't cover the change, this PEP will help to reduce
|
|
the number of ``DeprecationWarning`` in the future.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Raising a PendingDeprecationWarning
|
|
---------------------------------------
|
|
|
|
``TextIOWrapper`` raises the ``PendingDeprecationWarning`` when the
|
|
``encoding`` option is omitted and dev mode is enabled.
|
|
|
|
|
|
``encoding="locale"`` option
|
|
----------------------------
|
|
|
|
When ``encoding="locale"`` is specified to the ``TextIOWrapper``, it
|
|
behaves same to ``encoding=None`` except it doesn't raise warning.
|
|
In detail, the encoding is chosen by this order:
|
|
|
|
1. ``os.device_encoding(buffer.fileno())``
|
|
2. ``locale.getpreferredencoding(False)``
|
|
|
|
This option can be used to use the locale encoding explicitly and
|
|
suppress the ``PendingDeprecationWarning``.
|
|
|
|
|
|
``io.text_encoding``
|
|
--------------------
|
|
|
|
``TextIOWrapper`` is used indirectly in most cases. For example,
|
|
``open``, and ``pathlib.Path.read_text()`` use it. Warning to these
|
|
functions doesn't make sense. Callers of these functions should be
|
|
warned instead.
|
|
|
|
``io.text_encoding(encoding, stacklevel=1)`` is a helper function for
|
|
it. Pure Python implementation will be like this::
|
|
|
|
def text_encoding(encoding, stacklevel=1):
|
|
"""
|
|
Helper function to choose the text encoding.
|
|
|
|
When encoding is not None, just return it.
|
|
Otherwise, return the default text encoding ("locale" for now),
|
|
and raise a PendingDeprecationWarning in dev mode.
|
|
|
|
This function can be used in APIs having encoding=None option.
|
|
But please consider encoding="utf-8" for new APIs.
|
|
"""
|
|
if encoding is None:
|
|
if sys.flags.dev_mode:
|
|
import warnings
|
|
warnings.warn(
|
|
"'encoding' option is not specified. The default encoding "
|
|
"might be changed to 'utf-8' in the future",
|
|
PendingDeprecationWarning, stacklevel + 2)
|
|
encoding = "locale"
|
|
return encoding
|
|
|
|
``pathlib.Path.read_text()`` can use this function like this::
|
|
|
|
def read_text(self, encoding=None, errors=None):
|
|
"""
|
|
Open the file in text mode, read it, and close the file.
|
|
"""
|
|
encoding = io.text_encoding(encoding)
|
|
with self.open(mode='r', encoding=encoding, errors=errors) as f:
|
|
return f.read()
|
|
|
|
|
|
subprocess module doesn't warn
|
|
------------------------------
|
|
|
|
While the subprocess module uses TextIOWrapper, it doesn't raise
|
|
``PendingDeprecationWarning``. It uses the "locale" encoding by
|
|
default.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
"locale" is not a codec alias
|
|
-----------------------------
|
|
|
|
We don't add the "locale" to the codec alias because locale can be
|
|
changed in runtime.
|
|
|
|
Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
|
|
when ``encoding=None``. This behavior can not be implemented in
|
|
the codec.
|
|
|
|
|
|
Use a PendingDeprecationWarning
|
|
-------------------------------
|
|
|
|
This PEP doesn't cover changing the default encoding to UTF-8.
|
|
So we use ``PendingDeprecationWarning`` instead of
|
|
``DeprecationWarning`` for now.
|
|
|
|
|
|
Raise warning only in dev mode
|
|
------------------------------
|
|
|
|
This PEP will produce a huge amount of ``PendingDeprecationWarning``.
|
|
It will be too noisy for most Python developers.
|
|
|
|
We need to fix all warnings in the standard library. We need to wait
|
|
pip and major dev tools like ``pytest`` fix warnings before raising
|
|
this warning by default.
|
|
|
|
|
|
subprocess module doesn't warn
|
|
------------------------------
|
|
|
|
The default encoding for PIPE is relating to the encoding of the
|
|
stdio than the default encoding of ``TextIOWrapper``. So this PEP
|
|
doesn't propose to raise warning from the subprocess module.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
https://github.com/python/cpython/pull/19481
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] "Packages can't be installed when encoding is not UTF-8"
|
|
(https://github.com/methane/pep597-pypi-ascii)
|
|
|
|
.. [2] "Logging - Inconsistent behaviour when handling unicode"
|
|
(https://bugs.python.org/issue37111)
|
|
|
|
.. [3] Packaging tutorial in packaging.python.org didn't specify
|
|
encoding to read a ``README.md``
|
|
(https://github.com/pypa/packaging.python.org/pull/682)
|
|
|
|
.. [4] ``json.tool`` had used locale encoding to read JSON files.
|
|
(https://bugs.python.org/issue33684)
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|