137 lines
3.6 KiB
ReStructuredText
137 lines
3.6 KiB
ReStructuredText
PEP: 597
|
|
Title: Enable UTF-8 mode by default on Windows
|
|
Author: Inada Naoki <songofacandy@gmail.com>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 05-Jun-2019
|
|
Python-Version: 3.10
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes to make UTF-8 mode [#]_ enabled by default on
|
|
Windows.
|
|
|
|
The goal of this PEP is providing "UTF-8 by default" experience to
|
|
Windows users like Unix users.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
UTF-8 is the best encoding nowdays
|
|
----------------------------------
|
|
|
|
Popular text editors like VS Code uses UTF-8 by default.
|
|
Even Microsoft Notepad uses UTF-8 by default since the Windows 10
|
|
May 2019 Update.
|
|
Additionally, the default encoding of Python source files is UTF-8.
|
|
|
|
We can assume that most Python programmers use UTF-8 for most text
|
|
files.
|
|
|
|
Python is one of the most popular first programming languages.
|
|
New programmers may not know about encoding. If the default encoding
|
|
for text files is UTF-8, they can learn about encoding when they need
|
|
to handle legacy encoding.
|
|
|
|
|
|
People assume the default encoding is UTF-8 already
|
|
---------------------------------------------------
|
|
|
|
Developers using macOS or Linux may forget that the default encoding
|
|
is not always UTF-8.
|
|
|
|
For example, ``long_description = open("README.md").read()`` in
|
|
``setup.py`` is a common mistake. Many Windows users can not install
|
|
the package if there is at least one emoji or any other non-ASCII
|
|
character in the ``README.md`` file.
|
|
|
|
Even Python experts assume that default encoding is UTF-8.
|
|
It creates bugs that happen only on Windows. See [#]_ [#]_.
|
|
|
|
Changing the default text encoding to UTF-8 will help many Windows
|
|
users.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Enable UTF-8 mode on Windows unless it is disabled explicitly.
|
|
|
|
UTF-8 mode affects these areas:
|
|
|
|
* ``locale.getpreferredencoding`` returns "UTF-8".
|
|
|
|
* ``open``, ``subprocess.Popen``, ``pathlib.Path.read_text``,
|
|
``ZipFile.open``, and many other functions use UTF-8 when
|
|
the ``encoding`` option is omitted.
|
|
|
|
* The stdio uses "UTF-8" always.
|
|
|
|
* Console I/O uses "UTF-8" already [#]_. So this affects
|
|
only when the stdio are redirected.
|
|
|
|
On the other hand, UTF-8 mode doesn't affect to "mbcs" encoding.
|
|
Users can still use system encoding by chosing "mbcs" encoding
|
|
explicitly.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
Some existing applications assuming the default text encoding is the
|
|
system encoding (a.k.a. ANSI encoding) will be broken by this change.
|
|
|
|
Users can disable the UTF-8 mode by environment variable
|
|
(``PYTHONUTF8=0``) or command line option (``-Xutf8=0``) for backward
|
|
compatibility.
|
|
|
|
|
|
Rejected Ideas
|
|
===============
|
|
|
|
Change the default encoding of TextIOWrapper to "UTF-8"
|
|
-------------------------------------------------------
|
|
|
|
This idea changed the default encoding to UTF-8 always, regardless of
|
|
platform, locale, and environment variables.
|
|
|
|
While this idea looks ideal in terms of consistency, it will cause
|
|
backward compatibility problems.
|
|
|
|
Utilizing the UTF-8 mode seems better than adding one more backward
|
|
compatibility option like ``PYTHONLEGACYWINDOWSSTDIO``.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
To be written.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [#] `PEP 540 -- Add a new UTF-8 Mode <https://www.python.org/dev/peps/pep-0540/>`_
|
|
.. [#] https://github.com/pypa/packaging.python.org/pull/682
|
|
.. [#] https://bugs.python.org/issue33684
|
|
.. [#] `PEP 528 -- Change Windows console encoding to UTF-8 <https://www.python.org/dev/peps/pep-0528/>`_
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|