PEP: 597 Title: Enable UTF-8 mode by default on Windows Author: Inada Naoki Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 05-Jun-2019 Python-Version: 3.10 Abstract ======== This PEP proposes to make UTF-8 mode [#]_ enabled by default on Windows. The goal of this PEP is providing "UTF-8 by default" experience to Windows users like Unix users. Motivation ========== UTF-8 is the best encoding nowdays ---------------------------------- Popular text editors like VS Code uses UTF-8 by default. Even Microsoft Notepad uses UTF-8 by default since the Windows 10 May 2019 Update. Additionally, the default encoding of Python source files is UTF-8. We can assume that most Python programmers use UTF-8 for most text files. Python is one of the most popular first programming languages. New programmers may not know about encoding. If the default encoding for text files is UTF-8, they can learn about encoding when they need to handle legacy encoding. People assume the default encoding is UTF-8 already --------------------------------------------------- Developers using macOS or Linux may forget that the default encoding is not always UTF-8. For example, ``long_description = open("README.md").read()`` in ``setup.py`` is a common mistake. Many Windows users can not install the package if there is at least one emoji or any other non-ASCII character in the ``README.md`` file. Even Python experts assume that default encoding is UTF-8. It creates bugs that happen only on Windows. See [#]_ [#]_. Changing the default text encoding to UTF-8 will help many Windows users. Specification ============= Enable UTF-8 mode on Windows unless it is disabled explicitly. UTF-8 mode affects these areas: * ``locale.getpreferredencoding`` returns "UTF-8". * ``open``, ``subprocess.Popen``, ``pathlib.Path.read_text``, ``ZipFile.open``, and many other functions use UTF-8 when the ``encoding`` option is omitted. * The stdio uses "UTF-8" always. * Console I/O uses "UTF-8" already [#]_. So this affects only when the stdio are redirected. On the other hand, UTF-8 mode doesn't affect to "mbcs" encoding. Users can still use system encoding by chosing "mbcs" encoding explicitly. Backwards Compatibility ======================= Some existing applications assuming the default text encoding is the system encoding (a.k.a. ANSI encoding) will be broken by this change. Users can disable the UTF-8 mode by environment variable (``PYTHONUTF8=0``) or command line option (``-Xutf8=0``) for backward compatibility. Rejected Ideas =============== Change the default encoding of TextIOWrapper to "UTF-8" ------------------------------------------------------- This idea changed the default encoding to UTF-8 always, regardless of platform, locale, and environment variables. While this idea looks ideal in terms of consistency, it will cause backward compatibility problems. Utilizing the UTF-8 mode seems better than adding one more backward compatibility option like ``PYTHONLEGACYWINDOWSSTDIO``. Reference Implementation ======================== To be written. References ========== .. [#] `PEP 540 -- Add a new UTF-8 Mode `_ .. [#] https://github.com/pypa/packaging.python.org/pull/682 .. [#] https://bugs.python.org/issue33684 .. [#] `PEP 528 -- Change Windows console encoding to UTF-8 `_ Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: