PEP 686: Make UTF-8 mode default (#2443)
This commit is contained in:
parent
618233e725
commit
96bfcd082d
|
@ -0,0 +1,155 @@
|
|||
PEP: 686
|
||||
Title: Make UTF-8 mode default
|
||||
Author: Inada Naoki <songofacandy@gmail.com>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 18-Mar-2022
|
||||
Python-Version: 3.12
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes making UTF-8 mode [1]_ on by default.
|
||||
|
||||
With this change, Python uses UTF-8 for default encoding of files, stdio, and
|
||||
pipes consistently.
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
UTF-8 becomes de-facto standard text encoding.
|
||||
|
||||
* Default encoding of Python source files is UTF-8.
|
||||
* JSON, TOML, YAML uses UTF-8.
|
||||
* Most text editors including VS Code and Windows notepad use UTF-8 by
|
||||
default.
|
||||
* Most websites and text data on the internet uses UTF-8.
|
||||
* And many other popular programming languages including node.js, Go, Rust,
|
||||
Ruby, and Java uses UTF-8 by default.
|
||||
|
||||
Changing the default encoding to UTF-8 makes Python easier to interoperate
|
||||
with them.
|
||||
|
||||
Additionally, many Python developers using Unix forget that the default
|
||||
encoding is platform dependant. They omit to specify ``encoding="utf-8"`` when
|
||||
they read text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, and Python
|
||||
source files). Inconsistent default encoding caused many bugs.
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
Changes to UTF-8 mode
|
||||
---------------------
|
||||
|
||||
Currently, UTF-8 mode affects to ``locale.getpreferredencoding()``.
|
||||
|
||||
This PEP proposes to remove this override. UTF-8 mode will not affect to
|
||||
``locale`` module.
|
||||
|
||||
After this change, UTF-8 mode affects to:
|
||||
|
||||
* stdin, stdout, stderr
|
||||
|
||||
* User can override it with ``PYTHONIOENCODING``.
|
||||
|
||||
* filesystem encoding
|
||||
|
||||
* ``TextIOWrapper`` and APIs using it including ``open()``,
|
||||
``Path.read_text()``, ``subprocess.Popen(cmd, text=True)``, etc...
|
||||
|
||||
This change will be introduced in Python 3.11 if possible.
|
||||
|
||||
|
||||
Enable UTF-8 mode by default
|
||||
----------------------------
|
||||
|
||||
Python enables UTF-8 mode by default.
|
||||
|
||||
User can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
|
||||
|
||||
|
||||
Backward Compatibility
|
||||
======================
|
||||
|
||||
Most Unix systems use UTF-8 locale and Python enables UTF-8 mode when its
|
||||
locale is C or POSIX. So this change mostly affects Windows users.
|
||||
|
||||
When a Python program depends on the default encoding, this change may cause
|
||||
``UnicodeError``, mojibake, or even silent data corruption. So this change
|
||||
should be announced very loudly.
|
||||
|
||||
To resolve this backward incompatibility, users can do:
|
||||
|
||||
* Disable UTF-8 mode
|
||||
* Use ``EncodingWarning`` to find where the default encoding is used and use
|
||||
``encoding="locale"`` option to keep using locale encoding. [2]_
|
||||
|
||||
|
||||
Preceding examples
|
||||
==================
|
||||
|
||||
* Ruby changed the default ``external_encoding`` to UTF-8 on Windows in Ruby
|
||||
3.0 (2020). [3]_
|
||||
* Java changed the default text encoding to UTF-8 in JDK 18. (2022). [4]_
|
||||
|
||||
Both Ruby and Java have an option for backward compatibility.
|
||||
They don't provide any warning like ``EncodingWarning`` [2]_ in Python for use
|
||||
of the default encoding.
|
||||
|
||||
|
||||
Rejected Alternative
|
||||
====================
|
||||
|
||||
Deprecate implicit encoding
|
||||
---------------------------
|
||||
|
||||
Deprecating use of the default encoding is considered.
|
||||
|
||||
But there are many cases user uses the default encoding when just they need
|
||||
ASCII. And some users use Python only on Unix with UTF-8 locale.
|
||||
|
||||
So forcing users to specify the ``encoding`` option everywhere is too painful.
|
||||
|
||||
Java also rejected this idea [4]_.
|
||||
|
||||
|
||||
How to teach this
|
||||
=================
|
||||
|
||||
For new users, this change reduces things that need to teach.
|
||||
|
||||
Users can delay learning about text encoding until they need to handle
|
||||
non-UTF-8 text files.
|
||||
|
||||
For existing users, see `Backward compatibility`_ section.
|
||||
|
||||
|
||||
Resources
|
||||
=========
|
||||
|
||||
.. [1] `PEP 540 – Add a new UTF-8 Mode`__
|
||||
|
||||
__ https://peps.python.org/pep-0540/
|
||||
|
||||
.. [2] `PEP 597 – Add optional EncodingWarning`__
|
||||
|
||||
__ https://peps.python.org/pep-0597/
|
||||
|
||||
.. [3] `Set default for Encoding.default_external to UTF-8 on Windows`__
|
||||
|
||||
__ https://bugs.ruby-lang.org/issues/16604
|
||||
|
||||
.. [4] `JEP 400: UTF-8 by Default`__
|
||||
|
||||
__ https://openjdk.java.net/jeps/400
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document is placed in the public domain or under the
|
||||
CC0-1.0-Universal license, whichever is more permissive.
|
Loading…
Reference in New Issue