PEP: 686
Title: Make UTF-8 mode default
Author: Inada Naoki <songofacandy@gmail.com>
Discussions-To: https://discuss.python.org/t/14435
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Mar-2022
Python-Version: 3.12
Post-History: `18-Mar-2022 <https://discuss.python.org/t/14435>`__


Abstract
========

This PEP proposes enabling :pep:`UTF-8 mode <540>` by default.

With this change, Python consistently uses UTF-8 as the default encoding for
files, stdio, and pipes.


Motivation
==========

UTF-8 has become the de facto standard text encoding.

* The default encoding of Python source files is UTF-8.
* JSON, TOML, and YAML use UTF-8.
* Most text editors, including VS Code and Windows Notepad, use UTF-8 by
  default.
* Most websites and text data on the internet use UTF-8.
* Many other popular programming languages, including Node.js, Go, Rust,
  and Java, use UTF-8 by default.

Changing the default encoding to UTF-8 makes it easier for Python to
interoperate with them.

Additionally, many Python developers using Unix forget that the default
encoding is platform dependent. They omit ``encoding="utf-8"`` when
they read text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, and Python
source files). Inconsistent default encodings have caused many bugs.


Specification
=============

Enable UTF-8 mode by default
----------------------------

Python will enable UTF-8 mode by default.

Users can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.


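As a rough illustration of the two switches above, the following sketch starts
child interpreters with each setting and reports ``sys.flags.utf8_mode`` as
seen by the child. The helper name ``utf8_mode_of_child`` is hypothetical and
exists only for this example.

```python
import os
import subprocess
import sys


def utf8_mode_of_child(args=(), env_value=None):
    """Return sys.flags.utf8_mode as seen by a child interpreter.

    `args` are extra command-line options (e.g. ["-X", "utf8=0"]);
    `env_value` is the value for PYTHONUTF8, or None to leave it unset.
    """
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if env_value is not None:
        env["PYTHONUTF8"] = env_value
    out = subprocess.run(
        [sys.executable, *args, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    print("-X utf8=0    ->", utf8_mode_of_child(args=["-X", "utf8=0"]))
    print("PYTHONUTF8=0 ->", utf8_mode_of_child(env_value="0"))
    print("-X utf8=1    ->", utf8_mode_of_child(args=["-X", "utf8=1"]))
```

Running the check in a child process keeps the result independent of how the
parent interpreter itself was started.
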
``locale.get_encoding()``
-------------------------

Add ``locale.get_encoding()``. It is the same as
``locale.getpreferredencoding(False)`` except that it ignores UTF-8 mode.

This API will be used by ``io.TextIOWrapper`` to support the
``encoding="locale"`` option.

This change will be released in Python 3.11 so that users can prepare before
UTF-8 mode is enabled by default.


Backward Compatibility
======================

Most Unix systems use a UTF-8 locale, and Python enables UTF-8 mode when its
locale is C or POSIX. So this change mostly affects Windows users.

When a Python program depends on the default encoding, this change may cause
``UnicodeError``, mojibake, or even silent data corruption. So this change
should be announced very loudly.

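The failure modes named above can be reproduced without touching the default
encoding, simply by decoding bytes with the wrong codec. This is a small
sketch; ``cp1252`` stands in for a legacy Windows locale encoding and is an
assumption made for illustration, as are the two helper names.

```python
def read_utf8_as_legacy(text, legacy="cp1252"):
    """Encode as UTF-8, then decode with a legacy code page (mojibake)."""
    return text.encode("utf-8").decode(legacy)


def read_legacy_as_utf8(text, legacy="cp1252"):
    """Encode with a legacy code page, then decode as UTF-8 (may raise)."""
    return text.encode(legacy).decode("utf-8")


if __name__ == "__main__":
    # No exception here: the text is silently turned into mojibake.
    # ascii() keeps the output printable on any console encoding.
    print(ascii(read_utf8_as_legacy("café")))

    # Here the mismatch surfaces as an exception instead.
    try:
        read_legacy_as_utf8("café")
    except UnicodeDecodeError as exc:
        print("decode failed:", exc.reason)
```

The first case is the more dangerous one in practice, because nothing signals
that the data has been damaged.
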
To resolve this backward incompatibility, users can do the following:

* Disable UTF-8 mode.
* Use ``EncodingWarning`` to find where the default encoding is used, and use
  the ``encoding="locale"`` option if the locale encoding should be used
  (as defined in :pep:`597`).
* Find every occurrence of ``locale.getpreferredencoding(False)`` in the
  application, and replace it with ``locale.get_encoding()`` if the
  locale encoding should be used.
* Test the application with UTF-8 mode.


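The ``EncodingWarning`` step in the list above can be sketched as follows,
assuming Python 3.10 or later, where :pep:`597` added the
``-X warn_default_encoding`` option. The helper ``encoding_warnings_for`` and
the two sample snippets are hypothetical names used only for this example.

```python
import subprocess
import sys


def encoding_warnings_for(code):
    """Run `code` in a child interpreter with EncodingWarning enabled.

    Returns the child's stderr, which is where warnings are reported.
    """
    result = subprocess.run(
        [
            sys.executable,
            "-X", "warn_default_encoding",    # opt in to EncodingWarning
            "-W", "always::EncodingWarning",  # make sure it is not filtered
            "-c", code,
        ],
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stderr


# open() without an encoding argument relies on the default encoding.
IMPLICIT = "import os; open(os.devnull).close()"
# The same call with the encoding spelled out.
EXPLICIT = "import os; open(os.devnull, encoding='utf-8').close()"

if __name__ == "__main__":
    print("implicit:", encoding_warnings_for(IMPLICIT).strip())
    print("explicit:", encoding_warnings_for(EXPLICIT).strip())
```

Each reported location is a place to decide between ``encoding="utf-8"`` and
``encoding="locale"``.
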
Preceding examples
==================

* Ruby `changed <Feature #16604_>`__ the default ``external_encoding``
  to UTF-8 on Windows in Ruby 3.0 (2020).
* Java `changed <JEP 400_>`__ the default text encoding
  to UTF-8 in JDK 18 (2022).

Both Ruby and Java have an option for backward compatibility.
They don't provide any warning like :pep:`597`'s ``EncodingWarning``
in Python for use of the default encoding.


Rejected Alternative
====================

Deprecate implicit encoding
---------------------------

Deprecating use of the default encoding was considered.

But there are many cases where users rely on the default encoding when they
need only ASCII. And some users use Python only on Unix with a UTF-8 locale.

So forcing users to specify the ``encoding`` option everywhere would be too
painful.

Java also rejected this idea in `JEP 400`_.


How to teach this
=================

For new users, this change reduces the number of things that need to be taught.
Users don't need to learn about text encoding in their first year.
They need to learn it only when they need to use non-UTF-8 text files.

For existing users, see the `Backward Compatibility`_ section.


References
==========

.. _Feature #16604: https://bugs.ruby-lang.org/issues/16604

.. _JEP 400: https://openjdk.java.net/jeps/400


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.