152 lines
4.4 KiB
ReStructuredText
152 lines
4.4 KiB
ReStructuredText
PEP: 686
|
|
Title: Make UTF-8 mode default
|
|
Author: Inada Naoki <songofacandy@gmail.com>
|
|
Discussions-To: https://discuss.python.org/t/14435
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 18-Mar-2022
|
|
Python-Version: 3.13
|
|
Post-History: `18-Mar-2022 <https://discuss.python.org/t/14435>`__
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes enabling :pep:`UTF-8 mode <540>` by default.
|
|
|
|
With this change, Python consistently uses UTF-8 for default encoding of
|
|
files, stdio, and pipes.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
UTF-8 becomes de facto standard text encoding.
|
|
|
|
* The default encoding of Python source files is UTF-8.
|
|
* JSON, TOML, YAML use UTF-8.
|
|
* Most text editors, including Visual Studio Code and Windows Notepad use
|
|
UTF-8 by default.
|
|
* Most websites and text data on the internet use UTF-8.
|
|
* And many other popular programming languages, including Node.js, Go, Rust,
|
|
and Java uses UTF-8 by default.
|
|
|
|
Changing the default encoding to UTF-8 makes it easier for Python to
|
|
interoperate with them.
|
|
|
|
Additionally, many Python developers using Unix forget that the default
|
|
encoding is platform dependent.
|
|
They omit to specify ``encoding="utf-8"`` when they read text files encoded
|
|
in UTF-8 (e.g. JSON, TOML, Markdown, and Python source files).
|
|
Inconsistent default encoding causes many bugs.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Enable UTF-8 mode by default
|
|
----------------------------
|
|
|
|
Python enables UTF-8 mode by default.
|
|
|
|
Users can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or
|
|
``-X utf8=0``.
|
|
|
|
|
|
``locale.get_encoding()``
|
|
-------------------------
|
|
|
|
Currently, ``TextIOWrapper`` uses ``locale.getpreferredencoding(False)``
|
|
when ``encoding="locale"`` option is specified. It is ``"UTF-8"`` in UTF-8 mode.
|
|
|
|
This behavior is inconsistent with the :pep:`597` motivation.
|
|
``TextIOWrapper`` should use locale encoding when ``encoding="locale"`` is
|
|
passed before/after the default encoding is changed to UTF-8.
|
|
|
|
To fix this inconsistency, we will add ``locale.get_encoding()``.
|
|
It is the same as ``locale.getpreferredencoding(False)`` but it ignores
|
|
the UTF-8 mode.
|
|
|
|
This change will be released in Python 3.11 so that users can use UTF-8 mode
|
|
that is the same as Python 3.13.
|
|
|
|
|
|
Backward Compatibility
|
|
======================
|
|
|
|
Most Unix systems use UTF-8 locale and Python enables UTF-8 mode when its
|
|
locale is C or POSIX.
|
|
So this change mostly affects Windows users.
|
|
|
|
When a Python program depends on the default encoding, this change may cause
|
|
``UnicodeError``, mojibake, or even silent data corruption.
|
|
So this change should be announced loudly.
|
|
|
|
To resolve this backward incompatibility, users can do:
|
|
|
|
* Disable UTF-8 mode.
|
|
* Use ``EncodingWarning`` to find where the default encoding is used and use
|
|
``encoding="locale"`` option if locale encoding should be used
|
|
(as defined in :pep:`597`).
|
|
* Find every occurrence of ``locale.getpreferredencoding(False)`` in the
|
|
application, and replace it with ``locale.get_locale_encoding()`` if
|
|
locale encoding should be used.
|
|
* Test the application with UTF-8 mode.
|
|
|
|
|
|
Preceding examples
|
|
==================
|
|
|
|
* Ruby `changed <Feature #16604_>`__ the default ``external_encoding``
|
|
to UTF-8 on Windows in Ruby 3.0 (2020).
|
|
* Java `changed <JEP 400_>`__ the default text encoding
|
|
to UTF-8 in JDK 18. (2022).
|
|
|
|
Both Ruby and Java have an option for backward compatibility.
|
|
They don't provide any warning like :pep:`597`'s ``EncodingWarning``
|
|
in Python for use of the default encoding.
|
|
|
|
|
|
Rejected Alternative
|
|
====================
|
|
|
|
Deprecate implicit encoding
|
|
---------------------------
|
|
|
|
Deprecating the use of the default encoding is considered.
|
|
|
|
But there are many cases that the default encoding is used for reading/writing
|
|
only ASCII text.
|
|
Additionally, such warnings are not useful for non-cross platform applications
|
|
run on Unix.
|
|
|
|
So forcing users to specify the ``encoding`` everywhere is too painful.
|
|
|
|
Java also rejected this idea in `JEP 400`_.
|
|
|
|
|
|
How to teach this
|
|
=================
|
|
|
|
For new users, this change reduces things that need to teach.
|
|
Users don't need to learn about text encoding in their first year.
|
|
They should learn it when they need to use non-UTF-8 text files.
|
|
|
|
For existing users, see the `Backward compatibility`_ section.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. _Feature #16604: https://bugs.ruby-lang.org/issues/16604
|
|
|
|
.. _JEP 400: https://openjdk.java.net/jeps/400
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|