PEP 686: Grammar fixes (#2464)
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
parent 75235028b7
commit a5edad1428
pep-0686.rst | 60
@@ -13,32 +13,33 @@ Post-History: `18-Mar-2022 <https://discuss.python.org/t/14435>`__
 Abstract
 ========
 
-This PEP proposes making :pep:`UTF-8 mode <540>` on by default.
+This PEP proposes enabling :pep:`UTF-8 mode <540>` by default.
 
-With this change, Python uses UTF-8 for default encoding of files, stdio, and
-pipes consistently.
+With this change, Python consistently uses UTF-8 for default encoding of
+files, stdio, and pipes.
 
 
 Motivation
 ==========
 
-UTF-8 becomes de-facto standard text encoding.
+UTF-8 becomes de facto standard text encoding.
 
-* Default encoding of Python source files is UTF-8.
-* JSON, TOML, YAML uses UTF-8.
-* Most text editors including VS Code and Windows notepad use UTF-8 by
-  default.
-* Most websites and text data on the internet uses UTF-8.
-* And many other popular programming languages including node.js, Go, Rust,
+* The default encoding of Python source files is UTF-8.
+* JSON, TOML, YAML use UTF-8.
+* Most text editors, including Visual Studio Code and Windows Notepad use
+  UTF-8 by default.
+* Most websites and text data on the internet use UTF-8.
+* And many other popular programming languages, including Node.js, Go, Rust,
   and Java uses UTF-8 by default.
 
-Changing the default encoding to UTF-8 makes Python easier to interoperate
-with them.
+Changing the default encoding to UTF-8 makes it easier for Python to
+interoperate with them.
 
 Additionally, many Python developers using Unix forget that the default
-encoding is platform dependant. They omit to specify ``encoding="utf-8"`` when
-they read text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, and Python
-source files). Inconsistent default encoding caused many bugs.
+encoding is platform dependent.
+They omit to specify ``encoding="utf-8"`` when they read text files encoded
+in UTF-8 (e.g. JSON, TOML, Markdown, and Python source files).
+Inconsistent default encoding causes many bugs.
 
 
 Specification
@@ -49,7 +50,8 @@ Enable UTF-8 mode by default
 
 Python enables UTF-8 mode by default.
 
-User can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
+Users can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or
+``-X utf8=0``.
 
 
 ``locale.get_encoding()``
@@ -62,22 +64,24 @@ This behavior is inconsistent with the :pep:`597` motivation.
 ``TextIOWrapper`` should use locale encoding when ``encoding="locale"`` is
 passed before/after the default encoding is changed to UTF-8.
 
-To fix this inconsistency, we will add ``locale.get_encoding()``. It is same
-to ``locale.getpreferredencoding(False)`` but it ignore the UTF-8 mode.
+To fix this inconsistency, we will add ``locale.get_encoding()``.
+It is the same as ``locale.getpreferredencoding(False)`` but it ignores
+the UTF-8 mode.
 
 This change will be released in Python 3.11 so that users can use UTF-8 mode
-that is same to Python 3.13.
+that is the same as Python 3.13.
 
 
 Backward Compatibility
 ======================
 
 Most Unix systems use UTF-8 locale and Python enables UTF-8 mode when its
-locale is C or POSIX. So this change mostly affects Windows users.
+locale is C or POSIX.
+So this change mostly affects Windows users.
 
 When a Python program depends on the default encoding, this change may cause
-``UnicodeError``, mojibake, or even silent data corruption. So this change
-should be announced very loudly.
+``UnicodeError``, mojibake, or even silent data corruption.
+So this change should be announced loudly.
 
 To resolve this backward incompatibility, users can do:
 
@@ -110,12 +114,14 @@ Rejected Alternative
 Deprecate implicit encoding
 ---------------------------
 
-Deprecating use of the default encoding is considered.
+Deprecating the use of the default encoding is considered.
 
-But there are many cases user uses the default encoding when just they need
-ASCII. And some users use Python only on Unix with UTF-8 locale.
+But there are many cases that the default encoding is used for reading/writing
+only ASCII text.
+Additionally, such warnings are not useful for non-cross platform applications
+run on Unix.
 
-So forcing users to specify the ``encoding`` option everywhere is too painful.
+So forcing users to specify the ``encoding`` everywhere is too painful.
 
 Java also rejected this idea in `JEP 400`_.
 
@@ -125,7 +131,7 @@ How to teach this
 
 For new users, this change reduces things that need to teach.
 Users don't need to learn about text encoding in their first year.
-They need to learn it when they need to use non-UTF-8 text files.
+They should learn it when they need to use non-UTF-8 text files.
 
 For existing users, see the `Backward compatibility`_ section.
 