PEP 540: truncate to 72 columns
commit 40a9e6f4b3
parent 2f52641e11
 pep-0540.txt | 79
 1 file changed, 45 insertions(+), 34 deletions(-)

--- a/pep-0540.txt
+++ b/pep-0540.txt
@@ -18,10 +18,13 @@ Abstract
 Add a new "UTF-8 Mode" to enhance Python's use of UTF-8. When UTF-8 Mode
 is active, Python will:
 
-* use the ``utf-8`` locale, irregardless of the locale currently set by the current platform, and
-* change the ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
+* use the ``utf-8`` locale, irregardless of the locale currently set by
+  the current platform, and
+* change the ``stdin`` and ``stdout`` error handlers to
+  ``surrogateescape``.
 
-This mode is off by default, but is automatically activated when using the "POSIX" locale.
+This mode is off by default, but is automatically activated when using
+the "POSIX" locale.
 
 Add the ``-X utf8`` command line option and ``PYTHONUTF8`` environment
 variable to control UTF-8 Mode.
@@ -42,17 +45,20 @@ locale, but are unable change the locale for various reasons. This
 encoding is very limited in term of Unicode support: any non-ASCII
 character is likely to cause trouble.
 
-It isn't always easy to get an accurate locale. Locales don't get
-the exact same name on different Linux distributions, FreeBSD, macOS, etc.
+It isn't always easy to get an accurate locale. Locales don't get the
+exact same name on different Linux distributions, FreeBSD, macOS, etc.
 And some locales, like the recent ``C.UTF-8`` locale, are only supported
-by a few platforms. The current locale can even vary on the *same* platform
-depending on context; for example, a SSH connection can use a different
-encoding than the filesystem or local terminal encoding on the same machine.
+by a few platforms. The current locale can even vary on the *same*
+platform depending on context; for example, a SSH connection can use a
+different encoding than the filesystem or local terminal encoding on the
+same machine.
 
-On the flip side, Python 3.6 is already using UTF-8 by default on
-macOS, Android and Windows (:pep:`529`) for most functions--although ``open()`` is a notable exception here. UTF-8 is also the default encoding of Python
-scripts, XML and JSON file formats. The Go programming language uses
-UTF-8 for all strings.
+On the flip side, Python 3.6 is already using UTF-8 by default on macOS,
+Android and Windows (:pep:`529`) for most functions -- although
+``open()`` is a notable exception here. UTF-8 is also the default
+encoding of Python scripts, XML and JSON file formats. The Go
+programming language
+uses UTF-8 for all strings.
 
 UTF-8 support is nearly ubiquitous for data read and written by modern
 platforms. It also has excellent support in Python. The problem is
@@ -63,8 +69,9 @@ suggests itself: ignore the locale encoding and use UTF-8.
 Passthough for undecodable bytes: surrogateescape
 -------------------------------------------------
 
-When decoding bytes from UTF-8 using the default ``strict`` error handler,
-Python 3 raises a ``UnicodeDecodeError`` on the first undecodable byte.
+When decoding bytes from UTF-8 using the default ``strict`` error
+handler, Python 3 raises a ``UnicodeDecodeError`` on the first
+undecodable byte.
 
 Unix command line tools like ``cat`` or ``grep`` and most Python 2
 applications simply do not have this class of bugs: they don't decode
@@ -72,18 +79,18 @@ data, but process data as a raw bytes sequence.
 
 Python 3 already has a solution to behave like Unix tools and Python 2:
 the ``surrogateescape`` error handler (:pep:`383`). It allows processing
-data as if it were bytes, but uses Unicode in practice; undecodable bytes
-are stored as surrogate characters.
+data as if it were bytes, but uses Unicode in practice; undecodable
+bytes are stored as surrogate characters.
 
 UTF-8 Mode sets the ``surrogateescape`` error handler for ``stdin``
 and ``stdout``, since these streams as commonly associated to Unix
 command line tools.
 
 However, users have a different expectation on files. Files are expected
-to be properly encoded, and Python is expected to fail early when ``open()``
-is called with the wrong options, like opening a JPEG picture in text
-mode. The ``open()`` default error handler remains ``strict`` for these
-reasons.
+to be properly encoded, and Python is expected to fail early when
+``open()`` is called with the wrong options, like opening a JPEG picture
+in text mode. The ``open()`` default error handler remains ``strict``
+for these reasons.
 
 
 No change by default for best backward compatibility
@@ -92,14 +99,14 @@ No change by default for best backward compatibility
 While UTF-8 is perfect in most cases, sometimes the locale encoding is
 actually the best encoding.
 
-This PEP changes the behaviour for the POSIX locale since this locale
-is usually equivalent to the ASCII encoding, whereas UTF-8 is a much better
-choice. It does not change the behaviour for other locales to prevent any
-risk or regression.
+This PEP changes the behaviour for the POSIX locale since this locale is
+usually equivalent to the ASCII encoding, whereas UTF-8 is a much better
+choice. It does not change the behaviour for other locales to prevent
+any risk or regression.
 
-As users are responsible to enable explicitly the new UTF-8 Mode for these
-other locales, they are responsible for any potential mojibake issues caused
-by UTF-8 Mode.
+As users are responsible to enable explicitly the new UTF-8 Mode for
+these other locales, they are responsible for any potential mojibake
+issues caused by UTF-8 Mode.
 
 
 Proposal
@@ -109,11 +116,14 @@ Add a new UTF-8 Mode to use the UTF-8 encoding, ignore the locale
 encoding, and change ``stdin`` and ``stdout`` error handlers to
 ``surrogateescape``.
 
-Add the new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
-variable. Users can explicitly activate UTF-8 Mode with the command-line option ``-X utf8`` or by setting the environment variable ``PYTHONUTF8=1``.
+Add the new ``-X utf8`` command line option and ``PYTHONUTF8``
+environment variable. Users can explicitly activate UTF-8 Mode with the
+command-line option ``-X utf8`` or by setting the environment variable
+``PYTHONUTF8=1``.
 
-This mode is disabled by default and enabled by the POSIX locale.
-Users can explicitly disable UTF-8 Mode with the command-line option ``-X utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``.
+This mode is disabled by default and enabled by the POSIX locale. Users
+can explicitly disable UTF-8 Mode with the command-line option ``-X
+utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``.
 
 For standard streams, the ``PYTHONIOENCODING`` environment variable has
 priority over UTF-8 Mode.
@@ -142,14 +152,15 @@ Relationship with the locale coercion (PEP 538)
 ===============================================
 
 The POSIX locale enables the locale coercion (:pep:`538`) and the UTF-8
-mode (:pep:`540`). When the locale coercion is enabled, enabling the UTF-8
-mode has no additional effect.
+mode (:pep:`540`). When the locale coercion is enabled, enabling the
+UTF-8 mode has no additional effect.
 
 The UTF-8 Mode has the same effect as locale coercion:
 
 * ``sys.getfilesystemencoding()`` returns ``'UTF-8'``,
 * ``locale.getpreferredencoding()`` returns ``UTF-8``, and
-* the ``sys.stdin`` and ``sys.stdout`` error handlers are set to ``surrogateescape``.
+* the ``sys.stdin`` and ``sys.stdout`` error handlers are set to
+  ``surrogateescape``.
 
 These changes only affect Python code. But the locale coercion has
 addiditonal effects: the ``LC_CTYPE`` environment variable and the
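
Note on the "Passthough for undecodable bytes" hunk: the contrast the PEP text draws between the ``strict`` and ``surrogateescape`` error handlers can be seen with plain ``bytes.decode()``. The following is only an illustrative sketch; the sample byte string and variable names are made up for this note and are not part of the commit.

```python
# Hypothetical sample: a Latin-1 encoded name, which is not valid UTF-8.
raw = b"caf\xe9.txt"

# With the default "strict" handler, decoding fails on the undecodable byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With "surrogateescape", the undecodable byte 0xE9 is stored as the lone
# surrogate U+DCE9, so the data can be handled as str.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes unchanged.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strict_failed, ascii(text), round_tripped == raw)
```

The round trip is what lets a program pass arbitrary bytes from ``stdin`` to ``stdout`` without data loss, as Unix tools do.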
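
Note on the "Proposal" hunk: the ``-X utf8`` and ``-X utf8=0`` switches described there can be tried from a child interpreter. This sketch assumes Python 3.7 or later, where the accepted PEP is implemented and the mode is exposed as ``sys.flags.utf8_mode``; that attribute is not mentioned in the diff itself, and the helper name ``utf8_mode_flag`` is hypothetical.

```python
import subprocess
import sys


def utf8_mode_flag(*interpreter_args):
    """Start a child interpreter and return its sys.flags.utf8_mode as text."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Explicitly enable, then explicitly disable, UTF-8 Mode.
enabled = utf8_mode_flag("-X", "utf8")
disabled = utf8_mode_flag("-X", "utf8=0")
print(enabled, disabled)
```

Setting ``PYTHONUTF8=1`` or ``PYTHONUTF8=0`` in the child's environment is the equivalent environment-variable route named in the same hunk.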