PEP 540: truncate to 72 columns
parent 2f52641e11
commit 40a9e6f4b3
79
pep-0540.txt

@@ -18,10 +18,13 @@ Abstract
 Add a new "UTF-8 Mode" to enhance Python's use of UTF-8. When UTF-8 Mode
 is active, Python will:
 
-* use the ``utf-8`` locale, irregardless of the locale currently set by the current platform, and
-* change the ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
+* use the ``utf-8`` locale, irregardless of the locale currently set by
+  the current platform, and
+* change the ``stdin`` and ``stdout`` error handlers to
+  ``surrogateescape``.
 
-This mode is off by default, but is automatically activated when using the "POSIX" locale.
+This mode is off by default, but is automatically activated when using
+the "POSIX" locale.
 
 Add the ``-X utf8`` command line option and ``PYTHONUTF8`` environment
 variable to control UTF-8 Mode.
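
The Abstract hunk says UTF-8 Mode is controlled with ``-X utf8``. The following is a minimal sketch of checking that from Python. It assumes CPython 3.7 or newer, the first release with UTF-8 Mode. It starts child interpreters with ``-X utf8`` and ``-X utf8=0`` and reads back ``sys.flags.utf8_mode`` and ``sys.stdout.errors``. The helper name ``probe`` is hypothetical. ``PYTHONUTF8`` and ``PYTHONIOENCODING`` are removed from the child environment so that the command-line option alone decides the result.

```python
import os
import subprocess
import sys

# Code run by the child: report the UTF-8 Mode flag and the stdout error handler.
CHILD = "import sys; print(sys.flags.utf8_mode, sys.stdout.errors)"


def probe(*options):
    """Run a child interpreter with extra options (hypothetical helper)."""
    # Drop environment variables that could override the command-line option.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONUTF8", "PYTHONIOENCODING")}
    result = subprocess.run([sys.executable, *options, "-c", CHILD],
                            env=env, capture_output=True, text=True, check=True)
    flag, errors = result.stdout.split()
    return int(flag), errors


if __name__ == "__main__":
    print("-X utf8   ->", probe("-X", "utf8"))
    print("-X utf8=0 ->", probe("-X", "utf8=0"))
```

With ``-X utf8`` the child is expected to report flag 1 and the ``surrogateescape`` handler on stdout. With ``-X utf8=0`` the flag is 0 even under the POSIX locale.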
@@ -42,17 +45,20 @@ locale, but are unable change the locale for various reasons. This
 encoding is very limited in term of Unicode support: any non-ASCII
 character is likely to cause trouble.
 
-It isn't always easy to get an accurate locale. Locales don't get
-the exact same name on different Linux distributions, FreeBSD, macOS, etc.
+It isn't always easy to get an accurate locale. Locales don't get the
+exact same name on different Linux distributions, FreeBSD, macOS, etc.
 And some locales, like the recent ``C.UTF-8`` locale, are only supported
-by a few platforms. The current locale can even vary on the *same* platform
-depending on context; for example, a SSH connection can use a different
-encoding than the filesystem or local terminal encoding on the same machine.
+by a few platforms. The current locale can even vary on the *same*
+platform depending on context; for example, a SSH connection can use a
+different encoding than the filesystem or local terminal encoding on the
+same machine.
 
-On the flip side, Python 3.6 is already using UTF-8 by default on
-macOS, Android and Windows (:pep:`529`) for most functions--although ``open()`` is a notable exception here. UTF-8 is also the default encoding of Python
-scripts, XML and JSON file formats. The Go programming language uses
-UTF-8 for all strings.
+On the flip side, Python 3.6 is already using UTF-8 by default on macOS,
+Android and Windows (:pep:`529`) for most functions -- although
+``open()`` is a notable exception here. UTF-8 is also the default
+encoding of Python scripts, XML and JSON file formats. The Go
+programming language
+uses UTF-8 for all strings.
 
 UTF-8 support is nearly ubiquitous for data read and written by modern
 platforms. It also has excellent support in Python. The problem is
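
This hunk notes that an ASCII locale encoding breaks on any non-ASCII character. The sketch below shows that failure, assuming ASCII stands in for the POSIX locale encoding. ``try_encode`` is a hypothetical helper that returns ``None`` when the default ``strict`` error handler rejects the text.

```python
def try_encode(text, encoding):
    """Encode with the default "strict" error handler; return None on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    word = "caf\u00e9"                    # one non-ASCII character, U+00E9
    print(try_encode(word, "ascii"))      # None: ASCII cannot encode U+00E9
    print(try_encode(word, "utf-8"))      # U+00E9 becomes the two bytes C3 A9
```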
@@ -63,8 +69,9 @@ suggests itself: ignore the locale encoding and use UTF-8.
 Passthough for undecodable bytes: surrogateescape
 -------------------------------------------------
 
-When decoding bytes from UTF-8 using the default ``strict`` error handler,
-Python 3 raises a ``UnicodeDecodeError`` on the first undecodable byte.
+When decoding bytes from UTF-8 using the default ``strict`` error
+handler, Python 3 raises a ``UnicodeDecodeError`` on the first
+undecodable byte.
 
 Unix command line tools like ``cat`` or ``grep`` and most Python 2
 applications simply do not have this class of bugs: they don't decode
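
This hunk states that the ``strict`` handler raises ``UnicodeDecodeError`` on the first undecodable byte. The sketch below shows where that happens: the exception's ``start`` attribute gives the offset of the offending byte. ``first_undecodable`` is a hypothetical helper.

```python
def first_undecodable(data):
    """Return the offset of the first byte strict UTF-8 decoding rejects, or None."""
    try:
        data.decode("utf-8")             # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return exc.start                 # decoding stops at the first bad byte
    return None


if __name__ == "__main__":
    print(first_undecodable(b"abc\xffdef\xfe"))                   # 3: the later bad byte is never reached
    print(first_undecodable("d\u00e9j\u00e0".encode("utf-8")))    # None: valid UTF-8
```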
@@ -72,18 +79,18 @@ data, but process data as a raw bytes sequence.
 
 Python 3 already has a solution to behave like Unix tools and Python 2:
 the ``surrogateescape`` error handler (:pep:`383`). It allows processing
-data as if it were bytes, but uses Unicode in practice; undecodable bytes
-are stored as surrogate characters.
+data as if it were bytes, but uses Unicode in practice; undecodable
+bytes are stored as surrogate characters.
 
 UTF-8 Mode sets the ``surrogateescape`` error handler for ``stdin``
 and ``stdout``, since these streams as commonly associated to Unix
 command line tools.
 
 However, users have a different expectation on files. Files are expected
-to be properly encoded, and Python is expected to fail early when ``open()``
-is called with the wrong options, like opening a JPEG picture in text
-mode. The ``open()`` default error handler remains ``strict`` for these
-reasons.
+to be properly encoded, and Python is expected to fail early when
+``open()`` is called with the wrong options, like opening a JPEG picture
+in text mode. The ``open()`` default error handler remains ``strict``
+for these reasons.
 
 
 No change by default for best backward compatibility
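
This hunk contrasts ``surrogateescape`` on the standard streams with the ``strict`` default of ``open()``. The sketch below shows both behaviours. It passes explicit ``errors=`` and ``encoding=`` arguments so it does not depend on UTF-8 Mode being active. The helper names are hypothetical. The four bytes written to the temporary file are only the start of a JPEG header, standing in for a real picture.

```python
import os
import tempfile


def decode_stream(data):
    """Decode bytes as UTF-8 Mode does for stdin: bad bytes become surrogates."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_stream(text):
    """Encode text as UTF-8 Mode does for stdout: surrogates become bytes again."""
    return text.encode("utf-8", errors="surrogateescape")


def read_text_file(path):
    """Read a file in text mode; open() keeps its default "strict" handler."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    raw = b"caf\xe9"                      # a lone 0xE9 byte is not valid UTF-8
    text = decode_stream(raw)
    print(ascii(text))                    # the byte is kept as U+DCE9
    print(encode_stream(text) == raw)     # True: the round trip is lossless

    # The first bytes of a JPEG file: text-mode reading fails early.
    fd, path = tempfile.mkstemp(suffix=".jpg")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xff\xd8\xff\xe0")
    try:
        read_text_file(path)
    except UnicodeDecodeError as exc:
        print("strict handler refused byte at offset", exc.start)
    finally:
        os.remove(path)
```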
@@ -92,14 +99,14 @@ No change by default for best backward compatibility
 While UTF-8 is perfect in most cases, sometimes the locale encoding is
 actually the best encoding.
 
-This PEP changes the behaviour for the POSIX locale since this locale
-is usually equivalent to the ASCII encoding, whereas UTF-8 is a much better
-choice. It does not change the behaviour for other locales to prevent any
-risk or regression.
+This PEP changes the behaviour for the POSIX locale since this locale is
+usually equivalent to the ASCII encoding, whereas UTF-8 is a much better
+choice. It does not change the behaviour for other locales to prevent
+any risk or regression.
 
-As users are responsible to enable explicitly the new UTF-8 Mode for these
-other locales, they are responsible for any potential mojibake issues caused
-by UTF-8 Mode.
+As users are responsible to enable explicitly the new UTF-8 Mode for
+these other locales, they are responsible for any potential mojibake
+issues caused by UTF-8 Mode.
 
 
 Proposal
@@ -109,11 +116,14 @@ Add a new UTF-8 Mode to use the UTF-8 encoding, ignore the locale
 encoding, and change ``stdin`` and ``stdout`` error handlers to
 ``surrogateescape``.
 
-Add the new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
-variable. Users can explicitly activate UTF-8 Mode with the command-line option ``-X utf8`` or by setting the environment variable ``PYTHONUTF8=1``.
+Add the new ``-X utf8`` command line option and ``PYTHONUTF8``
+environment variable. Users can explicitly activate UTF-8 Mode with the
+command-line option ``-X utf8`` or by setting the environment variable
+``PYTHONUTF8=1``.
 
-This mode is disabled by default and enabled by the POSIX locale.
-Users can explicitly disable UTF-8 Mode with the command-line option ``-X utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``.
+This mode is disabled by default and enabled by the POSIX locale. Users
+can explicitly disable UTF-8 Mode with the command-line option ``-X
+utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``.
 
 For standard streams, the ``PYTHONIOENCODING`` environment variable has
 priority over UTF-8 Mode.
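
The Proposal hunk adds ``PYTHONUTF8`` and gives ``PYTHONIOENCODING`` priority for the standard streams. The sketch below exercises both through child interpreters, assuming CPython 3.7 or newer. ``run_with_env`` is a hypothetical helper. Codec names are passed through ``codecs.lookup()`` so that an alias such as ``latin-1`` compares equal to its canonical name ``iso8859-1``.

```python
import codecs
import os
import subprocess
import sys

# Code run by the child: report the UTF-8 Mode flag and the stdout encoding.
CHILD = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"


def run_with_env(**overrides):
    """Run a child interpreter with environment overrides (hypothetical helper)."""
    # Start from a clean slate for the two variables under test.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONUTF8", "PYTHONIOENCODING")}
    env.update(overrides)
    out = subprocess.run([sys.executable, "-c", CHILD], env=env,
                         capture_output=True, text=True, check=True).stdout.split()
    # Normalise the codec name so aliases compare equal.
    return int(out[0]), codecs.lookup(out[1]).name


if __name__ == "__main__":
    print(run_with_env(PYTHONUTF8="1"))
    print(run_with_env(PYTHONUTF8="0"))
    print(run_with_env(PYTHONUTF8="1", PYTHONIOENCODING="latin-1"))
```

Here ``PYTHONUTF8=1`` alone should give UTF-8 standard streams. Adding ``PYTHONIOENCODING=latin-1`` should switch ``sys.stdout`` to Latin-1 even though the mode flag stays at 1.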
@@ -142,14 +152,15 @@ Relationship with the locale coercion (PEP 538)
 ===============================================
 
 The POSIX locale enables the locale coercion (:pep:`538`) and the UTF-8
-mode (:pep:`540`). When the locale coercion is enabled, enabling the UTF-8
-mode has no additional effect.
+mode (:pep:`540`). When the locale coercion is enabled, enabling the
+UTF-8 mode has no additional effect.
 
 The UTF-8 Mode has the same effect as locale coercion:
 
 * ``sys.getfilesystemencoding()`` returns ``'UTF-8'``,
 * ``locale.getpreferredencoding()`` returns ``UTF-8``, and
-* the ``sys.stdin`` and ``sys.stdout`` error handlers are set to ``surrogateescape``.
+* the ``sys.stdin`` and ``sys.stdout`` error handlers are set to
+  ``surrogateescape``.
 
 These changes only affect Python code. But the locale coercion has
 addiditonal effects: the ``LC_CTYPE`` environment variable and the
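
This last hunk lists what UTF-8 Mode changes in Python code. The sketch below checks the first two effects, assuming CPython 3.7 or newer. A child started with ``-X utf8`` reports ``sys.getfilesystemencoding()`` and ``locale.getpreferredencoding(False)``. Both names are normalised with ``codecs.lookup()`` because the PEP text writes ``'UTF-8'`` while an interpreter may report a differently spelled name such as ``utf-8``. ``encodings_in_utf8_mode`` is a hypothetical helper.

```python
import codecs
import subprocess
import sys

# Code run by the child: report the filesystem and preferred encodings.
CHILD = ("import locale, sys; "
         "print(sys.getfilesystemencoding(), locale.getpreferredencoding(False))")


def encodings_in_utf8_mode():
    """Return the normalised encodings seen by a child run with -X utf8."""
    out = subprocess.run([sys.executable, "-X", "utf8", "-c", CHILD],
                         capture_output=True, text=True, check=True).stdout.split()
    return tuple(codecs.lookup(name).name for name in out)


if __name__ == "__main__":
    print(encodings_in_utf8_mode())
```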