PEP 540: truncate to 72 columns

Victor Stinner 2017-12-11 10:13:43 +01:00
parent 2f52641e11
commit 40a9e6f4b3
1 changed files with 45 additions and 34 deletions


@ -18,10 +18,13 @@ Abstract
Add a new "UTF-8 Mode" to enhance Python's use of UTF-8. When UTF-8 Mode Add a new "UTF-8 Mode" to enhance Python's use of UTF-8. When UTF-8 Mode
is active, Python will: is active, Python will:
* use the ``utf-8`` encoding, regardless of the locale currently set by the current platform, and * use the ``utf-8`` encoding, regardless of the locale currently set by
* change the ``stdin`` and ``stdout`` error handlers to ``surrogateescape``. the current platform, and
* change the ``stdin`` and ``stdout`` error handlers to
``surrogateescape``.
This mode is off by default, but is automatically activated when using the "POSIX" locale. This mode is off by default, but is automatically activated when using
the "POSIX" locale.
Add the ``-X utf8`` command line option and ``PYTHONUTF8`` environment Add the ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable to control UTF-8 Mode. variable to control UTF-8 Mode.
@ -42,17 +45,20 @@ locale, but are unable to change the locale for various reasons. This
encoding is very limited in terms of Unicode support: any non-ASCII encoding is very limited in terms of Unicode support: any non-ASCII
character is likely to cause trouble. character is likely to cause trouble.
It isn't always easy to get an accurate locale. Locales don't get It isn't always easy to get an accurate locale. Locales don't get the
the exact same name on different Linux distributions, FreeBSD, macOS, etc. exact same name on different Linux distributions, FreeBSD, macOS, etc.
And some locales, like the recent ``C.UTF-8`` locale, are only supported And some locales, like the recent ``C.UTF-8`` locale, are only supported
by a few platforms. The current locale can even vary on the *same* platform by a few platforms. The current locale can even vary on the *same*
depending on context; for example, an SSH connection can use a different platform depending on context; for example, an SSH connection can use a
encoding than the filesystem or local terminal encoding on the same machine. different encoding than the filesystem or local terminal encoding on the
same machine.
On the flip side, Python 3.6 is already using UTF-8 by default on On the flip side, Python 3.6 is already using UTF-8 by default on macOS,
macOS, Android and Windows (:pep:`529`) for most functions--although ``open()`` is a notable exception here. UTF-8 is also the default encoding of Python Android and Windows (:pep:`529`) for most functions -- although
scripts, XML and JSON file formats. The Go programming language uses ``open()`` is a notable exception here. UTF-8 is also the default
UTF-8 for all strings. encoding of Python scripts, XML and JSON file formats. The Go
programming language uses UTF-8 for all strings.
UTF-8 support is nearly ubiquitous for data read and written by modern UTF-8 support is nearly ubiquitous for data read and written by modern
platforms. It also has excellent support in Python. The problem is platforms. It also has excellent support in Python. The problem is
@ -63,8 +69,9 @@ suggests itself: ignore the locale encoding and use UTF-8.
Passthrough for undecodable bytes: surrogateescape Passthrough for undecodable bytes: surrogateescape
-------------------------------------------------- --------------------------------------------------
When decoding bytes from UTF-8 using the default ``strict`` error handler, When decoding bytes from UTF-8 using the default ``strict`` error
Python 3 raises a ``UnicodeDecodeError`` on the first undecodable byte. handler, Python 3 raises a ``UnicodeDecodeError`` on the first
undecodable byte.
Unix command line tools like ``cat`` or ``grep`` and most Python 2 Unix command line tools like ``cat`` or ``grep`` and most Python 2
applications simply do not have this class of bugs: they don't decode applications simply do not have this class of bugs: they don't decode
@ -72,18 +79,18 @@ data, but process data as a raw bytes sequence.
Python 3 already has a solution to behave like Unix tools and Python 2: Python 3 already has a solution to behave like Unix tools and Python 2:
the ``surrogateescape`` error handler (:pep:`383`). It allows processing the ``surrogateescape`` error handler (:pep:`383`). It allows processing
data as if it were bytes, but uses Unicode in practice; undecodable bytes data as if it were bytes, but uses Unicode in practice; undecodable
are stored as surrogate characters. bytes are stored as surrogate characters.
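
As a minimal sketch of the behaviour described above (the sample bytes
and variable names are illustrative only), the following compares the
``strict`` and ``surrogateescape`` error handlers on data that is not
valid UTF-8:

```python
# Bytes that are not valid UTF-8 (0xE9 is "é" in Latin-1).
raw = b"caf\xe9.txt"

# The default "strict" handler refuses to decode them.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN, so decoding always succeeds.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")
```

The round trip is what lets a program pass such data through unchanged,
much like a tool that never decodes it at all.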
UTF-8 Mode sets the ``surrogateescape`` error handler for ``stdin`` UTF-8 Mode sets the ``surrogateescape`` error handler for ``stdin``
and ``stdout``, since these streams are commonly associated with Unix and ``stdout``, since these streams are commonly associated with Unix
command line tools. command line tools.
However, users have different expectations for files. Files are expected However, users have different expectations for files. Files are expected
to be properly encoded, and Python is expected to fail early when ``open()`` to be properly encoded, and Python is expected to fail early when
is called with the wrong options, like opening a JPEG picture in text ``open()`` is called with the wrong options, like opening a JPEG picture
mode. The ``open()`` default error handler remains ``strict`` for these in text mode. The ``open()`` default error handler remains ``strict``
reasons. for these reasons.
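
A small sketch of that expectation, assuming a temporary file that
holds the first bytes of a JPEG picture (the file name and contents are
made up for illustration). The encoding is passed explicitly so the
result does not depend on the current locale:

```python
import os
import tempfile

# Hypothetical stand-in for binary data: the start of a JPEG file.
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "picture.jpg")
    with open(path, "wb") as f:
        f.write(jpeg_header)

    # Text mode with the default "strict" handler: reading fails
    # as soon as the first undecodable byte is met.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        failed_early = False
    except UnicodeDecodeError:
        failed_early = True

    # A caller can still opt in to surrogateescape for one file.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        lenient_text = f.read()
```

Note that the error surfaces when the data is read, not when ``open()``
itself returns.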
No change by default for best backward compatibility No change by default for best backward compatibility
@ -92,14 +99,14 @@ No change by default for best backward compatibility
While UTF-8 is perfect in most cases, sometimes the locale encoding is While UTF-8 is perfect in most cases, sometimes the locale encoding is
actually the best encoding. actually the best encoding.
This PEP changes the behaviour for the POSIX locale since this locale This PEP changes the behaviour for the POSIX locale since this locale is
is usually equivalent to the ASCII encoding, whereas UTF-8 is a much better usually equivalent to the ASCII encoding, whereas UTF-8 is a much better
choice. It does not change the behaviour for other locales to prevent any choice. It does not change the behaviour for other locales to prevent
risk of regression. any risk of regression.
As users are responsible for explicitly enabling the new UTF-8 Mode for these As users are responsible for explicitly enabling the new UTF-8 Mode for
other locales, they are responsible for any potential mojibake issues caused these other locales, they are responsible for any potential mojibake
by UTF-8 Mode. issues caused by UTF-8 Mode.
Proposal Proposal
@ -109,11 +116,14 @@ Add a new UTF-8 Mode to use the UTF-8 encoding, ignore the locale
encoding, and change ``stdin`` and ``stdout`` error handlers to encoding, and change ``stdin`` and ``stdout`` error handlers to
``surrogateescape``. ``surrogateescape``.
Add the new ``-X utf8`` command line option and ``PYTHONUTF8`` environment Add the new ``-X utf8`` command line option and ``PYTHONUTF8``
variable. Users can explicitly activate UTF-8 Mode with the command-line option ``-X utf8`` or by setting the environment variable ``PYTHONUTF8=1``. environment variable. Users can explicitly activate UTF-8 Mode with the
command-line option ``-X utf8`` or by setting the environment variable
``PYTHONUTF8=1``.
This mode is disabled by default and enabled by the POSIX locale. This mode is disabled by default and enabled by the POSIX locale. Users
Users can explicitly disable UTF-8 Mode with the command-line option ``-X utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``. can explicitly disable UTF-8 Mode with the command-line option ``-X
utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``.
For standard streams, the ``PYTHONIOENCODING`` environment variable has For standard streams, the ``PYTHONIOENCODING`` environment variable has
priority over UTF-8 Mode. priority over UTF-8 Mode.
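
The switches above can be tried out with a child interpreter. This is a
sketch only: it assumes an interpreter that implements this proposal
and exposes the mode as ``sys.flags.utf8_mode`` (as CPython 3.7 and
later do; that attribute is not named in this text), and the helper
``utf8_mode_flag`` is a hypothetical name used for illustration:

```python
import os
import subprocess
import sys

CHECK = "import sys; print(sys.flags.utf8_mode)"


def utf8_mode_flag(x_options=(), env_overrides=None):
    """Run a child interpreter and return its sys.flags.utf8_mode."""
    env = dict(os.environ)
    # Drop the parent's setting so only our overrides apply.
    env.pop("PYTHONUTF8", None)
    env.update(env_overrides or {})
    cmd = [sys.executable]
    for opt in x_options:
        cmd += ["-X", opt]
    cmd += ["-c", CHECK]
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


enabled_by_option = utf8_mode_flag(x_options=["utf8"])
disabled_by_option = utf8_mode_flag(x_options=["utf8=0"])
enabled_by_env = utf8_mode_flag(env_overrides={"PYTHONUTF8": "1"})
disabled_by_env = utf8_mode_flag(env_overrides={"PYTHONUTF8": "0"})
```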
@ -142,14 +152,15 @@ Relationship with the locale coercion (PEP 538)
=============================================== ===============================================
The POSIX locale enables the locale coercion (:pep:`538`) and the UTF-8 The POSIX locale enables the locale coercion (:pep:`538`) and the UTF-8
mode (:pep:`540`). When the locale coercion is enabled, enabling the UTF-8 mode (:pep:`540`). When the locale coercion is enabled, enabling the
mode has no additional effect. UTF-8 mode has no additional effect.
The UTF-8 Mode has the same effect as locale coercion: The UTF-8 Mode has the same effect as locale coercion:
* ``sys.getfilesystemencoding()`` returns ``'UTF-8'``, * ``sys.getfilesystemencoding()`` returns ``'UTF-8'``,
* ``locale.getpreferredencoding()`` returns ``'UTF-8'``, and * ``locale.getpreferredencoding()`` returns ``'UTF-8'``, and
* the ``sys.stdin`` and ``sys.stdout`` error handlers are set to ``surrogateescape``. * the ``sys.stdin`` and ``sys.stdout`` error handlers are set to
``surrogateescape``.
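
The three effects listed above can be observed with a short probe, given
here as a sketch. It assumes an interpreter that implements UTF-8 Mode
(CPython 3.7 or later); the ``PROBE`` script and the ``report`` name are
illustrative. Encoding names are normalised with ``codecs.lookup()``
because their exact spelling (``UTF-8`` or ``utf-8``) may differ
between versions:

```python
import json
import os
import subprocess
import sys

PROBE = """
import codecs, json, locale, sys
print(json.dumps({
    "filesystem": codecs.lookup(sys.getfilesystemencoding()).name,
    "preferred": codecs.lookup(locale.getpreferredencoding()).name,
    "stdin_errors": sys.stdin.errors,
    "stdout_errors": sys.stdout.errors,
}))
"""

env = dict(os.environ)
# PYTHONIOENCODING has priority over UTF-8 Mode, so remove it here.
env.pop("PYTHONIOENCODING", None)

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", PROBE],
    env=env,
    stdin=subprocess.DEVNULL,
    capture_output=True,
    text=True,
    check=True,
)
report = json.loads(result.stdout)
```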
These changes only affect Python code. But the locale coercion has These changes only affect Python code. But the locale coercion has
additional effects: the ``LC_CTYPE`` environment variable and the additional effects: the ``LC_CTYPE`` environment variable and the