From 40a9e6f4b3d3fb3ba3d9c470c14a7b73b05547f0 Mon Sep 17 00:00:00 2001
From: Victor Stinner
Date: Mon, 11 Dec 2017 10:13:43 +0100
Subject: [PATCH] PEP 540: truncate to 72 columns

---
 pep-0540.txt | 79 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 45 insertions(+), 34 deletions(-)

diff --git a/pep-0540.txt b/pep-0540.txt
index 9c06c3cbc..a7cbd9af9 100644
--- a/pep-0540.txt
+++ b/pep-0540.txt
@@ -18,10 +18,13 @@ Abstract
 Add a new "UTF-8 Mode" to enhance Python's use of UTF-8. When UTF-8 Mode
 is active, Python will:
 
-* use the ``utf-8`` locale, irregardless of the locale currently set by the current platform, and
-* change the ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
+* use the ``utf-8`` locale, irregardless of the locale currently set by
+  the current platform, and
+* change the ``stdin`` and ``stdout`` error handlers to
+  ``surrogateescape``.
 
-This mode is off by default, but is automatically activated when using the "POSIX" locale.
+This mode is off by default, but is automatically activated when using
+the "POSIX" locale.
 
 Add the ``-X utf8`` command line option and ``PYTHONUTF8`` environment
 variable to control UTF-8 Mode.
@@ -42,17 +45,20 @@ locale, but are unable change the locale for various reasons. This
 encoding is very limited in term of Unicode support: any non-ASCII
 character is likely to cause trouble.
 
-It isn't always easy to get an accurate locale. Locales don't get
-the exact same name on different Linux distributions, FreeBSD, macOS, etc.
+It isn't always easy to get an accurate locale. Locales don't get the
+exact same name on different Linux distributions, FreeBSD, macOS, etc.
 And some locales, like the recent ``C.UTF-8`` locale, are only supported
-by a few platforms. The current locale can even vary on the *same* platform
-depending on context; for example, a SSH connection can use a different
-encoding than the filesystem or local terminal encoding on the same machine.
+by a few platforms. The current locale can even vary on the *same*
+platform depending on context; for example, a SSH connection can use a
+different encoding than the filesystem or local terminal encoding on the
+same machine.
 
-On the flip side, Python 3.6 is already using UTF-8 by default on
-macOS, Android and Windows (:pep:`529`) for most functions--although ``open()`` is a notable exception here. UTF-8 is also the default encoding of Python
-scripts, XML and JSON file formats. The Go programming language uses
-UTF-8 for all strings.
+On the flip side, Python 3.6 is already using UTF-8 by default on macOS,
+Android and Windows (:pep:`529`) for most functions -- although
+``open()`` is a notable exception here. UTF-8 is also the default
+encoding of Python scripts, XML and JSON file formats. The Go
+programming language
+uses UTF-8 for all strings.
 
 UTF-8 support is nearly ubiquitous for data read and written by modern
 platforms. It also has excellent support in Python. The problem is
@@ -63,8 +69,9 @@ suggests itself: ignore the locale encoding and use UTF-8.
 Passthough for undecodable bytes: surrogateescape
 -------------------------------------------------
 
-When decoding bytes from UTF-8 using the default ``strict`` error handler,
-Python 3 raises a ``UnicodeDecodeError`` on the first undecodable byte.
+When decoding bytes from UTF-8 using the default ``strict`` error
+handler, Python 3 raises a ``UnicodeDecodeError`` on the first
+undecodable byte.
 
 Unix command line tools like ``cat`` or ``grep`` and most Python 2
 applications simply do not have this class of bugs: they don't decode
@@ -72,18 +79,18 @@ data, but process data as a raw bytes sequence.
 
 Python 3 already has a solution to behave like Unix tools and Python 2:
 the ``surrogateescape`` error handler (:pep:`383`). It allows processing
-data as if it were bytes, but uses Unicode in practice; undecodable bytes
-are stored as surrogate characters.
+data as if it were bytes, but uses Unicode in practice; undecodable
+bytes are stored as surrogate characters.
 
 UTF-8 Mode sets the ``surrogateescape`` error handler for ``stdin`` and
 ``stdout``, since these streams as commonly associated to Unix command
 line tools.
 
 However, users have a different expectation on files. Files are expected
-to be properly encoded, and Python is expected to fail early when ``open()``
-is called with the wrong options, like opening a JPEG picture in text
-mode. The ``open()`` default error handler remains ``strict`` for these
-reasons.
+to be properly encoded, and Python is expected to fail early when
+``open()`` is called with the wrong options, like opening a JPEG picture
+in text mode. The ``open()`` default error handler remains ``strict``
+for these reasons.
 
 
 No change by default for best backward compatibility
@@ -92,14 +99,14 @@ No change by default for best backward compatibility
 While UTF-8 is perfect in most cases, sometimes the locale encoding is
 actually the best encoding.
 
-This PEP changes the behaviour for the POSIX locale since this locale
-is usually equivalent to the ASCII encoding, whereas UTF-8 is a much better
-choice. It does not change the behaviour for other locales to prevent any
-risk or regression.
+This PEP changes the behaviour for the POSIX locale since this locale is
+usually equivalent to the ASCII encoding, whereas UTF-8 is a much better
+choice. It does not change the behaviour for other locales to prevent
+any risk or regression.
 
-As users are responsible to enable explicitly the new UTF-8 Mode for these
-other locales, they are responsible for any potential mojibake issues caused
-by UTF-8 Mode.
+As users are responsible to enable explicitly the new UTF-8 Mode for
+these other locales, they are responsible for any potential mojibake
+issues caused by UTF-8 Mode.
 
 
 Proposal
@@ -109,11 +116,14 @@ Add a new UTF-8 Mode to use the UTF-8 encoding, ignore the locale
 encoding, and change ``stdin`` and ``stdout`` error handlers to
 ``surrogateescape``.
 
-Add the new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
-variable. Users can explicitly activate UTF-8 Mode with the command-line option ``-X utf8`` or by setting the environment variable ``PYTHONUTF8=1``.
+Add the new ``-X utf8`` command line option and ``PYTHONUTF8``
+environment variable. Users can explicitly activate UTF-8 Mode with the
+command-line option ``-X utf8`` or by setting the environment variable
+``PYTHONUTF8=1``.
 
-This mode is disabled by default and enabled by the POSIX locale.
-Users can explicitly disable UTF-8 Mode with the command-line option ``-X utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``.
+This mode is disabled by default and enabled by the POSIX locale. Users
+can explicitly disable UTF-8 Mode with the command-line option ``-X
+utf8=0`` or by setting the environment variable ``PYTHONUTF8=0``.
 
 For standard streams, the ``PYTHONIOENCODING`` environment variable has
 priority over UTF-8 Mode.
@@ -142,14 +152,15 @@ Relationship with the locale coercion (PEP 538)
 ===============================================
 
 The POSIX locale enables the locale coercion (:pep:`538`) and the UTF-8
-mode (:pep:`540`). When the locale coercion is enabled, enabling the UTF-8
-mode has no additional effect.
+mode (:pep:`540`). When the locale coercion is enabled, enabling the
+UTF-8 mode has no additional effect.
 
 The UTF-8 Mode has the same effect as locale coercion:
 
 * ``sys.getfilesystemencoding()`` returns ``'UTF-8'``,
 * ``locale.getpreferredencoding()`` returns ``UTF-8``, and
-* the ``sys.stdin`` and ``sys.stdout`` error handlers are set to ``surrogateescape``.
+* the ``sys.stdin`` and ``sys.stdout`` error handlers are set to
+  ``surrogateescape``.
 
 These changes only affect Python code. But the locale coercion has
 addiditonal effects: the ``LC_CTYPE`` environment variable and the