PEP 540: Remove the strict mode

Victor Stinner 2017-12-08 01:43:42 +01:00
parent 22b31e0e82
commit 366c82f00c
1 changed files with 27 additions and 40 deletions

@ -14,13 +14,11 @@ Python-Version: 3.7
Abstract
========
Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding.
Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
This mode is enabled by default in the POSIX locale, but otherwise
disabled by default.
Add also a "strict" UTF-8 mode which uses the ``strict`` error handler,
instead of ``surrogateescape``, with the UTF-8 encoding.
The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable are added to control the UTF-8 mode.
@ -65,8 +63,9 @@ locale coercion is ineffective.
Passthrough undecodable bytes: surrogateescape
----------------------------------------------
When using the ``strict`` error handler, which is the default, Python 3
raises a ``UnicodeDecodeError`` on the first undecodable byte.
When decoding bytes from UTF-8 using the ``strict`` error handler, which
is the default, Python 3 raises a ``UnicodeDecodeError`` on the first
undecodable byte.
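A minimal sketch of the difference between the two error handlers, assuming a hypothetical file name encoded in Latin-1 (the byte string below is illustrative, not taken from the PEP). With ``strict``, decoding stops at the first undecodable byte; with ``surrogateescape``, that byte is mapped to a lone surrogate and can be restored when encoding back:

```python
# Hypothetical input: "café.txt" encoded as Latin-1, which is not valid UTF-8.
data = b"caf\xe9.txt"

# errors="strict" is the default: decoding fails on the first undecodable byte.
try:
    data.decode("utf-8")
    strict_failed = False
    failure_position = None
except UnicodeDecodeError as exc:
    strict_failed = True
    failure_position = exc.start  # index of the offending byte

# surrogateescape maps the byte 0xE9 to the lone surrogate U+DCE9.
text = data.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes unchanged.
roundtrip = text.encode("utf-8", errors="surrogateescape")
```

The round trip is what lets a program pass such bytes through untouched, the way ``cat`` or ``grep`` would.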
Unix command line tools like ``cat`` or ``grep`` and most Python 2
applications simply do not have this class of bugs: they don't decode
@ -88,13 +87,6 @@ mode. The ``open()`` default error handler remains ``strict`` for these
reasons.
Strict UTF-8 for correctness
----------------------------
When correctness matters more than usability, the ``strict`` error
handler is preferred over ``surrogateescape`` to raise an encoding error
at the first undecodable byte or unencodable character.
No change by default for best backward compatibility
----------------------------------------------------
@ -113,19 +105,14 @@ are responsible for any potential mojibake issues caused by this mode.
Proposal
========
Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding
with the ``surrogateescape`` error handler. This mode is enabled by
default in the POSIX locale, but otherwise disabled by default.
Add also a "strict" UTF-8 mode which uses the ``strict`` error handler,
instead of ``surrogateescape``, with the UTF-8 encoding.
Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
This mode is enabled by default in the POSIX locale, but otherwise
disabled by default.
The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable are added to control the UTF-8 mode:
* The UTF-8 mode is enabled by ``-X utf8`` or ``PYTHONUTF8=1``
* The Strict UTF-8 mode is configured by ``-X utf8=strict`` or
``PYTHONUTF8=strict``
variable are added. The UTF-8 mode is enabled by ``-X utf8`` or
``PYTHONUTF8=1``.
The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
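One way to check these switches is to start a child interpreter and read back its mode. This is a sketch only: it assumes Python 3.7 or newer and relies on ``sys.flags.utf8_mode``, an attribute CPython exposes but which this section does not describe; the helper name ``utf8_mode_flag`` is hypothetical.

```python
import os
import subprocess
import sys

# Child program: print the UTF-8 mode flag of the interpreter running it.
PROBE = "import sys; print(sys.flags.utf8_mode)"


def utf8_mode_flag(x_options=(), env_value=None):
    """Run a child Python and return its sys.flags.utf8_mode (hypothetical helper)."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if env_value is not None:
        env["PYTHONUTF8"] = env_value
    cmd = [sys.executable]
    for option in x_options:
        cmd += ["-X", option]
    cmd += ["-c", PROBE]
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


enabled_by_option = utf8_mode_flag(x_options=["utf8"])       # -X utf8
enabled_by_env = utf8_mode_flag(env_value="1")               # PYTHONUTF8=1
disabled_by_option = utf8_mode_flag(x_options=["utf8=0"])    # -X utf8=0
```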
@ -154,14 +141,14 @@ The UTF-8 mode changes the default encoding and error handler used by
Encoding and error handler
--------------------------
============================ ======================= ========================== ==========================
Function                     Default                 UTF-8 mode or POSIX locale Strict UTF-8 mode
============================ ======================= ========================== ==========================
open()                       locale/strict           **UTF-8**/strict           **UTF-8**/strict
os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape  **UTF-8**/surrogateescape
sys.stdin, sys.stdout        locale/strict           **UTF-8/surrogateescape**  **UTF-8**/strict
sys.stderr                   locale/backslashreplace **UTF-8**/backslashreplace **UTF-8**/backslashreplace
============================ ======================= ========================== ==========================
============================ ======================= ==========================
Function                     Default                 UTF-8 mode or POSIX locale
============================ ======================= ==========================
open()                       locale/strict           **UTF-8**/strict
os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape
sys.stdin, sys.stdout        locale/strict           **UTF-8/surrogateescape**
sys.stderr                   locale/backslashreplace **UTF-8**/backslashreplace
============================ ======================= ==========================
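The "UTF-8 mode" column can be spot-checked by asking a child interpreter, started with ``-X utf8``, to report its own settings. This is a rough sketch assuming Python 3.7 or newer; the ``PROBE`` string and the ``report`` layout are illustrative choices, and ``PYTHONIOENCODING`` is removed from the child environment because it would override the standard stream settings.

```python
import json
import os
import subprocess
import sys

# Child program: report encoding/error handler pairs as JSON.
PROBE = (
    "import json, sys; "
    "print(json.dumps({"
    "'stdout': [sys.stdout.encoding, sys.stdout.errors], "
    "'stderr': [sys.stderr.encoding, sys.stderr.errors], "
    "'fs': [sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()]"
    "}))"
)

env = dict(os.environ)
env.pop("PYTHONIOENCODING", None)  # would override stdio encoding/errors
env.pop("PYTHONUTF8", None)        # rely on the -X option only

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", PROBE],
    env=env, capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)

for name, (encoding, errors) in report.items():
    print(f"{name}: {encoding}/{errors}")
```

The filesystem error handler differs between platforms, so it is printed here rather than compared against a fixed value.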
By comparison, Python 3.6 uses:
@ -179,14 +166,14 @@ Encoding and error handler on Windows
On Windows, the encodings and error handlers are different:
============================ ======================= ========================== ========================== ==========================
Function                     Default                 Legacy Windows FS encoding UTF-8 mode                 Strict UTF-8 mode
============================ ======================= ========================== ========================== ==========================
open()                       mbcs/strict             mbcs/strict                **UTF-8**/strict           **UTF-8**/strict
os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass        UTF-8/surrogatepass
sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape      **UTF-8/strict**
sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace     UTF-8/backslashreplace
============================ ======================= ========================== ========================== ==========================
============================ ======================= ========================== ==========================
Function                     Default                 Legacy Windows FS encoding UTF-8 mode
============================ ======================= ========================== ==========================
open()                       mbcs/strict             mbcs/strict                **UTF-8**/strict
os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass
sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape
sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace
============================ ======================= ========================== ==========================
By comparison, Python 3.6 uses: