PEP 540: Remove the strict mode
parent 22b31e0e82
commit 366c82f00c

pep-0540.txt (67 changed lines)
@@ -14,13 +14,11 @@ Python-Version: 3.7
 Abstract
 ========
 
-Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding.
+Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
+change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
 This mode is enabled by default in the POSIX locale, but otherwise
 disabled by default.
 
-Add also a "strict" UTF-8 mode which uses the ``strict`` error handler,
-instead of ``surrogateescape``, with the UTF-8 encoding.
-
 The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
 variable are added to control the UTF-8 mode.
 
@@ -65,8 +63,9 @@ locale coercion is ineffective.
 Passthough undecodable bytes: surrogateescape
 ---------------------------------------------
 
-When using the ``strict`` error handler, which is the default, Python 3
-raises a ``UnicodeDecodeError`` on the first undecodable byte.
+When decoding bytes from UTF-8 using the ``strict`` error handler, which
+is the default, Python 3 raises a ``UnicodeDecodeError`` on the first
+undecodable byte.
 
 Unix command line tools like ``cat`` or ``grep`` and most Python 2
 applications simply do not have this class of bugs: they don't decode
@@ -88,13 +87,6 @@ mode. The ``open()`` default error handler remains ``strict`` for these
 reasons.
 
 
-Strict UTF-8 for correctness
-----------------------------
-
-When correctness matters more than usability, the ``strict`` error
-handler is preferred over ``surrogateescape`` to raise an encoding error
-at the first undecodable byte or unencodable character.
-
 No change by default for best backward compatibility
 ----------------------------------------------------
 
@@ -113,19 +105,14 @@ are responsible for any potential mojibake issues caused by this mode.
 Proposal
 ========
 
-Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding
-with the ``surrogateescape`` error handler. This mode is enabled by
-default in the POSIX locale, but otherwise disabled by default.
-
-Add also a "strict" UTF-8 mode which uses the ``strict`` error handler,
-instead of ``surrogateescape``, with the UTF-8 encoding.
+Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
+change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
+This mode is enabled by default in the POSIX locale, but otherwise
+disabled by default.
 
 The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
-variable are added to control the UTF-8 mode:
-
-* The UTF-8 mode is enabled by ``-X utf8`` or ``PYTHONUTF8=1``
-* The Strict UTF-8 mode is configured by ``-X utf8=strict`` or
-  ``PYTHONUTF8=strict``
+variable are added. The UTF-8 mode is enabled by ``-X utf8`` or
+``PYTHONUTF8=1``.
 
 The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
 can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
@@ -154,14 +141,14 @@ The UTF-8 mode changes the default encoding and error handler used by
 Encoding and error handler
 --------------------------
 
-============================ ======================= ========================== ==========================
-Function                     Default                 UTF-8 mode or POSIX locale Strict UTF-8 mode
-============================ ======================= ========================== ==========================
-open()                       locale/strict           **UTF-8**/strict           **UTF-8**/strict
-os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape  **UTF-8**/surrogateescape
-sys.stdin, sys.stdout        locale/strict           **UTF-8/surrogateescape**  **UTF-8**/strict
-sys.stderr                   locale/backslashreplace **UTF-8**/backslashreplace **UTF-8**/backslashreplace
-============================ ======================= ========================== ==========================
+============================ ======================= ==========================
+Function                     Default                 UTF-8 mode or POSIX locale
+============================ ======================= ==========================
+open()                       locale/strict           **UTF-8**/strict
+os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape
+sys.stdin, sys.stdout        locale/strict           **UTF-8/surrogateescape**
+sys.stderr                   locale/backslashreplace **UTF-8**/backslashreplace
+============================ ======================= ==========================
 
 By comparison, Python 3.6 uses:
 
@@ -179,14 +166,14 @@ Encoding and error handler on Windows
 
 On Windows, the encodings and error handlers are different:
 
-============================ ======================= ========================== ========================== ==========================
-Function                     Default                 Legacy Windows FS encoding UTF-8 mode                 Strict UTF-8 mode
-============================ ======================= ========================== ========================== ==========================
-open()                       mbcs/strict             mbcs/strict                **UTF-8**/strict           **UTF-8**/strict
-os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass        UTF-8/surrogatepass
-sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape      **UTF-8/strict**
-sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace     UTF-8/backslashreplace
-============================ ======================= ========================== ========================== ==========================
+============================ ======================= ========================== ==========================
+Function                     Default                 Legacy Windows FS encoding UTF-8 mode
+============================ ======================= ========================== ==========================
+open()                       mbcs/strict             mbcs/strict                **UTF-8**/strict
+os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass
+sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape
+sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace
+============================ ======================= ========================== ==========================
 
 By comparison, Python 3.6 uses:
 
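
Note on the change: the practical difference between the removed ``strict`` variant and the remaining ``surrogateescape`` behaviour can be sketched in a few lines of Python. This is only an illustration, not part of the commit. The byte string ``RAW`` and the helper names are hypothetical, and ``probe_utf8_mode()`` assumes a released CPython 3.7 or later, where ``sys.flags.utf8_mode`` exists; the draft text in this diff does not mention that flag.

```python
import subprocess
import sys

# Hypothetical sample: a Latin-1 encoded name; 0xE9 is not valid UTF-8 here.
RAW = b"caf\xe9.txt"


def decode_strict(data: bytes) -> str:
    """Decode with the default ``strict`` handler; raises on undecodable bytes."""
    return data.decode("utf-8")


def roundtrip_surrogateescape(data: bytes) -> bytes:
    """Decode and re-encode with ``surrogateescape``; bad bytes pass through."""
    text = data.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


def probe_utf8_mode(x_option: str) -> str:
    """Start a child interpreter with ``-X <x_option>`` and return its
    ``sys.flags.utf8_mode`` value as text (assumes CPython 3.7+)."""
    code = "import sys; print(sys.flags.utf8_mode)"
    result = subprocess.run(
        [sys.executable, "-X", x_option, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With these helpers, ``decode_strict(RAW)`` raises ``UnicodeDecodeError``, while ``roundtrip_surrogateescape(RAW)`` returns the original bytes unchanged. ``probe_utf8_mode("utf8")`` and ``probe_utf8_mode("utf8=0")`` mirror the ``-X utf8`` and ``-X utf8=0`` options described in the Proposal hunk.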