PEP 540: open() error handler remains strict
parent 3b263c0ae1
commit 22b31e0e82
30
pep-0540.txt
@@ -14,9 +14,9 @@ Python-Version: 3.7
 Abstract
 ========
 
-Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding
-with the ``surrogateescape`` error handler. This mode is enabled by
-default in the POSIX locale, but otherwise disabled by default.
+Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding.
+This mode is enabled by default in the POSIX locale, but otherwise
+disabled by default.
 
 Add also a "strict" UTF-8 mode which uses the ``strict`` error handler,
 instead of ``surrogateescape``, with the UTF-8 encoding.
@@ -65,10 +65,8 @@ locale coercion is ineffective.
 Passthough undecodable bytes: surrogateescape
 ---------------------------------------------
 
-Using UTF-8 is nice, until you read the first file encoded to a
-different encoding. When using the ``strict`` error handler, which is
-the default, Python 3 raises a ``UnicodeDecodeError`` on the first
-undecodable byte.
+When using the ``strict`` error handler, which is the default, Python 3
+raises a ``UnicodeDecodeError`` on the first undecodable byte.
 
 Unix command line tools like ``cat`` or ``grep`` and most Python 2
 applications simply do not have this class of bugs: they don't decode
@@ -79,12 +77,16 @@ the ``surrogateescape`` error handler (:pep:`383`). It allows to process
 data "as bytes" but uses Unicode in practice (undecodable bytes are
 stored as surrogate characters).
 
-For an application written as a Unix "pipe" tool like ``grep``, taking
-input on stdin and writing output to stdout, ``surrogateescape`` allows
-to "passthrough" undecodable bytes.
+The UTF-8 mode uses the ``surrogateescape`` error handler for ``stdin``
+and ``stdout`` since these streams as commonly associated to Unix
+command line tools.
+
+However, users have a different expectation on files. Files are expected
+to be properly encoded. Python is expected to fail early when ``open()``
+is called with the wrong options, like opening a JPEG picture in text
+mode. The ``open()`` default error handler remains ``strict`` for these
+reasons.
 
-The UTF-8 encoding used with the ``surrogateescape`` error handler is a
-compromise between correctness and usability.
 
 Strict UTF-8 for correctness
 ----------------------------
@@ -155,7 +157,7 @@ Encoding and error handler
 ============================  =======================  ==========================  ==========================
 Function                      Default                  UTF-8 mode or POSIX locale  Strict UTF-8 mode
 ============================  =======================  ==========================  ==========================
-open()                        locale/strict            **UTF-8/surrogateescape**   **UTF-8**/strict
+open()                        locale/strict            **UTF-8**/strict            **UTF-8**/strict
 os.fsdecode(), os.fsencode()  locale/surrogateescape   **UTF-8**/surrogateescape   **UTF-8**/surrogateescape
 sys.stdin, sys.stdout         locale/strict            **UTF-8/surrogateescape**   **UTF-8**/strict
 sys.stderr                    locale/backslashreplace  **UTF-8**/backslashreplace  **UTF-8**/backslashreplace
@@ -180,7 +182,7 @@ On Windows, the encodings and error handlers are different:
 ============================  =======================  ==========================  ==========================  ==========================
 Function                      Default                  Legacy Windows FS encoding  UTF-8 mode                  Strict UTF-8 mode
 ============================  =======================  ==========================  ==========================  ==========================
-open()                        mbcs/strict              mbcs/strict                 **UTF-8/surrogateescape**   **UTF-8**/strict
+open()                        mbcs/strict              mbcs/strict                 **UTF-8**/strict            **UTF-8**/strict
 os.fsdecode(), os.fsencode()  UTF-8/surrogatepass      **mbcs/replace**            UTF-8/surrogatepass         UTF-8/surrogatepass
 sys.stdin, sys.stdout         UTF-8/surrogateescape    UTF-8/surrogateescape       UTF-8/surrogateescape       **UTF-8/strict**
 sys.stderr                    UTF-8/backslashreplace   UTF-8/backslashreplace      UTF-8/backslashreplace      UTF-8/backslashreplace
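Reader's note, not part of the commit: the difference between the two error handlers that this change separates (``strict`` for ``open()``, ``surrogateescape`` for ``stdin``/``stdout``) can be sketched with the existing codec machinery. The sample bytes, the temporary file name, and the variable names below are illustrative assumptions; the snippet passes ``encoding`` and ``errors`` explicitly and does not depend on the proposed UTF-8 mode being enabled.

```python
import os
import tempfile

# Latin-1 encoded text: the byte 0xE9 is not valid UTF-8 on its own.
raw = b"caf\xe9\n"

# strict (the default handler): decoding fails on the first undecodable byte.
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# surrogateescape (PEP 383): the undecodable byte 0xE9 is stored as the lone
# surrogate U+DCE9, and encoding with the same handler restores the byte.
text = raw.decode("utf-8", errors="surrogateescape")
roundtrip = text.encode("utf-8", errors="surrogateescape")

# open() in text mode: without an explicit errors= argument the handler is
# strict, so reading a badly encoded file fails early.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latin1.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(raw)

    open_failed = False
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError:
        open_failed = True

    # Opting in to surrogateescape explicitly passes the bytes through.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        passthrough = f.read()
```

Under the tables above, the first ``open()`` call corresponds to the behaviour that stays ``strict`` in UTF-8 mode, while the explicit ``errors="surrogateescape"`` call mirrors what ``sys.stdin`` and ``sys.stdout`` would do in that mode.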