Update 540 for Windows
Describe encodings and error handlers used on Windows and the priority of PYTHONLEGACYWINDOWSFSENCODING.
parent dc6b4a07f4
commit b9a2a993fe
69
pep-0540.txt
@ -291,9 +291,24 @@ with an error.
The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.

Options priority for the UTF-8 mode, highest priority first:

* ``PYTHONLEGACYWINDOWSFSENCODING``
* ``-X utf8``
* ``PYTHONUTF8``
* POSIX locale

For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode,
whereas ``LC_ALL=C python3.7 -X utf8=0`` disables the UTF-8 mode, so
Python uses the encoding of the POSIX locale.
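The priority list can be read as a sequence of checks where the first match wins. The Python sketch below models it. The function ``utf8_mode_enabled()`` and its parameters are hypothetical and do not exist in CPython. ``xoptions`` only mimics the shape of ``sys._xoptions``, where ``-X utf8`` maps to ``True`` and ``-X utf8=0`` maps to ``"0"``.

```python
from typing import Mapping, Union


def utf8_mode_enabled(
    xoptions: Mapping[str, Union[bool, str]],
    env: Mapping[str, str],
    posix_locale: bool,
) -> bool:
    """Hypothetical model of the UTF-8 mode priority list, highest first."""
    # 1. PYTHONLEGACYWINDOWSFSENCODING (Windows, PEP 529) disables the UTF-8 mode.
    #    Assumption: any non-empty value counts as "set".
    if env.get("PYTHONLEGACYWINDOWSFSENCODING"):
        return False
    # 2. The -X utf8 command line option: "-X utf8=0" disables, otherwise enables.
    if "utf8" in xoptions:
        return xoptions["utf8"] != "0"
    # 3. The PYTHONUTF8 environment variable (assumption: "0" disables,
    #    any other value enables).
    if "PYTHONUTF8" in env:
        return env["PYTHONUTF8"] != "0"
    # 4. Otherwise, only the POSIX locale enables the UTF-8 mode.
    return posix_locale


# PYTHONUTF8=0 python3 -X utf8: the command line option wins.
print(utf8_mode_enabled({"utf8": True}, {"PYTHONUTF8": "0"}, posix_locale=False))  # True
# LC_ALL=C python3.7 -X utf8=0: the option overrides the POSIX locale.
print(utf8_mode_enabled({"utf8": "0"}, {}, posix_locale=True))  # False
```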

Encodings used by ``open()``, highest priority first:

* *encoding* and *errors* parameters (if set)
* UTF-8 mode
* ``os.device_encoding(fd)``
* ``locale.getpreferredencoding(False)``
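Below is a minimal sketch of this selection order. It assumes the list is applied top to bottom and uses the error handlers from the "Encoding and error handler" table of this PEP. ``open_encoding()`` is a hypothetical helper and is not part of the ``io`` module. ``os.device_encoding()`` and ``locale.getpreferredencoding()`` are the real standard library functions.

```python
import locale
import os
from typing import Optional, Tuple


def open_encoding(
    fd: int,
    utf8_mode: bool,
    encoding: Optional[str] = None,
    errors: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical model: the (encoding, errors) pair open() would pick."""
    # 1. Explicit encoding/errors parameters always win.
    if encoding is not None:
        return encoding, errors or "strict"
    # 2. UTF-8 mode: UTF-8 with surrogateescape, as in this draft's table.
    if utf8_mode:
        return "UTF-8", errors or "surrogateescape"
    # 3. Terminal encoding; os.device_encoding() returns None if fd is not a TTY.
    device = os.device_encoding(fd)
    if device is not None:
        return device, errors or "strict"
    # 4. Fall back to the locale encoding.
    return locale.getpreferredencoding(False), errors or "strict"


print(open_encoding(0, utf8_mode=True))  # ('UTF-8', 'surrogateescape')
print(open_encoding(0, utf8_mode=False, encoding="latin1"))  # ('latin1', 'strict')
```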

Encoding and error handler
--------------------------
@ -303,7 +318,7 @@ open(), os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and
sys.stderr:

============================ ======================= ========================== ==========================
Function                     Default                 UTF-8 mode or POSIX locale UTF-8 Strict mode
============================ ======================= ========================== ==========================
open()                       locale/strict           **UTF-8/surrogateescape**  **UTF-8**/strict
os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape  **UTF-8**/surrogateescape
@ -326,16 +341,50 @@ The UTF-8 mode uses the ``surrogateescape`` error handler instead of the
strict mode for convenience: the idea is that data not encoded to UTF-8
are passed through "Python" without being modified, as raw bytes.
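The pass-through behaviour can be checked with the standard codecs alone. When decoding, ``surrogateescape`` maps each undecodable byte to a lone surrogate. When encoding, it maps that surrogate back to the same byte. ``roundtrip()`` is an illustrative helper.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode then re-encode with surrogateescape; undecodable bytes survive."""
    # Each byte that is not valid UTF-8 (0xNN) becomes the lone surrogate
    # U+DCNN instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", "surrogateescape")
    # Encoding with the same error handler restores the original byte.
    return text.encode("utf-8", "surrogateescape")


raw = b"caf\xe9"  # Latin-1 bytes: 0xE9 alone is not valid UTF-8
print(ascii(raw.decode("utf-8", "surrogateescape")))  # 'caf\udce9'
print(roundtrip(raw) == raw)  # True
```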

The ``PYTHONIOENCODING`` environment variable takes precedence over the
UTF-8 mode for standard streams. For example, ``PYTHONIOENCODING=latin1
python3 -X utf8`` uses the Latin1 encoding for stdin, stdout and stderr.
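This precedence can be observed by running a child interpreter. The sketch assumes Python 3.7 or newer, where ``-X utf8`` is implemented. ``child_stdout_encoding()`` is a hypothetical helper. Because the exact spelling of the reported encoding name may vary between versions, the output is only printed here.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env, *args):
    """Run a child interpreter with *args* and return its sys.stdout.encoding."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # ignore any value inherited from the caller
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *args, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# PYTHONIOENCODING=latin1 python3 -X utf8
print(child_stdout_encoding({"PYTHONIOENCODING": "latin1"}, "-X", "utf8"))
# python3 -X utf8 without PYTHONIOENCODING: the UTF-8 mode applies
print(child_stdout_encoding({}, "-X", "utf8"))
```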

Encoding and error handler on Windows
-------------------------------------

On Windows, the encodings and error handlers are different:

============================ ======================= ========================== ========================== ==========================
Function                     Default                 Legacy Windows FS encoding UTF-8 mode                 UTF-8 Strict mode
============================ ======================= ========================== ========================== ==========================
open()                       mbcs/strict             mbcs/strict                **UTF-8/surrogateescape**  **UTF-8**/strict
os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass        UTF-8/surrogatepass
sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape      **UTF-8/strict**
sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace     UTF-8/backslashreplace
============================ ======================= ========================== ========================== ==========================

By comparison, Python 3.6 uses:

============================ ======================= ==========================
Function                     Default                 Legacy Windows FS encoding
============================ ======================= ==========================
open()                       mbcs/strict             mbcs/strict
os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**
sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape
sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace
============================ ======================= ==========================
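To see which column of these tables applies to a running interpreter, you can print the encodings and error handlers Python actually picked. This sketch uses only existing ``sys`` functions. ``sys.flags.utf8_mode`` only exists on Python 3.7 and newer, hence the ``getattr()`` fallback. ``stream_settings()`` is an illustrative name.

```python
import sys


def stream_settings() -> dict:
    """Collect the settings that correspond to the rows of the tables above."""
    return {
        "os.fsdecode(), os.fsencode()": (
            sys.getfilesystemencoding(),
            sys.getfilesystemencodeerrors(),  # available since Python 3.6
        ),
        "sys.stdout": (getattr(sys.stdout, "encoding", None),
                       getattr(sys.stdout, "errors", None)),
        "sys.stderr": (getattr(sys.stderr, "encoding", None),
                       getattr(sys.stderr, "errors", None)),
        # sys.flags.utf8_mode only exists on Python 3.7 and newer.
        "UTF-8 mode": getattr(sys.flags, "utf8_mode", None),
    }


for row, value in stream_settings().items():
    print(row, value)
```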

The "Legacy Windows FS encoding" is enabled by setting the
``PYTHONLEGACYWINDOWSFSENCODING`` environment variable to ``1`` (see
PEP 529).

Enabling the legacy Windows filesystem encoding disables the UTF-8 mode
(as if ``-X utf8=0`` were used).

If stdin and/or stdout is redirected to a pipe, ``sys.stdin`` and/or
``sys.stdout`` use the ``mbcs`` encoding by default, rather than UTF-8.
In the UTF-8 mode, however, ``sys.stdin`` and ``sys.stdout`` always use
the UTF-8 encoding (unless ``PYTHONIOENCODING`` is set).
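The standard stream rules described so far can be summarized by a hypothetical helper: ``PYTHONIOENCODING`` first, then the UTF-8 mode, then console versus pipe. It models this section and is not CPython code. ``"mbcs"`` stands for the ANSI code page. The console case returns UTF-8, as in the "Default" column of the Windows table.

```python
from typing import Optional


def windows_stdio_encoding(
    is_console: bool,
    utf8_mode: bool,
    pythonioencoding: Optional[str] = None,
) -> str:
    """Hypothetical model of the sys.stdin/sys.stdout encoding on Windows."""
    # PYTHONIOENCODING ("encoding[:errors]") takes precedence over the UTF-8 mode;
    # a value like ":surrogateescape" sets only the error handler.
    if pythonioencoding:
        encoding = pythonioencoding.partition(":")[0]
        if encoding:
            return encoding
    # The UTF-8 mode uses UTF-8 even when the stream is redirected to a pipe.
    if utf8_mode:
        return "UTF-8"
    # Otherwise the console uses UTF-8, but a pipe falls back to the ANSI code page.
    return "UTF-8" if is_console else "mbcs"


print(windows_stdio_encoding(is_console=False, utf8_mode=False))  # mbcs
print(windows_stdio_encoding(is_console=False, utf8_mode=True))   # UTF-8
```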

There is no POSIX locale on Windows. The ANSI code page is used as the
locale encoding, and this code page never uses the ASCII encoding.

Rationale
---------