Update 540 for Windows

Describe encodings and error handlers used on Windows and the
priority of PYTHONLEGACYWINDOWSFSENCODING.
Victor Stinner 2017-01-12 13:26:21 +01:00
parent dc6b4a07f4
commit b9a2a993fe
1 changed files with 59 additions and 10 deletions

@ -291,9 +291,24 @@ with an error.
The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
The ``-X utf8`` has the priority over the ``PYTHONUTF8`` environment
variable. For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the
UTF-8 mode.
Options priority for the UTF-8 mode:
* ``PYTHONLEGACYWINDOWSFSENCODING``
* ``-X utf8``
* ``PYTHONUTF8``
* POSIX locale
For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode,
whereas ``LC_ALL=C python3.7 -X utf8=0`` disables the UTF-8 mode and so
uses the encoding of the POSIX locale.
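The priority order above can be sketched as a small pure function. This is only an illustration of the rule as written here, not the interpreter's actual startup code: the function name and its parameters are hypothetical, the strict variant of the mode is ignored, and any option value other than ``0`` is assumed to enable the mode.

```python
from typing import Optional


def utf8_mode_enabled(legacy_windows_fs: bool,
                      x_utf8: Optional[str],
                      env_pythonutf8: Optional[str],
                      posix_locale: bool) -> bool:
    """Hypothetical helper: apply the documented priority, highest first.

    x_utf8 and env_pythonutf8 are None when the option is not set.
    Assumption: any value other than "0" enables the mode.
    """
    # 1. PYTHONLEGACYWINDOWSFSENCODING disables the UTF-8 mode.
    if legacy_windows_fs:
        return False
    # 2. The -X utf8 command line option.
    if x_utf8 is not None:
        return x_utf8 != "0"
    # 3. The PYTHONUTF8 environment variable.
    if env_pythonutf8 is not None:
        return env_pythonutf8 != "0"
    # 4. The POSIX locale enables the mode by default.
    return posix_locale


# PYTHONUTF8=0 python3 -X utf8
print(utf8_mode_enabled(False, "1", "0", False))
# LC_ALL=C python3.7 -X utf8=0
print(utf8_mode_enabled(False, "0", None, True))
```

The two calls mirror the two command lines in the example: the first enables the mode because ``-X utf8`` outranks ``PYTHONUTF8``, the second disables it despite the POSIX locale.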
Encodings used by ``open()``, highest priority first:
* *encoding* and *errors* parameters (if set)
* UTF-8 mode
* os.device_encoding(fd)
* locale.getpreferredencoding(False)
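As a rough sketch of the lookup order listed above, the function below picks an encoding in the same sequence. The name ``pick_open_encoding`` and its parameters are hypothetical, the *errors* parameter is left out for brevity, and the preferred-encoding call is assumed to be the one in the ``locale`` module.

```python
import locale
import os
from typing import Optional


def pick_open_encoding(encoding: Optional[str],
                       utf8_mode: bool,
                       fd: Optional[int] = None) -> str:
    """Hypothetical helper: choose an encoding, highest priority first."""
    # 1. An explicit *encoding* parameter always wins.
    if encoding is not None:
        return encoding
    # 2. The UTF-8 mode forces UTF-8.
    if utf8_mode:
        return "utf-8"
    # 3. Encoding of the device behind the file descriptor, if it is
    #    a terminal (os.device_encoding() returns None otherwise).
    if fd is not None:
        device = os.device_encoding(fd)
        if device is not None:
            return device
    # 4. Fall back to the locale encoding.
    return locale.getpreferredencoding(False)


print(pick_open_encoding("latin1", utf8_mode=True))
print(pick_open_encoding(None, utf8_mode=True))
```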
Encoding and error handler
--------------------------
@ -303,7 +318,7 @@ open(), os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and
sys.stderr:
============================  =======================  ==========================  ==========================
Function                      Default                  UTF-8 or POSIX locale       UTF-8 Strict
Function                      Default                  UTF-8 mode or POSIX locale  UTF-8 Strict mode
============================  =======================  ==========================  ==========================
open()                        locale/strict            **UTF-8/surrogateescape**   **UTF-8**/strict
os.fsdecode(), os.fsencode()  locale/surrogateescape   **UTF-8**/surrogateescape   **UTF-8**/surrogateescape
@ -326,16 +341,50 @@ The UTF-8 mode uses the ``surrogateescape`` error handler instead of the
strict mode for convenience: the idea is that data not encoded to UTF-8
are passed through "Python" without being modified, as raw bytes.
The ``PYTHONIOENCODING`` environment variable has the priority on the
The ``PYTHONIOENCODING`` environment variable has the priority over the
UTF-8 mode for standard streams. For example, ``PYTHONIOENCODING=latin1
python3 -X utf8`` uses the Latin1 encoding for stdin, stdout and stderr.
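One way to observe this priority is to start a child interpreter with both settings and ask it which encoding ``sys.stdout`` uses. The sketch below assumes a released CPython (3.7 or newer) that implements ``-X utf8``; its behaviour may differ in details from this draft, and the helper name is hypothetical.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env, interpreter_args):
    """Hypothetical helper: normalized sys.stdout encoding of a child Python."""
    env = dict(os.environ, **extra_env)
    code = "import sys; print(sys.stdout.encoding)"
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", code],
        env=env, stdout=subprocess.PIPE, check=True,
    )
    reported = result.stdout.decode("ascii").strip()
    # Normalize aliases such as "latin1" or "latin-1" to one codec name.
    return codecs.lookup(reported).name


# Equivalent of: PYTHONIOENCODING=latin1 python3 -X utf8
encoding_name = child_stdout_encoding({"PYTHONIOENCODING": "latin1"},
                                      ["-X", "utf8"])
print(encoding_name)
```

The name is passed through ``codecs.lookup()`` so that the comparison does not depend on which alias of Latin1 the interpreter reports.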
Encodings used by ``open()``, highest priority first:
Encoding and error handler on Windows
-------------------------------------
On Windows, the encodings and error handlers are different:
============================  =======================  ==========================  ==========================  ==========================
Function                      Default                  Legacy Windows FS encoding  UTF-8 mode                  UTF-8 Strict mode
============================  =======================  ==========================  ==========================  ==========================
open()                        mbcs/strict              mbcs/strict                 **UTF-8/surrogateescape**   **UTF-8**/strict
os.fsdecode(), os.fsencode()  UTF-8/surrogatepass      **mbcs/replace**            UTF-8/surrogatepass         UTF-8/surrogatepass
sys.stdin, sys.stdout         UTF-8/surrogateescape    UTF-8/surrogateescape       UTF-8/surrogateescape       **UTF-8/strict**
sys.stderr                    UTF-8/backslashreplace   UTF-8/backslashreplace      UTF-8/backslashreplace      UTF-8/backslashreplace
============================  =======================  ==========================  ==========================  ==========================
By comparison, Python 3.6 uses:
============================  =======================  ==========================
Function                      Default                  Legacy Windows FS encoding
============================  =======================  ==========================
open()                        mbcs/strict              mbcs/strict
os.fsdecode(), os.fsencode()  UTF-8/surrogatepass      **mbcs/replace**
sys.stdin, sys.stdout         UTF-8/surrogateescape    UTF-8/surrogateescape
sys.stderr                    UTF-8/backslashreplace   UTF-8/backslashreplace
============================  =======================  ==========================
The "Legacy Windows FS encoding" is enabled by setting the
``PYTHONLEGACYWINDOWSFSENCODING`` environment variable to ``1``; see
PEP 529.
Enabling the legacy Windows filesystem encoding disables the UTF-8 mode
(like ``-X utf8=0``).
If stdin and/or stdout is redirected to a pipe, sys.stdin and/or
sys.stdout use the ``mbcs`` encoding by default, rather than UTF-8. But
with the UTF-8 mode, sys.stdin and sys.stdout always use the UTF-8
encoding.
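The pipe rule above reduces to a two-input decision, sketched below. The function and parameter names are hypothetical, and the sketch deliberately ignores ``PYTHONIOENCODING`` and error handlers; it only restates the paragraph, it is not the interpreter's implementation.

```python
def windows_stdio_encoding(redirected_to_pipe: bool, utf8_mode: bool) -> str:
    """Hypothetical helper: encoding of sys.stdin or sys.stdout on Windows."""
    # The UTF-8 mode always selects UTF-8, even for pipes.
    if utf8_mode:
        return "utf-8"
    # Without the UTF-8 mode, a stream redirected to a pipe uses mbcs.
    if redirected_to_pipe:
        return "mbcs"
    # Otherwise the default for these streams is UTF-8.
    return "utf-8"


print(windows_stdio_encoding(redirected_to_pipe=True, utf8_mode=False))
print(windows_stdio_encoding(redirected_to_pipe=True, utf8_mode=True))
```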
There is no POSIX locale on Windows. The ANSI code page is used as the
locale encoding, and this code page never uses the ASCII encoding.
* *encoding* and *errors* parameters (if set)
* UTF-8 mode
* os.device_encoding(fd)
* locale.getpreferredencoding(False)
Rationale
---------