Fix docutils markup errors.

Guido van Rossum 2016-09-01 22:13:33 -07:00
parent 5b6a8241de
commit ea846e4ba6
1 changed files with 9 additions and 9 deletions

@@ -53,14 +53,14 @@ platform APIs for accessing the filesystem all accept and return text encoded in
this format. However, prior to Windows NT (and possibly further back), the
native format was a configurable machine option and a separate set of APIs
existed to accept this format. The option (the "active code page") and these
-APIs (the "*A functions") still exist in recent versions of Windows for
+APIs (the "\*A functions") still exist in recent versions of Windows for
backwards compatibility, though new functionality often only has a utf-16-le API
-(the "*W functions").
+(the "\*W functions").
In Python, str is recommended because it can correctly round-trip all characters
used in paths (on POSIX with surrogateescape handling; on Windows because str
maps to the native representation). On Windows bytes cannot round-trip all
-characters used in paths, as Python internally uses the *A functions and hence
+characters used in paths, as Python internally uses the \*A functions and hence
the encoding is "whatever the active code page is". Since the active code page
cannot represent all Unicode characters, the conversion of a path into bytes can
lose information without warning or any available indication.
@@ -104,20 +104,20 @@ Proposal
Currently the default filesystem encoding is 'mbcs', which is a meta-encoder
that uses the active code page. However, when bytes are passed to the filesystem
-they go through the *A APIs and the operating system handles encoding. In this
+they go through the \*A APIs and the operating system handles encoding. In this
case, paths are always encoded using the equivalent of 'mbcs:replace' - we have
no ability to change this (though there is a user/machine configuration option
to change the encoding from CP_ACP to CP_OEM, so it won't necessarily always
match mbcs...)
-This proposal would remove all use of the *A APIs and only ever call the *W
+This proposal would remove all use of the \*A APIs and only ever call the \*W
APIs. When Windows returns paths to Python as str, they will be decoded from
utf-16-le and returned as text (in whatever the minimal representation is). When
Windows returns paths to Python as bytes, they will be decoded from utf-16-le to
utf-8 using surrogatepass (Windows does not validate surrogate pairs, so it is
possible to have invalid surrogates in filenames). Equally, when paths are
provided as bytes, they are decoded from utf-8 into utf-16-le and passed to the
-*W APIs.
+\*W APIs.
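Reviewer note: the surrogatepass behaviour described in the paragraph above can be sketched with the standard codecs alone. This is only an illustration of the error handler, not of the actual ``path_converter`` code, and the filename below is a made-up example.

```python
# A filename containing a lone (unpaired) surrogate, which Windows does not
# reject. The name itself is hypothetical.
name = "report_\ud800.txt"

# Strict utf-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogatepass the surrogate is written as a 3-byte sequence and can be
# decoded back to the identical str.
as_bytes = name.encode("utf-8", "surrogatepass")
round_tripped = as_bytes.decode("utf-8", "surrogatepass")
```

Under these assumptions the bytes form preserves the invalid surrogate, which is the property the proposal relies on for bytes paths.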
The use of utf-8 will not be configurable, with the possible exception of a
"legacy mode" environment variable or X-flag.
@@ -166,7 +166,7 @@ Remove unused ANSI code
Remove all code paths using the ``narrow`` field, as these will no longer be
reachable by any caller. These are only used within ``posixmodule.c``. Other
uses of paths should have use of bytes paths replaced with decoding and use of
-the *W APIs.
+the \*W APIs.
Add legacy mode
---------------
@@ -175,7 +175,7 @@ Add a legacy mode flag, enabled by the environment variable
``PYTHONLEGACYWINDOWSFSENCODING``. When this flag is set, the default filesystem
encoding is set to mbcs rather than utf-8, and the error mode is set to
'replace' rather than 'strict'. The ``path_converter`` will continue to decode
-to wide characters and only *W APIs will be called, however, the bytes passed in
+to wide characters and only \*W APIs will be called, however, the bytes passed in
and received from Python will be encoded the same as prior to this change.
Undeprecate bytes paths on Windows
@@ -197,7 +197,7 @@ This is essentially the same as the proposed change, but instead of changing
``sys.getfilesystemencoding()`` to utf-8 it is changed to mbcs (which
dynamically maps to the active code page).
-This approach allows the use of new functionality that is only available as *W
+This approach allows the use of new functionality that is only available as \*W
APIs and also detection of encoding/decoding errors. For example, rather than
silently replacing Unicode characters with '?', it would be possible to warn or
fail the operation.
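Reviewer note: the difference between silently replacing characters and failing the operation can be sketched as follows. The ``cp1252`` codec stands in for "whatever the active code page is", the path is a made-up example, and ``encode_or_fail`` is a hypothetical helper, not anything in CPython. The real \*A APIs may also apply best-fit mappings that this sketch does not model.

```python
# Hypothetical path containing characters outside cp1252.
path = "data/\u65e5\u672c\u8a9e.txt"

# 'replace' substitutes '?' for each unencodable character, without warning.
lossy = path.encode("cp1252", "replace")
restored = lossy.decode("cp1252")  # no longer equal to the original path


def encode_or_fail(p, encoding="cp1252"):
    """Encode a path strictly, raising instead of losing information."""
    try:
        return p.encode(encoding, "strict")
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"path {p!r} cannot be represented in {encoding}"
        ) from exc
```

With strict handling the caller gets an error it can report or turn into a warning, rather than a path that silently names a different file.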