Fix docutils markup errors.
parent 5b6a8241de
commit ea846e4ba6
18
pep-0529.txt
@@ -53,14 +53,14 @@ platform APIs for accessing the filesystem all accept and return text encoded in
 this format. However, prior to Windows NT (and possibly further back), the
 native format was a configurable machine option and a separate set of APIs
 existed to accept this format. The option (the "active code page") and these
-APIs (the "*A functions") still exist in recent versions of Windows for
+APIs (the "\*A functions") still exist in recent versions of Windows for
 backwards compatibility, though new functionality often only has a utf-16-le API
-(the "*W functions").
+(the "\*W functions").
 
 In Python, str is recommended because it can correctly round-trip all characters
 used in paths (on POSIX with surrogateescape handling; on Windows because str
 maps to the native representation). On Windows bytes cannot round-trip all
-characters used in paths, as Python internally uses the *A functions and hence
+characters used in paths, as Python internally uses the \*A functions and hence
 the encoding is "whatever the active code page is". Since the active code page
 cannot represent all Unicode characters, the conversion of a path into bytes can
 lose information without warning or any available indication.
@@ -104,20 +104,20 @@ Proposal
 
 Currently the default filesystem encoding is 'mbcs', which is a meta-encoder
 that uses the active code page. However, when bytes are passed to the filesystem
-they go through the *A APIs and the operating system handles encoding. In this
+they go through the \*A APIs and the operating system handles encoding. In this
 case, paths are always encoded using the equivalent of 'mbcs:replace' - we have
 no ability to change this (though there is a user/machine configuration option
 to change the encoding from CP_ACP to CP_OEM, so it won't necessarily always
 match mbcs...)
 
-This proposal would remove all use of the *A APIs and only ever call the *W
+This proposal would remove all use of the \*A APIs and only ever call the \*W
 APIs. When Windows returns paths to Python as str, they will be decoded from
 utf-16-le and returned as text (in whatever the minimal representation is). When
 Windows returns paths to Python as bytes, they will be decoded from utf-16-le to
 utf-8 using surrogatepass (Windows does not validate surrogate pairs, so it is
 possible to have invalid surrogates in filenames). Equally, when paths are
 provided as bytes, they are decoded from utf-8 into utf-16-le and passed to the
-*W APIs.
+\*W APIs.
 
 The use of utf-8 will not be configurable, with the possible exception of a
 "legacy mode" environment variable or X-flag.
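Note on the hunk above: the utf-16-le to utf-8 conversion with surrogatepass that the Proposal text describes can be sketched in pure Python. This is only an illustration of the codec behaviour, not the C implementation in CPython; the helper names ``wide_to_bytes_path`` and ``bytes_path_to_wide`` and the sample file name are hypothetical.

```python
# Illustrative sketch only. The helper names and the sample name are
# hypothetical; CPython performs the equivalent conversion in C.

def wide_to_bytes_path(wide: bytes) -> bytes:
    """Convert utf-16-le data (as a *W API would return it) to a utf-8 bytes path."""
    text = wide.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


def bytes_path_to_wide(path: bytes) -> bytes:
    """Convert a utf-8 bytes path back to utf-16-le for a *W API."""
    text = path.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


# "a", a lone high surrogate (U+D800), "b": not valid UTF-16, but Windows
# does not validate surrogate pairs, so such a name can exist on disk.
wide_name = "a".encode("utf-16-le") + b"\x00\xd8" + "b".encode("utf-16-le")

utf8_name = wide_to_bytes_path(wide_name)
round_tripped = bytes_path_to_wide(utf8_name)

# The default strict handler would refuse the same input.
try:
    wide_name.decode("utf-16-le")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

With surrogatepass the lone surrogate survives both directions, so ``round_tripped`` equals ``wide_name``, whereas a strict decode of the same data raises ``UnicodeDecodeError``.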
@@ -166,7 +166,7 @@ Remove unused ANSI code
 Remove all code paths using the ``narrow`` field, as these will no longer be
 reachable by any caller. These are only used within ``posixmodule.c``. Other
 uses of paths should have use of bytes paths replaced with decoding and use of
-the *W APIs.
+the \*W APIs.
 
 Add legacy mode
 ---------------
@@ -175,7 +175,7 @@ Add a legacy mode flag, enabled by the environment variable
 ``PYTHONLEGACYWINDOWSFSENCODING``. When this flag is set, the default filesystem
 encoding is set to mbcs rather than utf-8, and the error mode is set to
 'replace' rather than 'strict'. The ``path_converter`` will continue to decode
-to wide characters and only *W APIs will be called, however, the bytes passed in
+to wide characters and only \*W APIs will be called, however, the bytes passed in
 and received from Python will be encoded the same as prior to this change.
 
 Undeprecate bytes paths on Windows
@@ -197,7 +197,7 @@ This is essentially the same as the proposed change, but instead of changing
 ``sys.getfilesystemencoding()`` to utf-8 it is changed to mbcs (which
 dynamically maps to the active code page).
 
-This approach allows the use of new functionality that is only available as \*W
+This approach allows the use of new functionality that is only available as \*W
 APIs and also detection of encoding/decoding errors. For example, rather than
 silently replacing Unicode characters with '?', it would be possible to warn or
 fail the operation.
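Note on the last hunk: the difference between silently replacing characters with '?' and detecting the error can be shown with an ordinary codec. The sketch below assumes ``cp1252`` as a stand-in for the active code page, because ``mbcs`` is only available on Windows; the file name is made up for illustration.

```python
# Illustrative sketch only. "cp1252" stands in for the active code page
# (an assumption; the real value is machine-specific), and the file name
# is hypothetical.
ACTIVE_CODE_PAGE = "cp1252"

name = "report_\u65e5\u672c.txt"  # contains two CJK characters

# Equivalent of 'mbcs:replace': unrepresentable characters become '?'
# and nothing signals that information was lost.
lossy = name.encode(ACTIVE_CODE_PAGE, errors="replace")
restored = lossy.decode(ACTIVE_CODE_PAGE)

# With a strict error mode the same conversion fails loudly instead,
# which gives the caller a chance to warn or abort.
try:
    name.encode(ACTIVE_CODE_PAGE, errors="strict")
    detected = False
except UnicodeEncodeError:
    detected = True
```

Here ``lossy`` no longer identifies the original file, while the strict conversion raises ``UnicodeEncodeError``, which is the kind of detection the text refers to.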