Convert PEPs 519, 528 and 529 from CRLF to LF line endings. (#236)

2017-04-02 00:04:46 +03:00 · 2017-04-02 00:04:46 +03:00 · d675175520
parent 425a46fb20
commit d675175520
3 changed files with 1192 additions and 1192 deletions
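
A line-ending conversion like this one can be reproduced with a short script. The following is a minimal sketch only, not the tool that produced this commit; the function names `crlf_to_lf` and `convert_file` are hypothetical. It works on bytes so that Python's own newline translation does not interfere.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Return data with every CRLF pair replaced by a single LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: Path) -> bool:
    """Rewrite path in place with LF endings; return True if it changed."""
    original = path.read_bytes()  # binary read: no newline translation
    converted = crlf_to_lf(original)
    if converted != original:
        path.write_bytes(converted)
        return True
    return False
```

Because every line is rewritten, a conversion of this kind shows up as an equal number of additions and deletions, as in the summary above.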
--- a/pep-0519.txt
+++ b/pep-0519.txt
--- a/pep-0528.txt
+++ b/pep-0528.txt
@@ -1,182 +1,182 @@
-PEP: 528
-Title: Change Windows console encoding to UTF-8
-Version: $Revision$
-Last-Modified: $Date$
-Author: Steve Dower <steve.dower@python.org>
-Status: Final
-Type: Standards Track
-Content-Type: text/x-rst
-Created: 27-Aug-2016
-Python-Version: 3.6
-Post-History: 01-Sep-2016, 04-Sep-2016
-Resolution: https://mail.python.org/pipermail/python-dev/2016-September/146278.html
-
-Abstract
-========
-
-Historically, Python uses the ANSI APIs for interacting with the Windows
-operating system, often via C Runtime functions. However, these have been long
-discouraged in favor of the UTF-16 APIs. Within the operating system, all text
-is represented as UTF-16, and the ANSI APIs perform encoding and decoding using
-the active code page.
-
-This PEP proposes changing the default standard stream implementation on Windows
-to use the Unicode APIs. This will allow users to print and input the full range
-of Unicode characters at the default Windows console. This also requires a
-subtle change to how the tokenizer parses text from readline hooks.
-
-Specific Changes
-================
-
-Add _io.WindowsConsoleIO
------------------------
-
-Currently an instance of ``_io.FileIO`` is used to wrap the file descriptors
-representing standard input, output and error. We add a new class (implemented
-in C) ``_io.WindowsConsoleIO`` that acts as a raw IO object using the Windows
-console functions, specifically, ``ReadConsoleW`` and ``WriteConsoleW``.
-
-This class will be used when the legacy-mode flag is not in effect, when opening
-a standard stream by file descriptor and the stream is a console buffer rather
-than a redirected file. Otherwise, ``_io.FileIO`` will be used as it is today.
-
-This is a raw (bytes) IO class that requires text to be passed encoded with
-utf-8, which will be decoded to utf-16-le and passed to the Windows APIs.
-Similarly, bytes read from the class will be provided by the operating system as
-utf-16-le and converted into utf-8 when returned to Python.
-
-The use of an ASCII compatible encoding is required to maintain compatibility
-with code that bypasses the ``TextIOWrapper`` and directly writes ASCII bytes to
-the standard streams (for example, `Twisted's process_stdinreader.py`_). Code that assumes
-a particular encoding for the standard streams other than ASCII will likely
-break.
-
-Add _PyOS_WindowsConsoleReadline
--------------------------------
-
-To allow Unicode entry at the interactive prompt, a new readline hook is
-required. The existing ``PyOS_StdioReadline`` function will delegate to the new
-``_PyOS_WindowsConsoleReadline`` function when reading from a file descriptor
-that is a console buffer and the legacy-mode flag is not in effect (the logic
-should be identical to above).
-
-Since the readline interface is required to return an 8-bit encoded string with
-no embedded nulls, the ``_PyOS_WindowsConsoleReadline`` function transcodes from
-utf-16-le as read from the operating system into utf-8.
-
-The function ``PyRun_InteractiveOneObject`` which currently obtains the encoding
-from ``sys.stdin`` will select utf-8 unless the legacy-mode flag is in effect.
-This may require readline hooks to change their encodings to utf-8, or to
-require legacy-mode for correct behaviour.
-
-Add legacy mode
---------------
-
-Launching Python with the environment variable ``PYTHONLEGACYWINDOWSSTDIO`` set
-will enable the legacy-mode flag, which completely restores the previous
-behaviour.
-
-Alternative Approaches
-======================
-
-The `win_unicode_console package`_ is a pure-Python alternative to changing the
-default behaviour of the console. It implements essentially the same
-modifications as described here using pure Python code.
-
-Code that may break
-===================
-
-The following code patterns may break or see different behaviour as a result of
-this change. All of these code samples require explicitly choosing to use a raw
-file object in place of a more convenient wrapper that would prevent any visible
-change.
-
-Assuming stdin/stdout encoding
------------------------------
-
-Code that assumes that the encoding required by ``sys.stdin.buffer`` or
-``sys.stdout.buffer`` is ``'mbcs'`` or a more specific encoding may currently be
-working by chance, but could encounter issues under this change. For example::
-
-    >>> sys.stdout.buffer.write(text.encode('mbcs'))
-    >>> r = sys.stdin.buffer.read(16).decode('cp437')
-
-To correct this code, the encoding specified on the ``TextIOWrapper`` should be
-used, either implicitly or explicitly::
-
-    >>> # Fix 1: Use wrapper correctly
-    >>> sys.stdout.write(text)
-    >>> r = sys.stdin.read(16)
-
-    >>> # Fix 2: Use encoding explicitly
-    >>> sys.stdout.buffer.write(text.encode(sys.stdout.encoding))
-    >>> r = sys.stdin.buffer.read(16).decode(sys.stdin.encoding)
-
-Incorrectly using the raw object
--------------------------------
-
-Code that uses the raw IO object and does not correctly handle partial reads and
-writes may be affected. This is particularly important for reads, where the
-number of characters read will never exceed one-fourth of the number of bytes
-allowed, as there is no feasible way to prevent input from encoding as much
-longer utf-8 strings::
-
-    >>> raw_stdin = sys.stdin.buffer.raw
-    >>> data = raw_stdin.read(15)
-    abcdefghijklm
-    b'abc'
-    # data contains at most 3 characters, and never more than 12 bytes
-    # error, as "defghijklm\r\n" is passed to the interactive prompt
-
-To correct this code, the buffered reader/writer should be used, or the caller
-should continue reading until its buffer is full::
-
-    >>> # Fix 1: Use the buffered reader/writer
-    >>> stdin = sys.stdin.buffer
-    >>> data = stdin.read(15)
-    abcdefghijklm
-    b'abcdefghijklm\r\n'
-
-    >>> # Fix 2: Loop until enough bytes have been read
-    >>> raw_stdin = sys.stdin.buffer.raw
-    >>> b = b''
-    >>> while len(b) < 15:
-    ...     b += raw_stdin.read(15)
-    abcdefghijklm
-    b'abcdefghijklm\r\n'
-
-Using the raw object with small buffers
---------------------------------------
-
-Code that uses the raw IO object and attempts to read fewer than four bytes
-will now receive an error. Because any single character may require up to four
-bytes when represented in utf-8, such requests must fail::
-
-    >>> raw_stdin = sys.stdin.buffer.raw
-    >>> data = raw_stdin.read(3)
-    Traceback (most recent call last):
-      File "<stdin>", line 1, in <module>
-    ValueError: must read at least 4 bytes
-
-The only workaround is to pass a larger buffer::
-
-    >>> # Fix: Request at least four bytes
-    >>> raw_stdin = sys.stdin.buffer.raw
-    >>> data = raw_stdin.read(4)
-    a
-    b'a'
-    >>> >>>
-
-(The extra ``>>>`` is due to the newline remaining in the input buffer and is
-expected in this situation.)
-
-Copyright
-=========
-
-This document has been placed in the public domain.
-
-References
-==========
-
-.. _Twisted's process_stdinreader.py: https://github.com/twisted/twisted/blob/trunk/src/twisted/test/process_stdinreader.py
-.. _win_unicode_console package: https://pypi.org/project/win_unicode_console/
+PEP: 528
+Title: Change Windows console encoding to UTF-8
+Version: $Revision$
+Last-Modified: $Date$
+Author: Steve Dower <steve.dower@python.org>
+Status: Final
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 27-Aug-2016
+Python-Version: 3.6
+Post-History: 01-Sep-2016, 04-Sep-2016
+Resolution: https://mail.python.org/pipermail/python-dev/2016-September/146278.html
+
+Abstract
+========
+
+Historically, Python uses the ANSI APIs for interacting with the Windows
+operating system, often via C Runtime functions. However, these have been long
+discouraged in favor of the UTF-16 APIs. Within the operating system, all text
+is represented as UTF-16, and the ANSI APIs perform encoding and decoding using
+the active code page.
+
+This PEP proposes changing the default standard stream implementation on Windows
+to use the Unicode APIs. This will allow users to print and input the full range
+of Unicode characters at the default Windows console. This also requires a
+subtle change to how the tokenizer parses text from readline hooks.
+
+Specific Changes
+================
+
+Add _io.WindowsConsoleIO
+------------------------
+
+Currently an instance of ``_io.FileIO`` is used to wrap the file descriptors
+representing standard input, output and error. We add a new class (implemented
+in C) ``_io.WindowsConsoleIO`` that acts as a raw IO object using the Windows
+console functions, specifically, ``ReadConsoleW`` and ``WriteConsoleW``.
+
+This class will be used when the legacy-mode flag is not in effect, when opening
+a standard stream by file descriptor and the stream is a console buffer rather
+than a redirected file. Otherwise, ``_io.FileIO`` will be used as it is today.
+
+This is a raw (bytes) IO class that requires text to be passed encoded with
+utf-8, which will be decoded to utf-16-le and passed to the Windows APIs.
+Similarly, bytes read from the class will be provided by the operating system as
+utf-16-le and converted into utf-8 when returned to Python.
+
+The use of an ASCII compatible encoding is required to maintain compatibility
+with code that bypasses the ``TextIOWrapper`` and directly writes ASCII bytes to
+the standard streams (for example, `Twisted's process_stdinreader.py`_). Code that assumes
+a particular encoding for the standard streams other than ASCII will likely
+break.
+
+Add _PyOS_WindowsConsoleReadline
+--------------------------------
+
+To allow Unicode entry at the interactive prompt, a new readline hook is
+required. The existing ``PyOS_StdioReadline`` function will delegate to the new
+``_PyOS_WindowsConsoleReadline`` function when reading from a file descriptor
+that is a console buffer and the legacy-mode flag is not in effect (the logic
+should be identical to above).
+
+Since the readline interface is required to return an 8-bit encoded string with
+no embedded nulls, the ``_PyOS_WindowsConsoleReadline`` function transcodes from
+utf-16-le as read from the operating system into utf-8.
+
+The function ``PyRun_InteractiveOneObject`` which currently obtains the encoding
+from ``sys.stdin`` will select utf-8 unless the legacy-mode flag is in effect.
+This may require readline hooks to change their encodings to utf-8, or to
+require legacy-mode for correct behaviour.
+
+Add legacy mode
+---------------
+
+Launching Python with the environment variable ``PYTHONLEGACYWINDOWSSTDIO`` set
+will enable the legacy-mode flag, which completely restores the previous
+behaviour.
+
+Alternative Approaches
+======================
+
+The `win_unicode_console package`_ is a pure-Python alternative to changing the
+default behaviour of the console. It implements essentially the same
+modifications as described here using pure Python code.
+
+Code that may break
+===================
+
+The following code patterns may break or see different behaviour as a result of
+this change. All of these code samples require explicitly choosing to use a raw
+file object in place of a more convenient wrapper that would prevent any visible
+change.
+
+Assuming stdin/stdout encoding
+------------------------------
+
+Code that assumes that the encoding required by ``sys.stdin.buffer`` or
+``sys.stdout.buffer`` is ``'mbcs'`` or a more specific encoding may currently be
+working by chance, but could encounter issues under this change. For example::
+
+    >>> sys.stdout.buffer.write(text.encode('mbcs'))
+    >>> r = sys.stdin.buffer.read(16).decode('cp437')
+
+To correct this code, the encoding specified on the ``TextIOWrapper`` should be
+used, either implicitly or explicitly::
+
+    >>> # Fix 1: Use wrapper correctly
+    >>> sys.stdout.write(text)
+    >>> r = sys.stdin.read(16)
+
+    >>> # Fix 2: Use encoding explicitly
+    >>> sys.stdout.buffer.write(text.encode(sys.stdout.encoding))
+    >>> r = sys.stdin.buffer.read(16).decode(sys.stdin.encoding)
+
+Incorrectly using the raw object
+--------------------------------
+
+Code that uses the raw IO object and does not correctly handle partial reads and
+writes may be affected. This is particularly important for reads, where the
+number of characters read will never exceed one-fourth of the number of bytes
+allowed, as there is no feasible way to prevent input from encoding as much
+longer utf-8 strings::
+
+    >>> raw_stdin = sys.stdin.buffer.raw
+    >>> data = raw_stdin.read(15)
+    abcdefghijklm
+    b'abc'
+    # data contains at most 3 characters, and never more than 12 bytes
+    # error, as "defghijklm\r\n" is passed to the interactive prompt
+
+To correct this code, the buffered reader/writer should be used, or the caller
+should continue reading until its buffer is full::
+
+    >>> # Fix 1: Use the buffered reader/writer
+    >>> stdin = sys.stdin.buffer
+    >>> data = stdin.read(15)
+    abcdefghijklm
+    b'abcdefghijklm\r\n'
+
+    >>> # Fix 2: Loop until enough bytes have been read
+    >>> raw_stdin = sys.stdin.buffer.raw
+    >>> b = b''
+    >>> while len(b) < 15:
+    ...     b += raw_stdin.read(15)
+    abcdefghijklm
+    b'abcdefghijklm\r\n'
+
+Using the raw object with small buffers
+---------------------------------------
+
+Code that uses the raw IO object and attempts to read fewer than four bytes
+will now receive an error. Because any single character may require up to four
+bytes when represented in utf-8, such requests must fail::
+
+    >>> raw_stdin = sys.stdin.buffer.raw
+    >>> data = raw_stdin.read(3)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    ValueError: must read at least 4 bytes
+
+The only workaround is to pass a larger buffer::
+
+    >>> # Fix: Request at least four bytes
+    >>> raw_stdin = sys.stdin.buffer.raw
+    >>> data = raw_stdin.read(4)
+    a
+    b'a'
+    >>> >>>
+
+(The extra ``>>>`` is due to the newline remaining in the input buffer and is
+expected in this situation.)
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+References
+==========
+
+.. _Twisted's process_stdinreader.py: https://github.com/twisted/twisted/blob/trunk/src/twisted/test/process_stdinreader.py
+.. _win_unicode_console package: https://pypi.org/project/win_unicode_console/
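
The "loop until enough bytes have been read" fix described in the PEP 528 text above can be tried without a Windows console. The sketch below is an illustration under stated assumptions, not part of the PEP or of this commit: `ShortReadRaw` is a hypothetical stand-in for a raw stream that returns short reads, and `read_at_least` is a hypothetical helper. Following the PEP's four-byte minimum, the helper never requests fewer than four bytes, so it may return slightly more than was asked for.

```python
import io


class ShortReadRaw(io.RawIOBase):
    """Hypothetical raw stream returning at most `chunk` bytes per read."""

    def __init__(self, data: bytes, chunk: int = 3):
        self._buf = io.BytesIO(data)
        self._chunk = chunk

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Deliberately fill less of the buffer than the caller offered.
        data = self._buf.read(min(len(b), self._chunk))
        b[: len(data)] = data
        return len(data)


def read_at_least(raw, n: int, min_request: int = 4) -> bytes:
    """Keep calling raw.read() until at least n bytes arrive or input ends."""
    out = bytearray()
    while len(out) < n:
        # Never ask for fewer than min_request bytes (assumed minimum).
        part = raw.read(max(n - len(out), min_request))
        if not part:  # b'' at end of stream, None if nothing is available
            break
        out += part
    return bytes(out)
```

A single `read(15)` on `ShortReadRaw(b'abcdefghijklm\r\n')` yields only `b'abc'`, while `read_at_least` on a fresh instance collects the whole line.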
--- a/pep-0529.txt
+++ b/pep-0529.txt
@@ -1,453 +1,453 @@
-PEP: 529
-Title: Change Windows filesystem encoding to UTF-8
-Version: $Revision$
-Last-Modified: $Date$
-Author: Steve Dower <steve.dower@python.org>
-Status: Final
-Type: Standards Track
-Content-Type: text/x-rst
-Created: 27-Aug-2016
-Python-Version: 3.6
-Post-History: 01-Sep-2016, 04-Sep-2016
-Resolution: https://mail.python.org/pipermail/python-dev/2016-September/146277.html
-
-Abstract
-========
-
-Historically, Python uses the ANSI APIs for interacting with the Windows
-operating system, often via C Runtime functions. However, these have been long
-discouraged in favor of the UTF-16 APIs. Within the operating system, all text
-is represented as UTF-16, and the ANSI APIs perform encoding and decoding using
-the active code page. See `Naming Files, Paths, and Namespaces`_ for
-more details.
-
-This PEP proposes changing the default filesystem encoding on Windows to utf-8,
-and changing all filesystem functions to use the Unicode APIs for filesystem
-paths. This will not affect code that uses strings to represent paths, however
-those that use bytes for paths will now be able to correctly round-trip all
-valid paths in Windows filesystems. Currently, the conversions between Unicode
-(in the OS) and bytes (in Python) were lossy and would fail to round-trip
-characters outside of the user's active code page.
-
-Notably, this does not impact the encoding of the contents of files. These will
-continue to default to ``locale.getpreferredencoding()`` (for text files) or
-plain bytes (for binary files). This only affects the encoding used when users
-pass a bytes object to Python where it is then passed to the operating system as
-a path name.
-
-Background
-==========
-
-File system paths are almost universally represented as text with an encoding
-determined by the file system. In Python, we expose these paths via a number of
-interfaces, such as the ``os`` and ``io`` modules. Paths may be passed either
-direction across these interfaces, that is, from the filesystem to the
-application (for example, ``os.listdir()``), or from the application to the
-filesystem (for example, ``os.unlink()``).
-
-When paths are passed between the filesystem and the application, they are
-either passed through as a bytes blob or converted to/from str using
-``os.fsencode()`` and ``os.fsdecode()`` or explicit encoding using
-``sys.getfilesystemencoding()``. The result of encoding a string with
-``sys.getfilesystemencoding()`` is a blob of bytes in the native format for the
-default file system.
-
-On Windows, the native format for the filesystem is utf-16-le. The recommended
-platform APIs for accessing the filesystem all accept and return text encoded in
-this format. However, prior to Windows NT (and possibly further back), the
-native format was a configurable machine option and a separate set of APIs
-existed to accept this format. The option (the "active code page") and these
-APIs (the "\*A functions") still exist in recent versions of Windows for
-backwards compatibility, though new functionality often only has a utf-16-le API
-(the "\*W functions").
-
-In Python, str is recommended because it can correctly round-trip all characters
-used in paths (on POSIX with surrogateescape handling; on Windows because str
-maps to the native representation). On Windows bytes cannot round-trip all
-characters used in paths, as Python internally uses the \*A functions and hence
-the encoding is "whatever the active code page is". Since the active code page
-cannot represent all Unicode characters, the conversion of a path into bytes can
-lose information without warning or any available indication.
-
-As a demonstration of this::
-
-    >>> open('test\uAB00.txt', 'wb').close()
-    >>> import glob
-    >>> glob.glob('test*')
-    ['test\uab00.txt']
-    >>> glob.glob(b'test*')
-    [b'test?.txt']
-
-The Unicode character in the second call to glob has been replaced by a '?',
-which means passing the path back into the filesystem will result in a
-``FileNotFoundError``. The same results may be observed with ``os.listdir()`` or
-any function that matches the return type to the parameter type.
-
-While one user-accessible fix is to use str everywhere, POSIX systems generally
-do not suffer from data loss when using bytes exclusively as the bytes are the
-canonical representation. Even if the encoding is "incorrect" by some standard,
-the file system will still map the bytes back to the file. Making use of this
-avoids the cost of decoding and reencoding, such that (theoretically, and only
-on POSIX), code such as this may be faster because of the use of ``b'.'``
-compared to using ``'.'``::
-
-    >>> for f in os.listdir(b'.'):
-    ...     os.stat(f)
-    ...
-
-As a result, POSIX-focused library authors prefer to use bytes to represent
-paths. For some authors it is also a convenience, as their code may receive
-bytes already known to be encoded correctly, while others are attempting to
-simplify porting their code from Python 2. However, the correctness assumptions
-do not carry over to Windows where Unicode is the canonical representation, and
-errors may result. This potential data loss is why the use of bytes paths on
-Windows was deprecated in Python 3.3 - all of the above code snippets produce
-deprecation warnings on Windows.
-
-Proposal
-========
-
-Currently the default filesystem encoding is 'mbcs', which is a meta-encoder
-that uses the active code page. However, when bytes are passed to the filesystem
-they go through the \*A APIs and the operating system handles encoding. In this
-case, paths are always encoded using the equivalent of 'mbcs:replace' with no
-opportunity for Python to override or change this.
-
-This proposal would remove all use of the \*A APIs and only ever call the \*W
-APIs. When Windows returns paths to Python as ``str``, they will be decoded from
-utf-16-le and returned as text (in whatever the minimal representation is). When
-Python code requests paths as ``bytes``, the paths will be transcoded from
-utf-16-le into utf-8 using surrogatepass (Windows does not validate surrogate
-pairs, so it is possible to have invalid surrogates in filenames). Equally, when
-paths are provided as ``bytes``, they are transcoded from utf-8 into utf-16-le
-and passed to the \*W APIs.
-
-The use of utf-8 will not be configurable, except for the provision of a
-"legacy mode" flag to revert to the previous behaviour.
-
-The ``surrogateescape`` error mode does not apply here, as the concern is not
-about retaining non-sensical bytes. Any path returned from the operating system
-will be valid Unicode, while invalid paths created by the user should raise a
-decoding error (currently these would raise ``OSError`` or a subclass).
-
-The choice of utf-8 bytes (as opposed to utf-16-le bytes) is to ensure the
-ability to round-trip path names and allow basic manipulation (for example,
-using the ``os.path`` module) when assuming an ASCII-compatible encoding. Using
-utf-16-le as the encoding is more pure, but will cause more issues than are
-resolved.
-
-This change would also undeprecate the use of bytes paths on Windows. No change
-to the semantics of using bytes as a path is required - as before, they must be
-encoded with the encoding specified by ``sys.getfilesystemencoding()``.
-
-Specific Changes
-================
-
-Update sys.getfilesystemencoding
--------------------------------
-
-Remove the default value for ``Py_FileSystemDefaultEncoding`` and set it in
-``initfsencoding()`` to utf-8, or if the legacy-mode switch is enabled to mbcs.
-
-Update the implementations of ``PyUnicode_DecodeFSDefaultAndSize()`` and
-``PyUnicode_EncodeFSDefault()`` to use the utf-8 codec, or if the legacy-mode
-switch is enabled the existing mbcs codec.
-
-Add sys.getfilesystemencodeerrors
---------------------------------
-
-As the error mode may now change between ``surrogatepass`` and ``replace``,
-Python code that manually performs encoding also needs access to the current
-error mode. This includes the implementation of ``os.fsencode()`` and
-``os.fsdecode()``, which currently assume an error mode based on the codec.
-
-Add a public ``Py_FileSystemDefaultEncodeErrors``, similar to the existing
-``Py_FileSystemDefaultEncoding``. The default value on Windows will be
-``surrogatepass`` or in legacy mode, ``replace``. The default value on all other
-platforms will be ``surrogateescape``.
-
-Add a public ``sys.getfilesystemencodeerrors()`` function that returns the
-current error mode.
-
-Update the implementations of ``PyUnicode_DecodeFSDefaultAndSize()`` and
-``PyUnicode_EncodeFSDefault()`` to use the variable for error mode rather than
-constant strings.
-
-Update the implementations of ``os.fsencode()`` and ``os.fsdecode()`` to use
-``sys.getfilesystemencodeerrors()`` instead of assuming the mode.
-
-Update path_converter
---------------------
-
-Update the path converter to always decode bytes or buffer objects into text
-using ``PyUnicode_DecodeFSDefaultAndSize()``.
-
-Change the ``narrow`` field from a ``char*`` string into a flag that indicates
-whether the original object was bytes. This is required for functions that need
-to return paths using the same type as was originally provided.
-
-Remove unused ANSI code
-----------------------
-
-Remove all code paths using the ``narrow`` field, as these will no longer be
-reachable by any caller. These are only used within ``posixmodule.c``. Other
-uses of paths should have use of bytes paths replaced with decoding and use of
-the \*W APIs.
-
-Add legacy mode
---------------
-
-Add a legacy mode flag, enabled by the environment variable
-``PYTHONLEGACYWINDOWSFSENCODING`` or by a function call to
-``sys._enablelegacywindowsfsencoding()``. The function call can only be
-used to enable the flag and should be used by programs as close to
-initialization as possible. Legacy mode cannot be disabled while Python is
-running.
-
-When this flag is set, the default filesystem encoding is set to mbcs rather
-than utf-8, and the error mode is set to ``replace`` rather than
-``surrogatepass``. Paths will continue to decode to wide characters and only \*W
-APIs will be called, however, the bytes passed in and received from Python will
-be encoded the same as prior to this change.
-
-Undeprecate bytes paths on Windows
----------------------------------
-
-Using bytes as paths on Windows is currently deprecated. We would announce that
-this is no longer the case, and that paths when encoded as bytes should use
-whatever is returned from ``sys.getfilesystemencoding()`` rather than the user's
-active code page.
-
-Beta experiment
---------------
-
-To assist with determining the impact of this change, we propose applying it to
-3.6.0b1 provisionally with the intent being to make a final decision before
-3.6.0b4.
-
-During the experiment period, decoding and encoding exception messages will be
-expanded to include a link to an active online discussion and encourage
-reporting of problems.
-
-If it is decided to revert the functionality for 3.6.0b4, the implementation
-change would be to permanently enable the legacy mode flag, change the
-environment variable to ``PYTHONWINDOWSUTF8FSENCODING`` and function to
-``sys._enablewindowsutf8fsencoding()`` to allow enabling the functionality
-on a case-by-case basis, as opposed to disabling it.
-
-It is expected that if we cannot feasibly make the change for 3.6 due to
-compatibility concerns, it will not be possible to make the change at any later
-time in Python 3.x.
-
-Affected Modules
----------------
-
-This PEP implicitly includes all modules within Python that either pass path
-names to the operating system, or otherwise use ``sys.getfilesystemencoding()``.
-
-As of 3.6.0a4, the following modules require modification:
-
-* ``os``
-* ``_overlapped``
-* ``_socket``
-* ``subprocess``
-* ``zipimport``
-
-The following modules use ``sys.getfilesystemencoding()`` but do not need
-modification:
-
-* ``gc`` (already assumes bytes are utf-8)
-* ``grp`` (not compiled for Windows)
-* ``http.server`` (correctly includes codec name with transmitted data)
-* ``idlelib.editor`` (should not be needed; has fallback handling)
-* ``nis`` (not compiled for Windows)
-* ``pwd`` (not compiled for Windows)
-* ``spwd`` (not compiled for Windows)
-* ``_ssl`` (only used for ASCII constants)
-* ``tarfile`` (code unused on Windows)
-* ``_tkinter`` (already assumes bytes are utf-8)
-* ``wsgiref`` (assumed as the default encoding for unknown environments)
-* ``zipapp`` (code unused on Windows)
-
-The following native code uses one of the encoding or decoding functions, but
-does not require any modification:
-
-* ``Parser/parsetok.c`` (docs already specify ``sys.getfilesystemencoding()``)
-* ``Python/ast.c`` (docs already specify ``sys.getfilesystemencoding()``)
-* ``Python/compile.c`` (undocumented, but Python filesystem encoding implied)
-* ``Python/errors.c`` (docs already specify ``os.fsdecode()``)
-* ``Python/fileutils.c`` (code unused on Windows)
-* ``Python/future.c`` (undocumented, but Python filesystem encoding implied)
-* ``Python/import.c`` (docs already specify utf-8)
-* ``Python/importdl.c`` (code unused on Windows)
-* ``Python/pythonrun.c`` (docs already specify ``sys.getfilesystemencoding()``)
-* ``Python/symtable.c`` (undocumented, but Python filesystem encoding implied)
-* ``Python/thread.c`` (code unused on Windows)
-* ``Python/traceback.c`` (encodes correctly for comparing strings)
-* ``Python/_warnings.c`` (docs already specify ``os.fsdecode()``)
-
-Rejected Alternatives
-=====================
-
-Use strict mbcs decoding
------------------------
-
-This is essentially the same as the proposed change, but instead of changing
-``sys.getfilesystemencoding()`` to utf-8 it is changed to mbcs (which
-dynamically maps to the active code page).
-
-This approach allows the use of new functionality that is only available as \*W
-APIs and also detection of encoding/decoding errors. For example, rather than
-silently replacing Unicode characters with '?', it would be possible to warn or
-fail the operation.
-
-Compared to the proposed fix, this could enable some new functionality but does
-not fix any of the problems described initially. New runtime errors may cause
-some problems to be more obvious and lead to fixes, provided library maintainers
-are interested in supporting Windows and adding a separate code path to treat
-filesystem paths as strings.
-
-Making the encoding mbcs without strict errors is equivalent to the legacy-mode
-switch being enabled by default. This is a possible course of action if there is
-significant breakage of actual code and a need to extend the deprecation period,
-but still a desire to have the simplifications to the CPython source.
-
-Make bytes paths an error on Windows
------------------------------------
-
-By preventing the use of bytes paths on Windows completely we prevent users from
-hitting encoding issues.
-
-However, the motivation for this PEP is to increase the likelihood that code
-written on POSIX will also work correctly on Windows. This alternative would
-move the other direction and make such code completely incompatible. As this
-does not benefit users in any way, we reject it.
-
-Make bytes paths an error on all platforms
------------------------------------------
-
-By deprecating and then disabling the use of bytes paths on all platforms we
-prevent users from hitting encoding issues regardless of where the code was
-originally written. This would require a full deprecation cycle, as there are
-currently no warnings on platforms other than Windows.
-
-This is likely to be seen as a hostile action against Python developers in
-general, and as such is rejected at this time.
-
-Code that may break
-===================
-
-The following code patterns may break or see different behaviour as a result of
-this change. Each of these examples would have been fragile in code intended for
-cross-platform use. The suggested fixes demonstrate the most compatible way to
-handle path encoding issues across all platforms and across multiple Python
-versions.
-
-Note that all of these examples produce deprecation warnings on Python 3.3 and
-later.
-
-Not managing encodings across boundaries
----------------------------------------
-
-Code that does not manage encodings when crossing protocol boundaries may
-currently be working by chance, but could encounter issues when either encoding
-changes. Note that the source of ``filename`` may be any function that returns
-a bytes object, as illustrated in a second example below::
-
-    >>> filename = open('filename_in_mbcs.txt', 'rb').read()
-    >>> text = open(filename, 'r').read()
-
-To correct this code, the encoding of the bytes in ``filename`` should be
-specified, either when reading from the file or before using the value::
-
-    >>> # Fix 1: Open file as text (default encoding)
-    >>> filename = open('filename_in_mbcs.txt', 'r').read()
-    >>> text = open(filename, 'r').read()
-
-    >>> # Fix 2: Open file as text (explicit encoding)
-    >>> filename = open('filename_in_mbcs.txt', 'r', encoding='mbcs').read()
-    >>> text = open(filename, 'r').read()
-
-    >>> # Fix 3: Explicitly decode the path
-    >>> filename = open('filename_in_mbcs.txt', 'rb').read()
-    >>> text = open(filename.decode('mbcs'), 'r').read()
-
-Where the creator of ``filename`` is separated from the user of ``filename``,
-the encoding is important information to include::
-
-    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('mbcs')
-
-    >>> filename = some_object.filename
-    >>> type(filename)
-    <class 'bytes'>
-    >>> text = open(filename, 'r').read()
-
-To fix this code for best compatibility across operating systems and Python
-versions, the filename should be exposed as str::
-
-    >>> # Fix 1: Expose as str
-    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'
-    
-    >>> filename = some_object.filename
-    >>> type(filename)
-    <class 'str'>
-    >>> text = open(filename, 'r').read()
-
-Alternatively, the encoding used for the path needs to be made available to the
-user. Specifying ``os.fsencode()`` (or ``sys.getfilesystemencoding()``) is an
-acceptable choice, or a new attribute could be added with the exact encoding::
-
-    >>> # Fix 2: Use fsencode
-    >>> some_object.filename = os.fsencode(r'C:\Users\Steve\Documents\my_file.txt')
-    
-    >>> filename = some_object.filename
-    >>> type(filename)
-    <class 'bytes'>
-    >>> text = open(filename, 'r').read()
-
-
-    >>> # Fix 3: Expose as explicit encoding
-    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('cp437')
-    >>> some_object.filename_encoding = 'cp437'
-
-    >>> filename = some_object.filename
-    >>> type(filename)
-    <class 'bytes'>
-    >>> filename = filename.decode(some_object.filename_encoding)
-    >>> type(filename)
-    <class 'str'>
-    >>> text = open(filename, 'r').read()
-
-
-Explicitly using 'mbcs'
-----------------------
-
-Code that explicitly encodes text using 'mbcs' before passing to file system
-APIs is now passing incorrectly encoded bytes. Note that the source of
-``filename`` in this example is not relevant, provided that it is a str::
-
-    >>> filename = open('files.txt', 'r').readline().rstrip()
-    >>> text = open(filename.encode('mbcs'), 'r')
-
-To correct this code, the string should be passed without explicit encoding, or
-should use ``os.fsencode()``::
-
-    >>> # Fix 1: Do not encode the string
-    >>> filename = open('files.txt', 'r').readline().rstrip()
-    >>> text = open(filename, 'r')
-
-    >>> # Fix 2: Use correct encoding
-    >>> filename = open('files.txt', 'r').readline().rstrip()
-    >>> text = open(os.fsencode(filename), 'r')
-
-
-References
-==========
-
-.. _Naming Files, Paths, and Namespaces: 
-   https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247.aspx
-
-Copyright
-=========
-
-This document has been placed in the public domain.
+PEP: 529
+Title: Change Windows filesystem encoding to UTF-8
+Version: $Revision$
+Last-Modified: $Date$
+Author: Steve Dower <steve.dower@python.org>
+Status: Final
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 27-Aug-2016
+Python-Version: 3.6
+Post-History: 01-Sep-2016, 04-Sep-2016
+Resolution: https://mail.python.org/pipermail/python-dev/2016-September/146277.html
+
+Abstract
+========
+
+Historically, Python uses the ANSI APIs for interacting with the Windows
+operating system, often via C Runtime functions. However, these have long been
+discouraged in favor of the UTF-16 APIs. Within the operating system, all text
+is represented as UTF-16, and the ANSI APIs perform encoding and decoding using
+the active code page. See `Naming Files, Paths, and Namespaces`_ for
+more details.
+
+This PEP proposes changing the default filesystem encoding on Windows to utf-8,
+and changing all filesystem functions to use the Unicode APIs for filesystem
+paths. This will not affect code that uses strings to represent paths; however,
+code that uses bytes for paths will now be able to correctly round-trip all
+valid paths in Windows filesystems. Currently, the conversions between Unicode
+(in the OS) and bytes (in Python) are lossy and fail to round-trip
+characters outside of the user's active code page.
+
+Notably, this does not impact the encoding of the contents of files. These will
+continue to default to ``locale.getpreferredencoding()`` (for text files) or
+plain bytes (for binary files). This only affects the encoding used when users
+pass a bytes object to Python where it is then passed to the operating system as
+a path name.
+
+Background
+==========
+
+File system paths are almost universally represented as text with an encoding
+determined by the file system. In Python, we expose these paths via a number of
+interfaces, such as the ``os`` and ``io`` modules. Paths may be passed in either
+direction across these interfaces, that is, from the filesystem to the
+application (for example, ``os.listdir()``), or from the application to the
+filesystem (for example, ``os.unlink()``).
+
+When paths are passed between the filesystem and the application, they are
+either passed through as a bytes blob or converted to/from str using
+``os.fsencode()`` and ``os.fsdecode()`` or explicit encoding using
+``sys.getfilesystemencoding()``. The result of encoding a string with
+``sys.getfilesystemencoding()`` is a blob of bytes in the native format for the
+default file system.
+
+On Windows, the native format for the filesystem is utf-16-le. The recommended
+platform APIs for accessing the filesystem all accept and return text encoded in
+this format. However, prior to Windows NT (and possibly further back), the
+native format was a configurable machine option and a separate set of APIs
+existed to accept this format. The option (the "active code page") and these
+APIs (the "\*A functions") still exist in recent versions of Windows for
+backwards compatibility, though new functionality often only has a utf-16-le API
+(the "\*W functions").
+
+In Python, str is recommended because it can correctly round-trip all characters
+used in paths (on POSIX with surrogateescape handling; on Windows because str
+maps to the native representation). On Windows bytes cannot round-trip all
+characters used in paths, as Python internally uses the \*A functions and hence
+the encoding is "whatever the active code page is". Since the active code page
+cannot represent all Unicode characters, the conversion of a path into bytes can
+lose information without warning or any available indication.
+
+As a demonstration of this::
+
+    >>> open('test\uAB00.txt', 'wb').close()
+    >>> import glob
+    >>> glob.glob('test*')
+    ['test\uab00.txt']
+    >>> glob.glob(b'test*')
+    [b'test?.txt']
+
+The Unicode character in the second call to glob has been replaced by a '?',
+which means passing the path back into the filesystem will result in a
+``FileNotFoundError``. The same results may be observed with ``os.listdir()`` or
+any function that matches the return type to the parameter type.
+
+While one user-accessible fix is to use str everywhere, POSIX systems generally
+do not suffer from data loss when using bytes exclusively as the bytes are the
+canonical representation. Even if the encoding is "incorrect" by some standard,
+the file system will still map the bytes back to the file. Making use of this
+avoids the cost of decoding and reencoding, such that (theoretically, and only
+on POSIX), code such as this may be faster because of the use of ``b'.'``
+compared to using ``'.'``::
+
+    >>> for f in os.listdir(b'.'):
+    ...     os.stat(f)
+    ...
+
+As a result, POSIX-focused library authors prefer to use bytes to represent
+paths. For some authors it is also a convenience, as their code may receive
+bytes already known to be encoded correctly, while others are attempting to
+simplify porting their code from Python 2. However, the correctness assumptions
+do not carry over to Windows where Unicode is the canonical representation, and
+errors may result. This potential data loss is why the use of bytes paths on
+Windows was deprecated in Python 3.3 - all of the above code snippets produce
+deprecation warnings on Windows.
+
+Proposal
+========
+
+Currently the default filesystem encoding is 'mbcs', which is a meta-encoder
+that uses the active code page. However, when bytes are passed to the filesystem
+they go through the \*A APIs and the operating system handles encoding. In this
+case, paths are always encoded using the equivalent of 'mbcs:replace' with no
+opportunity for Python to override or change this.
+
+This proposal would remove all use of the \*A APIs and only ever call the \*W
+APIs. When Windows returns paths to Python as ``str``, they will be decoded from
+utf-16-le and returned as text (in whatever the minimal representation is). When
+Python code requests paths as ``bytes``, the paths will be transcoded from
+utf-16-le into utf-8 using surrogatepass (Windows does not validate surrogate
+pairs, so it is possible to have invalid surrogates in filenames). Equally, when
+paths are provided as ``bytes``, they are transcoded from utf-8 into utf-16-le
+and passed to the \*W APIs.
+
+The use of utf-8 will not be configurable, except for the provision of a
+"legacy mode" flag to revert to the previous behaviour.
+
+The ``surrogateescape`` error mode does not apply here, as the concern is not
+about retaining nonsensical bytes. Any path returned from the operating system
+will be valid Unicode, while invalid paths created by the user should raise a
+decoding error (currently these would raise ``OSError`` or a subclass).
+
+The choice of utf-8 bytes (as opposed to utf-16-le bytes) is to ensure the
+ability to round-trip path names and allow basic manipulation (for example,
+using the ``os.path`` module) when assuming an ASCII-compatible encoding. Using
+utf-16-le as the encoding is more pure, but will cause more issues than are
+resolved.
+
+This change would also undeprecate the use of bytes paths on Windows. No change
+to the semantics of using bytes as a path is required - as before, they must be
+encoded with the encoding specified by ``sys.getfilesystemencoding()``.
+
+Specific Changes
+================
+
+Update sys.getfilesystemencoding
+--------------------------------
+
+Remove the default value for ``Py_FileSystemDefaultEncoding`` and set it in
+``initfsencoding()`` to utf-8, or if the legacy-mode switch is enabled to mbcs.
+
+Update the implementations of ``PyUnicode_DecodeFSDefaultAndSize()`` and
+``PyUnicode_EncodeFSDefault()`` to use the utf-8 codec, or if the legacy-mode
+switch is enabled the existing mbcs codec.
+
+Add sys.getfilesystemencodeerrors
+---------------------------------
+
+As the error mode may now change between ``surrogatepass`` and ``replace``,
+Python code that manually performs encoding also needs access to the current
+error mode. This includes the implementation of ``os.fsencode()`` and
+``os.fsdecode()``, which currently assume an error mode based on the codec.
+
+Add a public ``Py_FileSystemDefaultEncodeErrors``, similar to the existing
+``Py_FileSystemDefaultEncoding``. The default value on Windows will be
+``surrogatepass`` or in legacy mode, ``replace``. The default value on all other
+platforms will be ``surrogateescape``.
+
+Add a public ``sys.getfilesystemencodeerrors()`` function that returns the
+current error mode.
+
+Update the implementations of ``PyUnicode_DecodeFSDefaultAndSize()`` and
+``PyUnicode_EncodeFSDefault()`` to use the variable for error mode rather than
+constant strings.
+
+Update the implementations of ``os.fsencode()`` and ``os.fsdecode()`` to use
+``sys.getfilesystemencodeerrors()`` instead of assuming the mode.
+
+Update path_converter
+---------------------
+
+Update the path converter to always decode bytes or buffer objects into text
+using ``PyUnicode_DecodeFSDefaultAndSize()``.
+
+Change the ``narrow`` field from a ``char*`` string into a flag that indicates
+whether the original object was bytes. This is required for functions that need
+to return paths using the same type as was originally provided.
+
+Remove unused ANSI code
+-----------------------
+
+Remove all code paths using the ``narrow`` field, as these will no longer be
+reachable by any caller. These are only used within ``posixmodule.c``. Other
+uses of paths should have use of bytes paths replaced with decoding and use of
+the \*W APIs.
+
+Add legacy mode
+---------------
+
+Add a legacy mode flag, enabled by the environment variable
+``PYTHONLEGACYWINDOWSFSENCODING`` or by a function call to
+``sys._enablelegacywindowsfsencoding()``. The function call can only be
+used to enable the flag and should be used by programs as close to
+initialization as possible. Legacy mode cannot be disabled while Python is
+running.
+
+When this flag is set, the default filesystem encoding is set to mbcs rather
+than utf-8, and the error mode is set to ``replace`` rather than
+``surrogatepass``. Paths will continue to decode to wide characters and only \*W
+APIs will be called; however, the bytes passed in and received from Python will
+be encoded the same as prior to this change.
+
+Undeprecate bytes paths on Windows
+----------------------------------
+
+Using bytes as paths on Windows is currently deprecated. We would announce that
+this is no longer the case, and that paths when encoded as bytes should use
+whatever is returned from ``sys.getfilesystemencoding()`` rather than the user's
+active code page.
+
+Beta experiment
+---------------
+
+To assist with determining the impact of this change, we propose applying it to
+3.6.0b1 provisionally with the intent being to make a final decision before
+3.6.0b4.
+
+During the experiment period, decoding and encoding exception messages will be
+expanded to include a link to an active online discussion and encourage
+reporting of problems.
+
+If it is decided to revert the functionality for 3.6.0b4, the implementation
+change would be to permanently enable the legacy mode flag, change the
+environment variable to ``PYTHONWINDOWSUTF8FSENCODING`` and function to
+``sys._enablewindowsutf8fsencoding()`` to allow enabling the functionality
+on a case-by-case basis, as opposed to disabling it.
+
+It is expected that if we cannot feasibly make the change for 3.6 due to
+compatibility concerns, it will not be possible to make the change at any later
+time in Python 3.x.
+
+Affected Modules
+----------------
+
+This PEP implicitly includes all modules within Python that either pass path
+names to the operating system, or otherwise use ``sys.getfilesystemencoding()``.
+
+As of 3.6.0a4, the following modules require modification:
+
+* ``os``
+* ``_overlapped``
+* ``_socket``
+* ``subprocess``
+* ``zipimport``
+
+The following modules use ``sys.getfilesystemencoding()`` but do not need
+modification:
+
+* ``gc`` (already assumes bytes are utf-8)
+* ``grp`` (not compiled for Windows)
+* ``http.server`` (correctly includes codec name with transmitted data)
+* ``idlelib.editor`` (should not be needed; has fallback handling)
+* ``nis`` (not compiled for Windows)
+* ``pwd`` (not compiled for Windows)
+* ``spwd`` (not compiled for Windows)
+* ``_ssl`` (only used for ASCII constants)
+* ``tarfile`` (code unused on Windows)
+* ``_tkinter`` (already assumes bytes are utf-8)
+* ``wsgiref`` (assumed as the default encoding for unknown environments)
+* ``zipapp`` (code unused on Windows)
+
+The following native code uses one of the encoding or decoding functions, but
+does not require any modification:
+
+* ``Parser/parsetok.c`` (docs already specify ``sys.getfilesystemencoding()``)
+* ``Python/ast.c`` (docs already specify ``sys.getfilesystemencoding()``)
+* ``Python/compile.c`` (undocumented, but Python filesystem encoding implied)
+* ``Python/errors.c`` (docs already specify ``os.fsdecode()``)
+* ``Python/fileutils.c`` (code unused on Windows)
+* ``Python/future.c`` (undocumented, but Python filesystem encoding implied)
+* ``Python/import.c`` (docs already specify utf-8)
+* ``Python/importdl.c`` (code unused on Windows)
+* ``Python/pythonrun.c`` (docs already specify ``sys.getfilesystemencoding()``)
+* ``Python/symtable.c`` (undocumented, but Python filesystem encoding implied)
+* ``Python/thread.c`` (code unused on Windows)
+* ``Python/traceback.c`` (encodes correctly for comparing strings)
+* ``Python/_warnings.c`` (docs already specify ``os.fsdecode()``)
+
+Rejected Alternatives
+=====================
+
+Use strict mbcs decoding
+------------------------
+
+This is essentially the same as the proposed change, but instead of changing
+``sys.getfilesystemencoding()`` to utf-8 it is changed to mbcs (which
+dynamically maps to the active code page).
+
+This approach allows the use of new functionality that is only available as \*W
+APIs and also detection of encoding/decoding errors. For example, rather than
+silently replacing Unicode characters with '?', it would be possible to warn or
+fail the operation.
+
+Compared to the proposed fix, this could enable some new functionality but does
+not fix any of the problems described initially. New runtime errors may cause
+some problems to be more obvious and lead to fixes, provided library maintainers
+are interested in supporting Windows and adding a separate code path to treat
+filesystem paths as strings.
+
+Making the encoding mbcs without strict errors is equivalent to the legacy-mode
+switch being enabled by default. This is a possible course of action if there is
+significant breakage of actual code and a need to extend the deprecation period,
+but still a desire to have the simplifications to the CPython source.
+
+Make bytes paths an error on Windows
+------------------------------------
+
+By preventing the use of bytes paths on Windows completely we prevent users from
+hitting encoding issues.
+
+However, the motivation for this PEP is to increase the likelihood that code
+written on POSIX will also work correctly on Windows. This alternative would
+move in the other direction and make such code completely incompatible. As this
+does not benefit users in any way, we reject it.
+
+Make bytes paths an error on all platforms
+------------------------------------------
+
+By deprecating and then disabling the use of bytes paths on all platforms we
+prevent users from hitting encoding issues regardless of where the code was
+originally written. This would require a full deprecation cycle, as there are
+currently no warnings on platforms other than Windows.
+
+This is likely to be seen as a hostile action against Python developers in
+general, and as such is rejected at this time.
+
+Code that may break
+===================
+
+The following code patterns may break or see different behaviour as a result of
+this change. Each of these examples would have been fragile in code intended for
+cross-platform use. The suggested fixes demonstrate the most compatible way to
+handle path encoding issues across all platforms and across multiple Python
+versions.
+
+Note that all of these examples produce deprecation warnings on Python 3.3 and
+later.
+
+Not managing encodings across boundaries
+----------------------------------------
+
+Code that does not manage encodings when crossing protocol boundaries may
+currently be working by chance, but could encounter issues when either encoding
+changes. Note that the source of ``filename`` may be any function that returns
+a bytes object, as illustrated in a second example below::
+
+    >>> filename = open('filename_in_mbcs.txt', 'rb').read()
+    >>> text = open(filename, 'r').read()
+
+To correct this code, the encoding of the bytes in ``filename`` should be
+specified, either when reading from the file or before using the value::
+
+    >>> # Fix 1: Open file as text (default encoding)
+    >>> filename = open('filename_in_mbcs.txt', 'r').read()
+    >>> text = open(filename, 'r').read()
+
+    >>> # Fix 2: Open file as text (explicit encoding)
+    >>> filename = open('filename_in_mbcs.txt', 'r', encoding='mbcs').read()
+    >>> text = open(filename, 'r').read()
+
+    >>> # Fix 3: Explicitly decode the path
+    >>> filename = open('filename_in_mbcs.txt', 'rb').read()
+    >>> text = open(filename.decode('mbcs'), 'r').read()
+
+Where the creator of ``filename`` is separated from the user of ``filename``,
+the encoding is important information to include::
+
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('mbcs')
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+To fix this code for best compatibility across operating systems and Python
+versions, the filename should be exposed as str::
+
+    >>> # Fix 1: Expose as str
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
+Alternatively, the encoding used for the path needs to be made available to the
+user. Specifying ``os.fsencode()`` (or ``sys.getfilesystemencoding()``) is an
+acceptable choice, or a new attribute could be added with the exact encoding::
+
+    >>> # Fix 2: Use fsencode
+    >>> some_object.filename = os.fsencode(r'C:\Users\Steve\Documents\my_file.txt')
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+
+    >>> # Fix 3: Expose as explicit encoding
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('cp437')
+    >>> some_object.filename_encoding = 'cp437'
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> filename = filename.decode(some_object.filename_encoding)
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
+
+Explicitly using 'mbcs'
+-----------------------
+
+Code that explicitly encodes text using 'mbcs' before passing to file system
+APIs is now passing incorrectly encoded bytes. Note that the source of
+``filename`` in this example is not relevant, provided that it is a str::
+
+    >>> filename = open('files.txt', 'r').readline().rstrip()
+    >>> text = open(filename.encode('mbcs'), 'r')
+
+To correct this code, the string should be passed without explicit encoding, or
+should use ``os.fsencode()``::
+
+    >>> # Fix 1: Do not encode the string
+    >>> filename = open('files.txt', 'r').readline().rstrip()
+    >>> text = open(filename, 'r')
+
+    >>> # Fix 2: Use correct encoding
+    >>> filename = open('files.txt', 'r').readline().rstrip()
+    >>> text = open(os.fsencode(filename), 'r')
+
+
+References
+==========
+
+.. _Naming Files, Paths, and Namespaces:
+   https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247.aspx
+
+Copyright
+=========
+
+This document has been placed in the public domain.
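
The transcoding rules described in the Proposal section of PEP 529 above can be sketched in plain Python. This is only an illustration under stated assumptions, not the CPython implementation: the helper names ``utf16le_to_path_bytes`` and ``path_bytes_to_utf16le`` are hypothetical, and ``cp1252`` stands in for "the active code page" so that the snippet also runs on non-Windows systems. It requires Python 3.6 or later for ``sys.getfilesystemencodeerrors()``.

```python
import os
import sys


def utf16le_to_path_bytes(raw: bytes) -> bytes:
    """Hypothetical helper: UTF-16-LE name (as the OS stores it) -> UTF-8 bytes.

    surrogatepass keeps lone surrogates, which Windows does not reject.
    """
    return raw.decode("utf-16-le", "surrogatepass").encode("utf-8", "surrogatepass")


def path_bytes_to_utf16le(path: bytes) -> bytes:
    """Hypothetical helper: UTF-8 bytes path -> UTF-16-LE for a *W-style API."""
    return path.decode("utf-8", "surrogatepass").encode("utf-16-le", "surrogatepass")


# A name with a non-ASCII character and a lone (unpaired) surrogate.
name = "test\uab00\ud800.txt"
wide = name.encode("utf-16-le", "surrogatepass")

# UTF-8 with surrogatepass round-trips the name without loss.
narrow = utf16le_to_path_bytes(wide)
round_tripped = path_bytes_to_utf16le(narrow)

# UTF-8 bytes stay ASCII-compatible, so byte-level manipulation still works;
# UTF-16-LE bytes contain NUL bytes that break such assumptions.
stem, ext = narrow.rsplit(b".", 1)
has_nul_wide = b"\x00" in wide
has_nul_narrow = b"\x00" in narrow

# Encoding with a code page and 'replace' is lossy (cp1252 is an assumption
# standing in for the user's active code page).
lossy = "test\uab00.txt".encode("cp1252", "replace")

# Manual encoding should use both the encoding and the error mode reported
# by the interpreter rather than assuming either one.
manual = "plain.txt".encode(
    sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()
)
same_as_fsencode = manual == os.fsencode("plain.txt")
```

The round trip through ``utf-8`` with ``surrogatepass`` preserves every code unit, whereas the ``replace`` encoding yields ``b'test?.txt'``, the same kind of loss shown in the ``glob`` demonstration in the Background section.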