PEP 529: Adds sys function for legacy mode, adds "experimental" procedure, and expands examples

2016-09-06 13:39:54 -07:00 · 2016-09-06 13:39:54 -07:00 · 10cdc578c1
parent d6dd35ec63
commit 10cdc578c1
1 changed files with 89 additions and 9 deletions
--- a/pep-0529.txt
+++ b/pep-0529.txt
@@ -197,7 +197,11 @@ Add legacy mode
 ---------------

 Add a legacy mode flag, enabled by the environment variable
-``PYTHONLEGACYWINDOWSFSENCODING``.
+``PYTHONLEGACYWINDOWSFSENCODING`` or by calling
+``sys.enable_legacy_windows_fs_encoding()``. The function can only enable the
+flag, not clear it, and programs should call it as early during initialization
+as possible. Once enabled, legacy mode cannot be disabled while Python is
+running.
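A minimal sketch of how a program might inspect the filesystem encoding and, where supported, opt into legacy mode early in start-up. It assumes the enabling function exists under the name proposed above (``sys.enable_legacy_windows_fs_encoding``) and looks it up with ``getattr`` so the sketch still runs on interpreters or platforms that lack it. It also assumes that any non-empty value of the environment variable counts as setting it. The helper names are hypothetical.

```python
import os
import sys

# Name proposed by this PEP. It is looked up dynamically because it may not
# exist on the running interpreter or platform (assumption, not a confirmed API).
PROPOSED_ENABLE_FUNCTION = "enable_legacy_windows_fs_encoding"
LEGACY_ENV_VAR = "PYTHONLEGACYWINDOWSFSENCODING"


def legacy_mode_requested(environ=os.environ):
    """Return True if the legacy-mode environment variable is set.

    Assumption: any non-empty value enables the flag.
    """
    return bool(environ.get(LEGACY_ENV_VAR))


def enable_legacy_mode_early():
    """Try to enable legacy mode; return True if the call was made.

    Call this as close to program start-up as possible, before any paths
    have been encoded or decoded. There is no matching "disable" call.
    """
    enable = getattr(sys, PROPOSED_ENABLE_FUNCTION, None)
    if enable is None:
        return False
    enable()
    return True


def current_fs_encoding():
    """Return (encoding, errors) currently used for filesystem paths."""
    # sys.getfilesystemencodeerrors() only exists on Python 3.6 and later.
    get_errors = getattr(sys, "getfilesystemencodeerrors", None)
    errors = get_errors() if get_errors is not None else None
    return sys.getfilesystemencoding(), errors


if __name__ == "__main__":
    enabled = enable_legacy_mode_early()
    encoding, errors = current_fs_encoding()
    print("legacy mode call made:", enabled)
    print("environment variable set:", legacy_mode_requested())
    print("filesystem encoding:", encoding, "errors:", errors)
```

Because the flag can only be turned on, the enabling call sits at the very top of the ``__main__`` block rather than being deferred.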

 When this flag is set, the default filesystem encoding is set to mbcs rather
 than utf-8, and the error mode is set to ``replace`` rather than
@@ -213,6 +217,27 @@ this is no longer the case, and that paths when encoded as bytes should use
 whatever is returned from ``sys.getfilesystemencoding()`` rather than the user's
 active code page.

+Beta experiment
+---------------
+
+To help determine the impact of this change, we propose applying it
+provisionally to 3.6.0b1, with the intent of making a final decision before
+3.6.0rc1.
+
+During the experiment period, the messages of encoding and decoding exceptions
+will be expanded to include a link to an active online discussion and to
+encourage users to report problems.
+
+If it is decided to revert the functionality for 3.6.0rc1, the implementation
+would instead make legacy mode the permanent default, and rename the environment
+variable to ``PYTHONWINDOWSUTF8FSENCODING`` and the function to
+``sys.enable_windows_utf8_fs_encoding()``, so that the new behaviour is enabled
+on a case-by-case basis rather than disabled.
+
+It is expected that if we cannot feasibly make the change for 3.6 due to
+compatibility concerns, it will not be possible to make the change at any later
+time in Python 3.x.
+
 Affected Modules
 ----------------

@@ -312,7 +337,10 @@ Code that may break
 ===================

 The following code patterns may break or see different behaviour as a result of
-this change.
+this change. Each of these examples would have been fragile in code intended for
+cross-platform use. The suggested fixes demonstrate the most compatible way to
+handle path encoding issues across all platforms and across multiple Python
+versions.

 Note that all of these examples produce deprecation warnings on Python 3.3 and
 later.
@@ -322,7 +350,8 @@ Not managing encodings across boundaries

 Code that does not manage encodings when crossing protocol boundaries may
 currently be working by chance, but could encounter issues when either encoding
-changes. For example::
+changes. Note that the source of ``filename`` may be any function that returns
+a bytes object, as illustrated in a second example below::

    >>> filename = open('filename_in_mbcs.txt', 'rb').read()
    >>> text = open(filename, 'r').read()
@@ -330,33 +359,84 @@ changes. For example::
 To correct this code, the encoding of the bytes in ``filename`` should be
 specified, either when reading from the file or before using the value::

-    >>> # Fix 1: Open file as text
+    >>> # Fix 1: Open file as text (default encoding)
+    >>> filename = open('filename_in_mbcs.txt', 'r').read()
+    >>> text = open(filename, 'r').read()
+
+    >>> # Fix 2: Open file as text (explicit encoding)
    >>> filename = open('filename_in_mbcs.txt', 'r', encoding='mbcs').read()
    >>> text = open(filename, 'r').read()

-    >>> # Fix 2: Decode path
+    >>> # Fix 3: Explicitly decode the path
    >>> filename = open('filename_in_mbcs.txt', 'rb').read()
    >>> text = open(filename.decode('mbcs'), 'r').read()
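Fix 2 and Fix 3 can be exercised end to end with a small, self-contained sketch: it writes a file name into a file using a known encoding, reads it back both ways, and opens the result. Because the 'mbcs' codec exists only on Windows, the sketch substitutes 'cp1252' purely so that it runs anywhere; the helper names are hypothetical. Fix 1 is left out because the default text encoding depends on the platform and locale.

```python
import os
import tempfile

# Assumption: 'cp1252' stands in for 'mbcs' so that the sketch also runs
# outside Windows, where the 'mbcs' codec is not available.
LIST_ENCODING = "cp1252"


def read_filename_as_text(list_path, encoding=LIST_ENCODING):
    """Fix 2: open the file in text mode with an explicit encoding."""
    with open(list_path, "r", encoding=encoding) as f:
        return f.read()


def read_filename_and_decode(list_path, encoding=LIST_ENCODING):
    """Fix 3: read raw bytes, then decode them explicitly."""
    with open(list_path, "rb") as f:
        return f.read().decode(encoding)


def demo():
    """Write a path in LIST_ENCODING, read it back both ways, and open it."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "caf\u00e9.txt")
        with open(target, "w", encoding="utf-8") as f:
            f.write("hello")
        listing = os.path.join(tmp, "filename_in_cp1252.txt")
        with open(listing, "wb") as f:
            f.write(target.encode(LIST_ENCODING))
        as_text = read_filename_as_text(listing)
        decoded = read_filename_and_decode(listing)
        # Both fixes yield a str, which open() accepts on every platform.
        with open(as_text, "r", encoding="utf-8") as f:
            contents = f.read()
        return as_text == target, decoded == target, contents


if __name__ == "__main__":
    print(demo())
```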

+Where the creator of ``filename`` is separated from the user of ``filename``,
+the encoding is important information to include::
+
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('mbcs')
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+To fix this code for best compatibility across operating systems and Python
+versions, the filename should be exposed as str::
+
+    >>> # Fix 1: Expose as str
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'
+    
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
+Alternatively, the encoding used for the path must be made available to the
+user. Specifying ``os.fsencode()`` (or ``sys.getfilesystemencoding()``) is one
+acceptable choice; another is to add a new attribute holding the exact encoding::
+
+    >>> # Fix 2: Use fsencode
+    >>> some_object.filename = os.fsencode(r'C:\Users\Steve\Documents\my_file.txt')
+    
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+
+    >>> # Fix 3: Expose the encoding explicitly
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('cp437')
+    >>> some_object.filename_encoding = 'cp437'
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> filename = filename.decode(some_object.filename_encoding)
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
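The attribute-based fixes above can be condensed into a small sketch. ``FileRecord`` is a hypothetical stand-in for ``some_object`` that stores the path as bytes together with the encoding used to produce them (Fix 3). ``to_str_path`` is a hypothetical consumer-side helper that always hands ``open()`` a str: it decodes bytes with an explicitly supplied encoding when one is known, and otherwise assumes the bytes came from ``os.fsencode()`` (Fix 2) and reverses them with ``os.fsdecode()``.

```python
import os


class FileRecord:
    """Hypothetical object storing a path as bytes plus its encoding (Fix 3)."""

    def __init__(self, path, encoding):
        self.filename = path.encode(encoding)
        self.filename_encoding = encoding

    def filename_as_str(self):
        # Decode with the recorded encoding instead of guessing the active
        # code page or the filesystem encoding.
        return self.filename.decode(self.filename_encoding)


def to_str_path(value, encoding=None):
    """Return a str path from a str or bytes value.

    If ``encoding`` is given, bytes are decoded with it. Otherwise the bytes
    are assumed to come from os.fsencode() (Fix 2) and are decoded with
    os.fsdecode(), which uses the filesystem encoding and error handler.
    """
    if isinstance(value, str):
        return value
    if encoding is not None:
        return value.decode(encoding)
    return os.fsdecode(value)


if __name__ == "__main__":
    record = FileRecord(r"C:\Users\Steve\Documents\my_file.txt", "cp437")
    print(type(record.filename), record.filename_as_str())
    print(to_str_path(record.filename, record.filename_encoding))
```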

 Explicitly using 'mbcs'
 -----------------------

 Code that explicitly encodes text using 'mbcs' before passing to file system
-APIs is now passing incorrectly encoded bytes. For example::
+APIs is now passing incorrectly encoded bytes. Note that the source of
+``filename`` in this example is not relevant, provided that it is a str::

-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
    >>> text = open(filename.encode('mbcs'), 'r')

 To correct this code, the string should be passed without explicit encoding, or
 should use ``os.fsencode()``::

    >>> # Fix 1: Do not encode the string
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
    >>> text = open(filename, 'r')

    >>> # Fix 2: Use correct encoding
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
    >>> text = open(os.fsencode(filename), 'r')
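Both fixes can be sketched for a whole list of file names. This assumes the listing file is UTF-8 encoded with one path per line, and the function names are hypothetical. Paths are either passed to ``open()`` as str (Fix 1) or converted with ``os.fsencode()`` (Fix 2); neither path uses a hard-coded 'mbcs' encoding.

```python
import os
import tempfile


def read_listed_paths(list_path):
    """Return the non-blank paths named in list_path, one per line, as str.

    Assumption: the listing file is UTF-8 encoded.
    """
    with open(list_path, "r", encoding="utf-8") as listing:
        # Text mode translates line endings, so stripping "\n" suffices;
        # lines containing only whitespace are skipped.
        return [line.rstrip("\n") for line in listing if line.strip()]


def read_all(paths, as_bytes=False):
    """Read each file, passing str (Fix 1) or os.fsencode() bytes (Fix 2)."""
    contents = {}
    for path in paths:
        # os.fsencode() uses the filesystem encoding and error handler, so
        # the bytes are what the interpreter itself expects for paths.
        target = os.fsencode(path) if as_bytes else path
        with open(target, "r", encoding="utf-8") as f:
            contents[path] = f.read()
    return contents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data.txt")
        with open(data, "w", encoding="utf-8") as f:
            f.write("hello")
        files_txt = os.path.join(tmp, "files.txt")
        with open(files_txt, "w", encoding="utf-8") as f:
            f.write(data + "\n")
        paths = read_listed_paths(files_txt)
        print(read_all(paths))
        print(read_all(paths, as_bytes=True))
```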