PEP 529: Adds sys function for legacy mode, adds "experimental" procedure, and expands examples

Steve Dower 2016-09-06 13:39:54 -07:00
parent d6dd35ec63
commit 10cdc578c1
1 changed files with 89 additions and 9 deletions

@@ -197,7 +197,11 @@ Add legacy mode
---------------
Add a legacy mode flag, enabled by the environment variable
``PYTHONLEGACYWINDOWSFSENCODING``.
``PYTHONLEGACYWINDOWSFSENCODING`` or by a function call to
``sys.enable_legacy_windows_fs_encoding()``. The function call can only be
used to enable the flag and should be used by programs as close to
initialization as possible. Legacy mode cannot be disabled while Python is
running.
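As a rough illustration of the "as close to initialization as possible" advice, a program could attempt the switch at the very top of its entry point. This is only a sketch: ``enable_legacy_fs_encoding`` is a hypothetical helper, the first attribute name is the one proposed in this text, and the second is the private spelling that released CPython 3.6+ exposes on Windows. On other platforms neither attribute exists and the helper does nothing.

```python
import sys

# Candidate attribute names on the sys module. The first is the name proposed
# in this text; the second is an assumption about released CPython builds on
# Windows. Neither is expected to exist on non-Windows platforms.
_CANDIDATE_HOOKS = (
    "enable_legacy_windows_fs_encoding",
    "_enablelegacywindowsfsencoding",
)


def enable_legacy_fs_encoding():
    """Enable legacy mode if a hook is available; return True if one was called."""
    for name in _CANDIDATE_HOOKS:
        hook = getattr(sys, name, None)
        if hook is not None:
            hook()  # one-way switch: legacy mode cannot be disabled afterwards
            return True
    return False
```

Calling ``enable_legacy_fs_encoding()`` before any path is encoded or decoded keeps the rest of the program on a single, consistent filesystem encoding.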
When this flag is set, the default filesystem encoding is set to mbcs rather
than utf-8, and the error mode is set to ``replace`` rather than
@@ -213,6 +217,27 @@ this is no longer the case, and that paths when encoded as bytes should use
whatever is returned from ``sys.getfilesystemencoding()`` rather than the user's
active code page.
Beta experiment
---------------
To assist with determining the impact of this change, we propose applying it to
3.6.0b1 provisionally, with the intent of making a final decision before
3.6.0rc1.
During the experiment period, decoding and encoding exception messages will be
expanded to include a link to an active online discussion and encourage
reporting of problems.
If it is decided to revert the functionality for 3.6.0rc1, the implementation
change would be to permanently enable the legacy mode flag, and to rename the
environment variable to ``PYTHONWINDOWSUTF8FSENCODING`` and the function to
``sys.enable_windows_utf8_fs_encoding()``, so that the new behaviour can be
enabled on a case-by-case basis rather than disabled.
It is expected that if we cannot feasibly make the change for 3.6 due to
compatibility concerns, it will not be possible to make the change at any later
time in Python 3.x.
Affected Modules
----------------
@@ -312,7 +337,10 @@ Code that may break
===================
The following code patterns may break or see different behaviour as a result of
this change.
this change. Each of these examples would have been fragile in code intended for
cross-platform use. The suggested fixes demonstrate the most compatible way to
handle path encoding issues across all platforms and across multiple Python
versions.
Note that all of these examples produce deprecation warnings on Python 3.3 and
later.
@@ -322,7 +350,8 @@ Not managing encodings across boundaries
Code that does not manage encodings when crossing protocol boundaries may
currently be working by chance, but could encounter issues when either encoding
changes. For example::
changes. Note that the source of ``filename`` may be any function that returns
a bytes object, as illustrated in a second example below::
>>> filename = open('filename_in_mbcs.txt', 'rb').read()
>>> text = open(filename, 'r').read()
@@ -330,33 +359,84 @@ changes. For example::
To correct this code, the encoding of the bytes in ``filename`` should be
specified, either when reading from the file or before using the value::
>>> # Fix 1: Open file as text
>>> # Fix 1: Open file as text (default encoding)
>>> filename = open('filename_in_mbcs.txt', 'r').read()
>>> text = open(filename, 'r').read()
>>> # Fix 2: Open file as text (explicit encoding)
>>> filename = open('filename_in_mbcs.txt', 'r', encoding='mbcs').read()
>>> text = open(filename, 'r').read()
>>> # Fix 2: Decode path
>>> # Fix 3: Explicitly decode the path
>>> filename = open('filename_in_mbcs.txt', 'rb').read()
>>> text = open(filename.decode('mbcs'), 'r').read()
Where the creator of ``filename`` is separated from the user of ``filename``,
the encoding is important information to include::
>>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('mbcs')
>>> filename = some_object.filename
>>> type(filename)
<class 'bytes'>
>>> text = open(filename, 'r').read()
To fix this code for best compatibility across operating systems and Python
versions, the filename should be exposed as str::
>>> # Fix 1: Expose as str
>>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'
>>> filename = some_object.filename
>>> type(filename)
<class 'str'>
>>> text = open(filename, 'r').read()
Alternatively, the encoding used for the path needs to be made available to the
user. Specifying ``os.fsencode()`` (or ``sys.getfilesystemencoding()``) is an
acceptable choice, or a new attribute could be added with the exact encoding::
>>> # Fix 2: Use fsencode
>>> import os
>>> some_object.filename = os.fsencode(r'C:\Users\Steve\Documents\my_file.txt')
>>> filename = some_object.filename
>>> type(filename)
<class 'bytes'>
>>> text = open(filename, 'r').read()
>>> # Fix 3: Expose as explicit encoding
>>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('cp437')
>>> some_object.filename_encoding = 'cp437'
>>> filename = some_object.filename
>>> type(filename)
<class 'bytes'>
>>> filename = filename.decode(some_object.filename_encoding)
>>> type(filename)
<class 'str'>
>>> text = open(filename, 'r').read()
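The fixes above can be folded into one small consumer-side helper. The following is a minimal sketch, not part of the proposal: ``to_text_path`` is a hypothetical name, and it assumes the producer either states the encoding explicitly or used ``os.fsencode()``.

```python
import os


def to_text_path(path, encoding=None):
    """Return ``path`` as str (hypothetical helper, for illustration only).

    - str paths are returned unchanged;
    - bytes paths are decoded with ``encoding`` when the producer supplied one;
    - otherwise bytes paths are assumed to come from os.fsencode() and are
      decoded with os.fsdecode(), which uses sys.getfilesystemencoding().
    """
    if isinstance(path, bytes):
        if encoding is not None:
            return path.decode(encoding)
        return os.fsdecode(path)
    return path
```

A consumer would then call ``open(to_text_path(some_object.filename, getattr(some_object, 'filename_encoding', None)), 'r')`` and no longer depends on the user's active code page.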
Explicitly using 'mbcs'
-----------------------
Code that explicitly encodes text using 'mbcs' before passing to file system
APIs is now passing incorrectly encoded bytes. For example::
APIs is now passing incorrectly encoded bytes. Note that the source of
``filename`` in this example is not relevant, provided that it is a str::
>>> filename = open('files.txt', 'r').readline()
>>> filename = open('files.txt', 'r').readline().rstrip()
>>> text = open(filename.encode('mbcs'), 'r')
To correct this code, the string should be passed without explicit encoding, or
should use ``os.fsencode()``::
>>> # Fix 1: Do not encode the string
>>> filename = open('files.txt', 'r').readline()
>>> filename = open('files.txt', 'r').readline().rstrip()
>>> text = open(filename, 'r')
>>> # Fix 2: Use correct encoding
>>> import os
>>> filename = open('files.txt', 'r').readline()
>>> filename = open('files.txt', 'r').readline().rstrip()
>>> text = open(os.fsencode(filename), 'r')
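To see why ``os.fsencode()`` is the safe choice, the sketch below writes a file through a str path and reads it back through the bytes path that ``os.fsencode()`` produces. The helper name ``read_via_bytes_path`` and the temporary file are assumptions made for the demonstration.

```python
import os
import tempfile


def read_via_bytes_path(text_path):
    """Open a file through the bytes form of its path (illustrative helper)."""
    # os.fsencode() uses sys.getfilesystemencoding(), so the bytes always
    # match what the file system APIs of the running interpreter expect.
    bytes_path = os.fsencode(text_path)
    with open(bytes_path, "r", encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")
    result = read_via_bytes_path(path)
    # The bytes path decodes back to the original str path.
    round_trip = os.fsdecode(os.fsencode(path))
```

The same code works whether the filesystem encoding is mbcs or utf-8, which a hard-coded ``.encode('mbcs')`` cannot guarantee.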