PEP 529: Adds sys function for legacy mode, adds "experimental" procedure, and expands examples
parent d6dd35ec63
commit 10cdc578c1

98 pep-0529.txt
@@ -197,7 +197,11 @@ Add legacy mode
 ---------------
 
 Add a legacy mode flag, enabled by the environment variable
-``PYTHONLEGACYWINDOWSFSENCODING``.
+``PYTHONLEGACYWINDOWSFSENCODING`` or by a function call to
+``sys.enable_legacy_windows_fs_encoding()``. The function call can only be
+used to enable the flag and should be used by programs as close to
+initialization as possible. Legacy mode cannot be disabled while Python is
+running.
 
 When this flag is set, the default filesystem encoding is set to mbcs rather
 than utf-8, and the error mode is set to ``replace`` rather than
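Note on the hunk above: a minimal sketch of how a program might opt in to legacy mode close to initialization. The function name in this commit, ``sys.enable_legacy_windows_fs_encoding()``, is the proposal as written here; released CPython 3.6 and later expose ``sys._enablelegacywindowsfsencoding()`` on Windows only. The helper ``enable_legacy_fs_encoding`` is hypothetical and probes for either name rather than assuming one exists.

```python
import sys


def enable_legacy_fs_encoding():
    """Try to switch to the legacy (mbcs) filesystem encoding.

    Returns True if a switch function was found and called, else False.
    """
    # Assumption: probe for the name proposed in this commit first, then the
    # name that released CPython provides on Windows. Neither attribute is
    # present on other platforms, so the helper returns False there.
    for name in ("enable_legacy_windows_fs_encoding",
                 "_enablelegacywindowsfsencoding"):
        func = getattr(sys, name, None)
        if callable(func):
            func()
            return True
    return False


# Call as early as possible, before any paths are encoded or decoded.
switched = enable_legacy_fs_encoding()
print(switched, sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
```

Because the flag can only be enabled and never disabled, a one-way helper like this matches the behaviour the text describes.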
@@ -213,6 +217,27 @@ this is no longer the case, and that paths when encoded as bytes should use
 whatever is returned from ``sys.getfilesystemencoding()`` rather than the user's
 active code page.
 
+Beta experiment
+---------------
+
+To assist with determining the impact of this change, we propose applying it to
+3.6.0b1 provisionally with the intent being to make a final decision before
+3.6.0rc1.
+
+During the experiment period, decoding and encoding exception messages will be
+expanded to include a link to an active online discussion and encourage
+reporting of problems.
+
+If it is decided to revert the functionality for 3.6.0rc1, the implementation
+change would be to permanently enable the legacy mode flag, change the
+environment variable to ``PYTHONWINDOWSUTF8FSENCODING`` and function to
+``sys.enable_windows_utf8_fs_encoding()`` to allow enabling the functionality
+on a case-by-case basis, as opposed to disabling it.
+
+It is expected that if we cannot feasibly make the change for 3.6 due to
+compatibility concerns, it will not be possible to make the change at any later
+time in Python 3.x.
+
 Affected Modules
 ----------------
 
@@ -312,7 +337,10 @@ Code that may break
 ===================
 
 The following code patterns may break or see different behaviour as a result of
-this change.
+this change. Each of these examples would have been fragile in code intended for
+cross-platform use. The suggested fixes demonstrate the most compatible way to
+handle path encoding issues across all platforms and across multiple Python
+versions.
 
 Note that all of these examples produce deprecation warnings on Python 3.3 and
 later.
@@ -322,7 +350,8 @@ Not managing encodings across boundaries
 
 Code that does not manage encodings when crossing protocol boundaries may
 currently be working by chance, but could encounter issues when either encoding
-changes. For example::
+changes. Note that the source of ``filename`` may be any function that returns
+a bytes object, as illustrated in a second example below::
 
     >>> filename = open('filename_in_mbcs.txt', 'rb').read()
     >>> text = open(filename, 'r').read()
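Note on the hunk above: a small sketch of why bytes paths that "work by chance" stop working once the filesystem encoding changes. The ``mbcs`` codec only exists on Windows, so ``cp1252`` is used here as an assumed stand-in for a Western European active code page; the file name and the ``try_decode`` helper are hypothetical.

```python
# A file name containing one non-ASCII character (U+00E9).
name = "caf\u00e9.txt"

# Assumption: cp1252 stands in for 'mbcs' so the sketch runs on any platform.
legacy_bytes = name.encode("cp1252")
utf8_bytes = name.encode("utf-8")


def try_decode(raw, encoding):
    """Decode raw with encoding, returning None if the bytes are invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Bytes written under the legacy code page are not valid UTF-8 ...
print(try_decode(legacy_bytes, "utf-8"))
# ... and UTF-8 bytes read back under the legacy code page decode to the
# wrong characters instead of raising.
print(try_decode(utf8_bytes, "cp1252"))
```

Only the pairing of bytes with the encoding that produced them round-trips, which is why the fixes in the following hunk carry the encoding along explicitly.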
@@ -330,33 +359,84 @@ changes. For example::
 To correct this code, the encoding of the bytes in ``filename`` should be
 specified, either when reading from the file or before using the value::
 
-    >>> # Fix 1: Open file as text
+    >>> # Fix 1: Open file as text (default encoding)
+    >>> filename = open('filename_in_mbcs.txt', 'r').read()
+    >>> text = open(filename, 'r').read()
+
+    >>> # Fix 2: Open file as text (explicit encoding)
     >>> filename = open('filename_in_mbcs.txt', 'r', encoding='mbcs').read()
     >>> text = open(filename, 'r').read()
 
-    >>> # Fix 2: Decode path
+    >>> # Fix 3: Explicitly decode the path
     >>> filename = open('filename_in_mbcs.txt', 'rb').read()
     >>> text = open(filename.decode('mbcs'), 'r').read()
 
+Where the creator of ``filename`` is separated from the user of ``filename``,
+the encoding is important information to include::
+
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('mbcs')
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+To fix this code for best compatibility across operating systems and Python
+versions, the filename should be exposed as str::
+
+    >>> # Fix 1: Expose as str
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
+Alternatively, the encoding used for the path needs to be made available to the
+user. Specifying ``os.fsencode()`` (or ``sys.getfilesystemencoding()``) is an
+acceptable choice, or a new attribute could be added with the exact encoding::
+
+    >>> # Fix 2: Use fsencode
+    >>> some_object.filename = os.fsencode(r'C:\Users\Steve\Documents\my_file.txt')
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+
+    >>> # Fix 3: Expose as explicit encoding
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('cp437')
+    >>> some_object.filename_encoding = 'cp437'
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> filename = filename.decode(some_object.filename_encoding)
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
 
 Explicitly using 'mbcs'
 -----------------------
 
 Code that explicitly encodes text using 'mbcs' before passing to file system
-APIs is now passing incorrectly encoded bytes. For example::
+APIs is now passing incorrectly encoded bytes. Note that the source of
+``filename`` in this example is not relevant, provided that it is a str::
 
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
     >>> text = open(filename.encode('mbcs'), 'r')
 
 To correct this code, the string should be passed without explicit encoding, or
 should use ``os.fsencode()``::
 
     >>> # Fix 1: Do not encode the string
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
     >>> text = open(filename, 'r')
 
     >>> # Fix 2: Use correct encoding
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
     >>> text = open(os.fsencode(filename), 'r')
 
 
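Note on the hunk above: the fixes it lists (expose paths as ``str``, use ``os.fsencode()``, or ship the encoding alongside the bytes) can be combined in one small normalizing helper. This is only a sketch under stated assumptions: ``path_to_str`` and its ``encoding`` parameter are hypothetical names, ``cp437`` is taken from the example in the diff, and ``os.fspath()`` requires Python 3.6 or later.

```python
import os


def path_to_str(path, encoding=None):
    """Return path as str so it can be passed to open() on any platform.

    bytes are decoded with the explicit encoding when the producer supplied
    one; otherwise they are assumed to come from os.fsencode() and are
    reversed with os.fsdecode(). str and os.PathLike values pass through
    os.fspath().
    """
    if isinstance(path, bytes):
        if encoding is not None:
            return path.decode(encoding)
        return os.fsdecode(path)
    return os.fspath(path)


# Producer ships bytes plus the exact encoding used (like "Fix 3" above).
raw = "my_file.txt".encode("cp437")
print(path_to_str(raw, encoding="cp437"))

# Producer used os.fsencode() (like "Fix 2" above); os.fsdecode() reverses it.
print(path_to_str(os.fsencode("my_file.txt")))
```

Normalizing to ``str`` at the boundary keeps the rest of the program independent of whichever filesystem encoding is active.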