PEP 529: Adds sys function for legacy mode, adds "experimental" procedure, and expands examples
parent d6dd35ec63
commit 10cdc578c1
98
pep-0529.txt
@@ -197,7 +197,11 @@ Add legacy mode
 ---------------
 
 Add a legacy mode flag, enabled by the environment variable
-``PYTHONLEGACYWINDOWSFSENCODING``.
+``PYTHONLEGACYWINDOWSFSENCODING`` or by a function call to
+``sys.enable_legacy_windows_fs_encoding()``. The function call can only be
+used to enable the flag and should be used by programs as close to
+initialization as possible. Legacy mode cannot be disabled while Python is
+running.
 
 When this flag is set, the default filesystem encoding is set to mbcs rather
 than utf-8, and the error mode is set to ``replace`` rather than
@@ -213,6 +217,27 @@ this is no longer the case, and that paths when encoded as bytes should use
 whatever is returned from ``sys.getfilesystemencoding()`` rather than the user's
 active code page.
 
+Beta experiment
+---------------
+
+To assist with determining the impact of this change, we propose applying it to
+3.6.0b1 provisionally with the intent being to make a final decision before
+3.6.0rc1.
+
+During the experiment period, decoding and encoding exception messages will be
+expanded to include a link to an active online discussion and encourage
+reporting of problems.
+
+If it is decided to revert the functionality for 3.6.0rc1, the implementation
+change would be to permanently enable the legacy mode flag, change the
+environment variable to ``PYTHONWINDOWSUTF8FSENCODING`` and function to
+``sys.enable_windows_utf8_fs_encoding()`` to allow enabling the functionality
+on a case-by-case basis, as opposed to disabling it.
+
+It is expected that if we cannot feasibly make the change for 3.6 due to
+compatibility concerns, it will not be possible to make the change at any later
+time in Python 3.x.
+
 Affected Modules
 ----------------
 
@@ -312,7 +337,10 @@ Code that may break
 ===================
 
 The following code patterns may break or see different behaviour as a result of
-this change.
+this change. Each of these examples would have been fragile in code intended for
+cross-platform use. The suggested fixes demonstrate the most compatible way to
+handle path encoding issues across all platforms and across multiple Python
+versions.
 
 Note that all of these examples produce deprecation warnings on Python 3.3 and
 later.
@@ -322,7 +350,8 @@ Not managing encodings across boundaries
 
 Code that does not manage encodings when crossing protocol boundaries may
 currently be working by chance, but could encounter issues when either encoding
-changes. For example::
+changes. Note that the source of ``filename`` may be any function that returns
+a bytes object, as illustrated in a second example below::
 
     >>> filename = open('filename_in_mbcs.txt', 'rb').read()
     >>> text = open(filename, 'r').read()
@@ -330,33 +359,84 @@ changes. For example::
 To correct this code, the encoding of the bytes in ``filename`` should be
 specified, either when reading from the file or before using the value::
 
-    >>> # Fix 1: Open file as text
+    >>> # Fix 1: Open file as text (default encoding)
     >>> filename = open('filename_in_mbcs.txt', 'r').read()
     >>> text = open(filename, 'r').read()
 
+    >>> # Fix 2: Open file as text (explicit encoding)
+    >>> filename = open('filename_in_mbcs.txt', 'r', encoding='mbcs').read()
+    >>> text = open(filename, 'r').read()
+
-    >>> # Fix 2: Decode path
+    >>> # Fix 3: Explicitly decode the path
     >>> filename = open('filename_in_mbcs.txt', 'rb').read()
     >>> text = open(filename.decode('mbcs'), 'r').read()
 
+Where the creator of ``filename`` is separated from the user of ``filename``,
+the encoding is important information to include::
+
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('mbcs')
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+To fix this code for best compatibility across operating systems and Python
+versions, the filename should be exposed as str::
+
+    >>> # Fix 1: Expose as str
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
+Alternatively, the encoding used for the path needs to be made available to the
+user. Specifying ``os.fsencode()`` (or ``sys.getfilesystemencoding()``) is an
+acceptable choice, or a new attribute could be added with the exact encoding::
+
+    >>> # Fix 2: Use fsencode
+    >>> some_object.filename = os.fsencode(r'C:\Users\Steve\Documents\my_file.txt')
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> text = open(filename, 'r').read()
+
+
+    >>> # Fix 3: Expose as explicit encoding
+    >>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('cp437')
+    >>> some_object.filename_encoding = 'cp437'
+
+    >>> filename = some_object.filename
+    >>> type(filename)
+    <class 'bytes'>
+    >>> filename = filename.decode(some_object.filename_encoding)
+    >>> type(filename)
+    <class 'str'>
+    >>> text = open(filename, 'r').read()
+
 
 Explicitly using 'mbcs'
 -----------------------
 
 Code that explicitly encodes text using 'mbcs' before passing to file system
-APIs is now passing incorrectly encoded bytes. For example::
+APIs is now passing incorrectly encoded bytes. Note that the source of
+``filename`` in this example is not relevant, provided that it is a str::
 
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
     >>> text = open(filename.encode('mbcs'), 'r')
 
 To correct this code, the string should be passed without explicit encoding, or
 should use ``os.fsencode()``::
 
     >>> # Fix 1: Do not encode the string
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
     >>> text = open(filename, 'r')
 
     >>> # Fix 2: Use correct encoding
-    >>> filename = open('files.txt', 'r').readline()
+    >>> filename = open('files.txt', 'r').readline().rstrip()
     >>> text = open(os.fsencode(filename), 'r')
 
 
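The "Add legacy mode" hunk says the flag can be enabled by a function call made as close to initialization as possible. A minimal sketch of how a program might do that defensively is below. It assumes the proposed name ``sys.enable_legacy_windows_fs_encoding`` may or may not exist in a given interpreter, and it also tries ``sys._enablelegacywindowsfsencoding``, the private name exposed by released CPython on Windows. The helper ``enable_legacy_fs_encoding`` is hypothetical and not part of the PEP.

```python
import sys

# Candidate attribute names. The first is the name proposed in the diff;
# the second is the private name found in released CPython on Windows.
# Neither is guaranteed to exist on a given interpreter or platform.
_CANDIDATES = (
    "enable_legacy_windows_fs_encoding",
    "_enablelegacywindowsfsencoding",
)


def enable_legacy_fs_encoding():
    """Hypothetical helper: enable legacy mode if a switch is available.

    Returns True when a switch function was found and called, else False.
    """
    for name in _CANDIDATES:
        switch = getattr(sys, name, None)
        if callable(switch):
            switch()  # one-way: legacy mode cannot be disabled afterwards
            return True
    return False


# Call this before any paths are encoded or decoded.
enabled = enable_legacy_fs_encoding()
encoding = sys.getfilesystemencoding()
```

On platforms without either attribute the helper simply reports ``False`` and the filesystem encoding is left unchanged.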
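The expanded examples recommend exposing a filename as ``str`` and converting with ``os.fsencode()`` only where bytes are required. The sketch below shows one way an object could follow that advice. The class ``Document`` and its ``filename_bytes`` property are hypothetical stand-ins for ``some_object`` in the diff; only ``os.fsencode()`` and ``os.fsdecode()`` are real standard-library calls.

```python
import os


class Document:
    """Hypothetical holder that always exposes its path as str."""

    def __init__(self, filename):
        # os.fsdecode() returns str unchanged and decodes bytes using the
        # interpreter's filesystem encoding and error handler.
        self.filename = os.fsdecode(filename)

    @property
    def filename_bytes(self):
        # Encode on demand, so the bytes always match whatever
        # sys.getfilesystemencoding() reports for this interpreter.
        return os.fsencode(self.filename)


doc_from_bytes = Document(b"my_file.txt")
doc_from_str = Document("my_file.txt")
```

Storing ``str`` and encoding at the boundary means callers never need to know which encoding produced the bytes, which is the situation the fragile examples in the diff depend on.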