PEP 528: Readability and style updates

Steve Dower 2016-09-04 22:43:56 -07:00
parent a73c6dfc9c
commit 37567a1d72
1 changed file with 51 additions and 28 deletions

@@ -21,8 +21,7 @@ the active code page.
 This PEP proposes changing the default standard stream implementation on Windows
 to use the Unicode APIs. This will allow users to print and input the full range
 of Unicode characters at the default Windows console. This also requires a
-subtle change to how the tokenizer parses text from readline hooks, that should
-have no backwards compatibility issues.
+subtle change to how the tokenizer parses text from readline hooks.

 Specific Changes
 ================
@@ -46,7 +45,7 @@ utf-16-le and converted into utf-8 when returned to Python.

 The use of an ASCII compatible encoding is required to maintain compatibility
 with code that bypasses the ``TextIOWrapper`` and directly writes ASCII bytes to
-the standard streams (for example, [process_stdinreader.py]_). Code that assumes
+the standard streams (for example, `Twisted's process_stdinreader.py`_). Code that assumes
 a particular encoding for the standard streams other than ASCII will likely
 break.

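The compatibility argument in this paragraph can be illustrated with a short sketch. It is not part of the PEP text, and the sample string and variable names are assumptions chosen for illustration. Pure-ASCII bytes written straight to the buffer are already valid utf-8, whereas a utf-16-le stream would need a different byte sequence for the same text.

```python
# Illustrative sketch only; the sample text and names are hypothetical.
ascii_bytes = "ready\n".encode("ascii")

# utf-8 is ASCII compatible: pure-ASCII bytes are already valid utf-8,
# so code that bypasses the TextIOWrapper keeps working.
decoded_utf8 = ascii_bytes.decode("utf-8")

# utf-16-le is not ASCII compatible: the same text needs two bytes
# per character, so raw ASCII bytes would be misinterpreted.
utf16_bytes = "ready\n".encode("utf-16-le")

print(decoded_utf8 == "ready\n")
print(len(ascii_bytes), len(utf16_bytes))
```

Code that writes anything other than ASCII bytes directly gets no such guarantee, which is the breakage the paragraph warns about.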
@@ -78,8 +77,9 @@ behaviour.
 Alternative Approaches
 ======================

-The ``win_unicode_console`` package [win_unicode_console]_ is a pure-Python
-alternative to changing the default behaviour of the console.
+The `win_unicode_console package`_ is a pure-Python alternative to changing the
+default behaviour of the console. It implements essentially the same
+modifications as described here using pure Python code.

 Code that may break
 ===================
@@ -94,21 +94,21 @@ Assuming stdin/stdout encoding

 Code that assumes that the encoding required by ``sys.stdin.buffer`` or
 ``sys.stdout.buffer`` is ``'mbcs'`` or a more specific encoding may currently be
-working by chance, but could encounter issues under this change. For example::
+working by chance, but could encounter issues under this change. For example:

-    sys.stdout.buffer.write(text.encode('mbcs'))
-    r = sys.stdin.buffer.read(16).decode('cp437')
+    >>> sys.stdout.buffer.write(text.encode('mbcs'))
+    >>> r = sys.stdin.buffer.read(16).decode('cp437')

 To correct this code, the encoding specified on the ``TextIOWrapper`` should be
-used, either implicitly or explicitly::
+used, either implicitly or explicitly:

-    # Fix 1: Use wrapper correctly
-    sys.stdout.write(text)
-    r = sys.stdin.read(16)
+    >>> # Fix 1: Use wrapper correctly
+    >>> sys.stdout.write(text)
+    >>> r = sys.stdin.read(16)

-    # Fix 2: Use encoding explicitly
-    sys.stdout.buffer.write(text.encode(sys.stdout.encoding))
-    r = sys.stdin.buffer.read(16).decode(sys.stdin.encoding)
+    >>> # Fix 2: Use encoding explicitly
+    >>> sys.stdout.buffer.write(text.encode(sys.stdout.encoding))
+    >>> r = sys.stdin.buffer.read(16).decode(sys.stdin.encoding)

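The second fix can be tried without a Windows console. The sketch below is an illustration rather than code from the PEP: it uses an in-memory ``io.TextIOWrapper`` as a stand-in for ``sys.stdout``, assumes that stand-in reports utf-8, and uses a hypothetical helper name, ``write_via_buffer``.

```python
import io


def write_via_buffer(stream, text):
    """Encode text with the wrapper's own encoding and write it to its buffer."""
    stream.flush()  # keep ordering with anything already written via the wrapper
    data = text.encode(stream.encoding, stream.errors or "strict")
    return stream.buffer.write(data)


# Stand-in for sys.stdout (assumption: the wrapper reports utf-8).
backing = io.BytesIO()
fake_stdout = io.TextIOWrapper(backing, encoding="utf-8")

written = write_via_buffer(fake_stdout, "caf\u00e9")
payload = backing.getvalue()

# Decoding with the stream's declared encoding round-trips the text;
# decoding with a hard-coded code page does not.
right = payload.decode(fake_stdout.encoding)
wrong = payload.decode("cp437")
print(right == "caf\u00e9", wrong == "caf\u00e9")
```

Reading the encoding from the stream rather than hard-coding it means the same code keeps working whichever encoding the standard streams report.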
 Incorrectly using the raw object
 --------------------------------
@@ -117,32 +117,57 @@ Code that uses the raw IO object and does not correctly handle partial reads and
 writes may be affected. This is particularly important for reads, where the
 number of characters read will never exceed one-fourth of the number of bytes
 allowed, as there is no feasible way to prevent input from encoding as much
-longer utf-8 strings::
+longer utf-8 strings.

-    >>> stdin = open(sys.stdin.fileno(), 'rb')
-    >>> data = stdin.raw.read(15)
+    >>> raw_stdin = sys.stdin.buffer.raw
+    >>> data = raw_stdin.read(15)
     abcdefghijklm
     b'abc'
     # data contains at most 3 characters, and never more than 12 bytes
     # error, as "defghijklm\r\n" is passed to the interactive prompt

 To correct this code, the buffered reader/writer should be used, or the caller
-should continue reading until its buffer is full.::
+should continue reading until its buffer is full.

-    # Fix 1: Use the buffered reader/writer
-    >>> stdin = open(sys.stdin.fileno(), 'rb')
+    >>> # Fix 1: Use the buffered reader/writer
+    >>> stdin = sys.stdin.buffer
     >>> data = stdin.read(15)
     abcedfghijklm
     b'abcdefghijklm\r\n'

-    # Fix 2: Loop until enough bytes have been read
-    >>> stdin = open(sys.stdin.fileno(), 'rb')
+    >>> # Fix 2: Loop until enough bytes have been read
+    >>> raw_stdin = sys.stdin.buffer.raw
     >>> b = b''
     >>> while len(b) < 15:
-    ...     b += stdin.raw.read(15)
+    ...     b += raw_stdin.read(15)
     abcedfghijklm
     b'abcdefghijklm\r\n'

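The looping fix assumes that input keeps arriving. A slightly more defensive variant, sketched here as an illustration and not as the PEP's code, also stops at end of input and never asks for more than is still needed. ``ShortReadRaw`` and ``read_fully`` are hypothetical names; the former only simulates a raw stream that returns partial reads so the example runs on any platform.

```python
import io


class ShortReadRaw(io.RawIOBase):
    """Hypothetical raw stream that returns at most max_chunk bytes per read."""

    def __init__(self, data, max_chunk=3):
        super().__init__()
        self._data = data
        self._pos = 0
        self._max_chunk = max_chunk

    def readable(self):
        return True

    def readinto(self, b):
        # Copy a deliberately short slice to simulate a partial read.
        n = min(len(b), self._max_chunk, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


def read_fully(raw, size):
    """Read until size bytes have arrived or the stream reports end of input."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = raw.read(remaining)
        if not chunk:  # b'' signals end of input; None means no data available
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


data = read_fully(ShortReadRaw(b"abcdefghijklm\r\n"), 15)
short = read_fully(ShortReadRaw(b"ab"), 15)
print(data, short)
```

On the raw console object described in this document, a request for fewer than four bytes is rejected, so a real caller may need to request at least four bytes on the final iteration and keep any surplus.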
+Using the raw object with small buffers
+---------------------------------------
+
+Code that uses the raw IO object and attempts to read less than four characters
+will now receive an error. Because it's possible that any single character may
+require up to four bytes when represented in utf-8, requests must fail.
+
+    >>> raw_stdin = sys.stdin.buffer.raw
+    >>> data = raw_stdin.read(3)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    ValueError: must read at least 4 bytes
+
+The only workaround is to pass a larger buffer.
+
+    >>> # Fix: Request at least four bytes
+    >>> raw_stdin = sys.stdin.buffer.raw
+    >>> data = raw_stdin.read(4)
+    a
+    b'a'
+    >>> >>>
+
+(The extra ``>>>`` is due to the newline remaining in the input buffer and is
+expected in this situation.)
+
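The four-byte minimum follows from how utf-8 encodes characters, which the following sketch illustrates. The sample characters and the helper names ``max_chars_for`` and ``clamp_request`` are assumptions made for this example, not part of the proposed implementation.

```python
# Illustrative sketch; the sample characters are arbitrary choices.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]


def max_chars_for(byte_count):
    """Upper bound on the characters a raw read of byte_count bytes can return."""
    return byte_count // 4


def clamp_request(byte_count, minimum=4):
    """Raise a too-small request to the four-byte minimum."""
    return max(byte_count, minimum)


print(utf8_lengths)
print(max_chars_for(15), clamp_request(3))
```

The same bound accounts for the earlier 15-byte read returning at most 3 characters: each character must be assumed to need the full four bytes.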
 Copyright
 =========

@@ -151,7 +176,5 @@ This document has been placed in the public domain.
 References
 ==========

-.. [process_stdinreader.py] Twisted's process_stdinreader.py
-   (https://github.com/twisted/twisted/blob/trunk/src/twisted/test/process_stdinreader.py)
-.. [win_unicode_console] win_unicode_console package
-   (https://pypi.org/project/win_unicode_console/)
+.. _Twisted's process_stdinreader.py: https://github.com/twisted/twisted/blob/trunk/src/twisted/test/process_stdinreader.py
+.. _win_unicode_console package: https://pypi.org/project/win_unicode_console/