PEP: Add Optional Length-Checking To zip (#1389)
This commit is contained in:
parent
08a58eccaa
commit
11b2de41a7
|
@ -0,0 +1,236 @@
|
|||
PEP: 618
|
||||
Title: Add Optional Length-Checking To zip
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Brandt Bucher <brandtbucher@gmail.com>
|
||||
Sponsor: Antoine Pitrou <antoine@python.org>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 30-Apr-2020
|
||||
Python-Version: 3.9
|
||||
Post-History:
|
||||
Resolution:
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes adding an optional ``strict`` boolean keyword
|
||||
argument to the built-in ``zip``. When enabled, a ``ValueError`` is
|
||||
raised if one of the arguments is exhausted before the others.
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
Many Python users find that most of their ``zip`` usage involves
|
||||
iterables that *should* be of equal length. Sometimes this invariant
|
||||
is proven true from the context of the surrounding code, but often the
|
||||
data being zipped is passed from the caller, sourced separately, or
|
||||
generated in some fashion. In any of these cases, the default
|
||||
behavior of ``zip`` means that faulty refactoring or logic errors
|
||||
could easily result in silently losing data. These bugs are not only
|
||||
difficult to diagnose, but difficult to even detect at all.
|
||||
|
||||
It is easy to come up with simple cases where this could be a problem.
|
||||
For example, the following code may work fine when ``items`` is a
|
||||
sequence, but silently start producing shortened, mismatched results
|
||||
if ``items`` is refactored by the caller to be a consumable iterator::
|
||||
|
||||
def apply_calculations(items):
|
||||
transformed = transform(items)
|
||||
for x, y in zip(items, transformed):
|
||||
yield something(x, y)
|
||||
|
||||
There are several other ways in which ``zip`` is commonly used.
|
||||
Idiomatic tricks are especially susceptible, because they are often
|
||||
employed by users who lack a complete understanding of how the code
|
||||
works. One example is unpacking into ``zip`` to lazily "unzip" or
|
||||
"transpose" nested iterables::
|
||||
|
||||
>>> x = iter([iter([1, 2, 3]), iter(["one" "two" "three"])])
|
||||
>>> xt = list(zip(*x))
|
||||
|
||||
Another is "chunking" data into equal-sized groups::
|
||||
|
||||
>>> n = 3
|
||||
>>> x = range(n ** 2),
|
||||
>>> xn = list(zip(*[iter(x)] * n))
|
||||
|
||||
In the first case, non-rectangular data is usually a logic error. In
|
||||
the second case, data that is not a multiple of ``n`` is often an
|
||||
error as well. However, both of these idioms will silently omit the
|
||||
tail-end items of malformed input.
|
||||
|
||||
Perhaps most convincingly, the current use of ``zip`` in the
|
||||
standard-library ``ast`` module has created multiple bugs that
|
||||
`silently drop parts of malformed nodes
|
||||
<https://bugs.python.org/issue40355>`_::
|
||||
|
||||
>>> from ast import Constant, Dict, literal_eval
|
||||
>>> nasty_dict = Dict(keys=[Constant("XXX")], values=[])
|
||||
>>> literal_eval(nasty_dict) # Like eval('{"XXX": }')
|
||||
{}
|
||||
|
||||
In fact, the author has counted dozens of other call sites in Python's
|
||||
standard library and tooling where it would be appropriate to enable
|
||||
this new feature immediately.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Some critics assert that boolean switches are a "code-smell", or go
|
||||
against Python's design philosophy. However, Python currently
|
||||
contains several examples of built-in functions with boolean keyword
|
||||
arguments:
|
||||
|
||||
- ``compile(..., dont_inherit=True)``
|
||||
- ``open(..., closefd=False)``
|
||||
- ``print(..., flush=True)``
|
||||
- ``sorted(..., reverse=True)``
|
||||
|
||||
Many more exist in the standard library.
|
||||
|
||||
A good rule of thumb is that "mode-switches" which change return types
|
||||
or significantly alter functionality are indeed an anti-pattern, while
|
||||
ones which enable or disable complementary checks or functionality are
|
||||
not.
|
||||
|
||||
The name for this new parameter was taken from the `original message
|
||||
<https://mail.python.org/archives/list/python-ideas@python.org/message/6GFUADSQ5JTF7W7OGWF7XF2NH2XUTUQM>`_
|
||||
suggesting the feature. It received over 100 replies, with nobody
|
||||
challenging the use of the word "strict".
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
When the built-in ``zip`` is called with the keyword-only argument
|
||||
``strict=True``, the resulting iterator will raise a ``ValueError`` if
|
||||
the arguments are exhausted at differing lengths. This error will
|
||||
occur at the point when iteration would normally stop today.
|
||||
|
||||
At most one additional item may be consumed from one of the iterators
|
||||
when compared to normal ``zip`` usage.
|
||||
|
||||
|
||||
Backward Compatibility
|
||||
======================
|
||||
|
||||
This change is fully backward-compatible.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
The author has `drafted a C implementation
|
||||
<https://github.com/python/cpython/compare/master...brandtbucher:zip-strict>`_.
|
||||
|
||||
An approximate pure-Python translation is::
|
||||
|
||||
def zip(*iterables, strict=False):
|
||||
if not iterables:
|
||||
return
|
||||
iterators = tuple(iter(iterable) for iterable in iterables)
|
||||
try:
|
||||
while True:
|
||||
items = []
|
||||
for iterator in iterators:
|
||||
items.append(next(iterator))
|
||||
yield tuple(items)
|
||||
except StopIteration:
|
||||
if not strict:
|
||||
return
|
||||
if items:
|
||||
i = len(items) + 1
|
||||
raise ValueError(f"zip() argument {i} is too short")
|
||||
sentinel = object()
|
||||
for i, iterator in enumerate(iterators[1:], 2):
|
||||
if next(iterator, sentinel) is not sentinel:
|
||||
raise ValueError(f"zip() argument {i} is too long")
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
==============
|
||||
|
||||
Add Additional Flavors Of ``zip`` To ``itertools``
|
||||
''''''''''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
Importing a drop-in replacement for a built-in feels too heavy,
|
||||
especially just to check a tricky condition that should "always" be
|
||||
true. The goal here is not just to provide a way to catch bugs, but
|
||||
to also make it easy (even tempting) for a user to enable the check
|
||||
whenever using ``zip`` at a call site with this property.
|
||||
|
||||
Some have also argued that a new function buried in the standard
|
||||
library is somehow more "discoverable" than a keyword argument on the
|
||||
built-in itself. The author does not believe this to be true.
|
||||
|
||||
Another proposed idiom, per-module shadowing of the built-in ``zip``
|
||||
with some subtly different variant from ``itertools``, is an
|
||||
anti-pattern that shouldn't be encouraged.
|
||||
|
||||
|
||||
Change The Default Behavior Of ``zip``
|
||||
''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
Support for infinite iterators is generally useful, and there is
|
||||
nothing "wrong" with the default behavior of ``zip``. Likely, this
|
||||
backward-incompatible change would break more code than it "fixes".
|
||||
|
||||
``itertools.zip_longest`` already exists to service those cases where
|
||||
the "extra" tail-end data is still needed.
|
||||
|
||||
|
||||
Add A Method Or Alternate Constructor To The ``zip`` Type
|
||||
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
The actual ``zip`` type is an undocumented implementation detail.
|
||||
Adding additional methods or constructors is really a much larger
|
||||
change that is not necessary to achieve the stated goal.
|
||||
|
||||
|
||||
Raise An ``AssertionError`` Instead Of A ``ValueError``
|
||||
'''''''''''''''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
There are no built-in functions or types that raise an
|
||||
``AssertionError`` as part of their API. Further, the `official
|
||||
documentation
|
||||
<https://docs.python.org/3.9/library/exceptions.html?highlight=assertionerror#AssertionError>`_
|
||||
simply reads (in its entirety):
|
||||
|
||||
Raised when an ``assert`` statement fails.
|
||||
|
||||
Since this feature has nothing to do with Python's ``assert``
|
||||
statement, raising an ``AssertionError`` here would be inappropriate.
|
||||
|
||||
|
||||
Do Nothing
|
||||
''''''''''
|
||||
|
||||
This option is perhaps the least attractive.
|
||||
|
||||
Silently truncated data is a particularly nasty class of bug, and
|
||||
hand-writing a robust solution that gets this right isn't trivial. The
|
||||
real-world motivating examples from Python's own standard library are
|
||||
evidence that it's *very* easy to fall into the sort of trap that this
|
||||
feature aims to avoid.
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document is placed in the public domain or under the
|
||||
CC0-1.0-Universal license, whichever is more permissive.
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
Loading…
Reference in New Issue