PEP 618: Second Draft (#1399)

* Update history.

* Credit Ram.

* Acknowledge "equal" alternative.

* Reword link.

* Add note about strict=__debug__.

* Add another argument against  "equal".

* Don't lean so heavily on infinite iterators.

* Clarify example.

* Clean up example.

* Clean up response.

* Outline precedent for  map.

* Clean up wording.

* Reword map section.

* Move method argument.

* Flesh out argument against methods.

* Further refinements to method argument.

* Add callback argument.

* Simplify AST example.

* Clean up method/constructor outcomes.

* Clean up callback bit.

* Add StackOverflow link.

* Address "mode" parameter.

* "argument" -> "parameter"

* Address "constant" arguments in Rationale.

* Clean up Rationale phrasing.

* Flesh out itertools bit.

* Clean up itertools wording.
This commit is contained in:
Brandt Bucher 2020-05-10 08:31:47 -07:00 committed by GitHub
parent 4732446cae
commit 17a0d39810
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 118 additions and 34 deletions

View File

@ -7,9 +7,9 @@ Sponsor: Antoine Pitrou <antoine@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Apr-2020
Created: 01-May-2020
Python-Version: 3.9
Post-History:
Post-History: 01-May-2020
Resolution:
@ -17,7 +17,7 @@ Abstract
========
This PEP proposes adding an optional ``strict`` boolean keyword
argument to the built-in ``zip``. When enabled, a ``ValueError`` is
parameter to the built-in ``zip``. When enabled, a ``ValueError`` is
raised if one of the arguments is exhausted before the others.
@ -59,9 +59,9 @@ Another is "chunking" data into equal-sized groups::
>>> xn = list(zip(*[iter(x)] * n))
In the first case, non-rectangular data is usually a logic error. In
the second case, data that is not a multiple of ``n`` is often an
error as well. However, both of these idioms will silently omit the
tail-end items of malformed input.
the second case, data with a length that is not a multiple of ``n`` is
often an error as well. However, both of these idioms will silently
omit the tail-end items of malformed input.
Perhaps most convincingly, the current use of ``zip`` in the
standard-library ``ast`` module has created multiple bugs that
@ -69,8 +69,8 @@ standard-library ``ast`` module has created multiple bugs that
<https://bugs.python.org/issue40355>`_::
>>> from ast import Constant, Dict, literal_eval
>>> nasty_dict = Dict(keys=[Constant("XXX")], values=[])
>>> literal_eval(nasty_dict) # Like eval('{"XXX": }')
>>> nasty_dict = Dict(keys=[Constant(None)], values=[])
>>> literal_eval(nasty_dict) # Like eval("{None: }")
{}
In fact, the author has counted dozens of other call sites in Python's
@ -81,10 +81,10 @@ this new feature immediately.
Rationale
=========
Some critics assert that boolean switches are a "code-smell", or go
against Python's design philosophy. However, Python currently
contains several examples of built-in functions with boolean keyword
arguments:
Some critics assert that constant boolean switches are a "code-smell",
or go against Python's design philosophy. However, Python currently
contains several examples of boolean keyword parameters on built-in
functions which are typically called with compile-time constants:
- ``compile(..., dont_inherit=True)``
- ``open(..., closefd=False)``
@ -95,13 +95,18 @@ Many more exist in the standard library.
A good rule of thumb is that "mode-switches" which change return types
or significantly alter functionality are indeed an anti-pattern, while
ones which enable or disable complementary checks or functionality are
not.
ones which enable or disable complementary checks or behavior are not.
The name for this new parameter was taken from the `original message
The idea and name for this new parameter were `originally proposed
<https://mail.python.org/archives/list/python-ideas@python.org/message/6GFUADSQ5JTF7W7OGWF7XF2NH2XUTUQM>`_
suggesting the feature. It received over 100 replies, with nobody
challenging the use of the word "strict".
by Ram Rachum. The thread received over 100 replies, with the
alternative "equal" receiving a similar amount of support.
The author does not have a strong preference between the two choices,
though "equal equals" *is* a bit awkward in prose. It may also
(wrongly) imply some notion of "equality" between the zipped items::
>>> z = zip([2.0, 4.0, 6.0], [2, 4, 8], equal=True)
Specification
@ -125,7 +130,7 @@ This change is fully backward-compatible.
Reference Implementation
========================
The author has `drafted a C implementation
The author has drafted a `C implementation
<https://github.com/python/cpython/compare/master...brandtbucher:zip-strict>`_.
An approximate pure-Python translation is::
@ -158,14 +163,26 @@ Rejected Ideas
Add Additional Flavors Of ``zip`` To ``itertools``
''''''''''''''''''''''''''''''''''''''''''''''''''
Importing a drop-in replacement for a built-in feels too heavy,
Adding ``zip_strict`` to itertools is a larger change with greater
maintenance burden than the simple modification being proposed.
It seems that a great deal of the motivation driving this alternative
is that ``zip_longest`` already exists in ``itertools``. However,
``zip_longest`` is really another beast entirely: it takes on the
responsibility of filling in missing values, a problem neither of
the other variants even have. It also arguably has the most
specialized behavior of the three (to the point of exposing a new
``fillvalue`` parameter), so it makes sense that it would live in
``itertools`` while ``zip`` grows in-place.
Importing a drop-in replacement for a built-in also feels too heavy,
especially just to check a tricky condition that should "always" be
true. The goal here is not just to provide a way to catch bugs, but
to also make it easy (even tempting) for a user to enable the check
whenever using ``zip`` at a call site with this property.
Some have also argued that a new function buried in the standard
library is somehow more "discoverable" than a keyword argument on the
library is somehow more "discoverable" than a keyword parameter on the
built-in itself. The author does not believe this to be true.
Another proposed idiom, per-module shadowing of the built-in ``zip``
@ -173,23 +190,72 @@ with some subtly different variant from ``itertools``, is an
anti-pattern that shouldn't be encouraged.
Change The Default Behavior Of ``zip``
''''''''''''''''''''''''''''''''''''''
Add Several "Modes" To Switch Between
'''''''''''''''''''''''''''''''''''''
Support for infinite iterators is generally useful, and there is
nothing "wrong" with the default behavior of ``zip``. Likely, this
backward-incompatible change would break more code than it "fixes".
This option only makes more sense than a binary flag if we anticipate
having three or more modes. The "obvious" three choices for these
enumerated or constant modes would be "shortest" (the current ``zip``
behavior), "strict" (the proposed behavior), and "longest"
(the ``itertools.zip_longest`` behavior).
``itertools.zip_longest`` already exists to service those cases where
the "extra" tail-end data is still needed.
However, it doesn't seem like adding behaviors other than the current
default and the proposed "strict" mode is worth the additional
complexity. The clearest candidate, "longest", would require a new
``fillvalue`` parameter (which is meaningless for both other modes).
This mode is also already handled perfectly by
``itertools.zip_longest``, and adding it would create two ways of
doing the same thing. It's not clear which would be the "obvious"
choice: the ``mode`` parameter on the built-in ``zip``, or the
long-lived namesake utility in ``itertools``.
Add A Method Or Alternate Constructor To The ``zip`` Type
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
The actual ``zip`` type is an undocumented implementation detail.
Adding additional methods or constructors is really a much larger
change that is not necessary to achieve the stated goal.
Consider the following two options, which have both been proposed::
>>> zm = zip(*iters).strict()
>>> zd = zip.strict(*iters)
It's not obvious which one will succeed, or how the other will fail.
If ``zip.strict`` is implemented as a method, ``zm`` will succeed, but
``zd`` will fail in one of several confusing ways:
- Yield results that aren't wrapped in a tuple (if ``iters`` contains
just one item, a ``zip`` iterator).
- Raise a ``TypeError`` for an incorrect argument type (if ``iters``
contains just one item, not a ``zip`` iterator).
- Raise a ``TypeError`` for an incorrect number of arguments
(otherwise).
If ``zip.strict`` is implemented as a ``classmethod`` or
``staticmethod``, ``zd`` will succeed, and ``zm`` will silently yield
nothing (which is the problem we are trying to avoid in the first
place).
This proposal is further complicated by the fact that CPython's actual
``zip`` type is an undocumented implementation detail.
Change The Default Behavior Of ``zip``
''''''''''''''''''''''''''''''''''''''
There is nothing "wrong" with the default behavior of ``zip``, since
there are many cases where it is indeed the correct way to handle
unequally-sized inputs. It's extremely useful, for example, when
dealing with infinite iterators.
``itertools.zip_longest`` already exists to service those cases where
the "extra" tail-end data is still needed.
Accept A Callback To Handle Remaining Items
'''''''''''''''''''''''''''''''''''''''''''
While able to do basically anything a user could need, this solution
makes handling the more common cases (like rejecting mismatched
lengths) unnecessarily complicated and non-obvious.
Raise An ``AssertionError`` Instead Of A ``ValueError``
@ -205,6 +271,23 @@ simply reads (in its entirety):
Since this feature has nothing to do with Python's ``assert``
statement, raising an ``AssertionError`` here would be inappropriate.
Users desiring a check that is disabled in optimized mode (like an
``assert`` statement) can use ``strict=__debug__`` instead.
Add A Similar Feature to ``map``
''''''''''''''''''''''''''''''''
This PEP does not propose any changes to ``map``, since the use of
``map`` with multiple iterable arguments is quite rare. However, this
PEP's ruling shall serve as precedent such a future discussion (should
it occur).
If rejected, the feature is realistically not worth pursuing. If
accepted, such a change to ``map`` should not require its own PEP
(though, like all enhancements, its usefulness should be carefully
considered). For consistency, it should follow same API and semantics
debated here for ``zip``.
Do Nothing
@ -213,10 +296,11 @@ Do Nothing
This option is perhaps the least attractive.
Silently truncated data is a particularly nasty class of bug, and
hand-writing a robust solution that gets this right isn't trivial. The
real-world motivating examples from Python's own standard library are
evidence that it's *very* easy to fall into the sort of trap that this
feature aims to avoid.
hand-writing a robust solution that gets this right `isn't trivial
<https://stackoverflow.com/questions/32954486/zip-iterators-asserting-for-equal-length-in-python>`_.
The real-world motivating examples from Python's own standard library
are evidence that it's *very* easy to fall into the sort of trap that
this feature aims to avoid.
Copyright