Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".
Guido van Rossum 2001-05-01 02:04:28 +00:00
parent e5aa4a1379
commit 3deab93236
1 changed files with 139 additions and 85 deletions


@ -97,22 +97,6 @@ C API Specification
reference to themselves; this is needed to make it possible to
use an iterator (as opposed to a sequence) in a for loop.
Discussion: should the next() method be renamed to __next__()?
Every other method corresponding to a tp_<something> slot has a
special name. On the other hand, this would suggest that there
should also be a primitive operation next(x) that would call
x.__next__(), and this just looks like adding complexity without
benefit. So I think it's better to stick with next(). On the
other hand, Marc-Andre Lemburg points out: "Even though .next()
reads better, I think that we should stick to the convention that
interpreter APIs use the __xxx__ naming scheme. Otherwise, people
will have a hard time differentiating between user-level protocols
and interpreter-level ones. AFAIK, .next() would be the first
low-level API not using this convention." My (BDFL's) response:
there are other important protocols with a user-level name
(e.g. keys()), and I don't see the importance of this particular
rule. BDFL pronouncement: this topic is closed. next() it is.
Python API Specification
@ -150,35 +134,6 @@ Python API Specification
above. A class that wants to be an iterator also ought to
implement __iter__() returning itself.
Discussion:
- The name iter() is an abbreviation. Alternatives proposed
include iterate(), harp(), traverse(), narrate().
- Using the same name for two different operations (getting an
iterator from an object and making an iterator for a function
with a sentinel value) is somewhat ugly. I haven't seen a
better name for the second operation though.
- There's a bit of undefined behavior for iterators: once a
particular iterator object has raised StopIteration, will it
also raise StopIteration on all subsequent next() calls? Some
say that it would be useful to require this, others say that it
is useful to leave this open to individual iterators. Note that
this may require an additional state bit for some iterator
implementations (e.g. function-wrapping iterators).
- Some folks have requested the ability to restart an iterator. I
believe this should be dealt with by calling iter() on a
sequence repeatedly, not by the iterator protocol itself.
- It was originally proposed that rather than having a next()
method, an iterator object should simply be callable. This was
rejected in favor of an explicit next() method. The reason is
clarity: if you don't know the code very well, "x = s()" does
not give a hint about what it does; but "x = s.next()" is pretty
clear. BDFL pronouncement: this topic is closed. next() it is.
Dictionary Iterators
@ -211,46 +166,11 @@ Dictionary Iterators
as long as the restrictions on modifications to the dictionary
(either by the loop or by another thread) are not violated.
There is no doubt that the dict.has_key(x) interpretation of "x
in dict" is by far the most useful interpretation, probably the
only useful one. There has been resistance against this because
"x in list" checks whether x is present among the values, while
the proposal makes "x in dict" check whether x is present among
the keys. Given that the symmetry between lists and dictionaries
is very weak, this argument does not have much weight.
The main discussion focuses on whether
for x in dict: ...
should assign x the successive keys, values, or items of the
dictionary. The symmetry between "if x in y" and "for x in y"
suggests that it should iterate over keys. This symmetry has been
observed by many independently and has even been used to "explain"
one using the other. This is because for sequences, "if x in y"
iterates over y comparing the iterated values to x. If we adopt
both of the above proposals, this will also hold for
dictionaries.
The argument against making "for x in dict" iterate over the keys
comes mostly from a practicality point of view: scans of the
standard library show that there are about as many uses of "for x
in dict.items()" as there are of "for x in dict.keys()", with the
items() version having a small majority. Presumably many of the
loops using keys() use the corresponding value anyway, by writing
dict[x], so (the argument goes) by making both the key and value
available, we could support the largest number of cases. While
this is true, I (Guido) find the correspondence between "for x in
dict" and "if x in dict" too compelling to break, and there's not
much overhead in having to write dict[x] to explicitly get the
value. We could also add methods to dictionaries that return
different kinds of iterators, e.g.
for key, value in dict.iteritems(): ...
for value in dict.itervalues(): ...
for key in dict.iterkeys(): ...
If this proposal is accepted, it makes sense to recommend that
other mappings, if they support iterators at all, should also
iterate over the keys. However, this should not be taken as an
absolute rule; specific applications may have different
requirements.
File Iterators
@ -309,6 +229,139 @@ Rationale
{__getitem__, keys, values, items}.
Open Issues
The following questions are still open.
- The name iter() is an abbreviation. Alternatives proposed
include iterate(), harp(), traverse(), narrate().
- Using the same name for two different operations (getting an
iterator from an object and making an iterator for a function
with a sentinel value) is somewhat ugly. I haven't seen a
better name for the second operation though.
- Once a particular iterator object has raised StopIteration, will
it also raise StopIteration on all subsequent next() calls?
Some say that it would be useful to require this, others say
that it is useful to leave this open to individual iterators.
Note that this may require an additional state bit for some
iterator implementations (e.g. function-wrapping iterators).
- Some folks have requested extensions of the iterator protocol,
e.g. prev() to get the previous item, current() to get the
current item again, finished() to test whether the iterator is
finished, and maybe even others, like rewind(), __len__(),
position().
While some of these are useful, many of these cannot easily be
implemented for all iterator types without adding arbitrary
buffering, and sometimes they can't be implemented at all (or
not reasonably). E.g. anything to do with reversing directions
can't be done when iterating over a file or function. Maybe a
separate PEP can be drafted to standardize the names for such
operations when they are implementable.
- There is still discussion about whether
for x in dict: ...
should assign x the successive keys, values, or items of the
dictionary. The symmetry between "if x in y" and "for x in y"
suggests that it should iterate over keys. This symmetry has been
observed by many independently and has even been used to "explain"
one using the other. This is because for sequences, "if x in y"
iterates over y comparing the iterated values to x. If we adopt
both of the above proposals, this will also hold for
dictionaries.
The argument against making "for x in dict" iterate over the keys
comes mostly from a practicality point of view: scans of the
standard library show that there are about as many uses of "for x
in dict.items()" as there are of "for x in dict.keys()", with the
items() version having a small majority. Presumably many of the
loops using keys() use the corresponding value anyway, by writing
dict[x], so (the argument goes) by making both the key and value
available, we could support the largest number of cases. While
this is true, I (Guido) find the correspondence between "for x in
dict" and "if x in dict" too compelling to break, and there's not
much overhead in having to write dict[x] to explicitly get the
value. We could also add methods to dictionaries that return
different kinds of iterators, e.g.
for key, value in dict.iteritems(): ...
for value in dict.itervalues(): ...
for key in dict.iterkeys(): ...
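The key-iteration argument above can be sketched as follows. This is only an illustration under stated assumptions: it is written for present-day Python 3, where "for x in d" iterates over keys and the iteritems(), itervalues() and iterkeys() spellings proposed here are written items(), values() and keys(). The dictionary `ages` and its contents are invented for the example.

```python
# Hypothetical example data; not taken from the original text.
ages = {"alice": 31, "bob": 27}

# "for x in d" visits the keys, so it agrees with "x in d".
keys_seen = [name for name in ages]
assert all(name in ages for name in keys_seen)

# When the loop needs the value, it is one subscript away.
pairs_via_subscript = [(name, ages[name]) for name in ages]

# Explicit methods cover the other two interpretations
# (Python 3 spellings of the iteritems()/itervalues() idea).
pairs_via_items = list(ages.items())
values_only = list(ages.values())

# "in" tests keys only, so a value is not reported as a member.
value_is_member = 31 in ages
```

Writing ages[name] inside the loop gives the same pairs as iterating over items(), which is the "not much overhead" point made in the text.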
Resolved Issues
The following topics have been decided by consensus or BDFL
pronouncement.
- Two alternative spellings for next() have been proposed but
rejected: __next__(), proposed because it corresponds to a type
object slot (tp_iternext); and __call__(), proposed because this
is the only operation.
Arguments against __next__(): while many iterators are used in
for loops, it is expected that user code will also call next()
directly, so having to write __next__() is ugly; also, a
possible extension of the protocol would be to allow for prev(),
current() and reset() operations; surely we don't want to use
__prev__(), __current__(), __reset__().
Arguments against __call__() (the original proposal): taken out
of context, x() is not very readable, while x.next() is clear;
there's a danger that every special-purpose object wants to use
__call__() for its most common operation, causing more confusion
than clarity.
- Some folks have requested the ability to restart an iterator.
This should be dealt with by calling iter() on a sequence
repeatedly, not by the iterator protocol itself.
- It has been questioned whether an exception to signal the end of
the iteration isn't too expensive. Several alternatives for the
StopIteration exception have been proposed: a special value End
to signal the end, a function end() to test whether the iterator
is finished, even reusing the IndexError exception.
- A special value has the problem that if a sequence ever
contains that special value, a loop over that sequence will
end prematurely without any warning. If the experience with
null-terminated C strings hasn't taught us the problems this
can cause, imagine the trouble a Python introspection tool
would have iterating over a list of all built-in names,
assuming that the special End value was a built-in name!
- Calling an end() function would require two calls per
iteration. Two calls is much more expensive than one call
plus a test for an exception. Especially the time-critical
for loop can test very cheaply for an exception.
- Reusing IndexError can cause confusion because it can be a
genuine error, which would be masked by ending the loop
prematurely.
- Some have asked for a standard iterator type. Presumably all
iterators would have to be derived from this type. But this is
not the Python way: dictionaries are mappings because they
support __getitem__() and a handful of other operations, not
because they are derived from an abstract mapping type.
- Regarding "if key in dict": there is no doubt that the
dict.has_key(x) interpretation of "x in dict" is by far the
most useful interpretation, probably the only useful one. There
has been resistance against this because "x in list" checks
whether x is present among the values, while the proposal makes
"x in dict" check whether x is present among the keys. Given
that the symmetry between lists and dictionaries is very weak,
this argument does not have much weight.
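The objection to a special End value, and the advice to restart by calling iter() again, can be sketched as below. This is a hedged illustration, not part of the proposal: the helper make_fetcher() and the list `names` are hypothetical, and the code is written for present-day Python 3. It uses the two-argument form iter(callable, sentinel) to stand in for a sentinel-terminated protocol.

```python
def make_fetcher(values, end):
    """Return a zero-argument function that hands out the items of
    `values` one at a time and then returns `end` forever.
    Hypothetical helper, used only for this illustration."""
    position = 0

    def fetch():
        nonlocal position
        if position >= len(values):
            return end
        item = values[position]
        position += 1
        return item

    return fetch


# A list that legitimately contains the would-be sentinel "End".
names = ["len", "End", "print"]

# Sentinel-terminated loop: stops silently at the data item "End".
truncated = list(iter(make_fetcher(names, "End"), "End"))

# A unique object as sentinel avoids the collision in this case.
unique_end = object()
with_unique = list(iter(make_fetcher(names, unique_end), unique_end))

# Exception-based protocol: the end is signalled by StopIteration,
# so no value in the data can be mistaken for it.
complete = []
iterator = iter(names)
while True:
    try:
        complete.append(next(iterator))
    except StopIteration:
        break

# Restarting means asking the sequence for a fresh iterator.
second_pass = list(iter(names))
```

The first loop loses "print" without any warning, which is the premature-termination problem described above; the exception-based loop sees all three names.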
Mailing Lists
The iterator protocol has been discussed extensively in a mailing
@ -321,6 +374,7 @@ Mailing Lists
http://groups.yahoo.com/group/python-iter
Copyright
This document is in the public domain.