PEP: 234
Title: Iterators
Version: $Revision$
Last-Modified: $Date$
Author: ping@zesty.ca (Ka-Ping Yee), guido@python.org (Guido van Rossum)
Status: Final
Type: Standards Track
Created: 30-Jan-2001
Python-Version: 2.1
Post-History: 30-Apr-2001

Abstract

This document proposes an iteration interface that objects can
provide to control the behaviour of 'for' loops. Looping is
customized by providing a method that produces an iterator object.
The iterator provides a 'get next value' operation that produces
the next item in the sequence each time it is called, raising an
exception when no more items are available.

In addition, specific iterators over the keys of a dictionary and
over the lines of a file are proposed, and a proposal is made to
allow spelling dict.has_key(key) as "key in dict".

Note: this is an almost complete rewrite of this PEP by the second
author, describing the actual implementation checked into the
trunk of the Python 2.2 CVS tree. It is still open for
discussion. Some of the more esoteric proposals in the original
version of this PEP have been withdrawn for now; these may be the
subject of a separate PEP in the future.

C API Specification

A new exception is defined, StopIteration, which can be used to
signal the end of an iteration.

A new slot named tp_iter for requesting an iterator is added to
the type object structure. This should be a function of one
PyObject * argument returning a PyObject *, or NULL. To use this
slot, a new C API function PyObject_GetIter() is added, with the
same signature as the tp_iter slot function.

Another new slot, named tp_iternext, is added to the type
structure, for obtaining the next value in the iteration. To use
this slot, a new C API function PyIter_Next() is added. The
signature for both the slot and the API function is as follows,
although the NULL return conditions differ: the argument is a
PyObject * and so is the return value. When the return value is
non-NULL, it is the next value in the iteration. When it is NULL,
then for the tp_iternext slot there are three possibilities:

- No exception is set; this implies the end of the iteration.

- The StopIteration exception (or a derived exception class) is
  set; this implies the end of the iteration.

- Some other exception is set; this means that an error occurred
  that should be propagated normally.

The higher-level PyIter_Next() function clears the StopIteration
exception (or derived exception) when it occurs, so its NULL return
conditions are simpler:

- No exception is set; this means iteration has ended.

- Some exception is set; this means an error occurred, and should
  be propagated normally.

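For comparison, a Python-level analogue of the PyIter_Next() contract can be sketched in Python 3, where the built-in next(it, default) returns default instead of letting StopIteration escape, while any other exception still propagates. The names drain(), _END and Faulty are hypothetical, chosen for illustration only; a private sentinel object is used so that falsy items such as None or 0 are never mistaken for the end of the iteration.

```python
# Python 3 sketch: next(it, default) mirrors the PyIter_Next() contract.
_END = object()  # private sentinel that no iterator will ever produce


def drain(iterable):
    """Collect items until the iterator signals the end of the iteration."""
    it = iter(iterable)
    items = []
    while True:
        value = next(it, _END)   # StopIteration is swallowed, _END returned
        if value is _END:        # like NULL with no exception set
            return items
        items.append(value)      # any other exception propagates


class Faulty:
    """Hypothetical iterator that hits a genuine error on its third call."""

    def __init__(self):
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        if self.calls > 2:
            raise ValueError("genuine error")
        return self.calls


print(drain([None, 0, ""]))  # [None, 0, ''] (falsy items are not "the end")
try:
    drain(Faulty())
except ValueError as exc:
    print("propagated:", exc)  # propagated: genuine error
```
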
Iterators implemented in C should *not* implement a next() method
with semantics similar to those of the tp_iternext slot! When the
type's dictionary is initialized (by PyType_Ready()), the presence
of a tp_iternext slot causes a method next() wrapping that slot to
be added to the type's tp_dict. (Exception: if the type doesn't use
PyObject_GenericGetAttr() to access instance attributes, the
next() method in the type's tp_dict may not be seen.) (Due to a
misunderstanding in the original text of this PEP, in Python 2.2,
all iterator types implemented a next() method that was overridden
by the wrapper; this has been fixed in Python 2.3.)

To ensure binary backwards compatibility, a new flag
Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
field, and to the default flags macro. This flag must be tested
before accessing the tp_iter or tp_iternext slots. The macro
PyIter_Check() tests whether an object has the appropriate flag
set and has a non-NULL tp_iternext slot. There is no such macro
for the tp_iter slot (since the only place where this slot is
referenced should be PyObject_GetIter(), and this can check for
the Py_TPFLAGS_HAVE_ITER flag directly).

(Note: the tp_iter slot can be present on any object; the
tp_iternext slot should only be present on objects that act as
iterators.)

For backwards compatibility, the PyObject_GetIter() function
implements fallback semantics when its argument is a sequence that
does not implement a tp_iter function: in that case a lightweight
sequence iterator object is constructed, which iterates over the
items of the sequence in their natural order, fetching the items
at indices 0, 1, 2, ... until the sequence raises IndexError.

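The fallback can be modelled in pure Python. The sketch below assumes Python 3 (where next() is spelled __next__()); Squares and SeqIter are hypothetical names, and SeqIter only approximates the C-level sequence iterator: it indexes the sequence with 0, 1, 2, ... and ends the iteration when the sequence raises IndexError.

```python
class Squares:
    """Legacy-style sequence: defines __getitem__ but no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)   # tells the fallback to stop
        return index * index


class SeqIter:
    """Hypothetical pure-Python model of the lightweight sequence iterator."""

    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.seq is None:          # already exhausted: stay exhausted
            raise StopIteration
        try:
            item = self.seq[self.index]
        except IndexError:
            self.seq = None           # forget the sequence for good
            raise StopIteration
        self.index += 1
        return item


print(list(Squares(4)))           # built-in fallback: [0, 1, 4, 9]
print(list(SeqIter(Squares(4))))  # the model agrees: [0, 1, 4, 9]
```
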
The Python bytecode generated for 'for' loops is changed to use
new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol
rather than the sequence protocol to get the next value for the
loop variable. This makes it possible to use a 'for' loop to loop
over non-sequence objects that support the tp_iter slot. Other
places where the interpreter loops over the values of a sequence
should also be changed to use iterators.

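The following Python 3 sketch spells out roughly what a 'for' loop does with the iterator protocol. The helper for_loop() is hypothetical and only approximates the bytecode: iter() stands in for GET_ITER, and next() plus the StopIteration check stands in for FOR_ITER.

```python
def for_loop(iterable, body):
    """Hypothetical model of `for item in iterable: body(item)`."""
    it = iter(iterable)        # roughly what GET_ITER does
    while True:
        try:
            item = next(it)    # roughly what FOR_ITER does
        except StopIteration:
            break              # exhausted: leave the loop
        body(item)             # the loop body runs outside the try


seen = []
for_loop({"a": 1, "b": 2}, seen.append)  # a dict is not a sequence
print(seen)  # ['a', 'b']
```

The body is deliberately called outside the try statement, so a StopIteration raised by the body itself is not mistaken for the end of the loop.
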
Iterators ought to implement the tp_iter slot as returning a
reference to themselves; this is needed to make it possible to
use an iterator (as opposed to a sequence) in a for loop.

Iterator implementations (in C or in Python) should guarantee that
once the iterator has signalled its exhaustion, subsequent calls
to tp_iternext or to the next() method will continue to do so. It
is not specified whether an iterator should enter the exhausted
state when an exception (other than StopIteration) is raised.
Note that Python cannot guarantee that user-defined or 3rd party
iterators implement this requirement correctly.

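Both requirements can be observed from Python 3. The first half of this sketch uses a built-in list iterator; the second half uses a hypothetical class Flaky to show what a non-conforming user-defined iterator looks like, since Python cannot prevent one from being written.

```python
it = iter([1, 2])
print(iter(it) is it)     # True: an iterator returns itself
print(list(it))           # [1, 2]
print(list(it))           # []: exhausted, and it stays exhausted
print(next(it, "done"))   # done


class Flaky:
    """Hypothetical badly behaved iterator that resumes after StopIteration."""

    def __init__(self):
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        if self.calls % 2 == 0:
            raise StopIteration   # claims to be exhausted...
        return self.calls         # ...yet yields again on the next call


f = Flaky()
print(list(f))  # [1]
print(list(f))  # [3]: this violates the guarantee described above
```
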
Python API Specification

The StopIteration exception is made visible as one of the
standard exceptions. It is derived from Exception.

A new built-in function is defined, iter(), which can be called in
two ways:

- iter(obj) calls PyObject_GetIter(obj).

- iter(callable, sentinel) returns a special kind of iterator that
  calls the callable to produce a new value, and compares the
  return value to the sentinel value. If the return value equals
  the sentinel, this signals the end of the iteration and
  StopIteration is raised rather than returning normally; if the
  return value does not equal the sentinel, it is returned as the
  next value from the iterator. If the callable raises an
  exception, this is propagated normally; in particular, the
  function is allowed to raise StopIteration as an alternative way
  to end the iteration. (This functionality is available from the
  C API as PyCallIter_New(callable, sentinel).)

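A pure-Python model of the second form may help. CallIter is a hypothetical class written for Python 3 (so the method is spelled __next__()), not the actual C implementation, and make_source() is a hypothetical value producer. The model compares each result with the sentinel, lets a StopIteration raised by the callable end the iteration, propagates other exceptions unchanged, and stays exhausted once finished.

```python
class CallIter:
    """Hypothetical pure-Python model of iter(callable, sentinel)."""

    def __init__(self, func, sentinel):
        self.func = func
        self.sentinel = sentinel

    def __iter__(self):
        return self

    def __next__(self):
        if self.func is None:           # exhausted earlier: stay exhausted
            raise StopIteration
        try:
            value = self.func()         # other exceptions propagate as-is
        except StopIteration:
            self.func = None            # the callable ended the iteration
            raise
        if value == self.sentinel:
            self.func = None            # sentinel seen: end of the iteration
            raise StopIteration
        return value


def make_source():
    data = [3, 2, 1, 0, 5]
    return lambda: data.pop(0)          # hypothetical value producer


print(list(CallIter(make_source(), 0)))  # [3, 2, 1]
print(list(iter(make_source(), 0)))      # the built-in agrees: [3, 2, 1]
```
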
Iterator objects returned by either form of iter() have a next()
method. This method either returns the next value in the
iteration, or raises StopIteration (or a derived exception class)
to signal the end of the iteration. Any other exception should be
considered to signify an error and should be propagated normally,
not taken to mean the end of the iteration.

Classes can define how they are iterated over by defining an
__iter__() method; this should take no additional arguments and
return a valid iterator object. A class that wants to be an
iterator should implement two methods: a next() method that behaves
as described above, and an __iter__() method that returns self.

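A minimal iterator class following these rules might look as follows. It is written for Python 3, where the method described here as next() is spelled __next__(); the class name Countdown is hypothetical, and the alias next = __next__ merely exposes the method under this PEP's name as well.

```python
class Countdown:
    """Hypothetical iterator yielding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self               # an iterator returns itself

    def __next__(self):           # Python 3 name for this PEP's next()
        if self.current <= 0:
            raise StopIteration   # end of the iteration (and it stays ended)
        value = self.current
        self.current -= 1
        return value

    next = __next__               # alias under this PEP's (Python 2) name


for n in Countdown(3):
    print(n)   # 3, then 2, then 1
```
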
The two methods correspond to two distinct protocols:

1. An object can be iterated over with "for" if it implements
   __iter__() or __getitem__().

2. An object can function as an iterator if it implements next().

Container-like objects usually support protocol 1. Iterators are
currently required to support both protocols. The semantics of
iteration come only from protocol 2; protocol 1 is present to make
iterators behave like sequences, in particular so that code
receiving an iterator can use a for-loop over the iterator.

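The difference between the two protocols shows up when the same object is looped over twice at once. In this Python 3 sketch, Shelf is a hypothetical container that supports only protocol 1 and hands out a fresh iterator on each request, so nested loops are independent; an iterator supports both protocols but returns itself from __iter__(), so nested loops over it share one position.

```python
class Shelf:
    """Hypothetical container: supports protocol 1 only."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        # Return a *new* iterator each time, so loops are independent.
        return iter(self.items)


shelf = Shelf(["a", "b"])
pairs = [(x, y) for x in shelf for y in shelf]  # independent nested loops
print(pairs)   # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]

it = iter(shelf)                             # an iterator: protocols 1 and 2
shared = [(x, y) for x in it for y in it]    # both loops advance the same it
print(shared)  # [('a', 'b')]
```
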
Dictionary Iterators

- Dictionaries implement a sq_contains slot that implements the
  same test as the has_key() method. This means that we can write

      if k in dict: ...

  which is equivalent to

      if dict.has_key(k): ...

- Dictionaries implement a tp_iter slot that returns an efficient
  iterator that iterates over the keys of the dictionary. During
  such an iteration, the dictionary should not be modified, except
  that setting the value for an existing key is allowed (deletions
  or additions are not, nor is the update() method). This means
  that we can write

      for k in dict: ...

  which is equivalent to, but much faster than

      for k in dict.keys(): ...

  as long as the restriction on modifications to the dictionary
  (either by the loop or by another thread) is not violated.

- Add methods to dictionaries that return different kinds of
  iterators explicitly:

      for key in dict.iterkeys(): ...

      for value in dict.itervalues(): ...

      for key, value in dict.iteritems(): ...

  This means that "for x in dict" is shorthand for "for x in
  dict.iterkeys()".

Other mappings, if they support iterators at all, should also
iterate over the keys. However, this should not be taken as an
absolute rule; specific applications may have different
requirements.

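For readers on Python 3, the same behaviour can be checked as follows. Note that Python 3 removed has_key(), iterkeys(), itervalues() and iteritems(); keys(), values() and items() return views that can be iterated directly. The error message quoted in a comment is CPython's and may differ in other implementations.

```python
d = {"a": 1, "b": 2}

print("a" in d)              # True: "key in dict" tests the keys
print([k for k in d])        # ['a', 'b']: "for k in dict" yields keys
print(list(d.values()))      # [1, 2]
print(list(d.items()))       # [('a', 1), ('b', 2)]

for k in d:
    d[k] *= 10               # rebinding an existing key is allowed
print(d)                     # {'a': 10, 'b': 20}

changed_size = False
try:
    for k in d:
        d[k + "!"] = 0       # adding a key during iteration is not
except RuntimeError:
    changed_size = True      # CPython: "dictionary changed size during iteration"
print(changed_size)          # True
```
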
File Iterators

The following proposal is useful because it provides us with a
good answer to the complaint that the common idiom to iterate over
the lines of a file is ugly and slow.

- Files implement a tp_iter slot that is equivalent to
  iter(f.readline, ""). This means that we can write

      for line in file:
          ...

  as a shorthand for

      for line in iter(file.readline, ""):
          ...

  which is equivalent to, but faster than

      while 1:
          line = file.readline()
          if not line:
              break
          ...

This also shows that some iterators are destructive: they consume
all the values and a second iterator cannot easily be created that
iterates independently over the same values. You could open the
file for a second time, or seek() to the beginning, but these
solutions don't work for all file types, e.g. they don't work when
the open file object really represents a pipe or a stream socket.

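These equivalences, and the destructive nature of the file iterator, can be checked in Python 3 with io.StringIO standing in for a real file. That substitution is an assumption of this sketch: unlike a pipe or a socket, an in-memory file supports seek().

```python
import io

text = "alpha\nbeta\ngamma\n"

# The three spellings above, applied to an in-memory text file.
f1 = io.StringIO(text)
a = [line for line in f1]

f2 = io.StringIO(text)
b = [line for line in iter(f2.readline, "")]

f3 = io.StringIO(text)
c = []
while True:
    line = f3.readline()
    if not line:
        break
    c.append(line)

print(a == b == c)  # True

# Destructive: a second pass over the same file object finds nothing...
print(list(f1))     # []
# ...unless the file supports seek(), which pipes and sockets do not.
f1.seek(0)
print(list(f1))     # ['alpha\n', 'beta\n', 'gamma\n']
```
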
Because the file iterator uses an internal buffer, mixing this
with other file operations (e.g. file.readline()) doesn't work
right. Also, the following code:

    for line in file:
        if line == "\n":
            break
    for line in file:
        print line,

doesn't work as you might expect, because the iterator created by
the second for-loop doesn't take the buffer read-ahead by the
first for-loop into account. A correct way to write this is:

    it = iter(file)
    for line in it:
        if line == "\n":
            break
    for line in it:
        print line,

(The rationale for these restrictions is that "for line in file"
ought to become the recommended, standard way to iterate over the
lines of a file, and this should be as fast as can be. The
iterator version is considerably faster than calling readline(),
due to the internal buffer in the iterator.)

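The shared-iterator pattern generalizes: handing one iterator to two loops lets the second loop resume exactly where the first one stopped. This Python 3 sketch splits a hypothetical header/body message held in io.StringIO. In Python 3, file objects return themselves from __iter__(), so iter(f) is f there; the explicit iter() call keeps the pattern valid for any iterable.

```python
import io

# A hypothetical message: header lines, a blank line, then body lines.
message = io.StringIO("From: a\nTo: b\n\nhello\nworld\n")

it = iter(message)             # one iterator, shared by both loops
header = []
for line in it:
    if line == "\n":
        break                  # stop at the blank separator line
    header.append(line)
body = [line for line in it]   # resumes right after the blank line

print(header)  # ['From: a\n', 'To: b\n']
print(body)    # ['hello\n', 'world\n']
```
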
Rationale

If all the parts of the proposal are included, this addresses many
concerns in a consistent and flexible fashion. Among its chief
virtues are the following four -- no, five -- no, six -- points:

1. It provides an extensible iterator interface.

2. It allows performance enhancements to list iteration.

3. It allows big performance enhancements to dictionary iteration.

4. It allows one to provide an interface for just iteration
   without pretending to provide random access to elements.

5. It is backward-compatible with all existing user-defined
   classes and extension objects that emulate sequences and
   mappings, even mappings that only implement a subset of
   {__getitem__, keys, values, items}.

6. It makes code iterating over non-sequence collections more
   concise and readable.

Resolved Issues

The following topics have been decided by consensus or BDFL
pronouncement.

- Two alternative spellings for next() have been proposed but
  rejected: __next__(), because it corresponds to a type object
  slot (tp_iternext); and __call__(), because this is the only
  operation.

  Arguments against __next__(): while many iterators are used in
  for loops, it is expected that user code will also call next()
  directly, so having to write __next__() is ugly; also, a
  possible extension of the protocol would be to allow for prev(),
  current() and reset() operations; surely we don't want to use
  __prev__(), __current__(), __reset__().

  Arguments against __call__() (the original proposal): taken out
  of context, x() is not very readable, while x.next() is clear;
  there's a danger that every special-purpose object wants to use
  __call__() for its most common operation, causing more confusion
  than clarity.

  (In retrospect, it might have been better to go for __next__()
  and have a new built-in, next(it), which calls it.__next__().
  But alas, it's too late; this has been deployed in Python 2.2
  since December 2001.)

- Some folks have requested the ability to restart an iterator.
  This should be dealt with by calling iter() on a sequence
  repeatedly, not by the iterator protocol itself. (See also
  requested extensions below.)

- It has been questioned whether an exception to signal the end of
  the iteration isn't too expensive. Several alternatives for the
  StopIteration exception have been proposed: a special value End
  to signal the end, a function end() to test whether the iterator
  is finished, even reusing the IndexError exception.

  - A special value has the problem that if a sequence ever
    contains that special value, a loop over that sequence will
    end prematurely without any warning. If the experience with
    null-terminated C strings hasn't taught us the problems this
    can cause, imagine the trouble a Python introspection tool
    would have iterating over a list of all built-in names,
    assuming that the special End value was a built-in name!

  - Calling an end() function would require two calls per
    iteration. Two calls is much more expensive than one call
    plus a test for an exception. The time-critical for loop, in
    particular, can test very cheaply for an exception.

  - Reusing IndexError can cause confusion because it can be a
    genuine error, which would be masked by ending the loop
    prematurely.

- Some have asked for a standard iterator type. Presumably all
  iterators would have to be derived from this type. But this is
  not the Python way: dictionaries are mappings because they
  support __getitem__() and a handful of other operations, not
  because they are derived from an abstract mapping type.

- Regarding "if key in dict": there is no doubt that the
  dict.has_key(x) interpretation of "x in dict" is by far the
  most useful interpretation, probably the only useful one. There
  has been resistance against this because "x in list" checks
  whether x is present among the values, while the proposal makes
  "x in dict" check whether x is present among the keys. Given
  that the symmetry between lists and dictionaries is very weak,
  this argument does not have much weight.

- The name iter() is an abbreviation. Alternatives proposed
  include iterate(), traverse(), but these appear too long.
  Python has a history of using abbrs for common builtins,
  e.g. repr(), str(), len().

  Resolution: iter() it is.

- Using the same name for two different operations (getting an
  iterator from an object and making an iterator for a function
  with a sentinel value) is somewhat ugly. I haven't seen a
  better name for the second operation though, and since they both
  return an iterator, it's easy to remember.

  Resolution: the builtin iter() takes an optional argument, which
  is the sentinel to look for.

- Once a particular iterator object has raised StopIteration, will
  it also raise StopIteration on all subsequent next() calls?
  Some say that it would be useful to require this, others say
  that it is useful to leave this open to individual iterators.
  Note that this may require an additional state bit for some
  iterator implementations (e.g. function-wrapping iterators).

  Resolution: once StopIteration is raised, calling it.next()
  continues to raise StopIteration.

  Note: this was in fact not implemented in Python 2.2; there are
  many cases where an iterator's next() method can raise
  StopIteration on one call but not on the next. This has been
  remedied in Python 2.3.

- It has been proposed that a file object should be its own
  iterator, with a next() method returning the next line. This
  has certain advantages, and makes it even clearer that this
  iterator is destructive. The disadvantage is that this would
  make it even more painful to implement the "sticky
  StopIteration" feature proposed in the previous bullet.

  Resolution: tentatively rejected (though there are still people
  arguing for this).

- Some folks have requested extensions of the iterator protocol,
  e.g. prev() to get the previous item, current() to get the
  current item again, finished() to test whether the iterator is
  finished, and maybe even others, like rewind(), __len__(),
  position().

  While some of these are useful, many of these cannot easily be
  implemented for all iterator types without adding arbitrary
  buffering, and sometimes they can't be implemented at all (or
  not reasonably). E.g. anything to do with reversing directions
  can't be done when iterating over a file or function. Maybe a
  separate PEP can be drafted to standardize the names for such
  operations when they are implementable.

  Resolution: rejected.

- There has been a long discussion about whether

      for x in dict: ...

  should assign x the successive keys, values, or items of the
  dictionary. The symmetry between "if x in y" and "for x in y"
  suggests that it should iterate over keys. This symmetry has been
  observed by many independently and has even been used to "explain"
  one using the other. This is because for sequences, "if x in y"
  iterates over y comparing the iterated values to x. If we adopt
  both of the above proposals, this will also hold for
  dictionaries.

  The argument against making "for x in dict" iterate over the keys
  comes mostly from a practicality point of view: scans of the
  standard library show that there are about as many uses of "for x
  in dict.items()" as there are of "for x in dict.keys()", with the
  items() version having a small majority. Presumably many of the
  loops using keys() use the corresponding value anyway, by writing
  dict[x], so (the argument goes) by making both the key and value
  available, we could support the largest number of cases. While
  this is true, I (Guido) find the correspondence between "for x in
  dict" and "if x in dict" too compelling to break, and there's not
  much overhead in having to write dict[x] to explicitly get the
  value.

  For fast iteration over items, use "for key, value in
  dict.iteritems()". I've timed the difference between

      for key in dict: dict[key]

  and

      for key, value in dict.iteritems(): pass

  and found that the latter is only about 7% faster.

  Resolution: By BDFL pronouncement, "for x in dict" iterates over
  the keys, and dictionaries have iteritems(), iterkeys(), and
  itervalues() to return the different flavors of dictionary
  iterators.

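The objection to a special End value, raised above under Resolved Issues, can be made concrete in Python 3 by emulating such a protocol with iter(callable, sentinel), using None as the hypothetical End marker. A list that happens to contain None is then cut short without any error, while the StopIteration-based protocol delivers every item.

```python
data = ["x", None, "y"]    # a sequence that happens to contain None

# Hypothetical "special value" protocol, emulated with iter(callable,
# sentinel): None plays the role of the proposed End marker.
source = iter(data)
special_value_protocol = list(iter(lambda: next(source), None))

# The adopted protocol: only StopIteration ends the loop.
exception_protocol = list(iter(data))

print(special_value_protocol)  # ['x']: stops early, with no warning
print(exception_protocol)      # ['x', None, 'y']
```
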
Mailing Lists

The iterator protocol has been discussed extensively in a mailing
list on SourceForge:

    http://lists.sourceforge.net/lists/listinfo/python-iterators

Initially, some of the discussion was carried out at Yahoo;
archives are still accessible:

    http://groups.yahoo.com/group/python-iter

Copyright

This document is in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: