PEP: 234
Title: Iterators
Version: $Revision$
Author: ping@lfw.org (Ka-Ping Yee), guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.1
Created: 30-Jan-2001
Post-History:

Abstract

This document proposes an iteration interface that objects can
provide to control the behaviour of 'for' loops. Looping is
customized by providing a method that produces an iterator object.
The iterator provides a 'get next value' operation that produces
the next item in the sequence each time it is called, raising an
exception when no more items are available.

In addition, specific iterators over the keys of a dictionary and
over the lines of a file are proposed, and a proposal is made to
allow spelling dict.has_key(key) as "key in dict".

Note: this is an almost complete rewrite of this PEP by the second
author, describing the actual implementation checked into the
trunk of the Python 2.2 CVS tree. It is still open for
discussion. Some of the more esoteric proposals in the original
version of this PEP have been withdrawn for now; these may be the
subject of a separate PEP in the future.


C API Specification

A new exception is defined, StopIteration, which can be used to
signal the end of an iteration.

A new slot named tp_iter for requesting an iterator is added to
the type object structure. This should be a function of one
PyObject * argument returning a PyObject *, or NULL. To use this
slot, a new C API function PyObject_GetIter() is added, with the
same signature as the tp_iter slot function.

Another new slot, named tp_iternext, is added to the type
structure, for obtaining the next value in the iteration. To use
this slot, a new C API function PyIter_Next() is added. The
signature for both the slot and the API function is as follows:
the argument is a PyObject * and so is the return value. When the
return value is non-NULL, it is the next value in the iteration.
When it is NULL, there are three possibilities:

- No exception is set; this implies the end of the iteration.

- The StopIteration exception (or a derived exception class) is
  set; this implies the end of the iteration.

- Some other exception is set; this means that an error occurred
  that should be propagated normally.

In addition to the tp_iternext slot, every iterator object must
also implement a next() method, callable without arguments. This
should have the same semantics as the tp_iternext slot function,
except that the only way to signal the end of the iteration is to
raise StopIteration. The iterator object should not care whether
its tp_iternext slot function or its next() method is called, and
the caller may mix calls arbitrarily. (The next() method is for
the benefit of Python code using iterators directly; the
tp_iternext slot is added to make 'for' loops more efficient.)

To ensure binary backwards compatibility, a new flag
Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
field, and to the default flags macro. This flag must be tested
before accessing the tp_iter or tp_iternext slots. The macro
PyIter_Check() tests whether an object has the appropriate flag
set and has a non-NULL tp_iternext slot. There is no such macro
for the tp_iter slot (since the only place where this slot is
referenced should be PyObject_GetIter()).

(Note: the tp_iter slot can be present on any object; the
tp_iternext slot should only be present on objects that act as
iterators.)

For backwards compatibility, the PyObject_GetIter() function
implements fallback semantics when its argument is a sequence that
does not implement a tp_iter function: a lightweight sequence
iterator object is constructed in that case which iterates over
the items of the sequence in the natural order.
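
The lookup order of PyObject_GetIter() can be modelled in pure Python. This is only a sketch under stated assumptions, not the C implementation: get_iter() and SequenceIterator are hypothetical names, a class-level __iter__ stands in for the tp_iter slot, and the fallback iterator asks for items 0, 1, 2, ... until __getitem__ raises IndexError, which is how the old sequence protocol signals the end. The next() method is aliased as __next__ so the sketch also runs on Python 3, where that method was renamed.

```python
class SequenceIterator:
    """Hypothetical model of the lightweight fallback sequence iterator."""

    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        # Iterators return themselves from __iter__ (the tp_iter slot).
        return self

    def next(self):
        try:
            item = self.seq[self.index]
        except IndexError:
            # Running off the end of the sequence ends the iteration.
            raise StopIteration
        self.index += 1
        return item

    # Python 3 spells the next-value method __next__.
    __next__ = next


def get_iter(obj):
    """Rough, hypothetical Python model of PyObject_GetIter()."""
    cls = type(obj)
    if hasattr(cls, "__iter__"):
        # The type provides its own iterator (the tp_iter slot).
        return cls.__iter__(obj)
    if hasattr(cls, "__getitem__"):
        # Fallback for sequences that do not provide tp_iter.
        return SequenceIterator(obj)
    raise TypeError("%s object is not iterable" % cls.__name__)


class Squares:
    """A sequence that implements only __getitem__, not __iter__."""

    def __getitem__(self, i):
        if i >= 4:
            raise IndexError(i)
        return i * i


print(list(get_iter(Squares())))   # [0, 1, 4, 9]
```

The same Squares object also works directly in a 'for' loop, because the interpreter applies the same kind of fallback.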

The Python bytecode generated for 'for' loops is changed to use
new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol
rather than the sequence protocol to get the next value for the
loop variable. This makes it possible to use a 'for' loop to loop
over non-sequence objects that support the tp_iter slot. Other
places where the interpreter loops over the values of a sequence
should also be changed to use iterators.
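
As a rough illustration, not the actual bytecode, a 'for' loop under the iterator protocol behaves like the hypothetical run_for_loop() helper below: one request for an iterator, then repeated requests for the next value until StopIteration, or a subclass of it, ends the loop, while any other exception propagates. This mirrors the last two of the NULL cases listed for tp_iternext. The sketch assumes a Python 3 interpreter, where the built-in next(it) calls the iterator's next-value method; Done and TwoThenDone are made-up names.

```python
def run_for_loop(iterable, body):
    """Approximate `for x in iterable: body(x)` with the iterator protocol."""
    it = iter(iterable)            # like GET_ITER: obtain an iterator once
    while True:
        try:
            x = next(it)           # like FOR_ITER: ask for the next value
        except StopIteration:      # also matches subclasses of StopIteration
            break                  # end of the iteration
        # Exceptions other than StopIteration raised by next(it) are not
        # caught above, so they propagate to the caller as errors.
        body(x)


class Done(StopIteration):
    """A StopIteration subclass; raising it ends the iteration too."""


class TwoThenDone:
    """Iterator that yields 'a' and 'b', then signals the end with Done."""

    def __init__(self):
        self.items = ["a", "b"]

    def __iter__(self):
        return self

    def next(self):
        if not self.items:
            raise Done
        return self.items.pop(0)

    __next__ = next                # Python 3 name for the method


seen = []
run_for_loop(TwoThenDone(), seen.append)
print(seen)                        # ['a', 'b']
```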

Iterators ought to implement the tp_iter slot as returning a
reference to themselves; this is needed to make it possible to
use an iterator (as opposed to a sequence) in a for loop.
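
A minimal Python-level sketch of an iterator object, using a hypothetical Countdown class: next() returns the following value or raises StopIteration, and __iter__() returns self, so the same object can be advanced by hand and then handed to a 'for' loop, which continues where the manual calls left off. The __next__ alias is an addition so that the sketch also runs on Python 3, where the method was renamed.

```python
class Countdown:
    """Iterator producing n-1, n-2, ..., 0."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # An iterator is its own iterator, so it can be used in a for loop.
        return self

    def next(self):
        if self.n <= 0:
            raise StopIteration   # the only way next() signals the end
        self.n -= 1
        return self.n

    # Python 3 calls the method __next__; alias it so the sketch runs there.
    __next__ = next


c = Countdown(5)
first = c.next()              # a manual call returns 4
rest = [x for x in c]         # the loop resumes from there: [3, 2, 1, 0]
print(first, rest)            # 4 [3, 2, 1, 0]
```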

Discussion: should the next() method be renamed to __next__()?
Every other method corresponding to a tp_<something> slot has a
special name. On the other hand, this would suggest that there
should also be a primitive operation next(x) that would call
x.__next__(), and this just looks like adding complexity without
benefit. So I think it's better to stick with next(). On the
other hand, Marc-Andre Lemburg points out: "Even though .next()
reads better, I think that we should stick to the convention that
interpreter APIs use the __xxx__ naming scheme. Otherwise, people
will have a hard time differentiating between user-level protocols
and interpreter-level ones. AFAIK, .next() would be the first
low-level API not using this convention." My (BDFL's) response:
there are other important protocols with a user-level name
(e.g. keys()), and I don't see the importance of this particular
rule.


Python API Specification

The StopIteration exception is made visible as one of the
standard exceptions. It is derived from Exception.

A new built-in function is defined, iter(), which can be called in
two ways:

- iter(obj) calls PyObject_GetIter(obj).

- iter(callable, sentinel) returns a special kind of iterator that
  calls the callable to produce a new value, and compares the
  return value to the sentinel value. If the return value equals
  the sentinel, this signals the end of the iteration and
  StopIteration is raised rather than returning normally; if the
  return value does not equal the sentinel, it is returned as the
  next value from the iterator. If the callable raises an
  exception, this is propagated normally; in particular, the
  function is allowed to raise StopIteration as an alternative way
  to end the iteration. (This functionality is available from the
  C API as PyCallIter_New(callable, sentinel).)
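
The two-argument form can be approximated in Python. CallableIterator below is a hypothetical stand-in for the object that iter(callable, sentinel) returns, not its real implementation: it compares each result with the sentinel using ==, lets any exception from the callable (including StopIteration) propagate, and stays exhausted once the sentinel has been seen. The __next__ alias lets it run on Python 3.

```python
class CallableIterator:
    """Sketch of the iterator returned by iter(callable, sentinel)."""

    def __init__(self, func, sentinel):
        self.func = func
        self.sentinel = sentinel

    def __iter__(self):
        return self

    def next(self):
        if self.func is None:
            raise StopIteration          # already exhausted
        value = self.func()              # exceptions, even StopIteration, propagate
        if value == self.sentinel:
            self.func = None             # remain exhausted from now on
            raise StopIteration
        return value

    __next__ = next                      # Python 3 name for the method


queue = ["spam", "eggs", "", "ham"]
words = list(CallableIterator(lambda: queue.pop(0), ""))
print(words)                             # ['spam', 'eggs']
print(queue)                             # ['ham']
```

The callable is not called again after it returns the sentinel, so "ham" is left in the queue.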

Iterator objects returned by either form of iter() have a next()
method. This method either returns the next value in the
iteration, or raises StopIteration (or a derived exception class)
to signal the end of the iteration. Any other exception should be
considered to signify an error and should be propagated normally,
not taken to mean the end of the iteration.

Classes can define how they are iterated over by defining an
__iter__() method; this should take no additional arguments and
return a valid iterator object. An instance of a class is a valid
iterator object when the class defines a next() method that
behaves as described above. A class whose instances are meant to
be iterators also ought to implement __iter__() returning the
instance itself.
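
For example, a container class can define __iter__() to return a fresh iterator object on every call, so that two loops over the same container, even nested ones, do not interfere with each other. Playlist and PlaylistIterator are hypothetical names; the iterator defines next() (aliased as __next__ for Python 3) and returns itself from __iter__(), as recommended above.

```python
class PlaylistIterator:
    """Iterator over a list of songs; keeps its own position."""

    def __init__(self, songs):
        self.songs = songs
        self.pos = 0

    def __iter__(self):
        return self                      # iterators return themselves

    def next(self):
        if self.pos >= len(self.songs):
            raise StopIteration
        song = self.songs[self.pos]
        self.pos += 1
        return song

    __next__ = next                      # Python 3 name for the method


class Playlist:
    """A container: iterable, but not itself an iterator."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __iter__(self):
        # A new, independent iterator each time the playlist is looped over.
        return PlaylistIterator(self.songs)


p = Playlist(["intro", "verse", "outro"])
pairs = [(a, b) for a in p for b in p]   # nested loops over the same object
print(len(pairs))                        # 9
```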

Discussion:

- The name iter() is an abbreviation. Alternatives proposed
  include iterate(), harp(), traverse(), narrate().

- Using the same name for two different operations (getting an
  iterator from an object and making an iterator for a function
  with a sentinel value) is somewhat ugly. I haven't seen a
  better name for the second operation though.

- It was originally proposed that rather than having a next()
  method, an iterator object should simply be callable. This was
  rejected in favor of an explicit next() method. The reason is
  clarity: if you don't know the code very well, "x = s()" does
  not give a hint about what it does; but "x = s.next()" is pretty
  clear.


Dictionary Iterators

The following two proposals are somewhat controversial. They are
also independent of the main iterator implementation. However,
they are both very useful.

- Dictionaries implement a sq_contains slot that implements the
  same test as the has_key() method. This means that we can write

      if k in dict: ...

  which is equivalent to

      if dict.has_key(k): ...

- Dictionaries implement a tp_iter slot that returns an efficient
  iterator that iterates over the keys of the dictionary. During
  such an iteration, the dictionary should not be modified, except
  that setting the value for an existing key is allowed (deletions
  or additions are not, nor is the update() method). This means
  that we can write

      for k in dict: ...

  which is equivalent to, but much faster than

      for k in dict.keys(): ...

  as long as the restriction on modifications to the dictionary
  (either by the loop or by another thread) is not violated.
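
Both dictionary features can be tried in a current Python 3 interpreter, where they were adopted (has_key() itself was later removed there, leaving only the 'in' spelling). The sketch below uses made-up data to check that 'in' tests keys, that a plain loop visits the keys, and that assigning to existing keys during the loop is allowed. It also shows what typically happens when the restriction is violated: CPython raises RuntimeError once it notices that the dictionary changed size during the iteration, which is an implementation detail rather than part of this proposal.

```python
stock = {"apples": 3, "pears": 5}             # made-up sample data

# "k in dict" tests the keys, as dict.has_key(k) did.
print("apples" in stock, 3 in stock)          # True False

# "for k in dict" visits the same keys as "for k in dict.keys()".
print([k for k in stock] == list(stock.keys()))   # True

# Assigning to existing keys during the loop is allowed.
for k in stock:
    stock[k] = stock[k] * 2
print(stock)                                  # {'apples': 6, 'pears': 10}

# Adding a key during the loop breaks the rule; CPython notices that the
# dictionary changed size and raises RuntimeError on the next step.
error = None
try:
    for k in stock:
        stock[k + "_extra"] = 0
except RuntimeError as exc:
    error = exc
print(type(error).__name__)                   # RuntimeError
```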

There is no doubt that the dict.has_key(x) interpretation of "x
in dict" is by far the most useful interpretation, probably the
only useful one. There has been resistance against this because
"x in list" checks whether x is present among the values, while
the proposal makes "x in dict" check whether x is present among
the keys. Given that the symmetry between lists and dictionaries
is very weak, this argument does not have much weight.

The main discussion focuses on whether

    for x in dict: ...

should assign x the successive keys, values, or items of the
dictionary. The symmetry between "if x in y" and "for x in y"
suggests that it should iterate over keys. This symmetry has been
observed by many independently and has even been used to "explain"
one using the other. This is because for sequences, "if x in y"
iterates over y comparing the iterated values to x. If we adopt
both of the above proposals, this will also hold for
dictionaries.

The argument against making "for x in dict" iterate over the keys
comes mostly from a practicality point of view: scans of the
standard library show that there are about as many uses of "for x
in dict.items()" as there are of "for x in dict.keys()", with the
items() version having a small majority. Presumably many of the
loops using keys() use the corresponding value anyway, by writing
dict[x], so (the argument goes) by making both the key and value
available, we could support the largest number of cases. While
this is true, I (Guido) find the correspondence between "for x in
dict" and "if x in dict" too compelling to break, and there's not
much overhead in having to write dict[x] to explicitly get the
value. We could also add methods to dictionaries that return
different kinds of iterators, e.g.

    for key, value in dict.iteritems(): ...

    for value in dict.itervalues(): ...

    for key in dict.iterkeys(): ...


File Iterators

The following proposal is not controversial, but should be
considered a separate step after introducing the iterator
framework described above. It is useful because it provides us
with a good answer to the complaint that the common idiom to
iterate over the lines of a file is ugly and slow.

- Files implement a tp_iter slot that is equivalent to
  iter(f.readline, ""). This means that we can write

      for line in file:
          ...

  as a shorthand for

      for line in iter(file.readline, ""):
          ...

  which is equivalent to, but faster than

      while 1:
          line = file.readline()
          if not line:
              break
          ...
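
The three spellings can be compared on an in-memory file. This sketch uses io.StringIO from the Python 3 standard library as a stand-in for a real file, an assumption made only to keep the example self-contained; each function collects the lines with one of the three loops, and the results come out identical.

```python
import io

TEXT = "first line\nsecond line\nlast line without newline"


def lines_via_for():
    f = io.StringIO(TEXT)
    return [line for line in f]           # the file's own iterator


def lines_via_callable_iter():
    f = io.StringIO(TEXT)
    return list(iter(f.readline, ""))     # readline() returns "" at EOF


def lines_via_while():
    f = io.StringIO(TEXT)
    result = []
    while 1:
        line = f.readline()
        if not line:                      # "" means end of file
            break
        result.append(line)
    return result


print(lines_via_for() == lines_via_callable_iter() == lines_via_while())   # True
```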

This also shows that some iterators are destructive: they consume
all the values and a second iterator cannot easily be created that
iterates independently over the same values. You could open the
file for a second time, or seek() to the beginning, but these
solutions don't work for all file types, e.g. they don't work when
the open file object really represents a pipe or a stream socket.
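
A small demonstration of that consumption, again assuming io.StringIO as a stand-in for a seekable file: the file object is its own iterator, so once a loop has read every line, a second loop over the same object gets nothing, and only an explicit seek(0) makes the lines available again.

```python
import io

f = io.StringIO("alpha\nbeta\n")

first_pass = [line for line in f]     # consumes every line
second_pass = [line for line in f]    # already at end of file: nothing left

f.seek(0)                             # rewinding works only for seekable files
third_pass = [line for line in f]

print(first_pass, second_pass, third_pass)
# ['alpha\n', 'beta\n'] [] ['alpha\n', 'beta\n']
```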


Rationale

If all the parts of the proposal are included, this addresses many
concerns in a consistent and flexible fashion. Among its chief
virtues are the following three -- no, four -- no, five -- points:

1. It provides an extensible iterator interface.

2. It allows performance enhancements to list iteration.

3. It allows big performance enhancements to dictionary iteration.

4. It allows one to provide an interface for just iteration
   without pretending to provide random access to elements.

5. It is backward-compatible with all existing user-defined
   classes and extension objects that emulate sequences and
   mappings, even mappings that only implement a subset of
   {__getitem__, keys, values, items}.


Copyright

This document is in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: