Initial draft of "Iterators" PEP.
This commit is contained in:
parent
6f6df21bd8
commit
f6828bb85b
|
@ -0,0 +1,356 @@
|
||||||
|
PEP: 2xx
|
||||||
|
Title: Iterators
|
||||||
|
Version: $Revision$
|
||||||
|
Author: ping@lfw.org (Ka-Ping Yee)
|
||||||
|
Status: Draft
|
||||||
|
Type: Standards Track
|
||||||
|
Python-Version: 2.1
|
||||||
|
Created: 30-Jan-2001
|
||||||
|
Post-History:
|
||||||
|
|
||||||
|
Abstract
|
||||||
|
|
||||||
|
This document proposes an iteration interface that objects can
|
||||||
|
provide to control the behaviour of 'for' loops. Looping is
|
||||||
|
customized by providing a method that produces an iterator object.
|
||||||
|
The iterator should be a callable object that returns the next
|
||||||
|
item in the sequence each time it is called, raising an exception
|
||||||
|
when no more items are available.
|
||||||
|
|
||||||
|
|
||||||
|
Copyright
|
||||||
|
|
||||||
|
This document is in the public domain.
|
||||||
|
|
||||||
|
|
||||||
|
Sequence Iterators
|
||||||
|
|
||||||
|
A new field named 'sq_iter' for requesting an iterator is added
|
||||||
|
to the PySequenceMethods table. Upon an attempt to iterate over
|
||||||
|
an object with a loop such as
|
||||||
|
|
||||||
|
for item in sequence:
|
||||||
|
...body...
|
||||||
|
|
||||||
|
the interpreter looks for the 'sq_iter' of the 'sequence' object.
|
||||||
|
If the method exists, it is called to get an iterator; it should
|
||||||
|
return a callable object. If the method does not exist, the
|
||||||
|
interpreter produces a built-in iterator object in the following
|
||||||
|
manner (described in Python here, but implemented in the core):
|
||||||
|
|
||||||
|
def make_iterator(sequence):
|
||||||
|
def iterator(sequence=sequence, index=[0]):
|
||||||
|
item = sequence[index[0]]
|
||||||
|
index[0] += 1
|
||||||
|
return item
|
||||||
|
return iterator
|
||||||
|
|
||||||
|
To execute the above 'for' loop, the interpreter would proceed as
|
||||||
|
follows, where 'iterator' is the iterator that was obtained:
|
||||||
|
|
||||||
|
while 1:
|
||||||
|
try:
|
||||||
|
item = iterator()
|
||||||
|
except IndexError:
|
||||||
|
break
|
||||||
|
...body...
|
||||||
|
|
||||||
|
(Note that the 'break' above doesn't translate to a "real" Python
|
||||||
|
break, since it would go to the 'else:' clause of the loop whereas
|
||||||
|
a "real" break in the body would skip the 'else:' clause.)
|
||||||
|
|
||||||
|
The list() and tuple() built-in functions would be updated to use
|
||||||
|
this same iterator logic to retrieve the items in their argument.
|
||||||
|
|
||||||
|
List and tuple objects would implement the 'sq_iter' method by
|
||||||
|
calling the built-in make_iterator() routine just described.
|
||||||
|
|
||||||
|
Instance objects would implement the 'sq_iter' method as follows:
|
||||||
|
|
||||||
|
if hasattr(self, '__iter__'):
|
||||||
|
return self.__iter__()
|
||||||
|
elif hasattr(self, '__getitem__'):
|
||||||
|
return make_iterator(self)
|
||||||
|
else:
|
||||||
|
raise TypeError, thing.__class__.__name__ + \
|
||||||
|
' instance does not support iteration'
|
||||||
|
|
||||||
|
Extension objects can implement 'sq_iter' however they wish, as
|
||||||
|
long as they return a callable object.
|
||||||
|
|
||||||
|
|
||||||
|
Mapping Iterators
|
||||||
|
|
||||||
|
An additional proposal from Guido is to provide special syntax
|
||||||
|
for iterating over mappings. The loop:
|
||||||
|
|
||||||
|
for key:value in mapping:
|
||||||
|
|
||||||
|
would bind both 'key' and 'value' to a key-value pair from the
|
||||||
|
mapping on each iteration. Tim Peters suggested that similarly,
|
||||||
|
|
||||||
|
for key: in mapping:
|
||||||
|
|
||||||
|
could iterate over just the keys and
|
||||||
|
|
||||||
|
for :value in mapping:
|
||||||
|
|
||||||
|
could iterate over just the values.
|
||||||
|
|
||||||
|
The syntax is unambiguous since the new colon is currently not
|
||||||
|
permitted in this position in the grammar.
|
||||||
|
|
||||||
|
This behaviour would be provided by additional methods in the
|
||||||
|
PyMappingMethods table: 'mp_iteritems', 'mp_iterkeys', and
|
||||||
|
'mp_itervalues' respectively. 'mp_iteritems' is expected to
|
||||||
|
produce a callable object that returns a (key, value) tuple;
|
||||||
|
'mp_iterkeys' and 'mp_itervalues' are expected to produce a
|
||||||
|
callable object that returns a single key or value.
|
||||||
|
|
||||||
|
The implementations of these methods on instance objects would
|
||||||
|
then check for and call the '__iteritems__', '__iterkeys__',
|
||||||
|
and '__itervalues__' methods respectively.
|
||||||
|
|
||||||
|
When 'mp_iteritems', 'mp_iterkeys', or 'mp_itervalues' is missing,
|
||||||
|
the default behaviour is to do make_iterator(mapping.items()),
|
||||||
|
make_iterator(mapping.keys()), or make_iterator(mapping.values())
|
||||||
|
respectively, using the definition of make_iterator() above.
|
||||||
|
|
||||||
|
|
||||||
|
Indexing Sequences
|
||||||
|
|
||||||
|
The special syntax described above can be applied to sequences
|
||||||
|
as well, to provide the long-hoped-for ability to obtain the
|
||||||
|
indices of a sequence without the strange-looking 'range(len(x))'
|
||||||
|
expression.
|
||||||
|
|
||||||
|
for index:item in sequence:
|
||||||
|
|
||||||
|
causes 'index' to be bound to the index of each item as 'item' is
|
||||||
|
bound to the items of the sequence in turn, and
|
||||||
|
|
||||||
|
for index: in sequence:
|
||||||
|
|
||||||
|
simply causes 'index' to start at 0 and increment until an attempt
|
||||||
|
to get sequence[index] produces an IndexError. For completeness,
|
||||||
|
|
||||||
|
for :item in sequence:
|
||||||
|
|
||||||
|
is equivalent to
|
||||||
|
|
||||||
|
for item in sequence:
|
||||||
|
|
||||||
|
In each case we try to request an appropriate iterator from the
|
||||||
|
sequence. In summary:
|
||||||
|
|
||||||
|
for k:v in x looks for mp_iteritems, then sq_iter
|
||||||
|
for k: in x looks for mp_iterkeys, then sq_iter
|
||||||
|
for :v in x looks for mp_itervalues, then sq_iter
|
||||||
|
for v in x looks for sq_iter
|
||||||
|
|
||||||
|
If we fall back to sq_iter in the first two cases, we generate
|
||||||
|
indices for k as needed, by starting at 0 and incrementing.
|
||||||
|
|
||||||
|
The implementation of the mp_iter* methods on instance objects
|
||||||
|
then checks for methods in the following order:
|
||||||
|
|
||||||
|
mp_iteritems __iteritems__, __iter__, items, __getitem__
|
||||||
|
mp_iterkeys __iterkeys__, __iter__, keys, __getitem__
|
||||||
|
mp_itervalues __itervalues__, __iter__, values, __getitem__
|
||||||
|
sq_iter __iter__, __getitem__
|
||||||
|
|
||||||
|
If a __iteritems__, __iterkeys__, or __itervalues__ method is
|
||||||
|
found, we just call it and use the resulting iterator. If a
|
||||||
|
mp_* function finds no such method but finds __iter__ instead,
|
||||||
|
we generate indices as needed.
|
||||||
|
|
||||||
|
Upon finding an items(), keys(), or values() method, we use
|
||||||
|
make_iterator(x.items()), make_iterator(x.keys()), or
|
||||||
|
make_iterator(x.values()) respectively. Upon finding a
|
||||||
|
__getitem__ method, we use it and generate indices as needed.
|
||||||
|
|
||||||
|
For example, the complete implementation of the mp_iteritems
|
||||||
|
method for instances can be roughly described as follows:
|
||||||
|
|
||||||
|
def mp_iteritems(thing):
|
||||||
|
if hasattr(thing, '__iteritems__'):
|
||||||
|
return thing.__iteritems__()
|
||||||
|
if hasattr(thing, '__iter__'):
|
||||||
|
def iterator(sequence=thing, index=[0]):
|
||||||
|
item = (index[0], sequence.__iter__())
|
||||||
|
index[0] += 1
|
||||||
|
return item
|
||||||
|
return iterator
|
||||||
|
if hasattr(thing, 'items'):
|
||||||
|
return make_iterator(thing.items())
|
||||||
|
if hasattr(thing, '__getitem__'):
|
||||||
|
def iterator(sequence=thing, index=[0]):
|
||||||
|
item = (index[0], sequence[index[0]])
|
||||||
|
index[0] += 1
|
||||||
|
return item
|
||||||
|
return iterator
|
||||||
|
raise TypeError, thing.__class__.__name__ + \
|
||||||
|
' instance does not support iteration over items'
|
||||||
|
|
||||||
|
|
||||||
|
Examples
|
||||||
|
|
||||||
|
Here is a class written in Python that represents the sequence of
|
||||||
|
lines in a file.
|
||||||
|
|
||||||
|
class FileLines:
|
||||||
|
def __init__(self, filename):
|
||||||
|
self.file = open(filename)
|
||||||
|
def __iter__(self):
|
||||||
|
def iter(self=self):
|
||||||
|
line = self.file.readline()
|
||||||
|
if line: return line
|
||||||
|
else: raise IndexError
|
||||||
|
return iter
|
||||||
|
|
||||||
|
for line in FileLines('spam.txt'):
|
||||||
|
print line
|
||||||
|
|
||||||
|
And here's an interactive session demonstrating the proposed new
|
||||||
|
looping syntax:
|
||||||
|
|
||||||
|
>>> for i:item in ['a', 'b', 'c']:
|
||||||
|
... print i, item
|
||||||
|
...
|
||||||
|
0 a
|
||||||
|
1 b
|
||||||
|
2 c
|
||||||
|
>>> for i: in 'abcdefg': # just the indices, please
|
||||||
|
... print i,
|
||||||
|
... print
|
||||||
|
...
|
||||||
|
0 1 2 3 4 5 6
|
||||||
|
>>> for k:v in os.environ: # os.environ is an instance, but
|
||||||
|
... print k, v # this still works because we fall
|
||||||
|
... # back to calling items()
|
||||||
|
MAIL /var/spool/mail/ping
|
||||||
|
HOME /home/ping
|
||||||
|
DISPLAY :0.0
|
||||||
|
TERM xterm
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
|
||||||
|
|
||||||
|
Rationale
|
||||||
|
|
||||||
|
If all the parts of the proposal are included, this addresses many
|
||||||
|
concerns in a consistent and flexible fashion. Among its chief
|
||||||
|
virtues are the following three -- no, four -- no, five -- points:
|
||||||
|
|
||||||
|
1. It provides an extensible iterator interface.
|
||||||
|
|
||||||
|
2. It resolves the endless "i indexing sequence" debate.
|
||||||
|
|
||||||
|
3. It allows performance enhancements to dictionary iteration.
|
||||||
|
|
||||||
|
4. It allows one to provide an interface for just iteration
|
||||||
|
without pretending to provide random access to elements.
|
||||||
|
|
||||||
|
5. It is backward-compatible with all existing user-defined
|
||||||
|
classes and extension objects that emulate sequences and
|
||||||
|
mappings, even mappings that only implement a subset of
|
||||||
|
{__getitem__, keys, values, items}.
|
||||||
|
|
||||||
|
|
||||||
|
Errors
|
||||||
|
|
||||||
|
Errors that occur during sq_iter, mp_iter*, or the __iter*__
|
||||||
|
methods are allowed to propagate normally to the surface.
|
||||||
|
|
||||||
|
An attempt to do
|
||||||
|
|
||||||
|
for item in dict:
|
||||||
|
|
||||||
|
over a dictionary object still produces:
|
||||||
|
|
||||||
|
TypeError: loop over non-sequence
|
||||||
|
|
||||||
|
An attempt to iterate over an instance that provides neither
|
||||||
|
__iter__ nor __getitem__ produces:
|
||||||
|
|
||||||
|
TypeError: instance does not support iteration
|
||||||
|
|
||||||
|
Similarly, an attempt to do mapping-iteration over an instance
|
||||||
|
that doesn't provide the right methods should produce one of the
|
||||||
|
following errors:
|
||||||
|
|
||||||
|
TypeError: instance does not support iteration over items
|
||||||
|
TypeError: instance does not support iteration over keys
|
||||||
|
TypeError: instance does not support iteration over values
|
||||||
|
|
||||||
|
It's an error for the iterator produced by __iteritems__ or
|
||||||
|
mp_iteritems to return an object whose length is not 2:
|
||||||
|
|
||||||
|
TypeError: item iterator did not return a 2-tuple
|
||||||
|
|
||||||
|
|
||||||
|
Open Issues
|
||||||
|
|
||||||
|
We could introduce a new exception type such as IteratorExit just
|
||||||
|
for terminating loops rather than using IndexError. In this case,
|
||||||
|
the implementation of make_iterator() would catch and translate an
|
||||||
|
IndexError into an IteratorExit for backward compatibility.
|
||||||
|
|
||||||
|
We could provide access to the logic that calls either 'sq_item'
|
||||||
|
or make_iterator() with an iter() function in the built-in module
|
||||||
|
(just as the getattr() function provides access to 'tp_getattr').
|
||||||
|
One possible motivation for this is to make it easier for the
|
||||||
|
implementation of __iter__ to delegate iteration to some other
|
||||||
|
sequence. Presumably we would then have to consider adding
|
||||||
|
iteritems(), iterkeys(), and itervalues() as well.
|
||||||
|
|
||||||
|
An alternative way to let __iter__ delegate iteration to another
|
||||||
|
sequence is for it to return another sequence. Upon detecting
|
||||||
|
that the object returned by __iter__ is not callable, the
|
||||||
|
interpreter could repeat the process of looking for an iterator
|
||||||
|
on the new object. However, this process seems potentially
|
||||||
|
convoluted and likely to produce more confusing error messages.
|
||||||
|
|
||||||
|
If we decide to add "freezing" ability to lists and dictionaries,
|
||||||
|
it is suggested that the implementation of make_iterator
|
||||||
|
automatically freeze any list or dictionary argument for the
|
||||||
|
duration of the loop, and produce an error complaining about any
|
||||||
|
attempt to modify it during iteration. Since it is relatively
|
||||||
|
rare to actually want to modify it during iteration, this is
|
||||||
|
likely to catch mistakes earlier. If a programmer wants to
|
||||||
|
modify a list or dictionary during iteration, they should
|
||||||
|
explicitly make a copy to iterate over using x[:], x.clone(),
|
||||||
|
x.keys(), x.values(), or x.items().
|
||||||
|
|
||||||
|
For consistency with the 'key in dict' expression, we could
|
||||||
|
support 'for key in dict' as equivalent to 'for key: in dict'.
|
||||||
|
|
||||||
|
|
||||||
|
BDFL Pronouncements
|
||||||
|
|
||||||
|
The "parallel expression" to 'for key:value in mapping':
|
||||||
|
|
||||||
|
if key:value in mapping:
|
||||||
|
|
||||||
|
is infeasible since the first colon ends the "if" condition.
|
||||||
|
The following compromise is technically feasible:
|
||||||
|
|
||||||
|
if (key:value) in mapping:
|
||||||
|
|
||||||
|
but the BDFL has pronounced a solid -1 on this.
|
||||||
|
|
||||||
|
The BDFL gave a +0.5 to:
|
||||||
|
|
||||||
|
for key:value in mapping:
|
||||||
|
for index:item in sequence:
|
||||||
|
|
||||||
|
and a +0.2 to the variations where the part before or after
|
||||||
|
the first colon is missing.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Local Variables:
|
||||||
|
mode: indented-text
|
||||||
|
indent-tabs-mode: nil
|
||||||
|
End:
|
Loading…
Reference in New Issue