Add a number of clarifications and updates.

Guido van Rossum 2002-07-18 20:38:28 +00:00
parent 6c29d4df54
commit 9aee42f23e
1 changed file with 66 additions and 24 deletions


@ -66,15 +66,16 @@ C API Specification
- Some exception is set; this means an error occurred, and should
be propagated normally.
In addition to the tp_iternext slot, every iterator object must
also implement a next() method, callable without arguments. This
should have the same semantics as the tp_iternext slot function,
except that the only way to signal the end of the iteration is to
raise StopIteration. The iterator object should not care whether
its tp_iternext slot function is called or its next() method, and
the caller may mix calls arbitrarily. (The next() method is for
the benefit of Python code using iterators directly; the
tp_iternext slot is added to make 'for' loops more efficient.)
Iterators implemented in C should *not* implement a next() method
with semantics similar to the tp_iternext slot! When the type's
dictionary is initialized (by PyType_Ready()), the presence of a
tp_iternext slot causes a method next() wrapping that slot to be
added to the type's tp_dict. (Exception: if the type doesn't use
PyObject_GenericGetAttr() to access instance attributes, the
next() method in the type's tp_dict may not be seen.) (Due to a
misunderstanding in the original text of this PEP, in Python 2.2,
all iterator types implemented a next() method that was overridden
by the wrapper; this has been fixed in Python 2.3.)
To ensure binary backwards compatibility, a new flag
Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
@ -83,7 +84,8 @@ C API Specification
PyIter_Check() tests whether an object has the appropriate flag
set and has a non-NULL tp_iternext slot. There is no such macro
for the tp_iter slot (since the only place where this slot is
referenced should be PyObject_GetIter()).
referenced should be PyObject_GetIter(), and this can check for
the Py_TPFLAGS_HAVE_ITER flag directly).
(Note: the tp_iter slot can be present on any object; the
tp_iternext slot should only be present on objects that act as
@ -107,6 +109,14 @@ C API Specification
reference to themselves; this is needed to make it possible to
use an iterator (as opposed to a sequence) in a for loop.
Iterator implementations (in C or in Python) should guarantee that
once the iterator has signalled its exhaustion, subsequent calls
to tp_iternext or to the next() method will continue to do so. It
is not specified whether an iterator should enter the exhausted
state when an exception (other than StopIteration) is raised.
Note that Python cannot guarantee that user-defined or third-party
iterators implement this requirement correctly.
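The "once exhausted, always exhausted" guarantee described above can be sketched in Python. This is a minimal illustration rather than anything taken from the PEP: the Countdown class and its "exhausted" flag are hypothetical, and the code uses the modern __next__() spelling (next() in Python 2.2/2.3).

```python
class Countdown:
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start
        self.exhausted = False  # sticky flag, assumed for illustration

    def __iter__(self):
        # An iterator returns itself so it can be used in a for loop.
        return self

    def __next__(self):  # spelled next() in Python 2.2/2.3
        if self.exhausted or self.current <= 0:
            # Remember that the end was signalled, so that later calls
            # keep raising StopIteration even if state changes.
            self.exhausted = True
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(2)
first_pass = list(it)

# Tampering with the counter must not revive an exhausted iterator.
it.current = 5
second_pass = list(it)
```

Without the sticky flag, resetting the counter would let the iterator produce values again after it had already signalled the end, which is exactly the behaviour the paragraph above rules out.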
Python API Specification
@ -163,10 +173,6 @@ Python API Specification
Dictionary Iterators
The following two proposals are somewhat controversial. They are
also independent from the main iterator implementation. However,
they are both very useful.
- Dictionaries implement a sq_contains slot that implements the
same test as the has_key() method. This means that we can write
@ -204,8 +210,7 @@ Dictionary Iterators
This means that "for x in dict" is shorthand for "for x in
dict.iterkeys()".
If this proposal is accepted, it makes sense to recommend that
other mappings, if they support iterators at all, should also
Other mappings, if they support iterators at all, should also
iterate over the keys. However, this should not be taken as an
absolute rule; specific applications may have different
requirements.
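As a rough sketch of what "iterate over the keys" means for another mapping, the hypothetical class below follows the dictionary convention. CaseInsensitiveMapping and its sample data are assumptions made up for this example; the code is written for current Python, where dict.keys() plays the role of iterkeys().

```python
class CaseInsensitiveMapping:
    """Hypothetical mapping whose keys are compared case-insensitively."""

    def __init__(self, data):
        # Store every key in lower case.
        self._data = {key.lower(): value for key, value in data.items()}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __contains__(self, key):
        # Membership tests keys, mirroring "key in dict".
        return key.lower() in self._data

    def __iter__(self):
        # Iterate over the keys, like "for x in dict".
        return iter(self._data)


headers = CaseInsensitiveMapping({"Host": "example.org", "Accept": "*/*"})
seen_keys = sorted(headers)
```

A mapping with different requirements could iterate over something else, which is why the text stops short of making this an absolute rule.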
@ -213,11 +218,9 @@ Dictionary Iterators
File Iterators
The following proposal is not controversial, but should be
considered a separate step after introducing the iterator
framework described above. It is useful because it provides us
with a good answer to the complaint that the common idiom to
iterate over the lines of a file is ugly and slow.
The following proposal is useful because it provides us with a
good answer to the complaint that the common idiom to iterate over
the lines of a file is ugly and slow.
- Files implement a tp_iter slot that is equivalent to
iter(f.readline, ""). This means that we can write
@ -245,6 +248,33 @@ File Iterators
solutions don't work for all file types, e.g. they don't work when
the open file object really represents a pipe or a stream socket.
Because the file iterator uses an internal buffer, mixing this
with other file operations (e.g. file.readline()) doesn't work
right. Also, the following code:
for line in file:
    if line == "\n":
        break
for line in file:
    print line,
doesn't work as you might expect, because the iterator created by
the second for-loop doesn't take the buffer read-ahead by the
first for-loop into account. A correct way to write this is:
it = iter(file)
for line in it:
    if line == "\n":
        break
for line in it:
    print line,
(The rationale for these restrictions is that "for line in file"
ought to become the recommended, standard way to iterate over the
lines of a file, and this should be as fast as can be. The
iterator version is considerably faster than calling readline(),
due to the internal buffer in the iterator.)
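The shared-iterator pattern can be wrapped in a small helper. This is only a sketch under stated assumptions: body_after_blank is a hypothetical name, io.StringIO stands in for a real file, and the code uses current Python syntax rather than the Python 2 print statement shown above.

```python
import io


def body_after_blank(source):
    """Return the lines that follow the first blank line in source."""
    it = iter(source)  # one iterator shared by both loops
    for line in it:
        if line == "\n":
            break
    # The second loop resumes exactly where the first one stopped,
    # because it consumes the same iterator object.
    return [line for line in it]


message = io.StringIO("Subject: hello\n\nfirst line\nsecond line\n")
body = body_after_blank(message)
```

The helper also accepts any other iterable of lines, such as a list, since it only relies on iter() and never calls readline() directly.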
Rationale
@ -293,9 +323,15 @@ Resolved Issues
__call__() for its most common operation, causing more confusion
than clarity.
(In retrospect, it might have been better to go for __next__()
and have a new built-in, next(it), which calls it.__next__().
But alas, it's too late; this has been deployed in Python 2.2
since December 2001.)
- Some folks have requested the ability to restart an iterator.
This should be dealt with by calling iter() on a sequence
repeatedly, not by the iterator protocol itself.
repeatedly, not by the iterator protocol itself. (See also
requested extensions below.)
- It has been questioned whether an exception to signal the end of
the iteration isn't too expensive. Several alternatives for the
@ -361,6 +397,11 @@ Resolved Issues
Resolution: once StopIteration is raised, calling it.next()
continues to raise StopIteration.
Note: this was in fact not implemented in Python 2.2; there are
many cases where an iterator's next() method can raise
StopIteration on one call but not on the next. This has been
remedied in Python 2.3.
- It has been proposed that a file object should be its own
iterator, with a next() method returning the next line. This
has certain advantages, and makes it even clearer that this
@ -368,7 +409,8 @@ Resolved Issues
make it even more painful to implement the "sticky
StopIteration" feature proposed in the previous bullet.
Resolution: this has been implemented.
Resolution: tentatively rejected (though there are still people
arguing for this).
- Some folks have requested extensions of the iterator protocol,
e.g. prev() to get the previous item, current() to get the
@ -386,7 +428,7 @@ Resolved Issues
Resolution: rejected.
- There is still discussion about whether
- There has been a long discussion about whether
for x in dict: ...