Almost completely rewritten, focusing on documenting the current state of affairs, filling in some things still under discussion. Ping, I hope this is okay with you. If you want to revive "for keys:values in dict" etc., you'll write a separate PEP, right?
2001-04-23 18:31:46 +00:00 · 2001-04-23 18:31:46 +00:00 · 31a363c4f5
parent bad454ef15
commit 31a363c4f5
1 changed files with 206 additions and 276 deletions
--- a/pep-0234.txt
+++ b/pep-0234.txt
@@ -1,7 +1,7 @@
 PEP: 234
 Title: Iterators
 Version: $Revision$
-Author: ping@lfw.org (Ka-Ping Yee)
+Author: ping@lfw.org (Ka-Ping Yee), guido@python.org (Guido van Rossum)
 Status: Draft
 Type: Standards Track
 Python-Version: 2.1
@@ -13,228 +13,244 @@ Abstract
    This document proposes an iteration interface that objects can
    provide to control the behaviour of 'for' loops.  Looping is
    customized by providing a method that produces an iterator object.
-    The iterator should be a callable object that returns the next
-    item in the sequence each time it is called, raising an exception
-    when no more items are available.
+    The iterator provides a 'get next value' operation that produces
+    the next item in the sequence each time it is called, raising an
+    exception when no more items are available.
+
+    In addition, specific iterators over the keys of a dictionary and
+    over the lines of a file are proposed, and a proposal is made to
+    allow spelling dict.has_key(key) as "key in dict".
+
+    Note: this is an almost complete rewrite of this PEP by the second
+    author, describing the actual implementation checked into the
+    trunk of the Python 2.2 CVS tree.  It is still open for
+    discussion.  Some of the more esoteric proposals in the original
+    version of this PEP have been withdrawn for now; these may be the
+    subject of a separate PEP in the future.


-Copyright
+C API Specification

-    This document is in the public domain.
+    A new exception is defined, StopIteration, which can be used to
+    signal the end of an iteration.
+
+    A new slot named tp_iter for requesting an iterator is added to
+    the type object structure.  This should be a function of one
+    PyObject * argument returning a PyObject *, or NULL.  To use this
+    slot, a new C API function PyObject_GetIter() is added, with the
+    same signature as the tp_iter slot function.
+
+    Another new slot, named tp_iternext, is added to the type
+    structure, for obtaining the next value in the iteration.  To use
+    this slot, a new C API function PyIter_Next() is added.  The
+    signature for both the slot and the API function is as follows:
+    the argument is a PyObject * and so is the return value.  When the
+    return value is non-NULL, it is the next value in the iteration.
+    When it is NULL, there are three possibilities:
+
+    - No exception is set; this implies the end of the iteration.
+
+    - The StopIteration exception (or a derived exception class) is
+      set; this implies the end of the iteration.
+
+    - Some other exception is set; this means that an error occurred
+      that should be propagated normally.
+
+    In addition to the tp_iternext slot, every iterator object must
+    also implement a next() method, callable without arguments.  This
+    should have the same semantics as the tp_iternext slot function,
+    except that the only way to signal the end of the iteration is to
+    raise StopIteration.  The iterator object should not care whether
+    its tp_iternext slot function is called or its next() method, and
+    the caller may mix calls arbitrarily.  (The next() method is for
+    the benefit of Python code using iterators directly; the
+    tp_iternext slot is added to make 'for' loops more efficient.)
+
+    To ensure binary backwards compatibility, a new flag
+    Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
+    field, and to the default flags macro.  This flag must be tested
+    before accessing the tp_iter or tp_iternext slots.  The macro
+    PyIter_Check() tests whether an object has the appropriate flag
+    set and has a non-NULL tp_iternext slot.  There is no such macro
+    for the tp_iter slot (since the only place where this slot is
+    referenced should be PyObject_GetIter()).
+
+    (Note: the tp_iter slot can be present on any object; the
+    tp_iternext slot should only be present on objects that act as
+    iterators.)
+
+    For backwards compatibility, the PyObject_GetIter() function
+    implements fallback semantics when its argument is a sequence that
+    does not implement a tp_iter function: a lightweight sequence
+    iterator object is constructed in that case which iterates over
+    the items of the sequence in the natural order.
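The fallback described above can be modelled in Python. This is a minimal sketch written in Python 3. In Python 3 the PEP's next() method is spelled `__next__`, so each class also sets `next = __next__`. The class names `SeqIter` and `Squares` are hypothetical illustrations, not part of the proposal.

```python
class SeqIter:
    """Hypothetical model of the lightweight sequence iterator that
    PyObject_GetIter() builds for objects with __getitem__ but no tp_iter."""

    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        # Iterators return themselves so they can be used in a for loop.
        return self

    def __next__(self):
        try:
            item = self.seq[self.index]
        except IndexError:
            # Running off the end of the sequence ends the iteration.
            raise StopIteration from None
        self.index += 1
        return item

    next = __next__  # the 2.2-era spelling used in this PEP


class Squares:
    """An old-style sequence: only __getitem__, no __iter__."""

    def __getitem__(self, i):
        if i >= 5:
            raise IndexError(i)
        return i * i


print(list(SeqIter(Squares())))  # [0, 1, 4, 9, 16]
print(list(Squares()))           # the built-in fallback gives the same
```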
+
+    The Python bytecode generated for 'for' loops is changed to use
+    new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol
+    rather than the sequence protocol to get the next value for the
+    loop variable.  This makes it possible to use a 'for' loop to loop
+    over non-sequence objects that support the tp_iter slot.  Other
+    places where the interpreter loops over the values of a sequence
+    should also be changed to use iterators.
+
+    Iterators ought to implement the tp_iter slot as returning a
+    reference to themselves; this is needed to make it possible to
+    use an iterator (as opposed to a sequence) in a for loop.
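Roughly, the new opcodes behave like the following Python loop. This is a sketch of the semantics only, not the generated bytecode. The function name `run_for_loop` is hypothetical, and `next()` here is the modern built-in that invokes the iterator's next operation.

```python
def run_for_loop(obj, body):
    """Hypothetical model of 'for x in obj: body(x)'."""
    it = iter(obj)              # GET_ITER: PyObject_GetIter(obj)
    while True:
        try:
            x = next(it)        # FOR_ITER: PyIter_Next(it)
        except StopIteration:   # end of iteration, not an error
            break
        body(x)                 # other exceptions propagate normally


seen = []
run_for_loop("abc", seen.append)

# Because iter(it) returns it itself, an iterator can be looped over too.
it = iter([1, 2, 3])
assert iter(it) is it
run_for_loop(it, seen.append)
print(seen)                     # ['a', 'b', 'c', 1, 2, 3]
```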


-Sequence Iterators
+Python API Specification

-    A new field named 'sq_iter' for requesting an iterator is added
-    to the PySequenceMethods table.  Upon an attempt to iterate over
-    an object with a loop such as
+    The StopIteration exception is made visible as one of the
+    standard exceptions.  It is derived from Exception.

-        for item in sequence:
-            ...body...
+    A new built-in function is defined, iter(), which can be called in
+    two ways:

-    the interpreter looks for the 'sq_iter' of the 'sequence' object.
-    If the method exists, it is called to get an iterator; it should
-    return a callable object.  If the method does not exist, the
-    interpreter produces a built-in iterator object in the following
-    manner (described in Python here, but implemented in the core):
+    - iter(obj) calls PyObject_GetIter(obj).

-        def make_iterator(sequence):
-            def iterator(sequence=sequence, index=[0]):
-                item = sequence[index[0]]
-                index[0] += 1
-                return item
-            return iterator
+    - iter(callable, sentinel) returns a special kind of iterator that
+      calls the callable to produce a new value, and compares the
+      return value to the sentinel value.  If the return value equals
+      the sentinel, this signals the end of the iteration and
+      StopIteration is raised instead of returning a value; if the
+      return value does not equal the sentinel, it is returned as the
+      next value from the iterator.  If the callable raises an
+      exception, this is propagated normally; in particular, the
+      function is allowed to raise StopIteration as an alternative way to
+      end the iteration.  (This functionality is available from the C
+      API as PyCallIter_New(callable, sentinel).)
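The two-argument form of iter() can be sketched as a small Python 3 class. `CallIter` is a hypothetical name; the C-level object is the one created by PyCallIter_New. The `next = __next__` alias matches the PEP's spelling of the method.

```python
class CallIter:
    """Hypothetical model of iter(callable, sentinel)."""

    def __init__(self, func, sentinel):
        self.func = func
        self.sentinel = sentinel

    def __iter__(self):
        return self

    def __next__(self):
        # Exceptions raised by func, including StopIteration, propagate.
        value = self.func()
        if value == self.sentinel:
            raise StopIteration   # sentinel seen: end of the iteration
        return value

    next = __next__


values = [3, 2, 1, 0, 5]
print(list(CallIter(lambda: values.pop(0), 0)))   # [3, 2, 1]
print(values)                                      # [5]
```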

-    To execute the above 'for' loop, the interpreter would proceed as
-    follows, where 'iterator' is the iterator that was obtained:
+    Iterator objects returned by either form of iter() have a next()
+    method.  This method either returns the next value in the
+    iteration, or raises StopIteration (or a derived exception class) to
+    signal the end of the iteration.  Any other exception should be
+    considered to signify an error and should be propagated normally,
+    not taken to mean the end of the iteration.

-        while 1:
-            try:
-                item = iterator()
-            except IndexError:
-                break
-            ...body...
+    Classes can define how they are iterated over by defining an
+    __iter__() method; this should take no additional arguments and
+    return a valid iterator object.  A class is a valid iterator
+    object when it defines a next() method that behaves as described
+    above.  A class that wants to be an iterator also ought to
+    implement __iter__() returning itself.
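The following sketch contrasts an iterator class with a container class, using Python 3 spelling plus a `next` alias for the PEP's name. `Countdown` and `ThreeTwoOne` are hypothetical examples. The container's `__iter__()` returns a fresh iterator each time, so it can be looped over repeatedly. The iterator's `__iter__()` returns itself.

```python
class Countdown:
    """Hypothetical iterator: yields start, start-1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                  # an iterator is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # the only way to signal the end
        self.current -= 1
        return self.current + 1

    next = __next__


class ThreeTwoOne:
    """Hypothetical container: each __iter__() call starts afresh."""

    def __iter__(self):
        return Countdown(3)


print(list(Countdown(3)))            # [3, 2, 1]
r = ThreeTwoOne()
print(list(r), list(r))              # [3, 2, 1] [3, 2, 1]
```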

-    (Note that the 'break' above doesn't translate to a "real" Python
-    break, since it would go to the 'else:' clause of the loop whereas
-    a "real" break in the body would skip the 'else:' clause.)
+    There is some controversy here:

-    The list() and tuple() built-in functions would be updated to use
-    this same iterator logic to retrieve the items in their argument.
+    - The name iter() is an abbreviation.  Alternatives proposed
+      include iterate(), harp(), traverse(), narrate().

-    List and tuple objects would implement the 'sq_iter' method by
-    calling the built-in make_iterator() routine just described.
-
-    Instance objects would implement the 'sq_iter' method as follows:
-
-        if hasattr(self, '__iter__'):
-            return self.__iter__()
-        elif hasattr(self, '__getitem__'):
-            return make_iterator(self)
-        else:
-            raise TypeError, thing.__class__.__name__ + \
-                ' instance does not support iteration'
-
-    Extension objects can implement 'sq_iter' however they wish, as
-    long as they return a callable object.
+    - Using the same name for two different operations (getting an
+      iterator from an object and making an iterator for a function
+      with a sentinel value) is somewhat ugly.  I haven't seen a
+      better name for the second operation though.


-Mapping Iterators
+Dictionary Iterators

-    An additional proposal from Guido is to provide special syntax
-    for iterating over mappings.  The loop:
+    The following two proposals are somewhat controversial.  They are
+    also independent of the main iterator implementation.  However,
+    they are both very useful.

-        for key:value in mapping:
+    - Dictionaries implement a sq_contains slot that implements the
+      same test as the has_key() method.  This means that we can write

-    would bind both 'key' and 'value' to a key-value pair from the
-    mapping on each iteration.  Tim Peters suggested that similarly,
+          if k in dict: ...

-        for key: in mapping:
+      which is equivalent to

-    could iterate over just the keys and
+          if dict.has_key(k): ...

-        for :value in mapping:
+    - Dictionaries implement a tp_iter slot that returns an efficient
+      iterator that iterates over the keys of the dictionary.  During
+      such an iteration, the dictionary should not be modified, except
+      that setting the value for an existing key is allowed (deletions
+      or additions are not, nor is the update() method).  This means
+      that we can write

-    could iterate over just the values.
+          for k in dict: ...

-    The syntax is unambiguous since the new colon is currently not
-    permitted in this position in the grammar.
+      which is equivalent to, but much faster than

-    This behaviour would be provided by additional methods in the
-    PyMappingMethods table: 'mp_iteritems', 'mp_iterkeys', and
-    'mp_itervalues' respectively.  'mp_iteritems' is expected to
-    produce a callable object that returns a (key, value) tuple;
-    'mp_iterkeys' and 'mp_itervalues' are expected to produce a
-    callable object that returns a single key or value.
+          for k in dict.keys(): ...

-    The implementations of these methods on instance objects would
-    then check for and call the '__iteritems__', '__iterkeys__',
-    and '__itervalues__' methods respectively.
+      as long as the restriction on modifications to the dictionary
+      (either by the loop or by another thread) is not violated.
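Both proposals can be checked in present-day Python 3, which adopted them. Python 3 has no has_key(), so this sketch compares against d.keys() instead. The size-change check shown is CPython's behaviour. Checking only the size is looser than the full restriction stated above.

```python
d = {"a": 1, "b": 2}

print("a" in d, 1 in d)            # True False: keys, not values
print(list(d) == list(d.keys()))   # True

# Setting the value of an existing key during iteration is allowed...
for k in d:
    d[k] *= 10
print(d)                           # {'a': 10, 'b': 20}

# ...but adding a key is not; CPython notices the size change.
try:
    for k in d:
        d[k + "!"] = 0
except RuntimeError as exc:
    print("RuntimeError:", exc)
```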

-    When 'mp_iteritems', 'mp_iterkeys', or 'mp_itervalues' is missing,
-    the default behaviour is to do make_iterator(mapping.items()),
-    make_iterator(mapping.keys()), or make_iterator(mapping.values())
-    respectively, using the definition of make_iterator() above.
+    There is no doubt that the dict.has_key(x) interpretation of "x
+    in dict" is by far the most useful interpretation, probably the
+    only useful one.  There has been resistance against this because
+    "x in list" checks whether x is present among the values, while
+    the proposal makes "x in dict" check whether x is present among
+    the keys.  Given that the symmetry between lists and dictionaries
+    is very weak, this argument does not have much weight.
+
+    The main discussion focuses on whether
+
+        for x in dict: ...
+
+    should assign x the successive keys, values, or items of the
+    dictionary.  The symmetry between "if x in y" and "for x in y"
+    suggests that it should iterate over keys.  This symmetry has been
+    observed by many independently and has even been used to "explain"
+    one using the other.  This is because for sequences, "if x in y"
+    iterates over y comparing the iterated values to x.  If we adopt
+    both of the above proposals, this will also hold for
+    dictionaries.
+
+    The argument against making "for x in dict" iterate over the keys
+    comes mostly from a practicality point of view: scans of the
+    standard library show that there are about as many uses of "for x
+    in dict.items()" as there are of "for x in dict.keys()", with the
+    items() version having a small majority.  Presumably many of the
+    loops using keys() use the corresponding value anyway, by writing
+    dict[x], so (the argument goes) by making both the key and value
+    available, we could support the largest number of cases.  While
+    this is true, I (Guido) find the correspondence between "for x in
+    dict" and "if x in dict" too compelling to break, and there's not
+    much overhead in having to write dict[x] to explicitly get the
+    value.  We could also add methods to dictionaries that return
+    different kinds of iterators, e.g.
+
+        for key, value in dict.iteritems(): ...
+
+        for value in dict.itervalues(): ...
+
+        for key in dict.iterkeys(): ...
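Key iteration plus one d[x] lookup is enough to recover the items and values. The helpers `iteritems` and `itervalues` below are hypothetical generator functions written in modern Python, not the methods proposed above. Python 3 dicts provide the equivalent views as d.items() and d.values().

```python
def iteritems(d):
    """Hypothetical helper: (key, value) pairs via key iteration."""
    for key in d:              # "for x in dict" yields the keys
        yield key, d[key]      # the value costs one extra lookup


def itervalues(d):
    """Hypothetical helper: values via key iteration."""
    for key in d:
        yield d[key]


d = {"spam": 1, "eggs": 2}
print(list(iteritems(d)) == list(d.items()))     # True
print(list(itervalues(d)) == list(d.values()))   # True
```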


-Indexing Sequences
+File Iterators

-    The special syntax described above can be applied to sequences
-    as well, to provide the long-hoped-for ability to obtain the
-    indices of a sequence without the strange-looking 'range(len(x))'
-    expression.
+    The following proposal is not controversial, but should be
+    considered a separate step after introducing the iterator
+    framework described above.  It is useful because it provides us
+    with a good answer to the complaint that the common idiom to
+    iterate over the lines of a file is ugly and slow.

-        for index:item in sequence:
+    - Files implement a tp_iter slot that is equivalent to
+      iter(f.readline, "").  This means that we can write

-    causes 'index' to be bound to the index of each item as 'item' is
-    bound to the items of the sequence in turn, and
+          for line in file:
+              ...

-        for index: in sequence:
+      as a shorthand for

-    simply causes 'index' to start at 0 and increment until an attempt
-    to get sequence[index] produces an IndexError.  For completeness,
+          for line in iter(file.readline, ""):
+              ...

-        for :item in sequence:
+      which is equivalent to, but faster than

-    is equivalent to
+          while 1:
+              line = file.readline()
+              if not line:
+                  break
+              ...
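The three loops above can be run side by side in Python 3. In this sketch, io.StringIO stands in for a real file object. The substitution is ours, chosen so the example needs no file on disk.

```python
import io

text = "first\nsecond\nthird\n"

# for line in file: ...
a = [line for line in io.StringIO(text)]

# for line in iter(file.readline, ""): ...
f = io.StringIO(text)
b = [line for line in iter(f.readline, "")]

# The explicit while loop.
f = io.StringIO(text)
c = []
while True:
    line = f.readline()
    if not line:
        break
    c.append(line)

print(a == b == c)   # True
print(a)             # ['first\n', 'second\n', 'third\n']
```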

-        for item in sequence:
-
-    In each case we try to request an appropriate iterator from the
-    sequence.  In summary:
-
-        for k:v in x    looks for mp_iteritems, then sq_iter
-        for k: in x     looks for mp_iterkeys, then sq_iter
-        for :v in x     looks for mp_itervalues, then sq_iter
-        for v in x      looks for sq_iter
-
-    If we fall back to sq_iter in the first two cases, we generate
-    indices for k as needed, by starting at 0 and incrementing.
-
-    The implementation of the mp_iter* methods on instance objects
-    then checks for methods in the following order:
-
-        mp_iteritems    __iteritems__, __iter__, items, __getitem__
-        mp_iterkeys     __iterkeys__, __iter__, keys, __getitem__
-        mp_itervalues   __itervalues__, __iter__, values, __getitem__
-        sq_iter         __iter__, __getitem__
-
-    If a __iteritems__, __iterkeys__, or __itervalues__ method is
-    found, we just call it and use the resulting iterator.  If a
-    mp_* function finds no such method but finds __iter__ instead,
-    we generate indices as needed.
-
-    Upon finding an items(), keys(), or values() method, we use
-    make_iterator(x.items()), make_iterator(x.keys()), or
-    make_iterator(x.values()) respectively.  Upon finding a
-    __getitem__ method, we use it and generate indices as needed.
-
-    For example, the complete implementation of the mp_iteritems
-    method for instances can be roughly described as follows:
-
-        def mp_iteritems(thing):
-            if hasattr(thing, '__iteritems__'):
-                return thing.__iteritems__()
-            if hasattr(thing, '__iter__'):
-                def iterator(sequence=thing, index=[0]):
-                    item = (index[0], sequence.__iter__())
-                    index[0] += 1
-                    return item
-                return iterator
-            if hasattr(thing, 'items'):
-                return make_iterator(thing.items())
-            if hasattr(thing, '__getitem__'):
-                def iterator(sequence=thing, index=[0]):
-                    item = (index[0], sequence[index[0]])
-                    index[0] += 1
-                    return item
-                return iterator
-            raise TypeError, thing.__class__.__name__ + \
-                ' instance does not support iteration over items'
-
-
-Examples
-
-    Here is a class written in Python that represents the sequence of
-    lines in a file.
-
-        class FileLines:
-            def __init__(self, filename):
-                self.file = open(filename)
-            def __iter__(self):
-                def iter(self=self):
-                    line = self.file.readline()
-                    if line: return line
-                    else: raise IndexError
-                return iter
-
-        for line in FileLines('spam.txt'):
-            print line
-
-    And here's an interactive session demonstrating the proposed new
-    looping syntax:
-
-        >>> for i:item in ['a', 'b', 'c']:
-        ...     print i, item
-        ...
-        0 a
-        1 b
-        2 c
-        >>> for i: in 'abcdefg':        # just the indices, please
-        ...     print i,
-        ... print
-        ...
-        0 1 2 3 4 5 6
-        >>> for k:v in os.environ:      # os.environ is an instance, but
-        ...     print k, v              # this still works because we fall
-        ...                             # back to calling items()
-        MAIL /var/spool/mail/ping
-        HOME /home/ping
-        DISPLAY :0.0
-        TERM xterm
-        .
-        .
-        .
+    This also shows that some iterators are destructive: they consume
+    all the values and a second iterator cannot easily be created that
+    iterates independently over the same values.  You could open the
+    file for a second time, or seek() to the beginning, but these
+    solutions don't work for all file types, e.g. they don't work when
+    the open file object really represents a pipe or a stream socket.
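The sketch below shows the destructive behaviour in Python 3, again using io.StringIO as a stand-in for a file. A file object is its own iterator, so a second loop picks up where the first one stopped. seek(0) helps here, but as noted above it is not available for pipes or sockets. A list, by contrast, hands out an independent iterator on every iter() call.

```python
import io

f = io.StringIO("one\ntwo\n")
print(iter(f) is f)      # True: the file is its own iterator

first = list(f)
second = list(f)         # already at the end of the "file"
f.seek(0)
third = list(f)

print(first)             # ['one\n', 'two\n']
print(second)            # []
print(third)             # ['one\n', 'two\n']

lines = ["one\n", "two\n"]
print(list(lines) == list(lines))   # True
```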


 Rationale
@@ -245,9 +261,9 @@ Rationale

    1. It provides an extensible iterator interface.

-    2. It resolves the endless "i indexing sequence" debate.
+    2. It allows performance enhancements to list iteration.

-    3. It allows performance enhancements to dictionary iteration.
+    3. It allows big performance enhancements to dictionary iteration.

    4. It allows one to provide an interface for just iteration
       without pretending to provide random access to elements.
@@ -258,95 +274,9 @@ Rationale
       {__getitem__, keys, values, items}.


-Errors
+Copyright

-    Errors that occur during sq_iter, mp_iter*, or the __iter*__
-    methods are allowed to propagate normally to the surface.
-
-    An attempt to do
-
-        for item in dict:
-
-    over a dictionary object still produces:
-
-        TypeError: loop over non-sequence
-
-    An attempt to iterate over an instance that provides neither
-    __iter__ nor __getitem__ produces:
-
-        TypeError: instance does not support iteration
-
-    Similarly, an attempt to do mapping-iteration over an instance
-    that doesn't provide the right methods should produce one of the
-    following errors:
-
-        TypeError: instance does not support iteration over items
-        TypeError: instance does not support iteration over keys
-        TypeError: instance does not support iteration over values
-
-    It's an error for the iterator produced by __iteritems__ or
-    mp_iteritems to return an object whose length is not 2:
-
-        TypeError: item iterator did not return a 2-tuple
-
-
-Open Issues
-
-    We could introduce a new exception type such as IteratorExit just
-    for terminating loops rather than using IndexError.  In this case,
-    the implementation of make_iterator() would catch and translate an
-    IndexError into an IteratorExit for backward compatibility.
-
-    We could provide access to the logic that calls either 'sq_item'
-    or make_iterator() with an iter() function in the built-in module
-    (just as the getattr() function provides access to 'tp_getattr').
-    One possible motivation for this is to make it easier for the
-    implementation of __iter__ to delegate iteration to some other
-    sequence.  Presumably we would then have to consider adding
-    iteritems(), iterkeys(), and itervalues() as well.
-
-    An alternative way to let __iter__ delegate iteration to another
-    sequence is for it to return another sequence.  Upon detecting
-    that the object returned by __iter__ is not callable, the
-    interpreter could repeat the process of looking for an iterator
-    on the new object.  However, this process seems potentially
-    convoluted and likely to produce more confusing error messages.
-
-    If we decide to add "freezing" ability to lists and dictionaries,
-    it is suggested that the implementation of make_iterator
-    automatically freeze any list or dictionary argument for the
-    duration of the loop, and produce an error complaining about any
-    attempt to modify it during iteration.  Since it is relatively
-    rare to actually want to modify it during iteration, this is
-    likely to catch mistakes earlier.  If a programmer wants to
-    modify a list or dictionary during iteration, they should
-    explicitly make a copy to iterate over using x[:], x.clone(),
-    x.keys(), x.values(), or x.items().
-
-    For consistency with the 'key in dict' expression, we could
-    support 'for key in dict' as equivalent to 'for key: in dict'.
-
-
-BDFL Pronouncements
-
-    The "parallel expression" to 'for key:value in mapping':
-
-        if key:value in mapping:
-
-    is infeasible since the first colon ends the "if" condition.
-    The following compromise is technically feasible:
-
-        if (key:value) in mapping:
-
-    but the BDFL has pronounced a solid -1 on this.
-
-    The BDFL gave a +0.5 to:
-
-        for key:value in mapping:
-        for index:item in sequence:
-
-    and a +0.2 to the variations where the part before or after
-    the first colon is missing.
+    This document is in the public domain.