python-peps/pep-0234.txt

PEP: 234
Title: Iterators
Version: $Revision$
Author: ping@lfw.org (Ka-Ping Yee), guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.1
Created: 30-Jan-2001
Post-History: 30-Apr-2001

Abstract

    This document proposes an iteration interface that objects can
    provide to control the behaviour of 'for' loops.  Looping is
    customized by providing a method that produces an iterator object.
    The iterator provides a 'get next value' operation that produces
    the next item in the sequence each time it is called, raising an
    exception when no more items are available.

    In addition, specific iterators over the keys of a dictionary and
    over the lines of a file are proposed, and a proposal is made to
    allow spelling dict.has_key(key) as "key in dict".

    Note: this is an almost complete rewrite of this PEP by the second
    author, describing the actual implementation checked into the
    trunk of the Python 2.2 CVS tree.  It is still open for
    discussion.  Some of the more esoteric proposals in the original
    version of this PEP have been withdrawn for now; these may be the
    subject of a separate PEP in the future.


C API Specification

    A new exception is defined, StopIteration, which can be used to
    signal the end of an iteration.

    A new slot named tp_iter for requesting an iterator is added to
    the type object structure.  This should be a function of one
    PyObject * argument returning a PyObject *, or NULL.  To use this
    slot, a new C API function PyObject_GetIter() is added, with the
    same signature as the tp_iter slot function.

    Another new slot, named tp_iternext, is added to the type
    structure, for obtaining the next value in the iteration.  To use
    this slot, a new C API function PyIter_Next() is added.  The
    signature for both the slot and the API function is as follows,
    although the NULL return conditions differ:  the argument is a
    PyObject * and so is the return value.  When the return value is
    non-NULL, it is the next value in the iteration.  When it is NULL,
    then for the tp_iternext slot there are three possibilities:

    - No exception is set; this implies the end of the iteration.

    - The StopIteration exception (or a derived exception class) is
      set; this implies the end of the iteration.

    - Some other exception is set; this means that an error occurred
      that should be propagated normally.

    The higher-level PyIter_Next() function clears the StopIteration
    exception (or derived exception) when it occurs, so its NULL return
    conditions are simpler:

    - No exception is set; this means iteration has ended.

    - Some exception is set; this means an error occurred, and should
      be propagated normally.

    In addition to the tp_iternext slot, every iterator object must
    also implement a next() method, callable without arguments.  This
    should have the same semantics as the tp_iternext slot function,
    except that the only way to signal the end of the iteration is to
    raise StopIteration.  The iterator object should not care whether
    its tp_iternext slot function is called or its next() method, and
    the caller may mix calls arbitrarily.  (The next() method is for
    the benefit of Python code using iterators directly; the
    tp_iternext slot is added to make 'for' loops more efficient.)

    To ensure binary backwards compatibility, a new flag
    Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
    field, and to the default flags macro.  This flag must be tested
    before accessing the tp_iter or tp_iternext slots.  The macro
    PyIter_Check() tests whether an object has the appropriate flag
    set and has a non-NULL tp_iternext slot.  There is no such macro
    for the tp_iter slot (since the only place where this slot is
    referenced should be PyObject_GetIter()).

    (Note: the tp_iter slot can be present on any object; the
    tp_iternext slot should only be present on objects that act as
    iterators.)

    For backwards compatibility, the PyObject_GetIter() function
    implements fallback semantics when its argument is a sequence that
    does not implement a tp_iter function: a lightweight sequence
    iterator object is constructed in that case which iterates over
    the items of the sequence in the natural order.

    The Python bytecode generated for 'for' loops is changed to use
    new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol
    rather than the sequence protocol to get the next value for the
    loop variable.  This makes it possible to use a 'for' loop to loop
    over non-sequence objects that support the tp_iter slot.  Other
    places where the interpreter loops over the values of a sequence
    should also be changed to use iterators.

    Iterators ought to implement the tp_iter slot as returning a
    reference to themselves; this is needed to make it possible to
    use an iterator (as opposed to a sequence) in a for loop.


Python API Specification

    The StopIteration exception is made visible as one of the
    standard exceptions.  It is derived from Exception.

    A new built-in function is defined, iter(), which can be called in
    two ways:

    - iter(obj) calls PyObject_GetIter(obj).

    - iter(callable, sentinel) returns a special kind of iterator that
      calls the callable to produce a new value, and compares the
      return value to the sentinel value.  If the return value equals
      the sentinel, this signals the end of the iteration and
      StopIteration is raised rather than returning normal; if the
      return value does not equal the sentinel, it is returned as the
      next value from the iterator.  If the callable raises an
      exception, this is propagated normally; in particular, the
      function is allowed to raise StopIteration as an alternative way
      to end the iteration.  (This functionality is available from the
      C API as PyCallIter_New(callable, sentinel).)

    Iterator objects returned by either form of iter() have a next()
    method.  This method either returns the next value in the
    iteration, or raises StopIteration (or a derived exception class)
    to signal the end of the iteration.  Any other exception should be
    considered to signify an error and should be propagated normally,
    not taken to mean the end of the iteration.

    Classes can define how they are iterated over by defining an
    __iter__() method; this should take no additional arguments and
    return a valid iterator object.  A class is a valid iterator
    object when it defines a next() method that behaves as described
    above.  A class that wants to be an iterator also ought to
    implement __iter__() returning itself.


Dictionary Iterators

    The following two proposals are somewhat controversial.  They are
    also independent from the main iterator implementation.  However,
    they are both very useful.

    - Dictionaries implement a sq_contains slot that implements the
      same test as the has_key() method.  This means that we can write

          if k in dict: ...

      which is equivalent to

          if dict.has_key(k): ...

    - Dictionaries implement a tp_iter slot that returns an efficient
      iterator that iterates over the keys of the dictionary.  During
      such an iteration, the dictionary should not be modified, except
      that setting the value for an existing key is allowed (deletions
      or additions are not, nor is the update() method).  This means
      that we can write

          for k in dict: ...

      which is equivalent to, but much faster than

          for k in dict.keys(): ...

      as long as the restriction on modifications to the dictionary
      (either by the loop or by another thread) are not violated.

    - Add methods to dictionaries that return different kinds of
      iterators explicitly:

          for key in dict.iterkeys(): ...

          for value in dict.itervalues(): ...

          for key, value in dict.iteritems(): ...

      This means that "for x in dict" is shorthand for "for x in
      dict.iterkeys()".

    If this proposal is accepted, it makes sense to recommend that
    other mappings, if they support iterators at all, should also
    iterate over the keys.  However, this should not be taken as an
    absolute rule; specific applications may have different
    requirements.


File Iterators

    The following proposal is not controversial, but should be
    considered a separate step after introducing the iterator
    framework described above.  It is useful because it provides us
    with a good answer to the complaint that the common idiom to
    iterate over the lines of a file is ugly and slow.

    - Files implement a tp_iter slot that is equivalent to
      iter(f.readline, "").  This means that we can write

          for line in file:
              ...

      as a shorthand for

          for line in iter(file.readline, ""):
              ...

      which is equivalent to, but faster than

          while 1:
              line = file.readline()
              if not line:
                  break
              ...

    This also shows that some iterators are destructive: they consume
    all the values and a second iterator cannot easily be created that
    iterates independently over the same values.  You could open the
    file for a second time, or seek() to the beginning, but these
    solutions don't work for all file types, e.g. they don't work when
    the open file object really represents a pipe or a stream socket.


Rationale

    If all the parts of the proposal are included, this addresses many
    concerns in a consistent and flexible fashion.  Among its chief
    virtues are the following four -- no, five -- no, six -- points:

    1. It provides an extensible iterator interface.

    2. It allows performance enhancements to list iteration.

    3. It allows big performance enhancements to dictionary iteration.

    4. It allows one to provide an interface for just iteration
       without pretending to provide random access to elements.

    5. It is backward-compatible with all existing user-defined
       classes and extension objects that emulate sequences and
       mappings, even mappings that only implement a subset of
       {__getitem__, keys, values, items}.

    6. It makes code iterating over non-sequence collections more
       concise and readable.


Open Issues

    The following questions are still open.

    - The name iter() is an abbreviation.  Alternatives proposed
      include iterate(), traverse(), but these appear too long.
      Python has a history of using abbrs for common builtins,
      e.g. repr(), str(), len().

    - Using the same name for two different operations (getting an
      iterator from an object and making an iterator for a function
      with an sentinel value) is somewhat ugly.  I haven't seen a
      better name for the second operation though, and since they both
      return an iterator, it's easy to remember.

    - Once a particular iterator object has raised StopIteration, will
      it also raise StopIteration on all subsequent next() calls?
      Some say that it would be useful to require this, others say
      that it is useful to leave this open to individual iterators.
      Note that this may require an additional state bit for some
      iterator implementations (e.g. function-wrapping iterators).

    - It has been proposed that a file object should be its own
      iterator, with a next() method returning the next line.  This
      has certain advantages, and makes it even clearer that this
      iterator is destructive.  The disadvantage is that this would
      make it even more painful to implement the "sticky
      StopIteration" feature proposed in the previous bullet.

    - Some folks have requested extensions of the iterator protocol,
      e.g. prev() to get the previous item, current() to get the
      current item again, finished() to test whether the iterator is
      finished, and maybe even others, like rewind(), __len__(),
      position().

      While some of these are useful, many of these cannot easily be
      implemented for all iterator types without adding arbitrary
      buffering, and sometimes they can't be implemented at all (or
      not reasonably).  E.g. anything to do with reversing directions
      can't be done when iterating over a file or function.  Maybe a
      separate PEP can be drafted to standardize the names for such
      operations when the are implementable.

    - There is still discussion about whether

          for x in dict: ...

      should assign x the successive keys, values, or items of the
      dictionary.  The symmetry between "if x in y" and "for x in y"
      suggests that it should iterate over keys.  This symmetry has been
      observed by many independently and has even been used to "explain"
      one using the other.  This is because for sequences, "if x in y"
      iterates over y comparing the iterated values to x.  If we adopt
      both of the above proposals, this will also hold for
      dictionaries.

      The argument against making "for x in dict" iterate over the keys
      comes mostly from a practicality point of view: scans of the
      standard library show that there are about as many uses of "for x
      in dict.items()" as there are of "for x in dict.keys()", with the
      items() version having a small majority.  Presumably many of the
      loops using keys() use the corresponding value anyway, by writing
      dict[x], so (the argument goes) by making both the key and value
      available, we could support the largest number of cases.  While
      this is true, I (Guido) find the correspondence between "for x in
      dict" and "if x in dict" too compelling to break, and there's not
      much overhead in having to write dict[x] to explicitly get the
      value.

      For fast iteration over items, use "for key, value in
      dict.iteritems()".  I've timed the difference between

          for key in dict: dict[key]

      and

          for key, value in dict.iteritems(): pass

      and found that the latter is only about 7% faster.


Resolved Issues

    The following topics have been decided by consensus or BDFL
    pronouncement.

    - Two alternative spellings for next() have been proposed but
      rejected: __next__(), because it corresponds to a type object
      slot (tp_iternext); and __call__(), because this is the only
      operation.

      Arguments against __next__(): while many iterators are used in
      for loops, it is expected that user code will also call next()
      directly, so having to write __next__() is ugly; also, a
      possible extension of the protocol would be to allow for prev(),
      current() and reset() operations; surely we don't want to use
      __prev__(), __current__(), __reset__().

      Arguments against __call__() (the original proposal): taken out
      of context, x() is not very readable, while x.next() is clear;
      there's a danger that every special-purpose object wants to use
      __call__() for its most common operation, causing more confusion
      than clarity.

    - Some folks have requested the ability to restart an iterator.
      This should be dealt with by calling iter() on a sequence
      repeatedly, not by the iterator protocol itself.

    - It has been questioned whether an exception to signal the end of
      the iteration isn't too expensive.  Several alternatives for the
      StopIteration exception have been proposed: a special value End
      to signal the end, a function end() to test whether the iterator
      is finished, even reusing the IndexError exception.

      - A special value has the problem that if a sequence ever
        contains that special value, a loop over that sequence will
        end prematurely without any warning.  If the experience with
        null-terminated C strings hasn't taught us the problems this
        can cause, imagine the trouble a Python introspection tool
        would have iterating over a list of all built-in names,
        assuming that the special End value was a built-in name!

      - Calling an end() function would require two calls per
        iteration.  Two calls is much more expensive than one call
        plus a test for an exception.  Especially the time-critical
        for loop can test very cheaply for an exception.

      - Reusing IndexError can cause confusion because it can be a
        genuine error, which would be masked by ending the loop
        prematurely.

    - Some have asked for a standard iterator type.  Presumably all
      iterators would have to be derived from this type.  But this is
      not the Python way: dictionaries are mappings because they
      support __getitem__() and a handful other operations, not
      because they are derived from an abstract mapping type.

    - Regarding "if key in dict": there is no doubt that the
      dict.has_keys(x) interpretation of "x in dict" is by far the
      most useful interpretation, probably the only useful one.  There
      has been resistance against this because "x in list" checks
      whether x is present among the values, while the proposal makes
      "x in dict" check whether x is present among the keys.  Given
      that the symmetry between lists and dictionaries is very weak,
      this argument does not have much weight.


Mailing Lists

    The iterator protocol has been discussed extensively in a mailing
    list on SourceForge:

        http://lists.sourceforge.net/lists/listinfo/python-iterators

    Initially, some of the discussion was carried out at Yahoo;
    archives are still accessible:

        http://groups.yahoo.com/group/python-iter


Copyright

    This document is in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
-												Fix the PEP number in the header.

											
										
										
											2001-02-19 01:08:07 -05:00
+								PEP: 234
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
+								Title: Iterators
 								Version: $Revision$
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								Author: ping@lfw.org (Ka-Ping Yee), guido@python.org (Guido van Rossum)
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
+								Status: Draft
 								Type: Standards Track
 								Python-Version: 2.1
 								Created: 30-Jan-2001
-												Update post-history; corrected a typo.

											
										
										
											2001-04-30 22:29:03 -04:00
+								Post-History: 30-Apr-2001
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
 								Abstract
 								    This document proposes an iteration interface that objects can
 								    provide to control the behaviour of 'for' loops.  Looping is
 								    customized by providing a method that produces an iterator object.
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    The iterator provides a 'get next value' operation that produces
-												Fixed one small typo

											
										
										
											2001-05-01 13:52:06 -04:00
+								    the next item in the sequence each time it is called, raising an
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    exception when no more items are available.
 								    In addition, specific iterators over the keys of a dictionary and
 								    over the lines of a file are proposed, and a proposal is made to
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+								    allow spelling dict.has_key(key) as "key in dict".
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
 								    Note: this is an almost complete rewrite of this PEP by the second
 								    author, describing the actual implementation checked into the
 								    trunk of the Python 2.2 CVS tree.  It is still open for
 								    discussion.  Some of the more esoteric proposals in the original
 								    version of this PEP have been withdrawn for now; these may be the
 								    subject of a separate PEP in the future.
 								C API Specification
 								    A new exception is defined, StopIteration, which can be used to
 								    signal the end of an iteration.
 								    A new slot named tp_iter for requesting an iterator is added to
 								    the type object structure.  This should be a function of one
 								    PyObject * argument returning a PyObject *, or NULL.  To use this
 								    slot, a new C API function PyObject_GetIter() is added, with the
 								    same signature as the tp_iter slot function.
 								    Another new slot, named tp_iternext, is added to the type
 								    structure, for obtaining the next value in the iteration.  To use
 								    this slot, a new C API function PyIter_Next() is added.  The
-												Make PyIter_Next() a little smarter (wrt its knowledge of iterator
internals) so clients can be a lot dumber (wrt their knowledge).

											
										
										
											2001-05-04 20:14:56 -04:00
+								    signature for both the slot and the API function is as follows,
 								    although the NULL return conditions differ:  the argument is a
 								    PyObject * and so is the return value.  When the return value is
 								    non-NULL, it is the next value in the iteration.  When it is NULL,
 								    then for the tp_iternext slot there are three possibilities:
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
 								    - No exception is set; this implies the end of the iteration.
 								    - The StopIteration exception (or a derived exception class) is
 								      set; this implies the end of the iteration.
 								    - Some other exception is set; this means that an error occurred
 								      that should be propagated normally.
-												Make PyIter_Next() a little smarter (wrt its knowledge of iterator
internals) so clients can be a lot dumber (wrt their knowledge).

											
										
										
											2001-05-04 20:14:56 -04:00
+								    The higher-level PyIter_Next() function clears the StopIteration
 								    exception (or derived exception) when it occurs, so its NULL return
 								    conditions are simpler:
 								    - No exception is set; this means iteration has ended.
 								    - Some exception is set; this means an error occurred, and should
 								      be propagated normally.
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    In addition to the tp_iternext slot, every iterator object must
 								    also implement a next() method, callable without arguments.  This
 								    should have the same semantics as the tp_iternext slot function,
 								    except that the only way to signal the end of the iteration is to
 								    raise StopIteration.  The iterator object should not care whether
 								    its tp_iternext slot function is called or its next() method, and
 								    the caller may mix calls arbitrarily.  (The next() method is for
 								    the benefit of Python code using iterators directly; the
 								    tp_iternext slot is added to make 'for' loops more efficient.)
 								    To ensure binary backwards compatibility, a new flag
 								    Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
 								    field, and to the default flags macro.  This flag must be tested
 								    before accessing the tp_iter or tp_iternext slots.  The macro
 								    PyIter_Check() tests whether an object has the appropriate flag
 								    set and has a non-NULL tp_iternext slot.  There is no such macro
 								    for the tp_iter slot (since the only place where this slot is
 								    referenced should be PyObject_GetIter()).
 								    (Note: the tp_iter slot can be present on any object; the
 								    tp_iternext slot should only be present on objects that act as
 								    iterators.)
 								    For backwards compatibility, the PyObject_GetIter() function
 								    implements fallback semantics when its argument is a sequence that
 								    does not implement a tp_iter function: a lightweight sequence
 								    iterator object is constructed in that case which iterates over
 								    the items of the sequence in the natural order.
 								    The Python bytecode generated for 'for' loops is changed to use
 								    new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol
 								    rather than the sequence protocol to get the next value for the
 								    loop variable.  This makes it possible to use a 'for' loop to loop
 								    over non-sequence objects that support the tp_iter slot.  Other
 								    places where the interpreter loops over the values of a sequence
 								    should also be changed to use iterators.
 								    Iterators ought to implement the tp_iter slot as returning a
 								    reference to themselves; this is needed to make it possible to
 								    use an iterator (as opposed to a sequence) in a for loop.
 								Python API Specification
-												Update post-history; corrected a typo.

											
										
										
											2001-04-30 22:29:03 -04:00
+								    The StopIteration exception is made visible as one of the
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    standard exceptions.  It is derived from Exception.
 								    A new built-in function is defined, iter(), which can be called in
 								    two ways:
 								    - iter(obj) calls PyObject_GetIter(obj).
 								    - iter(callable, sentinel) returns a special kind of iterator that
 								      calls the callable to produce a new value, and compares the
 								      return value to the sentinel value.  If the return value equals
 								      the sentinel, this signals the end of the iteration and
 								      StopIteration is raised rather than returning normal; if the
 								      return value does not equal the sentinel, it is returned as the
 								      next value from the iterator.  If the callable raises an
 								      exception, this is propagated normally; in particular, the
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+								      function is allowed to raise StopIteration as an alternative way
 								      to end the iteration.  (This functionality is available from the
 								      C API as PyCallIter_New(callable, sentinel).)
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
 								    Iterator objects returned by either form of iter() have a next()
 								    method.  This method either returns the next value in the
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+								    iteration, or raises StopIteration (or a derived exception class)
 								    to signal the end of the iteration.  Any other exception should be
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    considered to signify an error and should be propagated normally,
 								    not taken to mean the end of the iteration.
 								    Classes can define how they are iterated over by defining an
 								    __iter__() method; this should take no additional arguments and
 								    return a valid iterator object.  A class is a valid iterator
 								    object when it defines a next() method that behaves as described
 								    above.  A class that wants to be an iterator also ought to
 								    implement __iter__() returning itself.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								Dictionary Iterators
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    The following two proposals are somewhat controversial.  They are
 								    also independent from the main iterator implementation.  However,
 								    they are both very useful.
 								    - Dictionaries implement a sq_contains slot that implements the
 								      same test as the has_key() method.  This means that we can write
 								          if k in dict: ...
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								      which is equivalent to
 								          if dict.has_key(k): ...
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    - Dictionaries implement a tp_iter slot that returns an efficient
 								      iterator that iterates over the keys of the dictionary.  During
 								      such an iteration, the dictionary should not be modified, except
 								      that setting the value for an existing key is allowed (deletions
 								      or additions are not, nor is the update() method).  This means
 								      that we can write
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								          for k in dict: ...
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								      which is equivalent to, but much faster than
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								          for k in dict.keys(): ...
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								      as long as the restriction on modifications to the dictionary
 								      (either by the loop or by another thread) are not violated.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												I've implemented iterkeys(), itervalues() and iteritems() methods for
dictionaries.  These do away with the myth that iterating over the
keys and extracting the values is as fast as iterating over the items;
it is about 7% slower.

											
										
										
											2001-05-01 08:15:42 -04:00
+								    - Add methods to dictionaries that return different kinds of
 								      iterators explicitly:
 								          for key in dict.iterkeys(): ...
 								          for value in dict.itervalues(): ...
 								          for key, value in dict.iteritems(): ...
 								      This means that "for x in dict" is shorthand for "for x in
 								      dict.iterkeys()".
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
+								    If this proposal is accepted, it makes sense to recommend that
 								    other mappings, if they support iterators at all, should also
 								    iterate over the keys.  However, this should not be taken as an
 								    absolute rule; specific applications may have different
 								    requirements.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								File Iterators
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    The following proposal is not controversial, but should be
 								    considered a separate step after introducing the iterator
 								    framework described above.  It is useful because it provides us
 								    with a good answer to the complaint that the common idiom to
 								    iterate over the lines of a file is ugly and slow.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    - Files implement a tp_iter slot that is equivalent to
 								      iter(f.readline, "").  This means that we can write
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								          for line in file:
 								              ...
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								      as a shorthand for
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								          for line in iter(file.readline, ""):
 								              ...
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								      which is equivalent to, but faster than
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								          while 1:
 								              line = file.readline()
 								              if not line:
 								                  break
 								              ...
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    This also shows that some iterators are destructive: they consume
 								    all the values and a second iterator cannot easily be created that
 								    iterates independently over the same values.  You could open the
 								    file for a second time, or seek() to the beginning, but these
 								    solutions don't work for all file types, e.g. they don't work when
 								    the open file object really represents a pipe or a stream socket.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
 								Rationale
 								    If all the parts of the proposal are included, this addresses many
 								    concerns in a consistent and flexible fashion.  Among its chief
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+								    virtues are the following four -- no, five -- no, six -- points:
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
 . It provides an extensible iterator interface.
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+. It allows performance enhancements to list iteration.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+. It allows big performance enhancements to dictionary iteration.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
 . It allows one to provide an interface for just iteration
 								       without pretending to provide random access to elements.
 . It is backward-compatible with all existing user-defined
 								       classes and extension objects that emulate sequences and
 								       mappings, even mappings that only implement a subset of
 								       {__getitem__, keys, values, items}.
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+. It makes code iterating over non-sequence collections more
 								       concise and readable.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
+								Open Issues
 								    The following questions are still open.
 								    - The name iter() is an abbreviation.  Alternatives proposed
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+								      include iterate(), traverse(), but these appear too long.
 								      Python has a history of using abbrs for common builtins,
 								      e.g. repr(), str(), len().
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
 								    - Using the same name for two different operations (getting an
 								      iterator from an object and making an iterator for a function
 								      with an sentinel value) is somewhat ugly.  I haven't seen a
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
+								      better name for the second operation though, and since they both
 								      return an iterator, it's easy to remember.
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
 								    - Once a particular iterator object has raised StopIteration, will
 								      it also raise StopIteration on all subsequent next() calls?
 								      Some say that it would be useful to require this, others say
 								      that it is useful to leave this open to individual iterators.
 								      Note that this may require an additional state bit for some
 								      iterator implementations (e.g. function-wrapping iterators).
-												Add proposal to make files their own iterator.

											
										
										
											2001-05-01 07:47:29 -04:00
+								    - It has been proposed that a file object should be its own
 								      iterator, with a next() method returning the next line.  This
 								      has certain advantages, and makes it even clearer that this
 								      iterator is destructive.  The disadvantage is that this would
 								      make it even more painful to implement the "sticky
 								      StopIteration" feature proposed in the previous bullet.
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
+								    - Some folks have requested extensions of the iterator protocol,
 								      e.g. prev() to get the previous item, current() to get the
 								      current item again, finished() to test whether the iterator is
 								      finished, and maybe even others, like rewind(), __len__(),
 								      position().
 								      While some of these are useful, many of these cannot easily be
 								      implemented for all iterator types without adding arbitrary
 								      buffering, and sometimes they can't be implemented at all (or
 								      not reasonably).  E.g. anything to do with reversing directions
 								      can't be done when iterating over a file or function.  Maybe a
 								      separate PEP can be drafted to standardize the names for such
 								      operations when the are implementable.
 								    - There is still discussion about whether
 								          for x in dict: ...
 								      should assign x the successive keys, values, or items of the
 								      dictionary.  The symmetry between "if x in y" and "for x in y"
 								      suggests that it should iterate over keys.  This symmetry has been
 								      observed by many independently and has even been used to "explain"
 								      one using the other.  This is because for sequences, "if x in y"
 								      iterates over y comparing the iterated values to x.  If we adopt
 								      both of the above proposals, this will also hold for
 								      dictionaries.
 								      The argument against making "for x in dict" iterate over the keys
 								      comes mostly from a practicality point of view: scans of the
 								      standard library show that there are about as many uses of "for x
 								      in dict.items()" as there are of "for x in dict.keys()", with the
 								      items() version having a small majority.  Presumably many of the
 								      loops using keys() use the corresponding value anyway, by writing
 								      dict[x], so (the argument goes) by making both the key and value
 								      available, we could support the largest number of cases.  While
 								      this is true, I (Guido) find the correspondence between "for x in
 								      dict" and "if x in dict" too compelling to break, and there's not
 								      much overhead in having to write dict[x] to explicitly get the
-												I've implemented iterkeys(), itervalues() and iteritems() methods for
dictionaries.  These do away with the myth that iterating over the
keys and extracting the values is as fast as iterating over the items;
it is about 7% slower.

											
										
										
											2001-05-01 08:15:42 -04:00
+								      value.
 								      For fast iteration over items, use "for key, value in
 								      dict.iteritems()".  I've timed the difference between
-												Correct typos and add bits of discussion.  Also add another "chief
virtue".

											
										
										
											2001-05-01 07:42:07 -04:00
 								          for key in dict: dict[key]
 								      and
-												I've implemented iterkeys(), itervalues() and iteritems() methods for
dictionaries.  These do away with the myth that iterating over the
keys and extracting the values is as fast as iterating over the items;
it is about 7% slower.

											
										
										
											2001-05-01 08:15:42 -04:00
+								          for key, value in dict.iteritems(): pass
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
-												I've implemented iterkeys(), itervalues() and iteritems() methods for
dictionaries.  These do away with the myth that iterating over the
keys and extracting the values is as fast as iterating over the items;
it is about 7% slower.

											
										
										
											2001-05-01 08:15:42 -04:00
+								      and found that the latter is only about 7% faster.
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
 								Resolved Issues
 								    The following topics have been decided by consensus or BDFL
 								    pronouncement.
 								    - Two alternative spellings for next() have been proposed but
 								      rejected: __next__(), because it corresponds to a type object
 								      slot (tp_iternext); and __call__(), because this is the only
 								      operation.
 								      Arguments against __next__(): while many iterators are used in
 								      for loops, it is expected that user code will also call next()
 								      directly, so having to write __next__() is ugly; also, a
 								      possible extension of the protocol would be to allow for prev(),
 								      current() and reset() operations; surely we don't want to use
 								      __prev__(), __current__(), __reset__().
 								      Arguments against __call__() (the original proposal): taken out
 								      of context, x() is not very readable, while x.next() is clear;
 								      there's a danger that every special-purpose object wants to use
 								      __call__() for its most common operation, causing more confusion
 								      than clarity.
 								    - Some folks have requested the ability to restart an iterator.
 								      This should be dealt with by calling iter() on a sequence
 								      repeatedly, not by the iterator protocol itself.
 								    - It has been questioned whether an exception to signal the end of
 								      the iteration isn't too expensive.  Several alternatives for the
 								      StopIteration exception have been proposed: a special value End
 								      to signal the end, a function end() to test whether the iterator
 								      is finished, even reusing the IndexError exception.
 								      - A special value has the problem that if a sequence ever
 								        contains that special value, a loop over that sequence will
 								        end prematurely without any warning.  If the experience with
 								        null-terminated C strings hasn't taught us the problems this
 								        can cause, imagine the trouble a Python introspection tool
 								        would have iterating over a list of all built-in names,
 								        assuming that the special End value was a built-in name!
 								      - Calling an end() function would require two calls per
 								        iteration.  Two calls is much more expensive than one call
 								        plus a test for an exception.  Especially the time-critical
 								        for loop can test very cheaply for an exception.
 								      - Reusing IndexError can cause confusion because it can be a
 								        genuine error, which would be masked by ending the loop
 								        prematurely.
 								    - Some have asked for a standard iterator type.  Presumably all
 								      iterators would have to be derived from this type.  But this is
 								      not the Python way: dictionaries are mappings because they
 								      support __getitem__() and a handful other operations, not
 								      because they are derived from an abstract mapping type.
 								    - Regarding "if key in dict": there is no doubt that the
 								      dict.has_keys(x) interpretation of "x in dict" is by far the
 								      most useful interpretation, probably the only useful one.  There
 								      has been resistance against this because "x in list" checks
 								      whether x is present among the values, while the proposal makes
 								      "x in dict" check whether x is present among the keys.  Given
 								      that the symmetry between lists and dictionaries is very weak,
 								      this argument does not have much weight.
-												BDFL pronouncement on next() vs. __next__() vs. __call__().

Add mailing list pointers.

Add discussion on "once-stopped-always-stopped".

											
										
										
											2001-04-27 11:26:54 -04:00
+								Mailing Lists
 								    The iterator protocol has been discussed extensively in a mailing
 								    list on SourceForge:
 								        http://lists.sourceforge.net/lists/listinfo/python-iterators
 								    Initially, some of the discussion was carried out at Yahoo;
 								    archives are still accessible:
 								        http://groups.yahoo.com/group/python-iter
-												Moved all the discussion items together at the end, in two sections
"Open Issues" and "Resolved Issues".

											
										
										
											2001-04-30 22:04:28 -04:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								Copyright
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
-												Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?

											
										
										
											2001-04-23 14:31:46 -04:00
+								    This document is in the public domain.
-												Initial draft of "Iterators" PEP.

											
										
										
											2001-02-03 23:36:37 -05:00
 								Local Variables:
 								mode: indented-text
 								indent-tabs-mode: nil
 								End: