PEP: 276
Title: Simple Iterator for ints
Version: $Revision$
Last-Modified: $Date$
Author: james_althoff@i2.com (Jim Althoff)
Status: Draft
Type: Standards Track
Created: 12-Nov-2001
Python-Version: 2.3
Post-History:


Abstract

    Python 2.2 added new functionality to support iterators [1].
    Iterators have proven to be useful and convenient in many coding
    situations. The implementation of Python's for-loop control
    structure uses the iterator protocol as of release 2.2, and
    Python provides iterators for the following builtin types: lists,
    tuples, dictionaries, strings, and files. This PEP proposes the
    addition of an iterator for the builtin type int (types.IntType).
    Such an iterator would simplify the coding of certain for-loops
    in Python.


Specification

    Define an iterator for types.IntType (i.e., the builtin type
    "int") that is returned from the builtin function "iter" when
    called with an instance of types.IntType as the argument.

    The returned iterator has the following behavior:

    - Assume that object i is an instance of types.IntType (the
      builtin type int) and that i > 0

    - iter(i) returns an iterator object

    - said iterator object iterates through the sequence of ints
      0,1,2,...,i-1

      Example:

      iter(5) returns an iterator object that iterates through the
      sequence of ints 0,1,2,3,4

    - if i <= 0, iter(i) returns an "empty" iterator, i.e., one that
      raises StopIteration upon the first call of its "next" method

    In other words, the conditions and semantics of said iterator are
    consistent with the conditions and semantics of the range() and
    xrange() functions.
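
    The behavior listed above can be sketched in pure Python. This is
    only an illustration, written in present-day Python 3 syntax
    rather than the Python 2 of this PEP. The function name int_iter
    is hypothetical; it stands in for what iter(i) would return under
    the proposal.

```python
def int_iter(i):
    """Hypothetical stand-in for the proposed iter(i) on an int i."""
    n = 0
    # For i <= 0 the loop body never runs, so the iterator is "empty"
    # and raises StopIteration on the first call to next().
    while n < i:
        yield n
        n += 1


print(list(int_iter(5)))   # [0, 1, 2, 3, 4]
print(list(int_iter(0)))   # []
print(list(int_iter(-3)))  # []
```

    For every int i, the values produced match those of range(i).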

    Note that the sequence 0,1,2,...,i-1 associated with the int i is
    considered "natural" in the context of Python programming because
    it is consistent with the builtin indexing protocol of sequences
    in Python. Python lists and tuples, for example, are indexed
    starting at 0 and ending at len(object)-1 (when using positive
    indices). In other words, such objects are indexed with the
    sequence 0,1,2,...,len(object)-1.


Rationale

    A common programming idiom is to take a collection of objects and
    apply some operation to each item in the collection in some
    established sequential order. Python provides the "for in"
    looping control structure for handling this common idiom. Cases
    arise, however, where it is necessary (or more convenient) to
    access each item in an "indexed" collection by iterating through
    each index and accessing each item in the collection using the
    corresponding index.

    For example, one might have a two-dimensional "table" object
    where one requires the application of some operation to the first
    column of each row in the table. Depending on the implementation
    of the table it might not be possible to access first each row
    and then each column as individual objects. It might, rather, be
    possible to access a cell in the table using a row index and a
    column index. In such a case it is necessary to use an idiom
    where one iterates through a sequence of indices (indexes) in
    order to access the desired items in the table. (Note that the
    commonly used DefaultTableModel class in Java-Swing-Jython has
    this very protocol.)

    Another common example is where one needs to process two or more
    collections in parallel. Another example is where one needs to
    access, say, every second item in a collection.

    There are many other examples where access to items in a
    collection is facilitated by a computation on an index, thus
    necessitating access to the indices rather than direct access to
    the items themselves.
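
    Two of the cases mentioned above, processing collections in
    parallel and accessing every second item, can be illustrated with
    the existing range() mechanism. The sketch uses Python 3 syntax,
    and the sample data (names, scores) is made up for illustration;
    neither comes from this PEP.

```python
names = ["ann", "bob", "cy", "dee"]
scores = [90, 72, 85, 60]

# Process two collections in parallel through a shared index.
paired = []
for k in range(len(names)):
    paired.append((names[k], scores[k]))

# Access every second item by computing the index from a counter.
every_second = []
for k in range((len(names) + 1) // 2):
    every_second.append(names[2 * k])

print(paired)        # [('ann', 90), ('bob', 72), ('cy', 85), ('dee', 60)]
print(every_second)  # ['ann', 'cy']
```

    In both loops the index, not the item, is the value the loop body
    actually needs.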

    Let's call this idiom the "indexed for-loop" idiom. Some
    programming languages provide builtin syntax for handling this
    idiom. In Python the common convention for implementing the
    indexed for-loop idiom is to use the builtin range() or xrange()
    function to generate a sequence of indices as in, for example:

        for rowcount in range(table.getRowCount()):
            print table.getValueAt(rowcount, 0)

    or

        for rowcount in xrange(table.getRowCount()):
            print table.getValueAt(rowcount, 0)
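
    The "table" object in these examples is not defined in this PEP.
    To make the idiom runnable, the sketch below assumes a minimal
    Table class; only the method names getRowCount and getValueAt are
    taken from the example, everything else is hypothetical. It is
    written in Python 3 syntax, where range() plays the role that
    xrange() plays in Python 2.

```python
class Table:
    """Hypothetical cell-addressed table used only for illustration."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def getRowCount(self):
        return len(self._rows)

    def getValueAt(self, row, column):
        # Cells are reachable only through a (row, column) index pair.
        return self._rows[row][column]


table = Table([("ann", 90), ("bob", 72), ("cy", 85)])

first_column = []
for rowcount in range(table.getRowCount()):
    first_column.append(table.getValueAt(rowcount, 0))

print(first_column)  # ['ann', 'bob', 'cy']
```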

    From time to time there are discussions in the Python community
    about the indexed for-loop idiom. It is sometimes argued that the
    need for using the range() or xrange() function for this design
    idiom is:

    - Not obvious (to new-to-Python programmers),

    - Error prone (easy to forget, even for experienced Python
      programmers)

    - Confusing and distracting for those who feel compelled to
      understand the differences and recommended usage of xrange()
      vis-a-vis range()

    - Unwieldy, especially when combined with the len() function,
      i.e., xrange(len(sequence))

    - Not as convenient as equivalent mechanisms in other languages,

    - Annoying, a "wart", etc.

    And from time to time proposals are put forth for ways in which
    Python could provide a better mechanism for this idiom. Recent
    examples include PEP 204, "Range Literals", and PEP 212, "Loop
    Counter Iteration".

    Most often, such proposals include changes to Python's syntax and
    other "heavyweight" changes.

    Part of the difficulty here is that advocating new syntax implies
    a comprehensive solution for "general indexing" that has to
    include aspects like:

    - starting index value

    - ending index value

    - step value

    - open intervals versus closed intervals versus half-open
      intervals

    Finding a new syntax that is comprehensive, simple, general,
    Pythonic, appealing to many, easy to implement, not in conflict
    with existing structures, not excessively overloading of existing
    structures, etc. has proven to be more difficult than one might
    anticipate.

    The proposal outlined in this PEP tries to address the problem by
    suggesting a simple "lightweight" solution that helps the most
    common case by using a proven mechanism that is already available
    (as of Python 2.2): namely, iterators.

    Because for-loops already use the iterator protocol as of Python
    2.2, adding an iterator for types.IntType as proposed in this PEP
    would enable by default the following shortcut for the indexed
    for-loop idiom:

        for rowcount in table.getRowCount():
            print table.getValueAt(rowcount, 0)
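
    The shortcut cannot be run against the builtin int today, but its
    effect can be approximated with an int subclass. The sketch below
    is an emulation under stated assumptions, not the proposed
    implementation: IterableInt and the Table class are hypothetical
    names, and the code uses Python 3 syntax.

```python
class IterableInt(int):
    """Hypothetical int subclass that iterates the way this PEP proposes."""

    def __iter__(self):
        # Delegating to range() gives 0, 1, ..., self-1 and an empty
        # iterator when self <= 0.
        return iter(range(self))


class Table:
    """Hypothetical table whose row count is an IterableInt."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def getRowCount(self):
        return IterableInt(len(self._rows))

    def getValueAt(self, row, column):
        return self._rows[row][column]


table = Table([("ann", 90), ("bob", 72)])

seen = []
for rowcount in table.getRowCount():  # no range() call needed
    seen.append(table.getValueAt(rowcount, 0))

print(seen)  # ['ann', 'bob']
```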

    The claimed benefits of this approach vis-a-vis the current
    mechanism of using the range() or xrange() functions are:

    - Simpler,

    - Less cluttered,

    - Focuses on the problem at hand without the need to resort to
      secondary implementation-oriented functions (range() and
      xrange())

    And compared to other proposals for change:

    - Requires no new syntax

    - Requires no new keywords

    - Takes advantage of the new and well-established iterator
      mechanism

    And generally:

    - Is consistent with iterator-based "convenience" changes already
      included (as of Python 2.2) for other builtin types such as:
      lists, tuples, dictionaries, strings, and files.

    Preliminary discussion on the Python interest mailing list
    suggests a reasonable amount of initial support for this PEP
    (along with some dissents/issues noted below).


Backwards Compatibility

    The proposed mechanism is generally backwards compatible as it
    calls for neither new syntax nor new keywords. All existing,
    valid Python programs should continue to work unmodified.

    However, this proposal is not perfectly backwards compatible in
    the sense that certain statements that are currently invalid
    would, under the current proposal, become valid.

    Tim Peters has pointed out two such examples:

    1) The common case where one forgets to include range() or
       xrange(), for example:

           for rowcount in table.getRowCount():
               print table.getValueAt(rowcount, 0)

       in Python 2.2 raises a TypeError exception.

       Under the current proposal, the above statement would be valid
       and would work as (presumably) intended. Presumably, this is a
       good thing.

       As noted by Tim, this is the common case of the "forgotten
       range" mistake (which one currently corrects by adding a call
       to range() or xrange()).

    2) The (hopefully) very uncommon case where one makes a typing
       mistake when using tuple unpacking. For example:

           x, = 1

       in Python 2.2 raises a TypeError exception.

       Under the current proposal, the above statement would be valid
       and would set x to 0. The PEP author has no data as to how
       common this typing error is nor how difficult it would be to
       catch such an error under the current proposal. He imagines
       that it does not occur frequently and that it would be
       relatively easy to correct should it happen.
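
    Both cases can be checked with a short script. The sketch below
    uses Python 3 syntax; the helper raises_type_error and the int
    subclass IterableInt are hypothetical names introduced only to
    emulate the proposed behavior. With a plain int both statements
    fail with TypeError, while the emulated iterable int makes the
    unpacking succeed and bind x to 0.

```python
class IterableInt(int):
    """Hypothetical int subclass emulating the proposed iteration."""

    def __iter__(self):
        return iter(range(self))


def raises_type_error(func):
    """Return True if calling func() raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


def forgotten_range():
    row_count = 3
    for rowcount in row_count:  # range() was forgotten
        pass


def mistyped_unpacking():
    value = 1
    x, = value  # stray comma turns assignment into unpacking


print(raises_type_error(forgotten_range))     # True
print(raises_type_error(mistyped_unpacking))  # True

# Under the emulated proposal the unpacking silently succeeds.
x, = IterableInt(1)
print(x)  # 0
```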


Issues:

    Based on some preliminary discussion on the Python interest
    mailing list, the following concerns have been voiced:

    - Is it obvious that iter(5) maps to the sequence 0,1,2,3,4?

      Response: Given, as noted above, that Python has a strong
      convention for indexing sequences starting at 0 and stopping at
      (inclusively) the index whose value is one less than the length
      of the sequence, it is argued that the proposed sequence is
      reasonably intuitive to a Python programmer while being useful
      and practical.

    - "in" (as in "for i in x") does not match standard English usage
      in this case. "up to" or something similar might be better.

      Response: Not everyone felt that matching standard English
      perfectly is a requirement. It is noted that "for:else:"
      doesn't match standard English very well either. And few are
      excited about adding a new keyword, especially just to get a
      somewhat better match to standard English usage.

    - Possible ambiguity

          for i in 10: print i

      might be mistaken for

          for i in (10,): print i

      Response: The predicted ambiguity was not readily apparent to
      several of the posters.

    - It would be better to add special new syntax such as:

          for i in 0..10: print i

      Response: There are other PEPs that take this approach [2][3].

    - It would be better to reuse the ellipsis literal syntax (...)

      Response: Shares disadvantages of other proposals that require
      changes to the syntax. Needs more design to determine how it
      would handle the general case of start,stop,step,
      open/closed/half-closed intervals, etc. Needs a PEP.

    - It would be better to reuse the slicing literal syntax attached
      to the int class, e.g., int[0:10]

      Response: Same as previous response. In addition, design
      consideration needs to be given to what it would mean if one
      uses slicing syntax after some arbitrary class other than class
      int. Needs a PEP.

    - Might dissuade newbies from using the indexed for-loop idiom
      when the standard "for item in collection:" idiom is clearly
      better.

      Response: The standard idiom is so nice when "it fits" that it
      needs neither extra "carrot" nor "stick". On the other hand,
      one does notice cases of overuse/misuse of the standard idiom
      (due, most likely, to the awkwardness of the indexed for-loop
      idiom), as in:

          for item in sequence:
              print sequence.index(item)

    - Doesn't handle the general case of start,stop,step

      Response: use the existing range() or xrange() mechanisms. Or,
      see below.
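
    The misuse of sequence.index(item) mentioned in one of the
    responses above is more than a matter of style: index() returns
    the position of the first matching item and rescans the sequence
    on every call, so the result is wrong whenever the sequence holds
    duplicates. A small illustration in Python 3 syntax, with made-up
    sample data:

```python
sequence = ["a", "b", "a", "c"]

# Misuse of the standard idiom: look the index up from the item.
via_index_method = [sequence.index(item) for item in sequence]

# Indexed for-loop idiom: iterate over the indices directly.
via_indexed_loop = [k for k in range(len(sequence))]

print(via_index_method)  # [0, 1, 0, 3]
print(via_indexed_loop)  # [0, 1, 2, 3]
```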


Extension

    If one wants to handle general indexing (start,stop,step) without
    having to resort to using the range() or xrange() functions then
    the following could be incorporated into the current proposal.

    Add an "iter" method (or use some other preferred name) to
    types.IntType with the following signature:

        def iter(self, start=0, step=1):

    This method would have the (hopefully) obvious semantics.

    Then one could do, for example:

        x = 100
        for i in x.iter(start=1, step=2):
            print i

    Under this extension (for x bound to an int),

        for i in x:

    would be equivalent to

        for i in x.iter():

    and to

        for i in x.iter(start=0, step=1):
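
    The PEP leaves the semantics of the "iter" method implicit. One
    plausible reading, assumed here, is that x.iter(start, step)
    yields the same values as range(start, x, step). The sketch below
    emulates that reading with a hypothetical int subclass named
    IterInt, in Python 3 syntax.

```python
class IterInt(int):
    """Hypothetical int subclass carrying the extension's iter() method."""

    def iter(self, start=0, step=1):
        # Assumed semantics: the int itself acts as the stop value.
        return iter(range(start, self, step))

    def __iter__(self):
        # Plain iteration is the default case of iter().
        return self.iter()


x = IterInt(100)
odds = list(x.iter(start=1, step=2))
print(odds[:5], odds[-1])  # [1, 3, 5, 7, 9] 99

small = IterInt(4)
print(list(small) == list(small.iter()) == list(small.iter(start=0, step=1)))  # True
```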

    This extension is consistent with the generalization provided by
    the current mechanism for dictionaries whereby one can use:

        for k in d.iterkeys():
        for v in d.itervalues():
        for k,v in d.iteritems():

    depending on one's needs, given that

        for i in d:

    has a meaning aimed at the most common and useful case
    (d.iterkeys()).


Implementation

    An implementation is not available at this time. Although the
    author is not qualified to comment on such matters, he will
    nonetheless speculate that this might be straightforward and,
    hopefully, might consist of little more than setting the tp_iter
    slot in types.IntType to point to a simple iterator function that
    would be similar to, or perhaps even a wrapper around, the
    xrange() function.
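
    As a rough pure-Python analogue of what such a tp_iter function
    might hand back, the sketch below spells out the iterator
    protocol explicitly. It is not the C implementation the paragraph
    speculates about; the class name IntIterator is hypothetical, and
    the code uses Python 3 syntax, where the "next" method is spelled
    __next__.

```python
class IntIterator:
    """Hypothetical iterator object for an int, counting 0 .. stop-1."""

    def __init__(self, stop):
        self._current = 0
        self._stop = stop

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # When stop <= 0 this raises StopIteration on the first call.
        if self._current >= self._stop:
            raise StopIteration
        value = self._current
        self._current += 1
        return value


it = IntIterator(3)
print(next(it), next(it), next(it))  # 0 1 2
print(list(IntIterator(0)))          # []
```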


References

    [1] PEP 234, Iterators
        http://python.sourceforge.net/peps/pep-0234.html

    [2] PEP 204, Range Literals
        http://python.sourceforge.net/peps/pep-0204.html

    [3] PEP 212, Loop Counter Iteration
        http://python.sourceforge.net/peps/pep-0212.html


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: