PEP: 276
Title: Simple Iterator for ints
Version: $Revision$
Last-Modified: $Date$
Author: james_althoff@i2.com (Jim Althoff)
Status: Draft
Type: Standards Track
Created: 12-Nov-2001
Python-Version: 2.3
Post-History:


Abstract

Python 2.2 added new functionality to support iterators[1].
Iterators have proven to be useful and convenient in many coding
situations.  The implementation of Python's for-loop control
structure uses the iterator protocol as of release 2.2, and Python
provides iterators for the following builtin types: lists, tuples,
dictionaries, strings, and files.  This PEP proposes the addition of
an iterator for the builtin type int (types.IntType).  Such an
iterator would simplify the coding of certain for-loops in Python.


Specification

Define an iterator for types.IntType (i.e., the builtin type
"int") that is returned from the builtin function "iter" when
called with an instance of types.IntType as the argument.

The returned iterator has the following behavior:

- Assume that object i is an instance of types.IntType (the
  builtin type int) and that i > 0

- iter(i) returns an iterator object

- said iterator object iterates through the sequence of ints
  0,1,2,...,i-1

  Example:

  iter(5) returns an iterator object that iterates through the
  sequence of ints 0,1,2,3,4

- if i <= 0, iter(i) returns an "empty" iterator, i.e., one that
  raises StopIteration upon the first call of its "next" method

In other words, the conditions and semantics of said iterator are
consistent with the conditions and semantics of the range() and
xrange() functions.

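The behavior specified above can be sketched in pure Python.  Since
the builtin int cannot be changed from Python code, the sketch uses a
hypothetical standalone class, IntIterator (a name that does not
appear in this PEP), and is written for a current Python 3
interpreter, where the iterator method is spelled __next__ rather
than next.  It illustrates the intended semantics only and is not the
proposed implementation.

```python
class IntIterator:
    """Hypothetical stand-in for the iterator that iter(i) would return."""

    def __init__(self, stop):
        self._stop = stop      # the int being iterated over
        self._current = 0      # next value to hand out

    def __iter__(self):
        return self

    def __next__(self):
        # For stop <= 0 this raises on the very first call,
        # which is the "empty" iterator described above.
        if self._current >= self._stop:
            raise StopIteration
        value = self._current
        self._current += 1
        return value


print(list(IntIterator(5)))    # [0, 1, 2, 3, 4]
print(list(IntIterator(0)))    # []
print(list(IntIterator(-3)))   # []
```

For every int tried here the result matches list(range(i)), which is
the consistency with range() that the specification asks for.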
Note that the sequence 0,1,2,...,i-1 associated with the int i is
considered "natural" in the context of Python programming because
it is consistent with the builtin indexing protocol of sequences
in Python.  Python lists and tuples, for example, are indexed
starting at 0 and ending at len(object)-1 (when using positive
indices).  In other words, such objects are indexed with the
sequence 0,1,2,...,len(object)-1.


Rationale

A common programming idiom is to take a collection of objects and
apply some operation to each item in the collection in some
established sequential order.  Python provides the "for in"
looping control structure for handling this common idiom.  Cases
arise, however, where it is necessary (or more convenient) to
access each item in an "indexed" collection by iterating through
each index and accessing each item in the collection using the
corresponding index.

For example, one might have a two-dimensional "table" object where
one requires the application of some operation to the first column
of each row in the table.  Depending on the implementation of the
table it might not be possible to access first each row and then
each column as individual objects.  It might, rather, be possible
to access a cell in the table using a row index and a column index.
In such a case it is necessary to use an idiom where one iterates
through a sequence of indices (indexes) in order to access the
desired items in the table.  (Note that the commonly used
DefaultTableModel class in Java-Swing-Jython has this very
protocol.)

Another common example is where one needs to process two or more
collections in parallel.  Another example is where one needs to
access, say, every second item in a collection.

There are many other examples where access to items in a
collection is facilitated by a computation on an index, thus
necessitating access to the indices rather than direct access to
the items themselves.

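The parallel-processing and every-second-item cases mentioned above
can be sketched with an index-based loop.  The lists names and scores
are made-up sample data, and the code is written for a current
Python 3 interpreter (print as a function, range() in place of
xrange()).

```python
names = ["ann", "bob", "cy", "dee"]   # hypothetical sample data
scores = [90, 72, 85, 61]

# Process two collections in parallel by sharing one index.
pairs = []
for i in range(len(names)):
    pairs.append((names[i], scores[i]))

# Access every second item by computing on the index.
every_second = []
for i in range(0, len(names), 2):
    every_second.append(names[i])

print(pairs)
print(every_second)   # ['ann', 'cy']
```

In both loops the index, not the item, is what the loop body needs,
which is the situation this section is concerned with.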
Let's call this idiom the "indexed for-loop" idiom.  Some
programming languages provide builtin syntax for handling this
idiom.  In Python the common convention for implementing the
indexed for-loop idiom is to use the builtin range() or xrange()
function to generate a sequence of indices as in, for example:

    for rowcount in range(table.getRowCount()):
        print table.getValueAt(rowcount, 0)

or

    for rowcount in xrange(table.getRowCount()):
        print table.getValueAt(rowcount, 0)

From time to time there are discussions in the Python community
about the indexed for-loop idiom.  It is sometimes argued that the
need for using the range() or xrange() function for this design
idiom is:

- Not obvious (to new-to-Python programmers),

- Error prone (easy to forget, even for experienced Python
  programmers)

- Confusing and distracting for those who feel compelled to
  understand the differences and recommended usage of xrange()
  vis-a-vis range()

- Unwieldy, especially when combined with the len() function,
  i.e., xrange(len(sequence))

- Not as convenient as equivalent mechanisms in other languages,

- Annoying, a "wart", etc.

And from time to time proposals are put forth for ways in which
Python could provide a better mechanism for this idiom.  Recent
examples include PEP 204, "Range Literals", and PEP 212, "Loop
Counter Iteration".

Most often, such proposals include changes to Python's syntax and
other "heavyweight" changes.

Part of the difficulty here is that advocating new syntax implies
a comprehensive solution for "general indexing" that has to
include aspects like:

- starting index value

- ending index value

- step value

- open intervals versus closed intervals versus half-open
  intervals

Finding a new syntax that is comprehensive, simple, general,
Pythonic, appealing to many, easy to implement, not in conflict
with existing structures, not excessively overloading existing
structures, etc. has proven to be more difficult than one might
anticipate.

The proposal outlined in this PEP tries to address the problem by
suggesting a simple "lightweight" solution that helps the most
common case by using a proven mechanism that is already available
(as of Python 2.2): namely, iterators.

Because for-loops already use the iterator protocol as of Python
2.2, adding an iterator for types.IntType as proposed in this PEP
would enable by default the following shortcut for the indexed
for-loop idiom:

    for rowcount in table.getRowCount():
        print table.getValueAt(rowcount, 0)

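The shortcut cannot be tried on the builtin int today, but its effect
can be approximated with a subclass.  In the sketch below,
IterableInt and Table are hypothetical names introduced only for
illustration; getRowCount() and getValueAt() mirror the table
protocol used in the examples above.  The code targets a current
Python 3 interpreter.

```python
class IterableInt(int):
    """Hypothetical int subclass that iterates over 0,1,...,self-1."""

    def __iter__(self):
        return iter(range(self))


class Table:
    """Hypothetical table addressed by row and column index."""

    def __init__(self, rows):
        self._rows = rows

    def getRowCount(self):
        # Return the count as an IterableInt so it can be looped over.
        return IterableInt(len(self._rows))

    def getValueAt(self, row, column):
        return self._rows[row][column]


table = Table([["a", 1], ["b", 2], ["c", 3]])

first_column = []
for rowcount in table.getRowCount():   # no range() or xrange() needed
    first_column.append(table.getValueAt(rowcount, 0))

print(first_column)   # ['a', 'b', 'c']
```

Under the actual proposal no subclass would be needed, since plain
ints would carry the iterator themselves.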
The following benefits are claimed for this approach vis-a-vis the
current mechanism of using the range() or xrange() functions:

- Simpler,

- Less cluttered,

- Focuses on the problem at hand without the need to resort to
  secondary implementation-oriented functions (range() and
  xrange())

And compared to other proposals for change:

- Requires no new syntax

- Requires no new keywords

- Takes advantage of the new and well-established iterator mechanism

And generally:

- Is consistent with iterator-based "convenience" changes already
  included (as of Python 2.2) for other builtin types such as:
  lists, tuples, dictionaries, strings, and files.

Preliminary discussion on the Python interest mailing list
suggests a reasonable amount of initial support for this PEP
(along with some dissents/issues noted below).


Backwards Compatibility

The proposed mechanism is generally backwards compatible as it
calls for neither new syntax nor new keywords.  All existing,
valid Python programs should continue to work unmodified.

However, this proposal is not perfectly backwards compatible in
the sense that certain statements that are currently invalid
would, under the current proposal, become valid.

Tim Peters has pointed out two such examples:

1) The common case where one forgets to include range() or
   xrange(), for example:

       for rowcount in table.getRowCount():
           print table.getValueAt(rowcount, 0)

   in Python 2.2 raises a TypeError exception.

   Under the current proposal, the above statement would be valid
   and would work as (presumably) intended.  Presumably, this is a
   good thing.

   As noted by Tim, this is the common case of the "forgotten
   range" mistake (which one currently corrects by adding a call
   to range() or xrange()).

2) The (hopefully) very uncommon case where one makes a typing
   mistake when using tuple unpacking.  For example:

       x, = 1

   in Python 2.2 raises a TypeError exception.

   Under the current proposal, the above statement would be valid
   and would set x to 0.  The PEP author has no data as to how
   common this typing error is nor how difficult it would be to
   catch such an error under the current proposal.  He imagines
   that it does not occur frequently and that it would be
   relatively easy to correct should it happen.

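Both cases can be checked against a current interpreter, where an int
is still not iterable.  The sketch below (Python 3) records the
TypeError raised today and then shows, using a hypothetical
IterableInt subclass as a stand-in for the proposed behavior, what
the tuple-unpacking typo would do under the proposal.

```python
class IterableInt(int):
    """Hypothetical stand-in for an int with the proposed iterator."""

    def __iter__(self):
        return iter(range(self))


# Today: iterating over, or unpacking, a plain int raises TypeError.
errors = []
count = 3
try:
    for i in count:
        pass
except TypeError as exc:
    errors.append(type(exc).__name__)

value = 1
try:
    x, = value
except TypeError as exc:
    errors.append(type(exc).__name__)

print(errors)   # ['TypeError', 'TypeError']

# Under the proposal (emulated): the typo silently binds x to 0.
x, = IterableInt(1)
print(x)        # 0
```

The second case is the one that would stop being reported: the
statement succeeds because iter(1) yields exactly one value, 0.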
Issues

Based on some preliminary discussion on the Python interest
mailing list, the following concerns have been voiced:

- Is it obvious that iter(5) maps to the sequence 0,1,2,3,4?

  Response: Given, as noted above, that Python has a strong
  convention for indexing sequences starting at 0 and stopping at
  (inclusively) the index whose value is one less than the length
  of the sequence, it is argued that the proposed sequence is
  reasonably intuitive to a Python programmer while being useful
  and practical.

- "in" (as in "for i in x") does not match standard English usage
  in this case.  "up to" or something similar might be better.

  Response: Not everyone felt that matching standard English
  perfectly is a requirement.  It is noted that "for:else:"
  doesn't match standard English very well either.  And few are
  excited about adding a new keyword, especially just to get a
  somewhat better match to standard English usage.

- Possible ambiguity

      for i in 10: print i

  might be mistaken for

      for i in (10,): print i

  Response: The predicted ambiguity was not readily apparent to
  several of the posters.

- It would be better to add special new syntax such as:

      for i in 0..10: print i

  Response: There are other PEPs that take this approach[2][3].

- It would be better to reuse the ellipsis literal syntax (...)

  Response: Shares disadvantages of other proposals that require
  changes to the syntax.  Needs more design to determine how it
  would handle the general case of start,stop,step,
  open/closed/half-closed intervals, etc.  Needs a PEP.

- It would be better to reuse the slicing literal syntax attached
  to the int class, e.g., int[0:10]

  Response: Same as previous response.  In addition, design
  consideration needs to be given to what it would mean if one
  uses slicing syntax after some arbitrary class other than class
  int.  Needs a PEP.

- Might encourage newbies to use the indexed for-loop idiom even
  when the standard "for item in collection:" idiom is clearly
  better.

  Response: The standard idiom is so nice when "it fits" that it
  needs neither extra "carrot" nor "stick".  On the other hand,
  one does notice cases of overuse/misuse of the standard idiom
  (due, most likely, to the awkwardness of the indexed for-loop
  idiom), as in:

      for item in sequence:
          print sequence.index(item)

- Doesn't handle the general case of start,stop,step

  Response: Use the existing range() or xrange() mechanisms.  Or,
  see below.

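The misuse of sequence.index(item) cited in one of the responses
above is more than a matter of style: index() returns the position of
the first equal item, so it reports wrong positions when the sequence
contains duplicates, and it rescans the sequence on every pass.  A
small sketch with made-up data (Python 3):

```python
sequence = ["a", "b", "a", "c"]   # hypothetical data with a duplicate

# Misuse: look each item's position up again with index().
via_index = [sequence.index(item) for item in sequence]

# Indexed for-loop: the loop variable is the position itself.
via_range = [i for i in range(len(sequence))]

print(via_index)   # [0, 1, 0, 3]
print(via_range)   # [0, 1, 2, 3]
```

The third entry differs because index() finds the first "a" again
rather than the one currently being visited.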
Extension

If one wants to handle general indexing (start,stop,step) without
having to resort to using the range() or xrange() functions then
the following could be incorporated into the current proposal.

Add an "iter" method (or use some other preferred name) to
types.IntType with the following signature:

    def iter(self, start=0, step=1):

This method would have the (hopefully) obvious semantics.

Then one could do, for example:

    x = 100
    for i in x.iter(start=1, step=2):
        print i

Under this extension (for x bound to an int),

    for i in x:

would be equivalent to

    for i in x.iter():

and to

    for i in x.iter(start=0, step=1):

This extension is consistent with the generalization provided by
the current mechanism for dictionaries whereby one can use:

    for k in d.iterkeys():
    for v in d.itervalues():
    for k,v in d.iteritems():

depending on one's needs, given that

    for i in d:

has a meaning aimed at the most common and useful case (d.iterkeys()).

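The extension can be approximated with a subclass as well.  The
sketch below assumes that the int itself acts as the stop value,
i.e., that x.iter(start, step) yields the same values as
range(start, x, step); the PEP does not spell this out, so treat it
as one plausible reading.  IterableInt is a hypothetical name, and
the code is written for Python 3.

```python
class IterableInt(int):
    """Hypothetical int subclass with the proposed iter() method."""

    def iter(self, start=0, step=1):
        # Assumption: self is the (exclusive) stop value.
        # The name iter below resolves to the builtin, not this method.
        return iter(range(start, self, step))

    def __iter__(self):
        # "for i in x" is equivalent to "for i in x.iter()".
        return self.iter()


x = IterableInt(10)
print(list(x.iter(start=1, step=2)))   # [1, 3, 5, 7, 9]
print(list(x) == list(x.iter()))       # True
```

Keeping the defaults at start=0 and step=1 is what makes the plain
"for i in x:" form a special case of the general method.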
Implementation

An implementation is not available at this time.  Although the
author is not qualified to comment on the matter, he will
nonetheless speculate that an implementation might be
straightforward and, hopefully, might consist of little more than
setting the tp_iter slot in types.IntType to point to a simple
iterator function that would be similar to -- or perhaps even a
wrapper around -- the xrange() function.


References

[1] PEP 234, Iterators
    http://python.sourceforge.net/peps/pep-0234.html

[2] PEP 204, Range Literals
    http://python.sourceforge.net/peps/pep-0204.html

[3] PEP 212, Loop Counter Iteration
    http://python.sourceforge.net/peps/pep-0212.html


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: