430 lines
16 KiB
Plaintext
430 lines
16 KiB
Plaintext
PEP: 276
|
||
Title: Simple Iterator for ints
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: james_althoff@i2.com (Jim Althoff)
|
||
Status: Active
|
||
Type: Standards Track
|
||
Created: 12-Nov-2001
|
||
Python-Version: 2.3
|
||
Post-History:
|
||
|
||
|
||
Abstract
|
||
|
||
Python 2.1 added new functionality to support iterators[1].
|
||
Iterators have proven to be useful and convenient in many coding
|
||
situations. It is noted that the implementation of Python's
|
||
for-loop control structure uses the iterator protocol as of
|
||
release 2.1. It is also noted that Python provides iterators for
|
||
the following builtin types: lists, tuples, dictionaries, strings,
|
||
and files. This PEP proposes the addition of an iterator for the
|
||
builtin type int (types.IntType). Such an iterator would simplify
|
||
the coding of certain for-loops in Python.
|
||
|
||
BDFL Pronouncement
|
||
|
||
This PEP was rejected on 17 June 2005 with a note to python-dev.
|
||
|
||
Much of the original need was met by the enumerate() function which
|
||
was accepted for Python 2.3.
|
||
|
||
Also, the proposal both allowed and encouraged misuses such as:
|
||
|
||
>>> for i in 3: print i
|
||
0
|
||
1
|
||
2
|
||
|
||
Likewise, it was not helpful that the proposal would disable the
|
||
syntax error in statements like:
|
||
|
||
x, = 1
|
||
|
||
Specification
|
||
|
||
Define an iterator for types.intType (i.e., the builtin type
|
||
"int") that is returned from the builtin function "iter" when
|
||
called with an instance of types.intType as the argument.
|
||
|
||
The returned iterator has the following behavior:
|
||
|
||
- Assume that object i is an instance of types.intType (the
|
||
builtin type int) and that i > 0
|
||
|
||
- iter(i) returns an iterator object
|
||
|
||
- said iterator object iterates through the sequence of ints
|
||
0,1,2,...,i-1
|
||
|
||
Example:
|
||
|
||
iter(5) returns an iterator object that iterates through the
|
||
sequence of ints 0,1,2,3,4
|
||
|
||
- if i <= 0, iter(i) returns an "empty" iterator, i.e., one that
|
||
throws StopIteration upon the first call of its "next" method
|
||
|
||
In other words, the conditions and semantics of said iterator is
|
||
consistent with the conditions and semantics of the range() and
|
||
xrange() functions.
|
||
|
||
Note that the sequence 0,1,2,...,i-1 associated with the int i is
|
||
considered "natural" in the context of Python programming because
|
||
it is consistent with the builtin indexing protocol of sequences
|
||
in Python. Python lists and tuples, for example, are indexed
|
||
starting at 0 and ending at len(object)-1 (when using positive
|
||
indices). In other words, such objects are indexed with the
|
||
sequence 0,1,2,...,len(object)-1
|
||
|
||
|
||
Rationale
|
||
|
||
A common programming idiom is to take a collection of objects and
|
||
apply some operation to each item in the collection in some
|
||
established sequential order. Python provides the "for in"
|
||
looping control structure for handling this common idiom. Cases
|
||
arise, however, where it is necessary (or more convenient) to
|
||
access each item in an "indexed" collection by iterating through
|
||
each index and accessing each item in the collection using the
|
||
corresponding index.
|
||
|
||
For example, one might have a two-dimensional "table" object where one
|
||
requires the application of some operation to the first column of
|
||
each row in the table. Depending on the implementation of the table
|
||
it might not be possible to access first each row and then each
|
||
column as individual objects. It might, rather, be possible to
|
||
access a cell in the table using a row index and a column index.
|
||
In such a case it is necessary to use an idiom where one iterates
|
||
through a sequence of indices (indexes) in order to access the
|
||
desired items in the table. (Note that the commonly used
|
||
DefaultTableModel class in Java-Swing-Jython has this very protocol).
|
||
|
||
Another common example is where one needs to process two or more
|
||
collections in parallel. Another example is where one needs to
|
||
access, say, every second item in a collection.
|
||
|
||
There are many other examples where access to items in a
|
||
collection is facilitated by a computation on an index thus
|
||
necessitating access to the indices rather than direct access to
|
||
the items themselves.
|
||
|
||
Let's call this idiom the "indexed for-loop" idiom. Some
|
||
programming languages provide builtin syntax for handling this
|
||
idiom. In Python the common convention for implementing the
|
||
indexed for-loop idiom is to use the builtin range() or xrange()
|
||
function to generate a sequence of indices as in, for example:
|
||
|
||
for rowcount in range(table.getRowCount()):
|
||
print table.getValueAt(rowcount, 0)
|
||
|
||
or
|
||
|
||
for rowcount in xrange(table.getRowCount()):
|
||
print table.getValueAt(rowcount, 0)
|
||
|
||
From time to time there are discussions in the Python community
|
||
about the indexed for-loop idiom. It is sometimes argued that the
|
||
need for using the range() or xrange() function for this design
|
||
idiom is:
|
||
|
||
- Not obvious (to new-to-Python programmers),
|
||
|
||
- Error prone (easy to forget, even for experienced Python
|
||
programmers)
|
||
|
||
- Confusing and distracting for those who feel compelled to understand
|
||
the differences and recommended usage of xrange() vis-a-vis range()
|
||
|
||
- Unwieldy, especially when combined with the len() function,
|
||
i.e., xrange(len(sequence))
|
||
|
||
- Not as convenient as equivalent mechanisms in other languages,
|
||
|
||
- Annoying, a "wart", etc.
|
||
|
||
And from time to time proposals are put forth for ways in which
|
||
Python could provide a better mechanism for this idiom. Recent
|
||
examples include PEP 204, "Range Literals", and PEP 212, "Loop
|
||
Counter Iteration".
|
||
|
||
Most often, such proposal include changes to Python's syntax and
|
||
other "heavyweight" changes.
|
||
|
||
Part of the difficulty here is that advocating new syntax implies
|
||
a comprehensive solution for "general indexing" that has to
|
||
include aspects like:
|
||
|
||
- starting index value
|
||
|
||
- ending index value
|
||
|
||
- step value
|
||
|
||
- open intervals versus closed intervals versus half opened intervals
|
||
|
||
Finding a new syntax that is comprehensive, simple, general,
|
||
Pythonic, appealing to many, easy to implement, not in conflict
|
||
with existing structures, not excessively overloading of existing
|
||
structures, etc. has proven to be more difficult than one might
|
||
anticipate.
|
||
|
||
The proposal outlined in this PEP tries to address the problem by
|
||
suggesting a simple "lightweight" solution that helps the most
|
||
common case by using a proven mechanism that is already available
|
||
(as of Python 2.1): namely, iterators.
|
||
|
||
Because for-loops already use "iterator" protocol as of Python
|
||
2.1, adding an iterator for types.IntType as proposed in this PEP
|
||
would enable by default the following shortcut for the indexed
|
||
for-loop idiom:
|
||
|
||
for rowcount in table.getRowCount():
|
||
print table.getValueAt(rowcount, 0)
|
||
|
||
The following benefits for this approach vis-a-vis the current
|
||
mechanism of using the range() or xrange() functions are claimed
|
||
to be:
|
||
|
||
- Simpler,
|
||
|
||
- Less cluttered,
|
||
|
||
- Focuses on the problem at hand without the need to resort to
|
||
secondary implementation-oriented functions (range() and
|
||
xrange())
|
||
|
||
And compared to other proposals for change:
|
||
|
||
- Requires no new syntax
|
||
|
||
- Requires no new keywords
|
||
|
||
- Takes advantage of the new and well-established iterator mechanism
|
||
|
||
And generally:
|
||
|
||
- Is consistent with iterator-based "convenience" changes already
|
||
included (as of Python 2.1) for other builtin types such as:
|
||
lists, tuples, dictionaries, strings, and files.
|
||
|
||
|
||
Backwards Compatibility
|
||
|
||
The proposed mechanism is generally backwards compatible as it
|
||
calls for neither new syntax nor new keywords. All existing,
|
||
valid Python programs should continue to work unmodified.
|
||
|
||
However, this proposal is not perfectly backwards compatible in
|
||
the sense that certain statements that are currently invalid
|
||
would, under the current proposal, become valid.
|
||
|
||
Tim Peters has pointed out two such examples:
|
||
|
||
1) The common case where one forgets to include range() or
|
||
xrange(), for example:
|
||
|
||
for rowcount in table.getRowCount():
|
||
print table.getValueAt(rowcount, 0)
|
||
|
||
in Python 2.2 raises a TypeError exception.
|
||
|
||
Under the current proposal, the above statement would be valid
|
||
and would work as (presumably) intended. Presumably, this is a
|
||
good thing.
|
||
|
||
As noted by Tim, this is the common case of the "forgotten
|
||
range" mistake (which one currently corrects by adding a call
|
||
to range() or xrange()).
|
||
|
||
2) The (hopefully) very uncommon case where one makes a typing
|
||
mistake when using tuple unpacking. For example:
|
||
|
||
x, = 1
|
||
|
||
in Python 2.2 raises a TypeError exception.
|
||
|
||
Under the current proposal, the above statement would be valid
|
||
and would set x to 0. The PEP author has no data as to how
|
||
common this typing error is nor how difficult it would be to
|
||
catch such an error under the current proposal. He imagines
|
||
that it does not occur frequently and that it would be
|
||
relatively easy to correct should it happen.
|
||
|
||
|
||
Issues:
|
||
|
||
Extensive discussions concerning PEP 276 on the Python interest
|
||
mailing list suggests a range of opinions: some in favor, some
|
||
neutral, some against. Those in favor tend to agree with the
|
||
claims above of the usefulness, convenience, ease of learning,
|
||
and simplicity of a simple iterator for integers.
|
||
|
||
Issues with PEP 276 include:
|
||
|
||
- Using range/xrange is fine as is.
|
||
|
||
Response: Some posters feel this way. Other disagree.
|
||
|
||
- Some feel that iterating over the sequence "0, 1, 2, ..., n-1"
|
||
for an integer n is not intuitive. "for i in 5:" is considered
|
||
(by some) to be "non-obvious", for example. Some dislike this
|
||
usage because it doesn't have "the right feel". Some dislike it
|
||
because they believe that this type of usage forces one to view
|
||
integers as a sequences and this seems wrong to them. Some
|
||
dislike it because they prefer to view for-loops as dealing
|
||
with explicit sequences rather than with arbitrary iterators.
|
||
|
||
Response: Some like the proposed idiom and see it as simple,
|
||
elegant, easy to learn, and easy to use. Some are neutral on
|
||
this issue. Others, as noted, dislike it.
|
||
|
||
- Is it obvious that iter(5) maps to the sequence 0,1,2,3,4?
|
||
|
||
Response: Given, as noted above, that Python has a strong
|
||
convention for indexing sequences starting at 0 and stopping at
|
||
(inclusively) the index whose value is one less than the length
|
||
of the sequence, it is argued that the proposed sequence is
|
||
reasonably intuitive to the Python programmer while being useful
|
||
and practical. More importantly, it is argued that once learned
|
||
this convention is very easy to remember. Note that the doc
|
||
string for the range function makes a reference to the
|
||
natural and useful association between range(n) and the indices
|
||
for a list whose length is n.
|
||
|
||
- Possible ambiguity
|
||
|
||
for i in 10: print i
|
||
|
||
might be mistaken for
|
||
|
||
for i in (10,): print i
|
||
|
||
Response: This is exactly the same situation with strings in
|
||
current Python (replace 10 with 'spam' in the above, for
|
||
example).
|
||
|
||
- Too general: in the newest releases of Python there are
|
||
contexts -- as with for-loops -- where iterators are called
|
||
implicitly. Some fear that having an iterator invoked for
|
||
an integer in one of the context (excluding for-loops) might
|
||
lead to unexpected behavior and bugs. The "x, = 1" example
|
||
noted above is an a case in point.
|
||
|
||
Response: From the author's perspective the examples of the
|
||
above that were identified in the PEP 276 discussions did
|
||
not appear to be ones that would be accidentally misused
|
||
in ways that would lead to subtle and hard-to-detect errors.
|
||
|
||
In addition, it seems that there is a way to deal with this
|
||
issue by using a variation of what is outlined in the
|
||
specification section of this proposal. Instead of adding
|
||
an __iter__ method to class int, change the for-loop handling
|
||
code to convert (in essense) from
|
||
|
||
for i in n: # when isinstance(n,int) is 1
|
||
|
||
to
|
||
|
||
for i in xrange(n):
|
||
|
||
This approach gives the same results in a for-loop as an
|
||
__iter__ method would but would prevent iteration on integer
|
||
values in any other context. Lists and tuples, for example,
|
||
don't have __iter__ and are handled with special code.
|
||
Integer values would be one more special case.
|
||
|
||
- "i in n" seems very unnatural.
|
||
|
||
Response: Some feel that "i in len(mylist)" would be easily
|
||
understandable and useful. Some don't like it, particularly
|
||
when a literal is used as in "i in 5". If the variant
|
||
mentioned in the response to the previous issue is implemented,
|
||
this issue is moot. If not, then one could also address this
|
||
issue by defining a __contains__ method in class int that would
|
||
always raise a TypeError. This would then make the behavior of
|
||
"i in n" identical to that of current Python.
|
||
|
||
- Might dissuade newbies from using the indexed for-loop idiom when
|
||
the standard "for item in collection:" idiom is clearly better.
|
||
|
||
Response: The standard idiom is so nice when it fits that it
|
||
needs neither extra "carrot" nor "stick". On the other hand,
|
||
one does notice cases of overuse/misuse of the standard idiom
|
||
(due, most likely, to the awkwardness of the indexed for-loop
|
||
idiom), as in:
|
||
|
||
for item in sequence:
|
||
print sequence.index(item)
|
||
|
||
- Why not propose even bigger changes?
|
||
|
||
The majority of disagreement with PEP 276 came from those who
|
||
favor much larger changes to Python to address the more general
|
||
problem of specifying a sequence of integers where such
|
||
a specification is general enough to handle the starting value,
|
||
ending value, and stepping value of the sequence and also
|
||
addresses variations of open, closed, and half-open (half-closed)
|
||
integer intervals. Many suggestions of such were discussed.
|
||
|
||
These include:
|
||
|
||
- adding Haskell-like notation for specifying a sequence of
|
||
integers in a literal list,
|
||
|
||
- various uses of slicing notation to specify sequences,
|
||
|
||
- changes to the syntax of for-in loops to allow the use of
|
||
relational operators in the loop header,
|
||
|
||
- creation of an integer-interval class along with methods that
|
||
overload relational operators or division operators
|
||
to provide "slicing" on integer-interval objects,
|
||
|
||
- and more.
|
||
|
||
It should be noted that there was much debate but not an
|
||
overwhelming concensus for any of these larger-scale suggestions.
|
||
|
||
Clearly, PEP 276 does not propose such a large-scale change
|
||
and instead focuses on a specific problem area. Towards the
|
||
end of the discussion period, several posters expressed favor
|
||
for the narrow focus and simplicity of PEP 276 vis-a-vis the more
|
||
ambitious suggestions that were advanced. There did appear to be
|
||
concensus for the need for a PEP for any such larger-scale,
|
||
alternative suggestion. In light of this recognition, details of
|
||
the various alternative suggestions are not discussed here further.
|
||
|
||
|
||
Implementation
|
||
|
||
An implementation is not available at this time but is expected
|
||
to be straightforward. The author has implemented a subclass of
|
||
int with an __iter__ method (written in Python) as a means to test
|
||
out the ideas in this proposal, however.
|
||
|
||
|
||
References
|
||
|
||
[1] PEP 234, Iterators
|
||
http://www.python.org/peps/pep-0234.html
|
||
|
||
[2] PEP 204, Range Literals
|
||
http://www.python.org/peps/pep-0204.html
|
||
|
||
[3] PEP 212, Loop Counter Iteration
|
||
http://www.python.org/peps/pep-0212.html
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
fill-column: 70
|
||
End:
|