diff --git a/pep-0204.txt b/pep-0204.txt
index 96ad896cc..a280a5663 100644
--- a/pep-0204.txt
+++ b/pep-0204.txt
@@ -1,11 +1,271 @@
-PEP: 203
+PEP: 204
 Title: Range Literals
 Version: $Revision$
 Owner: thomas@xs4all.net (Thomas Wouters)
 Python-Version: 2.0
-Status: Incomplete
+Status: Draft
+
+Introduction
+
+    This PEP describes the `range literal' proposal for Python 2.0.
+    This PEP tracks the status and ownership of this feature, slated
+    for introduction in Python 2.0. It contains a description of the
+    feature and outlines changes necessary to support the feature.
+    This PEP summarizes discussions held in mailing list forums, and
+    provides URLs for further information, where appropriate. The CVS
+    revision history of this file contains the definitive historical
+    record.
+
+
+
+List ranges
+
+    Ranges are sequences of numbers of a fixed stepping, often used in
+    for-loops. The Python for-loop is designed to iterate over a
+    sequence directly:
+
+        >>> l = ['a', 'b', 'c', 'd']
+        >>> for item in l:
+        ...     print item
+        a
+        b
+        c
+        d
+
+    However, this solution is not always prudent. Firstly, problems
+    arise when altering the sequence in the body of the for-loop,
+    resulting in the for-loop skipping items. Secondly, it is not
+    possible to iterate over, say, every second element of the
+    sequence. And thirdly, it is sometimes necessary to process an
+    element based on its index, which is not readily available in the
+    above construct.
+
+    For these instances, and others where a range of numbers is
+    desired, Python provides the `range' builtin function, which
+    creates a list of numbers. The `range' function takes three
+    arguments, `start', `end' and `step'. `start' and `step' are
+    optional, and default to 0 and 1, respectively.
+
+    The `range' function creates a list of numbers, starting at
+    `start', with a step of `step', up to, but not including `end', so
+    that `range(10)' produces a list that has exactly 10 items, the
+    numbers 0 through 9.
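A minimal sketch of the `range' semantics described above, offered as an
illustration rather than as part of the patch. It assumes a current Python 3
interpreter, where range() is lazy, so list() is used to obtain the list that
the Python 2.0 builtin returns directly. The variable names are hypothetical.

```python
# Assumption: Python 3, where range() is lazy; list() materialises it so the
# result matches the list-returning builtin described in the text.

# Only `end' given: `start' defaults to 0 and `step' to 1.
first_ten = list(range(10))

# All three arguments given: start at 0, stop before 10, step by 2.
evens = list(range(0, 10, 2))

# A negative step counts down; `end' is still excluded.
countdown = list(range(5, 1, -1))

print(first_ten)
print(evens)
print(countdown)
```

In every case the sequence stops before `end', which is why range(10) has
exactly ten items and range(5, 1, -1) finishes at 2.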
+
+    Using the `range' function, the for-loop above would look like
+    this:
+
+        >>> for i in range(len(l)):
+        ...     print l[i]
+        a
+        b
+        c
+        d
+
+    Or, to start at the second element of `l' and process only
+    every second element from then on:
+
+        >>> for i in range(1, len(l), 2):
+        ...     print l[i]
+        b
+        d
+
+    There are several disadvantages with this approach:
+
+    - Clarity of purpose: Adding another function call, possibly with
+      extra arithmetic to determine the desired length and step of the
+      list, does not improve readability of the code. Also, it is
+      possible to `shadow' the builtin `range' function by supplying a
+      local or global variable with the same name, effectively
+      replacing it. This may or may not be a desired effect.
+
+    - Efficiency: because the `range' function can be overridden, the
+      Python compiler cannot make assumptions about the for-loop, and
+      has to maintain a separate loop counter.
+
+    - Consistency: There already is a syntax that is used to denote
+      ranges, as shown below. This syntax uses the exact same
+      arguments, though all optional, in the exact same way. It seems
+      logical to extend this syntax to ranges, to form `range
+      literals'.
+
+
+
+Slice Indices
+
+    In Python, a sequence can be indexed in one of two ways:
+    retrieving a single item, or retrieving a range of items.
+    Retrieving a range of items results in a new object of the same
+    type as the original sequence, containing zero or more items from
+    the original sequence. This is done using a `range notation':
+
+        >>> l[2:4]
+        ['c', 'd']
+
+    This range notation consists of zero, one or two indices separated
+    by a colon. The first index is the `start' index, the second the
+    `end'. When either is left out, it defaults to the start or the
+    end of the sequence, respectively.
+
+    There is also an extended range notation, which incorporates
+    `step' as well.
+    Though this notation is not currently supported
+    by most builtin types, if it were, it would work as follows:
+
+        >>> l[1:4:2]
+        ['b', 'd']
+
+    The third `argument' to the slice syntax is exactly the same as
+    the `step' argument to range(). The underlying mechanisms of
+    standard slices and these extended slices are sufficiently
+    different and inconsistent that many classes and extensions
+    outside of mathematical packages do not implement support for the
+    extended variant. This should definitely be resolved, but doing
+    so is beyond the scope of this PEP.
+
+    Extended slices do show, however, that there is already a
+    perfectly valid and applicable syntax to denote ranges in a way
+    that solves all of the earlier stated disadvantages of the use of
+    the range() function:
+
+    - It is clearer, more concise syntax, which has already proven to
+      be both intuitive and easy to learn.
+
+    - It is consistent with the other use of ranges in Python
+      (slices).
+
+    - Because it is built-in syntax, instead of a builtin function, it
+      cannot be overridden. This means both that a reader can be
+      certain about what the code does, and that an optimizer will not
+      have to worry about range() being `shadowed'.
+
+
+
+The Proposed Solution
+
+    The proposed implementation of range-literals combines the syntax
+    for list literals with the syntax for (extended) slices, to form
+    range literals:
+
+        >>> [1:10]
+        [1, 2, 3, 4, 5, 6, 7, 8, 9]
+        >>> [:5]
+        [0, 1, 2, 3, 4]
+        >>> [5:1:-1]
+        [5, 4, 3, 2]
+
+    There is one minor difference between range literals and the slice
+    syntax: though it is possible to omit all of `start', `end' and
+    `step' in slices, it does not make sense to omit `end' in range
+    literals. In slices, `end' would default to the end of the list,
+    but this has no meaning in range literals.
+
+
+
+Reference Implementation
+
+    The proposed implementation can be found on SourceForge[1].
+    It adds a new bytecode, BUILD_RANGE, that takes three arguments from
+    the stack and builds a list on the basis of those. The list is
+    pushed back on the stack.
+
+    The use of a new bytecode is necessary to be able to build ranges
+    based on other calculations, whose outcome is not known at compile
+    time.
+
+    The code introduces two new functions to listobject.c, which are
+    currently hovering between private functions and full-fledged API
+    calls.
+
+    PyObject * PyList_FromRange(long start, long end, long step)
+    builds a list from start, end and step, returning NULL if an error
+    occurs.
+
+    long PyList_GetLenOfRange(long start, long end, long step) is a
+    helper function to determine the length of a range. It was
+    previously a static function in bltinmodule.c, but is now
+    necessary in both listobject.c and bltinmodule.c (for xrange). It
+    is made non-static solely to avoid code duplication.
+
+
+Open issues
+
+    One possible solution to the discrepancy of requiring the `end'
+    argument in range literals is to allow the range syntax to create
+    a `generator', rather than a list, as the `xrange' builtin
+    function does. However, a generator would not be a list, and it
+    would be impossible, for instance, to assign to items in the
+    generator, or append to it.
+
+    The range syntax could conceivably be extended to include tuples
+    (immutable lists), which could then be safely implemented as
+    generators. Especially for large number arrays, this may be a
+    desirable solution: generators require very little in the way of
+    storage and initialization, and there is only a small performance
+    impact in calculating and creating the appropriate number on
+    request. (TBD: is there any at all? Cursory testing suggests
+    equal performance even in the case of ranges of length 1.)
+
+    However, even if this idea were adopted, would it be wise to
+    `special case' the second argument, making it optional in one
+    instance of the syntax, and non-optional in other cases?
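A small sketch of the list-versus-generator trade-off raised above, offered as
an illustration outside the patch itself. It assumes Python 3, whose lazy
range() object stands in for the `generator' (xrange in Python 2.0) that the
text discusses. try_assign is a hypothetical helper.

```python
# Assumption: Python 3's lazy range() plays the role of the `generator'
# (xrange in Python 2.0) discussed in the text.
lazy = range(1, 10)
eager = list(range(1, 10))

# A real list supports item assignment and append.
eager[0] = 100
eager.append(10)


def try_assign(seq):
    """Return True if item assignment works on seq, False otherwise."""
    try:
        seq[0] = 100
    except TypeError:
        return False
    return True


lazy_assignable = try_assign(lazy)        # the lazy object rejects assignment
lazy_has_append = hasattr(lazy, "append")  # and it has no append() method

print(eager)
print(lazy_assignable, lazy_has_append)
```

The lazy object can still be indexed and iterated, but it cannot be modified
in place, which is the limitation the paragraph above points out.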
+
+
+    Should it be possible to mix range syntax with normal list
+    literals, creating a single list, like so:
+
+        >>> [5, 6, 1:6, 7, 9]
+    to create
+        [5, 6, 1, 2, 3, 4, 5, 7, 9]
+
+
+    How should range literals interact with another proposed new
+    feature, `list comprehensions', PEP-202? Specifically, should it
+    be possible to create lists in list comprehensions, like so:
+
+        >>> [x:y for x in (1,2) y in (3, 4)]
+
+    Should this example return a single list with multiple ranges:
+        [1, 2, 1, 2, 3, 2, 2, 3]
+
+    Or a list of lists, like so:
+        [[1, 2], [1, 2, 3], [2], [2, 3]]
+
+    However, as the syntax and semantics of list comprehensions are
+    still the subject of hot debate, these issues are probably best
+    addressed by the `list comprehensions' PEP.
+
+
+    Range literals accept objects other than integers: the
+    implementation performs PyInt_AsLong() on the objects passed in,
+    so as long as the objects can be coerced into integers, they will
+    be accepted. The resulting list, however, is always composed of
+    standard integers.
+
+    Should range literals create a list of the passed-in type? It
+    might be desirable in the cases of other builtin types, such as
+    longs and strings:
+
+        >>> [ 1L : 2L<<64 : 2<<32L ]
+        >>> ["a":"z":"b"]
+        >>> ["a":"z":2]
+
+    However, this might be too much `magic' to be obvious. It might
+    also present problems with user-defined classes: even if the base
+    class can be found and a new instance created, the instance may
+    require additional arguments to __init__, causing the creation to
+    fail.
+
+
+    The PyList_FromRange() and PyList_GetLenOfRange() functions need
+    to be classified: are they part of the API, or should they be made
+    private functions?
+
+
+References:
+
+    [1]
+http://sourceforge.net/patch/?func=detailpatch&patch_id=100902&group_id=5470
+
 Local Variables:
 mode: indented-text