PEP: 204
Title: Range Literals
Version: $Revision$
Author: thomas@xs4all.net (Thomas Wouters)
Status: Draft
Type: Standards Track
Python-Version: 2.0
Created: 14-Jul-2000
Post-History:


Introduction

This PEP describes the `range literal' proposal for Python 2.0 and
tracks the status and ownership of the feature. It contains a
description of the feature and outlines the changes necessary to
support it. It also summarizes discussions held in mailing list
forums, and provides URLs for further information, where
appropriate. The CVS revision history of this file contains the
definitive historical record.


List ranges

Ranges are sequences of numbers of a fixed stepping, often used in
for-loops. The Python for-loop is designed to iterate over a
sequence directly:

    >>> l = ['a', 'b', 'c', 'd']
    >>> for item in l:
    ...     print item
    a
    b
    c
    d

However, iterating over the sequence directly is not always
appropriate. Firstly, altering the sequence in the body of the
for-loop can cause the loop to skip items. Secondly, it is not
possible to iterate over, say, every second element of the
sequence. And thirdly, it is sometimes necessary to process an
element based on its index, which is not readily available in the
above construct.

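The first of these problems can be illustrated with a minimal
sketch. It is written for present-day Python 3 rather than the
Python 2.0 this PEP targets (an assumption, made so that it runs as
written), and the variable names are purely illustrative.

```python
# Removing an item while looping over the same list: the loop keeps
# an internal index, so the item that slides into the freed slot is
# never visited.
items = ['a', 'b', 'c', 'd']
visited = []
for item in items:
    visited.append(item)
    if item == 'a':
        items.remove(item)   # 'b' moves to index 0 and is skipped

# Looping over a copy (items2[:]) leaves the iteration unaffected by
# changes made to the original list.
items2 = ['a', 'b', 'c', 'd']
visited2 = []
for item in items2[:]:
    visited2.append(item)
    if item == 'a':
        items2.remove(item)
```

After the first loop, `visited' holds 'a', 'c' and 'd'; the second
loop visits all four items.
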
For these instances, and others where a range of numbers is
desired, Python provides the `range' builtin function, which
creates a list of numbers. The `range' function takes three
arguments, `start', `end' and `step'. `start' and `step' are
optional, and default to 0 and 1, respectively.

The list starts at `start' and advances by `step', up to, but not
including, `end', so that `range(10)' produces a list that has
exactly 10 items, the numbers 0 through 9.

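The argument handling described above can be checked with a short
sketch. It assumes present-day Python 3, where `range' returns a
lazy object rather than a list, so each call is wrapped in list()
to obtain the list that Python 2.0 would return directly. The
variable names are illustrative only.

```python
# `end' only: start defaults to 0, step defaults to 1.
up_to_ten = list(range(10))

# `start', `end' and `step': every second number from 1, excluding 10.
odd_digits = list(range(1, 10, 2))

# A negative step counts down; `end' is still excluded.
countdown = list(range(5, 1, -1))

# Counting upwards from 5 never reaches a value below 1: empty list.
empty = list(range(5, 1))
```
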
Using the `range' function, the above example would look like
this:

    >>> for i in range(len(l)):
    ...     print l[i]
    a
    b
    c
    d

Or, to start at the second element of `l' and process only every
second element from then on:

    >>> for i in range(1, len(l), 2):
    ...     print l[i]
    b
    d

There are several disadvantages with this approach:

- Clarity of purpose: Adding another function call, possibly with
  extra arithmetic to determine the desired length and step of the
  list, does not improve readability of the code. Also, it is
  possible to `shadow' the builtin `range' function by supplying a
  local or global variable with the same name, effectively
  replacing it. This may or may not be a desired effect.

- Efficiency: Because the `range' function can be overridden, the
  Python compiler cannot make assumptions about the for-loop, and
  has to maintain a separate loop counter.

- Consistency: There already is a syntax that is used to denote
  ranges, the slice notation shown below. This syntax uses the
  exact same arguments, though all optional, in the exact same
  way. It seems logical to extend this syntax to ranges, to form
  `range literals'.

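The shadowing mentioned in the list above can be sketched as
follows. The example assumes present-day Python 3, and both function
names, as well as the local replacement for `range', are
hypothetical and exist only for illustration.

```python
def indices(seq):
    """Return the indices of seq using the builtin range()."""
    return list(range(len(seq)))


def indices_shadowed(seq):
    """Same body as indices(), but with `range' shadowed locally."""
    # A local helper that happens to be called `range' replaces the
    # builtin for the rest of this function.
    def range(n):
        return ['?'] * n
    return list(range(len(seq)))


normal = indices('abc')
shadowed = indices_shadowed('abc')
```

The final line of both functions is identical, yet `normal' is
[0, 1, 2] while `shadowed' is ['?', '?', '?']. Neither a reader nor
a compiler can tell what range() does without inspecting the
enclosing scopes.
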

Slice Indices

In Python, a sequence can be indexed in one of two ways:
retrieving a single item, or retrieving a range of items.
Retrieving a range of items results in a new object of the same
type as the original sequence, containing zero or more items from
the original sequence. This is done using a `range notation':

    >>> l[2:4]
    ['c', 'd']

This range notation consists of zero, one or two indices separated
by a colon. The first index is the `start' index, the second the
`end'. When either is left out, it defaults to the start or the
end of the sequence, respectively.

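The defaults for omitted indices can be seen in a short sketch,
which reuses the list `l' from the earlier examples. The other
variable names are illustrative, and the code is written for
present-day Python 3.

```python
l = ['a', 'b', 'c', 'd']

head = l[:2]     # `start' omitted: begins at the start of the list
tail = l[2:]     # `end' omitted: runs to the end of the list
whole = l[:]     # both omitted: a new list with every item
single = l[2:3]  # a slice always yields a list, even for one item
```

Note that `whole' is equal to `l' but is a distinct list object,
since slicing always creates a new sequence.
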
There is also an extended range notation, which incorporates
`step' as well. Though this notation is not currently supported
by most builtin types, if it were, it would work as follows:

    >>> l[1:4:2]
    ['b', 'd']

The third `argument' to the slice syntax is exactly the same as
the `step' argument to range(). The underlying mechanisms of
standard and extended slices are sufficiently different and
inconsistent that many classes and extensions outside of
mathematical packages do not implement support for the extended
variant. While this should be resolved, it is beyond the scope of
this PEP.

Extended slices do show, however, that there is already a
perfectly valid and applicable syntax to denote ranges in a way
that solves all of the earlier stated disadvantages of the use of
the range() function:

- It is a clearer, more concise syntax, which has already proven
  to be both intuitive and easy to learn.

- It is consistent with the other use of ranges in Python
  (e.g. slices).

- Because it is built-in syntax, instead of a builtin function, it
  cannot be overridden. This means both that a reader can be
  certain about what the code does, and that an optimizer will not
  have to worry about range() being `shadowed'.


The Proposed Solution

The proposed implementation of range-literals combines the syntax
for list literals with the syntax for (extended) slices, to form
range literals:

    >>> [1:10]
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> [:5]
    [0, 1, 2, 3, 4]
    >>> [5:1:-1]
    [5, 4, 3, 2]

There is one minor difference between range literals and the slice
syntax: though it is possible to omit all of `start', `end' and
`step' in slices, it does not make sense to omit `end' in range
literals. In slices, `end' would default to the end of the list,
but this has no meaning in range literals.

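Since the proposed syntax is not valid in any released Python, the
semantics described above can only be emulated. The sketch below
does so in present-day Python 3 by borrowing the slice syntax of a
helper object. The names `range_literal', `_RangeLiteral' and `R'
are hypothetical and are not part of the proposal or its patch.

```python
def range_literal(s):
    """Hypothetical emulation of [start:end:step] for a slice object."""
    if not isinstance(s, slice):
        raise TypeError("expected start:end or start:end:step")
    if s.stop is None:
        # Unlike in a slice, there is no sequence whose end could
        # serve as the default, so `end' is required.
        raise ValueError("range literal requires `end'")
    start = 0 if s.start is None else s.start
    step = 1 if s.step is None else s.step
    return list(range(start, s.stop, step))


class _RangeLiteral:
    """Helper so that R[1:10] can stand in for the proposed [1:10]."""

    def __getitem__(self, s):
        return range_literal(s)


R = _RangeLiteral()

one_to_nine = R[1:10]    # stands in for [1:10]
first_five = R[:5]       # stands in for [:5]
countdown = R[5:1:-1]    # stands in for [5:1:-1]
```

In this emulation, R[1:] raises ValueError, mirroring the rule that
`end' may not be omitted from a range literal.
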

Reference Implementation

The proposed implementation can be found on SourceForge[1]. It
adds a new bytecode, BUILD_RANGE, that takes three arguments from
the stack and builds a list on the basis of those. The list is
pushed back on the stack.

The use of a new bytecode is necessary to be able to build ranges
based on other calculations, whose outcome is not known at compile
time.

The code introduces two new functions to listobject.c, which are
currently hovering between private functions and full-fledged API
calls.

PyList_FromRange() builds a list from start, end and step,
returning NULL if an error occurs. Its prototype is:

    PyObject * PyList_FromRange(long start, long end, long step)

PyList_GetLenOfRange() is a helper function used to determine the
length of a range. Previously, it was a static function in
bltinmodule.c, but is now necessary in both listobject.c and
bltinmodule.c (for xrange). It is made non-static solely to avoid
code duplication. Its prototype is:

    long PyList_GetLenOfRange(long start, long end, long step)

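The kind of calculation a length helper such as
PyList_GetLenOfRange() has to perform can be sketched in Python.
This is not the C code from the patch, which is not reproduced in
this PEP. The function name `len_of_range' and its handling of a
zero step are assumptions made for illustration, in present-day
Python 3.

```python
def len_of_range(start, end, step):
    """Number of items in the range start, start+step, ... before end."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0 and start < end:
        # Distance to the last admissible value, in whole steps.
        return 1 + (end - 1 - start) // step
    if step < 0 and start > end:
        # Same calculation, mirrored for a descending range.
        return 1 + (start - 1 - end) // (-step)
    # The range is empty when `end' lies behind `start'.
    return 0


nine = len_of_range(1, 10, 1)    # 1 through 9
four = len_of_range(5, 1, -1)    # 5, 4, 3, 2
zero = len_of_range(5, 1, 1)     # empty
```

Knowing the length up front allows the list to be allocated once,
at its final size, before it is filled.
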

Open issues

- One possible solution to the discrepancy of requiring the `end'
  argument in range literals is to allow the range syntax to
  create a `generator', rather than a list, as the `xrange'
  builtin function does. However, a generator would not be a
  list, and it would be impossible, for instance, to assign to
  items in the generator, or append to it.

  The range syntax could conceivably be extended to include tuples
  (i.e. immutable lists), which could then be safely implemented
  as generators. This may be a desirable solution, especially for
  large number arrays: generators require very little in the way
  of storage and initialization, and there is only a small
  performance impact in calculating and creating the appropriate
  number on request. (TBD: is there any at all? Cursory testing
  suggests equal performance even in the case of ranges of length
  1.)

  However, even if this idea were adopted, would it be wise to
  `special case' the second argument, making it optional in one
  instance of the syntax, and non-optional in other cases?

- Should it be possible to mix range syntax with normal list
  literals, creating a single list? E.g.:

      >>> [5, 6, 1:6, 7, 9]

  to create

      [5, 6, 1, 2, 3, 4, 5, 7, 9]

- How should range literals interact with another proposed new
  feature, `list comprehensions'[2]? Specifically, should it be
  possible to create ranges in list comprehensions? E.g.:

      >>> [x:y for x in (1, 2) for y in (3, 4)]

  Should this example return a single list with multiple ranges:

      [1, 2, 1, 2, 3, 2, 2, 3]

  Or a list of lists, like so:

      [[1, 2], [1, 2, 3], [2], [2, 3]]

  However, as the syntax and semantics of list comprehensions are
  still the subject of heated debate, these issues are probably
  best addressed by the `list comprehensions' PEP.

- Range literals accept objects other than integers: the
  implementation performs PyInt_AsLong() on the objects passed in,
  so as long as the objects can be coerced into integers, they
  will be accepted. The resulting list, however, is always
  composed of standard integers.

  Should range literals create a list of the passed-in type? It
  might be desirable in the case of other builtin types, such as
  longs and strings:

      >>> [ 1L : 2L<<64 : 2<<32L ]
      >>> ["a":"z":"b"]
      >>> ["a":"z":2]

  However, this might be too much `magic' to be obvious. It might
  also present problems with user-defined classes: even if the
  base class can be found and a new instance created, the instance
  may require additional arguments to __init__, causing the
  creation to fail.

- The PyList_FromRange() and PyList_GetLenOfRange() functions need
  to be classified: are they part of the API, or should they be
  made private functions?

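Two of the questions above, mixing ranges into list literals and
using ranges inside list comprehensions, concern results that can
already be spelled out with the range() function. The sketch below
does so in present-day Python 3. It relies on the `*' unpacking
syntax, which did not exist in Python 2.0, so it only illustrates
the candidate results and is not a proposal for either issue.

```python
# Mixing a range into a list literal: the stand-in for
# [5, 6, 1:6, 7, 9].
mixed = [5, 6, *range(1, 6), 7, 9]

# First reading of the comprehension example: one flat list in
# which the ranges are concatenated.
flat = [n for x in (1, 2) for y in (3, 4) for n in range(x, y)]

# Second reading: a list holding one list per (x, y) combination.
nested = [list(range(x, y)) for x in (1, 2) for y in (3, 4)]
```

Both readings of the comprehension are expressible today, which
suggests that the open question is which of them the shorter
spelling should denote, not whether either result can be built.
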

References

[1] http://sourceforge.net/patch/?func=detailpatch&patch_id=100902&group_id=5470
[2] PEP 202, List Comprehensions, pep-0202.txt


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: