273 lines
9.7 KiB
Plaintext
273 lines
9.7 KiB
Plaintext
PEP: 204
|
||
Title: Range Literals
|
||
Version: $Revision$
|
||
Owner: thomas@xs4all.net (Thomas Wouters)
|
||
Python-Version: 2.0
|
||
Status: Draft
|
||
|
||
|
||
|
||
Introduction
|
||
|
||
This PEP describes the `range literal' proposal for Python 2.0.
|
||
This PEP tracks the status and ownership of this feature, slated
|
||
for introduction in Python 2.0. It contains a description of the
|
||
feature and outlines changes necessary to support the feature.
|
||
This PEP summarizes discussions held in mailing list forums, and
|
||
provides URLs for further information, where appropriate. The CVS
|
||
revision history of this file contains the definitive historical
|
||
record.
|
||
|
||
|
||
|
||
List ranges
|
||
|
||
Ranges are sequences of numbers of a fixed stepping, often used in
|
||
for-loops. The Python for-loop is designed to iterate over a
|
||
sequence directly:
|
||
|
||
>>> l = ['a', 'b', 'c', 'd']
|
||
>>> for item in l:
|
||
... print item
|
||
a
|
||
b
|
||
c
|
||
d
|
||
|
||
However, this solution is not always prudent. Firstly, problems
|
||
arise when altering the sequence in the body of the for-loop,
|
||
resulting in the for-loop skipping items. Secondly, it is not
|
||
possible to iterate over, say, every second element of the
|
||
sequence. And thirdly, it is sometimes necessary to process an
|
||
element based on its index, which is not readily available in the
|
||
above construct.
|
||
|
||
For these instances, and others where a range of numbers is
|
||
desired, Python provides the `range' builtin function, which
|
||
creates a list of numbers. The `range' function takes three
|
||
arguments, `start', `end' and `step'. `start' and `step' are
|
||
optional, and default to 0 and 1, respectively.
|
||
|
||
The `range' function creates a list of numbers, starting at
|
||
`start', with a step of `step', up to, but not including `end', so
|
||
that `range(10)' produces a list that has exactly 10 items, the
|
||
numbers 0 through 9.
|
||
|
||
Using the `range' function, the above example would look like
|
||
this:
|
||
|
||
>>> for i in range(len(l)):
|
||
... print l[i]
|
||
a
|
||
b
|
||
c
|
||
d
|
||
|
||
Or, to start at the second element of `l' and processing only
|
||
every second element from then on:
|
||
|
||
>>> for i in range(1, len(l), 2):
|
||
... print l[i]
|
||
b
|
||
d
|
||
|
||
There are several disadvantages with this approach:
|
||
|
||
- Clarity of purpose: Adding another functioncall, possibly with
|
||
extra arithmatic to determine the desired length and step of the
|
||
list, does not improve readability of the code. Also, it is
|
||
possible to `shadow' the builtin `range' function by supplying a
|
||
local or global variable with the same name, effectively
|
||
replacing it. This may or may not be a desired effect.
|
||
|
||
- Efficiency: because the `range' function can be overridden, the
|
||
Python compiler cannot make assumptions about the for-loop, and
|
||
has to maintain a seperate loop counter.
|
||
|
||
- Consistency: There already is a syntax that is used to denote
|
||
ranges, as shown below. This syntax uses the exact same
|
||
arguments, though all optional, in the exact same way. It seems
|
||
logical to extend this syntax to ranges, to form `range
|
||
literals'.
|
||
|
||
|
||
|
||
Slice Indices
|
||
|
||
In Python, a sequence can be indexed in one of two ways:
|
||
retrieving a single item, or retrieving a range of items.
|
||
Retrieving a range of items results in a new object of the same
|
||
type as the original sequence, containing zero or more items from
|
||
the original sequence. This is done using a `range notation':
|
||
|
||
>>> l[2:4]
|
||
['c', 'd']
|
||
|
||
This range notation consists of zero, one or two indices seperated
|
||
by a colon. The first index is the `start' index, the second the
|
||
`end'. When either is left out, they default to respectively the
|
||
start and the end of the sequence.
|
||
|
||
There is also an extended range notation, which incorporates
|
||
`step' as well. Though this notation is not currently supported
|
||
by most builtin types, if it were, it would work as follows:
|
||
|
||
>>> l[1:4:2]
|
||
['b', 'd']
|
||
|
||
The third `argument' to the slice syntax is exactly the same as
|
||
the `step' argument to range(). The underlying mechanisms of
|
||
standard and these extended slices are sufficiently different and
|
||
inconsistent that many classes and extensions outside of
|
||
mathematical packages do not implement support for the extended
|
||
variant, and this should definately be resolved, but this is
|
||
beyond the scope of this PEP.
|
||
|
||
Extended slices do show, however, that there is already a
|
||
perfectly valid and applicable syntax to denote ranges in a way
|
||
that solve all of the earlier stated disadvantages of the use of
|
||
the range() function:
|
||
|
||
- It is clearer, more concise syntax, which has already proven to
|
||
be both intuitive and easy to learn.
|
||
|
||
- It is consistent with the other use of ranges in Python
|
||
(slices.)
|
||
|
||
- Because it is built-in syntax, instead of a builtin function, it
|
||
cannot be overridden. This means both that a viewer can be
|
||
certain about what the code does, and that an optimizer will not
|
||
have to worry about range() being `shadowed'.
|
||
|
||
|
||
|
||
The Proposed Solution
|
||
|
||
The proposed implementation of range-literals combines the syntax
|
||
for list literals with the syntax for (extended) slices, to form
|
||
range literals:
|
||
|
||
>>> [1:10]
|
||
[1, 2, 3, 4, 5, 6, 7, 8, 9]
|
||
>>> [:5]
|
||
[0, 1, 2, 3, 4]
|
||
>>> [5:1:-1]
|
||
[5, 4, 3, 2]
|
||
|
||
There is one minor difference between range literals and the slice
|
||
syntax: though it is possible to omit all of `start', `end' and
|
||
`step' in slices, it does not make sense to omit `end' in range
|
||
literals. In slices, `end' would default to the end of the list,
|
||
but this has no meaning in range literals.
|
||
|
||
|
||
|
||
Reference Implementation
|
||
|
||
The proposed implementation can be found on SourceForge[1]. It
|
||
adds a new bytecode, BUILD_RANGE, that takes three arguments from
|
||
the stack and builds a list on the bases of those. The list is
|
||
pushed back on the stack.
|
||
|
||
The use of a new bytecode is necessary to be able to build ranges
|
||
based on other calculations, whose outcome is not known at compile
|
||
time.
|
||
|
||
The code introduces two new functions to listobject.c, which are
|
||
currently hovering between private functions and full-fledged API
|
||
calls.
|
||
|
||
PyObject * PyList_FromRange(long start, long end, long step)
|
||
builds a list from start, end and step, returning NULL if an error
|
||
occurs.
|
||
|
||
long PyList_GetLenOfRange(long start, long end, long step) is a
|
||
helper function to determine the length of a range. It was
|
||
previously a static function in bltinmodule.c, but is now
|
||
necessary in both listobject.c and bltinmodule.c (for xrange). It
|
||
is made non-static solely to avoid code duplication.
|
||
|
||
|
||
Open issues
|
||
|
||
One possible solution to the discrepancy of requiring the `end'
|
||
argument in range literals is to allow the range syntax to create
|
||
a `generator', rather than a list, such as the `xrange' builtin
|
||
function does. However, a generator would not be a list, and it
|
||
would be impossible, for instance, to assign to items in the
|
||
generator, or append to it.
|
||
|
||
The range syntax could conceivably be extended to include tuples,
|
||
immutable lists, which could then be safely implemented as
|
||
generators. Especially for large number arrays, this may be a
|
||
desirable solution: generators require very little in the way of
|
||
storage and initialization, and there is only a small performance
|
||
impact in calculating and creating the appropriate number on
|
||
request. (TBD: is there any at all ? Cursory testing suggests
|
||
equal performance even in the case of ranges of length 1.)
|
||
|
||
However, even if idea was adopted, would it be wise to `special
|
||
case' the second argument, making it optional in one instance of
|
||
the syntax, and non-optional in other cases ?
|
||
|
||
|
||
Should it be possible to mix range syntax with normal list
|
||
literals, creating a single list, like so:
|
||
|
||
>>> [5, 6, 1:6, 7, 9]
|
||
to create
|
||
[5, 6, 1, 2, 3, 4, 5, 7, 9]
|
||
|
||
|
||
How should range literals interact with another proposed new
|
||
feature, `list comprehensions', PEP-202 ? In specific, should it
|
||
be possible to create lists in list comprehensions, like so:
|
||
|
||
>>> [x:y for x in (1,2) y in (3, 4)]
|
||
|
||
Should this example return a single list with multiple ranges:
|
||
[1, 2, 1, 2, 3, 2, 2, 3]
|
||
|
||
Or a list of lists, like so:
|
||
[[1, 2], [1, 2, 3], [2], [2, 3]]
|
||
|
||
However, as the syntax and semantics of list comprehensions are
|
||
still subject of hot debate, these issues are probably best
|
||
addressed by the `list comprehensions' PEP.
|
||
|
||
|
||
Range literals accept objects other than integers: it performs
|
||
PyInt_AsLong() on the objects passed in, so as long as the objects
|
||
can be coerced into integers, they will be accepted. The
|
||
resulting list, however, is always composed of standard integers.
|
||
|
||
Should range literals create a list of the passed-in type ? It
|
||
might be desirable in the cases of other builtin types, such as
|
||
longs and strings:
|
||
|
||
>>> [ 1L : 2L<<64 : 2<<32L ]
|
||
>>> ["a":"z":"b"]
|
||
>>> ["a":"z":2]
|
||
|
||
However, this might be too much `magic' to be obvious. It might
|
||
also present problems with user-defined classes: even if the base
|
||
class can be found and a new instance created, the instance may
|
||
require additional arguments to __init__, causing the creation to
|
||
fail.
|
||
|
||
|
||
The PyList_FromRange() and PyList_GetLenOfRange() functions need
|
||
to be classified: are they part of the API, or should they be made
|
||
private functions ?
|
||
|
||
|
||
References:
|
||
|
||
[1] http://sourceforge.net/patch/?func=detailpatch&patch_id=100902&group_id=5470
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
End:
|