First draft of the range-literals PEP. Not sprinkeled with explicit 'TBD's,
and the one TBD in there should be an 'XXX' in my opinion ;) but with a lot of issues under 'Open issues'. Please, comment freely, on anything from specific arguments and spelling errors to style issues and argument-flow. I wrote all this on the train to and from work, listening to UB40 on the way to work, and to my girlfriend complaining about her work and her colleagues on the way home ;) So it might be a bit incoherent in places. One TBD that's going to come up a lot in PEPs, I think: how to wrap long URLs ? I think whoever invents a good way to do that can become a millionaire. Can we define/steal a good, standard way to wrap URLs, please ?
This commit is contained in:
parent
be3c33389e
commit
6f232833de
264
pep-0204.txt
264
pep-0204.txt
|
@ -1,11 +1,271 @@
|
|||
PEP: 203
|
||||
PEP: 204
|
||||
Title: Range Literals
|
||||
Version: $Revision$
|
||||
Owner: thomas@xs4all.net (Thomas Wouters)
|
||||
Python-Version: 2.0
|
||||
Status: Incomplete
|
||||
Status: Draft
|
||||
|
||||
|
||||
|
||||
Introduction
|
||||
|
||||
This PEP describes the `range literal' proposal for Python 2.0.
|
||||
This PEP tracks the status and ownership of this feature, slated
|
||||
for introduction in Python 2.0. It contains a description of the
|
||||
feature and outlines changes necessary to support the feature.
|
||||
This PEP summarizes discussions held in mailing list forums, and
|
||||
provides URLs for further information, where appropriate. The CVS
|
||||
revision history of this file contains the definitive historical
|
||||
record.
|
||||
|
||||
|
||||
|
||||
List ranges
|
||||
|
||||
Ranges are sequences of numbers of a fixed stepping, often used in
|
||||
for-loops. The Python for-loop is designed to iterate over a
|
||||
sequence directly:
|
||||
|
||||
>>> l = ['a', 'b', 'c', 'd']
|
||||
>>> for item in l:
|
||||
... print item
|
||||
a
|
||||
b
|
||||
c
|
||||
d
|
||||
|
||||
However, this solution is not always prudent. Firstly, problems
|
||||
arise when altering the sequence in the body of the for-loop,
|
||||
resulting in the for-loop skipping items. Secondly, it is not
|
||||
possible to iterate over, say, every second element of the
|
||||
sequence. And thirdly, it is sometimes necessary to process an
|
||||
element based on its index, which is not readily available in the
|
||||
above construct.
|
||||
|
||||
For these instances, and others where a range of numbers is
|
||||
desired, Python provides the `range' builtin function, which
|
||||
creates a list of numbers. The `range' function takes three
|
||||
arguments, `start', `end' and `step'. `start' and `step' are
|
||||
optional, and default to 0 and 1, respectively.
|
||||
|
||||
The `range' function creates a list of numbers, starting at
|
||||
`start', with a step of `step', up to, but not including `end', so
|
||||
that `range(10)' produces a list that has exactly 10 items, the
|
||||
numbers 0 through 9.
|
||||
|
||||
Using the `range' function, the above example would look like
|
||||
this:
|
||||
|
||||
>>> for i in range(len(l)):
|
||||
... print l[i]
|
||||
a
|
||||
b
|
||||
c
|
||||
d
|
||||
|
||||
Or, to start at the second element of `l' and processing only
|
||||
every second element from then on:
|
||||
|
||||
>>> for i in range(1, len(l), 2):
|
||||
... print l[i]
|
||||
b
|
||||
d
|
||||
|
||||
There are several disadvantages with this approach:
|
||||
|
||||
- Clarity of purpose: Adding another functioncall, possibly with
|
||||
extra arithmatic to determine the desired length and step of the
|
||||
list, does not improve readability of the code. Also, it is
|
||||
possible to `shadow' the builtin `range' function by supplying a
|
||||
local or global variable with the same name, effectively
|
||||
replacing it. This may or may not be a desired effect.
|
||||
|
||||
- Efficiency: because the `range' function can be overridden, the
|
||||
Python compiler cannot make assumptions about the for-loop, and
|
||||
has to maintain a seperate loop counter.
|
||||
|
||||
- Consistency: There already is a syntax that is used to denote
|
||||
ranges, as shown below. This syntax uses the exact same
|
||||
arguments, though all optional, in the exact same way. It seems
|
||||
logical to extend this syntax to ranges, to form `range
|
||||
literals'.
|
||||
|
||||
|
||||
|
||||
Slice Indices
|
||||
|
||||
In Python, a sequence can be indexed in one of two ways:
|
||||
retrieving a single item, or retrieving a range of items.
|
||||
Retrieving a range of items results in a new object of the same
|
||||
type as the original sequence, containing zero or more items from
|
||||
the original sequence. This is done using a `range notation':
|
||||
|
||||
>>> l[2:4]
|
||||
['c', 'd']
|
||||
|
||||
This range notation consists of zero, one or two indices seperated
|
||||
by a colon. The first index is the `start' index, the second the
|
||||
`end'. When either is left out, they default to respectively the
|
||||
start and the end of the sequence.
|
||||
|
||||
There is also an extended range notation, which incorporates
|
||||
`step' as well. Though this notation is not currently supported
|
||||
by most builtin types, if it were, it would work as follows:
|
||||
|
||||
>>> l[1:4:2]
|
||||
['b', 'd']
|
||||
|
||||
The third `argument' to the slice syntax is exactly the same as
|
||||
the `step' argument to range(). The underlying mechanisms of
|
||||
standard and these extended slices are sufficiently different and
|
||||
inconsistent that many classes and extensions outside of
|
||||
mathematical packages do not implement support for the extended
|
||||
variant, and this should definately be resolved, but this is
|
||||
beyond the scope of this PEP.
|
||||
|
||||
Extended slices do show, however, that there is already a
|
||||
perfectly valid and applicable syntax to denote ranges in a way
|
||||
that solve all of the earlier stated disadvantages of the use of
|
||||
the range() function:
|
||||
|
||||
- It is clearer, more concise syntax, which has already proven to
|
||||
be both intuitive and easy to learn.
|
||||
|
||||
- It is consistent with the other use of ranges in Python
|
||||
(slices.)
|
||||
|
||||
- Because it is built-in syntax, instead of a builtin function, it
|
||||
cannot be overridden. This means both that a viewer can be
|
||||
certain about what the code does, and that an optimizer will not
|
||||
have to worry about range() being `shadowed'.
|
||||
|
||||
|
||||
|
||||
The Proposed Solution
|
||||
|
||||
The proposed implementation of range-literals combines the syntax
|
||||
for list literals with the syntax for (extended) slices, to form
|
||||
range literals:
|
||||
|
||||
>>> [1:10]
|
||||
[1, 2, 3, 4, 5, 6, 7, 8, 9]
|
||||
>>> [:5]
|
||||
[0, 1, 2, 3, 4]
|
||||
>>> [5:1:-1]
|
||||
[5, 4, 3, 2]
|
||||
|
||||
There is one minor difference between range literals and the slice
|
||||
syntax: though it is possible to omit all of `start', `end' and
|
||||
`step' in slices, it does not make sense to omit `end' in range
|
||||
literals. In slices, `end' would default to the end of the list,
|
||||
but this has no meaning in range literals.
|
||||
|
||||
|
||||
|
||||
Reference Implementation
|
||||
|
||||
The proposed implementation can be found on SourceForge[1]. It
|
||||
adds a new bytecode, BUILD_RANGE, that takes three arguments from
|
||||
the stack and builds a list on the bases of those. The list is
|
||||
pushed back on the stack.
|
||||
|
||||
The use of a new bytecode is necessary to be able to build ranges
|
||||
based on other calculations, whose outcome is not known at compile
|
||||
time.
|
||||
|
||||
The code introduces two new functions to listobject.c, which are
|
||||
currently hovering between private functions and full-fledged API
|
||||
calls.
|
||||
|
||||
PyObject * PyList_FromRange(long start, long end, long step)
|
||||
builds a list from start, end and step, returning NULL if an error
|
||||
occurs.
|
||||
|
||||
long PyList_GetLenOfRange(long start, long end, long step) is a
|
||||
helper function to determine the length of a range. It was
|
||||
previously a static function in bltinmodule.c, but is now
|
||||
necessary in both listobject.c and bltinmodule.c (for xrange). It
|
||||
is made non-static solely to avoid code duplication.
|
||||
|
||||
|
||||
Open issues
|
||||
|
||||
One possible solution to the discrepancy of requiring the `end'
|
||||
argument in range literals is to allow the range syntax to create
|
||||
a `generator', rather than a list, such as the `xrange' builtin
|
||||
function does. However, a generator would not be a list, and it
|
||||
would be impossible, for instance, to assign to items in the
|
||||
generator, or append to it.
|
||||
|
||||
The range syntax could conceivably be extended to include tuples,
|
||||
immutable lists, which could then be safely implemented as
|
||||
generators. Especially for large number arrays, this may be a
|
||||
desirable solution: generators require very little in the way of
|
||||
storage and initialization, and there is only a small performance
|
||||
impact in calculating and creating the appropriate number on
|
||||
request. (TBD: is there any at all ? Cursory testing suggests
|
||||
equal performance even in the case of ranges of length 1.)
|
||||
|
||||
However, even if idea was adopted, would it be wise to `special
|
||||
case' the second argument, making it optional in one instance of
|
||||
the syntax, and non-optional in other cases ?
|
||||
|
||||
|
||||
Should it be possible to mix range syntax with normal list
|
||||
literals, creating a single list, like so:
|
||||
|
||||
>>> [5, 6, 1:6, 7, 9]
|
||||
to create
|
||||
[5, 6, 1, 2, 3, 4, 5, 7, 9]
|
||||
|
||||
|
||||
How should range literals interact with another proposed new
|
||||
feature, `list comprehensions', PEP-202 ? In specific, should it
|
||||
be possible to create lists in list comprehensions, like so:
|
||||
|
||||
>>> [x:y for x in (1,2) y in (3, 4)]
|
||||
|
||||
Should this example return a single list with multiple ranges:
|
||||
[1, 2, 1, 2, 3, 2, 2, 3]
|
||||
|
||||
Or a list of lists, like so:
|
||||
[[1, 2], [1, 2, 3], [2], [2, 3]]
|
||||
|
||||
However, as the syntax and semantics of list comprehensions are
|
||||
still subject of hot debate, these issues are probably best
|
||||
addressed by the `list comprehensions' PEP.
|
||||
|
||||
|
||||
Range literals accept objects other than integers: it performs
|
||||
PyInt_AsLong() on the objects passed in, so as long as the objects
|
||||
can be coerced into integers, they will be accepted. The
|
||||
resulting list, however, is always composed of standard integers.
|
||||
|
||||
Should range literals create a list of the passed-in type ? It
|
||||
might be desirable in the cases of other builtin types, such as
|
||||
longs and strings:
|
||||
|
||||
>>> [ 1L : 2L<<64 : 2<<32L ]
|
||||
>>> ["a":"z":"b"]
|
||||
>>> ["a":"z":2]
|
||||
|
||||
However, this might be too much `magic' to be obvious. It might
|
||||
also present problems with user-defined classes: even if the base
|
||||
class can be found and a new instance created, the instance may
|
||||
require additional arguments to __init__, causing the creation to
|
||||
fail.
|
||||
|
||||
|
||||
The PyList_FromRange() and PyList_GetLenOfRange() functions need
|
||||
to be classified: are they part of the API, or should they be made
|
||||
private functions ?
|
||||
|
||||
|
||||
References:
|
||||
|
||||
[1]
|
||||
http://sourceforge.net/patch/?func=detailpatch&patch_id=100902&group_id=5470
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
|
|
Loading…
Reference in New Issue