PEP: 3103
Title: A Switch/Case Statement
Version: $Revision$
Last-Modified: $Date$
Author: guido@python.org (Guido van Rossum)
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 25-Jun-2006
Python-Version: 3.0
Post-History: 26-Jun-2006

Rejection Notice
================

A quick poll during my keynote presentation at PyCon 2007 shows this
proposal has no popular support. I therefore reject it.

Abstract
========

Python-dev has recently seen a flurry of discussion on adding a switch
statement. In this PEP I'm trying to extract my own preferences from
the smorgasbord of proposals, discussing alternatives and explaining
my choices where I can. I'll also indicate how strongly I feel about
alternatives I discuss.

This PEP should be seen as an alternative to PEP 275. My views are
somewhat different from those of that PEP's author, but I'm grateful
for the work done in that PEP.

This PEP introduces canonical names for the many variants that have
been discussed for different aspects of the syntax and semantics, such
as "alternative 1", "school II", "option 3" and so on. Hopefully
these names will help the discussion.

Rationale
=========

A common programming idiom is to consider an expression and do
different things depending on its value. This is usually done with a
chain of if/elif tests; I'll refer to this form as the "if/elif
chain". There are two main motivations to want to introduce new
syntax for this idiom:

- It is repetitive: the variable and the test operator, usually '=='
  or 'in', are repeated in each if/elif branch.

- It is inefficient: when an expression matches the last test value
  (or no test value at all) it is compared to each of the preceding
  test values.

Both of these complaints are relatively mild; there isn't a lot of
readability or performance to be gained by writing this differently.
Yet, some kind of switch statement is found in many languages and it
is not unreasonable to expect that its addition to Python will allow
us to write up certain code more cleanly and efficiently than before.

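The two complaints can be made concrete with a small sketch. It is
only an illustration written for present-day Python 3; the function
``describe`` and the helper class ``CountingValue`` are hypothetical
names invented here, not part of any proposal.

```python
class CountingValue:
    """Wraps a value and counts how often it is compared with '=='."""

    def __init__(self, value):
        self.value = value
        self.comparisons = 0

    def __eq__(self, other):
        self.comparisons += 1
        return self.value == other


def describe(code):
    # Repetitive: 'code' and '==' appear again in every branch.
    if code == 200:
        return "ok"
    elif code == 404:
        return "not found"
    elif code == 500:
        return "server error"
    else:
        return "unknown"


# Inefficient: a value matching the last test is compared to every
# preceding test value first.
probe = CountingValue(500)
result = describe(probe)
comparisons_needed = probe.comparisons
```

With three branches the cost is small, which is why both complaints
are described as mild.
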
There are forms of dispatch that are not suitable for the proposed
switch statement; for example, when the number of cases is not
statically known, or when it is desirable to place the code for
different cases in different classes or files.


Basic Syntax
============

I'm considering several variants of the syntax first proposed in PEP
275 here. There are lots of other possibilities, but I don't see that
they add anything.

I've recently been converted to alternative 1.

I should note that all alternatives here have the "implicit break"
property: at the end of the suite for a particular case, the control
flow jumps to the end of the whole switch statement. There is no way
to pass control from one case to another. This is in contrast to C,
where an explicit 'break' statement is required to prevent falling
through to the next case.

In all alternatives, the else-suite is optional. It is more Pythonic
to use 'else' here rather than introducing a new reserved word,
'default', as in C.

Semantics are discussed in the next top-level section.

Alternative 1
-------------

This is the preferred form in PEP 275::

    switch EXPR:
        case EXPR:
            SUITE
        case EXPR:
            SUITE
        ...
        else:
            SUITE

The main downside is that the suites where all the action is are
indented two levels deep; this can be remedied by indenting the cases
"half a level" (e.g. 2 spaces if the general indentation level is 4).

Alternative 2
-------------

This is Fredrik Lundh's preferred form; it differs by not indenting
the cases::

    switch EXPR:
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

Some reasons not to choose this include expected difficulties for
auto-indenting editors, folding editors, and the like; and confused
users. There are no situations currently in Python where a line
ending in a colon is followed by an unindented line.

Alternative 3
-------------

This is the same as alternative 2 but leaves out the colon after the
switch::

    switch EXPR
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

The hope of this alternative is that it will upset the auto-indent
logic of the average Python-aware text editor less. But it looks
strange to me.

Alternative 4
-------------

This leaves out the 'case' keyword on the basis that it is redundant::

    switch EXPR:
        EXPR:
            SUITE
        EXPR:
            SUITE
        ...
        else:
            SUITE

Unfortunately now we are forced to indent the case expressions,
because otherwise (at least in the absence of an 'else' keyword) the
parser would have a hard time distinguishing between an unindented
case expression (which continues the switch statement) and an unrelated
statement that starts like an expression (such as an assignment or a
procedure call). The parser is not smart enough to backtrack once it
sees the colon. This is my least favorite alternative.


Extended Syntax
===============

There is one additional concern that needs to be addressed
syntactically. Often two or more values need to be treated the same.
In C, this is done by writing multiple case labels together without any
code between them. The "fall through" semantics then mean that these
are all handled by the same code. Since the Python switch will not
have fall-through semantics (which have yet to find a champion) we
need another solution. Here are some alternatives.

Alternative A
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions. This is interpreted so that if EXPR
is a parenthesized tuple or another expression whose value is a tuple,
the switch expression must equal that tuple, not one of its elements.
This means that we cannot use a variable to indicate multiple cases.
While this is also true in C's switch statement, it is a relatively
common occurrence in Python (see for example sre_compile.py).

Alternative B
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case in EXPR_LIST:

to match on multiple expressions. If EXPR_LIST is a single
expression, the 'in' forces its interpretation as an iterable (or
something supporting __contains__, in a minority semantics
alternative). If it is multiple expressions, each of those is
considered for a match.

Alternative C
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions (as in alternative A); and use::

    case *EXPR:

to match on the elements of an expression whose value is an iterable.
The latter two cases can be combined, so that the true syntax is more
like this::

    case [*]EXPR, [*]EXPR, ...:

The ``*`` notation is similar to the use of prefix ``*`` already in use
for variable-length parameter lists and for passing computed argument
lists, and often proposed for value-unpacking (e.g. ``a, b, *c = X`` as
an alternative to ``(a, b), c = X[:2], X[2:]``).

Alternative D
-------------

This is a mixture of alternatives B and C; the syntax is like
alternative B but instead of the 'in' keyword it uses '*'. This is
more limited, but still allows the same flexibility. It uses::

    case EXPR:

to match on a single expression and::

    case *EXPR:

to match on the elements of an iterable. If one wants to specify
multiple matches in one case, one can write this::

    case *(EXPR, EXPR, ...):

or perhaps this (although it's a bit strange because the relative
priority of '*' and ',' is different than elsewhere)::

    case * EXPR, EXPR, ...:

Discussion
----------

Alternatives B, C and D are motivated by the desire to specify
multiple cases with the same treatment using a variable representing a
set (usually a tuple) rather than spelling them out. The motivation
for this is usually that if one has several switches over the same set
of cases it's a shame to have to spell out all the alternatives each
time. An additional motivation is to be able to specify *ranges* to
be matched easily and efficiently, similar to Pascal's "1..1000:"
notation. At the same time we want to prevent the kind of mistake
that is common in exception handling (and which will be addressed in
Python 3000 by changing the syntax of the except clause): writing
"case 1, 2:" where "case (1, 2):" was meant, or vice versa.

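The difference between the two spellings can be shown with ordinary
comparisons. This is only a sketch in present-day Python 3, assuming
the alternative A reading in which "case (1, 2):" compares against the
tuple itself and "case 1, 2:" matches either element; the function
names are made up for illustration.

```python
def matches_tuple(x):
    # Reading of "case (1, 2):": x must equal the tuple itself.
    return x == (1, 2)


def matches_element(x):
    # Reading of "case 1, 2:": x must equal one of the listed values.
    return x in (1, 2)


# The two readings disagree on every input that satisfies either one.
checks = {
    "tuple against tuple": matches_tuple((1, 2)),
    "element against tuple": matches_tuple(1),
    "element against elements": matches_element(1),
    "tuple against elements": matches_element((1, 2)),
}
```

Writing one form where the other was meant therefore gives a case
that silently never matches the intended values.
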
The case could be made that the need is insufficient for the added
complexity; C doesn't have a way to express ranges either, and it's
used a lot more than Pascal these days. Also, if a dispatch method
based on dict lookup is chosen as the semantics, large ranges could be
inefficient (consider range(1, sys.maxint)).

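To see why ranges and dict lookup mix poorly, consider this sketch in
present-day Python 3. It assumes that a range case would be expanded
into one dict key per value; ``build_range_dispatch`` and the two
classifier functions are hypothetical helpers, not proposed API.

```python
def build_range_dispatch(start, stop, label):
    # One dict entry per integer in the range: memory and setup time
    # grow with the size of the range.
    return {value: label for value in range(start, stop)}


# Pascal's "1..1000:" covers 1 through 1000 inclusive.
table = build_range_dispatch(1, 1001, "small")


def classify_by_dict(x):
    return table.get(x, "other")


def classify_by_comparison(x):
    # The if/elif spelling needs two comparisons, whatever the range size.
    return "small" if 1 <= x <= 1000 else "other"


entries = len(table)
```

A thousand entries is harmless, but the same expansion for a range
spanning most of the integers would not be.
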
All in all my preferences are (from most to least favorite) B, A, D',
C, where D' is D without the third possibility.


Semantics
=========

There are several issues to review before we can choose the right
semantics.

If/Elif Chain vs. Dict-based Dispatch
-------------------------------------

There are several main schools of thought about the switch statement's
semantics:

- School I wants to define the switch statement in terms of an
  equivalent if/elif chain (possibly with some optimization thrown
  in).

- School II prefers to think of it as a dispatch on a precomputed
  dict. There are different choices for when the precomputation
  happens.

- There's also school III, which agrees with school I that the
  definition of a switch statement should be in terms of an equivalent
  if/elif chain, but concedes to the optimization camp that all
  expressions involved must be hashable.

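The two main readings can be sketched in present-day Python 3 as
follows. This is an illustration only; the command names, the handler
functions and the ``_DISPATCH`` table are invented for the example.

```python
def handle_chain(command):
    # School I reading: the tests are tried one after another, in order.
    if command == "start":
        return "starting"
    elif command == "stop":
        return "stopping"
    else:
        return "unknown command"


# School II reading: a dict is computed once ahead of time, and each
# dispatch is a single hash lookup.
_DISPATCH = {
    "start": lambda: "starting",
    "stop": lambda: "stopping",
}


def handle_dict(command):
    handler = _DISPATCH.get(command)
    if handler is None:
        return "unknown command"  # plays the role of the else-suite
    return handler()
```

For hashable values with well-behaved hashes the two functions agree;
the schools differ over what should happen in the remaining cases.
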
We need to further separate school I into school Ia and school Ib:

- School Ia has a simple position: a switch statement is translated to
  an equivalent if/elif chain, and that's that. It should not be
  linked to optimization at all. That is also my main objection
  against this school: without any hint of optimization, the switch
  statement isn't attractive enough to warrant new syntax.

- School Ib has a more complex position: it agrees with school II that
  optimization is important, and is willing to concede the compiler
  certain liberties to allow this. (For example, PEP 275 Solution 1.)
  In particular, hash() of the switch and case expressions may or may
  not be called (so it should be side-effect-free); and the case
  expressions may not be evaluated each time as expected by the
  if/elif chain behavior, so the case expressions should also be
  side-effect free. My objection to this (elaborated below) is that
  if either the hash() or the case expressions aren't
  side-effect-free, optimized and unoptimized code may behave
  differently.

School II grew out of the realization that optimization of commonly
found cases isn't so easy, and that it's better to face this head on.
This will become clear below.

The differences between school I (mostly school Ib) and school II are
fourfold:

- When optimizing using a dispatch dict, if either the switch
  expression or the case expressions are unhashable (in which case
  hash() raises an exception), school Ib requires catching the hash()
  failure and falling back to an if/elif chain. School II simply lets
  the exception happen. The problem with catching an exception in
  hash() as required by school Ib, is that this may hide a genuine
  bug. A possible way out is to only use a dispatch dict if all case
  expressions are ints, strings or other built-ins with known good
  hash behavior, and to only attempt to hash the switch expression if
  it is also one of those types. Type objects should probably also be
  supported here. This is the (only) problem that school III
  addresses.

- When optimizing using a dispatch dict, if the hash() function of any
  expression involved returns an incorrect value, under school Ib,
  optimized code will not behave the same as unoptimized code. This
  is a well-known problem with optimization-related bugs, which waste
  lots of developer time. Under school II, in this situation
  incorrect results are produced at least consistently, which should
  make debugging a bit easier. The way out proposed for the previous
  bullet would also help here.

- School Ib doesn't have a good optimization strategy if the case
  expressions are named constants. The compiler cannot know their
  values for sure, and it cannot know whether they are truly constant.
  As a way out, it has been proposed to re-evaluate the expression
  corresponding to the case once the dict has identified which case
  should be taken, to verify that the value of the expression didn't
  change. But strictly speaking, all the case expressions occurring
  before that case would also have to be checked, in order to preserve
  the true if/elif chain semantics, thereby completely killing the
  optimization. Another proposed solution is to have callbacks
  notifying the dispatch dict of changes in the value of variables or
  attributes involved in the case expressions. But this is not likely
  implementable in the general case, and would require many namespaces
  to bear the burden of supporting such callbacks, which currently
  don't exist at all.

- Finally, there's a difference of opinion regarding the treatment of
  duplicate cases (i.e. two or more cases with match expressions that
  evaluate to the same value). School I wants to treat this the same
  way an if/elif chain would treat it (i.e. the first match wins and
  the code for the second match is silently unreachable); school II
  wants this to be an error at the time the dispatch dict is frozen
  (so dead code doesn't go undiagnosed).

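The first difference, the handling of hash() failures, can be sketched
in present-day Python 3. Everything here is hypothetical scaffolding:
``CASES``, ``TABLE``, the two dispatch functions and the deliberately
buggy ``BrokenHash`` class exist only to illustrate the argument.

```python
CASES = [(1, "one"), ("two", "two")]
TABLE = dict(CASES)


def dispatch_school_ii(value):
    # School II: a failing hash() simply propagates as an exception.
    return TABLE.get(value, "else")


def dispatch_school_ib(value):
    # School Ib: catch the hash() failure and fall back to the chain.
    try:
        return TABLE.get(value, "else")
    except TypeError:
        for candidate, result in CASES:
            if value == candidate:
                return result
        return "else"


class BrokenHash:
    """Compares equal to 1, but its __hash__ contains a bug."""

    def __eq__(self, other):
        return other == 1

    def __hash__(self):
        raise TypeError("bug in __hash__")


# The fallback treats an unhashable list as intended, but it also
# swallows the genuine bug in BrokenHash.__hash__ without a trace.
list_result = dispatch_school_ib([1])
hidden_bug_result = dispatch_school_ib(BrokenHash())
```

Under the school II function the same ``BrokenHash()`` argument raises
TypeError, so the bug is noticed at the point of dispatch.
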
School I sees trouble in school II's approach of pre-freezing a
dispatch dict because it places a new and unusual burden on
programmers to understand exactly what kinds of case values are
allowed to be frozen and when the case values will be frozen, or they
might be surprised by the switch statement's behavior.

School II doesn't believe that school Ia's unoptimized switch is worth
the effort, and it sees trouble in school Ib's proposal for
optimization, which can cause optimized and unoptimized code to behave
differently.

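A sketch of how optimized and unoptimized code can diverge, written in
present-day Python 3. It assumes that "incorrect" hashing means a
__hash__ that disagrees with __eq__; the ``Sloppy`` class and both
functions are hypothetical.

```python
class Sloppy:
    """Equal to 1 according to __eq__, but hashes differently from 1."""

    def __eq__(self, other):
        return other == 1

    def __hash__(self):
        return 2  # inconsistent with hash(1), which is 1


TABLE = {1: "one"}


def via_chain(value):
    # Unoptimized form: only __eq__ is consulted.
    if value == 1:
        return "one"
    else:
        return "else"


def via_dict(value):
    # Optimized form: the hash is consulted first, so the entry for 1
    # is never found for a Sloppy instance.
    return TABLE.get(value, "else")


chain_result = via_chain(Sloppy())
dict_result = via_dict(Sloppy())
```

The same object takes different branches depending on which form
the compiler happened to choose.
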
In addition, school II sees little value in allowing cases involving
unhashable values; after all if the user expects such values, they can
just as easily write an if/elif chain. School II also doesn't believe
that it's right to allow dead code due to overlapping cases to occur
unflagged, when the dict-based dispatch implementation makes it so
easy to trap this.

However, there are some use cases for overlapping/duplicate cases.
Suppose you're switching on some OS-specific constants (e.g. exported
by the os module or some module like that). You have a case for each.
But on some OS, two different constants have the same value (since on
that OS they are implemented the same way -- like O_TEXT and O_BINARY
on Unix). If duplicate cases are flagged as errors, your switch
wouldn't work at all on that OS. It would be much better if you could
arrange the cases so that one case has preference over another.

There's also the (more likely) use case where you have a set of cases
to be treated the same, but one member of the set must be treated
differently. It would be convenient to put the exception in an
earlier case and be done with it.

(Yes, it seems a shame not to be able to diagnose dead code due to
accidental case duplication. Maybe that's less important, and
pychecker can deal with it? After all we don't diagnose duplicate
method definitions either.)

This suggests school IIb: like school II but redundant cases must be
resolved by choosing the first match. This is trivial to implement
when building the dispatch dict (skip keys already present).

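A minimal sketch of the "skip keys already present" rule in present-day
Python 3. The function ``freeze_dispatch`` is a hypothetical helper,
and the values given to ``O_TEXT`` and ``O_BINARY`` are made up to
mimic a platform where the two constants coincide.

```python
def freeze_dispatch(cases):
    """Build a dispatch dict in which the first matching case wins."""
    table = {}
    for key, suite in cases:
        # setdefault() leaves an existing entry alone, so a later
        # duplicate key is skipped rather than overwriting the first.
        table.setdefault(key, suite)
    return table


# Hypothetical values: both constants are the same on this platform.
O_TEXT = 0
O_BINARY = 0
O_OTHER = 1

table = freeze_dispatch([
    (O_TEXT, "text mode"),
    (O_BINARY, "binary mode"),  # duplicate key: silently dead code
    (O_OTHER, "other mode"),
])
```

The result matches what the equivalent if/elif chain would do, while
still needing only one lookup per dispatch.
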
(An alternative would be to introduce new syntax to indicate "okay to
have overlapping cases" or "ok if this case is dead code" but I find
that overkill.)

Personally, I'm in school II: I believe that the dict-based dispatch
is the one true implementation for switch statements and that we
should face the limitations up front, so that we can reap maximal
benefits. I'm leaning towards school IIb -- duplicate cases should be
resolved by the ordering of the cases instead of flagged as errors.

When to Freeze the Dispatch Dict
--------------------------------

For the supporters of school II (dict-based dispatch), the next big
dividing issue is when to create the dict used for switching. I call
this "freezing the dict".

The main problem that makes this interesting is the observation that
Python doesn't have named compile-time constants. What is
conceptually a constant, such as re.IGNORECASE, is a variable to the
compiler, and there's nothing to stop crooked code from modifying its
value.

Option 1
''''''''

The most limiting option is to freeze the dict in the compiler. This
would require that the case expressions are all literals or
compile-time expressions involving only literals and operators whose
semantics are known to the compiler, since with the current state of
Python's dynamic semantics and single-module compilation, there is no
hope for the compiler to know with sufficient certainty the values of
any variables occurring in such expressions. This is widely though
not universally considered too restrictive.

Raymond Hettinger is the main advocate of this approach. He proposes
a syntax where only a single literal of certain types is allowed as
the case expression. It has the advantage of being unambiguous and
easy to implement.

My main complaint about this is that by disallowing "named constants"
we force programmers to give up good habits. Named constants are
introduced in most languages to solve the problem of "magic numbers"
occurring in the source code. For example, sys.maxint is a lot more
readable than 2147483647. Raymond proposes to use string literals
instead of named "enums", observing that the string literal's content
can be the name that the constant would otherwise have. Thus, we
could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:".
However, if there is a spelling error in the string literal, the case
will silently be ignored, and who knows when the bug is detected. If
there is a spelling error in a NAME, however, the error will be caught
as soon as it is evaluated. Also, sometimes the constants are
externally defined (e.g. when parsing a file format like JPEG) and we
can't easily choose appropriate string values. Using an explicit
mapping dict sounds like a poor hack.

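The spelling-error argument can be demonstrated in present-day
Python 3. This is only a sketch: the dict stands in for a switch whose
cases are string literals, and the misspelling ``IGNORCASE`` is
deliberate.

```python
import re

# Stand-in for a switch with string-literal cases; the first key is
# misspelled on purpose.
STRING_CASES = {
    "IGNORCASE": "case-insensitive",
    "MULTILINE": "multi-line",
}


def describe_flag(name):
    # The misspelled case can never match, and nothing complains.
    return STRING_CASES.get(name, "else-suite")


silent_result = describe_flag("IGNORECASE")

# The same slip in a NAME is reported as soon as it is evaluated.
try:
    re.IGNORCASE
    name_error_message = None
except AttributeError as exc:
    name_error_message = str(exc)
```

The string version quietly takes the else-suite; the named version
fails loudly at the point where the case expression is evaluated.
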
Option 2
''''''''

The oldest proposal to deal with this is to freeze the dispatch dict
the first time the switch is executed. At this point we can assume
that all the named "constants" (constant in the programmer's mind,
though not to the compiler) used as case expressions are defined --
otherwise an if/elif chain would have little chance of success either.
Assuming the switch will be executed many times, doing some extra work
the first time pays back quickly by very quick dispatch times later.

An objection to this option is that there is no obvious object where
the dispatch dict can be stored. It can't be stored on the code
object, which is supposed to be immutable; it can't be stored on the
function object, since many function objects may be created for the
same function (e.g. for nested functions). In practice, I'm sure that
something can be found; it could be stored in a section of the code
object that's not considered when comparing two code objects or when
pickling or marshalling a code object; or all switches could be stored
in a dict indexed by weak references to code objects. The solution
should also be careful not to leak switch dicts between multiple
interpreters.

Another objection is that the first-use rule allows obfuscated code
like this::

    def foo(x, y):
        switch x:
            case y:
                print 42

To the untrained eye (not familiar with Python) this code would be
equivalent to this::

    def foo(x, y):
        if x == y:
            print 42

but that's not what it does (unless it is always called with the same
value as the second argument). This has been addressed by suggesting
that the case expressions should not be allowed to reference local
variables, but this is somewhat arbitrary.

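The surprise can be reproduced by emulating the first-use rule by hand
in present-day Python 3. This is an assumption-laden sketch: the frozen
dict is kept in a function attribute purely for convenience (which is
not where a real implementation could keep it), and the return values
replace the ``print`` in the example.

```python
def foo(x, y):
    # Emulates "switch x" with "case y" under the first-use rule: the
    # dispatch dict is built from the case expression on the first
    # call and reused, unchanged, on every later call.
    if foo.frozen is None:
        foo.frozen = {y: "matched"}
    return foo.frozen.get(x, "no match")


foo.frozen = None

results = [
    foo(1, 1),   # first call freezes {1: "matched"}
    foo(2, 2),   # x == y holds, yet 2 is not in the frozen dict
    foo(1, 99),  # x != y, yet 1 is in the frozen dict
]
```

Only the first call behaves like ``if x == y``; afterwards the value
of ``y`` is ignored entirely.
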
A final objection is that in a multi-threaded application, the
first-use rule requires intricate locking in order to guarantee the
correct semantics. (The first-use rule suggests a promise that side
effects of case expressions are incurred exactly once.) This may be
as tricky as the import lock has proved to be, since the lock has to
be held while all the case expressions are being evaluated.

Option 3
''''''''

A proposal that has been winning support (including mine) is to freeze
a switch's dict when the innermost function containing it is defined.
The switch dict is stored on the function object, just as parameter
defaults are, and in fact the case expressions are evaluated at the
same time and in the same scope as the parameter defaults (i.e. in the
scope containing the function definition).

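Since the case expressions would be evaluated like parameter defaults,
the behavior can be imitated today with an actual default. The sketch
below is present-day Python 3; the constants ``RED`` and ``GREEN``, the
function ``color_name`` and its ``_frozen`` parameter are invented for
illustration.

```python
RED = 1
GREEN = 2


def color_name(color, _frozen={RED: "red", GREEN: "green"}):
    # _frozen was built once, when the 'def' statement was executed,
    # in the scope containing the definition.
    return _frozen.get(color, "unknown")


# Rebinding the "constant" afterwards does not touch the frozen dict.
RED = 99
name_for_old_value = color_name(1)
name_for_new_value = color_name(99)
```

The named constants only need to exist by the time the function
definition is executed, not when the module is compiled.
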
This option has the advantage of avoiding many of the finesses needed
to make option 2 work: there's no need for locking, no worry about
immutable code objects or multiple interpreters. It also provides a
clear explanation for why locals can't be referenced in case
expressions.

This option works just as well for situations where one would
typically use a switch; case expressions involving imported or global
named constants work exactly the same way as in option 2, as long as
they are imported or defined before the function definition is
encountered.

A downside however is that the dispatch dict for a switch inside a
nested function must be recomputed each time the nested function is
defined. For certain "functional" styles of programming this may make
switch unattractive in nested functions. (Unless all case expressions
are compile-time constants; then the compiler is of course free to
optimize away the switch freezing code and make the dispatch table part
of the code object.)

Another downside is that under this option, there's no clear moment
when the dispatch dict is frozen for a switch that doesn't occur
inside a function. There are a few pragmatic choices for how to treat
a switch outside a function:

(a) Disallow it.
(b) Translate it into an if/elif chain.
(c) Allow only compile-time constant expressions.
(d) Compute the dispatch dict each time the switch is reached.
(e) Like (b) but tests that all expressions evaluated are hashable.

Of these, (a) seems too restrictive: it's uniformly worse than (c);
and (d) has poor performance for little or no benefits compared to
(b). It doesn't make sense to have a performance-critical inner loop
at the module level, as all local variable references are slow there;
hence (b) is my (weak) favorite. Perhaps I should favor (e), which
attempts to prevent atypical use of a switch; examples that work
interactively but not in a function are annoying. In the end I don't
think this issue is all that important (except it must be resolved
somehow) and am willing to leave it up to whoever ends up implementing
it.

When a switch occurs in a class but not in a function, we can freeze
the dispatch dict at the same time the temporary function object
representing the class body is created. This means the case
expressions can reference module globals but not class variables.
Alternatively, if we choose (b) above, we could choose this
implementation inside a class definition as well.

Option 4
''''''''

There are a number of proposals to add a construct to the language
that makes the concept of a value pre-computed at function definition
time generally available, without tying it either to parameter default
values or case expressions. Some keywords proposed include 'const',
'static', 'only' or 'cached'. The associated syntax and semantics
vary.

These proposals are out of scope for this PEP, except to suggest that
*if* such a proposal is accepted, there are two ways for the switch to
benefit: we could require case expressions to be either compile-time
constants or pre-computed values; or we could make pre-computed values
the default (and only) evaluation mode for case expressions. The
latter would be my preference, since I don't see a use for more
dynamic case expressions that isn't addressed adequately by writing an
explicit if/elif chain.


Conclusion
==========

It is too early to decide. I'd like to see at least one completed
proposal for pre-computed values before deciding. In the meantime,
Python is fine without a switch statement, and perhaps those who claim
it would be a mistake to add one are right.


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: