PEP: 3103
Title: A Switch/Case Statement
Version: $Revision$
Last-Modified: $Date$
Author: guido@python.org (Guido van Rossum)
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 25-Jun-2006
Python-Version: 3.0
Post-History: 26-Jun-2006


Rejection Notice
================

A quick poll during my keynote presentation at PyCon 2007 shows this
proposal has no popular support. I therefore reject it.

Abstract
========

Python-dev has recently seen a flurry of discussion on adding a switch
statement. In this PEP I'm trying to extract my own preferences from
the smorgasbord of proposals, discussing alternatives and explaining
my choices where I can. I'll also indicate how strongly I feel about
alternatives I discuss.

This PEP should be seen as an alternative to PEP 275. My views are
somewhat different from those of that PEP's author, but I'm grateful
for the work done in that PEP.

This PEP introduces canonical names for the many variants that have
been discussed for different aspects of the syntax and semantics, such
as "alternative 1", "school II", "option 3" and so on. Hopefully
these names will help the discussion.

Rationale
=========

A common programming idiom is to consider an expression and do
different things depending on its value. This is usually done with a
chain of if/elif tests; I'll refer to this form as the "if/elif
chain". There are two main motivations to want to introduce new
syntax for this idiom:

- It is repetitive: the variable and the test operator, usually '=='
  or 'in', are repeated in each if/elif branch.
- It is inefficient: when an expression matches the last test value
  (or no test value at all) it is compared to each of the preceding
  test values.
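
Both complaints can be seen in a small sketch in today's Python. The function names, the ``Probe`` counting class and the status codes are hypothetical and serve only as illustration. The dict lookup is the usual hand-written substitute for a switch, not the semantics this PEP settles on:

.. code-block:: python

    class Probe(int):
        """An int that counts how often it is compared with ==."""
        count = 0

        def __eq__(self, other):
            Probe.count += 1
            return int(self) == other

        __hash__ = int.__hash__   # keep it usable as a dict key


    def classify_chain(code):
        # The subject `code` and the operator `==` repeat in every branch.
        if code == 200:
            return "ok"
        elif code == 301:
            return "moved"
        elif code == 404:
            return "not found"
        elif code == 500:
            return "server error"
        else:
            return "unknown"


    # Hand-written dispatch dict: one hash lookup whichever case matches.
    _STATUS = {200: "ok", 301: "moved", 404: "not found", 500: "server error"}


    def classify_dict(code):
        return _STATUS.get(code, "unknown")


    Probe.count = 0
    classify_chain(Probe(500))   # matches the last test value
    # Probe.count is now 4: one == test per branch.
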

Both of these complaints are relatively mild; there isn't a lot of
readability or performance to be gained by writing this differently.
Yet, some kind of switch statement is found in many languages and it
is not unreasonable to expect that its addition to Python will allow
us to write certain code more cleanly and efficiently than before.

There are forms of dispatch that are not suitable for the proposed
switch statement; for example, when the number of cases is not
statically known, or when it is desirable to place the code for
different cases in different classes or files.

Basic Syntax
============

I'm considering several variants of the syntax first proposed in PEP
275 here. There are lots of other possibilities, but I don't see that
they add anything.

I've recently been converted to alternative 1.

I should note that all alternatives here have the "implicit break"
property: at the end of the suite for a particular case, the control
flow jumps to the end of the whole switch statement. There is no way
to pass control from one case to another. This is in contrast to C,
where an explicit 'break' statement is required to prevent falling
through to the next case.

In all alternatives, the else-suite is optional. It is more Pythonic
to use 'else' here rather than introducing a new reserved word,
'default', as in C.

Semantics are discussed in the next top-level section.

Alternative 1
-------------

This is the preferred form in PEP 275::

    switch EXPR:
        case EXPR:
            SUITE
        case EXPR:
            SUITE
        ...
        else:
            SUITE

The main downside is that the suites, where all the action is, are
indented two levels deep; this can be remedied by indenting the cases
"half a level" (e.g. 2 spaces if the general indentation level is 4).

Alternative 2
-------------

This is Fredrik Lundh's preferred form; it differs by not indenting
the cases::

    switch EXPR:
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ...
    else:
        SUITE

Some reasons not to choose this include expected difficulties for
auto-indenting editors, folding editors, and the like; and confused
users. There are no situations currently in Python where a line
ending in a colon is followed by an unindented line.

Alternative 3
-------------

This is the same as alternative 2 but leaves out the colon after the
switch::

    switch EXPR
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ...
    else:
        SUITE

The hope of this alternative is that it will upset the auto-indent
logic of the average Python-aware text editor less. But it looks
strange to me.

Alternative 4
-------------

This leaves out the 'case' keyword on the basis that it is redundant::

    switch EXPR:
        EXPR:
            SUITE
        EXPR:
            SUITE
        ...
        else:
            SUITE

Unfortunately now we are forced to indent the case expressions,
because otherwise (at least in the absence of an 'else' keyword) the
parser would have a hard time distinguishing between an unindented
case expression (which continues the switch statement) and an unrelated
statement that starts like an expression (such as an assignment or a
procedure call). The parser is not smart enough to backtrack once it
sees the colon. This is my least favorite alternative.

Extended Syntax
===============

There is one additional concern that needs to be addressed
syntactically. Often two or more values need to be treated the same.
In C, this is done by writing multiple case labels together without any
code between them. The "fall through" semantics then mean that these
are all handled by the same code. Since the Python switch will not
have fall-through semantics (which have yet to find a champion) we
need another solution. Here are some alternatives.

Alternative A
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions. This is interpreted so that if EXPR
is a parenthesized tuple or another expression whose value is a tuple,
the switch expression must equal that tuple, not one of its elements.
This means that we cannot use a variable to indicate multiple cases.
While this is also true in C's switch statement, it is a relatively
common occurrence in Python (see for example sre_compile.py).
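
The following is a rough model of alternative A in today's Python. It uses a hypothetical helper ``matches_a`` that compares the subject with each listed case value by ``==``, and it shows why a variable holding a tuple cannot stand for several cases. The opcode names are also hypothetical:

.. code-block:: python

    def matches_a(subject, *case_values):
        """Model of `case EXPR, EXPR, ...:` under alternative A.

        Each case value is compared with ==, so a tuple-valued case value
        matches only a subject equal to the whole tuple.
        """
        return any(subject == value for value in case_values)


    LITERAL_OPS = ("LITERAL", "NOT_LITERAL")   # hypothetical opcode names

    # case "LITERAL", "NOT_LITERAL":  matches either string
    print(matches_a("LITERAL", "LITERAL", "NOT_LITERAL"))       # True

    # case LITERAL_OPS:  compares with the tuple itself, not its elements
    print(matches_a("LITERAL", LITERAL_OPS))                    # False
    print(matches_a(("LITERAL", "NOT_LITERAL"), LITERAL_OPS))   # True
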

Alternative B
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case in EXPR_LIST:

to match on multiple expressions. If EXPR_LIST is a single
expression, the 'in' forces its interpretation as an iterable (or
something supporting ``__contains__``, in a minority semantics
alternative). If it is multiple expressions, each of those is
considered for a match.
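
Alternative B's ``case in EXPR_LIST:`` lines up with Python's ``in`` operator. That operator uses ``__contains__`` when a type defines it and otherwise iterates. The sketch below writes out the equivalent if/elif chain, with named tuples of cases that several switches could share. The names ``LITERAL_OPS``, ``BRANCH_OPS`` and ``describe`` are hypothetical:

.. code-block:: python

    LITERAL_OPS = ("LITERAL", "NOT_LITERAL")   # shared, named set of cases
    BRANCH_OPS = ("BRANCH", "IN")


    def describe(op):
        # case in LITERAL_OPS:        membership in one named tuple
        if op in LITERAL_OPS:
            return "literal"
        # case in "ANY", "ANY_ALL":   several expressions, any may match
        elif op in ("ANY", "ANY_ALL"):
            return "wildcard"
        # case in BRANCH_OPS:
        elif op in BRANCH_OPS:
            return "branch"
        else:
            return "other"
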

Alternative C
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions (as in alternative A); and use::

    case *EXPR:

to match on the elements of an expression whose value is an iterable.
The latter two cases can be combined, so that the true syntax is more
like this::

    case [*]EXPR, [*]EXPR, ...:

The ``*`` notation is similar to the use of prefix ``*`` already in use for
variable-length parameter lists and for passing computed argument
lists, and often proposed for value-unpacking (e.g. ``a, b, *c = X`` as
an alternative to ``(a, b), c = X[:2], X[2:]``).
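
Python's call syntax already performs the flattening that alternative C borrows. In this sketch the helper ``case_values`` is hypothetical: it just returns the values a ``case [*]EXPR, [*]EXPR, ...:`` label would stand for. The last lines check the value-unpacking form mentioned above, which Python 3 later adopted (PEP 3132):

.. code-block:: python

    def case_values(*values):
        """Return the values a `case [*]EXPR, ...:` label would match."""
        return values


    SMALL_PRIMES = (2, 3, 5)

    # case 1, *SMALL_PRIMES:   the starred expression contributes its elements
    assert case_values(1, *SMALL_PRIMES) == (1, 2, 3, 5)

    # case 1, SMALL_PRIMES:    without the star the tuple is one case value
    assert case_values(1, SMALL_PRIMES) == (1, (2, 3, 5))

    # Value unpacking: `a, b, *c = X` versus `(a, b), c = X[:2], X[2:]`.
    X = [1, 2, 3, 4]
    a, b, *c = X
    (a2, b2), c2 = X[:2], X[2:]
    assert (a, b, c) == (a2, b2, c2) == (1, 2, [3, 4])
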

Alternative D
-------------

This is a mixture of alternatives B and C; the syntax is like
alternative B but instead of the 'in' keyword it uses '*'. This is
more limited, but still allows the same flexibility. It uses::

    case EXPR:

to match on a single expression and::

    case *EXPR:

to match on the elements of an iterable. If one wants to specify
multiple matches in one case, one can write this::

    case *(EXPR, EXPR, ...):

or perhaps this (although it's a bit strange because the relative
priority of '*' and ',' is different from elsewhere)::

    case * EXPR, EXPR, ...:

Discussion
----------

Alternatives B, C and D are motivated by the desire to specify
multiple cases with the same treatment using a variable representing a
set (usually a tuple) rather than spelling them out. The motivation
for this is usually that if one has several switches over the same set
of cases it's a shame to have to spell out all the alternatives each
time. An additional motivation is to be able to specify *ranges* to
be matched easily and efficiently, similar to Pascal's "1..1000:"
notation. At the same time we want to prevent the kind of mistake
that is common in exception handling (and which will be addressed in
Python 3000 by changing the syntax of the except clause): writing
``case 1, 2:`` where ``case (1, 2):`` was meant, or vice versa.

The case could be made that the need is insufficient for the added
complexity; C doesn't have a way to express ranges either, and it's
used a lot more than Pascal these days. Also, if a dispatch method
based on dict lookup is chosen as the semantics, large ranges would be
impractical: a case such as range(1, sys.maxint) would have to be
expanded into an enormous number of dict entries.
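
The following quick illustration shows the range problem under dict-based dispatch. ``sys.maxint`` no longer exists in Python 3, so ``sys.maxsize`` is used below as the closest stand-in. Expanding a Pascal-style range into dispatch-dict keys costs one entry per value. In CPython, a ``range`` object instead answers integer membership arithmetically, without iterating:

.. code-block:: python

    import sys

    # "1..1000:" expanded into dispatch-dict keys: one entry per value.
    table = dict.fromkeys(range(1, 1001), "small")
    assert len(table) == 1000 and table[500] == "small"

    # Expanding range(1, sys.maxsize) the same way is out of the question,
    # but a membership test on the range object itself is cheap.
    huge = range(1, sys.maxsize)
    assert (sys.maxsize - 1) in huge
    assert 0 not in huge
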

All in all my preferences are (from most to least favorite) B, A, D',
C, where D' is D without the third possibility.

Semantics
=========

There are several issues to review before we can choose the right
semantics.

If/Elif Chain vs. Dict-based Dispatch
-------------------------------------

There are several main schools of thought about the switch statement's
semantics:

- School I wants to define the switch statement in terms of an
  equivalent if/elif chain (possibly with some optimization thrown
  in).

- School II prefers to think of it as a dispatch on a precomputed
  dict. There are different choices for when the precomputation
  happens.

- There's also school III, which agrees with school I that the
  definition of a switch statement should be in terms of an equivalent
  if/elif chain, but concedes to the optimization camp that all
  expressions involved must be hashable.

We need to further separate school I into school Ia and school Ib:

- School Ia has a simple position: a switch statement is translated to
  an equivalent if/elif chain, and that's that. It should not be
  linked to optimization at all. That is also my main objection
  against this school: without any hint of optimization, the switch
  statement isn't attractive enough to warrant new syntax.

- School Ib has a more complex position: it agrees with school II that
  optimization is important, and is willing to concede the compiler
  certain liberties to allow this. (For example, PEP 275 Solution 1.)
  In particular, hash() of the switch and case expressions may or may
  not be called (so it should be side-effect-free); and the case
  expressions may not be evaluated each time as expected by the
  if/elif chain behavior, so the case expressions should also be
  side-effect-free. My objection to this (elaborated below) is that
  if either the hash() or the case expressions aren't
  side-effect-free, optimized and unoptimized code may behave
  differently.
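
Here is a minimal model of school Ib's fallback, written as an ordinary function. The name ``switch_ib`` and the list-of-pairs representation of the cases are hypothetical. The function uses a dispatch dict when everything hashes and falls back to an if/elif chain when ``hash()`` raises ``TypeError``:

.. code-block:: python

    def switch_ib(subject, cases, default=None):
        """Hypothetical model of school Ib.

        `cases` is a list of (case_value, result) pairs in source order.
        """
        try:
            table = {}
            for value, result in cases:
                table.setdefault(value, result)   # first case wins, as in if/elif
            return table.get(subject, default)
        except TypeError:
            # Unhashable subject or case value: fall back to the if/elif chain.
            for value, result in cases:
                if subject == value:
                    return result
            return default

The ``except TypeError`` clause is the weak spot discussed below. It would also silently swallow a ``TypeError`` raised by a buggy ``__hash__`` method.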

School II grew out of the realization that optimization of commonly
found cases isn't so easy, and that it's better to face this head on.
This will become clear below.

The differences between school I (mostly school Ib) and school II are
fourfold:

- When optimizing using a dispatch dict, if either the switch
  expression or the case expressions are unhashable (in which case
  hash() raises an exception), school Ib requires catching the hash()
  failure and falling back to an if/elif chain. School II simply lets
  the exception happen. The problem with catching an exception in
  hash() as required by school Ib is that this may hide a genuine
  bug. A possible way out is to only use a dispatch dict if all case
  expressions are ints, strings or other built-ins with known good
  hash behavior, and to only attempt to hash the switch expression if
  it is also one of those types. Type objects should probably also be
  supported here. This is the (only) problem that school III
  addresses.

- When optimizing using a dispatch dict, if the hash() function of any
  expression involved returns an incorrect value, under school Ib,
  optimized code will not behave the same as unoptimized code. This
  is a well-known problem with optimization-related bugs, which waste
  lots of developer time. Under school II, in this situation
  incorrect results are produced at least consistently, which should
  make debugging a bit easier. The way out proposed for the previous
  bullet would also help here.

- School Ib doesn't have a good optimization strategy if the case
  expressions are named constants. The compiler cannot know their
  values for sure, and it cannot know whether they are truly constant.
  As a way out, it has been proposed to re-evaluate the expression
  corresponding to the case once the dict has identified which case
  should be taken, to verify that the value of the expression didn't
  change. But strictly speaking, all the case expressions occurring
  before that case would also have to be checked, in order to preserve
  the true if/elif chain semantics, thereby completely killing the
  optimization. Another proposed solution is to have callbacks
  notifying the dispatch dict of changes in the value of variables or
  attributes involved in the case expressions. But this is unlikely
  to be implementable in the general case, and would require many
  namespaces to bear the burden of supporting such callbacks, which
  currently don't exist at all.

- Finally, there's a difference of opinion regarding the treatment of
  duplicate cases (i.e. two or more cases with match expressions that
  evaluate to the same value). School I wants to treat this the same
  as an if/elif chain would treat it (i.e. the first match wins and
  the code for the second match is silently unreachable); school II
  wants this to be an error at the time the dispatch dict is frozen
  (so dead code doesn't go undiagnosed).
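
The second difference can be reproduced with a deliberately broken class. ``Token``, ``as_chain`` and ``as_dict`` are hypothetical names. ``as_chain`` follows the if/elif reading of a switch, and ``as_dict`` follows the dispatch-dict reading. Because ``Token.__hash__`` disagrees with ``Token.__eq__``, the two readings give different answers for the same input:

.. code-block:: python

    class Token:
        """Deliberately buggy: equal tokens do not get equal hashes."""

        def __init__(self, name):
            self.name = name

        def __eq__(self, other):
            return isinstance(other, Token) and self.name == other.name

        def __hash__(self):
            return id(self)   # bug: should be hash(self.name)


    def as_chain(subject, cases):
        # If/elif reading: test each case value with == in order.
        for value, result in cases:
            if subject == value:
                return result
        return "no match"


    def as_dict(subject, cases):
        # Dispatch-dict reading: first case wins, then one hash lookup.
        table = {}
        for value, result in cases:
            table.setdefault(value, result)
        return table.get(subject, "no match")


    cases = [(Token("GET"), "read"), (Token("PUT"), "write")]
    subject = Token("GET")
    # as_chain(subject, cases) returns "read";
    # as_dict(subject, cases) returns "no match".
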

School I sees trouble in school II's approach of pre-freezing a
dispatch dict because it places a new and unusual burden on
programmers to understand exactly what kinds of case values are
allowed to be frozen and when the case values will be frozen, or they
might be surprised by the switch statement's behavior.

School II doesn't believe that school Ia's unoptimized switch is worth
the effort, and it sees trouble in school Ib's proposal for
optimization, which can cause optimized and unoptimized code to behave
differently.

In addition, school II sees little value in allowing cases involving
unhashable values; after all if the user expects such values, they can
just as easily write an if/elif chain. School II also doesn't believe
that it's right to allow dead code due to overlapping cases to occur
unflagged, when the dict-based dispatch implementation makes it so
easy to trap this.

However, there are some use cases for overlapping/duplicate cases.
Suppose you're switching on some OS-specific constants (e.g. exported
by the os module or some module like that). You have a case for each.
But on some OS, two different constants have the same value (since on
that OS they are implemented the same way, like O_TEXT and O_BINARY
on Unix). If duplicate cases are flagged as errors, your switch
wouldn't work at all on that OS. It would be much better if you could
arrange the cases so that one case has preference over another.

There's also the (more likely) use case where you have a set of cases
to be treated the same, but one member of the set must be treated
differently. It would be convenient to put the exception in an
earlier case and be done with it.

(Yes, it seems a shame not to be able to diagnose dead code due to
accidental case duplication. Maybe that's less important, and
pychecker can deal with it? After all we don't diagnose duplicate
method definitions either.)

This suggests school IIb: like school II but redundant cases must be
resolved by choosing the first match. This is trivial to implement
when building the dispatch dict (skip keys already present).
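
Both policies are easy to express when building the dispatch dict. ``freeze_ii`` and ``freeze_iib`` are hypothetical helpers that take (case_value, result) pairs in source order. The constant names and values are invented to mimic a platform where two constants coincide:

.. code-block:: python

    def freeze_ii(cases):
        """School II: a duplicate case value is an error at freeze time."""
        table = {}
        for value, result in cases:
            if value in table:
                raise ValueError(f"duplicate case value: {value!r}")
            table[value] = result
        return table


    def freeze_iib(cases):
        """School IIb: the first case with a given value wins."""
        table = {}
        for value, result in cases:
            table.setdefault(value, result)   # skip keys already present
        return table


    # Hypothetical platform where two OS constants share a value.
    MODE_TEXT, MODE_BINARY = 0, 0
    os_cases = [(MODE_TEXT, "text"), (MODE_BINARY, "binary")]

    # One member of a group handled specially by listing it first.
    grouped = [(3, "special"), (1, "group"), (2, "group"), (3, "group")]

Under these definitions, ``freeze_iib(os_cases)`` keeps only the ``"text"`` case, whereas ``freeze_ii(os_cases)`` raises ``ValueError``.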

(An alternative would be to introduce new syntax to indicate "okay to
have overlapping cases" or "ok if this case is dead code" but I find
that overkill.)

Personally, I'm in school II: I believe that the dict-based dispatch
is the one true implementation for switch statements and that we
should face the limitations up front, so that we can reap maximal
benefits. I'm leaning towards school IIb: duplicate cases should be
resolved by the ordering of the cases instead of flagged as errors.

When to Freeze the Dispatch Dict
--------------------------------

For the supporters of school II (dict-based dispatch), the next big
dividing issue is when to create the dict used for switching. I call
this "freezing the dict".

The main problem that makes this interesting is the observation that
Python doesn't have named compile-time constants. What is
conceptually a constant, such as re.IGNORECASE, is a variable to the
compiler, and there's nothing to stop crooked code from modifying its
value.

Option 1
''''''''

The most limiting option is to freeze the dict in the compiler. This
would require that the case expressions are all literals or
compile-time expressions involving only literals and operators whose
semantics are known to the compiler, since with the current state of
Python's dynamic semantics and single-module compilation, there is no
hope for the compiler to know with sufficient certainty the values of
any variables occurring in such expressions. This is widely though
not universally considered too restrictive.

Raymond Hettinger is the main advocate of this approach. He proposes
a syntax where only a single literal of certain types is allowed as
the case expression. It has the advantage of being unambiguous and
easy to implement.

My main complaint about this is that by disallowing "named constants"
we force programmers to give up good habits. Named constants are
introduced in most languages to solve the problem of "magic numbers"
occurring in the source code. For example, sys.maxint is a lot more
readable than 2147483647. Raymond proposes to use string literals
instead of named "enums", observing that the string literal's content
can be the name that the constant would otherwise have. Thus, we
could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:".
However, if there is a spelling error in the string literal, the case
will silently be ignored, and who knows when the bug is detected. If
there is a spelling error in a NAME, however, the error will be caught
as soon as it is evaluated. Also, sometimes the constants are
externally defined (e.g. when parsing a file format like JPEG) and we
can't easily choose appropriate string values. Using an explicit
mapping dict sounds like a poor hack.
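
Today's Python can show when each kind of spelling error surfaces. The table names and ``describe_flag`` are hypothetical. The string-keyed dict mimics literal-only case labels as proposed for option 1, and one of its labels is deliberately misspelled:

.. code-block:: python

    import re

    # Literal-only case labels: the second label is misspelled, and
    # nothing reports it.
    BY_NAME = {"IGNORECASE": "case-insensitive", "MULTILNE": "per-line anchors"}


    def describe_flag(flag_name):
        return BY_NAME.get(flag_name, "unhandled")


    # The misspelled case is silently dead: the correct name falls through.
    assert describe_flag("MULTILINE") == "unhandled"

    # With named constants the same typo fails as soon as it is evaluated.
    typo_caught = False
    try:
        BY_FLAG = {re.IGNORECASE: "case-insensitive",
                   re.MULTILNE: "per-line anchors"}
    except AttributeError:
        typo_caught = True
    assert typo_caught
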

Option 2
''''''''

The oldest proposal to deal with this is to freeze the dispatch dict
the first time the switch is executed. At this point we can assume
that all the named "constants" (constant in the programmer's mind,
though not to the compiler) used as case expressions are defined;
otherwise an if/elif chain would have little chance of success either.
Assuming the switch will be executed many times, doing some extra work
the first time pays back quickly by very quick dispatch times later.

An objection to this option is that there is no obvious object where
the dispatch dict can be stored. It can't be stored on the code
object, which is supposed to be immutable; it can't be stored on the
function object, since many function objects may be created for the
same function (e.g. for nested functions). In practice, I'm sure that
something can be found; it could be stored in a section of the code
object that's not considered when comparing two code objects or when
pickling or marshalling a code object; or all switches could be stored
in a dict indexed by weak references to code objects. The solution
should also be careful not to leak switch dicts between multiple
interpreters.

Another objection is that the first-use rule allows obfuscated code
like this::

    def foo(x, y):
        switch x:
            case y:
                print(42)

To the untrained eye (not familiar with Python) this code would be
equivalent to this::

    def foo(x, y):
        if x == y:
            print(42)

but that's not what it does (unless it is always called with the same
value as the second argument). This has been addressed by suggesting
that the case expressions should not be allowed to reference local
variables, but this is somewhat arbitrary.
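
The surprise can be modelled in today's Python by caching the dispatch dict on first execution. ``make_foo`` and the closure variable ``frozen`` are hypothetical stand-ins for wherever an implementation would keep the frozen dict. ``foo`` returns a value instead of printing, so the behavior is easy to check:

.. code-block:: python

    def make_foo():
        """Model of foo(x, y) under the first-use rule (option 2)."""
        frozen = None   # the dispatch dict, built on first execution

        def foo(x, y):
            nonlocal frozen
            if frozen is None:
                frozen = {y: 42}      # `case y:` is evaluated only this once
            return frozen.get(x)      # later calls never look at y again

        return foo


    foo = make_foo()
    foo(1, 1)   # returns 42
    foo(2, 2)   # returns None, although x == y
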

A final objection is that in a multi-threaded application, the
first-use rule requires intricate locking in order to guarantee the
correct semantics. (The first-use rule suggests a promise that side
effects of case expressions are incurred exactly once.) This may be
as tricky as the import lock has proved to be, since the lock has to
be held while all the case expressions are being evaluated.
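
The locking that the first-use rule would need can be sketched with double-checked initialization and ``threading.Lock``. ``FirstUseSwitch`` and its ``build_cases`` callable are hypothetical. The point is that the lock stays held while all case expressions are evaluated, so their side effects happen exactly once:

.. code-block:: python

    import threading


    class FirstUseSwitch:
        """Hypothetical: freeze a dispatch dict on first use, thread-safely."""

        def __init__(self, build_cases):
            self._build_cases = build_cases   # evaluates all case expressions
            self._table = None
            self._lock = threading.Lock()

        def dispatch(self, subject, default=None):
            table = self._table
            if table is None:
                with self._lock:              # held during case evaluation
                    if self._table is None:   # another thread may have won
                        self._table = self._build_cases()
                    table = self._table
            return table.get(subject, default)
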

Option 3
''''''''

A proposal that has been winning support (including mine) is to freeze
a switch's dict when the innermost function containing it is defined.
The switch dict is stored on the function object, just as parameter
defaults are, and in fact the case expressions are evaluated at the
same time and in the same scope as the parameter defaults (i.e. in the
scope containing the function definition).

This option has the advantage of avoiding many of the finesses needed
to make option 2 work: there's no need for locking, no worry about
immutable code objects or multiple interpreters. It also provides a
clear explanation for why locals can't be referenced in case
expressions.

This option works just as well for situations where one would
typically use a switch; case expressions involving imported or global
named constants work exactly the same way as in option 2, as long as
they are imported or defined before the function definition is
encountered.
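
Parameter defaults already have exactly this timing, so they make a convenient model of option 3. ``RED``, ``GREEN`` and ``color_name`` are hypothetical. The ``_switch`` default plays the role of the frozen dispatch dict and is evaluated once, in the scope containing the ``def``:

.. code-block:: python

    RED, GREEN = "r", "g"   # named "constants" defined before the function


    def color_name(c, _switch={RED: "red", GREEN: "green"}):
        # _switch was built when `def` ran, in the enclosing (module) scope,
        # which is also why it cannot refer to color_name's own locals.
        return _switch.get(c, "unknown")


    RED = "x"   # rebinding after the definition does not affect _switch
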

A downside however is that the dispatch dict for a switch inside a
nested function must be recomputed each time the nested function is
defined. For certain "functional" styles of programming this may make
switch unattractive in nested functions. (Unless all case expressions
are compile-time constants; then the compiler is of course free to
optimize away the switch freezing code and make the dispatch table part
of the code object.)
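
The same defaults-based model shows the cost for nested functions. ``case_value`` and ``make_handler`` are hypothetical, and ``case_value`` merely counts how often a case expression is evaluated:

.. code-block:: python

    evaluations = 0


    def case_value(v):
        """Stand-in for a case expression that records each evaluation."""
        global evaluations
        evaluations += 1
        return v


    def make_handler():
        # Every execution of this nested `def` rebuilds the dict, just as
        # option 3 would re-freeze the nested switch's dispatch dict.
        def handler(x, _switch={case_value(1): "one", case_value(2): "two"}):
            return _switch.get(x, "other")
        return handler


    for _ in range(3):
        make_handler()
    # evaluations is now 6: two case expressions per definition.
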

Another downside is that under this option, there's no clear moment
when the dispatch dict is frozen for a switch that doesn't occur
inside a function. There are a few pragmatic choices for how to treat
a switch outside a function:

(a) Disallow it.
(b) Translate it into an if/elif chain.
(c) Allow only compile-time constant expressions.
(d) Compute the dispatch dict each time the switch is reached.
(e) Like (b) but tests that all expressions evaluated are hashable.

Of these, (a) seems too restrictive: it's uniformly worse than (c);
and (d) has poor performance for little or no benefit compared to
(b). It doesn't make sense to have a performance-critical inner loop
at the module level, as all local variable references are slow there;
hence (b) is my (weak) favorite. Perhaps I should favor (e), which
attempts to prevent atypical use of a switch; examples that work
interactively but not in a function are annoying. In the end I don't
think this issue is all that important (except it must be resolved
somehow) and am willing to leave it up to whoever ends up implementing
it.
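
Choice (e) could look roughly like the following, again as an ordinary function rather than real switch syntax. ``switch_e`` is hypothetical. It walks the cases like an if/elif chain but calls ``hash()`` on every expression it evaluates. As a result, code that relies on unhashable values fails at module level, just as it would inside a function:

.. code-block:: python

    def switch_e(subject, cases, default=None):
        """Hypothetical model of choice (e) for a switch outside a function.

        `cases` is a list of (case_value, result) pairs in source order.
        """
        hash(subject)                # raises TypeError if unhashable
        for value, result in cases:
            hash(value)              # only the case values actually evaluated
            if subject == value:
                return result
        return default
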

When a switch occurs in a class but not in a function, we can freeze
the dispatch dict at the same time the temporary function object
representing the class body is created. This means the case
expressions can reference module globals but not class variables.
Alternatively, if we choose (b) above, we could choose this
implementation inside a class definition as well.

Option 4
''''''''

There are a number of proposals to add a construct to the language
that makes the concept of a value pre-computed at function definition
time generally available, without tying it either to parameter default
values or case expressions. Some keywords proposed include 'const',
'static', 'only' or 'cached'. The associated syntax and semantics
vary.

These proposals are out of scope for this PEP, except to suggest that
*if* such a proposal is accepted, there are two ways for the switch to
benefit: we could require case expressions to be either compile-time
constants or pre-computed values; or we could make pre-computed values
the default (and only) evaluation mode for case expressions. The
latter would be my preference, since I don't see a use for more
dynamic case expressions that isn't addressed adequately by writing an
explicit if/elif chain.

Conclusion
==========

It is too early to decide. I'd like to see at least one completed
proposal for pre-computed values before deciding. In the meantime,
Python is fine without a switch statement, and perhaps those who claim
it would be a mistake to add one are right.

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: