PEP: 3103
Title: A Switch/Case Statement
Version: $Revision$
Last-Modified: $Date$
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 3.0
Content-Type: text/x-rst
Created: 25-Jun-2006
Post-History: 26-Jun-2006
Abstract
========
Python-dev has recently seen a flurry of discussion on adding a switch
statement. In this PEP I'm trying to extract my own preferences from
the smorgasbord of proposals, discussing alternatives and explaining
my choices where I can. I'll also indicate how strongly I feel about
alternatives I discuss.
This PEP should be seen as an alternative to PEP 275. My views are
somewhat different from those of that PEP's author, but I'm grateful for the
work done in that PEP.
This PEP introduces canonical names for the many variants that have
been discussed for different aspects of the syntax and semantics, such
as "alternative 2", "school II", "Option 3" and so on. Hopefully
these names will help the discussion.
Rationale
=========
A common programming idiom is to consider an expression and do
different things depending on its value. This is usually done with a
chain of if/elif tests; I'll refer to this form as the "if/elif
chain". There are two main motivations to want to introduce new
syntax for this idiom:
- It is repetitive: the variable and the test operator, usually '=='
or 'in', are repeated in each if/elif branch.
- It is inefficient: when an expression matches the last test value
(or no test value at all) it is compared to each of the preceding
test values.
Both of these complaints are relatively mild; there isn't a lot of
readability or performance to be gained by writing this differently.
Yet, some kind of switch statement is found in many languages and it
is not unreasonable to expect that its addition to Python will allow
us to write up certain code more cleanly and efficiently than before.
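As a concrete illustration of both complaints, here is a minimal
if/elif chain in present-day Python. The function name describe() and
the command strings are invented for this sketch; they do not come
from any of the proposals.

```python
def describe(command):
    # The variable and the '==' operator are repeated in every branch.
    if command == "start":
        return "starting"
    elif command == "stop":
        return "stopping"
    elif command == "pause":
        return "pausing"
    else:
        # Reaching this branch costs three failed comparisons first.
        return "unknown"
```

A call such as describe("pause") only succeeds after the two earlier
tests have failed, which is the inefficiency described above.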
There are forms of dispatch that are not suitable for the proposed
switch statement; for example, when the number of cases is not
statically known, or when it is desirable to place the code for
different cases in different classes or files.
Basic Syntax
============
I'm considering several variants of the syntax first proposed in PEP
275 here. There are lots of other possibilities, but I don't see that
they add anything.
My current preference is alternative 2.
I should note that all alternatives here have the "implicit break"
property: at the end of the suite for a particular case, the control
flow jumps to the end of the whole switch statement. There is no way
to pass control from one case to another. This is in contrast to C,
where an explicit 'break' statement is required to prevent falling
through to the next case.
In all alternatives, the else-suite is optional. It is more Pythonic
to use 'else' here rather than introducing a new reserved word,
'default', as in C.
Semantics are discussed in the next top-level section.
Alternative 1
-------------
This is the preferred form in PEP 275::

    switch EXPR:
        case EXPR:
            SUITE
        case EXPR:
            SUITE
        ...
        else:
            SUITE

The main downside is that the suites where all the action is are
indented two levels deep.
Alternative 2
-------------
This is Fredrik Lundh's preferred form; it differs by not indenting
the cases::

    switch EXPR:
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

Alternative 3
-------------
This is the same as alternative 2 but leaves out the colon after the
switch::

    switch EXPR
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

The hope of this alternative is that it will upset the auto-indent
logic of the average Python-aware text editor less. But it looks
strange to me.
Alternative 4
-------------
This leaves out the 'case' keyword on the basis that it is redundant::

    switch EXPR:
        EXPR:
            SUITE
        EXPR:
            SUITE
        ...
        else:
            SUITE

Unfortunately now we are forced to indent the case expressions,
because otherwise (at least in the absence of an 'else' keyword) the
parser would have a hard time distinguishing between an unindented
case expression (which continues the switch statement) and an unrelated
statement that starts like an expression (such as an assignment or a
procedure call). The parser is not smart enough to backtrack once it
sees the colon. This is my least favorite alternative.
Extended Syntax
===============
There is one additional concern that needs to be addressed
syntactically. Often two or more values need to be treated the same.
In C, this is done by writing multiple case labels together without any
code between them. The "fall through" semantics then mean that these
are all handled by the same code. Since the Python switch will not
have fall-through semantics (which have yet to find a champion) we
need another solution. Here are some alternatives.
Alternative A
-------------
Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions. This is interpreted so that if EXPR
is a parenthesized tuple or another expression whose value is a tuple,
the switch expression must equal that tuple, not one of its elements.
This means that we cannot use a variable to indicate multiple cases.
While this is also true in C's switch statement, it is a relatively
common occurrence in Python (see for example sre_compile.py).
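The tuple rule can be imitated with ordinary expressions. In this
sketch, which only approximates the proposed semantics,
matches_tuple() stands in for "case (1, 2):" and matches_any() for
"case 1, 2:"; both helper names are hypothetical.

```python
def matches_tuple(value):
    # "case (1, 2):" compares the switch value with the tuple itself.
    return value == (1, 2)


def matches_any(value):
    # "case 1, 2:" compares the switch value with each listed value.
    return value in (1, 2)
```

With these definitions, the value 1 satisfies only matches_any(),
while the tuple (1, 2) satisfies only matches_tuple().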
Alternative B
-------------
Use::

    case EXPR:

to match on a single expression; use::

    case in EXPR_LIST:

to match on multiple expressions. If EXPR_LIST is a single
expression, the 'in' forces its interpretation as an iterable (or
something supporting __contains__, in a minority semantics
alternative). If it is multiple expressions, each of those is
considered for a match.
Alternative C
-------------
Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions (as in alternative A); and use::

    case *EXPR:

to match on the elements of an expression whose value is an iterable.
The latter two cases can be combined, so that the true syntax is more
like this::

    case [*]EXPR, [*]EXPR, ...:

The `*` notation is similar to the use of prefix `*` already in use for
variable-length parameter lists and for passing computed argument
lists, and often proposed for value-unpacking (e.g. "a, b, *c = X" as
an alternative to "(a, b), c = X[:2], X[2:]").
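As a quick check of the equivalence claimed above, the sketch below
runs both spellings on a sample list. It assumes a Python version in
which the starred assignment form is accepted (it has since become
valid syntax in Python 3); the variable names are illustrative only.

```python
X = [1, 2, 3, 4]

# Starred form: c collects everything after the first two items.
a, b, *c = X

# Slicing form quoted in the text.
(a2, b2), c2 = X[:2], X[2:]

# For a list, both spellings bind the same values.
same = (a, b, c) == (a2, b2, c2)
```

One difference worth keeping in mind: the starred target always
receives a list, whereas the slice keeps the type of X, so the two
forms only compare equal when X is itself a list.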
Alternative D
-------------
This is a mixture of alternatives B and C; the syntax is like
alternative B but instead of the 'in' keyword it uses '*'. This is
more limited, but still allows the same flexibility. It uses::

    case EXPR:

to match on a single expression and::

    case *EXPR:

to match on the elements of an iterable. If one wants to specify
multiple matches in one case, one can write this::

    case *(EXPR, EXPR, ...):

or perhaps this (although it's a bit strange because the relative
priority of '*' and ',' is different than elsewhere)::

    case * EXPR, EXPR, ...:

Discussion
----------
Alternatives B, C and D are motivated by the desire to specify
multiple cases with the same treatment using a variable representing a
set (usually a tuple) rather than spelling them out. The motivation
for this is usually that if one has several switches over the same set
of cases it's a shame to have to spell out all the alternatives each
time. An additional motivation is to be able to specify *ranges* to
be matched easily and efficiently, similar to Pascal's "1..1000:"
notation. At the same time we want to prevent the kind of mistake
that is common in exception handling (and which will be addressed in
Python 3000 by changing the syntax of the except clause): writing
"case 1, 2:" where "case (1, 2):" was meant, or vice versa.
The case could be made that the need is insufficient for the added
complexity; C doesn't have a way to express ranges either, and it's
used a lot more than Pascal these days. Also, if a dispatch method
based on dict lookup is chosen as the semantics, large ranges could be
inefficient (consider range(1, sys.maxint)).
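To see why large ranges are a concern under dict-based dispatch,
consider this sketch of how a range case might be expanded into a
dispatch dict. The build_table() helper and the handler labels are
assumptions made for illustration, not part of any proposal.

```python
def build_table(cases):
    """Expand (iterable_of_values, handler) pairs into one dict."""
    table = {}
    for values, handler in cases:
        for value in values:
            # Every value in the range becomes its own dict entry.
            table[value] = handler
    return table


table = build_table([(range(1, 1001), "small"), ([0], "zero")])
entry_count = len(table)
```

A range of a thousand values costs a thousand dict entries here, so a
range as large as the one mentioned above would be impractical to
expand this way.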
All in all my preferences are (in descending preference) B, A, D', C
where D' is D without the third possibility.
Semantics
=========
There are several issues to review before we can choose the right
semantics.
If/Elif Chain vs. Dict-based Dispatch
-------------------------------------
There are two main schools of thought about the switch statement's
semantics. School I wants to define the switch statement in terms of
an equivalent if/elif chain. School II prefers to think of it as a
dispatch on a precomputed dictionary.
The difference is mainly important when either the switch expression
or one of the case expressions is not hashable; school I wants this to
be handled as it would be by an if/elif chain (i.e. hashability of the
expressions involved doesn't matter) while school II is willing to say
that the switch expression and all the case expressions must be
hashable if a switch is to be used; otherwise the user should have
written an if/elif chain.
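The practical difference shows up as soon as an unhashable value is
involved. The sketch below contrasts a school I style chain with a
school II style dict lookup in present-day Python; the function names
and the TABLE dict are hypothetical.

```python
def chain_dispatch(x):
    # School I: plain equality tests, so hashability never matters.
    if x == [1, 2]:
        return "pair"
    elif x == 1:
        return "one"
    return "other"


TABLE = {1: "one"}


def dict_dispatch(x):
    # School II: the lookup hashes x, so an unhashable x raises TypeError.
    return TABLE.get(x, "other")
```

Calling chain_dispatch([1, 2]) works, while dict_dispatch([1, 2])
raises TypeError because a list cannot be hashed.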
There's also a difference of opinion regarding the treatment of
duplicate cases (i.e. two or more cases with the same match
expression). School I wants to treat this the same as an if/elif
chain would treat it (i.e. the first match wins and the code for the
second match is silently unreachable); school II generally wants this
to be an error at the time the switch is frozen.
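One way to picture the school II behavior is a freezing step that
refuses duplicate keys. The freeze_cases() helper below is a
hypothetical sketch, and the choice of ValueError is an assumption;
this PEP does not say which exception would be raised.

```python
def freeze_cases(cases):
    """Build a dispatch dict from (value, handler) pairs."""
    table = {}
    for value, handler in cases:
        if value in table:
            # An if/elif chain would silently ignore the second case.
            raise ValueError(f"duplicate case: {value!r}")
        table[value] = handler
    return table
```

Under school I, the equivalent chain would simply never reach the
second branch for a repeated value.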
There's also a school III which states that the definition of a switch
statement should be in terms of an equivalent if/elif chain, with the
exception that all the expressions must be hashable.
School I believes that the if/elif chain is the only reasonable,
surprise-free way of defining switch semantics, and that optimizations
as suggested by PEP 275's Solution 1 are sufficient to make most
common uses fast. School I sees trouble in the approach of
pre-freezing a dispatch dictionary because it places a new and unusual
burden on programmers to understand exactly what kinds of case values
are allowed to be frozen and when the case values will be frozen, or
they might be surprised by the switch statement's behavior.
School II sees trouble in trying to achieve semantics that match
those of an if/elif chain while optimizing the switch statement into
a hash lookup in a dispatch dictionary. In an if/elif chain, the
test "x == y" might well be comparing two unhashable values
(e.g. two lists); even "x == 1" could be comparing a user-defined
class instance that is not hashable but happens to define equality to
integers. Worse, the hash function might have a bug or a side effect;
if we generate code that believes the hash, a buggy hash might
generate an incorrect match, and if we generate code that catches
errors in the hash to fall back on an if/elif chain, we might hide
genuine bugs. In addition, school II sees little value in allowing
cases involving unhashable values; after all if the user expects such
values, they can just as easily write an if/elif chain. School II
also doesn't believe that it's fair to allow dead code due to
overlapping cases to occur unflagged, when the dict-based dispatch
implementation makes it so easy to trap this.
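The "x == 1" situation can be reproduced in present-day Python 3,
where a class that defines __eq__ without __hash__ is unhashable. The
Weird class is a made-up example.

```python
class Weird:
    """Compares equal to an integer but is not hashable."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return self.n == other
    # Defining __eq__ without __hash__ sets __hash__ to None in Python 3.


w = Weird(1)
chain_matches = (w == 1)  # an if/elif chain would take this branch

try:
    {1: "one"}[w]  # a dict lookup has to hash w first
    lookup_failed = False
except TypeError:
    lookup_failed = True
```

The equality test succeeds while the dict lookup fails, which is the
mismatch school II would rather expose than paper over.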
School III admits the problems with making hash() optional, but still
believes that the true semantics should be defined by an if/elif chain
even if the implementation should be allowed to use dict-based
dispatch as an optimization. This means that duplicate cases must be
resolved by always choosing the first case, making the second case
undiagnosed dead code.
Personally, I'm in school II: I believe that the dict-based dispatch
is the one true implementation for switch statements and that we
should face the limitations and benefits up front.
When to Freeze the Dispatch Dict
--------------------------------
For the supporters of school II (dict-based dispatch), the next big
dividing issue is when to create the dict used for switching. I call
this "freezing the dict".
The main problem that makes this interesting is the observation that
Python doesn't have named compile-time constants. What is
conceptually a constant, such as re.IGNORECASE, is a variable to the
compiler, and there's nothing to stop crooked code from modifying its
value.
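The sketch below shows the problem with a stand-in module. The module
name flags and its attribute are hypothetical, created with
types.ModuleType so that no real module is modified.

```python
import types

# Hypothetical stand-in for a module that exports a "constant".
flags = types.ModuleType("flags")
flags.IGNORECASE = 2


def is_ignorecase(value):
    # The attribute is looked up each time the function runs.
    return value == flags.IGNORECASE


before = is_ignorecase(2)
flags.IGNORECASE = 99  # nothing stops other code from doing this
after = is_ignorecase(2)
```

The same call gives a different answer after the rebinding, so a
compiler cannot safely treat the name as a constant.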
Option 1
''''''''
The most limiting option is to freeze the dict in the compiler. This
would require that the case expressions are all literals or
compile-time expressions involving only literals and operators whose
semantics are known to the compiler, since with the current state of
Python's dynamic semantics and single-module compilation, there is no
hope for the compiler to know with sufficient certainty the values of
any variables occurring in such expressions. This is widely though
not universally considered too restrictive.
Raymond Hettinger is the main advocate of this approach. He proposes
a syntax where only a single literal of certain types is allowed as
the case expression. It has the advantage of being unambiguous and
easy to implement.
My main complaint about this is that by disallowing "named constants"
we force programmers to give up good habits. Named constants are
introduced in most languages to solve the problem of "magic numbers"
occurring in the source code. For example, sys.maxint is a lot more
readable than 2147483647. Raymond proposes to use string literals
instead of named "enums", observing that the string literal's content
can be the name that the constant would otherwise have. Thus, we
could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:".
However, if there is a spelling error in the string literal, the case
will silently be ignored, and who knows when the bug is detected. If
there is a spelling error in a NAME, however, the error will be caught
as soon as it is evaluated. Also, sometimes the constants are
externally defined (e.g. when parsing a file format like JPEG) and we
can't easily choose appropriate string values. Using an explicit
mapping dict sounds like a poor hack.
Option 2
''''''''
The oldest proposal to deal with this is to freeze the dispatch dict
the first time the switch is executed. At this point we can assume
that all the named "constants" (constant in the programmer's mind,
though not to the compiler) used as case expressions are defined --
otherwise an if/elif chain would have little chance of success either.
Assuming the switch will be executed many times, doing some extra work
the first time pays back quickly by very quick dispatch times later.
An objection to this option is that there is no obvious object where
the dispatch dict can be stored. It can't be stored on the code
object, which is supposed to be immutable; it can't be stored on the
function object, since many function objects may be created for the
same function (e.g. for nested functions). In practice, I'm sure that
something can be found; it could be stored in a section of the code
object that's not considered when comparing two code objects or when
pickling or marshalling a code object; or all switches could be stored
in a dict indexed by weak references to code objects. The solution
should also be careful not to leak switch dicts between multiple
interpreters.
Another objection is that the first-use rule allows obfuscated code
like this::

    def foo(x, y):
        switch x:
        case y:
            print 42

To the untrained eye (not familiar with Python) this code would be
equivalent to this::

    def foo(x, y):
        if x == y:
            print 42

but that's not what it does (unless it is always called with the same
value as the second argument). This has been addressed by suggesting
that the case expressions should not be allowed to reference local
variables, but this is somewhat arbitrary.
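The surprising behavior can be simulated in present-day Python by
building the dispatch dict on the first call. This is only an
emulation of the first-use rule: the module-level _frozen variable is
a hypothetical stand-in for wherever an implementation would store the
dict, and the function returns 42 instead of printing it.

```python
_frozen = None  # hypothetical storage for the frozen dispatch dict


def foo(x, y):
    global _frozen
    if _frozen is None:
        # The case expression "y" is evaluated on the first call only.
        _frozen = {y: 42}
    return _frozen.get(x)


first = foo(1, 1)   # table is frozen with the key 1
second = foo(2, 2)  # x == y, but the table still only knows the key 1
```

The second call finds no match even though both arguments are equal,
which is exactly what the if-based version would not do.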
A final objection is that in a multi-threaded application, the
first-use rule requires intricate locking in order to guarantee the
correct semantics. (The first-use rule suggests a promise that side
effects of case expressions are incurred exactly once.) This may be
as tricky as the import lock has proved to be, since the lock has to
be held while all the case expressions are being evaluated.
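A minimal sketch of the kind of locking involved is shown below,
assuming a module-level lock and table (both hypothetical). It keeps
the case expression from being evaluated more than once even when
several threads reach the switch together.

```python
import threading

_lock = threading.Lock()
_table = None
evaluations = 0


def case_expression():
    # Stand-in for a case expression with a visible side effect.
    global evaluations
    evaluations += 1
    return 1


def get_table():
    global _table
    if _table is None:
        # The lock is held while the case expressions are evaluated.
        with _lock:
            if _table is None:
                _table = {case_expression(): "one"}
    return _table


threads = [threading.Thread(target=get_table) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A real implementation would also have to cope with case expressions
that raise or that re-enter the same switch, which this sketch
ignores.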
Option 3
''''''''
A proposal that has been winning support (including mine) is to freeze
a switch's dict when the innermost function containing it is defined.
The switch dict is stored on the function object, just as parameter
defaults are, and in fact the case expressions are evaluated at the
same time and in the same scope as the parameter defaults (i.e. in the
scope containing the function definition).
This option has the advantage of avoiding many of the finesses needed
to make option 2 work: there's no need for locking, no worry about
immutable code objects or multiple interpreters. It also provides a
clear explanation for why locals can't be referenced in case
expressions.
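The analogy with parameter defaults can be made literal in present-day
Python by storing the dispatch dict in a default argument. This is an
emulation under that assumption, not the proposed implementation; the
names RED, GREEN, name_of and _table are illustrative.

```python
RED, GREEN = 1, 2


# The dict is built once, when the def statement runs, in the
# enclosing scope.
def name_of(color, _table={RED: "red", GREEN: "green"}):
    return _table.get(color, "unknown")


RED = 99  # rebinding the "constant" later does not change the dict

result_one = name_of(1)
result_rebound = name_of(RED)
```

Because the dict is built before the function body ever runs, the
function's own local variables cannot appear as keys, which mirrors
the restriction on case expressions.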
This option works just as well for situations where one would
typically use a switch; case expressions involving imported or global
named constants work exactly the same way as in option 2, as long as
they are imported or defined before the function definition is
encountered.
A downside however is that the dispatch dict for a switch inside a
nested function must be recomputed each time the nested function is
defined. For certain "functional" styles of programming this may make
switch unattractive in nested functions. (Unless all case expressions
are compile-time constants; then the compiler is of course free to
optimize away the switch freezing code and make the dispatch table part
of the code object.)
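The recomputation cost can be observed with the same default-argument
emulation. In this sketch make_table() and the builds counter are
hypothetical; they only serve to count how often the dict is built.

```python
builds = 0


def make_table():
    # Count how many times the dispatch dict is constructed.
    global builds
    builds += 1
    return {1: "one", 2: "two"}


def outer():
    # The default is evaluated each time this def statement executes.
    def inner(x, _table=make_table()):
        return _table.get(x, "other")
    return inner


f = outer()
g = outer()
```

Two calls to outer() build the dict twice, once per definition of the
nested function.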
Another downside is that under this option, there's no clear moment
when the dispatch dict is frozen for a switch that doesn't occur
inside a function. There are a few pragmatic choices for how to treat
a switch outside a function:
(a) Disallow it.
(b) Translate it into an if/elif chain.
(c) Allow only compile-time constant expressions.
(d) Compute the dispatch dict each time the switch is reached.
(e) Like (b) but tests that all expressions evaluated are hashable.
Of these, (a) seems too restrictive: it's uniformly worse than (c);
and (d) has poor performance for little or no benefits compared to
(b). It doesn't make sense to have a performance-critical inner loop
at the module level, as all local variable references are slow there;
hence (b) is my (weak) favorite. Perhaps I should favor (e), which
attempts to prevent atypical use of a switch; examples that work
interactively but not in a function are annoying. In the end I don't
think this issue is all that important (except it must be resolved
somehow) and am willing to leave it up to whoever ends up implementing
it.
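Choice (e) might be pictured as the following helper, which behaves
like an if/elif chain but hashes every expression it evaluates. The
function name and the (value, result) pair layout are assumptions made
for this sketch.

```python
def switch_checked(value, cases, default=None):
    hash(value)  # raises TypeError if the switch value is unhashable
    for case_value, result in cases:
        hash(case_value)  # same check for each case expression evaluated
        if value == case_value:
            return result
    return default
```

Code that passes an unhashable case value fails here in the same way
it would fail inside a function, which is the point of choice (e).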
When a switch occurs in a class but not in a function, we can freeze
the dispatch dict at the same time the temporary function object
representing the class body is created. This means the case
expressions can reference module globals but not class variables.
Alternatively, if we choose (b) above, we could choose this
implementation inside a class definition as well.
Option 4
''''''''
There are a number of proposals to add a construct to the language
that makes the concept of a value pre-computed at function definition
time generally available, without tying it either to parameter default
values or case expressions. Some keywords proposed include 'const',
'static', 'only' or 'cached'. The associated syntax and semantics
vary.
These proposals are out of scope for this PEP, except to suggest that
*if* such a proposal is accepted, there are two ways for the switch to
benefit: we could require case expressions to be either compile-time
constants or pre-computed values; or we could make pre-computed values
the default (and only) evaluation mode for case expressions. The
latter would be my preference, since I don't see a use for more
dynamic case expressions that isn't addressed adequately by writing an
explicit if/elif chain.
Conclusion
==========
It is too early to decide. I'd like to see at least one completed
proposal for pre-computed values before deciding. In the meantime,
Python is fine without a switch statement, and perhaps those who claim
it would be a mistake to add one are right.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: