Checkpoint submit so I can edit this elsewhere.
This commit is contained in:
parent
4677be349b
commit
aa6cebe8df
|
@ -0,0 +1,449 @@
|
||||||
|
PEP: 3103
|
||||||
|
Title: A Switch/Case Statement
|
||||||
|
Version: $Revision$
|
||||||
|
Last-Modified: $Date$
|
||||||
|
Author: guido@python.org (Guido van Rossum)
|
||||||
|
Status: Draft
|
||||||
|
Type: Standards Track
|
||||||
|
Python-Version: 3.0
|
||||||
|
Content-Type: text/x-rst
|
||||||
|
Created: 25-Jun-2006
|
||||||
|
Post-History: never
|
||||||
|
|
||||||
|
|
||||||
|
Abstract
|
||||||
|
========
|
||||||
|
|
||||||
|
Python-dev has recently seen a flurry of discussion on adding a switch
|
||||||
|
statement. In this PEP I'm trying to extract my own preferences from
|
||||||
|
the smorgasboard of proposals, discussing alternatives and explaining
|
||||||
|
my choices where I can. I'll also indicate how strongly I feel about
|
||||||
|
alternatives I discuss.
|
||||||
|
|
||||||
|
This PEP should be seen as an alternative to PEP 275. My views are
|
||||||
|
somewhat different from that PEP's author, but I'm grateful for the
|
||||||
|
work done in that PEP.
|
||||||
|
|
||||||
|
|
||||||
|
Rationale
|
||||||
|
=========
|
||||||
|
|
||||||
|
A common programming idiom is to consider an expression and do
|
||||||
|
different things depending on its value. This is usually done with a
|
||||||
|
chain of if/elif tests; I'll refer to this form as the "if/elif
|
||||||
|
chain". There are two main motivations to want to introduce new
|
||||||
|
syntax for this idiom:
|
||||||
|
|
||||||
|
- It is repetitive: the variable and the test operator, usually '=='
|
||||||
|
or 'in', are repeated in each if/elif branch.
|
||||||
|
|
||||||
|
- It is inefficient: when an expressaion matches the last test value
|
||||||
|
(or no test value at all) it is compared to each of the preceding
|
||||||
|
test values.
|
||||||
|
|
||||||
|
Both of these complaints are relatively mild; there isn't a lot of
|
||||||
|
readability or performance to be gained by writing this differently.
|
||||||
|
Yet, some kind of switch statement is found in many languages and it
|
||||||
|
is not unreasonable to expect that its addition to Python will allow
|
||||||
|
us to write up certain code more cleanly and efficiently than before.
|
||||||
|
|
||||||
|
There are forms of dispatch that are not suitable for the proposed
|
||||||
|
switch statement; for example, when the number of cases is not
|
||||||
|
statically known, or when it is desirable to place the code for
|
||||||
|
different cases in different classes or files.
|
||||||
|
|
||||||
|
|
||||||
|
Basic Syntax
|
||||||
|
============
|
||||||
|
|
||||||
|
I'm considering several variants of the syntax first proposed in PEP
|
||||||
|
275 here. There are lots of other possibilities, but I don't see that
|
||||||
|
they add anything.
|
||||||
|
|
||||||
|
My current preference is alternative 2.
|
||||||
|
|
||||||
|
I should not that all alternatives here have the "implicit break"
|
||||||
|
property: at the end of the suite for a particular case, the control
|
||||||
|
flow jumps to the end of the whole switch statement. There is no way
|
||||||
|
to pass control from one case to another. This in contrast to C,
|
||||||
|
where an explicit 'break' statement is required to prevent falling
|
||||||
|
through to the next case.
|
||||||
|
|
||||||
|
In all alternatives, the else-suite is optional. It is more Pythonic
|
||||||
|
to use 'else' here rather than introducing a new reserved word,
|
||||||
|
'default', as in C.
|
||||||
|
|
||||||
|
Semantics are discussed in the next top-level section.
|
||||||
|
|
||||||
|
Alternative 1
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This is the preferred form in PEP 275::
|
||||||
|
|
||||||
|
switch EXPR:
|
||||||
|
case EXPR:
|
||||||
|
SUITE
|
||||||
|
case EXPR:
|
||||||
|
SUITE
|
||||||
|
...
|
||||||
|
else:
|
||||||
|
SUITE
|
||||||
|
|
||||||
|
The main downside is that the suites where all the action is are
|
||||||
|
indented two levels deep.
|
||||||
|
|
||||||
|
Alternative 2
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This is Fredrik Lundh's preferred form; it differs by not indenting
|
||||||
|
the cases::
|
||||||
|
|
||||||
|
switch EXPR:
|
||||||
|
case EXPR:
|
||||||
|
SUITE
|
||||||
|
case EXPR:
|
||||||
|
SUITE
|
||||||
|
....
|
||||||
|
else:
|
||||||
|
SUITE
|
||||||
|
|
||||||
|
Alternative 3
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This is the same as alternative 2 but leaves out the colon after the
|
||||||
|
switch::
|
||||||
|
|
||||||
|
switch EXPR
|
||||||
|
case EXPR:
|
||||||
|
SUITE
|
||||||
|
case EXPR:
|
||||||
|
SUITE
|
||||||
|
....
|
||||||
|
else:
|
||||||
|
SUITE
|
||||||
|
|
||||||
|
The hope of this alternative is that is will upset the auto-indent
|
||||||
|
logic of the average Python-aware text editor less. But it looks
|
||||||
|
strange to me.
|
||||||
|
|
||||||
|
Alternative 4
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This leaves out the 'case' keyword on the basis that it is redundant::
|
||||||
|
|
||||||
|
switch EXPR:
|
||||||
|
EXPR:
|
||||||
|
SUITE
|
||||||
|
EXPR:
|
||||||
|
SUITE
|
||||||
|
...
|
||||||
|
else:
|
||||||
|
SUITE
|
||||||
|
|
||||||
|
Unfortunately now we are forced to indent the case expressions,
|
||||||
|
because otherwise (at least in the absence of an 'else' keyword) the
|
||||||
|
parser would have a hard time distinguishing between an unindented
|
||||||
|
case expression (which continues the switch statement) or an unrelated
|
||||||
|
statement that starts like an expression (such as an assignment or a
|
||||||
|
procedure call). The parser is not smart enough to backtrack once it
|
||||||
|
sees the colon. This is my least favorite alternative.
|
||||||
|
|
||||||
|
|
||||||
|
Extended Syntax
|
||||||
|
===============
|
||||||
|
|
||||||
|
There is one additional concern that needs to be addressed
|
||||||
|
syntactically. Often two or more values need to be treated the same.
|
||||||
|
In C, this done by writing multiple case labels together without any
|
||||||
|
code between them. The "fall through" semantics then mean that these
|
||||||
|
are all handled by the same code. Since the Python switch will not
|
||||||
|
have fall-through semantics (which have yet to find a champion) we
|
||||||
|
need another solution. Here are some alternatives.
|
||||||
|
|
||||||
|
Alternative A
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Use::
|
||||||
|
|
||||||
|
case EXPR:
|
||||||
|
|
||||||
|
to match on a single expression; use::
|
||||||
|
|
||||||
|
case EXPR, EXPR, ...:
|
||||||
|
|
||||||
|
to match on mulltiple expressions. The is interpreted so that if EXPR
|
||||||
|
is a parenthesized tuple or another expression whose value is a tuple,
|
||||||
|
the switch expression must equal that tuple, not one of its elements.
|
||||||
|
This means that we cannot use a variable to indicate multiple cases.
|
||||||
|
While this is also true in C's switch statement, it is a relatively
|
||||||
|
common occurrence in Python (see for example sre_compile.py).
|
||||||
|
|
||||||
|
Alternative B
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Use::
|
||||||
|
|
||||||
|
case EXPR:
|
||||||
|
|
||||||
|
to match on a single expression; use::
|
||||||
|
|
||||||
|
case in EXPR_LIST:
|
||||||
|
|
||||||
|
to match on multiple expressions. If EXPR_LIST is a single
|
||||||
|
expression, the 'in' forces its interpretation as an iterable (or
|
||||||
|
something supporting __contains__, in a minority semantics
|
||||||
|
alternative). If it is multiple expressions, each of those is
|
||||||
|
considered for a match.
|
||||||
|
|
||||||
|
Alternative C
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Use::
|
||||||
|
|
||||||
|
case EXPR:
|
||||||
|
|
||||||
|
to match on a single expression; use::
|
||||||
|
|
||||||
|
case EXPR, EXPR, ...:
|
||||||
|
|
||||||
|
to match on multiple expressions (as in alternative A); and use::
|
||||||
|
|
||||||
|
case *EXPR:
|
||||||
|
|
||||||
|
to match on the elements of an expression whose value is an iterable.
|
||||||
|
The latter two cases can be combined, so that the true syntax is more
|
||||||
|
like this::
|
||||||
|
|
||||||
|
case [*]EXPR, [*]EXPR, ...:
|
||||||
|
|
||||||
|
Note that the * notation is similar to the use of prefix * already in
|
||||||
|
use for variable-length parameter lists and for passing computed
|
||||||
|
argument lists, and often proposed for value-unpacking (e.g. "a, b,
|
||||||
|
*c = X" as an alternative to "(a, b), c = X[:2], X[2:]").
|
||||||
|
|
||||||
|
Alternative D
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This is a mixture of alternatives B and C; the syntax is like
|
||||||
|
alternative B but instead of the 'in' keyword it uses '*'. This is
|
||||||
|
more limited, but still allows the same flexibility. It uses::
|
||||||
|
|
||||||
|
case EXPR:
|
||||||
|
|
||||||
|
to match on a single expression and::
|
||||||
|
|
||||||
|
case *EXPR:
|
||||||
|
|
||||||
|
to match on the elements of an iterable. If one wants to specify
|
||||||
|
multiple matches in one case, one can write this::
|
||||||
|
|
||||||
|
case *(EXPR, EXPR, ...):
|
||||||
|
|
||||||
|
or perhaps this (although it's a bit strange because the relative
|
||||||
|
priority of '*' and ',' is different than elsewhere)::
|
||||||
|
|
||||||
|
case * EXPR, EXPR, ...:
|
||||||
|
|
||||||
|
Discussion
|
||||||
|
----------
|
||||||
|
|
||||||
|
Alternatives B, C and D are motivated by the desire to specify
|
||||||
|
multiple cases with the same treatment using a variable representing a
|
||||||
|
set (usually a tuple) rather than spelling them out. The motivation
|
||||||
|
for this is usually that if one has several switches over the same set
|
||||||
|
of cases it's a shame to have to spell out all the alternatives each
|
||||||
|
time. An additional motivation is to be able to specify *ranges* to
|
||||||
|
be matched easily and efficiently, similar to Pascal's "1..1000:"
|
||||||
|
notation. At the same time we want to prevent the kind of mistake
|
||||||
|
that is common in exception handling (and which will be addressed in
|
||||||
|
Python 3000 by changing the syntax of the except clause): writing
|
||||||
|
"case 1, 2:" where "case (1, 2):" was meant, or vice versa.
|
||||||
|
|
||||||
|
The case could be made that the need is insufficient for the added
|
||||||
|
complexity; C doesn't have a way to express ranges either, and it's
|
||||||
|
used a lot more than Pascal these days. Also, if a dispatch method
|
||||||
|
based on dict lookup is chosen as the semantics, large ranges could be
|
||||||
|
inefficient (consider range(1, sys.maxint)).
|
||||||
|
|
||||||
|
All in all my preferences are (in descending preference) B, A, D', C
|
||||||
|
where D' is D without the third possibility.
|
||||||
|
|
||||||
|
|
||||||
|
Semantics
|
||||||
|
=========
|
||||||
|
|
||||||
|
There are several issues to review before we can choose the right
|
||||||
|
semantics.
|
||||||
|
|
||||||
|
If/Elif Chain vs. Dict-based Dispatch
|
||||||
|
-------------------------------------
|
||||||
|
|
||||||
|
There are two main schools of thought about the switch statement's
|
||||||
|
semantics. School I wants to define the switch statement in term of
|
||||||
|
an equivalent if/elif chain. School II prefers to think of it as a
|
||||||
|
dispatch on a precomputed dictionary.
|
||||||
|
|
||||||
|
The difference is mainly important when either the switch expression
|
||||||
|
or one of the case expressions is not hashable; school I wants this to
|
||||||
|
be handled as it would be by an if/elif chain (i.e. hashability of the
|
||||||
|
expressions involved doesn't matter) while school II is willing to say
|
||||||
|
that the switch expression and all the case expressions must be
|
||||||
|
hashable if a switch is to be used; otherwise the user should have
|
||||||
|
written an if/elif chain.
|
||||||
|
|
||||||
|
There's also a difference of opinion regarding the treatment of
|
||||||
|
duplicate cases (i.e. two or more cases with the same match
|
||||||
|
expression). School I wants to treat this the same is an if/elif
|
||||||
|
chain would treat it (i.e. the first match wins and the code for the
|
||||||
|
second match is silently unreachable); school II generally wants this
|
||||||
|
to be an error at the time the switch is frozen.
|
||||||
|
|
||||||
|
There's also a school III which states that the definition of a switch
|
||||||
|
statement should be in terms of an equivalent if/elif chain, with the
|
||||||
|
exception that all the expressions must be hashable.
|
||||||
|
|
||||||
|
School I believes that the if/elif chain is the only reasonably,
|
||||||
|
surprise-free of defining switch semantics, and that optimizations as
|
||||||
|
suggested by PEP 275's Solution 1 are sufficient to make most common
|
||||||
|
uses fast.
|
||||||
|
|
||||||
|
School II sees nothing but trouble in that approach: in an if/elif
|
||||||
|
chain, the test "x == y" might well be comparing two unhashable values
|
||||||
|
(e.g. two lists); even "x == 1" could be comparing a user-defined
|
||||||
|
class instance that is not hashable but happens to define equality to
|
||||||
|
integers. Worse, the hash function might have a bug or a side effect;
|
||||||
|
if we generate code that believes the hash, a buggy hash might
|
||||||
|
generate an incorrect match, and if we generate code that catches
|
||||||
|
errors in the hash to fall back on an if/elif chain, we might hide
|
||||||
|
genuine bugs. In addition, school II sees little value in allowing
|
||||||
|
cases involving unhashable values; after all if the user expects such
|
||||||
|
values, they can just as easily write an if/elif chain. School II
|
||||||
|
also doesn't believe that it's fair to allow dead code due to
|
||||||
|
overlappin cases to occur unflagged, when the dict-based dispatch
|
||||||
|
implementation makes it so easy to trap this.
|
||||||
|
|
||||||
|
School III admits the problems with making hash() optional, but still
|
||||||
|
believes that the true semantics should be defined by an if/elif chain
|
||||||
|
even if the implementation should be allowed to use dict-based
|
||||||
|
dispatch as an optimization. This means that duplicate cases must be
|
||||||
|
resolved by always choosing the first case, making the second case
|
||||||
|
undiagnosed dead code.
|
||||||
|
|
||||||
|
Personally, I'm in school II: I believe that the dict-based dispatch
|
||||||
|
is the one true implementation for switch statements and that we
|
||||||
|
should face the limitiations and benefits up front.
|
||||||
|
|
||||||
|
When to Freeze the Dispatch Dict
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
For the supporters of school II (dict-based dispatch), the next big
|
||||||
|
dividing issue is when to create the dict used for switching. I call
|
||||||
|
this "freezing the dict".
|
||||||
|
|
||||||
|
The main problem that makes this interesting is the observation that
|
||||||
|
Python doesn't have named compile-time constants. What is
|
||||||
|
conceptually a constant, such as re.IGNORECASE, is a variable to the
|
||||||
|
compiler, and there's nothing to stop crooked code from modifying its
|
||||||
|
value.
|
||||||
|
|
||||||
|
Option 1
|
||||||
|
''''''''
|
||||||
|
|
||||||
|
The most limiting option is to freeze the dict in the compiler. This
|
||||||
|
would require that the case expressions are all literals or
|
||||||
|
compile-time expressions involving only literals and operators whose
|
||||||
|
semantics are known to the compiler, since with the current state of
|
||||||
|
Python's dynamic semantics and single-module compilation, there is no
|
||||||
|
hope for the compiler to know with sufficient certainty the values of
|
||||||
|
any variables occurring in such expressions. This is widely though
|
||||||
|
not universally considered too restrictive.
|
||||||
|
|
||||||
|
Raymond Hettinger is the main advocate of this approach. He proposes
|
||||||
|
a syntax where only a single literal of certain types is allowed as
|
||||||
|
the case expression. It has the advantage of being unambiguous and
|
||||||
|
easy to implement.
|
||||||
|
|
||||||
|
My may complaint about this is that by disallowing "named constants"
|
||||||
|
we force programmers to give up good habits. Named constants are
|
||||||
|
introduced in most languages to solve the problem of "magic numbers"
|
||||||
|
occurring in the source code. For example, sys.maxint is a lot more
|
||||||
|
readable than 2147483647. Raymond proposes to use string literals
|
||||||
|
instead of named "enums", observing that the string literal's content
|
||||||
|
can be the name that the constant would otherwise have. Thus, we
|
||||||
|
could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:".
|
||||||
|
However, if there is a spelling error in the string literal, the case
|
||||||
|
will silently be ignored, and who knows when the bug is detected. If
|
||||||
|
there is a spelling error in a NAME, however, the error will be caught
|
||||||
|
as soon as it is evaluated. Also, sometimes the constants are
|
||||||
|
externally defined (e.g. when parsing an file format like JPEG) and we
|
||||||
|
can't easily choose appropriate string values. Using an explicit
|
||||||
|
mappping dict sounds like a poor hack.
|
||||||
|
|
||||||
|
Option 2
|
||||||
|
''''''''
|
||||||
|
|
||||||
|
The oldest proposal to deal with this is to freeze the dispatch dict
|
||||||
|
the first time the switch is executed. At this point we can assume
|
||||||
|
that all the named "constants" (constant in the programmer's mind,
|
||||||
|
though not to the compiler) used as case expressions are defined --
|
||||||
|
otherwise an if/elif chain would have little chance of success either.
|
||||||
|
Assuming the switch will be executed many times, doing some extra work
|
||||||
|
the first time pays back quickly by very quick dispatch times later.
|
||||||
|
|
||||||
|
A mostly theoretical objection to this option is that there is no
|
||||||
|
obvious object where the dispatch dict can be stored. It can't be
|
||||||
|
stored on the code object, which is supposed to be immutable; it can't
|
||||||
|
be stored on the function object, since many function objects may be
|
||||||
|
created for the same function (e.g. for nested functions). In
|
||||||
|
practice, I'm sure that something can be found; it could be stored in
|
||||||
|
a section of the code object that's not considered when comparing two
|
||||||
|
code objects or when pickling or marshalling a code object; or all
|
||||||
|
switches could be stored in a dict indexed by weak references to code
|
||||||
|
objects.
|
||||||
|
|
||||||
|
Another objection is that the first-use rule allows obfuscated code
|
||||||
|
like this::
|
||||||
|
|
||||||
|
def foo(x, y):
|
||||||
|
switch x:
|
||||||
|
case y:
|
||||||
|
print 42
|
||||||
|
|
||||||
|
To the untrained eye (not familiar with Python) this code would be
|
||||||
|
equivalent to this::
|
||||||
|
|
||||||
|
def foo(x, y):
|
||||||
|
if x == y:
|
||||||
|
print 42
|
||||||
|
|
||||||
|
but that's not what it does (unless it is always called with the same
|
||||||
|
value as the second argument). This has been addressed by suggesting
|
||||||
|
that the case expressions should not be allowed to reference local
|
||||||
|
variables. But this is somewhat arbitrary.
|
||||||
|
|
||||||
|
A final objection is that in a multi-threaded application, the
|
||||||
|
first-use rule requires intricate locking in order to guarantee the
|
||||||
|
correct semantics. (The first-use rule suggests a promise that side
|
||||||
|
effects of case expressions are incurred exactly once.)
|
||||||
|
|
||||||
|
Option 3
|
||||||
|
''''''''
|
||||||
|
|
||||||
|
TBD
|
||||||
|
|
||||||
|
|
||||||
|
Copyright
|
||||||
|
=========
|
||||||
|
|
||||||
|
This document has been placed in the public domain.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
..
|
||||||
|
Local Variables:
|
||||||
|
mode: indented-text
|
||||||
|
indent-tabs-mode: nil
|
||||||
|
sentence-end-double-space: t
|
||||||
|
fill-column: 70
|
||||||
|
coding: utf-8
|
||||||
|
End:
|
Loading…
Reference in New Issue