PEP: 3103
Title: A Switch/Case Statement
Version: $Revision$
Last-Modified: $Date$
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 3.0
Content-Type: text/x-rst
Created: 25-Jun-2006
Post-History: never


Abstract
========

Python-dev has recently seen a flurry of discussion on adding a switch
statement.  In this PEP I'm trying to extract my own preferences from
the smorgasbord of proposals, discussing alternatives and explaining
my choices where I can.  I'll also indicate how strongly I feel about
alternatives I discuss.

This PEP should be seen as an alternative to PEP 275.  My views are
somewhat different from those of that PEP's author, but I'm grateful
for the work done in that PEP.


Rationale
=========

A common programming idiom is to consider an expression and do
different things depending on its value.  This is usually done with a
chain of if/elif tests; I'll refer to this form as the "if/elif
chain".  There are two main motivations to want to introduce new
syntax for this idiom:

- It is repetitive: the variable and the test operator, usually '=='
  or 'in', are repeated in each if/elif branch.

- It is inefficient: when an expression matches the last test value
  (or no test value at all) it is compared to each of the preceding
  test values.

Both of these complaints are relatively mild; there isn't a lot of
readability or performance to be gained by writing this differently.
Yet, some kind of switch statement is found in many languages and it
is not unreasonable to expect that its addition to Python will allow
us to write certain code more cleanly and efficiently than before.

There are forms of dispatch that are not suitable for the proposed
switch statement; for example, when the number of cases is not
statically known, or when it is desirable to place the code for
different cases in different classes or files.


Basic Syntax
============

I'm considering several variants of the syntax first proposed in PEP
275 here.
There are lots of other possibilities, but I don't see that
they add anything.  My current preference is alternative 2.

I should note that all alternatives here have the "implicit break"
property: at the end of the suite for a particular case, the control
flow jumps to the end of the whole switch statement.  There is no way
to pass control from one case to another.  This is in contrast to C,
where an explicit 'break' statement is required to prevent falling
through to the next case.

In all alternatives, the else-suite is optional.  It is more Pythonic
to use 'else' here rather than introducing a new reserved word,
'default', as in C.

Semantics are discussed in the next top-level section.

Alternative 1
-------------

This is the preferred form in PEP 275::

    switch EXPR:
        case EXPR:
            SUITE
        case EXPR:
            SUITE
        ...
        else:
            SUITE

The main downside is that the suites where all the action is are
indented two levels deep.

Alternative 2
-------------

This is Fredrik Lundh's preferred form; it differs by not indenting
the cases::

    switch EXPR:
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

Alternative 3
-------------

This is the same as alternative 2 but leaves out the colon after the
switch::

    switch EXPR
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

The hope of this alternative is that it will upset the auto-indent
logic of the average Python-aware text editor less.  But it looks
strange to me.

Alternative 4
-------------

This leaves out the 'case' keyword on the basis that it is redundant::

    switch EXPR:
        EXPR:
            SUITE
        EXPR:
            SUITE
        ...
        else:
            SUITE

Unfortunately now we are forced to indent the case expressions,
because otherwise (at least in the absence of an 'else' keyword) the
parser would have a hard time distinguishing between an unindented
case expression (which continues the switch statement) and an
unrelated statement that starts like an expression (such as an
assignment or a procedure call).  The parser is not smart enough to
backtrack once it sees the colon.
This is my least favorite alternative.


Extended Syntax
===============

There is one additional concern that needs to be addressed
syntactically.  Often two or more values need to be treated the same.
In C, this is done by writing multiple case labels together without
any code between them.  The "fall through" semantics then mean that
these are all handled by the same code.  Since the Python switch will
not have fall-through semantics (which have yet to find a champion) we
need another solution.  Here are some alternatives.

Alternative A
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions.  This is interpreted so that if EXPR
is a parenthesized tuple or another expression whose value is a tuple,
the switch expression must equal that tuple, not one of its elements.
This means that we cannot use a variable to indicate multiple cases.
While this is also true in C's switch statement, it is a relatively
common occurrence in Python (see for example sre_compile.py).

Alternative B
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case in EXPR_LIST:

to match on multiple expressions.  If EXPR_LIST is a single
expression, the 'in' forces its interpretation as an iterable (or
something supporting __contains__, in a minority semantics
alternative).  If it is multiple expressions, each of those is
considered for a match.

Alternative C
-------------

Use::

    case EXPR:

to match on a single expression; use::

    case EXPR, EXPR, ...:

to match on multiple expressions (as in alternative A); and use::

    case *EXPR:

to match on the elements of an expression whose value is an iterable.
The latter two cases can be combined, so that the true syntax is more
like this::

    case [*]EXPR, [*]EXPR, ...:

Note that the * notation is similar to the use of prefix * already in
use for variable-length parameter lists and for passing computed
argument lists, and often proposed for value-unpacking (e.g.
"a, b, *c = X" as an alternative to "(a, b), c = X[:2], X[2:]").

Alternative D
-------------

This is a mixture of alternatives B and C; the syntax is like
alternative B but instead of the 'in' keyword it uses '*'.  This is
more limited, but still allows the same flexibility.  It uses::

    case EXPR:

to match on a single expression and::

    case *EXPR:

to match on the elements of an iterable.  If one wants to specify
multiple matches in one case, one can write this::

    case *(EXPR, EXPR, ...):

or perhaps this (although it's a bit strange because the relative
priority of '*' and ',' is different than elsewhere)::

    case * EXPR, EXPR, ...:

Discussion
----------

Alternatives B, C and D are motivated by the desire to specify
multiple cases with the same treatment using a variable representing a
set (usually a tuple) rather than spelling them out.  The motivation
for this is usually that if one has several switches over the same set
of cases it's a shame to have to spell out all the alternatives each
time.  An additional motivation is to be able to specify *ranges* to
be matched easily and efficiently, similar to Pascal's "1..1000:"
notation.  At the same time we want to prevent the kind of mistake
that is common in exception handling (and which will be addressed in
Python 3000 by changing the syntax of the except clause): writing
"case 1, 2:" where "case (1, 2):" was meant, or vice versa.

The case could be made that the need is insufficient for the added
complexity; C doesn't have a way to express ranges either, and it's
used a lot more than Pascal these days.  Also, if a dispatch method
based on dict lookup is chosen as the semantics, large ranges could be
inefficient (consider range(1, sys.maxint)).

All in all my preferences are (in descending preference) B, A, D', C
where D' is D without the third possibility.


Semantics
=========

There are several issues to review before we can choose the right
semantics.

If/Elif Chain vs. Dict-based Dispatch
-------------------------------------

There are two main schools of thought about the switch statement's
semantics.  School I wants to define the switch statement in terms of
an equivalent if/elif chain.  School II prefers to think of it as a
dispatch on a precomputed dictionary.

The difference is mainly important when either the switch expression
or one of the case expressions is not hashable; school I wants this to
be handled as it would be by an if/elif chain (i.e. hashability of the
expressions involved doesn't matter) while school II is willing to say
that the switch expression and all the case expressions must be
hashable if a switch is to be used; otherwise the user should have
written an if/elif chain.

There's also a difference of opinion regarding the treatment of
duplicate cases (i.e. two or more cases with the same match
expression).  School I wants to treat this the same as an if/elif
chain would treat it (i.e. the first match wins and the code for the
second match is silently unreachable); school II generally wants this
to be an error at the time the switch is frozen.

There's also a school III which states that the definition of a switch
statement should be in terms of an equivalent if/elif chain, with the
exception that all the expressions must be hashable.

School I believes that the if/elif chain is the only reasonable,
surprise-free way of defining switch semantics, and that optimizations
as suggested by PEP 275's Solution 1 are sufficient to make most
common uses fast.

School II sees nothing but trouble in that approach: in an if/elif
chain, the test "x == y" might well be comparing two unhashable values
(e.g. two lists); even "x == 1" could be comparing a user-defined
class instance that is not hashable but happens to define equality to
integers.
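
The situation just described can be made concrete with a small sketch.
It is written in present-day Python and is not part of any proposal;
the names ``Code``, ``via_chain``, ``via_dict`` and ``DISPATCH`` are
invented for this illustration.  The instance compares equal to an
integer, so the if/elif chain handles it, while a lookup in a
precomputed dict fails because the instance cannot be hashed::

    class Code:
        """Hypothetical type: equal to an int, but not hashable."""

        __hash__ = None  # explicitly mark instances as unhashable

        def __init__(self, value):
            self.value = value

        def __eq__(self, other):
            return self.value == other


    def via_chain(x):
        # School I view: a plain if/elif chain, which only needs '=='.
        if x == 1:
            return "one"
        elif x == 2:
            return "two"
        else:
            return "other"


    # School II view: the cases live in a dict built ahead of time.
    DISPATCH = {1: "one", 2: "two"}


    def via_dict(x):
        # A single lookup; this calls hash(x), so x must be hashable.
        return DISPATCH.get(x, "other")


    print(via_chain(Code(2)))      # the chain finds the match
    try:
        via_dict(Code(2))
    except TypeError as exc:
        print("dict dispatch failed:", exc)
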
Worse, the hash function might have a bug or a side effect;
if we generate code that believes the hash, a buggy hash might
generate an incorrect match, and if we generate code that catches
errors in the hash to fall back on an if/elif chain, we might hide
genuine bugs.

In addition, school II sees little value in allowing cases involving
unhashable values; after all if the user expects such values, they can
just as easily write an if/elif chain.  School II also doesn't believe
that it's fair to allow dead code due to overlapping cases to occur
unflagged, when the dict-based dispatch implementation makes it so
easy to trap this.

School III admits the problems with making hash() optional, but still
believes that the true semantics should be defined by an if/elif chain
even if the implementation should be allowed to use dict-based
dispatch as an optimization.  This means that duplicate cases must be
resolved by always choosing the first case, making the second case
undiagnosed dead code.

Personally, I'm in school II: I believe that the dict-based dispatch
is the one true implementation for switch statements and that we
should face the limitations and benefits up front.

When to Freeze the Dispatch Dict
--------------------------------

For the supporters of school II (dict-based dispatch), the next big
dividing issue is when to create the dict used for switching.  I call
this "freezing the dict".

The main problem that makes this interesting is the observation that
Python doesn't have named compile-time constants.  What is
conceptually a constant, such as re.IGNORECASE, is a variable to the
compiler, and there's nothing to stop crooked code from modifying its
value.

Option 1
''''''''

The most limiting option is to freeze the dict in the compiler.
This would require that the case expressions are all literals or
compile-time expressions involving only literals and operators whose
semantics are known to the compiler, since with the current state of
Python's dynamic semantics and single-module compilation, there is no
hope for the compiler to know with sufficient certainty the values of
any variables occurring in such expressions.  This is widely though
not universally considered too restrictive.

Raymond Hettinger is the main advocate of this approach.  He proposes
a syntax where only a single literal of certain types is allowed as
the case expression.  It has the advantage of being unambiguous and
easy to implement.

My main complaint about this is that by disallowing "named constants"
we force programmers to give up good habits.  Named constants are
introduced in most languages to solve the problem of "magic numbers"
occurring in the source code.  For example, sys.maxint is a lot more
readable than 2147483647.  Raymond proposes to use string literals
instead of named "enums", observing that the string literal's content
can be the name that the constant would otherwise have.  Thus, we
could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:".
However, if there is a spelling error in the string literal, the case
will silently be ignored, and who knows when the bug will be detected.
If there is a spelling error in a NAME, however, the error will be
caught as soon as it is evaluated.  Also, sometimes the constants are
externally defined (e.g. when parsing a file format like JPEG) and we
can't easily choose appropriate string values.  Using an explicit
mapping dict sounds like a poor hack.

Option 2
''''''''

The oldest proposal to deal with this is to freeze the dispatch dict
the first time the switch is executed.
At this point we can assume that all the named "constants" (constant
in the programmer's mind, though not to the compiler) used as case
expressions are defined -- otherwise an if/elif chain would have
little chance of success either.  Assuming the switch will be executed
many times, the extra work done the first time is quickly repaid by
very fast dispatch later.

A mostly theoretical objection to this option is that there is no
obvious object where the dispatch dict can be stored.  It can't be
stored on the code object, which is supposed to be immutable; it can't
be stored on the function object, since many function objects may be
created for the same function (e.g. for nested functions).  In
practice, I'm sure that something can be found; it could be stored in
a section of the code object that's not considered when comparing two
code objects or when pickling or marshalling a code object; or all
switches could be stored in a dict indexed by weak references to code
objects.

Another objection is that the first-use rule allows obfuscated code
like this::

    def foo(x, y):
        switch x:
        case y:
            print 42

To the untrained eye (not familiar with Python) this code would be
equivalent to this::

    def foo(x, y):
        if x == y:
            print 42

but that's not what it does (unless it is always called with the same
value as the second argument).  This has been addressed by suggesting
that the case expressions should not be allowed to reference local
variables.  But this is somewhat arbitrary.

A final objection is that in a multi-threaded application, the
first-use rule requires intricate locking in order to guarantee the
correct semantics.  (The first-use rule suggests a promise that side
effects of case expressions are incurred exactly once.)

Option 3
''''''''

TBD


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: