diff --git a/pep-3103.txt b/pep-3103.txt new file mode 100644 index 000000000..abb0c33f1 --- /dev/null +++ b/pep-3103.txt @@ -0,0 +1,449 @@ +PEP: 3103 +Title: A Switch/Case Statement +Version: $Revision$ +Last-Modified: $Date$ +Author: guido@python.org (Guido van Rossum) +Status: Draft +Type: Standards Track +Python-Version: 3.0 +Content-Type: text/x-rst +Created: 25-Jun-2006 +Post-History: never + + +Abstract +======== + +Python-dev has recently seen a flurry of discussion on adding a switch +statement. In this PEP I'm trying to extract my own preferences from +the smorgasboard of proposals, discussing alternatives and explaining +my choices where I can. I'll also indicate how strongly I feel about +alternatives I discuss. + +This PEP should be seen as an alternative to PEP 275. My views are +somewhat different from that PEP's author, but I'm grateful for the +work done in that PEP. + + +Rationale +========= + +A common programming idiom is to consider an expression and do +different things depending on its value. This is usually done with a +chain of if/elif tests; I'll refer to this form as the "if/elif +chain". There are two main motivations to want to introduce new +syntax for this idiom: + +- It is repetitive: the variable and the test operator, usually '==' + or 'in', are repeated in each if/elif branch. + +- It is inefficient: when an expressaion matches the last test value + (or no test value at all) it is compared to each of the preceding + test values. + +Both of these complaints are relatively mild; there isn't a lot of +readability or performance to be gained by writing this differently. +Yet, some kind of switch statement is found in many languages and it +is not unreasonable to expect that its addition to Python will allow +us to write up certain code more cleanly and efficiently than before. + +There are forms of dispatch that are not suitable for the proposed +switch statement; for example, when the number of cases is not +statically known, or when it is desirable to place the code for +different cases in different classes or files. + + +Basic Syntax +============ + +I'm considering several variants of the syntax first proposed in PEP +275 here. There are lots of other possibilities, but I don't see that +they add anything. + +My current preference is alternative 2. + +I should not that all alternatives here have the "implicit break" +property: at the end of the suite for a particular case, the control +flow jumps to the end of the whole switch statement. There is no way +to pass control from one case to another. This in contrast to C, +where an explicit 'break' statement is required to prevent falling +through to the next case. + +In all alternatives, the else-suite is optional. It is more Pythonic +to use 'else' here rather than introducing a new reserved word, +'default', as in C. + +Semantics are discussed in the next top-level section. + +Alternative 1 +------------- + +This is the preferred form in PEP 275:: + + switch EXPR: + case EXPR: + SUITE + case EXPR: + SUITE + ... + else: + SUITE + +The main downside is that the suites where all the action is are +indented two levels deep. + +Alternative 2 +------------- + +This is Fredrik Lundh's preferred form; it differs by not indenting +the cases:: + + switch EXPR: + case EXPR: + SUITE + case EXPR: + SUITE + .... + else: + SUITE + +Alternative 3 +------------- + +This is the same as alternative 2 but leaves out the colon after the +switch:: + + switch EXPR + case EXPR: + SUITE + case EXPR: + SUITE + .... + else: + SUITE + +The hope of this alternative is that is will upset the auto-indent +logic of the average Python-aware text editor less. But it looks +strange to me. + +Alternative 4 +------------- + +This leaves out the 'case' keyword on the basis that it is redundant:: + + switch EXPR: + EXPR: + SUITE + EXPR: + SUITE + ... + else: + SUITE + +Unfortunately now we are forced to indent the case expressions, +because otherwise (at least in the absence of an 'else' keyword) the +parser would have a hard time distinguishing between an unindented +case expression (which continues the switch statement) or an unrelated +statement that starts like an expression (such as an assignment or a +procedure call). The parser is not smart enough to backtrack once it +sees the colon. This is my least favorite alternative. + + +Extended Syntax +=============== + +There is one additional concern that needs to be addressed +syntactically. Often two or more values need to be treated the same. +In C, this done by writing multiple case labels together without any +code between them. The "fall through" semantics then mean that these +are all handled by the same code. Since the Python switch will not +have fall-through semantics (which have yet to find a champion) we +need another solution. Here are some alternatives. + +Alternative A +------------- + +Use:: + + case EXPR: + +to match on a single expression; use:: + + case EXPR, EXPR, ...: + +to match on mulltiple expressions. The is interpreted so that if EXPR +is a parenthesized tuple or another expression whose value is a tuple, +the switch expression must equal that tuple, not one of its elements. +This means that we cannot use a variable to indicate multiple cases. +While this is also true in C's switch statement, it is a relatively +common occurrence in Python (see for example sre_compile.py). + +Alternative B +------------- + +Use:: + + case EXPR: + +to match on a single expression; use:: + + case in EXPR_LIST: + +to match on multiple expressions. If EXPR_LIST is a single +expression, the 'in' forces its interpretation as an iterable (or +something supporting __contains__, in a minority semantics +alternative). If it is multiple expressions, each of those is +considered for a match. + +Alternative C +------------- + +Use:: + + case EXPR: + +to match on a single expression; use:: + + case EXPR, EXPR, ...: + +to match on multiple expressions (as in alternative A); and use:: + + case *EXPR: + +to match on the elements of an expression whose value is an iterable. +The latter two cases can be combined, so that the true syntax is more +like this:: + + case [*]EXPR, [*]EXPR, ...: + +Note that the * notation is similar to the use of prefix * already in +use for variable-length parameter lists and for passing computed +argument lists, and often proposed for value-unpacking (e.g. "a, b, +*c = X" as an alternative to "(a, b), c = X[:2], X[2:]"). + +Alternative D +------------- + +This is a mixture of alternatives B and C; the syntax is like +alternative B but instead of the 'in' keyword it uses '*'. This is +more limited, but still allows the same flexibility. It uses:: + + case EXPR: + +to match on a single expression and:: + + case *EXPR: + +to match on the elements of an iterable. If one wants to specify +multiple matches in one case, one can write this:: + + case *(EXPR, EXPR, ...): + +or perhaps this (although it's a bit strange because the relative +priority of '*' and ',' is different than elsewhere):: + + case * EXPR, EXPR, ...: + +Discussion +---------- + +Alternatives B, C and D are motivated by the desire to specify +multiple cases with the same treatment using a variable representing a +set (usually a tuple) rather than spelling them out. The motivation +for this is usually that if one has several switches over the same set +of cases it's a shame to have to spell out all the alternatives each +time. An additional motivation is to be able to specify *ranges* to +be matched easily and efficiently, similar to Pascal's "1..1000:" +notation. At the same time we want to prevent the kind of mistake +that is common in exception handling (and which will be addressed in +Python 3000 by changing the syntax of the except clause): writing +"case 1, 2:" where "case (1, 2):" was meant, or vice versa. + +The case could be made that the need is insufficient for the added +complexity; C doesn't have a way to express ranges either, and it's +used a lot more than Pascal these days. Also, if a dispatch method +based on dict lookup is chosen as the semantics, large ranges could be +inefficient (consider range(1, sys.maxint)). + +All in all my preferences are (in descending preference) B, A, D', C +where D' is D without the third possibility. + + +Semantics +========= + +There are several issues to review before we can choose the right +semantics. + +If/Elif Chain vs. Dict-based Dispatch +------------------------------------- + +There are two main schools of thought about the switch statement's +semantics. School I wants to define the switch statement in term of +an equivalent if/elif chain. School II prefers to think of it as a +dispatch on a precomputed dictionary. + +The difference is mainly important when either the switch expression +or one of the case expressions is not hashable; school I wants this to +be handled as it would be by an if/elif chain (i.e. hashability of the +expressions involved doesn't matter) while school II is willing to say +that the switch expression and all the case expressions must be +hashable if a switch is to be used; otherwise the user should have +written an if/elif chain. + +There's also a difference of opinion regarding the treatment of +duplicate cases (i.e. two or more cases with the same match +expression). School I wants to treat this the same is an if/elif +chain would treat it (i.e. the first match wins and the code for the +second match is silently unreachable); school II generally wants this +to be an error at the time the switch is frozen. + +There's also a school III which states that the definition of a switch +statement should be in terms of an equivalent if/elif chain, with the +exception that all the expressions must be hashable. + +School I believes that the if/elif chain is the only reasonably, +surprise-free of defining switch semantics, and that optimizations as +suggested by PEP 275's Solution 1 are sufficient to make most common +uses fast. + +School II sees nothing but trouble in that approach: in an if/elif +chain, the test "x == y" might well be comparing two unhashable values +(e.g. two lists); even "x == 1" could be comparing a user-defined +class instance that is not hashable but happens to define equality to +integers. Worse, the hash function might have a bug or a side effect; +if we generate code that believes the hash, a buggy hash might +generate an incorrect match, and if we generate code that catches +errors in the hash to fall back on an if/elif chain, we might hide +genuine bugs. In addition, school II sees little value in allowing +cases involving unhashable values; after all if the user expects such +values, they can just as easily write an if/elif chain. School II +also doesn't believe that it's fair to allow dead code due to +overlappin cases to occur unflagged, when the dict-based dispatch +implementation makes it so easy to trap this. + +School III admits the problems with making hash() optional, but still +believes that the true semantics should be defined by an if/elif chain +even if the implementation should be allowed to use dict-based +dispatch as an optimization. This means that duplicate cases must be +resolved by always choosing the first case, making the second case +undiagnosed dead code. + +Personally, I'm in school II: I believe that the dict-based dispatch +is the one true implementation for switch statements and that we +should face the limitiations and benefits up front. + +When to Freeze the Dispatch Dict +-------------------------------- + +For the supporters of school II (dict-based dispatch), the next big +dividing issue is when to create the dict used for switching. I call +this "freezing the dict". + +The main problem that makes this interesting is the observation that +Python doesn't have named compile-time constants. What is +conceptually a constant, such as re.IGNORECASE, is a variable to the +compiler, and there's nothing to stop crooked code from modifying its +value. + +Option 1 +'''''''' + +The most limiting option is to freeze the dict in the compiler. This +would require that the case expressions are all literals or +compile-time expressions involving only literals and operators whose +semantics are known to the compiler, since with the current state of +Python's dynamic semantics and single-module compilation, there is no +hope for the compiler to know with sufficient certainty the values of +any variables occurring in such expressions. This is widely though +not universally considered too restrictive. + +Raymond Hettinger is the main advocate of this approach. He proposes +a syntax where only a single literal of certain types is allowed as +the case expression. It has the advantage of being unambiguous and +easy to implement. + +My may complaint about this is that by disallowing "named constants" +we force programmers to give up good habits. Named constants are +introduced in most languages to solve the problem of "magic numbers" +occurring in the source code. For example, sys.maxint is a lot more +readable than 2147483647. Raymond proposes to use string literals +instead of named "enums", observing that the string literal's content +can be the name that the constant would otherwise have. Thus, we +could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:". +However, if there is a spelling error in the string literal, the case +will silently be ignored, and who knows when the bug is detected. If +there is a spelling error in a NAME, however, the error will be caught +as soon as it is evaluated. Also, sometimes the constants are +externally defined (e.g. when parsing an file format like JPEG) and we +can't easily choose appropriate string values. Using an explicit +mappping dict sounds like a poor hack. + +Option 2 +'''''''' + +The oldest proposal to deal with this is to freeze the dispatch dict +the first time the switch is executed. At this point we can assume +that all the named "constants" (constant in the programmer's mind, +though not to the compiler) used as case expressions are defined -- +otherwise an if/elif chain would have little chance of success either. +Assuming the switch will be executed many times, doing some extra work +the first time pays back quickly by very quick dispatch times later. + +A mostly theoretical objection to this option is that there is no +obvious object where the dispatch dict can be stored. It can't be +stored on the code object, which is supposed to be immutable; it can't +be stored on the function object, since many function objects may be +created for the same function (e.g. for nested functions). In +practice, I'm sure that something can be found; it could be stored in +a section of the code object that's not considered when comparing two +code objects or when pickling or marshalling a code object; or all +switches could be stored in a dict indexed by weak references to code +objects. + +Another objection is that the first-use rule allows obfuscated code +like this:: + + def foo(x, y): + switch x: + case y: + print 42 + +To the untrained eye (not familiar with Python) this code would be +equivalent to this:: + + def foo(x, y): + if x == y: + print 42 + +but that's not what it does (unless it is always called with the same +value as the second argument). This has been addressed by suggesting +that the case expressions should not be allowed to reference local +variables. But this is somewhat arbitrary. + +A final objection is that in a multi-threaded application, the +first-use rule requires intricate locking in order to guarantee the +correct semantics. (The first-use rule suggests a promise that side +effects of case expressions are incurred exactly once.) + +Option 3 +'''''''' + +TBD + + +Copyright +========= + +This document has been placed in the public domain. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: