diff --git a/pep-3103.txt b/pep-3103.txt index de55d4324..17c2c410e 100644 --- a/pep-3103.txt +++ b/pep-3103.txt @@ -26,7 +26,7 @@ work done in that PEP. This PEP introduces canonical names for the many variants that have been discussed for different aspects of the syntax and semantics, such -as "alternative 2", "school II", "Option 3" and so on. Hopefully +as "alternative 1", "school II", "option 3" and so on. Hopefully these names will help the discussion. @@ -223,8 +223,8 @@ like this:: The `*` notation is similar to the use of prefix `*` already in use for variable-length parameter lists and for passing computed argument -lists, and often proposed for value-unpacking (e.g. "a, b, *c = X" as -an alternative to "(a, b), c = X[:2], X[2:]"). +lists, and often proposed for value-unpacking (e.g. ``a, b, *c = X`` as +an alternative to ``(a, b), c = X[:2], X[2:]``). Alternative D ------------- @@ -270,8 +270,8 @@ used a lot more than Pascal these days. Also, if a dispatch method based on dict lookup is chosen as the semantics, large ranges could be inefficient (consider range(1, sys.maxint)). -All in all my preferences are (in descending preference) B, A, D', C -where D' is D without the third possibility. +All in all my preferences are (from most to least favorite) B, A, D', +C, where D' is D without the third possibility. Semantics @@ -283,66 +283,117 @@ semantics. If/Elif Chain vs. Dict-based Dispatch ------------------------------------- -There are two main schools of thought about the switch statement's -semantics. School I wants to define the switch statement in term of -an equivalent if/elif chain. School II prefers to think of it as a -dispatch on a precomputed dictionary. +There are several main schools of thought about the switch statement's +semantics: -The difference is mainly important when either the switch expression -or one of the case expressions is not hashable; school I wants this to -be handled as it would be by an if/elif chain (i.e. hashability of the -expressions involved doesn't matter) while school II is willing to say -that the switch expression and all the case expressions must be -hashable if a switch is to be used; otherwise the user should have -written an if/elif chain. +- School I wants to define the switch statement in term of an + equivalent if/elif chain (possibly with some optimization thrown + in). -There's also a difference of opinion regarding the treatment of -duplicate cases (i.e. two or more cases with the same match -expression). School I wants to treat this the same is an if/elif -chain would treat it (i.e. the first match wins and the code for the -second match is silently unreachable); school II generally wants this -to be an error at the time the switch is frozen. +- School II prefers to think of it as a dispatch on a precomputed + dict. There are different choices for when the precomputation + happens. -There's also a school III which states that the definition of a switch -statement should be in terms of an equivalent if/elif chain, with the -exception that all the expressions must be hashable. +- There's also school III, which agrees with School I that the + definition of a switch statement should be in terms of an equivalent + if/elif chain, but concedes to the optimization camp that all + expressions involved must be hashable. -School I believes that the if/elif chain is the only reasonable, -surprise-free way of defining switch semantics, and that optimizations -as suggested by PEP 275's Solution 1 are sufficient to make most -common uses fast. School I sees trouble in the approach of -pre-freezing a dispatch dictionary because it places a new and unusual -burden on programmers to understand exactly what kinds of case values -are allowed to be frozen and when the case values will be frozen, or -they might be surprised by the switch statement's behavior. +We need to further separate School I into School Ia and School Ib: -School II sees trouble in trying to achieve semantics that match -those of an if/elif chain while optimizing the switch statement into -a hash lookup in a dispatch dictionary. In an if/elif chain, the -test "x == y" might well be comparing two unhashable values -(e.g. two lists); even "x == 1" could be comparing a user-defined -class instance that is not hashable but happens to define equality to -integers. Worse, the hash function might have a bug or a side effect; -if we generate code that believes the hash, a buggy hash might -generate an incorrect match, and if we generate code that catches -errors in the hash to fall back on an if/elif chain, we might hide -genuine bugs. In addition, school II sees little value in allowing -cases involving unhashable values; after all if the user expects such -values, they can just as easily write an if/elif chain. School II -also doesn't believe that it's fair to allow dead code due to -overlapping cases to occur unflagged, when the dict-based dispatch -implementation makes it so easy to trap this. +- School Ia has a simple position: a switch statement is translated to + an equivalent if/elif chain, and that's that. It should not be + linked to optimization at all. That is also my main objection + against this school: without any hint of optimization, the switch + statement isn't attractive enough to warrant new syntax. -School III admits the problems with making hash() optional, but still -believes that the true semantics should be defined by an if/elif chain -even if the implementation should be allowed to use dict-based -dispatch as an optimization. This means that duplicate cases must be -resolved by always choosing the first case, making the second case -undiagnosed dead code. +- School Ib has a more complex position: it agrees with School II that + optimization is important, and is willing to concede the compiler + certain liberties to allow this. (For example, PEP 275 Solution 1.) + In particular, hash() of the switch and case expressions may or may + not be called (so it should be side-effect-free); and the case + expressions may not be evaluated each time as expected by the + if/elif chain behavior, so the case expressions should also be + side-effect free. My objection to this (elaborated below) is that + if either the hash() or the case expressions aren't + side-effect-free, optimized and unoptimized code may behave + differently. + +School II grew out of the realization that optimization of commonly +found cases isn't so easy, and that it's better to face this head on. +This will become clear below. + +The differences between School I (mostly School Ib) and School II are +threefold: + +- When optimizing using a dispatch dict, if either the switch + expression or the case expressions are unhashable (in which case + hash() raises an exception), School Ib requires catching the hash() + failure and falling back to an if/elif chain. School II simply lets + the exception happen. The problem with catching an exception in + hash() as required by School Ib, is that this may hide a genuine + bug. A possible way out is to only use a dispatch dict if all case + expressions are ints, strings or other built-ins with known good + hash behavior, and to only attempt to hash the switch expression if + it is also one of those types. Type objects should probably also be + supported here. This is the (only) problem that School III + addresses. + +- When optimizing using a dispatch dict, if the hash() function of any + expression involved returns an incorrect value, under school Ib, + optimized code will not behave the same as unoptimized code. This + is a well-known problem with optimization-related bugs, and waste + lots of developer time. Under School II, in this situation + incorrect results are produced at least consistently, which should + make debugging a bit easier. The way out proposed for the previous + bullet would also help here. + +- School Ib doesn't have a good optimization strategy if the case + expressions are named constants. The compiler cannot know their + values for sure, and it cannot know whether they are truly constant. + As a way out, it has been proposed to re-evaluate the expression + corresponding to the case once the dict has identified which case + should be taken, to verify that the value of the expression didn't + change. But strictly speaking, all the case expressions occurring + before that case would also have to be checked, in order to preserve + the true if/elif chain semantics, thereby completely killing the + optimization. Another proposed solution is to have callbacks + notifying the dispatch dict of changes in the value of variables or + attributes involved in the case expressions. But this is not likely + implementable in the general case, and would require many namespaces + to bear the burden of supporting such callbacks, which currently + don't exist at all. + +- Finally, there's a difference of opinion regarding the treatment of + duplicate cases (i.e. two or more cases with match expressions that + evaluates to the same value). School I wants to treat this the same + is an if/elif chain would treat it (i.e. the first match wins and + the code for the second match is silently unreachable); School II + wants this to be an error at the time the dispatch dict is frozen + (so dead code doesn't go undiagnosed). + +School I sees trouble in School II's approach of pre-freezing a +dispatch dict because it places a new and unusual burden on +programmers to understand exactly what kinds of case values are +allowed to be frozen and when the case values will be frozen, or they +might be surprised by the switch statement's behavior. + +School II doesn't believe that School Ia's unoptimized switch is worth +the effort, and it sees trouble in School Ib's proposal for +optimization, which can cause optimized and unoptimized code to behave +differently. + +In addition, school II sees little value in allowing cases involving +unhashable values; after all if the user expects such values, they can +just as easily write an if/elif chain. School II also doesn't believe +that it's right to allow dead code due to overlapping cases to occur +unflagged, when the dict-based dispatch implementation makes it so +easy to trap this. Personally, I'm in school II: I believe that the dict-based dispatch is the one true implementation for switch statements and that we -should face the limitiations and benefits up front. +should face the limitiations up front, so that we can reap maximal +benefits. When to Freeze the Dispatch Dict --------------------------------