Near-total rewrite of the section describing the different semantic schools

Inspired by discussion on python-dev with Ka-Ping. Hopefully the different schools and their relative advantages and disadvantages are represented more correctly now.
2006-06-27 05:06:58 +00:00
parent f9467398b6
commit f36e51cc18
1 changed file with 107 additions and 56 deletions
--- a/pep-3103.txt
+++ b/pep-3103.txt
@@ -26,7 +26,7 @@ work done in that PEP.

 This PEP introduces canonical names for the many variants that have
 been discussed for different aspects of the syntax and semantics, such
-as "alternative 2", "school II", "Option 3" and so on.  Hopefully
+as "alternative 1", "school II", "option 3" and so on.  Hopefully
 these names will help the discussion.


@@ -223,8 +223,8 @@ like this::

 The `*` notation is similar to the use of prefix `*` already in use for
 variable-length parameter lists and for passing computed argument
-lists, and often proposed for value-unpacking (e.g.  "a, b, *c = X" as
-an alternative to "(a, b), c = X[:2], X[2:]").
+lists, and often proposed for value-unpacking (e.g.  ``a, b, *c = X`` as
+an alternative to ``(a, b), c = X[:2], X[2:]``).

 Alternative D
 -------------
@@ -270,8 +270,8 @@ used a lot more than Pascal these days.  Also, if a dispatch method
 based on dict lookup is chosen as the semantics, large ranges could be
 inefficient (consider range(1, sys.maxint)).

-All in all my preferences are (in descending preference) B, A, D', C
-where D' is D without the third possibility.
+All in all my preferences are (from most to least favorite) B, A, D',
+C, where D' is D without the third possibility.


 Semantics
@@ -283,66 +283,117 @@ semantics.
 If/Elif Chain vs. Dict-based Dispatch
 -------------------------------------

-There are two main schools of thought about the switch statement's
-semantics.  School I wants to define the switch statement in term of
-an equivalent if/elif chain.  School II prefers to think of it as a
-dispatch on a precomputed dictionary.
+There are several main schools of thought about the switch statement's
+semantics:

-The difference is mainly important when either the switch expression
-or one of the case expressions is not hashable; school I wants this to
-be handled as it would be by an if/elif chain (i.e. hashability of the
-expressions involved doesn't matter) while school II is willing to say
-that the switch expression and all the case expressions must be
-hashable if a switch is to be used; otherwise the user should have
-written an if/elif chain.
+- School I wants to define the switch statement in terms of an
+  equivalent if/elif chain (possibly with some optimization thrown
+  in).

-There's also a difference of opinion regarding the treatment of
-duplicate cases (i.e. two or more cases with the same match
-expression).  School I wants to treat this the same is an if/elif
-chain would treat it (i.e. the first match wins and the code for the
-second match is silently unreachable); school II generally wants this
-to be an error at the time the switch is frozen.
+- School II prefers to think of it as a dispatch on a precomputed
+  dict.  There are different choices for when the precomputation
+  happens.

-There's also a school III which states that the definition of a switch
-statement should be in terms of an equivalent if/elif chain, with the
-exception that all the expressions must be hashable.
+- There's also School III, which agrees with School I that the
+  definition of a switch statement should be in terms of an equivalent
+  if/elif chain, but concedes to the optimization camp that all
+  expressions involved must be hashable.

-School I believes that the if/elif chain is the only reasonable,
-surprise-free way of defining switch semantics, and that optimizations
-as suggested by PEP 275's Solution 1 are sufficient to make most
-common uses fast.  School I sees trouble in the approach of
-pre-freezing a dispatch dictionary because it places a new and unusual
-burden on programmers to understand exactly what kinds of case values
-are allowed to be frozen and when the case values will be frozen, or
-they might be surprised by the switch statement's behavior.
+We need to further separate School I into School Ia and School Ib:

-School II sees trouble in trying to achieve semantics that match
-those of an if/elif chain while optimizing the switch statement into
-a hash lookup in a dispatch dictionary.  In an if/elif chain, the
-test "x == y" might well be comparing two unhashable values
-(e.g. two lists); even "x == 1" could be comparing a user-defined
-class instance that is not hashable but happens to define equality to
-integers.  Worse, the hash function might have a bug or a side effect;
-if we generate code that believes the hash, a buggy hash might
-generate an incorrect match, and if we generate code that catches
-errors in the hash to fall back on an if/elif chain, we might hide
-genuine bugs.  In addition, school II sees little value in allowing
-cases involving unhashable values; after all if the user expects such
-values, they can just as easily write an if/elif chain.  School II
-also doesn't believe that it's fair to allow dead code due to
-overlapping cases to occur unflagged, when the dict-based dispatch
-implementation makes it so easy to trap this.
+- School Ia has a simple position: a switch statement is translated to
+  an equivalent if/elif chain, and that's that.  It should not be
+  linked to optimization at all.  That is also my main objection
+  against this school: without any hint of optimization, the switch
+  statement isn't attractive enough to warrant new syntax.

-School III admits the problems with making hash() optional, but still
-believes that the true semantics should be defined by an if/elif chain
-even if the implementation should be allowed to use dict-based
-dispatch as an optimization.  This means that duplicate cases must be
-resolved by always choosing the first case, making the second case
-undiagnosed dead code.
+- School Ib has a more complex position: it agrees with School II that
+  optimization is important, and is willing to concede the compiler
+  certain liberties to allow this.  (For example, PEP 275 Solution 1.)
+  In particular, hash() of the switch and case expressions may or may
+  not be called (so it should be side-effect-free); and the case
+  expressions may not be re-evaluated each time, as the if/elif chain
+  semantics would require, so the case expressions should also be
+  side-effect-free.  My objection to this (elaborated below) is that
+  if either the hash() or the case expressions aren't
+  side-effect-free, optimized and unoptimized code may behave
+  differently.
+
+School II grew out of the realization that optimization of commonly
+found cases isn't so easy, and that it's better to face this head on.
+This will become clear below.
+
+The differences between School I (mostly School Ib) and School II are
+fourfold:
+
+- When optimizing using a dispatch dict, if either the switch
+  expression or the case expressions are unhashable (in which case
+  hash() raises an exception), School Ib requires catching the hash()
+  failure and falling back to an if/elif chain.  School II simply lets
+  the exception happen.  The problem with catching an exception in
+  hash() as required by School Ib is that this may hide a genuine
+  bug.  A possible way out is to only use a dispatch dict if all case
+  expressions are ints, strings or other built-ins with known good
+  hash behavior, and to only attempt to hash the switch expression if
+  it is also one of those types.  Type objects should probably also be
+  supported here.  This is the (only) problem that School III
+  addresses.
+
+- When optimizing using a dispatch dict, if the hash() function of any
+  expression involved returns an incorrect value, under School Ib,
+  optimized code will not behave the same as unoptimized code.  This
+  is a well-known source of optimization-related bugs, which waste
+  lots of developer time.  Under School II, in this situation
+  incorrect results are produced at least consistently, which should
+  make debugging a bit easier.  The way out proposed for the previous
+  bullet would also help here.
+
+- School Ib doesn't have a good optimization strategy if the case
+  expressions are named constants.  The compiler cannot know their
+  values for sure, and it cannot know whether they are truly constant.
+  As a way out, it has been proposed to re-evaluate the expression
+  corresponding to the case once the dict has identified which case
+  should be taken, to verify that the value of the expression didn't
+  change.  But strictly speaking, all the case expressions occurring
+  before that case would also have to be checked, in order to preserve
+  the true if/elif chain semantics, thereby completely killing the
+  optimization.  Another proposed solution is to have callbacks
+  notifying the dispatch dict of changes in the value of variables or
+  attributes involved in the case expressions.  But this is unlikely to
+  be implementable in the general case, and would require many
+  namespaces to bear the burden of supporting such callbacks, which
+  currently don't exist at all.
+
+- Finally, there's a difference of opinion regarding the treatment of
+  duplicate cases (i.e. two or more cases with match expressions that
+  evaluate to the same value).  School I wants to treat this the same
+  as an if/elif chain would treat it (i.e. the first match wins and
+  the code for the second match is silently unreachable); School II
+  wants this to be an error at the time the dispatch dict is frozen
+  (so dead code doesn't go undiagnosed).
+
+School I sees trouble in School II's approach of pre-freezing a
+dispatch dict because it places a new and unusual burden on
+programmers to understand exactly what kinds of case values are
+allowed to be frozen and when the case values will be frozen, or they
+might be surprised by the switch statement's behavior.
+
+School II doesn't believe that School Ia's unoptimized switch is worth
+the effort, and it sees trouble in School Ib's proposal for
+optimization, which can cause optimized and unoptimized code to behave
+differently.
+
+In addition, School II sees little value in allowing cases involving
+unhashable values; after all, if the user expects such values, they can
+just as easily write an if/elif chain.  School II also doesn't believe
+that it's right to allow dead code due to overlapping cases to occur
+unflagged, when the dict-based dispatch implementation makes it so
+easy to trap this.

 Personally, I'm in school II: I believe that the dict-based dispatch
 is the one true implementation for switch statements and that we
-should face the limitiations and benefits up front.
+should face the limitations up front, so that we can reap maximal
+benefits.

 When to Freeze the Dispatch Dict
 --------------------------------
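For comparison, the behavioral difference between School Ia and School II described in this change can be sketched in plain Python. This is a minimal, hypothetical sketch: `switch_if_chain`, `freeze_dispatch`, `switch_dispatch` and `Loose` are invented names, not part of PEP 3103 or of Python, and a switch is modelled as an ordered list of (case value, result) pairs rather than new syntax. `Loose` mirrors the earlier text's example of an unhashable object that happens to compare equal to integers.

```python
# Hypothetical sketch contrasting two semantic schools from PEP 3103.
# None of these names come from the PEP; a "switch" is modelled as an
# ordered list of (case_value, result) pairs.


def switch_if_chain(value, cases, default=None):
    """School Ia: an if/elif chain, tested top to bottom with ==.

    hash() is never called, so unhashable values work; a later
    duplicate case is silently unreachable.
    """
    for case_value, result in cases:
        if value == case_value:
            return result
    return default


def freeze_dispatch(cases):
    """School II: precompute ("freeze") the dispatch dict once.

    The membership test and dict insertion call hash() on every case
    value, so an unhashable case raises TypeError here; a duplicate
    case is reported as an error instead of becoming dead code.
    """
    table = {}
    for case_value, result in cases:
        if case_value in table:
            raise ValueError(f"duplicate case: {case_value!r}")
        table[case_value] = result
    return table


def switch_dispatch(value, table, default=None):
    """School II: a single dict lookup. If the switch value is
    unhashable, the TypeError propagates; there is no fallback."""
    return table.get(value, default)


class Loose:
    """Equal to the integer 1 but unhashable: defining __eq__ without
    __hash__ sets __hash__ to None."""

    def __eq__(self, other):
        return other == 1


if __name__ == "__main__":
    cases = [(1, "one"), (2, "two"), (1, "uno")]
    print(switch_if_chain(1, cases))  # one  (the second case 1 is dead code)
    try:
        freeze_dispatch(cases)
    except ValueError as exc:
        print(exc)  # duplicate case: 1

    table = freeze_dispatch([(1, "one"), (2, "two")])
    print(switch_dispatch(2, table))  # two

    # The if/elif chain matches the unhashable object via __eq__ ...
    print(switch_if_chain(Loose(), [(1, "one")]))  # one
    # ... while dict-based dispatch lets the hash() failure surface.
    try:
        switch_dispatch(Loose(), table)
    except TypeError as exc:
        print("TypeError:", exc)
```

Because the duplicate check relies on dict key equality, cases such as `1` and `1.0` also count as duplicates, matching the wording "match expressions that evaluate to the same value". School Ib would sit between the two functions: try `switch_dispatch`, and on a `TypeError` from hash() fall back to `switch_if_chain`, which is exactly the exception swallowing the text warns may hide genuine bugs.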