Near-total rewrite of the section describing the different semantic
schools, inspired by discussion on python-dev with Ka-Ping.  Hopefully
the different schools and their relative advantages and disadvantages
are represented more correctly now.
Guido van Rossum 2006-06-27 05:06:58 +00:00
parent f9467398b6
commit f36e51cc18
1 changed files with 107 additions and 56 deletions

@ -26,7 +26,7 @@ work done in that PEP.
This PEP introduces canonical names for the many variants that have
been discussed for different aspects of the syntax and semantics, such
as "alternative 2", "school II", "Option 3" and so on. Hopefully
as "alternative 1", "school II", "option 3" and so on. Hopefully
these names will help the discussion.
@ -223,8 +223,8 @@ like this::
The `*` notation is similar to the use of prefix `*` already in use for
variable-length parameter lists and for passing computed argument
lists, and often proposed for value-unpacking (e.g. "a, b, *c = X" as
an alternative to "(a, b), c = X[:2], X[2:]").
lists, and often proposed for value-unpacking (e.g. ``a, b, *c = X`` as
an alternative to ``(a, b), c = X[:2], X[2:]``).
Alternative D
-------------
@ -270,8 +270,8 @@ used a lot more than Pascal these days. Also, if a dispatch method
based on dict lookup is chosen as the semantics, large ranges could be
inefficient (consider range(1, sys.maxint)).
All in all my preferences are (in descending preference) B, A, D', C
where D' is D without the third possibility.
All in all my preferences are (from most to least favorite) B, A, D',
C, where D' is D without the third possibility.
Semantics
@ -283,66 +283,117 @@ semantics.
If/Elif Chain vs. Dict-based Dispatch
-------------------------------------
There are two main schools of thought about the switch statement's
semantics. School I wants to define the switch statement in terms of
an equivalent if/elif chain. School II prefers to think of it as a
dispatch on a precomputed dictionary.
There are several main schools of thought about the switch statement's
semantics:
The difference is mainly important when either the switch expression
or one of the case expressions is not hashable; school I wants this to
be handled as it would be by an if/elif chain (i.e. hashability of the
expressions involved doesn't matter) while school II is willing to say
that the switch expression and all the case expressions must be
hashable if a switch is to be used; otherwise the user should have
written an if/elif chain.
- School I wants to define the switch statement in terms of an
equivalent if/elif chain (possibly with some optimization thrown
in).
There's also a difference of opinion regarding the treatment of
duplicate cases (i.e. two or more cases with the same match
expression). School I wants to treat this the same as an if/elif
chain would treat it (i.e. the first match wins and the code for the
second match is silently unreachable); school II generally wants this
to be an error at the time the switch is frozen.
- School II prefers to think of it as a dispatch on a precomputed
dict. There are different choices for when the precomputation
happens.
There's also a school III which states that the definition of a switch
statement should be in terms of an equivalent if/elif chain, with the
exception that all the expressions must be hashable.
- There's also school III, which agrees with School I that the
definition of a switch statement should be in terms of an equivalent
if/elif chain, but concedes to the optimization camp that all
expressions involved must be hashable.
School I believes that the if/elif chain is the only reasonable,
surprise-free way of defining switch semantics, and that optimizations
as suggested by PEP 275's Solution 1 are sufficient to make most
common uses fast. School I sees trouble in the approach of
pre-freezing a dispatch dictionary because it places a new and unusual
burden on programmers to understand exactly what kinds of case values
are allowed to be frozen and when the case values will be frozen, or
they might be surprised by the switch statement's behavior.
We need to further separate School I into School Ia and School Ib:
School II sees trouble in trying to achieve semantics that match
those of an if/elif chain while optimizing the switch statement into
a hash lookup in a dispatch dictionary. In an if/elif chain, the
test "x == y" might well be comparing two unhashable values
(e.g. two lists); even "x == 1" could be comparing a user-defined
class instance that is not hashable but happens to define equality to
integers. Worse, the hash function might have a bug or a side effect;
if we generate code that believes the hash, a buggy hash might
generate an incorrect match, and if we generate code that catches
errors in the hash to fall back on an if/elif chain, we might hide
genuine bugs. In addition, school II sees little value in allowing
cases involving unhashable values; after all if the user expects such
values, they can just as easily write an if/elif chain. School II
also doesn't believe that it's fair to allow dead code due to
overlapping cases to occur unflagged, when the dict-based dispatch
implementation makes it so easy to trap this.
- School Ia has a simple position: a switch statement is translated to
an equivalent if/elif chain, and that's that. It should not be
linked to optimization at all. That is also my main objection
against this school: without any hint of optimization, the switch
statement isn't attractive enough to warrant new syntax.
School III admits the problems with making hash() optional, but still
believes that the true semantics should be defined by an if/elif chain
even if the implementation should be allowed to use dict-based
dispatch as an optimization. This means that duplicate cases must be
resolved by always choosing the first case, making the second case
undiagnosed dead code.
- School Ib has a more complex position: it agrees with School II that
optimization is important, and is willing to concede the compiler
certain liberties to allow this. (For example, PEP 275 Solution 1.)
In particular, hash() of the switch and case expressions may or may
not be called (so it should be side-effect-free); and the case
expressions may not be evaluated each time as expected by the
if/elif chain behavior, so the case expressions should also be
side-effect free. My objection to this (elaborated below) is that
if either the hash() or the case expressions aren't
side-effect-free, optimized and unoptimized code may behave
differently.
School II grew out of the realization that optimization of commonly
found cases isn't so easy, and that it's better to face this head on.
This will become clear below.
The differences between School I (mostly School Ib) and School II are
fourfold:
- When optimizing using a dispatch dict, if either the switch
expression or the case expressions are unhashable (in which case
hash() raises an exception), School Ib requires catching the hash()
failure and falling back to an if/elif chain. School II simply lets
the exception happen. The problem with catching an exception in
hash() as required by School Ib, is that this may hide a genuine
bug. A possible way out is to only use a dispatch dict if all case
expressions are ints, strings or other built-ins with known good
hash behavior, and to only attempt to hash the switch expression if
it is also one of those types. Type objects should probably also be
supported here. This is the (only) problem that School III
addresses.
- When optimizing using a dispatch dict, if the hash() function of any
expression involved returns an incorrect value, under School Ib,
optimized code will not behave the same as unoptimized code. This
is a well-known problem with optimization-related bugs, which waste
lots of developer time. Under School II, in this situation
incorrect results are produced at least consistently, which should
make debugging a bit easier. The way out proposed for the previous
bullet would also help here.
- School Ib doesn't have a good optimization strategy if the case
expressions are named constants. The compiler cannot know their
values for sure, and it cannot know whether they are truly constant.
As a way out, it has been proposed to re-evaluate the expression
corresponding to the case once the dict has identified which case
should be taken, to verify that the value of the expression didn't
change. But strictly speaking, all the case expressions occurring
before that case would also have to be checked, in order to preserve
the true if/elif chain semantics, thereby completely killing the
optimization. Another proposed solution is to have callbacks
notifying the dispatch dict of changes in the value of variables or
attributes involved in the case expressions. But this is not likely
implementable in the general case, and would require many namespaces
to bear the burden of supporting such callbacks, which currently
don't exist at all.
- Finally, there's a difference of opinion regarding the treatment of
duplicate cases (i.e. two or more cases with match expressions that
evaluate to the same value). School I wants to treat this the same
as an if/elif chain would treat it (i.e. the first match wins and
the code for the second match is silently unreachable); School II
wants this to be an error at the time the dispatch dict is frozen
(so dead code doesn't go undiagnosed).
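The first and last of these differences can be sketched in today's
Python, which has no switch statement. This is only an illustration
under stated assumptions: the helper names ``switch_as_chain``,
``freeze_dispatch`` and ``switch_as_dict`` are hypothetical and are not
part of any proposal. They model a switch as a list of
(case value, handler) pairs.

```python
def switch_as_chain(value, cases):
    """School I reading: compare with == in order; the first match wins."""
    for case_value, handler in cases:
        if value == case_value:
            return handler()
    return None


def freeze_dispatch(cases):
    """School II reading: build the dispatch dict once, rejecting duplicates."""
    table = {}
    for case_value, handler in cases:
        # Membership testing hashes case_value; unhashable values raise TypeError.
        if case_value in table:
            raise ValueError(f"duplicate case: {case_value!r}")
        table[case_value] = handler
    return table


def switch_as_dict(value, table):
    """Dispatch with one hash lookup; an unhashable value raises TypeError."""
    handler = table.get(value)
    return handler() if handler is not None else None


cases = [(1, lambda: "one"), (2, lambda: "two"), (1, lambda: "unreachable")]

# If/elif semantics: the duplicate case is silently dead code.
chain_result = switch_as_chain(1, cases)

# If/elif semantics: unhashable values are fine, since only == is used.
unhashable_chain = switch_as_chain([1], [([1], lambda: "list")])

# Dict semantics: the duplicate is reported when the dict is frozen.
try:
    freeze_dispatch(cases)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

# Dict semantics: an unhashable switch value is an error, not a fallback.
table = freeze_dispatch(cases[:2])
try:
    switch_as_dict([1], table)
    unhashable_rejected = False
except TypeError:
    unhashable_rejected = True
```

In this sketch the chain version accepts both the duplicate and the
unhashable list, while the dict version raises for each. That is the
trade-off described above.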
School I sees trouble in School II's approach of pre-freezing a
dispatch dict because it places a new and unusual burden on
programmers to understand exactly what kinds of case values are
allowed to be frozen and when the case values will be frozen, or they
might be surprised by the switch statement's behavior.
School II doesn't believe that School Ia's unoptimized switch is worth
the effort, and it sees trouble in School Ib's proposal for
optimization, which can cause optimized and unoptimized code to behave
differently.
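A minimal sketch of how optimized and unoptimized code could diverge,
assuming a value type with a buggy hash. The ``Code`` class and the
``run_chain`` and ``run_dict`` helpers are hypothetical names invented
for this illustration.

```python
import itertools


class Code:
    """Hypothetical value type whose __hash__ disagrees with its __eq__."""

    _serial = itertools.count()

    def __init__(self, n):
        self.n = n
        self._hash = next(Code._serial)  # unique per instance

    def __eq__(self, other):
        return isinstance(other, Code) and self.n == other.n

    def __hash__(self):
        # Deliberate bug: equal objects get different hash values.
        return self._hash


def run_chain(value, cases):
    """Unoptimized form: plain if/elif-style comparison with ==."""
    for case_value, result in cases:
        if value == case_value:
            return result
    return "default"


def run_dict(value, cases):
    """Optimized form: a dict lookup that trusts the hash."""
    table = dict(cases)
    return table.get(value, "default")


cases = [(Code(1), "matched")]
unoptimized = run_chain(Code(1), cases)  # == finds the match
optimized = run_dict(Code(1), cases)     # hash mismatch, lookup misses
```

Here the same switch value takes the matching case in the unoptimized
form and the default in the optimized form. Under School II only the
dict behavior would exist, so the result would at least be consistent.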
In addition, school II sees little value in allowing cases involving
unhashable values; after all if the user expects such values, they can
just as easily write an if/elif chain. School II also doesn't believe
that it's right to allow dead code due to overlapping cases to occur
unflagged, when the dict-based dispatch implementation makes it so
easy to trap this.
Personally, I'm in school II: I believe that the dict-based dispatch
is the one true implementation for switch statements and that we
should face the limitations and benefits up front.
should face the limitations up front, so that we can reap maximal
benefits.
When to Freeze the Dispatch Dict
--------------------------------