Near-total rewrite of the section describing the different semantic
schools, inspired by discussion on python-dev with Ka-Ping. Hopefully the different schools and their relative advantages and disadvantages are represented more correctly now.
parent f9467398b6
commit f36e51cc18

pep-3103.txt (163 lines changed)

@@ -26,7 +26,7 @@ work done in that PEP.
 
 This PEP introduces canonical names for the many variants that have
 been discussed for different aspects of the syntax and semantics, such
-as "alternative 2", "school II", "Option 3" and so on. Hopefully
+as "alternative 1", "school II", "option 3" and so on. Hopefully
 these names will help the discussion.
 
 
@@ -223,8 +223,8 @@ like this::
 
 The `*` notation is similar to the use of prefix `*` already in use for
 variable-length parameter lists and for passing computed argument
-lists, and often proposed for value-unpacking (e.g. "a, b, *c = X" as
-an alternative to "(a, b), c = X[:2], X[2:]").
+lists, and often proposed for value-unpacking (e.g. ``a, b, *c = X`` as
+an alternative to ``(a, b), c = X[:2], X[2:]``).
 
 Alternative D
 -------------
@@ -270,8 +270,8 @@ used a lot more than Pascal these days. Also, if a dispatch method
 based on dict lookup is chosen as the semantics, large ranges could be
 inefficient (consider range(1, sys.maxint)).
 
-All in all my preferences are (in descending preference) B, A, D', C
-where D' is D without the third possibility.
+All in all my preferences are (from most to least favorite) B, A, D',
+C, where D' is D without the third possibility.
 
 
 Semantics
@@ -283,66 +283,117 @@ semantics.
 If/Elif Chain vs. Dict-based Dispatch
 -------------------------------------
 
-There are two main schools of thought about the switch statement's
-semantics. School I wants to define the switch statement in term of
-an equivalent if/elif chain. School II prefers to think of it as a
-dispatch on a precomputed dictionary.
+There are several main schools of thought about the switch statement's
+semantics:
 
-The difference is mainly important when either the switch expression
-or one of the case expressions is not hashable; school I wants this to
-be handled as it would be by an if/elif chain (i.e. hashability of the
-expressions involved doesn't matter) while school II is willing to say
-that the switch expression and all the case expressions must be
-hashable if a switch is to be used; otherwise the user should have
-written an if/elif chain.
+- School I wants to define the switch statement in term of an
+  equivalent if/elif chain (possibly with some optimization thrown
+  in).
 
-There's also a difference of opinion regarding the treatment of
-duplicate cases (i.e. two or more cases with the same match
-expression). School I wants to treat this the same is an if/elif
-chain would treat it (i.e. the first match wins and the code for the
-second match is silently unreachable); school II generally wants this
-to be an error at the time the switch is frozen.
+- School II prefers to think of it as a dispatch on a precomputed
+  dict. There are different choices for when the precomputation
+  happens.
 
-There's also a school III which states that the definition of a switch
-statement should be in terms of an equivalent if/elif chain, with the
-exception that all the expressions must be hashable.
+- There's also school III, which agrees with School I that the
+  definition of a switch statement should be in terms of an equivalent
+  if/elif chain, but concedes to the optimization camp that all
+  expressions involved must be hashable.
 
-School I believes that the if/elif chain is the only reasonable,
-surprise-free way of defining switch semantics, and that optimizations
-as suggested by PEP 275's Solution 1 are sufficient to make most
-common uses fast. School I sees trouble in the approach of
-pre-freezing a dispatch dictionary because it places a new and unusual
-burden on programmers to understand exactly what kinds of case values
-are allowed to be frozen and when the case values will be frozen, or
-they might be surprised by the switch statement's behavior.
+We need to further separate School I into School Ia and School Ib:
 
-School II sees trouble in trying to achieve semantics that match
-those of an if/elif chain while optimizing the switch statement into
-a hash lookup in a dispatch dictionary. In an if/elif chain, the
-test "x == y" might well be comparing two unhashable values
-(e.g. two lists); even "x == 1" could be comparing a user-defined
-class instance that is not hashable but happens to define equality to
-integers. Worse, the hash function might have a bug or a side effect;
-if we generate code that believes the hash, a buggy hash might
-generate an incorrect match, and if we generate code that catches
-errors in the hash to fall back on an if/elif chain, we might hide
-genuine bugs. In addition, school II sees little value in allowing
-cases involving unhashable values; after all if the user expects such
-values, they can just as easily write an if/elif chain. School II
-also doesn't believe that it's fair to allow dead code due to
-overlapping cases to occur unflagged, when the dict-based dispatch
-implementation makes it so easy to trap this.
+- School Ia has a simple position: a switch statement is translated to
+  an equivalent if/elif chain, and that's that. It should not be
+  linked to optimization at all. That is also my main objection
+  against this school: without any hint of optimization, the switch
+  statement isn't attractive enough to warrant new syntax.
 
-School III admits the problems with making hash() optional, but still
-believes that the true semantics should be defined by an if/elif chain
-even if the implementation should be allowed to use dict-based
-dispatch as an optimization. This means that duplicate cases must be
-resolved by always choosing the first case, making the second case
-undiagnosed dead code.
+- School Ib has a more complex position: it agrees with School II that
+  optimization is important, and is willing to concede the compiler
+  certain liberties to allow this. (For example, PEP 275 Solution 1.)
+  In particular, hash() of the switch and case expressions may or may
+  not be called (so it should be side-effect-free); and the case
+  expressions may not be evaluated each time as expected by the
+  if/elif chain behavior, so the case expressions should also be
+  side-effect free. My objection to this (elaborated below) is that
+  if either the hash() or the case expressions aren't
+  side-effect-free, optimized and unoptimized code may behave
+  differently.
+
+School II grew out of the realization that optimization of commonly
+found cases isn't so easy, and that it's better to face this head on.
+This will become clear below.
+
+The differences between School I (mostly School Ib) and School II are
+threefold:
+
+- When optimizing using a dispatch dict, if either the switch
+  expression or the case expressions are unhashable (in which case
+  hash() raises an exception), School Ib requires catching the hash()
+  failure and falling back to an if/elif chain. School II simply lets
+  the exception happen. The problem with catching an exception in
+  hash() as required by School Ib, is that this may hide a genuine
+  bug. A possible way out is to only use a dispatch dict if all case
+  expressions are ints, strings or other built-ins with known good
+  hash behavior, and to only attempt to hash the switch expression if
+  it is also one of those types. Type objects should probably also be
+  supported here. This is the (only) problem that School III
+  addresses.
+
+- When optimizing using a dispatch dict, if the hash() function of any
+  expression involved returns an incorrect value, under school Ib,
+  optimized code will not behave the same as unoptimized code. This
+  is a well-known problem with optimization-related bugs, and waste
+  lots of developer time. Under School II, in this situation
+  incorrect results are produced at least consistently, which should
+  make debugging a bit easier. The way out proposed for the previous
+  bullet would also help here.
+
+- School Ib doesn't have a good optimization strategy if the case
+  expressions are named constants. The compiler cannot know their
+  values for sure, and it cannot know whether they are truly constant.
+  As a way out, it has been proposed to re-evaluate the expression
+  corresponding to the case once the dict has identified which case
+  should be taken, to verify that the value of the expression didn't
+  change. But strictly speaking, all the case expressions occurring
+  before that case would also have to be checked, in order to preserve
+  the true if/elif chain semantics, thereby completely killing the
+  optimization. Another proposed solution is to have callbacks
+  notifying the dispatch dict of changes in the value of variables or
+  attributes involved in the case expressions. But this is not likely
+  implementable in the general case, and would require many namespaces
+  to bear the burden of supporting such callbacks, which currently
+  don't exist at all.
+
+- Finally, there's a difference of opinion regarding the treatment of
+  duplicate cases (i.e. two or more cases with match expressions that
+  evaluates to the same value). School I wants to treat this the same
+  is an if/elif chain would treat it (i.e. the first match wins and
+  the code for the second match is silently unreachable); School II
+  wants this to be an error at the time the dispatch dict is frozen
+  (so dead code doesn't go undiagnosed).
+
+School I sees trouble in School II's approach of pre-freezing a
+dispatch dict because it places a new and unusual burden on
+programmers to understand exactly what kinds of case values are
+allowed to be frozen and when the case values will be frozen, or they
+might be surprised by the switch statement's behavior.
+
+School II doesn't believe that School Ia's unoptimized switch is worth
+the effort, and it sees trouble in School Ib's proposal for
+optimization, which can cause optimized and unoptimized code to behave
+differently.
+
+In addition, school II sees little value in allowing cases involving
+unhashable values; after all if the user expects such values, they can
+just as easily write an if/elif chain. School II also doesn't believe
+that it's right to allow dead code due to overlapping cases to occur
+unflagged, when the dict-based dispatch implementation makes it so
+easy to trap this.
 
 Personally, I'm in school II: I believe that the dict-based dispatch
 is the one true implementation for switch statements and that we
-should face the limitiations and benefits up front.
+should face the limitiations up front, so that we can reap maximal
+benefits.
 
 When to Freeze the Dispatch Dict
 --------------------------------
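
The contrast the rewritten section draws between School I and School II can be sketched in present-day Python. This is an illustration only and not part of the commit. Python has no switch statement of the proposed form, so both semantics are emulated with functions. The helper names `switch_chain`, `freeze_dispatch` and `switch_dict` are hypothetical, and representing cases as `(value, handler)` pairs is an assumption made for the sketch.

```python
def switch_chain(value, cases, default=None):
    """School I style: behave like an if/elif chain; the first equal case wins."""
    for case_value, handler in cases:
        if value == case_value:  # only == is used, hash() is never called
            return handler()
    return default() if default is not None else None


def freeze_dispatch(cases):
    """School II style: precompute a dict; duplicate cases are an error."""
    table = {}
    for case_value, handler in cases:
        # The membership test hashes case_value, so an unhashable case
        # raises TypeError here instead of being silently accepted.
        if case_value in table:
            raise ValueError(f"duplicate case: {case_value!r}")
        table[case_value] = handler
    return table


def switch_dict(value, table, default=None):
    """Dispatch through the frozen dict; an unhashable value raises TypeError."""
    handler = table.get(value, default)
    return handler() if handler is not None else None


cases = [(1, lambda: "one"), (2, lambda: "two"), (1, lambda: "dead code")]

# School I: the duplicate case is silently unreachable.
chain_result = switch_chain(1, cases)

# School I also accepts unhashable operands, since it only compares.
list_result = switch_chain([1, 2], [([1, 2], lambda: "a list")])

# School II: the duplicate is reported when the dict is frozen.
try:
    freeze_dispatch(cases)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

# School II: an unhashable switch expression simply raises.
table = freeze_dispatch(cases[:2])
try:
    switch_dict([1, 2], table)
    unhashable_error = None
except TypeError as exc:
    unhashable_error = type(exc).__name__
```

A School Ib implementation would sit between the two: it would try the dict first and fall back to the chain when hashing fails, which is exactly the fallback the rewritten text objects to because it can hide a genuine bug in `hash()`.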