342 lines
12 KiB
Plaintext
342 lines
12 KiB
Plaintext
PEP: 0275
|
||
Title: Switching on Multiple Values
|
||
Version: $Revision$
|
||
Author: mal@lemburg.com (Marc-André Lemburg)
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Python-Version: 2.4
|
||
Created: 10-Nov-2001
|
||
Post-History:
|
||
|
||
Abstract
|
||
|
||
This PEP proposes strategies to enhance Python's performance
|
||
with respect to handling switching on a single variable having
|
||
one of multiple possible values.
|
||
|
||
Problem
|
||
|
||
Up to Python 2.3, the typical way of writing multi-value switches
|
||
has been to use long switch constructs of the following type:
|
||
|
||
if x == 'first state':
|
||
...
|
||
elif x == 'second state':
|
||
...
|
||
elif x == 'third state':
|
||
...
|
||
elif x == 'fourth state':
|
||
...
|
||
else:
|
||
# default handling
|
||
...
|
||
|
||
This works fine for short switch constructs, since the overhead of
|
||
repeated loading of a local (the variable x in this case) and
|
||
comparing it to some constant is low (it has a complexity of O(n)
|
||
on average). However, when using such a construct to write a state
|
||
machine such as is needed for writing parsers the number of
|
||
possible states can easily reach 10 or more cases.
|
||
|
||
The current solution to this problem lies in using a dispatch
|
||
table to find the case implementing method to execute depending on
|
||
the value of the switch variable (this can be tuned to have a
|
||
complexity of O(1) on average, e.g. by using perfect hash
|
||
tables). This works well for state machines which require complex
|
||
and lengthy processing in the different case methods. It does not
|
||
perform well for ones which only process one or two instructions
|
||
per case, e.g.
|
||
|
||
def handle_data(self, data):
|
||
self.stack.append(data)
|
||
|
||
A nice example of this is the state machine implemented in
|
||
pickle.py which is used to serialize Python objects. Other
|
||
prominent cases include XML SAX parsers and Internet protocol
|
||
handlers.
|
||
|
||
Proposed Solutions
|
||
|
||
This PEP proposes two different but not necessarily conflicting
|
||
solutions:
|
||
|
||
1. Adding an optimization to the Python compiler and VM
|
||
which detects the above if-elif-else construct and
|
||
generates special opcodes for it which use an read-only
|
||
dictionary for storing jump offsets.
|
||
|
||
2. Adding new syntax to Python which mimics the C style
|
||
switch statement.
|
||
|
||
The first solution has the benefit of not relying on adding new
|
||
keywords to the language, while the second looks cleaner. Both
|
||
involve some run-time overhead to assure that the switching
|
||
variable is immutable and hashable.
|
||
|
||
Both solutions use a dictionary lookup to find the right
|
||
jump location, so they both share the same problem space in
|
||
terms of requiring that both the switch variable and the
|
||
constants need to be compatible to the dictionary implementation
|
||
(hashable, comparable, a==b => hash(a)==hash(b)).
|
||
|
||
Solution 1: Optimizing if-elif-else
|
||
|
||
Implementation:
|
||
|
||
It should be possible for the compiler to detect an
|
||
if-elif-else construct which has the following signature:
|
||
|
||
if x == 'first':...
|
||
elif x == 'second':...
|
||
else:...
|
||
|
||
i.e. the left hand side always references the same variable,
|
||
the right hand side a hashable immutable builtin type. The
|
||
right hand sides need not be all of the same type, but they
|
||
should be comparable to the type of the left hand switch
|
||
variable.
|
||
|
||
The compiler could then setup a read-only (perfect) hash
|
||
table, store it in the constants and add an opcode SWITCH in
|
||
front of the standard if-elif-else byte code stream which
|
||
triggers the following run-time behaviour:
|
||
|
||
At runtime, SWITCH would check x for being one of the
|
||
well-known immutable types (strings, unicode, numbers) and
|
||
use the hash table for finding the right opcode snippet. If
|
||
this condition is not met, the interpreter should revert to
|
||
the standard if-elif-else processing by simply skipping the
|
||
SWITCH opcode and procedding with the usual if-elif-else byte
|
||
code stream.
|
||
|
||
Issues:
|
||
|
||
The new optimization should not change the current Python
|
||
semantics (by reducing the number of __cmp__ calls and adding
|
||
__hash__ calls in if-elif-else constructs which are affected
|
||
by the optimiztation). To assure this, switching can only
|
||
safely be implemented either if a "from __future__" style
|
||
flag is used, or the switching variable is one of the builtin
|
||
immutable types: int, float, string, unicode, etc. (not
|
||
subtypes, since it's not clear whether these are still
|
||
immutable or not)
|
||
|
||
To prevent post-modifications of the jump-table dictionary
|
||
(which could be used to reach protected code), the jump-table
|
||
will have to be a read-only type (e.g. a read-only
|
||
dictionary).
|
||
|
||
The optimization should only be used for if-elif-else
|
||
constructs which have a minimum number of n cases (where n is
|
||
a number which has yet to be defined depending on performance
|
||
tests).
|
||
|
||
Solution 2: Adding a switch statement to Python
|
||
|
||
New Syntax:
|
||
|
||
switch EXPR:
|
||
case CONSTANT:
|
||
SUITE
|
||
case CONSTANT:
|
||
SUITE
|
||
...
|
||
else:
|
||
SUITE
|
||
|
||
(modulo indentation variations)
|
||
|
||
The "else" part is optional. If no else part is given and
|
||
none of the defined cases matches, no action is taken and
|
||
the switch statement is ignored. This is in line with the
|
||
current if-behaviour. A user who wants to signal this
|
||
situation using an exception can define an else-branch
|
||
which then implements the intended action.
|
||
|
||
Note that the constants need not be all of the same type, but
|
||
they should be comparable to the type of the switch variable.
|
||
|
||
Implementation:
|
||
|
||
The compiler would have to compile this into byte code
|
||
similar to this:
|
||
|
||
def whatis(x):
|
||
switch(x):
|
||
case 'one':
|
||
print '1'
|
||
case 'two':
|
||
print '2'
|
||
case 'three':
|
||
print '3'
|
||
else:
|
||
print "D'oh!"
|
||
|
||
into (ommitting POP_TOP's and SET_LINENO's):
|
||
|
||
6 LOAD_FAST 0 (x)
|
||
9 LOAD_CONST 1 (switch-table-1)
|
||
12 SWITCH 26 (to 38)
|
||
|
||
14 LOAD_CONST 2 ('1')
|
||
17 PRINT_ITEM
|
||
18 PRINT_NEWLINE
|
||
19 JUMP 43
|
||
|
||
22 LOAD_CONST 3 ('2')
|
||
25 PRINT_ITEM
|
||
26 PRINT_NEWLINE
|
||
27 JUMP 43
|
||
|
||
30 LOAD_CONST 4 ('3')
|
||
33 PRINT_ITEM
|
||
34 PRINT_NEWLINE
|
||
35 JUMP 43
|
||
|
||
38 LOAD_CONST 5 ("D'oh!")
|
||
41 PRINT_ITEM
|
||
42 PRINT_NEWLINE
|
||
|
||
>>43 LOAD_CONST 0 (None)
|
||
46 RETURN_VALUE
|
||
|
||
Where the 'SWITCH' opcode would jump to 14, 22, 30 or 38
|
||
depending on 'x'.
|
||
|
||
Thomas Wouters has written a patch which demonstrates the
|
||
above. You can download it from [1].
|
||
|
||
Issues:
|
||
|
||
The switch statement should not implement fall-through
|
||
behaviour (as does the switch statement in C). Each case
|
||
defines a complete and independent suite; much like in a
|
||
if-elif-else statement. This also enables using break in
|
||
switch statments inside loops.
|
||
|
||
If the interpreter finds that the switch variable x is
|
||
not hashable, it should raise a TypeError at run-time
|
||
pointing out the problem.
|
||
|
||
There have been other proposals for the syntax which reuse
|
||
existing keywords and avoid adding two new ones ("switch" and
|
||
"case"). Others have argued that the keywords should use new
|
||
terms to avoid confusion with the C keywords of the same name
|
||
but slightly different semantics (e.g. fall-through without
|
||
break). Some of the proposed variants:
|
||
|
||
case EXPR:
|
||
of CONSTANT:
|
||
SUITE
|
||
of CONSTANT:
|
||
SUITE
|
||
else:
|
||
SUITE
|
||
|
||
case EXPR:
|
||
if CONSTANT:
|
||
SUITE
|
||
if CONSTANT:
|
||
SUITE
|
||
else:
|
||
SUITE
|
||
|
||
when EXPR:
|
||
in CONSTANT_TUPLE:
|
||
SUITE
|
||
in CONSTANT_TUPLE:
|
||
SUITE
|
||
...
|
||
else:
|
||
SUITE
|
||
|
||
The switch statement could be extended to allow multiple
|
||
values for one section (e.g. case 'a', 'b', 'c': ...). Another
|
||
proposed extension would allow ranges of values (e.g. case
|
||
10..14: ...). These should probably be post-poned, but already
|
||
kept in mind when designing and implementing a first version.
|
||
|
||
Examples:
|
||
|
||
The following examples all use a new syntax as proposed by
|
||
solution 2. However, all of these examples would work with
|
||
solution 1 as well.
|
||
|
||
switch EXPR: switch x:
|
||
case CONSTANT: case "first":
|
||
SUITE print x
|
||
case CONSTANT: case "second":
|
||
SUITE x = x**2
|
||
... print x
|
||
else: else:
|
||
SUITE print "whoops!"
|
||
|
||
|
||
case EXPR: case x:
|
||
of CONSTANT: of "first":
|
||
SUITE print x
|
||
of CONSTANT: of "second":
|
||
SUITE print x**2
|
||
else: else:
|
||
SUITE print "whoops!"
|
||
|
||
|
||
case EXPR: case state:
|
||
if CONSTANT: if "first":
|
||
SUITE state = "second"
|
||
if CONSTANT: if "second":
|
||
SUITE state = "third"
|
||
else: else:
|
||
SUITE state = "first"
|
||
|
||
|
||
when EXPR: when state:
|
||
in CONSTANT_TUPLE: in ("first", "second"):
|
||
SUITE print state
|
||
in CONSTANT_TUPLE: state = next_state(state)
|
||
SUITE in ("seventh",):
|
||
... print "done"
|
||
else: break # out of loop!
|
||
SUITE else:
|
||
print "middle state"
|
||
state = next_state(state)
|
||
|
||
Here's another nice application found by Jack Jansen (switching
|
||
on argument types):
|
||
|
||
switch type(x).__name__:
|
||
case 'int':
|
||
SUITE
|
||
case 'string':
|
||
SUITE
|
||
|
||
Scope
|
||
|
||
XXX Explain "from __future__ import switch"
|
||
|
||
Credits
|
||
|
||
Martin von Löwis (issues with the optimization idea)
|
||
Thomas Wouters (switch statement + byte code compiler example)
|
||
Skip Montanaro (dispatching ideas, examples)
|
||
Donald Beaudry (switch syntax)
|
||
Greg Ewing (switch syntax)
|
||
Jack Jansen (type switching examples)
|
||
|
||
References
|
||
|
||
[1] https://sourceforge.net/tracker/index.php?func=detail&aid=481118&group_id=5470&atid=305470
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|