359 lines
11 KiB
ReStructuredText
359 lines
11 KiB
ReStructuredText
PEP: 275
|
|
Title: Switching on Multiple Values
|
|
Author: Marc-André Lemburg <mal@lemburg.com>
|
|
Status: Rejected
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 10-Nov-2001
|
|
Python-Version: 2.6
|
|
Post-History:
|
|
|
|
Rejection Notice
|
|
================
|
|
|
|
A similar PEP for Python 3000, :pep:`3103`, was already rejected,
|
|
so this proposal has no chance of being accepted either.
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes strategies to enhance Python's performance
|
|
with respect to handling switching on a single variable having
|
|
one of multiple possible values.
|
|
|
|
Problem
|
|
=======
|
|
|
|
Up to Python 2.5, the typical way of writing multi-value switches
|
|
has been to use long switch constructs of the following type::
|
|
|
|
if x == 'first state':
|
|
...
|
|
elif x == 'second state':
|
|
...
|
|
elif x == 'third state':
|
|
...
|
|
elif x == 'fourth state':
|
|
...
|
|
else:
|
|
# default handling
|
|
...
|
|
|
|
This works fine for short switch constructs, since the overhead of
|
|
repeated loading of a local (the variable x in this case) and
|
|
comparing it to some constant is low (it has a complexity of O(n)
|
|
on average). However, when using such a construct to write a state
|
|
machine such as is needed for writing parsers the number of
|
|
possible states can easily reach 10 or more cases.
|
|
|
|
The current solution to this problem lies in using a dispatch
|
|
table to find the case implementing method to execute depending on
|
|
the value of the switch variable (this can be tuned to have a
|
|
complexity of O(1) on average, e.g. by using perfect hash
|
|
tables). This works well for state machines which require complex
|
|
and lengthy processing in the different case methods. It does not
|
|
perform well for ones which only process one or two instructions
|
|
per case, e.g.
|
|
|
|
::
|
|
|
|
def handle_data(self, data):
|
|
self.stack.append(data)
|
|
|
|
A nice example of this is the state machine implemented in
|
|
pickle.py which is used to serialize Python objects. Other
|
|
prominent cases include XML SAX parsers and Internet protocol
|
|
handlers.
|
|
|
|
Proposed Solutions
|
|
==================
|
|
|
|
This PEP proposes two different but not necessarily conflicting
|
|
solutions:
|
|
|
|
1. Adding an optimization to the Python compiler and VM
|
|
which detects the above if-elif-else construct and
|
|
generates special opcodes for it which use a read-only
|
|
dictionary for storing jump offsets.
|
|
|
|
2. Adding new syntax to Python which mimics the C style
|
|
switch statement.
|
|
|
|
The first solution has the benefit of not relying on adding new
|
|
keywords to the language, while the second looks cleaner. Both
|
|
involve some run-time overhead to assure that the switching
|
|
variable is immutable and hashable.
|
|
|
|
Both solutions use a dictionary lookup to find the right
|
|
jump location, so they both share the same problem space in
|
|
terms of requiring that both the switch variable and the
|
|
constants need to be compatible to the dictionary implementation
|
|
(hashable, comparable, a==b => hash(a)==hash(b)).
|
|
|
|
Solution 1: Optimizing if-elif-else
|
|
-----------------------------------
|
|
|
|
Implementation:
|
|
|
|
It should be possible for the compiler to detect an
|
|
if-elif-else construct which has the following signature::
|
|
|
|
if x == 'first':...
|
|
elif x == 'second':...
|
|
else:...
|
|
|
|
i.e. the left hand side always references the same variable,
|
|
the right hand side a hashable immutable builtin type. The
|
|
right hand sides need not be all of the same type, but they
|
|
should be comparable to the type of the left hand switch
|
|
variable.
|
|
|
|
The compiler could then setup a read-only (perfect) hash
|
|
table, store it in the constants and add an opcode SWITCH in
|
|
front of the standard if-elif-else byte code stream which
|
|
triggers the following run-time behaviour:
|
|
|
|
At runtime, SWITCH would check x for being one of the
|
|
well-known immutable types (strings, unicode, numbers) and
|
|
use the hash table for finding the right opcode snippet. If
|
|
this condition is not met, the interpreter should revert to
|
|
the standard if-elif-else processing by simply skipping the
|
|
SWITCH opcode and proceeding with the usual if-elif-else byte
|
|
code stream.
|
|
|
|
|
|
Issues:
|
|
|
|
The new optimization should not change the current Python
|
|
semantics (by reducing the number of ``__cmp__`` calls and adding
|
|
``__hash__`` calls in if-elif-else constructs which are affected
|
|
by the optimization). To assure this, switching can only
|
|
safely be implemented either if a "from __future__" style
|
|
flag is used, or the switching variable is one of the builtin
|
|
immutable types: int, float, string, unicode, etc. (not
|
|
subtypes, since it's not clear whether these are still
|
|
immutable or not)
|
|
|
|
To prevent post-modifications of the jump-table dictionary
|
|
(which could be used to reach protected code), the jump-table
|
|
will have to be a read-only type (e.g. a read-only
|
|
dictionary).
|
|
|
|
The optimization should only be used for if-elif-else
|
|
constructs which have a minimum number of n cases (where n is
|
|
a number which has yet to be defined depending on performance
|
|
tests).
|
|
|
|
Solution 2: Adding a switch statement to Python
|
|
-----------------------------------------------
|
|
|
|
New Syntax
|
|
''''''''''
|
|
::
|
|
|
|
switch EXPR:
|
|
case CONSTANT:
|
|
SUITE
|
|
case CONSTANT:
|
|
SUITE
|
|
...
|
|
else:
|
|
SUITE
|
|
|
|
(modulo indentation variations)
|
|
|
|
The "else" part is optional. If no else part is given and
|
|
none of the defined cases matches, no action is taken and
|
|
the switch statement is ignored. This is in line with the
|
|
current if-behaviour. A user who wants to signal this
|
|
situation using an exception can define an else-branch
|
|
which then implements the intended action.
|
|
|
|
Note that the constants need not be all of the same type, but
|
|
they should be comparable to the type of the switch variable.
|
|
|
|
Implementation
|
|
''''''''''''''
|
|
|
|
The compiler would have to compile this into byte code
|
|
similar to this::
|
|
|
|
def whatis(x):
|
|
switch(x):
|
|
case 'one':
|
|
print '1'
|
|
case 'two':
|
|
print '2'
|
|
case 'three':
|
|
print '3'
|
|
else:
|
|
print "D'oh!"
|
|
|
|
into (omitting POP_TOP's and SET_LINENO's)::
|
|
|
|
6 LOAD_FAST 0 (x)
|
|
9 LOAD_CONST 1 (switch-table-1)
|
|
12 SWITCH 26 (to 38)
|
|
|
|
14 LOAD_CONST 2 ('1')
|
|
17 PRINT_ITEM
|
|
18 PRINT_NEWLINE
|
|
19 JUMP 43
|
|
|
|
22 LOAD_CONST 3 ('2')
|
|
25 PRINT_ITEM
|
|
26 PRINT_NEWLINE
|
|
27 JUMP 43
|
|
|
|
30 LOAD_CONST 4 ('3')
|
|
33 PRINT_ITEM
|
|
34 PRINT_NEWLINE
|
|
35 JUMP 43
|
|
|
|
38 LOAD_CONST 5 ("D'oh!")
|
|
41 PRINT_ITEM
|
|
42 PRINT_NEWLINE
|
|
|
|
>>43 LOAD_CONST 0 (None)
|
|
46 RETURN_VALUE
|
|
|
|
Where the 'SWITCH' opcode would jump to 14, 22, 30 or 38
|
|
depending on 'x'.
|
|
|
|
Thomas Wouters has written a patch which demonstrates the
|
|
above. You can download it from [1]_.
|
|
|
|
Issues
|
|
''''''
|
|
|
|
The switch statement should not implement fall-through
|
|
behaviour (as does the switch statement in C). Each case
|
|
defines a complete and independent suite; much like in an
|
|
if-elif-else statement. This also enables using break in
|
|
switch statements inside loops.
|
|
|
|
If the interpreter finds that the switch variable x is
|
|
not hashable, it should raise a TypeError at run-time
|
|
pointing out the problem.
|
|
|
|
There have been other proposals for the syntax which reuse
|
|
existing keywords and avoid adding two new ones ("switch" and
|
|
"case"). Others have argued that the keywords should use new
|
|
terms to avoid confusion with the C keywords of the same name
|
|
but slightly different semantics (e.g. fall-through without
|
|
break). Some of the proposed variants::
|
|
|
|
case EXPR:
|
|
of CONSTANT:
|
|
SUITE
|
|
of CONSTANT:
|
|
SUITE
|
|
else:
|
|
SUITE
|
|
|
|
case EXPR:
|
|
if CONSTANT:
|
|
SUITE
|
|
if CONSTANT:
|
|
SUITE
|
|
else:
|
|
SUITE
|
|
|
|
when EXPR:
|
|
in CONSTANT_TUPLE:
|
|
SUITE
|
|
in CONSTANT_TUPLE:
|
|
SUITE
|
|
...
|
|
else:
|
|
SUITE
|
|
|
|
The switch statement could be extended to allow multiple
|
|
values for one section (e.g. case 'a', 'b', 'c': ...). Another
|
|
proposed extension would allow ranges of values (e.g. case
|
|
10..14: ...). These should probably be post-poned, but already
|
|
kept in mind when designing and implementing a first version.
|
|
|
|
Examples
|
|
--------
|
|
|
|
The following examples all use a new syntax as proposed by
|
|
solution 2. However, all of these examples would work with
|
|
solution 1 as well.
|
|
|
|
::
|
|
|
|
switch EXPR: switch x:
|
|
case CONSTANT: case "first":
|
|
SUITE print x
|
|
case CONSTANT: case "second":
|
|
SUITE x = x**2
|
|
... print x
|
|
else: else:
|
|
SUITE print "whoops!"
|
|
|
|
|
|
case EXPR: case x:
|
|
of CONSTANT: of "first":
|
|
SUITE print x
|
|
of CONSTANT: of "second":
|
|
SUITE print x**2
|
|
else: else:
|
|
SUITE print "whoops!"
|
|
|
|
|
|
case EXPR: case state:
|
|
if CONSTANT: if "first":
|
|
SUITE state = "second"
|
|
if CONSTANT: if "second":
|
|
SUITE state = "third"
|
|
else: else:
|
|
SUITE state = "first"
|
|
|
|
|
|
when EXPR: when state:
|
|
in CONSTANT_TUPLE: in ("first", "second"):
|
|
SUITE print state
|
|
in CONSTANT_TUPLE: state = next_state(state)
|
|
SUITE in ("seventh",):
|
|
... print "done"
|
|
else: break # out of loop!
|
|
SUITE else:
|
|
print "middle state"
|
|
state = next_state(state)
|
|
|
|
Here's another nice application found by Jack Jansen (switching
|
|
on argument types)::
|
|
|
|
switch type(x).__name__:
|
|
case 'int':
|
|
SUITE
|
|
case 'string':
|
|
SUITE
|
|
|
|
Scope
|
|
=====
|
|
|
|
XXX Explain "from __future__ import switch"
|
|
|
|
Credits
|
|
=======
|
|
|
|
* Martin von Löwis (issues with the optimization idea)
|
|
* Thomas Wouters (switch statement + byte code compiler example)
|
|
* Skip Montanaro (dispatching ideas, examples)
|
|
* Donald Beaudry (switch syntax)
|
|
* Greg Ewing (switch syntax)
|
|
* Jack Jansen (type switching examples)
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] https://sourceforge.net/tracker/index.php?func=detail&aid=481118&group_id=5470&atid=305470
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|