reSTify PEP 275 (#356)

Huang Huang 2017-08-29 06:10:25 +08:00 committed by Brett Cannon
parent 7a99bb9b72
commit 5a1b908205
1 changed files with 279 additions and 256 deletions


@ -5,25 +5,29 @@ Last-Modified: $Date$
Author: mal@lemburg.com (Marc-André Lemburg)
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Nov-2001
Python-Version: 2.6
Post-History:


Rejection Notice
================

A similar PEP for Python 3000, PEP 3103 [2]_, was already rejected,
so this proposal has no chance of being accepted either.


Abstract
========

This PEP proposes strategies to enhance Python's performance
with respect to handling switching on a single variable having
one of multiple possible values.


Problem
=======

Up to Python 2.5, the typical way of writing multi-value switches
has been to use long switch constructs of the following type::

    if x == 'first state':
        ...
@ -37,312 +41,331 @@ Problem
        # default handling
        ...

This works fine for short switch constructs, since the overhead of
repeated loading of a local (the variable x in this case) and
comparing it to some constant is low (it has a complexity of O(n)
on average). However, when using such a construct to write a state
machine, such as is needed for writing parsers, the number of
possible states can easily reach 10 or more cases.

The current solution to this problem lies in using a dispatch
table to find the case-implementing method to execute depending on
the value of the switch variable (this can be tuned to have a
complexity of O(1) on average, e.g. by using perfect hash
tables). This works well for state machines which require complex
and lengthy processing in the different case methods. It does not
perform well for ones which only process one or two instructions
per case, e.g.

::

    def handle_data(self, data):
        self.stack.append(data)

A nice example of this is the state machine implemented in
pickle.py which is used to serialize Python objects. Other
prominent cases include XML SAX parsers and Internet protocol
handlers.

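A minimal sketch of the dispatch-table approach described above, written for current Python. The class ``TinyMachine``, its opcode strings and its handler names are hypothetical and are not taken from pickle.py.

```python
class TinyMachine:
    """Hypothetical stack machine that dispatches on an opcode string."""

    def __init__(self):
        self.stack = []
        # Dispatch table: one dictionary lookup replaces the if-elif chain.
        self.dispatch = {
            'push': self.handle_push,
            'dup': self.handle_dup,
            'add': self.handle_add,
        }

    def handle_push(self, data):
        self.stack.append(data)

    def handle_dup(self, data):
        self.stack.append(self.stack[-1])

    def handle_add(self, data):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(left + right)

    def handle_default(self, data):
        raise ValueError('unknown opcode')

    def run(self, program):
        for opcode, data in program:
            # O(1) average lookup, but every case pays for a method call.
            handler = self.dispatch.get(opcode, self.handle_default)
            handler(data)
        return self.stack


machine = TinyMachine()
result = machine.run([('push', 2), ('dup', None), ('add', None)])
```

Each handler here does only one or two operations, so the cost of the method call is large compared to the work done, which is the situation the text says dispatch tables handle poorly.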
Proposed Solutions
==================

This PEP proposes two different but not necessarily conflicting
solutions:

1. Adding an optimization to the Python compiler and VM
   which detects the above if-elif-else construct and
   generates special opcodes for it which use a read-only
   dictionary for storing jump offsets.

2. Adding new syntax to Python which mimics the C style
   switch statement.

The first solution has the benefit of not relying on adding new
keywords to the language, while the second looks cleaner. Both
involve some run-time overhead to ensure that the switching
variable is immutable and hashable.

Both solutions use a dictionary lookup to find the right
jump location, so they share the same constraint: both the switch
variable and the constants must be compatible with the dictionary
implementation (hashable, comparable, a==b => hash(a)==hash(b)).

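The dictionary constraint can be illustrated with builtin types only. The helper name ``is_hashable`` is an assumption made for this sketch.

```python
# Keys that compare equal must hash equal, so 1, 1.0 and True
# all find the same dictionary entry.
table = {1: 'one'}
same_slot = (table[1.0], table[True])


def is_hashable(value):
    """Return True if value can be used as a dictionary key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# Strings and numbers qualify as switch values; a list does not.
checks = [is_hashable('first'), is_hashable(42), is_hashable([1, 2])]
```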
Solution 1: Optimizing if-elif-else
-----------------------------------

Implementation:

It should be possible for the compiler to detect an
if-elif-else construct which has the following signature::

    if x == 'first':...
    elif x == 'second':...
    else:...

i.e. the left hand side always references the same variable,
the right hand side a hashable immutable builtin type. The
right hand sides need not be all of the same type, but they
should be comparable to the type of the left hand switch
variable.

The compiler could then set up a read-only (perfect) hash
table, store it in the constants and add an opcode SWITCH in
front of the standard if-elif-else byte code stream which
triggers the following run-time behaviour:

At runtime, SWITCH would check x for being one of the
well-known immutable types (strings, unicode, numbers) and
use the hash table for finding the right opcode snippet. If
this condition is not met, the interpreter should revert to
the standard if-elif-else processing by simply skipping the
SWITCH opcode and proceeding with the usual if-elif-else byte
code stream.

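The run-time behaviour described above can be emulated in plain Python. This is only a sketch under stated assumptions: the names ``JUMP_TABLE``, ``FAST_TYPES``, ``switch`` and ``slow_path`` are hypothetical, functions stand in for the opcode snippets, and the set of "well-known" types is a guess for current Python.

```python
# Hypothetical stand-ins for the three branches of the construct.
def on_first():
    return 'first branch'


def on_second():
    return 'second branch'


def on_else():
    return 'else branch'


# Table the compiler would store in the constants (assumed layout).
JUMP_TABLE = {'first': on_first, 'second': on_second}

# Exact builtin types only; subclasses take the slow path.
FAST_TYPES = (str, bytes, int, float)


def slow_path(x):
    """The unchanged if-elif-else chain."""
    if x == 'first':
        return on_first()
    elif x == 'second':
        return on_second()
    else:
        return on_else()


def switch(x):
    """Emulate SWITCH: table lookup for known types, else fall back."""
    if type(x) in FAST_TYPES:
        return JUMP_TABLE.get(x, on_else)()
    return slow_path(x)


class EqualsSecond:
    """Unhashable object that still compares equal to 'second'."""
    __hash__ = None

    def __eq__(self, other):
        return other == 'second'
```

An ``EqualsSecond`` instance cannot be looked up in the table, but it still reaches the second branch through the ordinary comparison chain, so the existing semantics are preserved.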
Issues:

The new optimization should not change the current Python
semantics (by reducing the number of ``__cmp__`` calls and adding
``__hash__`` calls in if-elif-else constructs which are affected
by the optimization). To ensure this, switching can only
safely be implemented either if a "from __future__" style
flag is used, or the switching variable is one of the builtin
immutable types: int, float, string, unicode, etc. (not
subtypes, since it's not clear whether these are still
immutable or not).

To prevent post-modifications of the jump-table dictionary
(which could be used to reach protected code), the jump-table
will have to be a read-only type (e.g. a read-only
dictionary).

The optimization should only be used for if-elif-else
constructs which have a minimum number of n cases (where n is
a number which has yet to be defined depending on performance
tests).

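Current Python offers ``types.MappingProxyType`` as a read-only view of a dictionary, which gives a rough idea of what a read-only jump table could look like. The offsets and the helper ``try_to_patch`` are made up for illustration.

```python
from types import MappingProxyType

# Offsets are made-up values for illustration.
_offsets = {'first': 14, 'second': 22}
jump_table = MappingProxyType(_offsets)


def try_to_patch(table):
    """Try to redirect a case; return the exception type name if refused."""
    try:
        table['first'] = 999
    except TypeError as exc:
        return type(exc).__name__
    return None


outcome = try_to_patch(jump_table)
```

Note that the proxy is only a view: code that still holds ``_offsets`` can change the table, so a real implementation would have to keep the underlying dictionary private.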
Solution 2: Adding a switch statement to Python
-----------------------------------------------

New Syntax
''''''''''

::

    switch EXPR:
        case CONSTANT:
            SUITE
        case CONSTANT:
            SUITE
        ...
        else:
            SUITE

(modulo indentation variations)

The "else" part is optional. If no else part is given and
none of the defined cases matches, no action is taken and
the switch statement is ignored. This is in line with the
current if-behaviour. A user who wants to signal this
situation using an exception can define an else-branch
which then implements the intended action.

Note that the constants need not be all of the same type, but
they should be comparable to the type of the switch variable.

Implementation
''''''''''''''

The compiler would have to compile source code such as this::

    def whatis(x):
        switch(x):
            case 'one':
                print '1'
            case 'two':
                print '2'
            case 'three':
                print '3'
            else:
                print "D'oh!"

into (omitting POP_TOP's and SET_LINENO's)::

       6  LOAD_FAST         0 (x)
       9  LOAD_CONST        1 (switch-table-1)
      12  SWITCH            26 (to 38)

      14  LOAD_CONST        2 ('1')
      17  PRINT_ITEM
      18  PRINT_NEWLINE
      19  JUMP 43

      22  LOAD_CONST        3 ('2')
      25  PRINT_ITEM
      26  PRINT_NEWLINE
      27  JUMP 43

      30  LOAD_CONST        4 ('3')
      33  PRINT_ITEM
      34  PRINT_NEWLINE
      35  JUMP 43

      38  LOAD_CONST        5 ("D'oh!")
      41  PRINT_ITEM
      42  PRINT_NEWLINE

    >>43  LOAD_CONST        0 (None)
      46  RETURN_VALUE

Where the 'SWITCH' opcode would jump to 14, 22, 30 or 38
depending on 'x'.

Thomas Wouters has written a patch which demonstrates the
above. You can download it from [1]_.

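The jump decision made by the SWITCH opcode in the listing can be modelled as a dictionary lookup with a default. How ``switch-table-1`` is actually encoded in the patch is not shown here, so the dictionary below and the names ``SWITCH_TABLE_1``, ``ELSE_OFFSET`` and ``switch_target`` are assumptions.

```python
# Case constants mapped to the byte code offsets from the listing.
SWITCH_TABLE_1 = {'one': 14, 'two': 22, 'three': 30}
ELSE_OFFSET = 38  # target encoded in "SWITCH 26 (to 38)"


def switch_target(x):
    """Return the offset the SWITCH opcode would jump to for x."""
    return SWITCH_TABLE_1.get(x, ELSE_OFFSET)


targets = [switch_target(v) for v in ('one', 'two', 'three', 'four')]
```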
Issues
''''''

The switch statement should not implement fall-through
behaviour (as does the switch statement in C). Each case
defines a complete and independent suite, much like in an
if-elif-else statement. This also enables using break in
switch statements inside loops.

If the interpreter finds that the switch variable x is
not hashable, it should raise a TypeError at run-time
pointing out the problem.

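These two rules can be sketched with a small helper in current Python. The function ``switch`` and its ``cases`` and ``default`` parameters are hypothetical; callables stand in for the suites.

```python
def switch(value, cases, default=None):
    """Run the single suite matching value; never fall through."""
    try:
        suite = cases.get(value, default)
    except TypeError:
        # Unhashable switch variable: report the problem explicitly.
        raise TypeError('switch variable of type %s is not hashable'
                        % type(value).__name__) from None
    if suite is None:
        return None  # no match and no else branch: nothing happens
    return suite()


log = []
cases = {
    'one': lambda: log.append('1'),
    'two': lambda: log.append('2'),
}
switch('one', cases)    # runs only the 'one' suite
switch('nine', cases)   # no match, no else branch: ignored
```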
There have been other proposals for the syntax which reuse
existing keywords and avoid adding two new ones ("switch" and
"case"). Others have argued that the keywords should use new
terms to avoid confusion with the C keywords of the same name
but slightly different semantics (e.g. fall-through without
break). Some of the proposed variants::

    case EXPR:
        of CONSTANT:
            SUITE
        of CONSTANT:
            SUITE
        else:
            SUITE

    case EXPR:
        if CONSTANT:
            SUITE
        if CONSTANT:
            SUITE
        else:
            SUITE

    when EXPR:
        in CONSTANT_TUPLE:
            SUITE
        in CONSTANT_TUPLE:
            SUITE
        ...
        else:
            SUITE

The switch statement could be extended to allow multiple
values for one section (e.g. case 'a', 'b', 'c': ...). Another
proposed extension would allow ranges of values (e.g. case
10..14: ...). These should probably be postponed, but kept in
mind when designing and implementing a first version.

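Both extensions fit the dictionary approach if the compiler expands every section into one table entry per constant. A sketch under stated assumptions: ``build_table`` is a hypothetical helper, strings stand in for the suites, and the range 10..14 is taken to include both ends, which the text does not specify.

```python
def build_table(sections):
    """Expand (constants, suite) pairs into a flat {constant: suite} table."""
    table = {}
    for constants, suite in sections:
        for constant in constants:
            if constant in table:
                raise ValueError('duplicate case constant: %r' % (constant,))
            table[constant] = suite
    return table


table = build_table([
    (('a', 'b', 'c'), 'letters'),
    (range(10, 15), 'ten to fourteen'),  # 10..14, assumed inclusive
])
```

Large ranges would make such a table grow with the number of values, which may be one reason to treat ranges separately in a real design.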
Examples
--------

The following examples all use a new syntax as proposed by
solution 2. However, all of these examples would work with
solution 1 as well.

::

    switch EXPR:                  switch x:
        case CONSTANT:                case "first":
            SUITE                         print x
        case CONSTANT:                case "second":
            SUITE                         x = x**2
        ...                               print x
        else:                         else:
            SUITE                         print "whoops!"

    case EXPR:                    case x:
        of CONSTANT:                  of "first":
            SUITE                         print x
        of CONSTANT:                  of "second":
            SUITE                         print x**2
        else:                         else:
            SUITE                         print "whoops!"

    case EXPR:                    case state:
        if CONSTANT:                  if "first":
            SUITE                         state = "second"
        if CONSTANT:                  if "second":
            SUITE                         state = "third"
        else:                         else:
            SUITE                         state = "first"

    when EXPR:                    when state:
        in CONSTANT_TUPLE:            in ("first", "second"):
            SUITE                         print state
        in CONSTANT_TUPLE:                state = next_state(state)
            SUITE                     in ("seventh",):
        ...                               print "done"
        else:                             break # out of loop!
            SUITE                     else:
                                          print "middle state"
                                          state = next_state(state)

Here's another nice application found by Jack Jansen (switching
on argument types)::

    switch type(x).__name__:
        case 'int':
            SUITE
        case 'string':
            SUITE

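In current Python the same idea can be approximated with a dictionary keyed by type name. The function ``describe`` and its handlers are hypothetical. Note that the builtin string type reports its name as ``'str'``, so that key is used here instead of ``'string'``.

```python
def describe(x):
    """Dispatch on the name of x's exact type."""
    handlers = {
        'int': lambda v: 'integer %d' % v,
        'str': lambda v: 'string of length %d' % len(v),
    }
    handler = handlers.get(type(x).__name__)
    if handler is None:
        return 'unhandled type'
    return handler(x)
```

Because the lookup uses the exact type name, a subclass such as ``bool`` does not match the ``'int'`` case.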
Scope
=====

XXX Explain "from __future__ import switch"


Credits
=======

* Martin von Löwis (issues with the optimization idea)
* Thomas Wouters (switch statement + byte code compiler example)
* Skip Montanaro (dispatching ideas, examples)
* Donald Beaudry (switch syntax)
* Greg Ewing (switch syntax)
* Jack Jansen (type switching examples)


References
==========

.. [1] https://sourceforge.net/tracker/index.php?func=detail&aid=481118&group_id=5470&atid=305470

.. [2] http://www.python.org/dev/peps/pep-3103


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: