PEP: 266
Title: Optimizing Global Variable/Attribute Access
Version: $Revision$
Author: skip@pobox.com (Skip Montanaro)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 13-Aug-2001
Post-History:


Abstract

The bindings for most global variables and attributes of other
modules typically never change during the execution of a Python
program, but because of Python's dynamic nature, code which
accesses such global objects must run through a full lookup each
time the object is needed. This PEP proposes a mechanism that
allows code that accesses most global objects to treat them as
local objects and places the burden of updating references on the
code that changes the name bindings of such objects.


Introduction

Consider the workhorse function sre_compile._compile. It is the
internal compilation function for the sre module. It consists
almost entirely of a loop over the elements of the pattern being
compiled, comparing opcodes with known constant values and
appending tokens to an output list. Most of the comparisons are
with constants imported from the sre_constants module. This means
there are lots of LOAD_GLOBAL bytecodes in the compiled output of
this module. Just by reading the code it's apparent that the
author intended LITERAL, NOT_LITERAL, OPCODES and many other
symbols to be constants. Still, each time they are involved in an
expression, they must be looked up anew.
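
The effect can be observed on a modern CPython with the standard
dis module. This is a minimal sketch, assuming toy stand-ins:
LITERAL, NOT_LITERAL, OPCODES and compile_codes below are
hypothetical and are not the real sre_constants definitions or the
real sre_compile._compile. Every comparison against a "constant"
still compiles to a LOAD_GLOBAL.

```python
import dis

# Hypothetical stand-ins for constants imported from sre_constants.
LITERAL = "literal"
NOT_LITERAL = "not_literal"
OPCODES = {LITERAL: 19, NOT_LITERAL: 20}


def compile_codes(pattern):
    """Toy compile loop: every comparison and table access reads a global."""
    code = []
    for op, av in pattern:
        if op is LITERAL:
            code.append(OPCODES[LITERAL])
            code.append(av)
        elif op is NOT_LITERAL:
            code.append(OPCODES[NOT_LITERAL])
            code.append(av)
    return code


def count_global_loads(func):
    """Return how many LOAD_GLOBAL instructions func's bytecode contains."""
    return sum(1 for ins in dis.get_instructions(func)
               if ins.opname == "LOAD_GLOBAL")


print(count_global_loads(compile_codes))   # 6: three per branch
```

All six loads are repeated on every iteration of the loop, even
though none of the three names is expected to be rebound.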

Most global accesses are actually to objects that are "almost
constants". This includes global variables in the current module
as well as the attributes of other imported modules. Since they
rarely change, it seems reasonable to place the burden of updating
references to such objects on the code that changes the name
bindings. If sre_constants.LITERAL is changed to refer to another
object, perhaps it would be worthwhile for the code that modifies
the sre_constants module dict to correct any active references to
that object. By doing so, in many cases global variables and the
attributes of many objects could be cached as local variables. If
the bindings between the names given to the objects and the
objects themselves change rarely, the cost of keeping track of
such objects should be low and the potential payoff fairly large.


Proposed Change

I propose that the Python virtual machine be modified to include
TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would
associate a global name or attribute of a global name with a slot
in the local variable array and perform an initial lookup of the
associated object to fill in the slot with a valid value. The code
responsible for changing the name-to-object binding would consult
this association and update the associated local variable slot
whenever the binding changes. The UNTRACK_OBJECT opcode would
delete any association between the name and the local variable
slot.
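
The division of labor can be modeled in pure Python. This is a
minimal sketch, not an implementation of the opcodes: the
TrackingNamespace class and its track/untrack methods are
hypothetical, a plain list stands in for a frame's local variable
array, and only simple names (not attributes of names) are tracked.
The point it shows is that readers pay nothing extra, while the
writer (__setitem__) pushes each new binding into every tracked slot.

```python
import math


class TrackingNamespace:
    """Hypothetical namespace whose writers keep tracked local slots
    current, mimicking TRACK_OBJECT/UNTRACK_OBJECT."""

    def __init__(self, **bindings):
        self._bindings = dict(bindings)
        # name -> list of (slots, index) pairs mirroring that name
        self._trackers = {}

    def track(self, name, slots, index):
        """TRACK_OBJECT: register slots[index] and fill it right away."""
        slots[index] = self._bindings[name]          # initial lookup
        self._trackers.setdefault(name, []).append((slots, index))

    def untrack(self, name, slots, index):
        """UNTRACK_OBJECT: drop the association for this one slot."""
        entries = self._trackers.get(name, [])
        entries[:] = [(s, i) for (s, i) in entries
                      if not (s is slots and i == index)]

    def __getitem__(self, name):
        return self._bindings[name]

    def __setitem__(self, name, value):
        """Rebinding pays the cost: push the new value into every slot."""
        self._bindings[name] = value
        for slots, index in self._trackers.get(name, []):
            slots[index] = value


ns = TrackingNamespace(sin=math.sin)
fast_locals = [None]                # stands in for a frame's local array
ns.track("sin", fast_locals, 0)
ns["sin"] = math.cos                # rebinding updates the tracked slot
print(fast_locals[0] is math.cos)   # True
ns.untrack("sin", fast_locals, 0)
ns["sin"] = abs                     # slot is no longer tracked
print(fast_locals[0] is math.cos)   # True
```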


Rationale

Global variables and attributes rarely change. For example, once
a function imports the math module, the binding between the name
"math" and the module it refers to isn't likely to change.
Similarly, if the function that uses the math module refers to its
"sin" attribute, that binding is unlikely to change either. Still,
every time the client code calls math.sin, it must first execute
a pair of instructions:

    LOAD_GLOBAL math
    LOAD_ATTR sin
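
The same pair is still visible in current CPython bytecode. A small
sketch using only the standard dis module (call_sin and
lookup_pairs are hypothetical helper names); depending on the
interpreter version, the attribute half of a call like math.sin(x)
may be reported as LOAD_ATTR or as LOAD_METHOD.

```python
import dis
import math


def call_sin(x):
    return math.sin(x)


def lookup_pairs(func):
    """Return (opname, name) for each global/attribute load in func."""
    wanted = {"LOAD_GLOBAL", "LOAD_ATTR", "LOAD_METHOD"}
    return [(ins.opname, ins.argval)
            for ins in dis.get_instructions(func)
            if ins.opname in wanted]


for opname, name in lookup_pairs(call_sin):
    print(opname, name)
```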

If the client module always assumed that math.sin was a local
constant and it was the responsibility of "external forces"
outside the function to keep the reference correct, we might have
code like this:

    TRACK_OBJECT math.sin
    ...
    LOAD_FAST math.sin
    ...
    UNTRACK_OBJECT math.sin

If the LOAD_FAST was in a loop, the payoff in reduced global loads
and attribute lookups could be significant.
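
The payoff can be estimated on a current interpreter by caching
math.sin in a local by hand, which approximates a tracked local
without its automatic updates. A sketch using only the standard
timeit module; the two function names are hypothetical, and no
particular speedup is claimed, since timings depend on the machine
and the CPython version. Unlike a tracked slot, the hand-made copy
would go stale if math.sin were rebound while the loop runs.

```python
import math
import timeit


def with_global_lookups(n):
    total = 0.0
    for i in range(n):
        total += math.sin(i)    # global load + attribute load each pass
    return total


def with_local_copy(n):
    sin = math.sin              # one lookup; the loop then reads a local
    total = 0.0
    for i in range(n):
        total += sin(i)
    return total


if __name__ == "__main__":
    for func in (with_global_lookups, with_local_copy):
        seconds = timeit.timeit(lambda: func(10_000), number=200)
        print(f"{func.__name__}: {seconds:.3f}s")
```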

This technique could, in theory, be applied to any global variable
access or attribute lookup. Consider this code:

    import math

    def sines():
        l = []
        for i in range(10):
            l.append(math.sin(i))
        return l

Even though l is a local variable, you still pay the cost of
looking up l.append ten times in the loop. The compiler (or an
optimizer) could recognize that both math.sin and l.append are
being called in the loop and decide to generate the tracked local
code, while avoiding it for the builtin range() function because
it's only called once during loop setup.
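
One way an optimizer might choose candidates, sketched here with
the standard ast module rather than at the bytecode level: collect
simple call targets (name or name.attribute) used inside loop
bodies, skip a for loop's iterator expression so that range() is
ignored, and skip calls whose base name is rebound in the loop. The
function tracking_candidates is a hypothetical heuristic, not part
of this proposal's specification.

```python
import ast

SOURCE = """
import math

def sines():
    l = []
    for i in range(10):
        l.append(math.sin(i))
    return l
"""


def dotted(node):
    """Return 'name' or 'name.attr' for simple call targets, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        return f"{node.value.id}.{node.attr}"
    return None


def tracking_candidates(source):
    """Hypothetical heuristic: simple callables invoked on every pass.

    A for loop's iterator and else clause run once, so only its body is
    searched; a while loop's test runs every pass, so it is searched
    too. Calls whose base name is rebound in the loop are skipped.
    """
    found = set()
    for loop in ast.walk(ast.parse(source)):
        if isinstance(loop, ast.For):
            parts = list(loop.body)
            rebinders = parts + [loop.target]    # target changes each pass
        elif isinstance(loop, ast.While):
            parts = list(loop.body) + [loop.test]
            rebinders = parts
        else:
            continue
        rebound = {n.id for part in rebinders for n in ast.walk(part)
                   if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
        for part in parts:
            for node in ast.walk(part):
                if isinstance(node, ast.Call):
                    name = dotted(node.func)
                    if name and name.split(".")[0] not in rebound:
                        found.add(name)
    return sorted(found)


print(tracking_candidates(SOURCE))   # ['l.append', 'math.sin']
```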

According to a post to python-dev by Marc-Andre Lemburg [1],
LOAD_GLOBAL opcodes account for over 7% of all instructions
executed by the Python virtual machine. This can be a very
expensive instruction, at least relative to a LOAD_FAST
instruction, which is a simple array index and requires no extra
function calls by the virtual machine. I believe many LOAD_GLOBAL
instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted
to LOAD_FAST instructions.
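
Lemburg's figure counts instructions executed at run time. A much
cruder, static proxy can be computed with the standard dis module
by counting opcodes in compiled code, including nested code
objects. This sketch (opcode_histogram and global_load_share are
hypothetical helpers) is not expected to reproduce the 7% figure,
since it ignores how often each instruction actually runs.

```python
import dis
import types
from collections import Counter


def opcode_histogram(code):
    """Count opcodes in a code object and every code object nested in it."""
    counts = Counter()
    stack = [code]
    while stack:
        current = stack.pop()
        for ins in dis.get_instructions(current):
            counts[ins.opname] += 1
        # Nested functions, classes and comprehensions live in co_consts.
        stack.extend(c for c in current.co_consts
                     if isinstance(c, types.CodeType))
    return counts


def global_load_share(source):
    """Fraction of static instructions that are LOAD_GLOBAL."""
    counts = opcode_histogram(compile(source, "<sample>", "exec"))
    return counts["LOAD_GLOBAL"] / sum(counts.values())


SAMPLE = """
import math
def f(xs):
    return [math.sin(x) + math.cos(x) for x in xs]
"""
print(f"{global_load_share(SAMPLE):.1%}")
```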

Code that uses global variables heavily often resorts to various
tricks to avoid global variable and attribute lookup. The
aforementioned sre_compile._compile function caches the append
method of the growing output list. Many people abuse functions'
default argument feature to cache global variable lookups. Both
of these schemes are hackish and rarely address all the available
opportunities for optimization. (For example, sre_compile._compile
does not cache the two globals that it uses most frequently: the
builtin len function and the global OPCODES table that it imports
from sre_constants.py.)
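
The default-argument trick looks like this; OPCODES, emit_slow and
emit_fast are hypothetical names, and a builtin such as len can be
captured the same way. The sketch also shows why the trick is more
than ugly: the default is evaluated once, when the def statement
runs, so a later rebinding of the global is silently ignored, which
is exactly the case TRACK_OBJECT is meant to handle.

```python
OPCODES = {"literal": 19}


def emit_slow(op):
    # Looks up the global OPCODES (LOAD_GLOBAL) on every call.
    return OPCODES[op]


def emit_fast(op, OPCODES=OPCODES):
    # The "hack": the global captured as a default becomes a fast local,
    # but its value is frozen at function-definition time.
    return OPCODES[op]


OPCODES = {"literal": 20}     # some other code rebinds the global

print(emit_slow("literal"))   # 20: sees the new binding
print(emit_fast("literal"))   # 19: still uses the stale dict
```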


Discussion

Jeremy Hylton has an alternate proposal on the table [2]. His
proposal seeks to create a hybrid dictionary/list object for use
in global name lookups that would make global variable access look
more like local variable access. While there is no C code
available to examine, the Python implementation given in his
proposal still appears to require dictionary key lookup. It also
doesn't appear that his proposal could speed up attribute lookups
on local variables (such as l.append), which might be worthwhile
in some situations.


Backwards Compatibility

I don't believe there will be any serious backward compatibility
issues. Obviously, Python bytecode that contains TRACK_OBJECT
opcodes could not be executed by earlier versions of the
interpreter, but breakage at the bytecode level is often assumed
between versions.
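
CPython already guards against running bytecode from a different
interpreter version: each .pyc file starts with a magic number that
changes whenever the bytecode format changes, and a mismatching
file is recompiled from source rather than executed. A small sketch
using the real importlib.util.MAGIC_NUMBER attribute; pyc_bytes and
pyc_matches_interpreter are hypothetical helpers, and the header
fields after the magic number are zero-filled for brevity.

```python
import importlib.util
import marshal


def pyc_bytes(source):
    """Build a minimal .pyc-style blob: 16-byte header, then marshalled code.

    Real .pyc files carry flags and a source timestamp/size or hash
    after the 4-byte magic number; here those 12 bytes are zeros.
    """
    code = compile(source, "<demo>", "exec")
    return importlib.util.MAGIC_NUMBER + bytes(12) + marshal.dumps(code)


def pyc_matches_interpreter(blob):
    """True if the blob's magic number matches the running interpreter."""
    return blob[:4] == importlib.util.MAGIC_NUMBER


blob = pyc_bytes("x = 1")
print(pyc_matches_interpreter(blob))                     # True
print(pyc_matches_interpreter(b"\0\0\r\n" + blob[4:]))   # False
```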


Implementation

TBD. This is where I need help. I believe there should either be
a central name/location registry, or the code that modifies object
attributes should be changed, but I'm not sure of the best way to
go about this. If you look at the code that implements the
STORE_GLOBAL and STORE_ATTR opcodes, it seems likely that some
changes will be required to PyDict_SetItem and PyObject_SetAttr or
their string-keyed variants (PyDict_SetItemString and
PyObject_SetAttrString). Ideally, there'd be a fairly central place
to localize these changes. Tracking attributes of local variables
would mean modifying STORE_FAST as well, which could be a problem,
since the name bindings for local variables change much more
frequently. (I think an optimizer could avoid inserting the
tracking code for the attributes of any local variable whose name
binding changes.)
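
Why both setters matter can be seen from pure Python. In this
sketch, TrackedModule and its track method are hypothetical; the
module's class is swapped for a types.ModuleType subclass whose
__setattr__ updates tracked slots, roughly the PyObject_SetAttr
route. Rebinding by attribute assignment from outside (the
STORE_ATTR path) is caught, but a function inside the module that
rebinds the name with a global statement (STORE_GLOBAL) writes
straight into the module dict and bypasses the hook, leaving the
tracked slot stale.

```python
import types


class TrackedModule(types.ModuleType):
    """Hypothetical module type that pushes rebindings into tracked slots."""

    def track(self, name, slots, index):
        trackers = self.__dict__.setdefault("_trackers", {})
        trackers.setdefault(name, []).append((slots, index))
        slots[index] = getattr(self, name)      # initial lookup

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        for slots, index in self.__dict__.get("_trackers", {}).get(name, []):
            slots[index] = value


# Build a throwaway module instead of touching a real one.
mod = types.ModuleType("demo_constants")
exec("LITERAL = 19\n"
     "def rebind(v):\n"
     "    global LITERAL\n"
     "    LITERAL = v\n", mod.__dict__)
mod.__class__ = TrackedModule

slots = [None]
mod.track("LITERAL", slots, 0)

mod.LITERAL = 20            # STORE_ATTR path: goes through __setattr__
print(slots[0])             # 20

mod.rebind(21)              # STORE_GLOBAL path: writes the dict directly
print(slots[0], mod.LITERAL)   # 20 21: the tracked slot is stale
```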


Performance

I believe (though I have no code to prove it at this point) that
implementing TRACK_OBJECT will generally not be much more
expensive than a single LOAD_GLOBAL instruction or a
LOAD_GLOBAL/LOAD_ATTR pair. An optimizer should be able to avoid
converting LOAD_GLOBAL and LOAD_GLOBAL/LOAD_ATTR to the new scheme
unless the object access occurs within a loop. Further down the
line, a register-oriented replacement for the current Python
virtual machine [3] could conceivably eliminate most of the
LOAD_FAST instructions as well.

The number of tracked objects should be relatively small. All
active frames of all active threads could conceivably be tracking
objects, but the number of active frames seems small compared to
the number of functions defined in a given application.


References

[1] http://mail.python.org/pipermail/python-dev/2000-July/007609.html

[2] http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobalsPEP

[3] http://www.musi-cal.com/~skip/python/rattlesnake20010813.tar.gz


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: