455 lines
19 KiB
Plaintext
455 lines
19 KiB
Plaintext
PEP: 266
|
||
Title: Optimizing Global Variable/Attribute Access
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: skip@pobox.com (Skip Montanaro)
|
||
Status: Withdrawn
|
||
Type: Standards Track
|
||
Created: 13-Aug-2001
|
||
Python-Version: 2.3
|
||
Post-History:
|
||
|
||
|
||
Abstract
|
||
|
||
The bindings for most global variables and attributes of other
|
||
modules typically never change during the execution of a Python
|
||
program, but because of Python's dynamic nature, code which
|
||
accesses such global objects must run through a full lookup each
|
||
time the object is needed. This PEP proposes a mechanism that
|
||
allows code that accesses most global objects to treat them as
|
||
local objects and places the burden of updating references on the
|
||
code that changes the name bindings of such objects.
|
||
|
||
|
||
Introduction
|
||
|
||
Consider the workhorse function sre_compile._compile. It is the
|
||
internal compilation function for the sre module. It consists
|
||
almost entirely of a loop over the elements of the pattern being
|
||
compiled, comparing opcodes with known constant values and
|
||
appending tokens to an output list. Most of the comparisons are
|
||
with constants imported from the sre_constants module. This means
|
||
there are lots of LOAD_GLOBAL bytecodes in the compiled output of
|
||
this module. Just by reading the code it's apparent that the
|
||
author intended LITERAL, NOT_LITERAL, OPCODES and many other
|
||
symbols to be constants. Still, each time they are involved in an
|
||
expression, they must be looked up anew.
|
||
|
||
Most global accesses are actually to objects that are "almost
|
||
constants". This includes global variables in the current module
|
||
as well as the attributes of other imported modules. Since they
|
||
rarely change, it seems reasonable to place the burden of updating
|
||
references to such objects on the code that changes the name
|
||
bindings. If sre_constants.LITERAL is changed to refer to another
|
||
object, perhaps it would be worthwhile for the code that modifies
|
||
the sre_constants module dict to correct any active references to
|
||
that object. By doing so, in many cases global variables and the
|
||
attributes of many objects could be cached as local variables. If
|
||
the bindings between the names given to the objects and the
|
||
objects themselves changes rarely, the cost of keeping track of
|
||
such objects should be low and the potential payoff fairly large.
|
||
|
||
In an attempt to gauge the effect of this proposal, I modified the
|
||
Pystone benchmark program included in the Python distribution to
|
||
cache global functions. Its main function, Proc0, makes calls to
|
||
ten different functions inside its for loop. In addition, Func2
|
||
calls Func1 repeatedly inside a loop. If local copies of these 11
|
||
global idenfiers are made before the functions' loops are entered,
|
||
performance on this particular benchmark improves by about two per
|
||
cent (from 5561 pystones to 5685 on my laptop). It gives some
|
||
indication that performance would be improved by caching most
|
||
global variable access. Note also that the pystone benchmark
|
||
makes essentially no accesses of global module attributes, an
|
||
anticipated area of improvement for this PEP.
|
||
|
||
Proposed Change
|
||
|
||
I propose that the Python virtual machine be modified to include
|
||
TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would
|
||
associate a global name or attribute of a global name with a slot
|
||
in the local variable array and perform an initial lookup of the
|
||
associated object to fill in the slot with a valid value. The
|
||
association it creates would be noted by the code responsible for
|
||
changing the name-to-object binding to cause the associated local
|
||
variable to be updated. The UNTRACK_OBJECT opcode would delete
|
||
any association between the name and the local variable slot.
|
||
|
||
|
||
Threads
|
||
|
||
Operation of this code in threaded programs will be no different
|
||
than in unthreaded programs. If you need to lock an object to
|
||
access it, you would have had to do that before TRACK_OBJECT would
|
||
have been executed and retain that lock until after you stop using
|
||
it.
|
||
|
||
FIXME: I suspect I need more here.
|
||
|
||
|
||
Rationale
|
||
|
||
Global variables and attributes rarely change. For example, once
|
||
a function imports the math module, the binding between the name
|
||
"math" and the module it refers to aren't likely to change.
|
||
Similarly, if the function that uses the math module refers to its
|
||
"sin" attribute, it's unlikely to change. Still, every time the
|
||
module wants to call the math.sin function, it must first execute
|
||
a pair of instructions:
|
||
|
||
LOAD_GLOBAL math
|
||
LOAD_ATTR sin
|
||
|
||
If the client module always assumed that math.sin was a local
|
||
constant and it was the responsibility of "external forces"
|
||
outside the function to keep the reference correct, we might have
|
||
code like this:
|
||
|
||
TRACK_OBJECT math.sin
|
||
...
|
||
LOAD_FAST math.sin
|
||
...
|
||
UNTRACK_OBJECT math.sin
|
||
|
||
If the LOAD_FAST was in a loop the payoff in reduced global loads
|
||
and attribute lookups could be significant.
|
||
|
||
This technique could, in theory, be applied to any global variable
|
||
access or attribute lookup. Consider this code:
|
||
|
||
l = []
|
||
for i in range(10):
|
||
l.append(math.sin(i))
|
||
return l
|
||
|
||
Even though l is a local variable, you still pay the cost of
|
||
loading l.append ten times in the loop. The compiler (or an
|
||
optimizer) could recognize that both math.sin and l.append are
|
||
being called in the loop and decide to generate the tracked local
|
||
code, avoiding it for the builtin range() function because it's
|
||
only called once during loop setup. Performance issues related to
|
||
accessing local variables make tracking l.append less attractive
|
||
than tracking globals such as math.sin.
|
||
|
||
According to a post to python-dev by Marc-Andre Lemburg [1],
|
||
LOAD_GLOBAL opcodes account for over 7% of all instructions
|
||
executed by the Python virtual machine. This can be a very
|
||
expensive instruction, at least relative to a LOAD_FAST
|
||
instruction, which is a simple array index and requires no extra
|
||
function calls by the virtual machine. I believe many LOAD_GLOBAL
|
||
instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted to
|
||
LOAD_FAST instructions.
|
||
|
||
Code that uses global variables heavily often resorts to various
|
||
tricks to avoid global variable and attribute lookup. The
|
||
aforementioned sre_compile._compile function caches the append
|
||
method of the growing output list. Many people commonly abuse
|
||
functions' default argument feature to cache global variable
|
||
lookups. Both of these schemes are hackish and rarely address all
|
||
the available opportunities for optimization. (For example,
|
||
sre_compile._compile does not cache the two globals that it uses
|
||
most frequently: the builtin len function and the global OPCODES
|
||
array that it imports from sre_constants.py.
|
||
|
||
|
||
Questions
|
||
|
||
Q. What about threads? What if math.sin changes while in cache?
|
||
|
||
A. I believe the global interpreter lock will protect values from
|
||
being corrupted. In any case, the situation would be no worse
|
||
than it is today. If one thread modified math.sin after another
|
||
thread had already executed "LOAD_GLOBAL math", but before it
|
||
executed "LOAD_ATTR sin", the client thread would see the old
|
||
value of math.sin.
|
||
|
||
The idea is this. I use a multi-attribute load below as an
|
||
example, not because it would happen very often, but because by
|
||
demonstrating the recursive nature with an extra call hopefully
|
||
it will become clearer what I have in mind. Suppose a function
|
||
defined in module foo wants to access spam.eggs.ham and that
|
||
spam is a module imported at the module level in foo:
|
||
|
||
import spam
|
||
...
|
||
def somefunc():
|
||
...
|
||
x = spam.eggs.ham
|
||
|
||
Upon entry to somefunc, a TRACK_GLOBAL instruction will be
|
||
executed:
|
||
|
||
TRACK_GLOBAL spam.eggs.ham n
|
||
|
||
"spam.eggs.ham" is a string literal stored in the function's
|
||
constants array. "n" is a fastlocals index. "&fastlocals[n]"
|
||
is a reference to slot "n" in the executing frame's fastlocals
|
||
array, the location in which the spam.eggs.ham reference will
|
||
be stored. Here's what I envision happening:
|
||
|
||
1. The TRACK_GLOBAL instruction locates the object referred to
|
||
by the name "spam" and finds it in its module scope. It
|
||
then executes a C function like
|
||
|
||
_PyObject_TrackName(m, "spam.eggs.ham", &fastlocals[n])
|
||
|
||
where "m" is the module object with an attribute "spam".
|
||
|
||
2. The module object strips the leading "spam." stores the
|
||
necessary information ("eggs.ham" and &fastlocals[n]) in
|
||
case its binding for the name "eggs" changes. It then
|
||
locates the object referred to by the key "eggs" in its
|
||
dict and recursively calls
|
||
|
||
_PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])
|
||
|
||
3. The eggs object strips the leading "eggs.", stores the
|
||
("ham", &fastlocals[n]) info, locates the object in its
|
||
namespace called "ham" and calls _PyObject_TrackName once
|
||
again:
|
||
|
||
_PyObject_TrackName(ham, "ham", &fastlocals[n])
|
||
|
||
4. The "ham" object strips the leading string (no "." this
|
||
time, but that's a minor point), sees that the result is
|
||
empty, then uses its own value (self, probably) to update
|
||
the location it was handed:
|
||
|
||
Py_XDECREF(&fastlocals[n]);
|
||
&fastlocals[n] = self;
|
||
Py_INCREF(&fastlocals[n]);
|
||
|
||
At this point, each object involved in resolving
|
||
"spam.eggs.ham" knows which entry in its namespace needs to be
|
||
tracked and what location to update if that name changes.
|
||
Furthermore, if the one name it is tracking in its local
|
||
storage changes, it can call _PyObject_TrackName using the new
|
||
object once the change has been made. At the bottom end of
|
||
the food chain, the last object will always strip a name, see
|
||
the empty string and know that its value should be stuffed
|
||
into the location it's been passed.
|
||
|
||
When the object referred to by the dotted expression
|
||
"spam.eggs.ham" is going to go out of scope, an
|
||
"UNTRACK_GLOBAL spam.eggs.ham n" instruction is executed. It
|
||
has the effect of deleting all the tracking information that
|
||
TRACK_GLOBAL established.
|
||
|
||
The tracking operation may seem expensive, but recall that the
|
||
objects being tracked are assumed to be "almost constant", so
|
||
the setup cost will be traded off against hopefully multiple
|
||
local instead of global loads. For globals with attributes
|
||
the tracking setup cost grows but is offset by avoiding the
|
||
extra LOAD_ATTR cost. The TRACK_GLOBAL instruction needs to
|
||
perform a PyDict_GetItemString for the first name in the chain
|
||
to determine where the top-level object resides. Each object
|
||
in the chain has to store a string and an address somewhere,
|
||
probably in a dict that uses storage locations as keys
|
||
(e.g. the &fastlocals[n]) and strings as values. (This dict
|
||
could possibly be a central dict of dicts whose keys are
|
||
object addresses instead of a per-object dict.) It shouldn't
|
||
be the other way around because multiple active frames may
|
||
want to track "spam.eggs.ham", but only one frame will want to
|
||
associate that name with one of its fast locals slots.
|
||
|
||
|
||
Unresolved Issues
|
||
|
||
Threading -
|
||
|
||
What about this (dumb) code?
|
||
|
||
l = []
|
||
lock = threading.Lock()
|
||
...
|
||
def fill_l():
|
||
for i in range(1000):
|
||
lock.acquire()
|
||
l.append(math.sin(i))
|
||
lock.release()
|
||
...
|
||
def consume_l():
|
||
while 1:
|
||
lock.acquire()
|
||
if l:
|
||
elt = l.pop()
|
||
lock.release()
|
||
fiddle(elt)
|
||
|
||
It's not clear from a static analysis of the code what the lock is
|
||
protecting. (You can't tell at compile-time that threads are even
|
||
involved can you?) Would or should it affect attempts to track
|
||
"l.append" or "math.sin" in the fill_l function?
|
||
|
||
If we annotate the code with mythical track_object and untrack_object
|
||
builtins (I'm not proposing this, just illustrating where stuff would
|
||
go!), we get
|
||
|
||
l = []
|
||
lock = threading.Lock()
|
||
...
|
||
def fill_l():
|
||
track_object("l.append", append)
|
||
track_object("math.sin", sin)
|
||
for i in range(1000):
|
||
lock.acquire()
|
||
append(sin(i))
|
||
lock.release()
|
||
untrack_object("math.sin", sin)
|
||
untrack_object("l.append", append)
|
||
...
|
||
def consume_l():
|
||
while 1:
|
||
lock.acquire()
|
||
if l:
|
||
elt = l.pop()
|
||
lock.release()
|
||
fiddle(elt)
|
||
|
||
Is that correct both with and without threads (or at least equally
|
||
incorrect with and without threads)?
|
||
|
||
Nested Scopes -
|
||
|
||
The presence of nested scopes will affect where TRACK_GLOBAL finds
|
||
a global variable, but shouldn't affect anything after that. (I
|
||
think.)
|
||
|
||
Missing Attributes -
|
||
|
||
Suppose I am tracking the object referred to by "spam.eggs.ham"
|
||
and "spam.eggs" is rebound to an object that does not have a "ham"
|
||
attribute. It's clear this will be an AttributeError if the
|
||
programmer attempts to resolve "spam.eggs.ham" in the current
|
||
Python virtual machine, but suppose the programmer has anticipated
|
||
this case:
|
||
|
||
if hasattr(spam.eggs, "ham"):
|
||
print spam.eggs.ham
|
||
elif hasattr(spam.eggs, "bacon"):
|
||
print spam.eggs.bacon
|
||
else:
|
||
print "what? no meat?"
|
||
|
||
You can't raise an AttributeError when the tracking information is
|
||
recalculated. If it does not raise AttributeError and instead
|
||
lets the tracking stand, it may be setting the programmer up for a
|
||
very subtle error.
|
||
|
||
One solution to this problem would be to track the shortest
|
||
possible root of each dotted expression the function refers to
|
||
directly. In the above example, "spam.eggs" would be tracked, but
|
||
"spam.eggs.ham" and "spam.eggs.bacon" would not.
|
||
|
||
Who does the dirty work? -
|
||
|
||
In the Questions section I postulated the existence of a
|
||
_PyObject_TrackName function. While the API is fairly easy to
|
||
specify, the implementation behind-the-scenes is not so obvious.
|
||
A central dictionary could be used to track the name/location
|
||
mappings, but it appears that all setattr functions might need to
|
||
be modified to accommodate this new functionality.
|
||
|
||
If all types used the PyObject_GenericSetAttr function to set
|
||
attributes that would localize the update code somewhat. They
|
||
don't however (which is not too surprising), so it seems that all
|
||
getattrfunc and getattrofunc functions will have to be updated.
|
||
In addition, this would place an absolute requirement on C
|
||
extension module authors to call some function when an attribute
|
||
changes value (PyObject_TrackUpdate?).
|
||
|
||
Finally, it's quite possible that some attributes will be set by
|
||
side effect and not by any direct call to a setattr method of some
|
||
sort. Consider a device interface module that has an interrupt
|
||
routine that copies the contents of a device register into a slot
|
||
in the object's struct whenever it changes. In these situations,
|
||
more extensive modifications would have to be made by the module
|
||
author. To identify such situations at compile time would be
|
||
impossible. I think an extra slot could be added to PyTypeObjects
|
||
to indicate if an object's code is safe for global tracking. It
|
||
would have a default value of 0 (Py_TRACKING_NOT_SAFE). If an
|
||
extension module author has implemented the necessary tracking
|
||
support, that field could be initialized to 1 (Py_TRACKING_SAFE).
|
||
_PyObject_TrackName could check that field and issue a warning if
|
||
it is asked to track an object that the author has not explicitly
|
||
said was safe for tracking.
|
||
|
||
Discussion
|
||
|
||
Jeremy Hylton has an alternate proposal on the table [2]. His
|
||
proposal seeks to create a hybrid dictionary/list object for use
|
||
in global name lookups that would make global variable access look
|
||
more like local variable access. While there is no C code
|
||
available to examine, the Python implementation given in his
|
||
proposal still appears to require dictionary key lookup. It
|
||
doesn't appear that his proposal could speed local variable
|
||
attribute lookup, which might be worthwhile in some situations if
|
||
potential performance burdens could be addressed.
|
||
|
||
|
||
Backwards Compatibility
|
||
|
||
I don't believe there will be any serious issues of backward
|
||
compatibility. Obviously, Python bytecode that contains
|
||
TRACK_OBJECT opcodes could not be executed by earlier versions of
|
||
the interpreter, but breakage at the bytecode level is often
|
||
assumed between versions.
|
||
|
||
|
||
Implementation
|
||
|
||
TBD. This is where I need help. I believe there should be either
|
||
a central name/location registry or the code that modifies object
|
||
attributes should be modified, but I'm not sure the best way to go
|
||
about this. If you look at the code that implements the
|
||
STORE_GLOBAL and STORE_ATTR opcodes, it seems likely that some
|
||
changes will be required to PyDict_SetItem and PyObject_SetAttr or
|
||
their String variants. Ideally, there'd be a fairly central place
|
||
to localize these changes. If you begin considering tracking
|
||
attributes of local variables you get into issues of modifying
|
||
STORE_FAST as well, which could be a problem, since the name
|
||
bindings for local variables are changed much more frequently. (I
|
||
think an optimizer could avoid inserting the tracking code for the
|
||
attributes for any local variables where the variable's name
|
||
binding changes.)
|
||
|
||
|
||
Performance
|
||
|
||
I believe (though I have no code to prove it at this point), that
|
||
implementing TRACK_OBJECT will generally not be much more
|
||
expensive than a single LOAD_GLOBAL instruction or a
|
||
LOAD_GLOBAL/LOAD_ATTR pair. An optimizer should be able to avoid
|
||
converting LOAD_GLOBAL and LOAD_GLOBAL/LOAD_ATTR to the new scheme
|
||
unless the object access occurred within a loop. Further down the
|
||
line, a register-oriented replacement for the current Python
|
||
virtual machine [3] could conceivably eliminate most of the
|
||
LOAD_FAST instructions as well.
|
||
|
||
The number of tracked objects should be relatively small. All
|
||
active frames of all active threads could conceivably be tracking
|
||
objects, but this seems small compared to the number of functions
|
||
defined in a given application.
|
||
|
||
|
||
References
|
||
|
||
[1] http://mail.python.org/pipermail/python-dev/2000-July/007609.html
|
||
|
||
[2] http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobalsPEP
|
||
|
||
[3] http://www.musi-cal.com/~skip/python/rattlesnake20010813.tar.gz
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
fill-column: 70
|
||
End:
|