205 lines
8.5 KiB
Plaintext
205 lines
8.5 KiB
Plaintext
|
PEP: 266
|
|||
|
Title: Optimizing Global Variable/Attribute Access
|
|||
|
Version: $Revision$
|
|||
|
Author: skip@pobox.com (Skip Montanaro)
|
|||
|
Status: Draft
|
|||
|
Type: Standards Track
|
|||
|
Python-Version: 2.3
|
|||
|
Created: 13-Aug-2001
|
|||
|
Post-History:
|
|||
|
|
|||
|
|
|||
|
Abstract
|
|||
|
|
|||
|
The bindings for most global variables and attributes of other
|
|||
|
modules typically never change during the execution of a Python
|
|||
|
program, but because of Python's dynamic nature, code which
|
|||
|
accesses such global objects must run through a full lookup each
|
|||
|
time the object is needed. This PEP proposes a mechanism that
|
|||
|
allows code that accesses most global objects to treat them as
|
|||
|
local objects and places the burden of updating references on the
|
|||
|
code that changes the name bindings of such objects.
|
|||
|
|
|||
|
|
|||
|
Introduction
|
|||
|
|
|||
|
Consider the workhorse function sre_compile._compile. It is the
|
|||
|
internal compilation function for the sre module. It consists
|
|||
|
almost entirely of a loop over the elements of the pattern being
|
|||
|
compiled, comparing opcodes with known constant values and
|
|||
|
appending tokens to an output list. Most of the comparisons are
|
|||
|
with constants imported from the sre_constants module. This means
|
|||
|
there are lots of LOAD_GLOBAL bytecodes in the compiled output of
|
|||
|
this module. Just by reading the code it's apparent that the
|
|||
|
author intended LITERAL, NOT_LITERAL, OPCODES and many other
|
|||
|
symbols to be constants. Still, each time they are involved in an
|
|||
|
expression, they must be looked up anew.
|
|||
|
|
|||
|
Most global accesses are actually to objects that are "almost
|
|||
|
constants". This includes global variables in the current module
|
|||
|
as well as the attributes of other imported modules. Since they
|
|||
|
rarely change, it seems reasonable to place the burden of updating
|
|||
|
references to such objects on the code that changes the name
|
|||
|
bindings. If sre_constants.LITERAL is changed to refer to another
|
|||
|
object, perhaps it would be worthwhile for the code that modifies
|
|||
|
the sre_constants module dict to correct any active references to
|
|||
|
that object. By doing so, in many cases global variables and the
|
|||
|
attributes of many objects could be cached as local variables. If
|
|||
|
the bindings between the names given to the objects and the
|
|||
|
objects themselves changes rarely, the cost of keeping track of
|
|||
|
such objects should be low and the potential payoff fairly large.
|
|||
|
|
|||
|
|
|||
|
Proposed Change
|
|||
|
|
|||
|
I propose that the Python virtual machine be modified to include
|
|||
|
TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would
|
|||
|
associate a global name or attribute of a global name with a slot
|
|||
|
in the local variable array and perform an initial lookup of the
|
|||
|
associated object to fill in the slot with a valid value. The
|
|||
|
association it creates would be noted by the code responsible for
|
|||
|
changing the name-to-object binding to cause the associated local
|
|||
|
variable to be updated. The UNTRACK_OBJECT opcode would delete
|
|||
|
any association between the name and the local variable slot.
|
|||
|
|
|||
|
|
|||
|
Rationale
|
|||
|
|
|||
|
Global variables and attributes rarely change. For example, once
|
|||
|
a function imports the math module, the binding between the name
|
|||
|
"math" and the module it refers to aren't likely to change.
|
|||
|
Similarly, if the function that uses the math module refers to its
|
|||
|
"sin" attribute, it's unlikely to change. Still, every time the
|
|||
|
module wants to call the math.sin function, it must first execute
|
|||
|
a pair of instructions:
|
|||
|
|
|||
|
LOAD_GLOBAL math
|
|||
|
LOAD_ATTR sin
|
|||
|
|
|||
|
If the client module always assumed that math.sin was a local
|
|||
|
constant and it was the responsibility of "external forces"
|
|||
|
outside the function to keep the reference correct, we might have
|
|||
|
code like this:
|
|||
|
|
|||
|
TRACK_OBJECT math.sin
|
|||
|
...
|
|||
|
LOAD_FAST math.sin
|
|||
|
...
|
|||
|
UNTRACK_OBJECT math.sin
|
|||
|
|
|||
|
If the LOAD_FAST was in a loop the payoff in reduced global loads
|
|||
|
and attribute lookups could be significant.
|
|||
|
|
|||
|
This technique could, in theory, be applied to any global variable
|
|||
|
access or attribute lookup. Consider this code:
|
|||
|
|
|||
|
l = []
|
|||
|
for i in range(10):
|
|||
|
l.append(math.sin(i))
|
|||
|
return l
|
|||
|
|
|||
|
Even though l is a local variable, you still pay the cost of
|
|||
|
loading l.append ten times in the loop. The compiler (or an
|
|||
|
optimizer) could recognize that both math.sin and l.append are
|
|||
|
being called in the loop and decide to generate the tracked local
|
|||
|
code, avoiding it for the builtin range() function because it's
|
|||
|
only called once during loop setup.
|
|||
|
|
|||
|
According to a post to python-dev by Marc-Andre Lemburg [1],
|
|||
|
LOAD_GLOBAL opcodes account for over 7% of all instructions
|
|||
|
executed by the Python virtual machine. This can be a very
|
|||
|
expensive instruction, at least relative to a LOAD_FAST
|
|||
|
instruction, which is a simple array index and requires no extra
|
|||
|
function calls by the virtual machine. I believe many LOAD_GLOBAL
|
|||
|
instructions and LOAD_GLOBAL/ LOAD_ATTR pairs could be converted
|
|||
|
to LOAD_FAST instructions.
|
|||
|
|
|||
|
Code that uses global variables heavily often resorts to various
|
|||
|
tricks to avoid global variable and attribute lookup. The
|
|||
|
aforementioned sre_compile._compile function caches the append
|
|||
|
method of the growing output list. Many people commonly abuse
|
|||
|
functions' default argument feature to cache global variable
|
|||
|
lookups. Both of these schemes are hackish and rarely address all
|
|||
|
the available opportunities for optimization. (For example,
|
|||
|
sre_compile._compile does not cache the two globals that it uses
|
|||
|
most frequently: the builtin len function and the global OPCODES
|
|||
|
array that it imports from sre_constants.py.
|
|||
|
|
|||
|
|
|||
|
Discussion
|
|||
|
|
|||
|
Jeremy Hylton has an alternate proposal on the table [2]. His
|
|||
|
proposal seeks to create a hybrid dictionary/list object for use
|
|||
|
in global name lookups that would make global variable access look
|
|||
|
more like local variable access. While there is no C code
|
|||
|
available to examine, the Python implementation given in his
|
|||
|
proposal still appears to require dictionary key lookup. It
|
|||
|
doesn't appear that his proposal could speed local variable
|
|||
|
attribute lookup, which might be worthwhile in some situations.
|
|||
|
|
|||
|
|
|||
|
Backwards Compatibility
|
|||
|
|
|||
|
I don't believe there will be any serious issues of backward
|
|||
|
compatibility. Obviously, Python bytecode that contains
|
|||
|
TRACK_OBJECT opcodes could not be executed by earlier versions of
|
|||
|
the interpreter, but breakage at the bytecode level is often
|
|||
|
assumed between versions.
|
|||
|
|
|||
|
|
|||
|
Implementation
|
|||
|
|
|||
|
TBD. This is where I need help. I believe there should be either
|
|||
|
a central name/location registry or the code that modifies object
|
|||
|
attributes should be modified, but I'm not sure the best way to go
|
|||
|
about this. If you look at the code that implements the
|
|||
|
STORE_GLOBAL and STORE_ATTR opcodes, it seems likely that some
|
|||
|
changes will be required to PyDict_SetItem and PyObject_SetAttr or
|
|||
|
their String variants. Ideally, there'd be a fairly central place
|
|||
|
to localize these changes. If you begin considering tracking
|
|||
|
attributes of local variables you get into issues of modifying
|
|||
|
STORE_FAST as well, which could be a problem, since the name
|
|||
|
bindings for local variables are changed much more frequently. (I
|
|||
|
think an optimizer could avoid inserting the tracking code for the
|
|||
|
attributes for any local variables where the variable's name
|
|||
|
binding changes.)
|
|||
|
|
|||
|
|
|||
|
Performance
|
|||
|
|
|||
|
I believe (though I have no code to prove it at this point), that
|
|||
|
implementing TRACK_OBJECT will generally not be much more
|
|||
|
expensive than a single LOAD_GLOBAL instruction or a
|
|||
|
LOAD_GLOBAL/LOAD_ATTR pair. An optimizer should be able to avoid
|
|||
|
converting LOAD_GLOBAL and LOAD_GLOBAL/LOAD_ATTR to the new scheme
|
|||
|
unless the object access occurred within a loop. Further down the
|
|||
|
line, a register-oriented replacement for the current Python
|
|||
|
virtual machine [3] could conceivably eliminate most of the
|
|||
|
LOAD_FAST instructions as well.
|
|||
|
|
|||
|
The number of tracked objects should be relatively small. All
|
|||
|
active frames of all active threads could conceivably be tracking
|
|||
|
objects, but this seems small compared to the number of functions
|
|||
|
defined in a given application.
|
|||
|
|
|||
|
|
|||
|
References
|
|||
|
|
|||
|
[1] http://mail.python.org/pipermail/python-dev/2000-July/007609.html
|
|||
|
|
|||
|
[2] http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobalsPEP
|
|||
|
|
|||
|
[3] http://www.musi-cal.com/~skip/python/rattlesnake20010813.tar.gz
|
|||
|
|
|||
|
|
|||
|
Copyright
|
|||
|
|
|||
|
This document has been placed in the public domain.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Local Variables:
|
|||
|
mode: indented-text
|
|||
|
indent-tabs-mode: nil
|
|||
|
End:
|