python-peps/peps/pep-0329.rst

268 lines
9.5 KiB
ReStructuredText

PEP: 329
Title: Treating Builtins as Constants in the Standard Library
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2004
Python-Version: 2.4
Post-History: 18-Apr-2004
Abstract
========
The proposal is to add a function for treating builtin references as
constants and to apply that function throughout the standard library.
Status
======
The PEP is self rejected by the author. Though the ASPN recipe was
well received, there was less willingness to consider this for
inclusion in the core distribution.
The Jython implementation does not use byte codes, so its performance
would suffer if the current ``_len=len`` optimizations were removed.
Also, altering byte codes is one of the least clean ways to improve
performance and enable cleaner coding. A more robust solution would
likely involve compiler pragma directives or metavariables indicating
what can be optimized (similar to const/volatile declarations).
Motivation
==========
The library contains code such as ``_len=len`` which is intended to
create fast local references instead of slower global lookups. Though
necessary for performance, these constructs clutter the code and are
usually incomplete (missing many opportunities).
If the proposal is adopted, those constructs could be eliminated from
the code base and at the same time improve upon their results in terms
of performance.
There are currently over a hundred instances of ``while 1`` in the
library. They were not replaced with the more readable ``while True``
because of performance reasons (the compiler cannot eliminate the test
because ``True`` is not known to always be a constant). Conversion of
True to a constant will clarify the code while retaining performance.
Many other basic Python operations run much slower because of global
lookups. In try/except statements, the trapped exceptions are
dynamically looked up before testing whether they match.
Similarly, simple identity tests such as ``while x is not None``
require the ``None`` variable to be re-looked up on every pass.
Builtin lookups are especially egregious because the enclosing global
scope must be checked first. These lookup chains devour cache space
that is best used elsewhere.
In short, if the proposal is adopted, the code will become cleaner
and performance will improve across the board.
Proposal
========
Add a module called codetweaks.py which contains two functions,
``bind_constants()`` and ``bind_all()``. The first function performs
constant binding and the second recursively applies it to every
function and class in a target module.
For most modules in the standard library, add a pair of lines near
the end of the script::
import codetweaks, sys
codetweaks.bind_all(sys.modules[__name__])
In addition to binding builtins, there are some modules (like
``sre_compile``) where it also makes sense to bind module variables
as well as builtins into constants.
Questions and Answers
=====================
1. Will this make everyone divert their attention to optimization
issues?
Because it is done automatically, it reduces the need to think
about optimizations.
2. In a nutshell, how does it work?
Every function has attributes with its bytecodes (the language of
the Python virtual machine) and a table of constants. The bind
function scans the bytecodes for a ``LOAD_GLOBAL`` instruction and
checks to see whether the value is already known. If so, it adds
that value to the constants table and replaces the opcode with
``LOAD_CONSTANT``.
3. When does it work?
When a module is imported for the first time, python compiles the
bytecode and runs the binding optimization. Subsequent imports
just re-use the previous work. Each session repeats this process
(the results are not saved in ``pyc`` files).
4. How do you know this works?
I implemented it, applied it to every module in library, and the test
suite ran without exception.
5. What if the module defines a variable shadowing a builtin?
This does happen. For instance, True can be redefined at the module
level as ``True = (1==1)``. The sample implementation below detects the
shadowing and leaves the global lookup unchanged.
6. Are you the first person to recognize that most global lookups are for
values that never change?
No, this has long been known. Skip Montanaro provides an eloquent
explanation in :pep:`266`.
7. What if I want to replace the builtins module and supply my own
implementations?
Either do this before importing a module, or just reload the
module, or disable ``codetweaks.py`` (it will have a disable flag).
8. How susceptible is this module to changes in Python's byte coding?
It imports ``opcode.py`` to protect against renumbering. Also, it
uses ``LOAD_CONST`` and ``LOAD_GLOBAL`` which are fundamental and have
been around forever. That notwithstanding, the coding scheme could
change and this implementation would have to change along with
modules like ``dis`` which also rely on the current coding scheme.
9. What is the effect on startup time?
I could not measure a difference. None of the startup modules are
bound except for warnings.py. Also, the binding function is very
fast, making just a single pass over the code string in search of
the ``LOAD_GLOBAL`` opcode.
Sample Implementation
=====================
Here is a sample implementation for codetweaks.py::
from types import ClassType, FunctionType
from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
LOAD_GLOBAL, LOAD_CONST = opmap['LOAD_GLOBAL'], opmap['LOAD_CONST']
ABORT_CODES = (EXTENDED_ARG, opmap['STORE_GLOBAL'])
def bind_constants(f, builtin_only=False, stoplist=[], verbose=False):
""" Return a new function with optimized global references.
Replaces global references with their currently defined values.
If not defined, the dynamic (runtime) global lookup is left undisturbed.
If builtin_only is True, then only builtins are optimized.
Variable names in the stoplist are also left undisturbed.
If verbose is True, prints each substitution as is occurs.
"""
import __builtin__
env = vars(__builtin__).copy()
stoplist = dict.fromkeys(stoplist)
if builtin_only:
stoplist.update(f.func_globals)
else:
env.update(f.func_globals)
co = f.func_code
newcode = map(ord, co.co_code)
newconsts = list(co.co_consts)
codelen = len(newcode)
i = 0
while i < codelen:
opcode = newcode[i]
if opcode in ABORT_CODES:
return f # for simplicity, only optimize common cases
if opcode == LOAD_GLOBAL:
oparg = newcode[i+1] + (newcode[i+2] << 8)
name = co.co_names[oparg]
if name in env and name not in stoplist:
value = env[name]
try:
pos = newconsts.index(value)
except ValueError:
pos = len(newconsts)
newconsts.append(value)
newcode[i] = LOAD_CONST
newcode[i+1] = pos & 0xFF
newcode[i+2] = pos >> 8
if verbose:
print name, '-->', value
i += 1
if opcode >= HAVE_ARGUMENT:
i += 2
codestr = ''.join(map(chr, newcode))
codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
co.co_flags, codestr, tuple(newconsts), co.co_names,
co.co_varnames, co.co_filename, co.co_name,
co.co_firstlineno, co.co_lnotab, co.co_freevars,
co.co_cellvars)
return type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
f.func_closure)
def bind_all(mc, builtin_only=False, stoplist=[], verbose=False):
"""Recursively apply bind_constants() to functions in a module or class.
Use as the last line of the module (after everything is defined, but
before test code).
In modules that need modifiable globals, set builtin_only to True.
"""
for k, v in vars(mc).items():
if type(v) is FunctionType:
newv = bind_constants(v, builtin_only, stoplist, verbose)
setattr(mc, k, newv)
elif type(v) in (type, ClassType):
bind_all(v, builtin_only, stoplist, verbose)
def f(): pass
try:
f.func_code.code
except AttributeError: # detect non-CPython environments
bind_all = lambda *args, **kwds: 0
del f
import sys
bind_all(sys.modules[__name__]) # Optimizer, optimize thyself!
Note the automatic detection of a non-CPython environment that does not
have bytecodes [2]_. In that situation, the bind functions would simply
return the original function unchanged. This assures that the two
line additions to library modules do not impact other implementations.
The final code should add a flag to make it easy to disable binding.
References
==========
[1] ASPN Recipe for a non-private implementation
\ https://code.activestate.com/recipes/277940/
.. [2] Differences between CPython and Jython
https://web.archive.org/web/20031018014238/http://www.jython.org/cgi-bin/faqw.py?req=show&file=faq01.003.htp
Copyright
=========
This document has been placed in the public domain.