reSTify PEP 280 (#318)
This commit is contained in:
parent
7f68262609
commit
75b53a7813
814
pep-0280.txt
814
pep-0280.txt
|
@ -5,522 +5,534 @@ Last-Modified: $Date$
|
||||||
Author: guido@python.org (Guido van Rossum)
|
Author: guido@python.org (Guido van Rossum)
|
||||||
Status: Deferred
|
Status: Deferred
|
||||||
Type: Standards Track
|
Type: Standards Track
|
||||||
|
Content-Type: text/x-rst
|
||||||
Created: 10-Feb-2002
|
Created: 10-Feb-2002
|
||||||
Python-Version: 2.3
|
Python-Version: 2.3
|
||||||
Post-History:
|
Post-History:
|
||||||
|
|
||||||
|
|
||||||
Deferral
|
Deferral
|
||||||
|
========
|
||||||
|
|
||||||
While this PEP is a nice idea, no-one has yet emerged to do the work of
|
While this PEP is a nice idea, no-one has yet emerged to do the work of
|
||||||
hashing out the differences between this PEP, PEP 266 and PEP 267.
|
hashing out the differences between this PEP, PEP 266 and PEP 267.
|
||||||
Hence, it is being deferred.
|
Hence, it is being deferred.
|
||||||
|
|
||||||
|
|
||||||
Abstract
|
Abstract
|
||||||
|
========
|
||||||
|
|
||||||
This PEP describes yet another approach to optimizing access to
|
This PEP describes yet another approach to optimizing access to
|
||||||
module globals, providing an alternative to PEP 266 (Optimizing
|
module globals, providing an alternative to PEP 266 (Optimizing
|
||||||
Global Variable/Attribute Access by Skip Montanaro) and PEP 267
|
Global Variable/Attribute Access by Skip Montanaro) and PEP 267
|
||||||
(Optimized Access to Module Namespaces by Jeremy Hylton).
|
(Optimized Access to Module Namespaces by Jeremy Hylton).
|
||||||
|
|
||||||
The expectation is that eventually one approach will be picked and
|
The expectation is that eventually one approach will be picked and
|
||||||
implemented; possibly multiple approaches will be prototyped
|
implemented; possibly multiple approaches will be prototyped
|
||||||
first.
|
first.
|
||||||
|
|
||||||
|
|
||||||
Description
|
Description
|
||||||
|
===========
|
||||||
|
|
||||||
(Note: Jason Orendorff writes: """I implemented this once, long
|
(Note: Jason Orendorff writes: """I implemented this once, long
|
||||||
ago, for Python 1.5-ish, I believe. I got it to the point where
|
ago, for Python 1.5-ish, I believe. I got it to the point where
|
||||||
it was only 15% slower than ordinary Python, then abandoned it.
|
it was only 15% slower than ordinary Python, then abandoned it.
|
||||||
;) In my implementation, "cells" were real first-class objects,
|
;) In my implementation, "cells" were real first-class objects,
|
||||||
and "celldict" was a copy-and-hack version of dictionary. I
|
and "celldict" was a copy-and-hack version of dictionary. I
|
||||||
forget how the rest worked.""" Reference:
|
forget how the rest worked.""" Reference:
|
||||||
https://mail.python.org/pipermail/python-dev/2002-February/019876.html)
|
https://mail.python.org/pipermail/python-dev/2002-February/019876.html)
|
||||||
|
|
||||||
Let a cell be a really simple Python object, containing a pointer
|
Let a cell be a really simple Python object, containing a pointer
|
||||||
to a Python object and a pointer to a cell. Both pointers may be
|
to a Python object and a pointer to a cell. Both pointers may be
|
||||||
NULL. A Python implementation could be:
|
``NULL``. A Python implementation could be::
|
||||||
|
|
||||||
class cell(object):
|
class cell(object):
|
||||||
|
|
||||||
def __init__(self):
|
def __init__(self):
|
||||||
self.objptr = NULL
|
self.objptr = NULL
|
||||||
self.cellptr = NULL
|
self.cellptr = NULL
|
||||||
|
|
||||||
The cellptr attribute is used for chaining cells together for
|
The cellptr attribute is used for chaining cells together for
|
||||||
searching built-ins; this will be explained later.
|
searching built-ins; this will be explained later.
|
||||||
|
|
||||||
Let a celldict be a mapping from strings (the names of a module's
|
Let a celldict be a mapping from strings (the names of a module's
|
||||||
globals) to objects (the values of those globals), implemented
|
globals) to objects (the values of those globals), implemented
|
||||||
using a dict of cells. A Python implementation could be:
|
using a dict of cells. A Python implementation could be::
|
||||||
|
|
||||||
class celldict(object):
|
class celldict(object):
|
||||||
|
|
||||||
def __init__(self):
|
def __init__(self):
|
||||||
self.__dict = {} # dict of cells
|
self.__dict = {} # dict of cells
|
||||||
|
|
||||||
def getcell(self, key):
|
def getcell(self, key):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None:
|
if c is None:
|
||||||
c = cell()
|
c = cell()
|
||||||
self.__dict[key] = c
|
self.__dict[key] = c
|
||||||
return c
|
return c
|
||||||
|
|
||||||
def cellkeys(self):
|
def cellkeys(self):
|
||||||
return self.__dict.keys()
|
return self.__dict.keys()
|
||||||
|
|
||||||
def __getitem__(self, key):
|
def __getitem__(self, key):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None:
|
if c is None:
|
||||||
raise KeyError, key
|
raise KeyError, key
|
||||||
value = c.objptr
|
value = c.objptr
|
||||||
if value is NULL:
|
if value is NULL:
|
||||||
raise KeyError, key
|
raise KeyError, key
|
||||||
else:
|
else:
|
||||||
return value
|
return value
|
||||||
|
|
||||||
def __setitem__(self, key, value):
|
def __setitem__(self, key, value):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None:
|
if c is None:
|
||||||
c = cell()
|
c = cell()
|
||||||
self.__dict[key] = c
|
self.__dict[key] = c
|
||||||
c.objptr = value
|
c.objptr = value
|
||||||
|
|
||||||
def __delitem__(self, key):
|
def __delitem__(self, key):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None or c.objptr is NULL:
|
if c is None or c.objptr is NULL:
|
||||||
raise KeyError, key
|
raise KeyError, key
|
||||||
|
c.objptr = NULL
|
||||||
|
|
||||||
|
def keys(self):
|
||||||
|
return [k for k, c in self.__dict.iteritems()
|
||||||
|
if c.objptr is not NULL]
|
||||||
|
|
||||||
|
def items(self):
|
||||||
|
return [k, c.objptr for k, c in self.__dict.iteritems()
|
||||||
|
if c.objptr is not NULL]
|
||||||
|
|
||||||
|
def values(self):
|
||||||
|
preturn [c.objptr for c in self.__dict.itervalues()
|
||||||
|
if c.objptr is not NULL]
|
||||||
|
|
||||||
|
def clear(self):
|
||||||
|
for c in self.__dict.values():
|
||||||
c.objptr = NULL
|
c.objptr = NULL
|
||||||
|
|
||||||
def keys(self):
|
# Etc.
|
||||||
return [k for k, c in self.__dict.iteritems()
|
|
||||||
if c.objptr is not NULL]
|
|
||||||
|
|
||||||
def items(self):
|
It is possible that a cell exists corresponding to a given key,
|
||||||
return [k, c.objptr for k, c in self.__dict.iteritems()
|
but the cell's objptr is ``NULL``; let's call such a cell empty. When
|
||||||
if c.objptr is not NULL]
|
the celldict is used as a mapping, it is as if empty cells don't
|
||||||
|
exist. However, once added, a cell is never deleted from a
|
||||||
|
celldict, and it is possible to get at empty cells using the
|
||||||
|
``getcell()`` method.
|
||||||
|
|
||||||
def values(self):
|
The celldict implementation never uses the cellptr attribute of
|
||||||
preturn [c.objptr for c in self.__dict.itervalues()
|
cells.
|
||||||
if c.objptr is not NULL]
|
|
||||||
|
|
||||||
def clear(self):
|
We change the module implementation to use a celldict for its
|
||||||
for c in self.__dict.values():
|
``__dict__``. The module's getattr, setattr and delattr operations
|
||||||
c.objptr = NULL
|
now map to getitem, setitem and delitem on the celldict. The type
|
||||||
|
of ``<module>.__dict__`` and ``globals()`` is probably the only backwards
|
||||||
|
incompatibility.
|
||||||
|
|
||||||
# Etc.
|
When a module is initialized, its ``__builtins__`` is initialized from
|
||||||
|
the ``__builtin__`` module's ``__dict__``, which is itself a celldict.
|
||||||
|
For each cell in ``__builtins__``, the new module's ``__dict__`` adds a
|
||||||
|
cell with a ``NULL`` objptr, whose cellptr points to the corresponding
|
||||||
|
cell of ``__builtins__``. Python pseudo-code (ignoring rexec)::
|
||||||
|
|
||||||
It is possible that a cell exists corresponding to a given key,
|
import __builtin__
|
||||||
but the cell's objptr is NULL; let's call such a cell empty. When
|
|
||||||
the celldict is used as a mapping, it is as if empty cells don't
|
|
||||||
exist. However, once added, a cell is never deleted from a
|
|
||||||
celldict, and it is possible to get at empty cells using the
|
|
||||||
getcell() method.
|
|
||||||
|
|
||||||
The celldict implementation never uses the cellptr attribute of
|
class module(object):
|
||||||
cells.
|
|
||||||
|
|
||||||
We change the module implementation to use a celldict for its
|
def __init__(self):
|
||||||
__dict__. The module's getattr, setattr and delattr operations
|
self.__dict__ = d = celldict()
|
||||||
now map to getitem, setitem and delitem on the celldict. The type
|
d['__builtins__'] = bd = __builtin__.__dict__
|
||||||
of <module>.__dict__ and globals() is probably the only backwards
|
for k in bd.cellkeys():
|
||||||
incompatibility.
|
c = self.__dict__.getcell(k)
|
||||||
|
c.cellptr = bd.getcell(k)
|
||||||
|
|
||||||
When a module is initialized, its __builtins__ is initialized from
|
def __getattr__(self, k):
|
||||||
the __builtin__ module's __dict__, which is itself a celldict.
|
try:
|
||||||
For each cell in __builtins__, the new module's __dict__ adds a
|
return self.__dict__[k]
|
||||||
cell with a NULL objptr, whose cellptr points to the corresponding
|
except KeyError:
|
||||||
cell of __builtins__. Python pseudo-code (ignoring rexec):
|
raise IndexError, k
|
||||||
|
|
||||||
import __builtin__
|
def __setattr__(self, k, v):
|
||||||
|
self.__dict__[k] = v
|
||||||
|
|
||||||
class module(object):
|
def __delattr__(self, k):
|
||||||
|
del self.__dict__[k]
|
||||||
|
|
||||||
def __init__(self):
|
The compiler generates ``LOAD_GLOBAL_CELL <i>`` (and ``STORE_GLOBAL_CELL
|
||||||
self.__dict__ = d = celldict()
|
<i>`` etc.) opcodes for references to globals, where ``<i>`` is a small
|
||||||
d['__builtins__'] = bd = __builtin__.__dict__
|
index with meaning only within one code object like the const
|
||||||
for k in bd.cellkeys():
|
index in ``LOAD_CONST``. The code object has a new tuple, ``co_globals``,
|
||||||
c = self.__dict__.getcell(k)
|
giving the names of the globals referenced by the code indexed by
|
||||||
c.cellptr = bd.getcell(k)
|
``<i>``. No new analysis is required to be able to do this.
|
||||||
|
|
||||||
def __getattr__(self, k):
|
When a function object is created from a code object and a celldict,
|
||||||
try:
|
the function object creates an array of cell pointers by asking the
|
||||||
return self.__dict__[k]
|
celldict for cells corresponding to the names in the code object's
|
||||||
except KeyError:
|
``co_globals``. If the celldict doesn't already have a cell for a
|
||||||
raise IndexError, k
|
particular name, it creates and an empty one. This array of cell
|
||||||
|
pointers is stored on the function object as ``func_cells``. When a
|
||||||
|
function object is created from a regular dict instead of a
|
||||||
|
celldict, ``func_cells`` is a ``NULL`` pointer.
|
||||||
|
|
||||||
def __setattr__(self, k, v):
|
When the VM executes a ``LOAD_GLOBAL_CELL <i>`` instruction, it gets
|
||||||
self.__dict__[k] = v
|
cell number ``<i>`` from ``func_cells``. It then looks in the cell's
|
||||||
|
``PyObject`` pointer, and if not ``NULL``, that's the global value. If it
|
||||||
|
is ``NULL``, it follows the cell's cell pointer to the next cell, if it
|
||||||
|
is not ``NULL``, and looks in the ``PyObject`` pointer in that cell. If
|
||||||
|
that's also ``NULL``, or if there is no second cell, ``NameError`` is
|
||||||
|
raised. (It could follow the chain of cell pointers until a ``NULL``
|
||||||
|
cell pointer is found; but I have no use for this.) Similar for
|
||||||
|
``STORE_GLOBAL_CELL <i>``, except it doesn't follow the cell pointer
|
||||||
|
chain -- it always stores in the first cell.
|
||||||
|
|
||||||
def __delattr__(self, k):
|
There are fallbacks in the VM for the case where the function's
|
||||||
del self.__dict__[k]
|
globals aren't a celldict, and hence ``func_cells`` is ``NULL``. In that
|
||||||
|
case, the code object's ``co_globals`` is indexed with ``<i>`` to find the
|
||||||
The compiler generates LOAD_GLOBAL_CELL <i> (and STORE_GLOBAL_CELL
|
name of the corresponding global and this name is used to index the
|
||||||
<i> etc.) opcodes for references to globals, where <i> is a small
|
function's globals dict.
|
||||||
index with meaning only within one code object like the const
|
|
||||||
index in LOAD_CONST. The code object has a new tuple, co_globals,
|
|
||||||
giving the names of the globals referenced by the code indexed by
|
|
||||||
<i>. No new analysis is required to be able to do this.
|
|
||||||
|
|
||||||
When a function object is created from a code object and a celldict,
|
|
||||||
the function object creates an array of cell pointers by asking the
|
|
||||||
celldict for cells corresponding to the names in the code object's
|
|
||||||
co_globals. If the celldict doesn't already have a cell for a
|
|
||||||
particular name, it creates and an empty one. This array of cell
|
|
||||||
pointers is stored on the function object as func_cells. When a
|
|
||||||
function object is created from a regular dict instead of a
|
|
||||||
celldict, func_cells is a NULL pointer.
|
|
||||||
|
|
||||||
When the VM executes a LOAD_GLOBAL_CELL <i> instruction, it gets
|
|
||||||
cell number <i> from func_cells. It then looks in the cell's
|
|
||||||
PyObject pointer, and if not NULL, that's the global value. If it
|
|
||||||
is NULL, it follows the cell's cell pointer to the next cell, if it
|
|
||||||
is not NULL, and looks in the PyObject pointer in that cell. If
|
|
||||||
that's also NULL, or if there is no second cell, NameError is
|
|
||||||
raised. (It could follow the chain of cell pointers until a NULL
|
|
||||||
cell pointer is found; but I have no use for this.) Similar for
|
|
||||||
STORE_GLOBAL_CELL <i>, except it doesn't follow the cell pointer
|
|
||||||
chain -- it always stores in the first cell.
|
|
||||||
|
|
||||||
There are fallbacks in the VM for the case where the function's
|
|
||||||
globals aren't a celldict, and hence func_cells is NULL. In that
|
|
||||||
case, the code object's co_globals is indexed with <i> to find the
|
|
||||||
name of the corresponding global and this name is used to index the
|
|
||||||
function's globals dict.
|
|
||||||
|
|
||||||
|
|
||||||
Additional Ideas
|
Additional Ideas
|
||||||
|
================
|
||||||
|
|
||||||
- Never make func_cell a NULL pointer; instead, make up an array
|
- Never make ``func_cell`` a ``NULL`` pointer; instead, make up an array
|
||||||
of empty cells, so that LOAD_GLOBAL_CELL can index func_cells
|
of empty cells, so that ``LOAD_GLOBAL_CELL`` can index ``func_cells``
|
||||||
without a NULL check.
|
without a ``NULL`` check.
|
||||||
|
|
||||||
- Make c.cellptr equal to c when a cell is created, so that
|
- Make ``c.cellptr`` equal to c when a cell is created, so that
|
||||||
LOAD_GLOBAL_CELL can always dereference c.cellptr without a NULL
|
``LOAD_GLOBAL_CELL`` can always dereference ``c.cellptr`` without a ``NULL``
|
||||||
check.
|
check.
|
||||||
|
|
||||||
With these two additional ideas added, here's Python pseudo-code
|
With these two additional ideas added, here's Python pseudo-code
|
||||||
for LOAD_GLOBAL_CELL:
|
for ``LOAD_GLOBAL_CELL``::
|
||||||
|
|
||||||
def LOAD_GLOBAL_CELL(self, i):
|
def LOAD_GLOBAL_CELL(self, i):
|
||||||
# self is the frame
|
# self is the frame
|
||||||
c = self.func_cells[i]
|
c = self.func_cells[i]
|
||||||
obj = c.objptr
|
obj = c.objptr
|
||||||
if obj is not NULL:
|
if obj is not NULL:
|
||||||
return obj # Existing global
|
return obj # Existing global
|
||||||
return c.cellptr.objptr # Built-in or NULL
|
return c.cellptr.objptr # Built-in or NULL
|
||||||
|
|
||||||
- Be more aggressive: put the actual values of builtins into module
|
- Be more aggressive: put the actual values of builtins into module
|
||||||
dicts, not just pointers to cells containing the actual values.
|
dicts, not just pointers to cells containing the actual values.
|
||||||
|
|
||||||
There are two points to this: (1) Simplify and speed access, which
|
There are two points to this: (1) Simplify and speed access, which
|
||||||
is the most common operation. (2) Support faithful emulation of
|
is the most common operation. (2) Support faithful emulation of
|
||||||
extreme existing corner cases.
|
extreme existing corner cases.
|
||||||
|
|
||||||
WRT #2, the set of builtins in the scheme above is captured at the
|
WRT #2, the set of builtins in the scheme above is captured at the
|
||||||
time a module dict is first created. Mutations to the set of builtin
|
time a module dict is first created. Mutations to the set of builtin
|
||||||
names following that don't get reflected in the module dicts. Example:
|
names following that don't get reflected in the module dicts. Example:
|
||||||
consider files main.py and cheater.py:
|
consider files ``main.py`` and ``cheater.py``::
|
||||||
|
|
||||||
[main.py]
|
[main.py]
|
||||||
import cheater
|
import cheater
|
||||||
def f():
|
def f():
|
||||||
cheater.cheat()
|
cheater.cheat()
|
||||||
return pachinko()
|
return pachinko()
|
||||||
print f()
|
print f()
|
||||||
|
|
||||||
[cheater.py]
|
[cheater.py]
|
||||||
def cheat():
|
def cheat():
|
||||||
import __builtin__
|
import __builtin__
|
||||||
__builtin__.pachinko = lambda: 666
|
__builtin__.pachinko = lambda: 666
|
||||||
|
|
||||||
If main.py is run under Python 2.2 (or before), 666 is printed. But
|
If ``main.py`` is run under Python 2.2 (or before), 666 is printed. But
|
||||||
under the proposal, __builtin__.pachinko doesn't exist at the time
|
under the proposal, ``__builtin__.pachinko`` doesn't exist at the time
|
||||||
main's __dict__ is initialized. When the function object for
|
main's ``__dict__`` is initialized. When the function object for
|
||||||
f is created, main.__dict__ grows a pachinko cell mapping to two
|
f is created, ``main.__dict__`` grows a pachinko cell mapping to two
|
||||||
NULLs. When cheat() is called, __builtin__.__dict__ grows a pachinko
|
``NULLs``. When ``cheat()`` is called, ``__builtin__.__dict__`` grows a pachinko
|
||||||
cell too, but main.__dict__ doesn't know-- and will never know --about
|
cell too, but ``main.__dict__`` doesn't know-- and will never know --about
|
||||||
that. When f's return stmt references pachinko, in will still find
|
that. When f's return stmt references pachinko, in will still find
|
||||||
the double-NULLs in main.__dict__'s pachinko cell, and so raise
|
the double-NULLs in ``main.__dict__``'s ``pachinko`` cell, and so raise
|
||||||
NameError.
|
``NameError``.
|
||||||
|
|
||||||
A similar (in cause) break in compatibility can occur if a module
|
A similar (in cause) break in compatibility can occur if a module
|
||||||
global foo is del'ed, but a builtin foo was created prior to that
|
global foo is del'ed, but a builtin foo was created prior to that
|
||||||
but after the module dict was first created. Then the builtin foo
|
but after the module dict was first created. Then the builtin foo
|
||||||
becomes visible in the module under 2.2 and before, but remains
|
becomes visible in the module under 2.2 and before, but remains
|
||||||
invisible under the proposal.
|
invisible under the proposal.
|
||||||
|
|
||||||
Mutating builtins is extremely rare (most programs never mutate the
|
Mutating builtins is extremely rare (most programs never mutate the
|
||||||
builtins, and it's hard to imagine a plausible use for frequent
|
builtins, and it's hard to imagine a plausible use for frequent
|
||||||
mutation of the builtins -- I've never seen or heard of one), so it
|
mutation of the builtins -- I've never seen or heard of one), so it
|
||||||
doesn't matter how expensive mutating the builtins becomes. OTOH,
|
doesn't matter how expensive mutating the builtins becomes. OTOH,
|
||||||
referencing globals and builtins is very common. Combining those
|
referencing globals and builtins is very common. Combining those
|
||||||
observations suggests a more aggressive caching of builtins in module
|
observations suggests a more aggressive caching of builtins in module
|
||||||
globals, speeding access at the expense of making mutations of the
|
globals, speeding access at the expense of making mutations of the
|
||||||
builtins (potentially much) more expensive to keep the caches in
|
builtins (potentially much) more expensive to keep the caches in
|
||||||
synch.
|
synch.
|
||||||
|
|
||||||
Much of the scheme above remains the same, and most of the rest is
|
Much of the scheme above remains the same, and most of the rest is
|
||||||
just a little different. A cell changes to:
|
just a little different. A cell changes to::
|
||||||
|
|
||||||
class cell(object):
|
class cell(object):
|
||||||
def __init__(self, obj=NULL, builtin=0):
|
def __init__(self, obj=NULL, builtin=0):
|
||||||
self.objptr = obj
|
self.objptr = obj
|
||||||
self.builtinflag = builtin
|
self.builtinflag = builtin
|
||||||
|
|
||||||
and a celldict maps strings to this version of cells. builtinflag
|
and a celldict maps strings to this version of cells. ``builtinflag``
|
||||||
is true when and only when objptr contains a value obtained from
|
is true when and only when objptr contains a value obtained from
|
||||||
the builtins; in other words, it's true when and only when a cell
|
the builtins; in other words, it's true when and only when a cell
|
||||||
is acting as a cached value. When builtinflag is false, objptr is
|
is acting as a cached value. When ``builtinflag`` is false, objptr is
|
||||||
the value of a module global (possibly NULL). celldict changes to:
|
the value of a module global (possibly ``NULL``). celldict changes to::
|
||||||
|
|
||||||
class celldict(object):
|
class celldict(object):
|
||||||
|
|
||||||
def __init__(self, builtindict=()):
|
def __init__(self, builtindict=()):
|
||||||
self.basedict = builtindict
|
self.basedict = builtindict
|
||||||
self.__dict = d = {}
|
self.__dict = d = {}
|
||||||
for k, v in builtindict.items():
|
for k, v in builtindict.items():
|
||||||
d[k] = cell(v, 1)
|
d[k] = cell(v, 1)
|
||||||
|
|
||||||
def __getitem__(self, key):
|
def __getitem__(self, key):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None or c.objptr is NULL or c.builtinflag:
|
if c is None or c.objptr is NULL or c.builtinflag:
|
||||||
raise KeyError, key
|
raise KeyError, key
|
||||||
return c.objptr
|
return c.objptr
|
||||||
|
|
||||||
def __setitem__(self, key, value):
|
def __setitem__(self, key, value):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None:
|
if c is None:
|
||||||
c = cell()
|
c = cell()
|
||||||
self.__dict[key] = c
|
self.__dict[key] = c
|
||||||
c.objptr = value
|
c.objptr = value
|
||||||
c.builtinflag = 0
|
c.builtinflag = 0
|
||||||
|
|
||||||
def __delitem__(self, key):
|
def __delitem__(self, key):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None or c.objptr is NULL or c.builtinflag:
|
if c is None or c.objptr is NULL or c.builtinflag:
|
||||||
raise KeyError, key
|
raise KeyError, key
|
||||||
c.objptr = NULL
|
c.objptr = NULL
|
||||||
# We may have unmasked a builtin. Note that because
|
# We may have unmasked a builtin. Note that because
|
||||||
# we're checking the builtin dict for that *now*, this
|
# we're checking the builtin dict for that *now*, this
|
||||||
# still works if the builtin first came into existence
|
# still works if the builtin first came into existence
|
||||||
# after we were constructed. Note too that del on
|
# after we were constructed. Note too that del on
|
||||||
# namespace dicts is rare, so the expensse of this check
|
# namespace dicts is rare, so the expensse of this check
|
||||||
# shouldn't matter.
|
# shouldn't matter.
|
||||||
if key in self.basedict:
|
if key in self.basedict:
|
||||||
c.objptr = self.basedict[key]
|
c.objptr = self.basedict[key]
|
||||||
assert c.objptr is not NULL # else "in" lied
|
assert c.objptr is not NULL # else "in" lied
|
||||||
c.builtinflag = 1
|
c.builtinflag = 1
|
||||||
else:
|
else:
|
||||||
# There is no builtin with the same name.
|
# There is no builtin with the same name.
|
||||||
assert not c.builtinflag
|
assert not c.builtinflag
|
||||||
|
|
||||||
def keys(self):
|
def keys(self):
|
||||||
return [k for k, c in self.__dict.iteritems()
|
return [k for k, c in self.__dict.iteritems()
|
||||||
if c.objptr is not NULL and not c.builtinflag]
|
if c.objptr is not NULL and not c.builtinflag]
|
||||||
|
|
||||||
def items(self):
|
def items(self):
|
||||||
return [k, c.objptr for k, c in self.__dict.iteritems()
|
return [k, c.objptr for k, c in self.__dict.iteritems()
|
||||||
if c.objptr is not NULL and not c.builtinflag]
|
if c.objptr is not NULL and not c.builtinflag]
|
||||||
|
|
||||||
def values(self):
|
def values(self):
|
||||||
preturn [c.objptr for c in self.__dict.itervalues()
|
preturn [c.objptr for c in self.__dict.itervalues()
|
||||||
if c.objptr is not NULL and not c.builtinflag]
|
if c.objptr is not NULL and not c.builtinflag]
|
||||||
|
|
||||||
def clear(self):
|
def clear(self):
|
||||||
for c in self.__dict.values():
|
for c in self.__dict.values():
|
||||||
if not c.builtinflag:
|
if not c.builtinflag:
|
||||||
c.objptr = NULL
|
c.objptr = NULL
|
||||||
|
|
||||||
# Etc.
|
# Etc.
|
||||||
|
|
||||||
The speed benefit comes from simplifying LOAD_GLOBAL_CELL, which
|
The speed benefit comes from simplifying ``LOAD_GLOBAL_CELL``, which
|
||||||
I expect is executed more frequently than all other namespace
|
I expect is executed more frequently than all other namespace
|
||||||
operations combined:
|
operations combined::
|
||||||
|
|
||||||
def LOAD_GLOBAL_CELL(self, i):
|
def LOAD_GLOBAL_CELL(self, i):
|
||||||
# self is the frame
|
# self is the frame
|
||||||
c = self.func_cells[i]
|
c = self.func_cells[i]
|
||||||
return c.objptr # may be NULL (also true before)
|
return c.objptr # may be NULL (also true before)
|
||||||
|
|
||||||
That is, accessing builtins and accessing module globals are equally
|
That is, accessing builtins and accessing module globals are equally
|
||||||
fast. For module globals, a NULL-pointer test+branch is saved. For
|
fast. For module globals, a NULL-pointer test+branch is saved. For
|
||||||
builtins, an additional pointer chase is also saved.
|
builtins, an additional pointer chase is also saved.
|
||||||
|
|
||||||
The other part needed to make this fly is expensive, propagating
|
The other part needed to make this fly is expensive, propagating
|
||||||
mutations of builtins into the module dicts that were initialized
|
mutations of builtins into the module dicts that were initialized
|
||||||
from the builtins. This is much like, in 2.2, propagating changes
|
from the builtins. This is much like, in 2.2, propagating changes
|
||||||
in new-style base classes to their descendants: the builtins need to
|
in new-style base classes to their descendants: the builtins need to
|
||||||
maintain a list of weakrefs to the modules (or module dicts)
|
maintain a list of weakrefs to the modules (or module dicts)
|
||||||
initialized from the builtin's dict. Given a mutation to the builtin
|
initialized from the builtin's dict. Given a mutation to the builtin
|
||||||
dict (adding a new key, changing the value associated with an
|
dict (adding a new key, changing the value associated with an
|
||||||
existing key, or deleting a key), traverse the list of module dicts
|
existing key, or deleting a key), traverse the list of module dicts
|
||||||
and make corresponding mutations to them. This is straightforward;
|
and make corresponding mutations to them. This is straightforward;
|
||||||
for example, if a key is deleted from builtins, execute
|
for example, if a key is deleted from builtins, execute
|
||||||
reflect_bltin_del in each module:
|
``reflect_bltin_del`` in each module::
|
||||||
|
|
||||||
def reflect_bltin_del(self, key):
|
def reflect_bltin_del(self, key):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
assert c is not None # else we were already out of synch
|
assert c is not None # else we were already out of synch
|
||||||
if c.builtinflag:
|
if c.builtinflag:
|
||||||
# Put us back in synch.
|
# Put us back in synch.
|
||||||
c.objptr = NULL
|
c.objptr = NULL
|
||||||
c.builtinflag = 0
|
c.builtinflag = 0
|
||||||
# Else we're shadowing the builtin, so don't care that
|
# Else we're shadowing the builtin, so don't care that
|
||||||
# the builtin went away.
|
# the builtin went away.
|
||||||
|
|
||||||
Note that c.builtinflag protects from us erroneously deleting a
|
Note that ``c.builtinflag`` protects from us erroneously deleting a
|
||||||
module global of the same name. Adding a new (key, value) builtin
|
module global of the same name. Adding a new (key, value) builtin
|
||||||
pair is similar:
|
pair is similar::
|
||||||
|
|
||||||
def reflect_bltin_new(self, key, value):
|
def reflect_bltin_new(self, key, value):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
if c is None:
|
if c is None:
|
||||||
# Never heard of it before: cache the builtin value.
|
# Never heard of it before: cache the builtin value.
|
||||||
self.__dict[key] = cell(value, 1)
|
self.__dict[key] = cell(value, 1)
|
||||||
elif c.objptr is NULL:
|
elif c.objptr is NULL:
|
||||||
# This used to exist in the module or the builtins,
|
# This used to exist in the module or the builtins,
|
||||||
# but doesn't anymore; rehabilitate it.
|
# but doesn't anymore; rehabilitate it.
|
||||||
assert not c.builtinflag
|
assert not c.builtinflag
|
||||||
c.objptr = value
|
c.objptr = value
|
||||||
c.builtinflag = 1
|
c.builtinflag = 1
|
||||||
else:
|
else:
|
||||||
# We're shadowing it already.
|
# We're shadowing it already.
|
||||||
assert not c.builtinflag
|
assert not c.builtinflag
|
||||||
|
|
||||||
Changing the value of an existing builtin:
|
Changing the value of an existing builtin::
|
||||||
|
|
||||||
def reflect_bltin_change(self, key, newvalue):
|
def reflect_bltin_change(self, key, newvalue):
|
||||||
c = self.__dict.get(key)
|
c = self.__dict.get(key)
|
||||||
assert c is not None # else we were already out of synch
|
assert c is not None # else we were already out of synch
|
||||||
if c.builtinflag:
|
if c.builtinflag:
|
||||||
# Put us back in synch.
|
# Put us back in synch.
|
||||||
c.objptr = newvalue
|
c.objptr = newvalue
|
||||||
# Else we're shadowing the builtin, so don't care that
|
# Else we're shadowing the builtin, so don't care that
|
||||||
# the builtin changed.
|
# the builtin changed.
|
||||||
|
|
||||||
|
|
||||||
FAQs
|
FAQs
|
||||||
|
====
|
||||||
|
|
||||||
Q. Will it still be possible to:
|
|
||||||
a) install new builtins in the __builtin__ namespace and have
|
|
||||||
them available in all already loaded modules right away ?
|
|
||||||
b) override builtins (e.g. open()) with my own copies
|
|
||||||
(e.g. to increase security) in a way that makes these new
|
|
||||||
copies override the previous ones in all modules ?
|
|
||||||
|
|
||||||
A. Yes, this is the whole point of this design. In the original
|
* Q: Will it still be possible to:
|
||||||
approach, when LOAD_GLOBAL_CELL finds a NULL in the second
|
|
||||||
cell, it should go back to see if the __builtins__ dict has
|
|
||||||
been modified (the pseudo code doesn't have this yet). Tim's
|
|
||||||
"more aggressive" alternative also takes care of this.
|
|
||||||
|
|
||||||
Q. How does the new scheme get along with the restricted execution
|
a) install new builtins in the ``__builtin__`` namespace and have
|
||||||
model?
|
them available in all already loaded modules right away ?
|
||||||
|
|
||||||
A. It is intended to support that fully.
|
b) override builtins (e.g. ``open()``) with my own copies
|
||||||
|
(e.g. to increase security) in a way that makes these new
|
||||||
|
copies override the previous ones in all modules ?
|
||||||
|
|
||||||
Q. What happens when a global is deleted?
|
A: Yes, this is the whole point of this design. In the original
|
||||||
|
approach, when ``LOAD_GLOBAL_CELL`` finds a ``NULL`` in the second
|
||||||
|
cell, it should go back to see if the ``__builtins__`` dict has
|
||||||
|
been modified (the pseudo code doesn't have this yet). Tim's
|
||||||
|
"more aggressive" alternative also takes care of this.
|
||||||
|
|
||||||
A. The module's celldict would have a cell with a NULL objptr for
|
* Q: How does the new scheme get along with the restricted execution
|
||||||
that key. This is true in both variations, but the "aggressive"
|
model?
|
||||||
variation goes on to see whether this unmasks a builtin of the
|
|
||||||
same name, and if so copies its value (just a pointer-copy of the
|
|
||||||
ultimate PyObject*) into the cell's objptr and sets the cell's
|
|
||||||
builtinflag to true.
|
|
||||||
|
|
||||||
Q. What would the C code for LOAD_GLOBAL_CELL look like?
|
A: It is intended to support that fully.
|
||||||
|
|
||||||
A. The first version, with the first two bullets under "Additional
|
* Q: What happens when a global is deleted?
|
||||||
ideas" incorporated, could look like this:
|
|
||||||
|
|
||||||
case LOAD_GLOBAL_CELL:
|
A: The module's celldict would have a cell with a ``NULL`` objptr for
|
||||||
cell = func_cells[oparg];
|
that key. This is true in both variations, but the "aggressive"
|
||||||
x = cell->objptr;
|
variation goes on to see whether this unmasks a builtin of the
|
||||||
if (x == NULL) {
|
same name, and if so copies its value (just a pointer-copy of the
|
||||||
x = cell->cellptr->objptr;
|
ultimate ``PyObject*``) into the cell's objptr and sets the cell's
|
||||||
if (x == NULL) {
|
``builtinflag`` to true.
|
||||||
... error recovery ...
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
Py_INCREF(x);
|
|
||||||
PUSH(x);
|
|
||||||
continue;
|
|
||||||
|
|
||||||
We could even write it like this (idea courtesy of Ka-Ping Yee):
|
* Q: What would the C code for ``LOAD_GLOBAL_CELL`` look like?
|
||||||
|
|
||||||
case LOAD_GLOBAL_CELL:
|
A: The first version, with the first two bullets under "Additional
|
||||||
cell = func_cells[oparg];
|
ideas" incorporated, could look like this::
|
||||||
x = cell->cellptr->objptr;
|
|
||||||
if (x != NULL) {
|
|
||||||
Py_INCREF(x);
|
|
||||||
PUSH(x);
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
... error recovery ...
|
|
||||||
break;
|
|
||||||
|
|
||||||
In modern CPU architectures, this reduces the number of
|
case LOAD_GLOBAL_CELL:
|
||||||
branches taken for built-ins, which might be a really good
|
cell = func_cells[oparg];
|
||||||
thing, while any decent memory cache should realize that
|
x = cell->objptr;
|
||||||
cell->cellptr is the same as cell for regular globals and hence
|
if (x == NULL) {
|
||||||
this should be very fast in that case too.
|
x = cell->cellptr->objptr;
|
||||||
|
if (x == NULL) {
|
||||||
|
... error recovery ...
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
Py_INCREF(x);
|
||||||
|
PUSH(x);
|
||||||
|
continue;
|
||||||
|
|
||||||
For the aggressive variant:
|
We could even write it like this (idea courtesy of Ka-Ping Yee)::
|
||||||
|
|
||||||
case LOAD_GLOBAL_CELL:
|
case LOAD_GLOBAL_CELL:
|
||||||
cell = func_cells[oparg];
|
cell = func_cells[oparg];
|
||||||
x = cell->objptr;
|
x = cell->cellptr->objptr;
|
||||||
if (x != NULL) {
|
if (x != NULL) {
|
||||||
Py_INCREF(x);
|
Py_INCREF(x);
|
||||||
PUSH(x);
|
PUSH(x);
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
... error recovery ...
|
... error recovery ...
|
||||||
break;
|
break;
|
||||||
|
|
||||||
Q. What happens in the module's top-level code where there is
|
In modern CPU architectures, this reduces the number of
|
||||||
presumably no func_cells array?
|
branches taken for built-ins, which might be a really good
|
||||||
|
thing, while any decent memory cache should realize that
|
||||||
|
``cell->cellptr`` is the same as cell for regular globals and hence
|
||||||
|
this should be very fast in that case too.
|
||||||
|
|
||||||
A. We could do some code analysis and create a func_cells array,
|
For the aggressive variant::
|
||||||
or we could use LOAD_NAME which should use PyMapping_GetItem on
|
|
||||||
the globals dict.
|
case LOAD_GLOBAL_CELL:
|
||||||
|
cell = func_cells[oparg];
|
||||||
|
x = cell->objptr;
|
||||||
|
if (x != NULL) {
|
||||||
|
Py_INCREF(x);
|
||||||
|
PUSH(x);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
... error recovery ...
|
||||||
|
break;
|
||||||
|
|
||||||
|
* Q: What happens in the module's top-level code where there is
|
||||||
|
presumably no ``func_cells`` array?
|
||||||
|
|
||||||
|
A: We could do some code analysis and create a ``func_cells`` array,
|
||||||
|
or we could use ``LOAD_NAME`` which should use ``PyMapping_GetItem`` on
|
||||||
|
the globals dict.
|
||||||
|
|
||||||
|
|
||||||
Graphics
|
Graphics
|
||||||
|
========
|
||||||
|
|
||||||
Ka-Ping Yee supplied a drawing of the state of things after
|
Ka-Ping Yee supplied a drawing of the state of things after
|
||||||
"import spam", where spam.py contains:
|
"import spam", where ``spam.py`` contains::
|
||||||
|
|
||||||
import eggs
|
import eggs
|
||||||
|
|
||||||
i = -2
|
i = -2
|
||||||
max = 3
|
max = 3
|
||||||
|
|
||||||
def foo(n):
|
def foo(n):
|
||||||
y = abs(i) + max
|
y = abs(i) + max
|
||||||
return eggs.ham(y + n)
|
return eggs.ham(y + n)
|
||||||
|
|
||||||
The drawing is at http://web.lfw.org/repo/cells.gif; a larger
|
The drawing is at http://web.lfw.org/repo/cells.gif; a larger
|
||||||
version is at http://lfw.org/repo/cells-big.gif; the source is at
|
version is at http://lfw.org/repo/cells-big.gif; the source is at
|
||||||
http://lfw.org/repo/cells.ai.
|
http://lfw.org/repo/cells.ai.
|
||||||
|
|
||||||
|
|
||||||
Comparison
|
Comparison
|
||||||
|
==========
|
||||||
|
|
||||||
XXX Here, a comparison of the three approaches could be added.
|
XXX Here, a comparison of the three approaches could be added.
|
||||||
|
|
||||||
|
|
||||||
Copyright
|
Copyright
|
||||||
|
=========
|
||||||
|
|
||||||
This document has been placed in the public domain.
|
This document has been placed in the public domain.
|
||||||
|
|
||||||
|
|
||||||
|
..
|
||||||
Local Variables:
|
Local Variables:
|
||||||
mode: indented-text
|
mode: indented-text
|
||||||
indent-tabs-mode: nil
|
indent-tabs-mode: nil
|
||||||
End:
|
End:
|
||||||
|
|
Loading…
Reference in New Issue