PEP 510
* convert Python functions to C functions
* replace "specialized functions" with "specialized codes" to avoid confusion
* guard functions now use 0 result as success to simplify the code
parent a19233d285
commit 43bb3fb7ad
313
pep-0510.txt
|
@ -1,5 +1,5 @@
|
||||||
PEP: 510
|
PEP: 510
|
||||||
Title: Specialized functions with guards
|
Title: Specialize functions
|
||||||
Version: $Revision$
|
Version: $Revision$
|
||||||
Last-Modified: $Date$
|
Last-Modified: $Date$
|
||||||
Author: Victor Stinner <victor.stinner@gmail.com>
|
Author: Victor Stinner <victor.stinner@gmail.com>
|
||||||
|
@ -13,8 +13,9 @@ Python-Version: 3.6
|
||||||
Abstract
|
Abstract
|
||||||
========
|
========
|
||||||
|
|
||||||
Add a private API to CPython to add specialized functions with guards to
|
Add functions to the Python C API to specialize pure Python functions:
|
||||||
functions, to support static optimizers respecting the Python semantics.
|
add specialized codes with guards. This makes it possible to implement static
|
||||||
|
optimizers respecting the Python semantics.
|
||||||
|
|
||||||
|
|
||||||
Rationale
|
Rationale
|
||||||
|
@ -29,15 +30,16 @@ modified at runtime. Implement optimizations respecting the Python
|
||||||
semantics requires to detect when "something changes", we will call these
|
semantics requires to detect when "something changes", we will call these
|
||||||
checks "guards".
|
checks "guards".
|
||||||
|
|
||||||
This PEP proposes to add an API to add specialized functions with guards
|
This PEP proposes to add a public API to the Python C API to add
|
||||||
to a function. When the function is called, the specialized function is
|
specialized codes with guards to a function. When the function is
|
||||||
used if nothing changed, otherwise use the original bytecode.
|
called, a specialized code is used if nothing changed, otherwise use the
|
||||||
|
original bytecode.
|
||||||
|
|
||||||
Even if guards help to respect most parts of the Python semantics, it's
|
Even if guards help to respect most parts of the Python semantics, it's
|
||||||
really hard to optimize Python without making subtle changes on the
|
hard to optimize Python without making subtle changes on the exact
|
||||||
exact behaviour. CPython has a long history and many applications rely
|
behaviour. CPython has a long history and many applications rely on
|
||||||
on implementation details. A compromise must be found between
|
implementation details. A compromise must be found between "everything
|
||||||
"everything is mutable" and performance.
|
is mutable" and performance.
|
||||||
|
|
||||||
Writing an optimizer is out of the scope of this PEP.
|
Writing an optimizer is out of the scope of this PEP.
|
||||||
|
|
||||||
|
@ -101,13 +103,35 @@ between CPython 3.5 and PyPy.
|
||||||
2011.
|
2011.
|
||||||
|
|
||||||
|
|
||||||
Example
|
Examples
|
||||||
=======
|
========
|
||||||
|
|
||||||
|
The following examples are not written to show powerful optimizations
|
||||||
|
promising important speedup, but to be short and easy to understand,
|
||||||
|
just to explain the principle.
|
||||||
|
|
||||||
|
Hypothetical myoptimizer module
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
Examples in this PEP use a hypothetical ``myoptimizer`` module which
|
||||||
|
provides the following functions and types:
|
||||||
|
|
||||||
|
* ``specialize(func, code, guards)``: add the specialized code `code`
|
||||||
|
with guards `guards` to the function `func`
|
||||||
|
* ``get_specialized(func)``: get the list of specialized codes as a list
|
||||||
|
of ``(code, guards)`` tuples where `code` is a callable or code object
|
||||||
|
and `guards` is a list of guards
|
||||||
|
* ``GuardBuiltins(name)``: guard watching for
|
||||||
|
``builtins.__dict__[name]`` and ``globals()[name]``. The guard fails
|
||||||
|
if ``builtins.__dict__[name]`` is replaced, or if ``globals()[name]``
|
||||||
|
is set.
|
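The behaviour described for ``GuardBuiltins`` can be modelled in pure Python. This is only a rough sketch, not the PEP's implementation: the extra ``namespace`` parameter (the globals of the guarded function) and the ``check()`` method name are assumptions made for illustration, and the result codes follow the ``check()`` convention of the new text (``0`` on success, ``2`` if the guard will always fail).

```python
import builtins


class GuardBuiltins:
    """Hypothetical pure-Python model of the guard described above."""

    def __init__(self, name, namespace):
        self.name = name
        # Assumption: the caller passes the globals of the guarded function.
        self.namespace = namespace
        # Remember the builtin that the specialized code relies on.
        self.expected = builtins.__dict__[name]

    def check(self):
        # 2: the name is now shadowed in the global namespace.
        if self.name in self.namespace:
            return 2
        # 2: builtins.__dict__[name] was replaced.
        if builtins.__dict__.get(self.name) is not self.expected:
            return 2
        # 0: nothing changed, the specialized code is still valid.
        return 0
```

Treating both failures as permanent is a design choice of this sketch: once the builtin has been replaced or shadowed, the specialized code is removed, so the guard is not expected to be checked again.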
||||||
|
|
||||||
|
|
||||||
Using bytecode
|
Using bytecode
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
Replace ``chr(65)`` with ``"A"``::
|
Add specialized bytecode where the call to the pure builtin function
|
||||||
|
``chr(65)`` is replaced with its result ``"A"``::
|
||||||
|
|
||||||
import myoptimizer
|
import myoptimizer
|
||||||
|
|
||||||
|
@ -117,18 +141,21 @@ Replace ``chr(65)`` with ``"A"``::
|
||||||
def fast_func():
|
def fast_func():
|
||||||
return "A"
|
return "A"
|
||||||
|
|
||||||
func._specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
|
myoptimizer.specialize(func, fast_func.__code__,
|
||||||
|
[myoptimizer.GuardBuiltins("chr")])
|
||||||
del fast_func
|
del fast_func
|
||||||
|
|
||||||
|
Example showing the behaviour of the guard::
|
||||||
|
|
||||||
print("func(): %s" % func())
|
print("func(): %s" % func())
|
||||||
print("#specialized: %s" % len(func._get_specialized()))
|
print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
|
||||||
print()
|
print()
|
||||||
|
|
||||||
import builtins
|
import builtins
|
||||||
builtins.chr = lambda obj: "mock"
|
builtins.chr = lambda obj: "mock"
|
||||||
|
|
||||||
print("func(): %s" % func())
|
print("func(): %s" % func())
|
||||||
print("#specialized: %s" % len(func._get_specialized()))
|
print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
|
||||||
|
|
||||||
Output::
|
Output::
|
||||||
|
|
||||||
|
@ -138,41 +165,40 @@ Output::
|
||||||
func(): mock
|
func(): mock
|
||||||
#specialized: 0
|
#specialized: 0
|
||||||
|
|
||||||
The hypothetical ``myoptimizer.GuardBuiltins("len")`` is a guard on the
|
The first call uses the specialized bytecode which returns the string
|
||||||
builtin ``len()`` function and the ``len`` name in the global namespace.
|
``"A"``. The second call removes the specialized code because the
|
||||||
The guard fails if the builtin function is replaced or if a ``len`` name
|
builtin ``chr()`` function was replaced, and executes the original
|
||||||
is defined in the global namespace.
|
bytecode calling ``chr(65)``.
|
||||||
|
|
||||||
The first call returns directly the string ``"A"``. The second call
|
On a microbenchmark, calling the specialized bytecode takes 88 ns,
|
||||||
removes the specialized function because the builtin ``chr()`` function
|
whereas the original function takes 145 ns (+57 ns): 1.6 times as fast.
|
||||||
was replaced, and executes the original bytecode
|
|
||||||
|
|
||||||
On a microbenchmark, calling the specialized function takes 88 ns,
|
|
||||||
whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.
|
|
||||||
|
|
||||||
|
|
||||||
Using builtin function
|
Using builtin function
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
Replace a slow Python function calling ``chr(obj)`` with a direct call
|
Add the C builtin ``chr()`` function as the specialized code instead of
|
||||||
to the builtin ``chr()`` function::
|
a bytecode calling ``chr(obj)``::
|
||||||
|
|
||||||
import myoptimizer
|
import myoptimizer
|
||||||
|
|
||||||
def func(arg):
|
def func(arg):
|
||||||
return chr(arg)
|
return chr(arg)
|
||||||
|
|
||||||
func._specialize(chr, [myoptimizer.GuardBuiltins("chr")])
|
myoptimizer.specialize(func, chr,
|
||||||
|
[myoptimizer.GuardBuiltins("chr")])
|
||||||
|
|
||||||
|
Example showing the behaviour of the guard::
|
||||||
|
|
||||||
print("func(65): %s" % func(65))
|
print("func(65): %s" % func(65))
|
||||||
print("#specialized: %s" % len(func._get_specialized()))
|
print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
|
||||||
print()
|
print()
|
||||||
|
|
||||||
import builtins
|
import builtins
|
||||||
builtins.chr = lambda obj: "mock"
|
builtins.chr = lambda obj: "mock"
|
||||||
|
|
||||||
print("func(65): %s" % func(65))
|
print("func(65): %s" % func(65))
|
||||||
print("#specialized: %s" % len(func.get_specialized()))
|
print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
|
||||||
|
|
||||||
Output::
|
Output::
|
||||||
|
|
||||||
|
@ -182,88 +208,165 @@ Output::
|
||||||
func(65): mock
|
func(65): mock
|
||||||
#specialized: 0
|
#specialized: 0
|
||||||
|
|
||||||
The first call returns directly the builtin ``chr()`` function (without
|
The first call calls the C builtin ``chr()`` function (without creating
|
||||||
creating a Python frame). The second call removes the specialized
|
a Python frame). The second call removes the specialized code because
|
||||||
function because the builtin ``chr()`` function was replaced, and
|
the builtin ``chr()`` function was replaced, and executes the original
|
||||||
executes the original bytecode.
|
bytecode.
|
||||||
|
|
||||||
On a microbenchmark, calling the specialized function takes 95 ns,
|
On a microbenchmark, calling the C builtin takes 95 ns, whereas the
|
||||||
whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast.
|
original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling
|
||||||
Calling directly ``chr(65)`` takes 76 ns.
|
directly ``chr(65)`` takes 76 ns.
|
||||||
|
|
||||||
|
|
||||||
Python Function Call
|
Choose the specialized code
|
||||||
====================
|
===========================
|
||||||
|
|
||||||
Pseudo-code to call a Python function having specialized functions with
|
Pseudo-code to choose the specialized code to call a pure Python
|
||||||
guards::
|
function::
|
||||||
|
|
||||||
def call_func(func, *args, **kwargs):
|
def call_func(func, args, kwargs):
|
||||||
# by default, call the regular bytecode
|
specialized = myoptimizer.get_specialized(func)
|
||||||
code = func.__code__.co_code
|
|
||||||
specialized = func.get_specialized()
|
|
||||||
nspecialized = len(specialized)
|
nspecialized = len(specialized)
|
||||||
|
|
||||||
index = 0
|
index = 0
|
||||||
while index < nspecialized:
|
while index < nspecialized:
|
||||||
guard = specialized[index].guard
|
specialized_code, guards = specialized[index]
|
||||||
# pass arguments, some guards need them
|
|
||||||
check = guard(args, kwargs)
|
for guard in guards:
|
||||||
if check == 1:
|
check = guard(args, kwargs)
|
||||||
# guard succeeded: we can use the specialized function
|
if check:
|
||||||
code = specialized[index].code
|
break
|
||||||
break
|
|
||||||
elif check == -1:
|
if not check:
|
||||||
# guard will always fail: remove the specialized function
|
# all guards succeeded:
|
||||||
del specialized[index]
|
# use the specialized code
|
||||||
elif check == 0:
|
return specialized_code
|
||||||
# guard failed temporarily
|
elif check == 1:
|
||||||
index += 1
|
# a guard failed temporarily:
|
||||||
|
# try the next specialized code
|
||||||
|
index += 1
|
||||||
|
else:
|
||||||
|
assert check == 2
|
||||||
|
# a guard will always fail:
|
||||||
|
# remove the specialized code
|
||||||
|
del specialized[index]
|
||||||
|
|
||||||
|
# if a guard of each specialized code failed, or if the function
|
||||||
|
# has no specialized code, use original bytecode
|
||||||
|
return func.__code__
|
||||||
|
|
||||||
# code can be a code object or any callable object
|
|
||||||
execute_code(code, args, kwargs)
|
|
||||||
|
|
||||||
|
|
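Because the diff view drops indentation, the selection loop of the new pseudo-code is hard to follow. A minimal runnable sketch of the same idea is given here, under stated assumptions: guards are modelled as plain callables taking ``(args, kwargs)``, the list of ``(code, guards)`` pairs is passed in directly instead of being read from the function, and the names ``choose_code`` and ``GUARD_*`` are hypothetical.

```python
# Result codes of a guard check, as listed for check() in the new text.
GUARD_OK = 0
GUARD_FAIL_TEMPORARILY = 1
GUARD_FAIL_ALWAYS = 2


def choose_code(original_code, specialized, args, kwargs):
    """Return the first specialized code whose guards all succeed.

    `specialized` is a mutable list of (code, guards) pairs. Entries whose
    guard reports a permanent failure are removed from the list. If no
    entry qualifies, `original_code` is returned.
    """
    index = 0
    while index < len(specialized):
        code, guards = specialized[index]
        result = GUARD_OK  # an entry without guards is always usable
        for guard in guards:
            result = guard(args, kwargs)
            if result != GUARD_OK:
                break  # stop at the first failing guard
        if result == GUARD_OK:
            return code
        if result == GUARD_FAIL_TEMPORARILY:
            index += 1  # keep the entry, try the next one
        elif result == GUARD_FAIL_ALWAYS:
            del specialized[index]  # drop the entry, same index is next
        else:
            raise ValueError("unexpected guard result: %r" % (result,))
    return original_code
```

Initializing ``result`` before the inner loop avoids reading an undefined variable when an entry has an empty guard list.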
||||||
Changes
|
Changes
|
||||||
=======
|
=======
|
||||||
|
|
||||||
* Add two new private methods to functions:
|
Changes to the Python C API:
|
||||||
|
|
||||||
* ``_specialize(code, guards: list)``: add specialized
|
* Add a ``PyFuncGuardObject`` object and a ``PyFuncGuard_Type`` type
|
||||||
function with guard. `code` is a code object (ex:
|
* Add a ``PySpecializedCode`` structure
|
||||||
``func2.__code__``) or any callable object (ex: the builtin
|
* Add the following fields to the ``PyFunctionObject`` structure::
|
||||||
``len()`` function). The specialization can be ignored if a guard
|
|
||||||
already fails or for other reasons (ex: the implementation of Python
|
|
||||||
does not implement this feature). Return ``False`` is the
|
|
||||||
specialized function was ignored, return ``True`` otherwise.
|
|
||||||
|
|
||||||
* ``_get_specialized()``: get the list of specialized functions with
|
Py_ssize_t nb_specialized;
|
||||||
guards. Return a list of ``(func, guards)`` tuples where func is the
|
PyObject *specialized; /* array of PySpecializedCode objects */
|
||||||
specialized function and guards is a list of guards. Return an empty
|
|
||||||
list if the function was never specialized.
|
|
||||||
|
|
||||||
* Add a private ``PyFuncGuard`` Python type. It requires to implement a
|
* Add function methods:
|
||||||
C ``check()`` function, with an optional C ``init()`` function. API:
|
|
||||||
|
|
||||||
* ``int init(PyObject *guard, PyObject *func)``: initialize a guard,
|
* ``PyFunction_Specialize()``
|
||||||
*func* is the function to which the specialized function will be
|
* ``PyFunction_GetSpecializedCodes()``
|
||||||
attached. Result:
|
* ``PyFunction_GetSpecializedCode()``
|
||||||
|
|
||||||
* return ``1`` on success
|
None of these functions and types are exposed at the Python level.
|
||||||
* return ``0`` if the guard will always fail (the specialization must be
|
|
||||||
ignored)
|
|
||||||
* raise an exception and return ``-1`` on error
|
|
||||||
|
|
||||||
* ``int check(PyObject *guard, PyObject **stack, int na, int nk)``:
|
All these additions are explicitly excluded from the stable ABI.
|
||||||
check the guard. Result:
|
|
||||||
|
|
||||||
* return 2 on success
|
When a function code is replaced (``func.__code__ = new_code``), all
|
||||||
* return 1 if the guard failed temporarily
|
specialized codes and guards are removed.
|
||||||
* return 0 if the guard will always fail
|
|
||||||
* raise an exception and return -1 on error
|
|
||||||
|
|
||||||
* A guard can be called in Python with parameters, it returns the
|
When a function is serialized by ``pickle``, specialized codes and guards are
|
||||||
result of the guard check.
|
ignored (not serialized). Specialized codes and guards are not stored in
|
||||||
|
``.pyc`` files but created and registered at runtime, when a module is
|
||||||
|
loaded.
|
||||||
|
|
||||||
|
|
||||||
|
Function guard
|
||||||
|
--------------
|
||||||
|
|
||||||
|
Add a function guard object::
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
PyObject ob_base;
|
||||||
|
int (*init) (PyObject *guard, PyObject *func);
|
||||||
|
int (*check) (PyObject *guard, PyObject **stack, int na, int nk);
|
||||||
|
} PyFuncGuardObject;
|
||||||
|
|
||||||
|
The ``init()`` function initializes a guard:
|
||||||
|
|
||||||
|
* Return ``0`` on success
|
||||||
|
* Return ``1`` if the guard will always fail: ``PyFunction_Specialize()``
|
||||||
|
must ignore the specialized code
|
||||||
|
* Raise an exception and return ``-1`` on error
|
||||||
|
|
||||||
|
|
||||||
|
The ``check()`` function checks a guard:
|
||||||
|
|
||||||
|
* Return ``0`` on success
|
||||||
|
* Return ``1`` if the guard failed temporarily
|
||||||
|
* Return ``2`` if the guard will always fail: the specialized code must
|
||||||
|
be removed
|
||||||
|
* Raise an exception and return ``-1`` on error
|
||||||
|
|
||||||
|
*stack* is an array of arguments: indexed arguments followed by (*key*,
|
||||||
|
*value*) pairs of keyword arguments. *na* is the number of indexed
|
||||||
|
arguments. *nk* is the number of keyword arguments: the number of (*key*,
|
||||||
|
*value*) pairs. `stack` contains ``na + nk * 2`` objects.
|
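The *stack* layout can be illustrated with a small Python model. This is a sketch only: the real argument is a C array of ``PyObject *``, and the helper names ``pack_stack`` and ``unpack_stack`` are hypothetical.

```python
def pack_stack(args, kwargs):
    """Flatten a call into the (stack, na, nk) layout described above."""
    stack = list(args)  # indexed arguments come first
    for key, value in kwargs.items():
        stack.extend((key, value))  # then one (key, value) pair per keyword
    return stack, len(args), len(kwargs)


def unpack_stack(stack, na, nk):
    """Rebuild (args, kwargs) from the flat layout."""
    assert len(stack) == na + nk * 2
    args = tuple(stack[:na])
    pairs = stack[na:]
    kwargs = {pairs[i]: pairs[i + 1] for i in range(0, 2 * nk, 2)}
    return args, kwargs
```

For example, the call ``f(1, 2, x=3)`` would be packed as ``[1, 2, "x", 3]`` with ``na = 2`` and ``nk = 1``.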
||||||
|
|
||||||
|
|
||||||
|
Specialized code
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Add a specialized code structure::
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
PyObject *code; /* callable or code object */
|
||||||
|
Py_ssize_t nb_guard;
|
||||||
|
PyObject **guards; /* PyFuncGuardObject objects */
|
||||||
|
} PySpecializedCode;
|
||||||
|
|
||||||
|
|
||||||
|
Function methods
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Add a function method to specialize the function, add a specialized code
|
||||||
|
with guards::
|
||||||
|
|
||||||
|
int PyFunction_Specialize(PyObject *func,
|
||||||
|
PyObject *code, PyObject *guards)
|
||||||
|
|
||||||
|
Result:
|
||||||
|
|
||||||
|
* Return ``0`` on success
|
||||||
|
* Return ``1`` if the specialization has been ignored
|
||||||
|
* Raise an exception and return ``-1`` on error
|
||||||
|
|
||||||
|
Add a function method to get the list of specialized codes::
|
||||||
|
|
||||||
|
PyObject* PyFunction_GetSpecializedCodes(PyObject *func)
|
||||||
|
|
||||||
|
Return a list of (*code*, *guards*) tuples where *code* is a callable or
|
||||||
|
code object and *guards* is a list of ``PyFuncGuard`` objects. Raise an
|
||||||
|
exception and return ``NULL`` on error.
|
||||||
|
|
||||||
|
Add a function method to get the specialized code::
|
||||||
|
|
||||||
|
PyObject* PyFunction_GetSpecializedCode(PyObject *func,
|
||||||
|
PyObject **stack,
|
||||||
|
int na, int nk)
|
||||||
|
|
||||||
|
See ``check()`` function of guards for *stack*, *na* and *nk* arguments.
|
||||||
|
Return a callable or a code object on success. Raise an exception and
|
||||||
|
return ``NULL`` on error.
|
||||||
|
|
||||||
|
Benchmark
|
||||||
|
---------
|
||||||
|
|
||||||
Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
|
Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
|
||||||
of 3 runs):
|
of 3 runs):
|
||||||
|
@ -275,30 +378,18 @@ According to this microbenchmark, the changes has no overhead on calling
|
||||||
a Python function without specialization.
|
a Python function without specialization.
|
||||||
|
|
||||||
|
|
||||||
Behaviour
|
|
||||||
=========
|
|
||||||
|
|
||||||
When a function code is replaced (``func.__code__ = new_code``), all
|
|
||||||
specialized functions are removed.
|
|
||||||
|
|
||||||
When a function is serialized ``pickle``, specialized functions and
|
|
||||||
guards are ignored (not serialized). Specialized functions and guards
|
|
||||||
are not stored in ``.pyc`` files but created and registered at runtime,
|
|
||||||
when a module is loaded.
|
|
||||||
|
|
||||||
|
|
||||||
Other implementations of Python
|
Other implementations of Python
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
This PEP is designed to be implemented in C for CPython.
|
This PEP only contains changes to the Python C API, the Python API is
|
||||||
|
unchanged. Other implementations of Python are free to not implement new
|
||||||
|
additions, or implement added functions as no-op:
|
||||||
|
|
||||||
Other implementations of Python are free to not implement added private
|
* ``PyFunction_Specialize()``: always return ``1`` (the specialization
|
||||||
function methods.
|
has been ignored)
|
||||||
|
* ``PyFunction_GetSpecializedCodes()``: always return an empty list
|
||||||
Or they can implement a ``_specialize()`` method which always ignores
|
* ``PyFunction_GetSpecializedCode()``: return the function code object,
|
||||||
the specialized function (in short, do nothing and always return
|
as the existing ``PyFunction_GET_CODE()`` macro
|
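The no-op behaviour listed above can be sketched as Python-level stand-ins for the three C functions. The names ``specialize`` and ``get_specialized`` follow the hypothetical ``myoptimizer`` module used in the examples; ``get_specialized_code`` is an additional name assumed here for illustration.

```python
def specialize(func, code, guards):
    """No-op: report that the specialization has been ignored."""
    return 1


def get_specialized(func):
    """No-op: a function never has specialized codes."""
    return []


def get_specialized_code(func, args, kwargs):
    """No-op: always fall back to the original code object."""
    return func.__code__
```

With these stand-ins, an optimizer can register specialized codes unconditionally and the program still runs the original bytecode.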
||||||
``False``) and a ``_get_specialized()`` method which always returns an
|
|
||||||
empty list.
|
|
||||||
|
|
||||||
|
|
||||||
Discussion
|
Discussion
|
||||||
|
|