* convert Python functions to C functions
* replace "specialized functions" with "specialized codes" to avoid confusion
* guard functions now use 0 result as success to simplify the code
Victor Stinner 2016-01-12 12:11:53 +01:00
parent a19233d285
commit 43bb3fb7ad
1 changed files with 202 additions and 111 deletions


@ -1,5 +1,5 @@
PEP: 510 PEP: 510
Title: Specialized functions with guards Title: Specialize functions
Version: $Revision$ Version: $Revision$
Last-Modified: $Date$ Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com> Author: Victor Stinner <victor.stinner@gmail.com>
@ -13,8 +13,9 @@ Python-Version: 3.6
Abstract Abstract
======== ========
Add a private API to CPython to add specialized functions with guards to Add functions to the Python C API to specialize pure Python functions:
functions, to support static optimizers respecting the Python semantics. add specialized codes with guards. This allows implementing static
optimizers respecting the Python semantics.
Rationale Rationale
@ -29,15 +30,16 @@ modified at runtime. Implement optimizations respecting the Python
semantics requires to detect when "something changes", we will call these semantics requires to detect when "something changes", we will call these
checks "guards". checks "guards".
This PEP proposes to add an API to add specialized functions with guards This PEP proposes to add a public API to the Python C API to add
to a function. When the function is called, the specialized function is specialized codes with guards to a function. When the function is
used if nothing changed, otherwise use the original bytecode. called, a specialized code is used if nothing changed, otherwise use the
original bytecode.
Even if guards help to respect most parts of the Python semantics, it's Even if guards help to respect most parts of the Python semantics, it's
really hard to optimize Python without making subtle changes on the hard to optimize Python without making subtle changes on the exact
exact behaviour. CPython has a long history and many applications rely behaviour. CPython has a long history and many applications rely on
on implementation details. A compromise must be found between implementation details. A compromise must be found between "everything
"everything is mutable" and performance. is mutable" and performance.
Writing an optimizer is out of the scope of this PEP. Writing an optimizer is out of the scope of this PEP.
@ -101,13 +103,35 @@ between CPython 3.5 and PyPy.
2011. 2011.
Example Examples
======= ========
The following examples are not written to show powerful optimizations
promising significant speedups, but to be short and easy to understand,
just to explain the principle.
Hypothetical myoptimizer module
-------------------------------
Examples in this PEP use a hypothetical ``myoptimizer`` module which
provides the following functions and types:
* ``specialize(func, code, guards)``: add the specialized code `code`
with guards `guards` to the function `func`
* ``get_specialized(func)``: get the list of specialized codes as a list
of ``(code, guards)`` tuples where `code` is a callable or code object
and `guards` is a list of guards
* ``GuardBuiltins(name)``: guard watching for
``builtins.__dict__[name]`` and ``globals()[name]``. The guard fails
if ``builtins.__dict__[name]`` is replaced, or if ``globals()[name]``
is set.
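As a rough illustration only, the three names listed above could be mocked in pure Python as follows. This is a sketch under stated assumptions, not the module used by the PEP: the ``_SPECIALIZED`` registry, the optional ``func_globals`` parameter and the ``check()`` method name are hypothetical, and the sketch does not hook into function calls, so nothing is removed automatically when a guard fails.

```python
import builtins

# Hypothetical registry mapping a function to its (code, guards) tuples.
# The PEP stores this data on the function object in C instead.
_SPECIALIZED = {}


class GuardBuiltins:
    """Guard on builtins.__dict__[name] and on a globals namespace."""

    def __init__(self, name, func_globals=None):
        self.name = name
        # Assumption: the watched globals namespace is passed explicitly.
        self.func_globals = func_globals if func_globals is not None else {}
        # Remember the builtin the specialized code was built against.
        self.expected = builtins.__dict__[name]

    def check(self):
        # 0: success, 2: the guard will always fail
        if self.name in self.func_globals:
            return 2
        if builtins.__dict__.get(self.name) is not self.expected:
            return 2
        return 0


def specialize(func, code, guards):
    """Register the specialized code `code` with `guards` for `func`."""
    _SPECIALIZED.setdefault(func, []).append((code, list(guards)))


def get_specialized(func):
    """Return a list of (code, guards) tuples, empty if never specialized."""
    return list(_SPECIALIZED.get(func, []))
```

A guard built this way reports ``0`` until ``builtins.chr`` is rebound or a ``chr`` name appears in the watched namespace, after which it reports ``2``.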
Using bytecode Using bytecode
-------------- --------------
Replace ``chr(65)`` with ``"A"``:: Add specialized bytecode where the call to the pure builtin function
``chr(65)`` is replaced with its result ``"A"``::
import myoptimizer import myoptimizer
@ -117,18 +141,21 @@ Replace ``chr(65)`` with ``"A"``::
def fast_func(): def fast_func():
return "A" return "A"
func._specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")]) myoptimizer.specialize(func, fast_func.__code__,
[myoptimizer.GuardBuiltins("chr")])
del fast_func del fast_func
Example showing the behaviour of the guard::
print("func(): %s" % func()) print("func(): %s" % func())
print("#specialized: %s" % len(func._get_specialized())) print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
print() print()
import builtins import builtins
builtins.chr = lambda obj: "mock" builtins.chr = lambda obj: "mock"
print("func(): %s" % func()) print("func(): %s" % func())
print("#specialized: %s" % len(func._get_specialized())) print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
Output:: Output::
@ -138,41 +165,40 @@ Output::
func(): mock func(): mock
#specialized: 0 #specialized: 0
The hypothetical ``myoptimizer.GuardBuiltins("len")`` is a guard on the The first call uses the specialized bytecode which returns the string
builtin ``len()`` function and the ``len`` name in the global namespace. ``"A"``. The second call removes the specialized code because the
The guard fails if the builtin function is replaced or if a ``len`` name builtin ``chr()`` function was replaced, and executes the original
is defined in the global namespace. bytecode calling ``chr(65)``.
The first call returns directly the string ``"A"``. The second call On a microbenchmark, calling the specialized bytecode takes 88 ns,
removes the specialized function because the builtin ``chr()`` function whereas the original function takes 145 ns (+57 ns): 1.6 times as fast.
was replaced, and executes the original bytecode
On a microbenchmark, calling the specialized function takes 88 ns,
whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.
Using builtin function Using builtin function
---------------------- ----------------------
Replace a slow Python function calling ``chr(obj)`` with a direct call Add the C builtin ``chr()`` function as the specialized code instead of
to the builtin ``chr()`` function:: a bytecode calling ``chr(obj)``::
import myoptimizer import myoptimizer
def func(arg): def func(arg):
return chr(arg) return chr(arg)
func._specialize(chr, [myoptimizer.GuardBuiltins("chr")]) myoptimizer.specialize(func, chr,
[myoptimizer.GuardBuiltins("chr")])
Example showing the behaviour of the guard::
print("func(65): %s" % func(65)) print("func(65): %s" % func(65))
print("#specialized: %s" % len(func._get_specialized())) print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
print() print()
import builtins import builtins
builtins.chr = lambda obj: "mock" builtins.chr = lambda obj: "mock"
print("func(65): %s" % func(65)) print("func(65): %s" % func(65))
print("#specialized: %s" % len(func.get_specialized())) print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
Output:: Output::
@ -182,88 +208,165 @@ Output::
func(65): mock func(65): mock
#specialized: 0 #specialized: 0
The first call returns directly the builtin ``chr()`` function (without The first call calls the C builtin ``chr()`` function (without creating
creating a Python frame). The second call removes the specialized a Python frame). The second call removes the specialized code because
function because the builtin ``chr()`` function was replaced, and the builtin ``chr()`` function was replaced, and executes the original
executes the original bytecode. bytecode.
On a microbenchmark, calling the specialized function takes 95 ns, On a microbenchmark, calling the C builtin takes 95 ns, whereas the
whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast. original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling
Calling directly ``chr(65)`` takes 76 ns. directly ``chr(65)`` takes 76 ns.
Python Function Call Choose the specialized code
==================== ===========================
Pseudo-code to call a Python function having specialized functions with Pseudo-code to choose the specialized code to call a pure Python
guards:: function::
def call_func(func, *args, **kwargs): def call_func(func, args, kwargs):
# by default, call the regular bytecode specialized = myoptimizer.get_specialized(func)
code = func.__code__.co_code
specialized = func.get_specialized()
nspecialized = len(specialized) nspecialized = len(specialized)
index = 0 index = 0
while index < nspecialized: while index < nspecialized:
guard = specialized[index].guard specialized_code, guards = specialized[index]
# pass arguments, some guards need them
check = guard(args, kwargs) for guard in guards:
if check == 1: check = guard(args, kwargs)
# guard succeeded: we can use the specialized function if check:
code = specialized[index].code break
break
elif check == -1: if not check:
# guard will always fail: remove the specialized function # all guards succeeded:
del specialized[index] # use the specialized code
elif check == 0: return specialized_code
# guard failed temporarily elif check == 1:
index += 1 # a guard failed temporarily:
# try the next specialized code
index += 1
else:
assert check == 2
# a guard will always fail:
# remove the specialized code
del specialized[index]
# if a guard of each specialized code failed, or if the function
# has no specialized code, use original bytecode
code = func.__code__
# code can be a code object or any callable object
execute_code(code, args, kwargs)
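The selection logic of the new pseudo-code can be sketched as a self-contained Python function. This is only an illustration: ``choose_code``, the ``GUARD_*`` constants and the convention that guards are plain callables taking ``(args, kwargs)`` are assumptions of this sketch. Unlike the pseudo-code, it treats an entry without guards as always usable, so ``check`` is never read before being assigned.

```python
GUARD_OK, GUARD_RETRY, GUARD_NEVER = 0, 1, 2


def choose_code(original_code, specialized, args, kwargs):
    """Return the first specialized code whose guards all succeed.

    `specialized` is a mutable list of (code, guards) tuples. Entries with
    a guard that will always fail are removed in place. If no entry is
    usable, `original_code` is returned.
    """
    index = 0
    while index < len(specialized):
        code, guards = specialized[index]
        check = GUARD_OK  # an entry without guards is always usable
        for guard in guards:
            check = guard(args, kwargs)
            if check != GUARD_OK:
                break
        if check == GUARD_OK:
            # all guards succeeded: use the specialized code
            return code
        if check == GUARD_RETRY:
            # a guard failed temporarily: try the next specialized code
            index += 1
        elif check == GUARD_NEVER:
            # a guard will always fail: remove the specialized code
            del specialized[index]
        else:
            raise ValueError("unexpected guard result: %r" % (check,))
    return original_code
```

Returning the chosen code instead of executing it keeps the sketch independent of how a code object or a callable is actually run.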
Changes Changes
======= =======
* Add two new private methods to functions: Changes to the Python C API:
* ``_specialize(code, guards: list)``: add specialized * Add a ``PyFuncGuardObject`` object and a ``PyFuncGuard_Type`` type
function with guard. `code` is a code object (ex: * Add a ``PySpecializedFunc`` structure
``func2.__code__``) or any callable object (ex: the builtin * Add the following fields to the ``PyFunctionObject`` structure::
``len()`` function). The specialization can be ignored if a guard
already fails or for other reasons (ex: the implementation of Python
does not implement this feature). Return ``False`` if the
specialized function was ignored, return ``True`` otherwise.
* ``_get_specialized()``: get the list of specialized functions with Py_ssize_t nb_specialized;
guards. Return a list of ``(func, guards)`` tuples where func is the PyObject *specialized; /* array of PySpecializedFunc objects */
specialized function and guards is a list of guards. Return an empty
list if the function was never specialized.
* Add a private ``PyFuncGuard`` Python type. It requires to implement a * Add function methods:
C ``check()`` function, with an optional C ``init()`` function. API:
* ``int init(PyObject *guard, PyObject *func)``: initialize a guard, * ``PyFunction_Specialize()``
*func* is the function to which the specialized function will be * ``PyFunction_GetSpecializedCodes()``
attached. Result: * ``PyFunction_GetSpecializedCode()``
* return ``1`` on success None of these functions and types are exposed at the Python level.
* return ``0`` if the guard will always fail (the specialization must be
ignored)
* raise an exception and return ``-1`` on error
* ``int check(PyObject *guard, PyObject **stack, int na, int nk)``: All these additions are explicitly excluded from the stable ABI.
check the guard. Result:
* return 2 on success When a function code is replaced (``func.__code__ = new_code``), all
* return 1 if the guard failed temporarily
* return 0 if the guard will always fail
* raise an exception and return -1 on error
* A guard can be called in Python with parameters, it returns the When a function is serialized by ``pickle``, specialized codes and guards are
result of the guard check. ignored (not serialized). Specialized codes and guards are not stored in
``.pyc`` files but created and registered at runtime, when a module is
loaded.
Function guard
--------------
Add a function guard object::
typedef struct {
PyObject ob_base;
int (*init) (PyObject *guard, PyObject *func);
int (*check) (PyObject *guard, PyObject **stack, int na, int nk);
} PyFuncGuardObject;
The ``init()`` function initializes a guard:
* Return ``0`` on success
* Return ``1`` if the guard will always fail: ``PyFunction_Specialize()``
must ignore the specialized code
* Raise an exception and return ``-1`` on error
The ``check()`` function checks a guard:
* Return ``0`` on success
* Return ``1`` if the guard failed temporarily
* Return ``2`` if the guard will always fail: the specialized code must
be removed
* Raise an exception and return ``-1`` on error
*stack* is an array of arguments: indexed arguments followed by (*key*,
*value*) pairs of keyword arguments. *na* is the number of indexed
arguments. *nk* is the number of keyword arguments: the number of (*key*,
*value*) pairs. `stack` contains ``na + nk * 2`` objects.
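The stack layout described above can be illustrated in Python. The helper name ``split_stack`` is hypothetical and the sketch uses a Python list in place of the C ``PyObject **`` array. It only shows how ``na`` and ``nk`` partition the ``na + nk * 2`` objects.

```python
def split_stack(stack, na, nk):
    """Split a flat argument stack into (args, kwargs).

    The first `na` items are indexed arguments. They are followed by `nk`
    (key, value) pairs stored as 2 * nk consecutive items.
    """
    if len(stack) != na + nk * 2:
        raise ValueError("stack must contain na + nk * 2 objects")
    args = tuple(stack[:na])
    kwargs = {stack[i]: stack[i + 1] for i in range(na, na + nk * 2, 2)}
    return args, kwargs
```

For example, a call such as ``f(65, errors="strict")`` would correspond to the stack ``[65, "errors", "strict"]`` with ``na = 1`` and ``nk = 1``.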
Specialized code
----------------
Add a specialized code structure::
typedef struct {
PyObject *code; /* callable or code object */
Py_ssize_t nb_guard;
PyObject **guards; /* PyFuncGuardObject objects */
} PySpecializedCode;
Function methods
----------------
Add a function method to specialize the function, add a specialized code
with guards::
int PyFunction_Specialize(PyObject *func,
PyObject *code, PyObject *guards)
Result:
* Return ``0`` on success
* Return ``1`` if the specialization has been ignored
* Raise an exception and return ``-1`` on error
Add a function method to get the list of specialized codes::
PyObject* PyFunction_GetSpecializedCodes(PyObject *func)
Return a list of (*code*, *guards*) tuples where *code* is a callable or
code object and *guards* is a list of ``PyFuncGuard`` objects. Raise an
exception and return ``NULL`` on error.
Add a function method to get the specialized code::
PyObject* PyFunction_GetSpecializedCode(PyObject *func,
PyObject **stack,
int na, int nk)
See ``check()`` function of guards for *stack*, *na* and *nk* arguments.
Return a callable or a code object on success. Raise an exception and
return ``NULL`` on error.
Benchmark
---------
Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
of 3 runs): of 3 runs):
@ -275,30 +378,18 @@ According to this microbenchmark, the changes has no overhead on calling
a Python function without specialization. a Python function without specialization.
Behaviour
=========
When a function code is replaced (``func.__code__ = new_code``), all
specialized functions are removed.
When a function is serialized by ``pickle``, specialized functions and
guards are ignored (not serialized). Specialized functions and guards
are not stored in ``.pyc`` files but created and registered at runtime,
when a module is loaded.
Other implementations of Python Other implementations of Python
=============================== ===============================
This PEP is designed to be implemented in C for CPython. This PEP only contains changes to the Python C API, the Python API is
unchanged. Other implementations of Python are free to not implement new
additions, or implement added functions as no-op:
Other implementations of Python are free to not implement added private * ``PyFunction_Specialize()``: always return ``1`` (the specialization
function methods. has been ignored)
* ``PyFunction_GetSpecializedCodes()``: always return an empty list
Or they can implement a ``_specialize()`` method which always ignores * ``PyFunction_GetSpecializedCode()``: return the function code object,
the specialized function (in short, do nothing and always return as the existing ``PyFunction_GET_CODE()`` macro
``False``) and a ``_get_specialized()`` method which always returns an
empty list.
Discussion Discussion