PEP 510
* convert Python functions to C functions
* replace "specialized functions" with "specialized codes" to avoid confusion
* guard functions now use 0 result as success to simplify the code
parent a19233d285
commit 43bb3fb7ad
313
pep-0510.txt
@@ -1,5 +1,5 @@
 PEP: 510
-Title: Specialized functions with guards
+Title: Specialize functions
 Version: $Revision$
 Last-Modified: $Date$
 Author: Victor Stinner <victor.stinner@gmail.com>
@@ -13,8 +13,9 @@ Python-Version: 3.6
 Abstract
 ========
 
-Add a private API to CPython to add specialized functions with guards to
-functions, to support static optimizers respecting the Python semantics.
+Add functions to the Python C API to specialize pure Python functions:
+add specialized codes with guards. It allows to implement static
+optimizers respecting the Python semantics.
 
 
 Rationale
@@ -29,15 +30,16 @@ modified at runtime. Implement optimizations respecting the Python
 semantics requires to detect when "something changes", we will call these
 checks "guards".
 
-This PEP proposes to add an API to add specialized functions with guards
-to a function. When the function is called, the specialized function is
-used if nothing changed, otherwise use the original bytecode.
+This PEP proposes to add a public API to the Python C API to add
+specialized codes with guards to a function. When the function is
+called, a specialized code is used if nothing changed, otherwise use the
+original bytecode.
 
 Even if guards help to respect most parts of the Python semantics, it's
-really hard to optimize Python without making subtle changes on the
-exact behaviour. CPython has a long history and many applications rely
-on implementation details. A compromise must be found between
-"everything is mutable" and performance.
+hard to optimize Python without making subtle changes on the exact
+behaviour. CPython has a long history and many applications rely on
+implementation details. A compromise must be found between "everything
+is mutable" and performance.
 
 Writing an optimizer is out of the scope of this PEP.
 
@@ -101,13 +103,35 @@ between CPython 3.5 and PyPy.
 2011.
 
 
-Example
-=======
+Examples
+========
 
+Following examples are not written to show powerful optimizations
+promising important speedup, but to be short and easy to understand,
+just to explain the principle.
+
+Hypothetical myoptimizer module
+-------------------------------
+
+Examples in this PEP uses an hypothetical ``myoptimizer`` module which
+provides the following functions and types:
+
+* ``specialize(func, code, guards)``: add the specialized code `code`
+  with guards `guards` to the function `func`
+* ``get_specialized(func)``: get the list of specialized codes as a list
+  of ``(code, guards)`` tuples where `code` is a callable or code object
+  and `guards` is a list of a guards
+* ``GuardBuiltins(name)``: guard watching for
+  ``builtins.__dict__[name]`` and ``globals()[name]``. The guard fails
+  if ``builtins.__dict__[name]`` is replaced, or if ``globals()[name]``
+  is set.
+
+
 Using bytecode
 --------------
 
-Replace ``chr(65)`` with ``"A"``::
+Add specialized bytecode where the call to the pure builtin function
+``chr(65)`` is replaced with its result ``"A"``::
 
     import myoptimizer
 
@@ -117,18 +141,21 @@ Replace ``chr(65)`` with ``"A"``::
     def fast_func():
         return "A"
 
-    func._specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
+    myoptimizer.specialize(func, fast_func.__code__,
+                           [myoptimizer.GuardBuiltins("chr")])
     del fast_func
 
+Example showing the behaviour of the guard::
+
     print("func(): %s" % func())
-    print("#specialized: %s" % len(func._get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
     print()
 
     import builtins
     builtins.chr = lambda obj: "mock"
 
     print("func(): %s" % func())
-    print("#specialized: %s" % len(func._get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
 
 Output::
 
@@ -138,41 +165,40 @@ Output::
     func(): mock
     #specialized: 0
 
-The hypothetical ``myoptimizer.GuardBuiltins("len")`` is a guard on the
-builtin ``len()`` function and the ``len`` name in the global namespace.
-The guard fails if the builtin function is replaced or if a ``len`` name
-is defined in the global namespace.
+The first call uses the specialized bytecode which returns the string
+``"A"``. The second call removes the specialized code because the
+builtin ``chr()`` function was replaced, and executes the original
+bytecode calling ``chr(65)``.
 
-The first call returns directly the string ``"A"``. The second call
-removes the specialized function because the builtin ``chr()`` function
-was replaced, and executes the original bytecode
-
-On a microbenchmark, calling the specialized function takes 88 ns,
-whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.
+On a microbenchmark, calling the specialized bytecode takes 88 ns,
+whereas the original function takes 145 ns (+57 ns): 1.6 times as fast.
 
 
 Using builtin function
 ----------------------
 
-Replace a slow Python function calling ``chr(obj)`` with a direct call
-to the builtin ``chr()`` function::
+Add the C builtin ``chr()`` function as the specialized code instead of
+a bytecode calling ``chr(obj)``::
 
     import myoptimizer
 
     def func(arg):
         return chr(arg)
 
-    func._specialize(chr, [myoptimizer.GuardBuiltins("chr")])
+    myoptimizer.specialize(func, chr,
+                           [myoptimizer.GuardBuiltins("chr")])
 
+Example showing the behaviour of the guard::
+
     print("func(65): %s" % func(65))
-    print("#specialized: %s" % len(func._get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
     print()
 
     import builtins
     builtins.chr = lambda obj: "mock"
 
     print("func(65): %s" % func(65))
-    print("#specialized: %s" % len(func.get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
 
 Output::
 
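The guard behaviour exercised by the two examples in the hunks above can be sketched in pure Python. This is only an illustration: ``myoptimizer`` is hypothetical in the PEP itself, and the class layout, the ``check()`` method, the explicit ``func_globals`` argument and the constant names below are all assumptions. The return values follow the ``0`` / ``1`` / ``2`` convention this commit introduces for guard checks, and treating a replaced builtin as "will always fail" is inferred from the text saying the specialized code is removed.

```python
import builtins

# Assumed names for the check() results described in this commit.
GUARD_OK = 0            # guard succeeded
GUARD_FAIL_TEMP = 1     # guard failed temporarily (unused by this guard)
GUARD_FAIL_ALWAYS = 2   # guard will always fail

class GuardBuiltins:
    """Hypothetical pure-Python stand-in for myoptimizer.GuardBuiltins."""

    def __init__(self, name, func_globals):
        self.name = name
        self.func_globals = func_globals
        # Remember the builtin that was current when the guard was created.
        self.expected = builtins.__dict__[name]

    def check(self):
        if self.name in self.func_globals:
            # The name is now set in the function's global namespace.
            return GUARD_FAIL_ALWAYS
        if builtins.__dict__.get(self.name) is not self.expected:
            # The builtin itself was replaced.
            return GUARD_FAIL_ALWAYS
        return GUARD_OK

# Usage: a private dict plays the role of the function's globals, so the
# demo does not have to patch the real builtins module.
namespace = {}
guard = GuardBuiltins("chr", namespace)
first = guard.check()
namespace["chr"] = lambda obj: "mock"
second = guard.check()
```

Once the name is shadowed, the guard reports a permanent failure, which is the case where the examples show ``#specialized: 0``.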
@@ -182,88 +208,165 @@ Output::
     func(): mock
     #specialized: 0
 
-The first call returns directly the builtin ``chr()`` function (without
-creating a Python frame). The second call removes the specialized
-function because the builtin ``chr()`` function was replaced, and
-executes the original bytecode.
+The first call calls the C builtin ``chr()`` function (without creating
+a Python frame). The second call removes the specialized code because
+the builtin ``chr()`` function was replaced, and executes the original
+bytecode.
 
-On a microbenchmark, calling the specialized function takes 95 ns,
-whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast.
-Calling directly ``chr(65)`` takes 76 ns.
+On a microbenchmark, calling the C builtin takes 95 ns, whereas the
+original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling
+directly ``chr(65)`` takes 76 ns.
 
 
-Python Function Call
-====================
+Choose the specialized code
+===========================
 
-Pseudo-code to call a Python function having specialized functions with
-guards::
+Pseudo-code to choose the specialized code to call a pure Python
+function::
 
-    def call_func(func, *args, **kwargs):
-        # by default, call the regular bytecode
-        code = func.__code__.co_code
-        specialized = func.get_specialized()
+    def call_func(func, args, kwargs):
+        specialized = myoptimizer.get_specialized(func)
         nspecialized = len(specialized)
 
         index = 0
         while index < nspecialized:
-            guard = specialized[index].guard
-            # pass arguments, some guards need them
-            check = guard(args, kwargs)
-            if check == 1:
-                # guard succeeded: we can use the specialized function
-                code = specialized[index].code
-                break
-            elif check == -1:
-                # guard will always fail: remove the specialized function
-                del specialized[index]
-            elif check == 0:
-                # guard failed temporarely
-                index += 1
+            specialized_code, guards = specialized[index]
+
+            for guard in guards:
+                check = guard(args, kwargs)
+                if check:
+                    break
+
+            if not check:
+                # all guards succeeded:
+                # use the specialized code
+                return specialized_code
+            elif check == 1:
+                # a guard failed temporarely:
+                # try the next specialized code
+                index += 1
+            else:
+                assert check == 2
+                # a guard will always fail:
+                # remove the specialized code
+                del specialized[index]
+
+        # if a guard of each specialized code failed, or if the function
+        # has no specialized code, use original bytecode
+        code = func.__code__
 
-        # code can be a code object or any callable object
-        execute_code(code, args, kwargs)
-
 
 Changes
 =======
 
-* Add two new private methods to functions:
+Changes to the Python C API:
 
-  * ``_specialize(code, guards: list)``: add specialized
-    function with guard. `code` is a code object (ex:
-    ``func2.__code__``) or any callable object (ex: the builtin
-    ``len()`` function). The specialization can be ignored if a guard
-    already fails or for other reasons (ex: the implementation of Python
-    does not implement this feature). Return ``False`` is the
-    specialized function was ignored, return ``True`` otherwise.
+* Add a ``PyFuncGuardObject`` object and a ``PyFuncGuard_Type`` type
+* Add a ``PySpecializedFunc`` structure
+* Add the following fields to the ``PyFunctionObject`` structure::
 
-  * ``_get_specialized()``: get the list of specialized functions with
-    guards. Return a list of ``(func, guards)`` tuples where func is the
-    specialized function and guards is a list of guards. Return an empty
-    list if the function was never specialized.
+    Py_ssize_t nb_specialized;
+    PyObject *specialized; /* array of PySpecializedFunc objects */
 
-* Add a private ``PyFuncGuard`` Python type. It requires to implement a
-  C ``check()`` function, with an optional C ``init()`` function. API:
+* Add function methods:
 
-  * ``int init(PyObject *guard, PyObject *func)``: initialize a guard,
-    *func* is the function to which the specialized function will be
-    attached. Result:
+  * ``PyFunction_Specialize()``
+  * ``PyFunction_GetSpecializedCodes()``
+  * ``PyFunction_GetSpecializedCode()``
 
-    * return ``1`` on success
-    * return ``0`` if the guard will always fail (the specialization must be
-      ignored)
-    * raise an exception and return ``-1`` on error
+None of these function and types are exposed at the Python level.
 
-  * ``int check(PyObject *guard, PyObject **stack, int na, int nk)``:
-    check the guard. Result:
+All these additions are explicitly excluded of the stable ABI.
 
-    * return 2 on success
-    * return 1 if the guard failed temporarely
-    * return 0 if the guard will always fail
-    * raise an exception and return -1 on error
+When a function code is replaced (``func.__code__ = new_code``), all
+specialized codes and guards are removed.
 
-  * A guard can be called in Python with parameters, it returns the
-    result of the guard check.
+When a function is serialized ``pickle``, specialized codes and guards are
+ignored (not serialized). Specialized codes and guards are not stored in
+``.pyc`` files but created and registered at runtime, when a module is
+loaded.
+
+
+Function guard
+--------------
+
+Add a function guard object::
+
+    typedef struct {
+        PyObject ob_base;
+        int (*init) (PyObject *guard, PyObject *func);
+        int (*check) (PyObject *guard, PyObject **stack, int na, int nk);
+    } PyFuncGuardObject;
+
+The ``init()`` function initializes a guard:
+
+* Return ``0`` on success
+* Return ``1`` if the guard will always fail: ``PyFunction_Specialize()``
+  must ignore the specialized code
+* Raise an exception and return ``-1`` on error
+
+
+The ``check()`` function checks a guard:
+
+* Return ``0`` on success
+* Return ``1`` if the guard failed temporarely
+* Return ``2`` if the guard will always fail: the specialized code must
+  be removed
+* Raise an exception and return ``-1`` on error
+
+*stack* is an array of arguments: indexed arguments followed by (*key*,
+*value*) pairs of keyword arguments. *na* is the number of indexed
+arguments. *nk* is the number of keyword arguments: the number of (*key*,
+*value*) pairs. `stack` contains ``na + nk * 2`` objects.
+
+
+Specialized code
+----------------
+
+Add a specialized code structure::
+
+    typedef struct {
+        PyObject *code; /* callable or code object */
+        Py_ssize_t nb_guard;
+        PyObject **guards; /* PyFuncGuardObject objects */
+    } PySpecializedCode;
+
+
+Function methods
+----------------
+
+Add a function method to specialize the function, add a specialized code
+with guards::
+
+    int PyFunction_Specialize(PyObject *func,
+                              PyObject *code, PyObject *guards)
+
+Result:
+
+* Return ``0`` on success
+* Return ``1`` if the specialization has been ignored
+* Raise an exception and return ``-1`` on error
+
+Add a function method to get the list of specialized codes::
+
+    PyObject* PyFunction_GetSpecializedCodes(PyObject *func)
+
+Return a list of (*code*, *guards*) tuples where *code* is a callable or
+code object and *guards* is a list of ``PyFuncGuard`` objects. Raise an
+exception and return ``NULL`` on error.
+
+Add a function method to get the specialized code::
+
+    PyObject* PyFunction_GetSpecializedCode(PyObject *func,
+                                            PyObject **stack,
+                                            int na, int nk)
+
+See ``check()`` function of guards for *stack*, *na* and *nk* arguments.
+Return a callable or a code object on success. Raise an exception and
+return ``NULL`` on error.
+
+Benchmark
+---------
 
 Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
 of 3 runs):
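The selection pseudo-code added in the hunk above can be turned into a small runnable sketch. The version below is an assumption-laden illustration, not the C implementation: ``choose_code()`` and its parameters are hypothetical names, guards are plain callables taking ``(args, kwargs)``, and codes are represented by strings. It differs from the pseudo-code in two deliberate ways: the list length is re-read on every iteration, so deleting an entry cannot push ``index`` past the end, and ``check`` starts at ``0`` so an empty guard list counts as success.

```python
def choose_code(original_code, specialized, args, kwargs):
    """Return the first specialized code whose guards all succeed.

    `specialized` is a list of (code, guards) tuples; each guard returns
    0 (success), 1 (failed temporarily) or 2 (will always fail).
    """
    index = 0
    while index < len(specialized):
        code, guards = specialized[index]

        check = 0
        for guard in guards:
            check = guard(args, kwargs)
            if check:
                break

        if check == 0:
            # all guards succeeded: use the specialized code
            return code
        elif check == 1:
            # a guard failed temporarily: try the next specialized code
            index += 1
        elif check == 2:
            # a guard will always fail: remove the specialized code
            del specialized[index]
        else:
            raise ValueError("unexpected guard result: %r" % (check,))

    # no usable specialized code: fall back to the original code
    return original_code

# Usage with three toy guards.
always_ok = lambda args, kwargs: 0
only_ints = lambda args, kwargs: 0 if isinstance(args[0], int) else 1
dead = lambda args, kwargs: 2

specialized = [("dead_code", [dead]), ("fast_int", [always_ok, only_ints])]
chosen_for_int = choose_code("original", specialized, (65,), {})
remaining = len(specialized)
chosen_for_str = choose_code("original", specialized, ("x",), {})
```

The first call drops the entry whose guard reports a permanent failure and picks ``"fast_int"``; the second call keeps that entry but falls back to the original code because its guard only fails temporarily.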
@@ -275,30 +378,18 @@ According to this microbenchmark, the changes has no overhead on calling
 a Python function without specialization.
 
 
-Behaviour
-=========
-
-When a function code is replaced (``func.__code__ = new_code``), all
-specialized functions are removed.
-
-When a function is serialized ``pickle``, specialized functions and
-guards are ignored (not serialized). Specialized functions and guards
-are not stored in ``.pyc`` files but created and registered at runtime,
-when a module is loaded.
-
-
 Other implementations of Python
 ===============================
 
-This PEP is designed to be implemented in C for CPython.
+This PEP only contains changes to the Python C API, the Python API is
+unchanged. Other implementations of Python are free to not implement new
+additions, or implement added functions as no-op:
 
-Other implementations of Python are free to not implement added private
-function methods.
-
-Or they can implement a ``_specialize()`` method which always ignores
-the specialized function (in short, do nothing and always return
-``False``) and a ``_get_specialized()`` method which always returns an
-empty list.
+* ``PyFunction_Specialize()``: always return ``1`` (the specialization
+  has been ignored)
+* ``PyFunction_GetSpecializedCodes()``: always return an empty list
+* ``PyFunction_GetSpecializedCode()``: return the function code object,
+  as the existing ``PyFunction_GET_CODE()`` macro
 
 
 Discussion
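The no-op behaviour listed in the hunk above can be mirrored in Python to show what a caller would observe. The three function names below are hypothetical Python-level stand-ins for the C functions (the commit states that nothing is exposed at the Python level), and the ``(args, kwargs)`` parameters replace the C ``stack``, ``na`` and ``nk`` arguments for readability.

```python
# Assumed name for the "specialization has been ignored" result.
SPECIALIZATION_IGNORED = 1

def specialize(func, code, guards):
    # No-op: never register anything, always report "ignored".
    return SPECIALIZATION_IGNORED

def get_specialized_codes(func):
    # No-op: a function never has specialized codes.
    return []

def get_specialized_code(func, args, kwargs):
    # No-op: always select the function's own code object.
    return func.__code__

def func():
    return chr(65)

status = specialize(func, chr, [])
codes = get_specialized_codes(func)
chosen = get_specialized_code(func, (), {})
```

With such an implementation an optimizer can call the same API everywhere and simply sees every specialization being ignored.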