From 43bb3fb7ada48c7c440d9d95cd238059fcf63d15 Mon Sep 17 00:00:00 2001
From: Victor Stinner
Date: Tue, 12 Jan 2016 12:11:53 +0100
Subject: [PATCH] PEP 510

* convert Python functions to C functions
* replace "specialized functions" with "specialized codes" to avoid confusion
* guard functions now use 0 result as success to simplify the code
---
 pep-0510.txt | 313 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 202 insertions(+), 111 deletions(-)

diff --git a/pep-0510.txt b/pep-0510.txt
index b26305977..6ae4a32e5 100644
--- a/pep-0510.txt
+++ b/pep-0510.txt
@@ -1,5 +1,5 @@
 PEP: 510
-Title: Specialized functions with guards
+Title: Specialize functions
 Version: $Revision$
 Last-Modified: $Date$
 Author: Victor Stinner
@@ -13,8 +13,9 @@ Python-Version: 3.6
 Abstract
 ========

-Add a private API to CPython to add specialized functions with guards to
-functions, to support static optimizers respecting the Python semantics.
+Add functions to the Python C API to specialize pure Python functions:
+add specialized codes with guards. It allows to implement static
+optimizers respecting the Python semantics.


 Rationale
@@ -29,15 +30,16 @@ modified at runtime.
 Implement optimizations respecting the Python semantics requires to
 detect when "something changes", we will call these checks "guards".

-This PEP proposes to add an API to add specialized functions with guards
-to a function. When the function is called, the specialized function is
-used if nothing changed, otherwise use the original bytecode.
+This PEP proposes to add a public API to the Python C API to add
+specialized codes with guards to a function. When the function is
+called, a specialized code is used if nothing changed, otherwise use the
+original bytecode.

 Even if guards help to respect most parts of the Python semantics, it's
-really hard to optimize Python without making subtle changes on the
-exact behaviour. CPython has a long history and many applications rely
-on implementation details. A compromise must be found between
-"everything is mutable" and performance.
+hard to optimize Python without making subtle changes on the exact
+behaviour. CPython has a long history and many applications rely on
+implementation details. A compromise must be found between "everything
+is mutable" and performance.

 Writing an optimizer is out of the scope of this PEP.

@@ -101,13 +103,35 @@ between CPython 3.5 and PyPy.
 2011.


-Example
-=======
+Examples
+========
+
+The following examples are not written to show powerful optimizations
+promising important speedup, but to be short and easy to understand,
+just to explain the principle.
+
+Hypothetical myoptimizer module
+-------------------------------
+
+Examples in this PEP use a hypothetical ``myoptimizer`` module which
+provides the following functions and types:
+
+* ``specialize(func, code, guards)``: add the specialized code `code`
+  with guards `guards` to the function `func`
+* ``get_specialized(func)``: get the list of specialized codes as a list
+  of ``(code, guards)`` tuples where `code` is a callable or code object
+  and `guards` is a list of guards
+* ``GuardBuiltins(name)``: guard watching for
+  ``builtins.__dict__[name]`` and ``globals()[name]``. The guard fails
+  if ``builtins.__dict__[name]`` is replaced, or if ``globals()[name]``
+  is set.
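Reviewer sketch (not part of the patch): the three ``myoptimizer`` names listed above can be approximated in pure Python to make the examples easier to follow. Everything here is an assumption made for illustration: the module-level ``_registry`` dictionary stands in for the fields the patch adds to ``PyFunctionObject``, the guard methods take simplified arguments, and ``specialize()`` mirrors the 0/1 result codes that the patch gives to ``PyFunction_Specialize()``.

```python
import builtins

# Hypothetical storage: the patch keeps this data on the function object in C.
_registry = {}


class GuardBuiltins:
    """Toy guard watching builtins.__dict__[name] and the function globals."""

    def __init__(self, name):
        self.name = name
        self.expected = builtins.__dict__[name]
        self.globals = None

    def init(self, func):
        # 0: success; 1: the guard would always fail, skip the specialization
        self.globals = func.__globals__
        return 1 if self.name in self.globals else 0

    def check(self):
        # 0: success; 2: the guard will always fail from now on
        if self.name in self.globals:
            return 2
        if builtins.__dict__.get(self.name) is not self.expected:
            return 2
        return 0


def specialize(func, code, guards):
    """Register (code, guards) for func; return 1 if it was ignored, else 0."""
    if any(guard.init(func) for guard in guards):
        return 1
    _registry.setdefault(func, []).append((code, guards))
    return 0


def get_specialized(func):
    """Return a copy of the list of (code, guards) tuples attached to func."""
    return list(_registry.get(func, []))
```

This sketch only records specializations and evaluates guards; choosing and running a specialized code at call time is left out on purpose.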
+

 Using bytecode
 --------------

-Replace ``chr(65)`` with ``"A"``::
+Add specialized bytecode where the call to the pure builtin function
+``chr(65)`` is replaced with its result ``"A"``::

     import myoptimizer

@@ -117,18 +141,21 @@ Replace ``chr(65)`` with ``"A"``::
     def fast_func():
         return "A"

-    func._specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
+    myoptimizer.specialize(func, fast_func.__code__,
+                           [myoptimizer.GuardBuiltins("chr")])
     del fast_func

+Example showing the behaviour of the guard::
+
     print("func(): %s" % func())
-    print("#specialized: %s" % len(func._get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
     print()

     import builtins
     builtins.chr = lambda obj: "mock"

     print("func(): %s" % func())
-    print("#specialized: %s" % len(func._get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))

 Output::

@@ -138,41 +165,40 @@ Output::
     func(): mock
     #specialized: 0

-The hypothetical ``myoptimizer.GuardBuiltins("len")`` is a guard on the
-builtin ``len()`` function and the ``len`` name in the global namespace.
-The guard fails if the builtin function is replaced or if a ``len`` name
-is defined in the global namespace.
+The first call uses the specialized bytecode which returns the string
+``"A"``. The second call removes the specialized code because the
+builtin ``chr()`` function was replaced, and executes the original
+bytecode calling ``chr(65)``.

-The first call returns directly the string ``"A"``. The second call
-removes the specialized function because the builtin ``chr()`` function
-was replaced, and executes the original bytecode
-
-On a microbenchmark, calling the specialized function takes 88 ns,
-whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.
+On a microbenchmark, calling the specialized bytecode takes 88 ns,
+whereas the original function takes 145 ns (+57 ns): 1.6 times as fast.


 Using builtin function
 ----------------------

-Replace a slow Python function calling ``chr(obj)`` with a direct call
-to the builtin ``chr()`` function::
+Add the C builtin ``chr()`` function as the specialized code instead of
+a bytecode calling ``chr(obj)``::

     import myoptimizer

     def func(arg):
         return chr(arg)

-    func._specialize(chr, [myoptimizer.GuardBuiltins("chr")])
+    myoptimizer.specialize(func, chr,
+                           [myoptimizer.GuardBuiltins("chr")])
+
+Example showing the behaviour of the guard::

     print("func(65): %s" % func(65))
-    print("#specialized: %s" % len(func._get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
     print()

     import builtins
     builtins.chr = lambda obj: "mock"

     print("func(65): %s" % func(65))
-    print("#specialized: %s" % len(func.get_specialized()))
+    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))

 Output::

@@ -182,88 +208,165 @@ Output::
     func(): mock
     #specialized: 0

-The first call returns directly the builtin ``chr()`` function (without
-creating a Python frame). The second call removes the specialized
-function because the builtin ``chr()`` function was replaced, and
-executes the original bytecode.
+The first call calls the C builtin ``chr()`` function (without creating
+a Python frame). The second call removes the specialized code because
+the builtin ``chr()`` function was replaced, and executes the original
+bytecode.

-On a microbenchmark, calling the specialized function takes 95 ns,
-whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast.
-Calling directly ``chr(65)`` takes 76 ns.
+On a microbenchmark, calling the C builtin takes 95 ns, whereas the
+original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling
+directly ``chr(65)`` takes 76 ns.


-Python Function Call
-====================
+Choose the specialized code
+===========================

-Pseudo-code to call a Python function having specialized functions with
-guards::
+Pseudo-code to choose the specialized code to call a pure Python
+function::

-    def call_func(func, *args, **kwargs):
-        # by default, call the regular bytecode
-        code = func.__code__.co_code
-        specialized = func.get_specialized()
+    def call_func(func, args, kwargs):
+        specialized = myoptimizer.get_specialized(func)
         nspecialized = len(specialized)
-
         index = 0
         while index < nspecialized:
-            guard = specialized[index].guard
-            # pass arguments, some guards need them
-            check = guard(args, kwargs)
-            if check == 1:
-                # guard succeeded: we can use the specialized function
-                code = specialized[index].code
-                break
-            elif check == -1:
-                # guard will always fail: remove the specialized function
-                del specialized[index]
-            elif check == 0:
-                # guard failed temporarely
-                index += 1
+            specialized_code, guards = specialized[index]
+
+            for guard in guards:
+                check = guard(args, kwargs)
+                if check:
+                    break
+
+            if not check:
+                # all guards succeeded:
+                # use the specialized code
+                return specialized_code
+            elif check == 1:
+                # a guard failed temporarily:
+                # try the next specialized code
+                index += 1
+            else:
+                assert check == 2
+                # a guard will always fail:
+                # remove the specialized code
+                del specialized[index]
+
+        # if a guard of each specialized code failed, or if the function
+        # has no specialized code, use original bytecode
+        code = func.__code__

-        # code can be a code object or any callable object
-        execute_code(code, args, kwargs)


 Changes
 =======

-* Add two new private methods to functions:
+Changes to the Python C API:

-  * ``_specialize(code, guards: list)``: add specialized
-    function with guard. `code` is a code object (ex:
-    ``func2.__code__``) or any callable object (ex: the builtin
-    ``len()`` function). The specialization can be ignored if a guard
-    already fails or for other reasons (ex: the implementation of Python
-    does not implement this feature). Return ``False`` is the
-    specialized function was ignored, return ``True`` otherwise.
+* Add a ``PyFuncGuardObject`` object and a ``PyFuncGuard_Type`` type
+* Add a ``PySpecializedCode`` structure
+* Add the following fields to the ``PyFunctionObject`` structure::

-  * ``_get_specialized()``: get the list of specialized functions with
-    guards. Return a list of ``(func, guards)`` tuples where func is the
-    specialized function and guards is a list of guards. Return an empty
-    list if the function was never specialized.
+    Py_ssize_t nb_specialized;
+    PyObject *specialized;  /* array of PySpecializedCode objects */

-* Add a private ``PyFuncGuard`` Python type. It requires to implement a
-  C ``check()`` function, with an optional C ``init()`` function. API:
+* Add function methods:

-  * ``int init(PyObject *guard, PyObject *func)``: initialize a guard,
-    *func* is the function to which the specialized function will be
-    attached. Result:
+  * ``PyFunction_Specialize()``
+  * ``PyFunction_GetSpecializedCodes()``
+  * ``PyFunction_GetSpecializedCode()``

-    * return ``1`` on success
-    * return ``0`` if the guard will always fail (the specialization must be
-      ignored)
-    * raise an exception and return ``-1`` on error
+None of these functions and types are exposed at the Python level.

-  * ``int check(PyObject *guard, PyObject **stack, int na, int nk)``:
-    check the guard. Result:
+All these additions are explicitly excluded from the stable ABI.

-    * return 2 on success
-    * return 1 if the guard failed temporarely
-    * return 0 if the guard will always fail
-    * raise an exception and return -1 on error
+When a function code is replaced (``func.__code__ = new_code``), all
+specialized codes and guards are removed.

-  * A guard can be called in Python with parameters, it returns the
-    result of the guard check.
+When a function is serialized by ``pickle``, specialized codes and guards are
+ignored (not serialized). Specialized codes and guards are not stored in
+``.pyc`` files but created and registered at runtime, when a module is
+loaded.
+
+
+Function guard
+--------------
+
+Add a function guard object::
+
+    typedef struct {
+        PyObject ob_base;
+        int (*init) (PyObject *guard, PyObject *func);
+        int (*check) (PyObject *guard, PyObject **stack, int na, int nk);
+    } PyFuncGuardObject;
+
+The ``init()`` function initializes a guard:
+
+* Return ``0`` on success
+* Return ``1`` if the guard will always fail: ``PyFunction_Specialize()``
+  must ignore the specialized code
+* Raise an exception and return ``-1`` on error
+
+
+The ``check()`` function checks a guard:
+
+* Return ``0`` on success
+* Return ``1`` if the guard failed temporarily
+* Return ``2`` if the guard will always fail: the specialized code must
+  be removed
+* Raise an exception and return ``-1`` on error
+
+*stack* is an array of arguments: indexed arguments followed by (*key*,
+*value*) pairs of keyword arguments. *na* is the number of indexed
+arguments. *nk* is the number of keyword arguments: the number of (*key*,
+*value*) pairs. `stack` contains ``na + nk * 2`` objects.
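Reviewer sketch (not part of the patch): the ``check()`` result codes and the *stack* layout described above can be exercised in pure Python. This is a minimal model under stated assumptions, not the C implementation: ``build_stack()`` and ``choose_code()`` are hypothetical helper names, guards are modelled as plain callables taking ``(stack, na, nk)``, and the ``-1`` error case is represented by raising an exception.

```python
def build_stack(args, kwargs):
    """Flatten a call into positional arguments followed by (key, value) pairs."""
    stack = list(args)
    for key, value in kwargs.items():
        stack.extend((key, value))
    return stack, len(args), len(kwargs)


def choose_code(specialized, original_code, stack, na, nk):
    """Return the first specialized code whose guards all return 0."""
    index = 0
    # len() is re-evaluated on each pass because entries may be deleted
    while index < len(specialized):
        code, guards = specialized[index]
        check = 0  # an empty guard list counts as success
        for guard in guards:
            check = guard(stack, na, nk)
            if check:
                break
        if check == 0:
            return code
        if check == 1:
            # temporary failure: keep the entry, try the next one
            index += 1
        elif check == 2:
            # permanent failure: drop the entry, the next one shifts into place
            del specialized[index]
        else:
            raise RuntimeError("unexpected guard result: %r" % (check,))
    # every candidate failed, or there was none: fall back to the original
    return original_code
```

For example, ``build_stack((1, 2), {"sep": "-"})`` gives ``[1, 2, "sep", "-"]`` with ``na == 2`` and ``nk == 1``, which matches the ``na + nk * 2`` size stated above.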
+
+
+Specialized code
+----------------
+
+Add a specialized code structure::
+
+    typedef struct {
+        PyObject *code;        /* callable or code object */
+        Py_ssize_t nb_guard;
+        PyObject **guards;     /* PyFuncGuardObject objects */
+    } PySpecializedCode;
+
+
+Function methods
+----------------
+
+Add a function method to specialize the function, add a specialized code
+with guards::
+
+    int PyFunction_Specialize(PyObject *func,
+                              PyObject *code, PyObject *guards)
+
+Result:
+
+* Return ``0`` on success
+* Return ``1`` if the specialization has been ignored
+* Raise an exception and return ``-1`` on error
+
+Add a function method to get the list of specialized codes::
+
+    PyObject* PyFunction_GetSpecializedCodes(PyObject *func)
+
+Return a list of (*code*, *guards*) tuples where *code* is a callable or
+code object and *guards* is a list of ``PyFuncGuard`` objects. Raise an
+exception and return ``NULL`` on error.
+
+Add a function method to get the specialized code::
+
+    PyObject* PyFunction_GetSpecializedCode(PyObject *func,
+                                            PyObject **stack,
+                                            int na, int nk)
+
+See ``check()`` function of guards for *stack*, *na* and *nk* arguments.
+Return a callable or a code object on success. Raise an exception and
+return ``NULL`` on error.
+
+Benchmark
+---------

 Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
 of 3 runs):
@@ -275,30 +378,18 @@ According to this microbenchmark, the changes has no overhead on calling
 a Python function without specialization.


-Behaviour
-=========
-
-When a function code is replaced (``func.__code__ = new_code``), all
-specialized functions are removed.
-
-When a function is serialized ``pickle``, specialized functions and
-guards are ignored (not serialized). Specialized functions and guards
-are not stored in ``.pyc`` files but created and registered at runtime,
-when a module is loaded.
-
-
 Other implementations of Python
 ===============================

-This PEP is designed to be implemented in C for CPython.
+This PEP only contains changes to the Python C API, the Python API is
+unchanged. Other implementations of Python are free to not implement new
+additions, or implement added functions as no-op:

-Other implementations of Python are free to not implement added private
-function methods.
-
-Or they can implement a ``_specialize()`` method which always ignores
-the specialized function (in short, do nothing and always return
-``False``) and a ``_get_specialized()`` method which always returns an
-empty list.
+* ``PyFunction_Specialize()``: always return ``1`` (the specialization
+  has been ignored)
+* ``PyFunction_GetSpecializedCodes()``: always return an empty list
+* ``PyFunction_GetSpecializedCode()``: return the function code object,
+  as the existing ``PyFunction_GET_CODE()`` macro


 Discussion