PEP: 510
Title: Specialized functions with guards
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add an API to attach specialized versions of a function, protected by
guards, to that function, in order to support static optimizers
respecting the Python semantics.


Rationale
=========

Python semantics
----------------

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations that respect the Python
semantics requires detecting when "something changes"; we will call these
checks "guards".

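As a small illustration of this mutability, the following sketch uses
only existing Python features (no new API). It replaces a builtin and
then the code of a function at runtime; the function names ``func`` and
``other`` are chosen for the example only.

```python
import builtins

def func():
    # ``chr`` is looked up at call time: globals first, then builtins.
    return chr(65)

def other():
    return "replaced"

results = [func()]                   # regular behaviour

original_chr = builtins.chr
builtins.chr = lambda obj: "mock"    # builtins can be replaced at runtime
try:
    results.append(func())           # func() now sees the replacement
finally:
    builtins.chr = original_chr      # restore the real builtin

func.__code__ = other.__code__       # function code can be replaced too
results.append(func())

print(results)  # ['A', 'mock', 'replaced']
```

An optimizer which had replaced ``chr(65)`` with ``"A"`` ahead of time
would return a wrong result after the second step unless it checks that
``chr`` is unchanged.
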
This PEP proposes to add a ``specialize()`` method to functions to add
specialized functions with guards. When the function is called, the
specialized function is used if nothing changed; otherwise, the
original bytecode is used.

Writing an optimizer is out of the scope of this PEP.


Why not a JIT compiler?
-----------------------

There are multiple JIT compilers for Python actively developed:

* `PyPy <http://pypy.org/>`_
* `Pyston <https://github.com/dropbox/pyston>`_
* `Numba <http://numba.pydata.org/>`_
* `Pyjion <https://github.com/microsoft/pyjion>`_

Numba is specific to numerical computation. Pyston and Pyjion are still
young. PyPy is the most complete Python interpreter: it is much faster
than CPython and has very good compatibility with CPython (it respects
the Python semantics). There are still issues with Python JIT compilers
which prevent them from being widely used instead of CPython.

Many popular libraries like numpy, PyGTK, PyQt, PySide and wxPython are
implemented in C or C++ and use the Python C API. To have a small memory
footprint and better performance, Python JIT compilers do not use
reference counting (so that they can use a faster garbage collector), do
not use the C structures of CPython objects and manage memory allocations
differently. PyPy has a ``cpyext`` module which emulates the Python C
API, but it has worse performance than CPython and does not support the
full Python C API.

New features are first developed in CPython. In January 2016, the
latest CPython stable version is 3.5, whereas PyPy only supports Python
2.7 and 3.2, and Pyston only supports Python 2.7.

Even if PyPy has very good compatibility with Python, some modules are
still not compatible with PyPy: see `PyPy Compatibility Wiki
<https://bitbucket.org/pypy/compatibility/wiki/Home>`_. The incomplete
support of the Python C API is part of this problem. There are also
subtle differences between PyPy and CPython, like reference counting:
object destructors are always called in PyPy, but can be called "later"
than in CPython. Using context managers helps to control when resources
are released.

Even if PyPy is much faster than CPython in a wide range of benchmarks,
some users still report worse performance than CPython on some specific
use cases, or unstable performance.

When Python is used as a scripting language for programs running less
than 1 minute, JIT compilers can be slower because their startup time is
higher and the JIT compiler takes time to optimize the code. For
example, most Mercurial commands take a few seconds.

Numba now supports ahead-of-time compilation, but it requires a
decorator to specify argument types and it only supports numerical
types.

CPython 3.5 has almost no optimization: the peephole optimizer only
implements basic optimizations. A static compiler is a compromise
between CPython 3.5 and PyPy.

.. note::
   There was also the Unladen Swallow project, but it was abandoned in
   2011.


Example
=======

Using bytecode
--------------

Replace ``chr(65)`` with ``"A"``::

    import myoptimizer

    def func():
        return chr(65)

    def fast_func():
        return "A"

    func.specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
    del fast_func

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))

Output::

    func(): A
    #specialized: 1

    func(): mock
    #specialized: 0

The hypothetical ``myoptimizer.GuardBuiltins("chr")`` is a guard on the
builtin ``chr()`` function and the ``chr`` name in the global namespace.
The guard fails if the builtin function is replaced or if a ``chr`` name
is defined in the global namespace.

The first call directly returns the string ``"A"``. The second call
removes the specialized function because the builtin ``chr()`` function
was replaced, and executes the original bytecode.

On a microbenchmark, calling the specialized function takes 88 ns,
whereas the original bytecode takes 145 ns (+57 ns): the specialized
function is 1.6 times as fast.


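The two failure conditions of such a guard can be sketched in pure
Python. This is only an illustration under assumptions: the class below
is a hypothetical stand-in, not the ``myoptimizer`` implementation, and
its constructor takes the namespace to watch explicitly (a plain dict
standing in for the globals of the guarded function).

```python
import builtins

class GuardBuiltins:
    """Hypothetical pure-Python stand-in for myoptimizer.GuardBuiltins."""

    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace              # globals to watch
        self.expected = getattr(builtins, name) # object assumed by the optimizer

    def check(self):
        # Fail if a global with the same name shadows the builtin...
        if self.name in self.namespace:
            return False
        # ...or if the builtin itself was replaced.
        return getattr(builtins, self.name, None) is self.expected

namespace = {}                                  # stands in for func.__globals__
guard = GuardBuiltins("chr", namespace)
ok_initially = guard.check()

namespace["chr"] = lambda obj: "shadow"         # define a global named chr
ok_shadowed = guard.check()
del namespace["chr"]

original = builtins.chr
builtins.chr = lambda obj: "mock"               # replace the builtin
try:
    ok_replaced = guard.check()
finally:
    builtins.chr = original                     # restore the real builtin

print(ok_initially, ok_shadowed, ok_replaced)   # True False False
```
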
Using builtin function
----------------------

Replace a slow Python function calling ``chr(obj)`` with a direct call
to the builtin ``chr()`` function::

    import myoptimizer

    def func(arg):
        return chr(arg)

    func.specialize(chr, [myoptimizer.GuardBuiltins("chr")])

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))

Output::

    func(65): A
    #specialized: 1

    func(65): mock
    #specialized: 0

The first call directly calls the builtin ``chr()`` function (without
creating a Python frame). The second call removes the specialized
function because the builtin ``chr()`` function was replaced, and
executes the original bytecode.

On a microbenchmark, calling the specialized function takes 95 ns,
whereas the original bytecode takes 155 ns (+60 ns): the specialized
function is 1.6 times as fast. Calling ``chr(65)`` directly takes 76 ns.


Python Function Call
====================

Pseudo-code to call a Python function having specialized functions with
guards::

    def call_func(func, *args, **kwargs):
        # by default, call the regular bytecode
        code = func.__code__
        specialized = func.get_specialized()
        nspecialized = len(specialized)

        index = 0
        while index < nspecialized:
            guard = specialized[index].guard
            # pass arguments, some guards need them
            check = guard(args, kwargs)
            if check == 1:
                # guard succeeded: we can use the specialized function
                code = specialized[index].code
                break
            elif check == -1:
                # guard will always fail: remove the specialized function
                del specialized[index]
                # the list is now one item shorter
                nspecialized -= 1
            elif check == 0:
                # guard failed temporarily
                index += 1

        # code can be a code object or any callable object
        return execute_code(code, args, kwargs)


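The dispatch logic of the pseudo-code can be tried out today with a
pure-Python emulation. The sketch below rests on assumptions: the
``Specializable`` wrapper and the ``guard_chr()`` helper are
hypothetical names invented for this example, guards are plain
callables taking ``(args, kwargs)``, and a specialization is only used
when all of its guards return 1.

```python
import builtins

class Specializable:
    """Hypothetical wrapper emulating func.specialize() in pure Python."""

    def __init__(self, func):
        self.func = func
        self._specialized = []                # list of (callable, guards)

    def specialize(self, target, guards):
        self._specialized.append((target, list(guards)))

    def get_specialized(self):
        return self._specialized

    def __call__(self, *args, **kwargs):
        index = 0
        while index < len(self._specialized):
            target, guards = self._specialized[index]
            # Worst result wins: -1 (always fails) < 0 (failed now) < 1 (ok).
            check = min((g(args, kwargs) for g in guards), default=1)
            if check == 1:
                return target(*args, **kwargs)
            if check == -1:
                del self._specialized[index]  # never valid again: drop it
            else:
                index += 1                    # failed temporarily: try next
        return self.func(*args, **kwargs)     # fall back to the original


def guard_chr(original=builtins.chr):
    """Guard factory: 1 while builtins.chr is unchanged, else -1."""
    return lambda args, kwargs: 1 if builtins.chr is original else -1


@Specializable
def func():
    return chr(65)

func.specialize(lambda: "A", [guard_chr()])
first = (func(), len(func.get_specialized()))

real_chr = builtins.chr
builtins.chr = lambda obj: "mock"
try:
    second = (func(), len(func.get_specialized()))
finally:
    builtins.chr = real_chr                   # restore the real builtin

print(first)   # ('A', 1)
print(second)  # ('mock', 0)
```

Unlike the proposed change, this emulation adds the cost of a Python
wrapper call, so it only demonstrates the behaviour, not the speedup.
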
Changes
=======

* Add two new methods to functions:

  - ``specialize(code, guards: list)``: add a specialized
    function with guards. ``code`` is a code object (e.g.
    ``func2.__code__``) or any callable object (e.g. ``len``).
    The specialization can be ignored if a guard already fails.
  - ``get_specialized()``: get the list of specialized functions with
    guards

* Base ``Guard`` type which can be used as parent type to implement
  guards. Subtypes must implement a ``check()`` function and may
  implement an optional ``first_check()`` function. API:

  * ``int first_check(PyObject *guard, PyObject *func)``: return 0 on
    success, -1 if the guard will always fail
  * ``int check(PyObject *guard, PyObject **stack, int na, int nk)``:
    return 1 on success, 0 if the guard failed temporarily, -1 if the
    guard will always fail

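The return conventions of the two functions can be mirrored in Python
for illustration. The sketch below is an assumption-laden analogue of
the C API, not part of the proposal: the Python-level ``Guard`` class,
its method signatures and the ``GuardArgType`` example guard are all
hypothetical.

```python
class Guard:
    """Hypothetical Python mirror of the proposed base Guard type."""

    def first_check(self, func):
        # 0: the guard can be installed; -1: it will always fail.
        return 0

    def check(self, args, kwargs):
        # 1: success; 0: failed temporarily; -1: will always fail.
        raise NotImplementedError


class GuardArgType(Guard):
    """Example guard: the first positional argument must have a given type."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def first_check(self, func):
        # A function without positional parameters can never satisfy this guard.
        return 0 if func.__code__.co_argcount >= 1 else -1

    def check(self, args, kwargs):
        if args and type(args[0]) is self.expected_type:
            return 1
        return 0      # a later call may pass a matching argument


def takes_one(x):
    return x

def takes_none():
    return None

guard = GuardArgType(int)
print(guard.first_check(takes_one))   # 0
print(guard.first_check(takes_none))  # -1
print(guard.check((3,), {}))          # 1
print(guard.check(("3",), {}))        # 0
```

A guard which returns 0 keeps its specialization installed for later
calls, whereas -1 tells the caller that the specialization can be
removed.
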
Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
of 3 runs):

* Original Python: 79 ns
* Patched Python: 79 ns

According to this microbenchmark, the changes have no overhead on calling
a Python function without specialization.


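The same measurement can be scripted with the standard ``timeit``
module, which is convenient for comparing two interpreter builds. This
is a minimal sketch; the iteration count is an arbitrary choice and the
timings depend on the machine.

```python
import timeit

# Same statement as the command line above: time a call to an empty function.
timer = timeit.Timer("f()", setup="def f(): pass")

number = 100_000
runs = timer.repeat(repeat=3, number=number)  # three runs, keep the best one
best_ns = min(runs) / number * 1e9

print("best of 3: %.0f ns per call" % best_ns)
```
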
Behaviour
=========

When the code of a function is replaced (``func.__code__ = new_code``),
all specialized functions are removed.

When a function is serialized (by ``marshal`` or ``pickle`` for
example), specialized functions and guards are ignored (not serialized).


Discussion
==========

Thread on the python-ideas mailing list: `RFC: PEP: Specialized
functions with guards
<https://mail.python.org/pipermail/python-ideas/2016-January/037703.html>`_.


Copyright
=========

This document has been placed in the public domain.