python-peps/pep-0590.rst

259 lines
13 KiB
ReStructuredText

PEP: 590
Title: Vectorcall: A new calling convention for CPython
Author: Mark Shannon <mark@hotpy.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Mar-2019
Python-Version: 3.8
Post-History:
Abstract
========
This PEP introduces a new calling convention [1]_ for use by CPython and other software and tools in the CPython ecosystem.
The new calling convention is a formalisation and extension of "fastcall", a calling convention already used internally by CPython.
Rationale
=========
The choice of a calling convention impacts the performance and flexibility of code on either side of the call.
Often there is tension between performance and flexibility.
The current ``tp_call`` [2]_ calling convention is sufficiently flexible to cover all cases, but its performance is poor.
The poor performance is largely a result of having to create intermediate tuples, and possibly intermediate dicts, during the call.
This is mitigated in CPython by including special-case code to speed up calls to Python and builtin functions.
Unfortunately this means that other callables such as classes and third party extension objects are called using the
slower, more general ``tp_call`` calling convention.
This PEP proposes that the calling convention used internally for Python and builtin functions is generalized and published
so that all calls can benefit from better performance.
Improved Performance
--------------------
The current ``tp_call`` calling convention requires creation of a tuple and, if there are any named arguments, a dictionary for every call.
This is expensive. The proposed calling convention removes the need to create almost all of these temporary objects.
Another source of inefficiency in the ``tp_call`` convention is that it has one function pointer per-class, rather than per-object. This is inefficient for calls to classes as several intermediate objects need to be created. For a user defined class, ``UserClass``, at least one intermediate object is created for each call in the sequence ``type.__call__``, ``object.__new__``, ``UserClass.__init__``.
The new proposed calling convention is not fully general, but covers the large majority of calls.
It is designed to remove the overhead of temporary object creation and multiple indirections.
Specification
=============
The function pointer type
-------------------------
Calls are made through a function pointer taking the following parameters:
* ``PyObject *callable``: The called object
* ``Py_ssize_t n``: The number of arguments plus an optional offset flag for the first argument in vector.
* ``PyObject **args``: A vector of arguments
* ``PyTupleObject *kwnames``: A tuple of the names of the named arguments.
This is implemented by the function pointer type:
``typedef PyObject *(*vectorcall)(PyObject *callable, Py_ssize_t n, PyObject** args, PyTupleObject *kwnames);``
Changes to the ``PyTypeObject``
-------------------------------
The a new slot called ``tp_vectorcall_offset`` is added. It has the type ``uint32_t``.
A new flag is added, ``Py_TPFLAGS_HAVE_VECTORCALL``, which is set for any new PyTypeObjects that include the
``tp_vectorcall_offset`` member.
If ``Py_TPFLAGS_HAVE_VECTORCALL`` is set then ``tp_vectorcall_offset`` is the offset
into the object of the ``vectorcall`` function-pointer.
The unused slot ``printfunc tp_print`` is replaced with ``vectorcall tp_vectorcall``, so that classes
can support the vectorcall calling convention.
Additional flags
----------------
One additional flag is specified: ``Py_TPFLAGS_METHOD_DESCRIPTOR``.
``Py_TPFLAGS_METHOD_DESCRIPTOR`` should be set if the the callable uses the descriptor protocol to create a method or method-like object.
This is used by the interpreter to avoid creating temporary objects when calling methods.
If this flag is set for a class ``F``, then instances of that class are expected to behave the same as a Python function when used as a class attribute.
Specifically, this means that the value of ``c.m`` where ``C.m`` is an instance of the class ``F`` (and ``c`` is an instance of ``C``)
must be an object that acts like a bound method binding ``C.m`` and ``c``.
This flag is necessary if custom callables are to be able to behave like Python functions *and* be called as efficiently as Python or built-in functions.
The call
--------
The call takes the form ``((vectorcall)(((char *)o)+offset))(o, n, args, kwnames)`` where
``offset`` is ``Py_TYPE(o)->tp_vectorcall_offset``.
The caller is responsible for creating the ``kwnames`` tuple and ensuring that there are no duplicates in it.
``n`` is the number of postional arguments plus ``PY_VECTORCALL_ARGUMENTS_OFFSET`` if the argument vector pointer points to argument 1 in the
allocated vector and the callee is allowed to mutate the contents of the ``args`` vector.
``n = number_postional_args | (offset ? PY_VECTORCALL_ARGUMENTS_OFFSET: 0))``.
PY_VECTORCALL_ARGUMENTS_OFFSET
------------------------------
When a caller sets the ``PY_VECTORCALL_ARGUMENTS_OFFSET`` it is indicating that the ``args`` pointer points to item 1 (counting from 0) of the allocated array
and that the contents of the allocated array can be safely mutated by the callee. The callee still needs to make sure that the reference counts of any objects
in the array remain correct.
Example of how ``PY_VECTORCALL_ARGUMENTS_OFFSET`` is used by a callee is safely used to avoid allocation [3]_
Whenever they can do so cheaply (without allocation) callers are encouraged to offset the arguments.
Doing so will allow callables such as bound methods to make their onward calls cheaply.
The interpreter already allocates space on the stack for the callable, so it can offset its arguments for no additional cost.
Continued prohibition of callable classes as base classes
---------------------------------------------------------
Currently any attempt to use ``function``, ``method`` or ``method_descriptor`` as a base class for a new class will fail with a ``TypeError``.
This behaviour is desirable as it prevents errors when a subclass overrides the ``__call__`` method.
If callables could be sub-classed then any call to a ``function`` or a ``method_descriptor`` would need an additional check that the ``__call__`` method had not been overridden. By exposing an additional call mechanism, the potential for errors becomes greater. As a consequence, any third-party class implementing the new call interface will not be usable as a base class.
New C API and changes to CPython
================================
``PyObject *PyObject_VectorCall(PyObject *obj, PyObject **args, Py_ssize_t nargs, PyTupleObject *kwnames)``
Calls ``obj`` with the given arguments. Note that ``nargs`` may include the flag ``PY_VECTORCALL_ARGUMENTS_OFFSET``.
``nargs & ~PY_VECTORCALL_ARGUMENTS_OFFSET`` is the number of positional arguments.
``PyObject_VectorCall`` raises an exception if ``obj`` is not callable.
Two utility functions are provided to call the new calling convention from the old one, or vice-versa.
These functions are ``PyObject *PyCall_MakeVectorCall(PyObject *obj, PyObject *tuple, PyObject **dict)`` and
``PyObject *PyCall_MakeTpCall(PyObject *obj, PyObject **args, Py_ssize_t nargs, PyTupleObject *kwnames)``, respectively.
Both functions raise an exception if ``obj`` does not support the relevant protocol.
``METH_FASTCALL`` and ``METH_VECTORCALL`` flags
-----------------------------------------------
A new ``METH_VECTORCALL`` flag is added for specifying ``PyMethodDef`` structs. It is equivalent to the currently undocumented ``METH_FASTCALL | METH_KEYWORD`` flags.
The new flag specifies that the function has the type ``PyObject *(*call) (PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwname)``
Internal CPython changes
========================
In order to conform to the specification, the only changes required are:
* To use the new calling convention in the interpreter.
* An implementation of the ``PyObject_VectorCall`` function.
* An implementation of the ``PyCall_MakeVectorCall`` and ``PyCall_MakeTpCall`` convenience functions.
To gain the promised performance advantage, the following classes will need to implement the new calling convention:
* Python functions
* Builtin functions and methods
* Bound methods
* Method descriptors
* A few of the most commonly used classes, probably ``range``, ``list``, ``str``, and ``type``.
Changes to existing C structs
-----------------------------
The ``function``, ``builtin_function_or_method``, ``method_descriptor`` and ``method`` classes will have their corresponding structs changed to
include a ``vectorcall`` pointer.
Third-party built-in classes using the new extended call interface
------------------------------------------------------------------
To enable call performance on a par with Python functions and built-in functions, third-party callables should include a ``vectorcall`` function pointer
and set ``tp_vectorcall_offset`` to the correct value.
Any class that sets ``tp_vectorcall_offset`` to non-zero should also implement the ``tp_call`` function and make sure its behaviour is consistent with the ``vectorcall`` function.
Setting ``tp_call`` to ``PyCall_MakeVectorCall`` will suffice.
The ``PyMethodDef`` protocol and Argument Clinic
================================================
Argument Clinic [4]_ automatically generates wrapper functions around lower-level callables, providing safe unboxing of primitive types and
other safety checks.
Argument Clinic could be extended to generate wrapper objects conforming to the new ``vectorcall`` protocol.
This will allow execution to flow from the caller to the Argument Clinic generated wrapper and
thence to the hand-written code with only a single indirection.
Performance implications of these changes
=========================================
Initial experiments, implementing the new calling convention for Python functions, builtin functions and method-descriptors showed a
speedup of around 2%. A full implementation involving other callables and adding support for the new calling convention to argument
clinic would, in the author's estimation, yield a speedup of between 3% and 4% for the standard benchmark suite.
Alternative Suggestions
=======================
PEP 576 and PEP 580
-------------------
Both PEP 576 and PEP 580 are designed to enable 3rd party objects to be both expressive and performant (on a par with
CPython objects). The purpose of this PEP is provide a uniform way to call objects in the CPython ecosystem that is
both expressive and as performant as possible.
This PEP is broader in scope than PEP 576 and uses variable rather than fixed offset function-pointers.
The underlying calling convention is similar. Because PEP 576 only allows a fixed offset for the function pointer,
it would not allow the improvements to any objects with constraints on their layout.
PEP 580 proposes a major change to the ``PyMethodDef`` protocol used to define builtin functions.
This PEP provides a more general and simpler mechanism in the form of a new calling convention.
This PEP also extends the ``PyMethodDef`` protocol, but merely to formalise existing conventions.
Other rejected approaches
-------------------------
A longer, 6 argument, form combining both the vector and optional tuple and dictionary arguments was considered.
However, it was found that the code to convert between it and the old ``tp_call`` form was overly cumbersome and inefficient.
Also, since only 4 arguments are passed in registers on x64 Windows, the two extra arguments would have non-neglible costs.
Removing any special cases and making all calls use the ``tp_call`` form was also considered.
However, unless a much more efficient way was found to create and destroy tuples, and to a lesser extent dictionaries,
then it would be too slow.
Acknowledgements
================
Victor Stinner for developing the original "vector call" calling convention internally to CPython (where is it is called "fast call")
this PEP codifies and extends his work.
Jeroen Demeyer for authoring PEP 575 and PEP 580 which helped motivate this PEP.
References
==========
.. [1] Calling conventions
https://en.wikipedia.org/wiki/Calling_convention
.. [2] tp_call/PyObject_Call calling convention
https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_call
.. [3] Using PY_VECTORCALL_ARGUMENTS_OFFSET in callee
https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L53
.. [4] Argument Clinic
https://docs.python.org/3/howto/clinic.html
.. [5] PEP 576
https://www.python.org/dev/peps/pep-0576/
.. [6] PEP 580
https://www.python.org/dev/peps/pep-0580/
Reference implementation
========================
A minimal implementation can be found at https://github.com/markshannon/cpython/tree/vectorcall-minimal
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: