slower, more general ``tp_call`` calling convention.
This PEP proposes that the calling convention used internally for Python and builtin functions is generalized and published
so that all calls can benefit from better performance.
Improved Performance
--------------------
The current ``tp_call`` calling convention requires creation of a tuple and, if there are any named arguments, a dictionary for every call.
This is expensive. The proposed calling convention removes the need to create almost all of these temporary objects.
Another source of inefficiency in the ``tp_call`` convention is that it has one function pointer per-class, rather than per-object. This is inefficient for calls to classes as several intermediate objects need to be created. For a user defined class, ``UserClass``, at least one intermediate object is created for each call in the sequence ``type.__call__``, ``object.__new__``, ``UserClass.__init__``.
The new proposed calling convention is not fully general, but covers the large majority of calls.
It is designed to remove the overhead of temporary object creation and multiple indirections.
Specification
=============
The function pointer type
-------------------------
Calls are made through a function pointer taking the following parameters:
*``PyObject *callable``: The called object
*``Py_ssize_t n``: The number of arguments plus an optional offset flag for the first argument in vector.
*``PyObject **args``: A vector of arguments
*``PyTupleObject *kwnames``: A tuple of the names of the named arguments.
This is implemented by the function pointer type:
``typedef PyObject *(*vectorcall)(PyObject *callable, Py_ssize_t n, PyObject** args, PyTupleObject *kwnames);``
A new slot ``tp_vectorcall`` is added so that classes can support the vectorcall calling convention.
It has the type ``vectorcall``.
The ``tp_print`` slot is reused as the ``tp_vectorcall_offset`` slot to make it easier for for external projects to backport the vectorcall protocol to earlier Python versions.
In particular, the Cython project has shown interest in doing that (see https://mail.python.org/pipermail/python-dev/2018-June/153927.html).
This is used by the interpreter to avoid creating temporary objects when calling methods.
If this flag is set for a class ``F``, then instances of that class are expected to behave the same as a Python function when used as a class attribute.
This flag is necessary if custom callables are to be able to behave like Python functions *and* be called as efficiently as Python or built-in functions.
The call
--------
The call takes the form ``((vectorcall)(((char *)o)+offset))(o, n, args, kwnames)`` where
When a caller sets the ``PY_VECTORCALL_ARGUMENTS_OFFSET`` it is indicating that the ``args`` pointer points to item 1 (counting from 0) of the allocated array
and that the contents of the allocated array can be safely mutated by the callee. The callee still needs to make sure that the reference counts of any objects
in the array remain correct.
Example of how ``PY_VECTORCALL_ARGUMENTS_OFFSET`` is used by a callee is safely used to avoid allocation [3]_
If callables could be sub-classed then any call to a ``function`` or a ``method_descriptor`` would need an additional check that the ``__call__`` method had not been overridden. By exposing an additional call mechanism, the potential for errors becomes greater. As a consequence, any third-party class implementing the new call interface will not be usable as a base class.
A new ``METH_VECTORCALL`` flag is added for specifying ``PyMethodDef`` structs. It is equivalent to the currently undocumented ``METH_FASTCALL | METH_KEYWORD`` flags.
The new flag specifies that the function has the type ``PyObject *(*call) (PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwname)``
To enable call performance on a par with Python functions and built-in functions, third-party callables should include a ``vectorcall`` function pointer
and set ``tp_vectorcall_offset`` to the correct value.
Any class that sets ``tp_vectorcall_offset`` to non-zero should also implement the ``tp_call`` function and make sure its behaviour is consistent with the ``vectorcall`` function.