PEP: 575
Title: Unifying function/method classes
Author: Jeroen Demeyer
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Mar-2018
Python-Version: 3.8
Post-History: 31-Mar-2018, 12-Apr-2018, 27-Apr-2018, 5-May-2018


Abstract
========

Reorganize the class hierarchy for functions and methods
with the goal of reducing the difference between
built-in functions (implemented in C) and Python functions.
Mainly, make built-in functions behave more like Python functions
without sacrificing performance.

A new base class ``base_function`` is introduced and the various function
classes, as well as ``method`` (renamed to ``bound_method``), inherit from it.

We also allow subclassing the Python ``function`` class.


Motivation
==========

Currently, CPython has two different function classes:
the first is Python functions, which is what you get
when defining a function with ``def`` or ``lambda``.
The second is built-in functions such as ``len``, ``isinstance`` or ``numpy.dot``.
These are implemented in C.

These two classes are implemented completely independently and have different functionality.
In particular, it is currently not possible to implement a function efficiently in C
(only built-in functions can do that)
while still allowing introspection like ``inspect.signature`` or ``inspect.getsourcefile``
(only Python functions can do that).
This is a problem for projects like Cython [#cython]_ that want to do exactly that.

In Cython, this was worked around by inventing a new function class called ``cyfunction``.
Unfortunately, a new function class creates problems:
the ``inspect`` module does not recognize such functions as being functions [#bpo30071]_
and the performance is worse
(CPython has specific optimizations for calling built-in functions).

A second motivation is more generally making built-in functions and methods
behave more like Python functions and methods.
For example, Python unbound methods are just functions but
unbound methods of extension types (e.g. ``dict.get``) are a distinct class.
Bound methods of Python classes have a ``__func__`` attribute,
bound methods of extension types do not.

Third, this PEP allows great customization of functions.
The ``function`` class becomes subclassable and custom function
subclasses are also allowed for functions implemented in C.
In the latter case, this can be done with the same performance
as true built-in functions.
All functions can access the function object
(the ``self`` in ``__call__``), paving the way for PEP 573.


New classes
===========

This is the new class hierarchy for functions and methods::

                  object
                     |
                     |
              base_function
             /       |      \
            /        |       \
           /         |   defined_function
          /          |        \
    cfunction (*)    |         \
                     |       function
                     |
               bound_method (*)

The two classes marked with (*) do *not* allow subclassing;
the others do.

There is no difference between functions and unbound methods,
while bound methods are instances of ``bound_method``.

base_function
-------------

The class ``base_function`` becomes a new base class for all function types.
It is based on the existing ``builtin_function_or_method`` class,
but with the following differences and new features:

#. It acts as a descriptor implementing ``__get__`` to turn a function into a method
   if ``m_self`` is ``NULL``.
   If ``m_self`` is not ``NULL``,
   then this is a no-op: the existing function is returned instead.

#. A new read-only attribute ``__parent__``, represented in the C structure as ``m_parent``.
   If this attribute exists, it represents the defining object.
   For methods of extension types, this is the defining class
   (``__class__`` in plain Python) and for functions of a module,
   this is the defining module.
   In general, it can be any Python object.
   If ``__parent__`` is a class, it carries special semantics:
   in that case, the function must be called with ``self``
   being an instance of that class.
   Finally, ``__qualname__`` and ``__reduce__`` will use ``__parent__``
   as namespace (instead of ``__self__`` before).

#. A new attribute ``__objclass__`` which equals ``__parent__`` if ``__parent__``
   is a class. Otherwise, accessing ``__objclass__`` raises ``AttributeError``.
   This is meant to be backwards compatible with ``method_descriptor``.

#. The field ``ml_doc`` and the attributes ``__doc__`` and
   ``__text_signature__`` (see Argument Clinic [#clinic]_)
   are not supported.

#. A new flag ``METH_PASS_FUNCTION`` for ``ml_flags``.
   If this flag is set, the C function stored in ``ml_meth`` is called with
   an additional first argument equal to the function object.

#. A new flag ``METH_BINDING`` for ``ml_flags`` which only applies to
   functions of a module (not methods of a class).
   If this flag is set, then ``m_self`` is set to ``NULL`` instead
   of the module.
   This allows the function to behave more like a Python function
   as it enables ``__get__``.

#. A new flag ``METH_CALL_UNBOUND`` to disable `self slicing`_.

#. A new flag ``METH_PYTHON`` for ``ml_flags``.
   This flag indicates that this function should be treated as a Python function.
   Ideally, use of this flag should be avoided because it goes
   against the duck typing philosophy.
   It is still needed in a few places though, for example `profiling`_.

The goal of ``base_function`` is that it supports all different ways
of calling functions and methods in just one structure.
For example, the new flag ``METH_PASS_FUNCTION``
will be used by the implementation of methods.

It is not possible to directly create instances of ``base_function``
(``tp_new`` is ``NULL``).
However, it is legal for C code to manually create instances.

These are the relevant C structures::

    PyTypeObject PyBaseFunction_Type;

    typedef struct {
        PyObject_HEAD
        PyCFunctionDef *m_ml;     /* Description of the C function to call */
        PyObject *m_self;         /* __self__: anything, can be NULL; readonly */
        PyObject *m_module;       /* __module__: anything (typically str) */
        PyObject *m_parent;       /* __parent__: anything, can be NULL; readonly */
        PyObject *m_weakreflist;  /* List of weak references */
    } PyBaseFunctionObject;

    typedef struct {
        const char *ml_name;   /* The name of the built-in function/method */
        PyCFunction ml_meth;   /* The C function that implements it */
        int ml_flags;          /* Combination of METH_xxx flags, which mostly
                                  describe the args expected by the C func */
    } PyCFunctionDef;

Subclasses may extend ``PyCFunctionDef`` with extra fields.

The Python attribute ``__self__`` returns ``m_self``,
except if ``METH_STATIC`` is set.
In that case or if ``m_self`` is ``NULL``,
then there is no ``__self__`` attribute at all.
For that reason, we write either ``m_self`` or ``__self__`` in this PEP
with slightly different meanings.

cfunction
---------

This is the new version of the old ``builtin_function_or_method`` class.
The name ``cfunction`` was chosen to avoid confusion with "built-in"
in the sense of "something in the ``builtins`` module".
It also fits better with the C API, which uses the ``PyCFunction`` prefix.

The class ``cfunction`` is a copy of ``base_function``, with the following differences:

#. ``m_ml`` points to a ``PyMethodDef`` structure,
   extending ``PyCFunctionDef`` with an additional ``ml_doc``
   field to implement ``__doc__`` and ``__text_signature__``
   as read-only attributes::

       typedef struct {
           const char *ml_name;
           PyCFunction ml_meth;
           int ml_flags;
           const char *ml_doc;
       } PyMethodDef;

   Note that ``PyMethodDef`` is part of the Python Stable ABI [#ABI]_
   and it is used by practically all extension modules,
   so we absolutely cannot change this structure.

#. Argument Clinic [#clinic]_ is supported.

#. ``__self__`` always exists.
   In the cases where ``base_function.__self__`` would raise ``AttributeError``,
   ``None`` is returned instead.

The type object is ``PyTypeObject PyCFunction_Type``
and we define ``PyCFunctionObject`` as an alias of ``PyBaseFunctionObject``
(except for the type of ``m_ml``).

defined_function
----------------

The class ``defined_function`` is an abstract base class meant
to indicate that the function has introspection support.
Instances of ``defined_function`` are required to support all attributes
that Python functions have, namely
``__code__``, ``__globals__``, ``__doc__``,
``__defaults__``, ``__kwdefaults__``, ``__closure__`` and ``__annotations__``.
There is also a ``__dict__`` to support attributes added by the user.

None of these is required to be meaningful.
In particular, ``__code__`` may not be a working code object;
possibly only a few fields may be filled in.
This PEP does not dictate how the various attributes are implemented.
They may be simple struct members or more complicated descriptors.
Only read-only support is required; none of the attributes is required to be writable.

The class ``defined_function`` is mainly meant for auto-generated C code,
for example produced by Cython [#cython]_.
There is no API to create instances of it.

The C structure is the following::

    PyTypeObject PyDefinedFunction_Type;

    typedef struct {
        PyBaseFunctionObject base;
        PyObject *func_dict;        /* __dict__: dict or NULL */
    } PyDefinedFunctionObject;

**TODO**: maybe find a better name for ``defined_function``.
Other proposals: ``inspect_function`` (anything that satisfies ``inspect.isfunction``),
``builtout_function`` (a function that is better built out; pun on builtin),
``generic_function`` (original proposal but conflicts with ``functools.singledispatch`` generic functions),
``user_function`` (defined by the user as opposed to CPython).

function
--------

This is the class meant for functions implemented in Python.
Unlike the other function types,
instances of ``function`` can be created from Python code.
This is not changed, so we do not describe the details in this PEP.

The layout of the C structure is the following::

    PyTypeObject PyFunction_Type;

    typedef struct {
        PyBaseFunctionObject base;
        PyObject *func_dict;        /* __dict__: dict or NULL */
        PyObject *func_code;        /* __code__: code */
        PyObject *func_globals;     /* __globals__: dict; readonly */
        PyObject *func_name;        /* __name__: string */
        PyObject *func_qualname;    /* __qualname__: string */
        PyObject *func_doc;         /* __doc__: can be anything or NULL */
        PyObject *func_defaults;    /* __defaults__: tuple or NULL */
        PyObject *func_kwdefaults;  /* __kwdefaults__: dict or NULL */
        PyObject *func_closure;     /* __closure__: tuple of cell objects or NULL; readonly */
        PyObject *func_annotations; /* __annotations__: dict or NULL */
        PyCFunctionDef _ml;         /* Storage for base.m_ml */
    } PyFunctionObject;

The descriptor ``__name__`` returns ``func_name``.
When setting ``__name__``, ``base.m_ml->ml_name`` is also updated
with the UTF-8 encoded name.

The ``_ml`` field reserves space to be used by ``base.m_ml``.

A ``base_function`` instance must have the flag ``METH_PYTHON`` set
if and only if it is an instance of ``function``.

When constructing an instance of ``function`` from ``code`` and ``globals``,
an instance is created with ``base.m_ml = &_ml``,
``base.m_self = NULL``.

To make subclassing easier, we also add a copy constructor:
if ``f`` is an instance of ``function``, then ``types.FunctionType(f)`` copies ``f``.
This conveniently allows using a custom function type as a decorator::

    >>> from types import FunctionType
    >>> class CustomFunction(FunctionType):
    ...     pass
    >>> @CustomFunction
    ... def f(x):
    ...     return x
    >>> type(f)
    <class '__main__.CustomFunction'>

This also removes many use cases of ``functools.wraps``:
wrappers can be replaced by subclasses of ``function``.
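
For comparison, the behaviour before this PEP can be checked directly.
The sketch below assumes a CPython where this proposal is *not* applied:
there, subclassing ``types.FunctionType`` raises ``TypeError``,
and copying a function means spelling out the constructor arguments by hand.
The helpers ``copy_function`` and ``try_subclass`` are hypothetical names
used only for illustration; they are not part of this proposal.

```python
import types


def copy_function(f):
    """Copy a Python function using only the existing constructor."""
    g = types.FunctionType(
        f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__
    )
    # The remaining attributes are not constructor arguments,
    # so they have to be copied one by one.
    g.__kwdefaults__ = dict(f.__kwdefaults__) if f.__kwdefaults__ else None
    g.__qualname__ = f.__qualname__
    g.__doc__ = f.__doc__
    g.__annotations__ = dict(f.__annotations__)
    g.__dict__.update(f.__dict__)
    return g


def try_subclass():
    """Return True if ``function`` can be subclassed, False otherwise."""
    try:
        class CustomFunction(types.FunctionType):
            pass
    except TypeError:
        return False
    return True


def add(x, y=1, *, z=2):
    """Add three numbers."""
    return x + y + z


add_copy = copy_function(add)
print(add_copy(1), add_copy is add, try_subclass())
```

With the copy constructor proposed above, the body of ``copy_function``
would reduce to the single call ``types.FunctionType(f)``.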

bound_method
------------

The class ``bound_method`` is used for all bound methods,
regardless of the class of the underlying function.
It adds one new attribute on top of ``base_function``:
``__func__`` points to that function.

``bound_method`` replaces the old ``method`` class
which was used only for Python functions bound as methods.

There is a complication because we want to allow
constructing a method from an arbitrary callable.
This may be an already-bound method or simply not an instance of ``base_function``.
Therefore, in practice there are two kinds of methods:

- For arbitrary callables, we use a single fixed ``PyCFunctionDef``
  structure with the ``METH_PASS_FUNCTION`` flag set.

- For methods which bind instances of ``base_function``
  (more precisely, which have the ``Py_TPFLAGS_BASEFUNCTION`` flag set)
  that have `self slicing`_,
  we instead use the ``PyCFunctionDef`` from the original function.
  This way, we don't lose any performance when calling bound methods.
  In this case, the ``__func__`` attribute is only used to implement
  various attributes but not for calling the method.

When constructing a new method from a ``base_function``,
we check that the ``self`` object is an instance of ``__objclass__``
(if a class was specified as parent) and raise a ``TypeError`` otherwise.

The C structure is::

    PyTypeObject PyMethod_Type;

    typedef struct {
        PyBaseFunctionObject base;
        PyObject *im_func;  /* __func__: function implementing the method; readonly */
    } PyMethodObject;


Calling base_function instances
===============================

We specify the implementation of ``__call__`` for instances of ``base_function``.

Checking __objclass__
---------------------

First of all, a type check is done if the ``__parent__`` of the function
is a class
(recall that ``__objclass__`` then becomes an alias of ``__parent__``):
if ``m_self`` is ``NULL`` (this is the case for unbound methods of extension types),
then the function must be called with at least one positional argument
and the first (typically called ``self``) must be an instance of ``__objclass__``.
If not, a ``TypeError`` is raised.

Note that bound methods have ``m_self != NULL``, so the ``__objclass__``
is not checked.
Instead, the ``__objclass__`` check is done when constructing the method.

Flags
-----

For convenience, we define a new constant:
``METH_CALLFLAGS`` combines all flags from ``PyCFunctionDef.ml_flags``
which specify the signature of the C function to be called.
It is equal to::

    METH_VARARGS | METH_FASTCALL | METH_NOARGS | METH_O | METH_KEYWORDS | METH_PASS_FUNCTION

Exactly one of the first four flags above must be set
and only ``METH_VARARGS`` and ``METH_FASTCALL`` may be combined with ``METH_KEYWORDS``.
Violating these rules is undefined behaviour.

There are two new flags which affect calling functions,
namely ``METH_PASS_FUNCTION`` and ``METH_CALL_UNBOUND``.
Some flags are already documented in [#methoddoc]_.
We explain the others below.

Self slicing
------------

If the function has ``m_self == NULL`` and the flag ``METH_CALL_UNBOUND``
is not set, then the first positional argument (if any)
is removed from ``*args`` and instead passed as first argument to the C function.
Effectively, the first positional argument is treated as ``__self__``.
This is meant to support unbound methods
such that the C function does not see the difference
between bound and unbound method calls.
This does not affect keyword arguments in any way.

This process is called *self slicing* and a function is said to
*have self slicing* if ``m_self == NULL`` and ``METH_CALL_UNBOUND`` is not set.
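
The ``__objclass__`` check and self slicing can be modelled in pure Python.
This is only a sketch of the calling protocol, not the CPython implementation:
``NULL``, ``call_base_function``, its parameters and ``dict_get`` are hypothetical
names, and the C function is modelled as a Python callable taking
``(self, args, kwargs)``.
What the C function receives as ``self`` when no self slicing happens
is an assumption of this sketch (``NULL`` is passed through).

```python
NULL = object()  # sentinel standing in for a C NULL pointer


def call_base_function(c_func, m_self, objclass, call_unbound, args, kwargs=None):
    """Model the __objclass__ check and self slicing.

    c_func       -- stand-in for ml_meth, called as c_func(self, args, kwargs)
    m_self       -- the bound object, or NULL
    objclass     -- __parent__ if it is a class, else None
    call_unbound -- True models the METH_CALL_UNBOUND flag
    """
    kwargs = {} if kwargs is None else kwargs
    if m_self is not NULL:
        # Bound call: __objclass__ was checked when the method was created.
        return c_func(m_self, args, kwargs)
    if objclass is not None:
        # Unbound method of an extension type: the first positional
        # argument must exist and be an instance of __objclass__.
        if not args or not isinstance(args[0], objclass):
            raise TypeError(
                "first argument must be an instance of " + objclass.__name__
            )
    if not call_unbound and args:
        # Self slicing: the first positional argument becomes self.
        return c_func(args[0], args[1:], kwargs)
    # Assumption of this sketch: without self slicing, NULL is passed as self.
    return c_func(NULL, args, kwargs)


def dict_get(self, args, kwargs):
    """Stand-in for the C implementation of dict.get."""
    return dict.get(self, *args, **kwargs)


d = {"a": 1}
unbound = call_base_function(dict_get, NULL, dict, False, (d, "a"))
bound = call_base_function(dict_get, d, dict, False, ("a",))
print(unbound, bound)
```

In both calls ``dict_get`` receives the same ``self`` and the same remaining
arguments, which is the point of self slicing.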

Note that a ``METH_NOARGS`` function which has self slicing
effectively has one argument, namely ``self``.
Analogously, a ``METH_O`` function with self slicing has two arguments.

METH_PASS_FUNCTION
------------------

If this flag is set, then the C function is called with an
additional first argument, namely the function itself
(the ``base_function`` instance).
As a special case, if the function is a ``bound_method``,
then the underlying function of the method is passed
(but not recursively: if a ``bound_method`` wraps a ``bound_method``,
then ``__func__`` is only applied once).

For example, an ordinary ``METH_VARARGS`` function has signature
``(PyObject *self, PyObject *args)``.
With ``METH_VARARGS | METH_PASS_FUNCTION``, this becomes
``(PyObject *func, PyObject *self, PyObject *args)``.

METH_FASTCALL
-------------

This is an existing but undocumented flag.
We suggest officially supporting and documenting it.

If the flag ``METH_FASTCALL`` is set without ``METH_KEYWORDS``,
then the ``ml_meth`` field is of type ``PyCFunctionFast``
which takes the arguments ``(PyObject *self, PyObject *const *args, Py_ssize_t nargs)``.
Such a function takes only positional arguments and they are passed as a plain C array
``args`` of length ``nargs``.

If the flags ``METH_FASTCALL | METH_KEYWORDS`` are set,
then the ``ml_meth`` field is of type ``PyCFunctionFastKeywords``
which takes the arguments ``(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)``.
The positional arguments are passed as a C array ``args`` of length ``nargs``.
The *values* of the keyword arguments follow in that array,
starting at position ``nargs``.
The *keys* (names) of the keyword arguments are passed as a ``tuple`` in ``kwnames``.
As an example, assume that 3 positional and 2 keyword arguments are given.
Then ``args`` is an array of length 3 + 2 = 5, ``nargs`` equals 3 and ``kwnames`` is a 2-tuple.
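
The argument layout for ``METH_FASTCALL | METH_KEYWORDS`` can be illustrated in Python.
The helpers ``pack_fastcall`` and ``unpack_fastcall`` are hypothetical and exist
only in this sketch; a Python list stands in for the C array ``args``.

```python
def pack_fastcall(*positional, **keywords):
    """Return (args, nargs, kwnames) in the METH_FASTCALL | METH_KEYWORDS layout."""
    # Positional values first, then the keyword *values* in call order.
    args = list(positional) + list(keywords.values())
    nargs = len(positional)
    # The keyword *names* travel separately as a tuple.
    kwnames = tuple(keywords)
    return args, nargs, kwnames


def unpack_fastcall(args, nargs, kwnames):
    """Recover the positional tuple and the keyword dict from the flat layout."""
    positional = tuple(args[:nargs])
    keywords = dict(zip(kwnames, args[nargs:]))
    return positional, keywords


# The example from the text: 3 positional and 2 keyword arguments.
args, nargs, kwnames = pack_fastcall(1, 2, 3, x=10, y=20)
print(len(args), nargs, kwnames)
print(unpack_fastcall(args, nargs, kwnames))
```

Here ``args`` has length 5, ``nargs`` is 3 and ``kwnames`` is the 2-tuple
``('x', 'y')``, matching the description above.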


Automatic creation of built-in functions
========================================

Python automatically generates instances of ``cfunction``
for extension types (using the ``PyTypeObject.tp_methods`` field) and modules
(using the ``PyModuleDef.m_methods`` field).
The arrays ``PyTypeObject.tp_methods`` and ``PyModuleDef.m_methods``
must be arrays of ``PyMethodDef`` structures.

Unbound methods of extension types
----------------------------------

The type of unbound methods changes from ``method_descriptor``
to ``cfunction``.
The object which appears as unbound method is the same object which
appears in the class ``__dict__``.
Python automatically sets the ``__parent__`` attribute to the defining class.

Built-in functions of a module
------------------------------

For the case of functions of a module,
``__parent__`` will be set to the module.
Unless the flag ``METH_BINDING`` is given, ``__self__``
will also be set to the module (for backwards compatibility).

An important consequence is that such functions by default
do not become methods when used as an attribute
(``base_function.__get__`` only does that if ``m_self`` was ``NULL``).
One could consider this a bug, but this was done for backwards compatibility reasons:
in an initial post on python-ideas [#proposal]_ the consensus was to keep this
misfeature of built-in functions.

However, to allow this anyway for specific or newly implemented
built-in functions, the ``METH_BINDING`` flag prevents setting ``__self__``.


Further changes
===============

New type flag
-------------

A new ``PyTypeObject`` flag (for ``tp_flags``) is added:
``Py_TPFLAGS_BASEFUNCTION`` to indicate that instances of this type are
functions which can be called and bound as methods like a ``base_function``.

This is different from flags like ``Py_TPFLAGS_LIST_SUBCLASS``
because it indicates more than just a subclass:
it also indicates a default implementation of ``__call__`` and ``__get__``.
In particular, such subclasses of ``base_function``
must follow the implementation from the section `Calling base_function instances`_.

This flag is automatically set for extension types which
inherit the ``tp_call`` and ``tp_descr_get`` implementation from ``base_function``.
Extension types can explicitly specify it if they
override ``__call__`` or ``__get__`` in a compatible way.
The flag ``Py_TPFLAGS_BASEFUNCTION`` must never be set for a heap type
because that would not be safe (heap types can be changed dynamically).

C API functions
---------------

We list some relevant Python/C API macros and functions.
Some of these are existing (possibly changed) functions, some are new:

- ``int PyBaseFunction_CheckFast(PyObject *op)``: return true if ``op``
  is an instance of a class with the ``Py_TPFLAGS_BASEFUNCTION`` flag set.
  This is the function that you need to use to determine
  whether it is meaningful to access the ``base_function`` internals.

- ``int PyBaseFunction_Check(PyObject *op)``: return true if ``op``
  is an instance of ``base_function``.

- ``PyObject *PyBaseFunction_New(PyTypeObject *cls, PyCFunctionDef *ml, PyObject *self, PyObject *module, PyObject *parent)``:
  create a new instance of ``cls`` (which must be a subclass of ``base_function``)
  from the given data.

- ``int PyCFunction_Check(PyObject *op)``: return true if ``op``
  is an instance of ``cfunction``.

- ``PyObject *PyCFunction_NewEx(PyMethodDef *ml, PyObject *self, PyObject *module)``:
  create a new instance of ``cfunction``.
  As a special case, if ``self`` is ``NULL``,
  then set ``self = Py_None`` instead (for backwards compatibility).
  If ``self`` is a module, then ``__parent__`` is set to ``self``.
  Otherwise, ``__parent__`` is ``NULL``.

- For many existing ``PyCFunction_...`` and ``PyMethod_...`` functions,
  we define a new function ``PyBaseFunction_...``
  acting on ``base_function`` instances.
  The old functions are kept as aliases of the new functions.

- ``int PyFunction_Check(PyObject *op)``: return true if ``op``
  is an instance of ``base_function`` with the ``METH_PYTHON`` flag set
  (this is equivalent to checking whether ``op`` is an instance of ``function``).

- ``int PyFunction_CheckFast(PyObject *op)``: equivalent to
  ``PyFunction_Check(op) && PyBaseFunction_CheckFast(op)``.

- ``int PyFunction_CheckExact(PyObject *op)``: return true
  if the type of ``op`` is ``function``.

- ``PyObject *PyFunction_NewPython(PyTypeObject *cls, PyObject *code, PyObject *globals, PyObject *name, PyObject *qualname)``:
  create a new instance of ``cls`` (which must be a subclass of ``function``)
  from the given data.

- ``PyObject *PyFunction_New(PyObject *code, PyObject *globals)``:
  create a new instance of ``function``.

- ``PyObject *PyFunction_NewWithQualName(PyObject *code, PyObject *globals, PyObject *qualname)``:
  create a new instance of ``function``.

- ``PyObject *PyFunction_Copy(PyTypeObject *cls, PyObject *func)``:
  create a new instance of ``cls`` (which must be a subclass of ``function``)
  by copying a given ``function``.

Changes to the types module
---------------------------

Two types are added: ``types.BaseFunctionType`` corresponding to
``base_function`` and ``types.DefinedFunctionType`` corresponding to
``defined_function``.

Apart from that, no changes to the ``types`` module are made.
In particular, ``types.FunctionType`` refers to ``function``.
However, the actual types will change:
in particular, ``types.BuiltinFunctionType`` will no longer be the same
as ``types.BuiltinMethodType``.

Changes to the inspect module
-----------------------------

The new function ``inspect.isbasefunction`` checks for an instance of ``base_function``.

``inspect.isfunction`` checks for an instance of ``defined_function``.

``inspect.isbuiltin`` checks for an instance of ``cfunction``.

``inspect.isroutine`` checks ``isbasefunction`` or ``ismethoddescriptor``.

**NOTE**: bpo-33261 [#bpo33261]_ should be fixed first.
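
As a point of reference, the following snippet records how the existing
predicates classify a Python function, a built-in function and an unbound
method of an extension type on a CPython *without* this PEP.
It uses only existing ``inspect`` and ``types`` APIs;
``py_func``, ``candidates`` and ``classify`` are names made up for the example.

```python
import inspect
import types


def py_func(x):
    return x


def classify(obj):
    """Return the answers of the existing inspect predicates for obj."""
    return {
        "type": type(obj).__name__,
        "isfunction": inspect.isfunction(obj),
        "isbuiltin": inspect.isbuiltin(obj),
        "ismethoddescriptor": inspect.ismethoddescriptor(obj),
        "isroutine": inspect.isroutine(obj),
    }


candidates = {"py_func": py_func, "len": len, "dict.get": dict.get}

for name, obj in candidates.items():
    print(name, classify(obj))

# Today both names refer to the same class; this PEP would split them.
print(types.BuiltinFunctionType is types.BuiltinMethodType)
```

Under this PEP, ``dict.get`` would become a ``cfunction``, so
``inspect.isbuiltin(dict.get)`` would be expected to change from
false to true.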

Profiling
---------

Currently, ``sys.setprofile`` supports ``c_call``, ``c_return`` and ``c_exception``
events for built-in functions.
These events are generated when calling or returning from a built-in function.
By contrast, the ``call`` and ``return`` events are generated by the function itself.
So nothing needs to change for the ``call`` and ``return`` events.

Since we no longer make a difference between C functions and Python functions,
we need to prevent the ``c_*`` events for Python functions.
This is done by not generating those events if the
``METH_PYTHON`` flag in ``ml_flags`` is set.


Non-CPython implementations
===========================

Most of this PEP is only relevant to CPython.
For other implementations of Python,
the two changes that are required are the ``base_function`` base class
and the fact that ``function`` can be subclassed.
The classes ``cfunction`` and ``defined_function`` are not required.

We require ``base_function`` for consistency but we put no requirements on it:
it is acceptable if this is just a copy of ``object``.
Support for the new ``__parent__`` (and ``__objclass__``) attribute is not required.
If there is no ``defined_function`` class,
then ``types.DefinedFunctionType`` should be an alias of ``types.FunctionType``.


Rationale
=========

Why not simply change existing classes?
---------------------------------------

One could try to solve the problem by keeping the existing classes
without introducing a new ``base_function`` class.

That might look like a simpler solution but it is not:
it would require introspection support for 3 distinct classes:
``function``, ``builtin_function_or_method`` and ``method_descriptor``.
For the latter two classes, "introspection support" would mean
at a minimum allowing subclassing.
But we don't want to lose performance, so we want fast subclass checks.
This would require two new flags in ``tp_flags``.
And we want subclasses to allow ``__get__`` for built-in functions,
so we should implement the ``LOAD_METHOD`` opcode for built-in functions too.
More generally, a lot of functionality would need to be duplicated
and the end result would be far more complex code.

It is also not clear how the introspection of built-in function subclasses
would interact with ``__text_signature__``.
Having two independent kinds of ``inspect.signature`` support on the same
class sounds like asking for problems.

And this would not fix some of the other differences between built-in functions
and Python functions that were mentioned in the `motivation`_.

Why __text_signature__ is not a solution
----------------------------------------

Built-in functions have an attribute ``__text_signature__``,
which gives the signature of the function as plain text.
The default values are evaluated by ``ast.literal_eval``.
Because of this, it supports only a small number of standard Python classes
and not arbitrary Python objects.

And even if ``__text_signature__`` would allow arbitrary signatures somehow,
that is only one piece of introspection:
it does not help with ``inspect.getsourcefile`` for example.

defined_function versus function
--------------------------------

In many places, a decision needs to be made whether the old ``function`` class
should be replaced by ``defined_function`` or the new ``function`` class.
This is done by thinking of the most likely use case:

1. ``types.FunctionType`` refers to ``function`` because that
   type might be used to construct instances using ``types.FunctionType(...)``.

2. ``inspect.isfunction()`` refers to ``defined_function``
   because this is the class where introspection is supported.

3. The C API functions must refer to ``function`` because
   we do not specify how the various attributes of ``defined_function``
   are implemented.
   We expect that this is not a problem since there is typically no
   reason for introspection to be done by C extensions.

Scope of this PEP: which classes are involved?
----------------------------------------------

The main motivation of this PEP is fixing function classes,
so we certainly want to unify the existing classes
``builtin_function_or_method`` and ``function``.

Since built-in functions and methods have the same class,
it seems natural to include bound methods too.
And since there are no "unbound methods" for Python functions,
it makes sense to get rid of unbound methods for extension types.

For now, no changes are made to the classes ``staticmethod``,
``classmethod`` and ``classmethod_descriptor``.
It would certainly make sense to put these in the ``base_function``
class hierarchy and unify ``classmethod`` and ``classmethod_descriptor``.
However, this PEP is already big enough
and this is left as a possible future improvement.

Slot wrappers for extension types like ``__init__`` or ``__eq__``
are quite different from normal methods.
They are also typically not called directly because you would normally
write ``foo[i]`` instead of ``foo.__getitem__(i)``.
So these are left outside the scope of this PEP.

Python also has an ``instancemethod`` class,
which seems to be a relic from Python 2,
where it was used for bound and unbound methods.
It is not clear whether there is still a use case for it.
In any case, there is no reason to deal with it in this PEP.

**TODO**: should ``instancemethod`` be deprecated?
It doesn't seem used at all within CPython 3.7,
but maybe external packages use it?

Not treating METH_STATIC and METH_CLASS
---------------------------------------

Almost nothing in this PEP refers to the flags ``METH_STATIC`` and ``METH_CLASS``.
These flags are checked only by the `automatic creation of built-in functions`_.
When a ``staticmethod``, ``classmethod`` or ``classmethod_descriptor``
is bound (i.e. ``__get__`` is called),
a ``base_function`` instance is created with ``m_self != NULL``.
For a ``classmethod``, this is obvious since ``m_self``
is the class that the method is bound to.
For a ``staticmethod``, one can take an arbitrary Python object for ``m_self``.
For backwards compatibility, we choose ``m_self = __parent__`` for static methods
of extension types.

__self__ in base_function
-------------------------

It may look strange at first sight to add the ``__self__`` slot
in ``base_function`` as opposed to ``bound_method``.
We took this idea from the existing ``builtin_function_or_method`` class.
It allows us to have a single general implementation of ``__call__`` and ``__get__``
for the various function classes discussed in this PEP.

It also makes it easy to support existing built-in functions
which set ``__self__`` to the module (for example, ``sys.exit.__self__`` is ``sys``).

Two implementations of __doc__
------------------------------

``base_function`` does not support function docstrings.
Instead, the classes ``cfunction`` and ``function``
each have their own way of dealing with docstrings
(and ``bound_method`` just takes the ``__doc__`` from the wrapped function).

For ``cfunction``, the docstring is stored (together with the text signature)
as a C string in the read-only ``ml_doc`` field of a ``PyMethodDef``.
For ``function``, the docstring is stored as a writable Python object
and it does not actually need to be a string.
It looks hard to unify these two very different ways of dealing with ``__doc__``.
For backwards compatibility, we keep the existing implementations.

For ``defined_function``, we require ``__doc__`` to be implemented
but we do not specify how. A subclass can implement ``__doc__`` the
same way as ``cfunction`` or using a struct member or some other way.

Subclassing
-----------

We disallow subclassing of ``cfunction`` and ``bound_method``
to enable fast type checks for ``PyCFunction_Check`` and ``PyMethod_Check``.

We allow subclassing of the other classes because there is no reason to disallow it.
For Python modules, the only relevant class to subclass is
``function`` because the others cannot be instantiated anyway.

Replacing tp_call: METH_PASS_FUNCTION and METH_CALL_UNBOUND
-----------------------------------------------------------

The new flags ``METH_PASS_FUNCTION`` and ``METH_CALL_UNBOUND``
are meant to support cases where formerly a custom ``tp_call`` was used.
It reduces the number of special fast paths in ``Python/ceval.c``
for calling objects:
instead of treating Python functions, built-in functions and method descriptors
separately, there would only be a single check.

The signature of ``tp_call`` is essentially the signature
of ``PyBaseFunctionObject.m_ml.ml_meth`` with flags
``METH_VARARGS | METH_KEYWORDS | METH_PASS_FUNCTION | METH_CALL_UNBOUND``
(the only difference is an added ``self`` argument).
Therefore, it should be easy to change existing ``tp_call`` slots
to use the ``base_function`` implementation instead.

It also makes sense to use ``METH_PASS_FUNCTION`` without ``METH_CALL_UNBOUND``
in cases where the C function simply needs access to additional metadata
from the function, such as the ``__parent__``.
This is for example needed to support PEP 573.
Converting existing methods to use ``METH_PASS_FUNCTION`` is trivial:
it only requires adding an extra argument to the C function.


Backwards compatibility
=======================

While designing this PEP, great care was taken to not break
backwards compatibility too much.
Most of the potentially incompatible changes
are changes to CPython implementation details
which are different anyway in other Python interpreters.
In particular, Python code which correctly runs on PyPy
will very likely continue to work with this PEP.

The standard classes and functions like
``staticmethod``, ``functools.partial`` or ``operator.methodcaller``
do not need to change at all.

Changes to types and inspect
----------------------------

The proposed changes to ``types`` and ``inspect``
are meant to minimize changes in behaviour.
However, it is unavoidable that some things change
and this can cause code which uses ``types`` or ``inspect`` to break.
In the Python standard library for example,
changes are needed in the ``doctest`` module because of this.

Also, tools which take various kinds of functions as input will need to deal
with the new function hierarchy and the possibility of custom
function classes.

Python functions
----------------

For Python functions, essentially nothing changes.
The attributes that existed before still exist and Python functions
can be initialized, called and turned into methods as before.

The name ``function`` is kept for backwards compatibility.
While it might make sense to change the name to something more
specific like ``python_function``,
that would require a lot of annoying changes in documentation and testsuites.

Built-in functions of a module
------------------------------

Also for built-in functions, nothing changes.
We keep the old behaviour that such functions do not bind as methods.
This is a consequence of the fact that ``__self__`` is set to the module.

Built-in bound and unbound methods
----------------------------------

The types of built-in bound and unbound methods will change.
However, this does not affect calling such methods
because the protocol in ``base_function.__call__``
(in particular the handling of ``__objclass__`` and self slicing)
was specifically designed to be backwards compatible.
All attributes which existed before (like ``__objclass__`` and ``__self__``)
still exist.

New attributes
--------------

Some objects get new special double-underscore attributes.
For example, the new attribute ``__parent__`` appears on
all built-in functions and all methods get a ``__func__`` attribute.
The fact that ``__self__`` is now a special read-only attribute
for Python functions caused trouble in [#bpo33265]_.
Generally, we expect that not much will break though.

method_descriptor and PyDescr_NewMethod
---------------------------------------

The class ``method_descriptor`` and the constructor ``PyDescr_NewMethod``
should be deprecated.
They are no longer used by CPython itself but are still supported.

Two-phase Implementation
========================

**TODO**: this section is optional.
If this PEP is accepted, it should
be decided whether to apply this two-phase implementation or not.

As mentioned above, the `changes to types and inspect`_ can break some
existing code.
In order to further minimize breakage, this PEP could be implemented
in two phases.

Phase one: keep existing classes but add base classes
-----------------------------------------------------

Initially, implement the ``base_function`` class
and use it as common base class but otherwise keep the existing classes
(but not their implementation).

In this proposal, the class hierarchy would become::

                          object
                             |
                             |
                      base_function
                     /       |      \
                    /        |       \
                   /         |        \
           cfunction         |     defined_function
           |       |         |         \
           |       |    bound_method    \
           |       |                     \
           |  method_descriptor       function
           |
    builtin_function_or_method

The leaf classes ``builtin_function_or_method``, ``method_descriptor``,
``bound_method`` and ``function`` correspond to the existing classes
(with ``method`` renamed to ``bound_method``).

Functions created automatically in modules become instances
of ``builtin_function_or_method``.
Unbound methods of extension types become instances of ``method_descriptor``.

The class ``method_descriptor`` is a copy of ``cfunction`` except
that ``__get__`` returns a ``builtin_function_or_method`` instead of a
``bound_method``.

The class ``builtin_function_or_method`` has the same C structure as a
``bound_method``, but it inherits from ``cfunction``.
The ``__func__`` attribute is not mandatory:
it is only defined when binding a ``method_descriptor``.
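
As a point of comparison for the leaf classes named above, the following
Python sketch prints the classes that CPython without this PEP assigns to
the same kinds of objects. It only exercises existing behaviour; the helper
``class_name`` and the class ``A`` are introduced here for illustration and
are not part of the proposal.

```python
def class_name(obj):
    """Return the name of obj's class (helper introduced for this sketch)."""
    return type(obj).__name__


class A:
    def m(self):
        pass


# A function created automatically in a module.
module_function = len
# An unbound method of an extension type.
unbound = dict.get
# Binding a method_descriptor through its __get__.
bound = unbound.__get__({"foo": 42})

print(class_name(module_function))  # builtin_function_or_method
print(class_name(unbound))          # method_descriptor
print(class_name(bound))            # builtin_function_or_method
print(hasattr(bound, "__func__"))   # False: no __func__ without this PEP
print(class_name(A.m))              # function
print(class_name(A().m))            # method
```

The names printed are CPython implementation details, so other interpreters
may report different classes for the built-in objects.
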
We keep the implementation of the ``inspect`` functions as they are.
Because of this and because the existing classes are kept,
backwards compatibility is ensured for code doing type checks.

Since showing an actual ``DeprecationWarning`` would affect a lot
of correctly-functioning code,
any deprecations would only appear in the documentation.
Another reason is that it is hard to show warnings for calling ``isinstance(x, t)``
(but it could be done using ``__instancecheck__`` hacking)
and impossible for ``type(x) is t``.

Phase two
---------

Phase two is what is actually described in the rest of this PEP.
In terms of implementation,
it would be a relatively small change compared to phase one.

Reference Implementation
========================

Most of this PEP has been implemented for CPython at
https://github.com/jdemeyer/cpython/tree/pep575

There are four steps, corresponding to the commits on that branch.
After each step, CPython is in a mostly working state.

1. Add the ``base_function`` class and make it a base class of ``cfunction``.
   This is by far the biggest step as the complete ``__call__`` protocol
   is implemented in this step.

2. Rename ``method`` to ``bound_method`` and make it a subclass of ``base_function``.
   Change unbound methods of extension types to be instances of ``cfunction``
   such that bound methods of extension types are also instances of ``bound_method``.

3. Implement ``defined_function`` and ``function``.

4. Changes to other parts of Python, such as the standard library and testsuite.

Appendix: current situation
===========================

**NOTE**:
This section is more useful during the draft period of the PEP,
so feel free to remove this once the PEP has been accepted.

For reference, we describe in detail the relevant existing classes in CPython 3.7.

Each of the classes involved is an "orphan" class
(no non-trivial subclasses nor superclasses).

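
The superclass half of the "orphan" property can be checked from Python.
The sketch below is a minimal illustration using the ``types`` module; the
list ``EXISTING_CLASSES`` and the helper ``allows_subclassing`` are names
made up for this example. It verifies only that each class derives directly
from ``object`` and that Python code cannot subclass it; it does not inspect
subclasses defined at the C level.

```python
import types

# The four existing classes described in this appendix.
EXISTING_CLASSES = [
    types.BuiltinFunctionType,   # builtin_function_or_method
    types.MethodDescriptorType,  # method_descriptor
    types.FunctionType,          # function
    types.MethodType,            # method
]


def allows_subclassing(cls):
    """Return True if a subclass of cls can be created (hypothetical helper)."""
    try:
        type("Sub", (cls,), {})
    except TypeError:
        return False
    return True


for cls in EXISTING_CLASSES:
    # Print the class name, whether object is its only base,
    # and whether Python code may subclass it.
    print(cls.__name__, cls.__bases__ == (object,), allows_subclassing(cls))
```
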

builtin_function_or_method: built-in functions and bound methods
----------------------------------------------------------------

These are of type ``PyCFunction_Type``
with structure ``PyCFunctionObject``::

    typedef struct {
        PyObject_HEAD
        PyMethodDef *m_ml; /* Description of the C function to call */
        PyObject    *m_self; /* Passed as 'self' arg to the C func, can be NULL */
        PyObject    *m_module; /* The __module__ attribute, can be anything */
        PyObject    *m_weakreflist; /* List of weak references */
    } PyCFunctionObject;

    struct PyMethodDef {
        const char  *ml_name;   /* The name of the built-in function/method */
        PyCFunction ml_meth;    /* The C function that implements it */
        int         ml_flags;   /* Combination of METH_xxx flags, which mostly
                                   describe the args expected by the C func */
        const char  *ml_doc;    /* The __doc__ attribute, or NULL */
    };

where ``PyCFunction`` is a C function pointer (there are various forms of this, the most basic
takes two arguments for ``self`` and ``*args``).

This class is used both for functions and bound methods:
for a method, the ``m_self`` slot points to the object::

    >>> dict(foo=42).get
    <built-in method get of dict object at 0x...>
    >>> dict(foo=42).get.__self__
    {'foo': 42}

In some cases, a function is considered a "method" of the module defining it::

    >>> import os
    >>> os.kill
    <built-in function kill>
    >>> os.kill.__self__
    <module 'posix' (built-in)>

method_descriptor: built-in unbound methods
-------------------------------------------

These are of type ``PyMethodDescr_Type``
with structure ``PyMethodDescrObject``::

    typedef struct {
        PyDescrObject d_common;
        PyMethodDef *d_method;
    } PyMethodDescrObject;

    typedef struct {
        PyObject_HEAD
        PyTypeObject *d_type;
        PyObject *d_name;
        PyObject *d_qualname;
    } PyDescrObject;

function: Python functions
--------------------------

These are of type ``PyFunction_Type``
with structure ``PyFunctionObject``::

    typedef struct {
        PyObject_HEAD
        PyObject *func_code;        /* A code object, the __code__ attribute */
        PyObject *func_globals;     /* A dictionary (other mappings won't do) */
        PyObject *func_defaults;    /* NULL or a tuple */
        PyObject *func_kwdefaults;  /* NULL or a dict */
        PyObject *func_closure;     /* NULL or a tuple of cell objects */
        PyObject *func_doc;         /* The __doc__ attribute, can be anything */
        PyObject *func_name;        /* The __name__ attribute, a string object */
        PyObject *func_dict;        /* The __dict__ attribute, a dict or NULL */
        PyObject *func_weakreflist; /* List of weak references */
        PyObject *func_module;      /* The __module__ attribute, can be anything */
        PyObject *func_annotations; /* Annotations, a dict or NULL */
        PyObject *func_qualname;    /* The qualified name */

        /* Invariant:
         *     func_closure contains the bindings for func_code->co_freevars, so
         *     PyTuple_Size(func_closure) == PyCode_GetNumFree(func_code)
         *     (func_closure may be NULL if PyCode_GetNumFree(func_code) == 0).
         */
    } PyFunctionObject;

In Python 3, there is no "unbound method" class:
an unbound method is just a plain function.

method: Python bound methods
----------------------------

These are of type ``PyMethod_Type``
with structure ``PyMethodObject``::

    typedef struct {
        PyObject_HEAD
        PyObject *im_func;   /* The callable object implementing the method */
        PyObject *im_self;   /* The instance it is bound to */
        PyObject *im_weakreflist; /* List of weak references */
    } PyMethodObject;

References
==========

.. [#cython] Cython (http://cython.org/)

.. [#bpo30071] Python bug 30071, Duck-typing inspect.isfunction() (https://bugs.python.org/issue30071)

.. [#bpo33261] Python bug 33261, inspect.isgeneratorfunction fails on hand-created methods
   (https://bugs.python.org/issue33261 and https://github.com/python/cpython/pull/6448)

.. [#bpo33265] Python bug 33265, contextlib.ExitStack abuses __self__
   (https://bugs.python.org/issue33265 and https://github.com/python/cpython/pull/6456)

.. [#ABI] PEP 384, Defining a Stable ABI, Löwis (https://www.python.org/dev/peps/pep-0384)

.. [#clinic] PEP 436, The Argument Clinic DSL, Hastings (https://www.python.org/dev/peps/pep-0436)

.. [#methoddoc] PyMethodDef documentation (https://docs.python.org/3.7/c-api/structures.html#c.PyMethodDef)

.. [#proposal] PEP proposal: unifying function/method classes (https://mail.python.org/pipermail/python-ideas/2018-March/049398.html)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: