962 lines
40 KiB
Plaintext
962 lines
40 KiB
Plaintext
PEP: 253
|
|
Title: Subtyping Built-in Types
|
|
Author: Guido van Rossum <guido@python.org>
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 14-May-2001
|
|
Python-Version: 2.2
|
|
Post-History:
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes additions to the type object API that will allow
|
|
the creation of subtypes of built-in types, in C and in Python.
|
|
|
|
[Editor's note: the ideas described in this PEP have been incorporated
|
|
into Python. The PEP no longer accurately describes the implementation.]
|
|
|
|
|
|
Introduction
|
|
============
|
|
|
|
Traditionally, types in Python have been created statically, by
|
|
declaring a global variable of type PyTypeObject and initializing
|
|
it with a static initializer. The slots in the type object
|
|
describe all aspects of a Python type that are relevant to the
|
|
Python interpreter. A few slots contain dimensional information
|
|
(like the basic allocation size of instances), others contain
|
|
various flags, but most slots are pointers to functions to
|
|
implement various kinds of behaviors. A NULL pointer means that
|
|
the type does not implement the specific behavior; in that case
|
|
the system may provide a default behavior or raise an exception
|
|
when the behavior is invoked for an instance of the type. Some
|
|
collections of function pointers that are usually defined together
|
|
are obtained indirectly via a pointer to an additional structure
|
|
containing more function pointers.
|
|
|
|
While the details of initializing a PyTypeObject structure haven't
|
|
been documented as such, they are easily gleaned from the examples
|
|
in the source code, and I am assuming that the reader is
|
|
sufficiently familiar with the traditional way of creating new
|
|
Python types in C.
|
|
|
|
This PEP will introduce the following features:
|
|
|
|
- a type can be a factory function for its instances
|
|
|
|
- types can be subtyped in C
|
|
|
|
- types can be subtyped in Python with the class statement
|
|
|
|
- multiple inheritance from types is supported (insofar as
|
|
practical -- you still can't multiply inherit from list and
|
|
dictionary)
|
|
|
|
- the standard coercion functions (int, tuple, str etc.) will
|
|
be redefined to be the corresponding type objects, which serve
|
|
as their own factory functions
|
|
|
|
- a class statement can contain a ``__metaclass__`` declaration,
|
|
specifying the metaclass to be used to create the new class
|
|
|
|
- a class statement can contain a ``__slots__`` declaration,
|
|
specifying the specific names of the instance variables
|
|
supported
|
|
|
|
This PEP builds on :pep:`252`, which adds standard introspection to
|
|
types; for example, when a particular type object initializes the
|
|
``tp_hash`` slot, that type object has a ``__hash__`` method when
|
|
introspected. :pep:`252` also adds a dictionary to type objects
|
|
which contains all methods. At the Python level, this dictionary
|
|
is read-only for built-in types; at the C level, it is accessible
|
|
directly (but it should not be modified except as part of
|
|
initialization).
|
|
|
|
For binary compatibility, a flag bit in the tp_flags slot
|
|
indicates the existence of the various new slots in the type
|
|
object introduced below. Types that don't have the
|
|
``Py_TPFLAGS_HAVE_CLASS`` bit set in their ``tp_flags`` slot are assumed
|
|
to have NULL values for all the subtyping slots. (Warning: the
|
|
current implementation prototype is not yet consistent in its
|
|
checking of this flag bit. This should be fixed before the final
|
|
release.)
|
|
|
|
In current Python, a distinction is made between types and
|
|
classes. This PEP together with :pep:`254` will remove that
|
|
distinction. However, for backwards compatibility the distinction
|
|
will probably remain for years to come, and without :pep:`254`, the
|
|
distinction is still large: types ultimately have a built-in type
|
|
as a base class, while classes ultimately derive from a
|
|
user-defined class. Therefore, in the rest of this PEP, I will
|
|
use the word type whenever I can -- including base type or
|
|
supertype, derived type or subtype, and metatype. However,
|
|
sometimes the terminology necessarily blends, for example an
|
|
object's type is given by its ``__class__`` attribute, and subtyping
|
|
in Python is spelled with a class statement. If further
|
|
distinction is necessary, user-defined classes can be referred to
|
|
as "classic" classes.
|
|
|
|
|
|
About metatypes
|
|
===============
|
|
|
|
Inevitably the discussion comes to metatypes (or metaclasses).
|
|
Metatypes are nothing new in Python: Python has always been able
|
|
to talk about the type of a type::
|
|
|
|
>>> a = 0
|
|
>>> type(a)
|
|
<type 'int'>
|
|
>>> type(type(a))
|
|
<type 'type'>
|
|
>>> type(type(type(a)))
|
|
<type 'type'>
|
|
>>>
|
|
|
|
In this example, ``type(a)`` is a "regular" type, and ``type(type(a))`` is
|
|
a metatype. While as distributed all types have the same metatype
|
|
(``PyType_Type``, which is also its own metatype), this is not a
|
|
requirement, and in fact a useful and relevant 3rd party extension
|
|
(ExtensionClasses by Jim Fulton) creates an additional metatype.
|
|
The type of classic classes, known as ``types.ClassType``, can also be
|
|
considered a distinct metatype.
|
|
|
|
A feature closely connected to metatypes is the "Don Beaudry
|
|
hook", which says that if a metatype is callable, its instances
|
|
(which are regular types) can be subclassed (really subtyped)
|
|
using a Python class statement. I will use this rule to support
|
|
subtyping of built-in types, and in fact it greatly simplifies the
|
|
logic of class creation to always simply call the metatype. When
|
|
no base class is specified, a default metatype is called -- the
|
|
default metatype is the "ClassType" object, so the class statement
|
|
will behave as before in the normal case. (This default can be
|
|
changed per module by setting the global variable ``__metaclass__``.)
|
|
|
|
Python uses the concept of metatypes or metaclasses in a different
|
|
way than Smalltalk. In Smalltalk-80, there is a hierarchy of
|
|
metaclasses that mirrors the hierarchy of regular classes,
|
|
metaclasses map 1-1 to classes (except for some funny business at
|
|
the root of the hierarchy), and each class statement creates both
|
|
a regular class and its metaclass, putting class methods in the
|
|
metaclass and instance methods in the regular class.
|
|
|
|
Nice though this may be in the context of Smalltalk, it's not
|
|
compatible with the traditional use of metatypes in Python, and I
|
|
prefer to continue in the Python way. This means that Python
|
|
metatypes are typically written in C, and may be shared between
|
|
many regular types. (It will be possible to subtype metatypes in
|
|
Python, so it won't be absolutely necessary to write C to use
|
|
metatypes; but the power of Python metatypes will be limited. For
|
|
example, Python code will never be allowed to allocate raw memory
|
|
and initialize it at will.)
|
|
|
|
Metatypes determine various **policies** for types, such as what
|
|
happens when a type is called, how dynamic types are (whether a
|
|
type's ``__dict__`` can be modified after it is created), what the
|
|
method resolution order is, how instance attributes are looked
|
|
up, and so on.
|
|
|
|
I'll argue that left-to-right depth-first is not the best
|
|
solution when you want to get the most use from multiple
|
|
inheritance.
|
|
|
|
I'll argue that with multiple inheritance, the metatype of the
|
|
subtype must be a descendant of the metatypes of all base types.
|
|
|
|
I'll come back to metatypes later.
|
|
|
|
|
|
Making a type a factory for its instances
|
|
=========================================
|
|
|
|
Traditionally, for each type there is at least one C factory
|
|
function that creates instances of the type (``PyTuple_New()``,
|
|
``PyInt_FromLong()`` and so on). These factory functions take care of
|
|
both allocating memory for the object and initializing that
|
|
memory. As of Python 2.0, they also have to interface with the
|
|
garbage collection subsystem, if the type chooses to participate
|
|
in garbage collection (which is optional, but strongly recommended
|
|
for so-called "container" types: types that may contain references
|
|
to other objects, and hence may participate in reference cycles).
|
|
|
|
In this proposal, type objects can be factory functions for their
|
|
instances, making the types directly callable from Python. This
|
|
mimics the way classes are instantiated. The C APIs for creating
|
|
instances of various built-in types will remain valid and in some
|
|
cases more efficient. Not all types will become their own factory
|
|
functions.
|
|
|
|
The type object has a new slot, tp_new, which can act as a factory
|
|
for instances of the type. Types are now callable, because the
|
|
tp_call slot is set in ``PyType_Type`` (the metatype); the function
|
|
looks for the tp_new slot of the type that is being called.
|
|
|
|
Explanation: the ``tp_call`` slot of a regular type object (such as
|
|
``PyInt_Type`` or ``PyList_Type``) defines what happens when **instances**
|
|
of that type are called; in particular, the ``tp_call`` slot in the
|
|
function type, ``PyFunction_Type``, is the key to making functions
|
|
callable. As another example, ``PyInt_Type.tp_call`` is ``NULL``, because
|
|
integers are not callable. The new paradigm makes **type objects**
|
|
callable. Since type objects are instances of their metatype
|
|
(``PyType_Type``), the metatype's ``tp_call`` slot (``PyType_Type.tp_call``)
|
|
points to a function that is invoked when any type object is
|
|
called. Now, since each type has to do something different to
|
|
create an instance of itself, ``PyType_Type.tp_call`` immediately
|
|
defers to the ``tp_new`` slot of the type that is being called.
|
|
``PyType_Type`` itself is also callable: its ``tp_new`` slot creates a new
|
|
type. This is used by the class statement (formalizing the Don
|
|
Beaudry hook, see above). And what makes ``PyType_Type`` callable?
|
|
The ``tp_call`` slot of **its** metatype -- but since it is its own
|
|
metatype, that is its own ``tp_call`` slot!
|
|
|
|
If the type's ``tp_new`` slot is NULL, an exception is raised.
|
|
Otherwise, the tp_new slot is called. The signature for the
|
|
``tp_new`` slot is
|
|
|
|
::
|
|
|
|
PyObject *tp_new(PyTypeObject *type,
|
|
PyObject *args,
|
|
PyObject *kwds)
|
|
|
|
where 'type' is the type whose ``tp_new`` slot is called, and 'args'
|
|
and 'kwds' are the sequential and keyword arguments to the call,
|
|
passed unchanged from tp_call. (The 'type' argument is used in
|
|
combination with inheritance, see below.)
|
|
|
|
There are no constraints on the object type that is returned,
|
|
although by convention it should be an instance of the given
|
|
type. It is not necessary that a new object is returned; a
|
|
reference to an existing object is fine too. The return value
|
|
should always be a new reference, owned by the caller.
|
|
|
|
Once the ``tp_new`` slot has returned an object, further initialization
|
|
is attempted by calling the ``tp_init()`` slot of the resulting
|
|
object's type, if not NULL. This has the following signature::
|
|
|
|
int tp_init(PyObject *self,
|
|
PyObject *args,
|
|
PyObject *kwds)
|
|
|
|
It corresponds more closely to the ``__init__()`` method of classic
|
|
classes, and in fact is mapped to that by the slot/special-method
|
|
correspondence rules. The difference in responsibilities between
|
|
the ``tp_new()`` slot and the ``tp_init()`` slot lies in the invariants
|
|
they ensure. The ``tp_new()`` slot should ensure only the most
|
|
essential invariants, without which the C code that implements the
|
|
objects would break. The ``tp_init()`` slot should be used for
|
|
overridable user-specific initializations. Take for example the
|
|
dictionary type. The implementation has an internal pointer to a
|
|
hash table which should never be NULL. This invariant is taken
|
|
care of by the ``tp_new()`` slot for dictionaries. The dictionary
|
|
``tp_init()`` slot, on the other hand, could be used to give the
|
|
dictionary an initial set of keys and values based on the
|
|
arguments passed in.
|
|
|
|
Note that for immutable object types, the initialization cannot be
|
|
done by the ``tp_init()`` slot: this would provide the Python user
|
|
with a way to change the initialization. Therefore, immutable
|
|
objects typically have an empty ``tp_init()`` implementation and do
|
|
all their initialization in their ``tp_new()`` slot.
|
|
|
|
You may wonder why the ``tp_new()`` slot shouldn't call the ``tp_init()``
|
|
slot itself. The reason is that in certain circumstances (like
|
|
support for persistent objects), it is important to be able to
|
|
create an object of a particular type without initializing it any
|
|
further than necessary. This may conveniently be done by calling
|
|
the ``tp_new()`` slot without calling ``tp_init()``. It is also possible
|
|
that ``tp_init()`` is not called, or called more than once -- its
|
|
operation should be robust even in these anomalous cases.
|
|
|
|
For some objects, ``tp_new()`` may return an existing object. For
|
|
example, the factory function for integers caches the integers -1
|
|
through 99. This is permissible only when the type argument to
|
|
``tp_new()`` is the type that defined the ``tp_new()`` function (in the
|
|
example, if ``type == &PyInt_Type``), and when the ``tp_init()`` slot for
|
|
this type does nothing. If the type argument differs, the
|
|
``tp_new()`` call is initiated by a derived type's ``tp_new()`` to
|
|
create the object and initialize the base type portion of the
|
|
object; in this case ``tp_new()`` should always return a new object
|
|
(or raise an exception).
|
|
|
|
Both ``tp_new()`` and ``tp_init()`` should receive exactly the same 'args'
|
|
and 'kwds' arguments, and both should check that the arguments are
|
|
acceptable, because they may be called independently.
|
|
|
|
There's a third slot related to object creation: ``tp_alloc()``. Its
|
|
responsibility is to allocate the memory for the object,
|
|
initialize the reference count (``ob_refcnt``) and the type pointer
|
|
(``ob_type``), and initialize the rest of the object to all zeros. It
|
|
should also register the object with the garbage collection
|
|
subsystem if the type supports garbage collection. This slot
|
|
exists so that derived types can override the memory allocation
|
|
policy (like which heap is being used) separately from the
|
|
initialization code. The signature is::
|
|
|
|
PyObject *tp_alloc(PyTypeObject *type, int nitems)
|
|
|
|
The type argument is the type of the new object. The nitems
|
|
argument is normally zero, except for objects with a variable
|
|
allocation size (basically strings, tuples, and longs). The
|
|
allocation size is given by the following expression::
|
|
|
|
type->tp_basicsize + nitems * type->tp_itemsize
|
|
|
|
The ``tp_alloc`` slot is only used for subclassable types. The ``tp_new()``
|
|
function of the base class must call the ``tp_alloc()`` slot of the
|
|
type passed in as its first argument. It is the ``tp_new()``
|
|
function's responsibility to calculate the number of items. The
|
|
``tp_alloc()`` slot will set the ob_size member of the new object if
|
|
the ``type->tp_itemsize`` member is nonzero.
|
|
|
|
(Note: in certain debugging compilation modes, the type structure
|
|
used to have members named ``tp_alloc`` and a ``tp_free`` slot already,
|
|
counters for the number of allocations and deallocations. These
|
|
are renamed to ``tp_allocs`` and ``tp_deallocs``.)
|
|
|
|
Standard implementations for ``tp_alloc()`` and ``tp_new()`` are
|
|
available. ``PyType_GenericAlloc()`` allocates an object from the
|
|
standard heap and initializes it properly. It uses the above
|
|
formula to determine the amount of memory to allocate, and takes
|
|
care of GC registration. The only reason not to use this
|
|
implementation would be to allocate objects from a different heap
|
|
(as is done by some very small frequently used objects like ints
|
|
and tuples). ``PyType_GenericNew()`` adds very little: it just calls
|
|
the type's ``tp_alloc()`` slot with zero for nitems. But for mutable
|
|
types that do all their initialization in their ``tp_init()`` slot,
|
|
this may be just the ticket.
|
|
|
|
|
|
Preparing a type for subtyping
|
|
==============================
|
|
|
|
The idea behind subtyping is very similar to that of single
|
|
inheritance in C++. A base type is described by a structure
|
|
declaration (similar to the C++ class declaration) plus a type
|
|
object (similar to the C++ vtable). A derived type can extend the
|
|
structure (but must leave the names, order and type of the members
|
|
of the base structure unchanged) and can override certain slots in
|
|
the type object, leaving others the same. (Unlike C++ vtables,
|
|
all Python type objects have the same memory layout.)
|
|
|
|
The base type must do the following:
|
|
|
|
- Add the flag value ``Py_TPFLAGS_BASETYPE`` to ``tp_flags``.
|
|
|
|
- Declare and use ``tp_new()``, ``tp_alloc()`` and optional ``tp_init()``
|
|
slots.
|
|
|
|
- Declare and use ``tp_dealloc()`` and ``tp_free()``.
|
|
|
|
- Export its object structure declaration.
|
|
|
|
- Export a subtyping-aware type-checking macro.
|
|
|
|
The requirements and signatures for ``tp_new()``, ``tp_alloc()`` and
|
|
``tp_init()`` have already been discussed above: ``tp_alloc()`` should
|
|
allocate the memory and initialize it to mostly zeros; ``tp_new()``
|
|
should call the ``tp_alloc()`` slot and then proceed to do the
|
|
minimally required initialization; ``tp_init()`` should be used for
|
|
more extensive initialization of mutable objects.
|
|
|
|
It should come as no surprise that there are similar conventions
|
|
at the end of an object's lifetime. The slots involved are
|
|
``tp_dealloc()`` (familiar to all who have ever implemented a Python
|
|
extension type) and ``tp_free()``, the new kid on the block. (The
|
|
names aren't quite symmetric; ``tp_free()`` corresponds to ``tp_alloc()``,
|
|
which is fine, but ``tp_dealloc()`` corresponds to ``tp_new()``. Maybe
|
|
the tp_dealloc slot should be renamed?)
|
|
|
|
The ``tp_free()`` slot should be used to free the memory and
|
|
unregister the object with the garbage collection subsystem, and
|
|
can be overridden by a derived class; ``tp_dealloc()`` should
|
|
deinitialize the object (usually by calling ``Py_XDECREF()`` for
|
|
various sub-objects) and then call ``tp_free()`` to deallocate the
|
|
memory. The signature for ``tp_dealloc()`` is the same as it always
|
|
was::
|
|
|
|
void tp_dealloc(PyObject *object)
|
|
|
|
The signature for tp_free() is the same::
|
|
|
|
void tp_free(PyObject *object)
|
|
|
|
(In a previous version of this PEP, there was also a role reserved
|
|
for the ``tp_clear()`` slot. This turned out to be a bad idea.)
|
|
|
|
To be usefully subtyped in C, a type must export the structure
|
|
declaration for its instances through a header file, as it is
|
|
needed to derive a subtype. The type object for the base type
|
|
must also be exported.
|
|
|
|
If the base type has a type-checking macro (like ``PyDict_Check()``),
|
|
this macro should be made to recognize subtypes. This can be done
|
|
by using the new ``PyObject_TypeCheck(object, type)`` macro, which
|
|
calls a function that follows the base class links.
|
|
|
|
The ``PyObject_TypeCheck()`` macro contains a slight optimization: it
|
|
first compares ``object->ob_type`` directly to the type argument, and
|
|
if this is a match, bypasses the function call. This should make
|
|
it fast enough for most situations.
|
|
|
|
Note that this change in the type-checking macro means that C
|
|
functions that require an instance of the base type may be invoked
|
|
with instances of the derived type. Before enabling subtyping of
|
|
a particular type, its code should be checked to make sure that
|
|
this won't break anything. It has proved useful in the prototype
|
|
to add another type-checking macro for the built-in Python object
|
|
types, to check for exact type match too (for example,
|
|
``PyDict_Check(x)`` is true if x is an instance of dictionary or of a
|
|
dictionary subclass, while ``PyDict_CheckExact(x)`` is true only if x
|
|
is a dictionary).
|
|
|
|
|
|
Creating a subtype of a built-in type in C
|
|
==========================================
|
|
|
|
The simplest form of subtyping is subtyping in C. It is the
|
|
simplest form because we can require the C code to be aware of
|
|
some of the problems, and it's acceptable for C code that doesn't
|
|
follow the rules to dump core. For added simplicity, it is
|
|
limited to single inheritance.
|
|
|
|
Let's assume we're deriving from a mutable base type whose
|
|
tp_itemsize is zero. The subtype code is not GC-aware, although
|
|
it may inherit GC-awareness from the base type (this is
|
|
automatic). The base type's allocation uses the standard heap.
|
|
|
|
The derived type begins by declaring a type structure which
|
|
contains the base type's structure. For example, here's the type
|
|
structure for a subtype of the built-in list type::
|
|
|
|
typedef struct {
|
|
PyListObject list;
|
|
int state;
|
|
} spamlistobject;
|
|
|
|
Note that the base type structure member (here ``PyListObject``) must
|
|
be the first member of the structure; any following members are
|
|
additions. Also note that the base type is not referenced via a
|
|
pointer; the actual contents of its structure must be included!
|
|
(The goal is for the memory layout of the beginning of the
|
|
subtype instance to be the same as that of the base type
|
|
instance.)
|
|
|
|
Next, the derived type must declare a type object and initialize
|
|
it. Most of the slots in the type object may be initialized to
|
|
zero, which is a signal that the base type slot must be copied
|
|
into it. Some slots that must be initialized properly:
|
|
|
|
- The object header must be filled in as usual; the type should
|
|
be ``&PyType_Type``.
|
|
|
|
- The tp_basicsize slot must be set to the size of the subtype
|
|
instance struct (in the above example: ``sizeof(spamlistobject)``).
|
|
|
|
- The tp_base slot must be set to the address of the base type's
|
|
type object.
|
|
|
|
- If the derived slot defines any pointer members, the
|
|
``tp_dealloc`` slot function requires special attention, see
|
|
below; otherwise, it can be set to zero, to inherit the base
|
|
type's deallocation function.
|
|
|
|
- The ``tp_flags`` slot must be set to the usual ``Py_TPFLAGS_DEFAULT``
|
|
value.
|
|
|
|
- The ``tp_name`` slot must be set; it is recommended to set ``tp_doc``
|
|
as well (these are not inherited).
|
|
|
|
If the subtype defines no additional structure members (it only
|
|
defines new behavior, no new data), the ``tp_basicsize`` and the
|
|
``tp_dealloc`` slots may be left set to zero.
|
|
|
|
The subtype's ``tp_dealloc`` slot deserves special attention. If the
|
|
derived type defines no additional pointer members that need to be
|
|
DECREF'ed or freed when the object is deallocated, it can be set
|
|
to zero. Otherwise, the subtype's ``tp_dealloc()`` function must call
|
|
``Py_XDECREF()`` for any ``PyObject *`` members and the correct memory
|
|
freeing function for any other pointers it owns, and then call the
|
|
base class's ``tp_dealloc()`` slot. This call has to be made via the
|
|
base type's type structure, for example, when deriving from the
|
|
standard list type::
|
|
|
|
PyList_Type.tp_dealloc(self);
|
|
|
|
If the subtype wants to use a different allocation heap than the
|
|
base type, the subtype must override both the ``tp_alloc()`` and the
|
|
``tp_free()`` slots. These will be called by the base class's
|
|
``tp_new()`` and ``tp_dealloc()`` slots, respectively.
|
|
|
|
To complete the initialization of the type, ``PyType_InitDict()`` must
|
|
be called. This replaces slots initialized to zero in the subtype
|
|
with the value of the corresponding base type slots. (It also
|
|
fills in ``tp_dict``, the type's dictionary, and does various other
|
|
initializations necessary for type objects.)
|
|
|
|
A subtype is not usable until ``PyType_InitDict()`` is called for it;
|
|
this is best done during module initialization, assuming the
|
|
subtype belongs to a module. An alternative for subtypes added to
|
|
the Python core (which don't live in a particular module) would be
|
|
to initialize the subtype in their constructor function. It is
|
|
allowed to call ``PyType_InitDict()`` more than once; the second and
|
|
further calls have no effect. To avoid unnecessary calls, a test
|
|
for ``tp_dict==NULL`` can be made.
|
|
|
|
(During initialization of the Python interpreter, some types are
|
|
actually used before they are initialized. As long as the slots
|
|
that are actually needed are initialized, especially ``tp_dealloc``,
|
|
this works, but it is fragile and not recommended as a general
|
|
practice.)
|
|
|
|
To create a subtype instance, the subtype's ``tp_new()`` slot is
|
|
called. This should first call the base type's ``tp_new()`` slot and
|
|
then initialize the subtype's additional data members. To further
|
|
initialize the instance, the ``tp_init()`` slot is typically called.
|
|
Note that the ``tp_new()`` slot should **not** call the ``tp_init()`` slot;
|
|
this is up to ``tp_new()``'s caller (typically a factory function).
|
|
There are circumstances where it is appropriate not to call
|
|
``tp_init()``.
|
|
|
|
If a subtype defines a ``tp_init()`` slot, the ``tp_init()`` slot should
|
|
normally first call the base type's ``tp_init()`` slot.
|
|
|
|
(XXX There should be a paragraph or two about argument passing
|
|
here.)
|
|
|
|
|
|
Subtyping in Python
|
|
===================
|
|
|
|
The next step is to allow subtyping of selected built-in types
|
|
through a class statement in Python. Limiting ourselves to single
|
|
inheritance for now, here is what happens for a simple class
|
|
statement::
|
|
|
|
class C(B):
|
|
var1 = 1
|
|
def method1(self): pass
|
|
# etc.
|
|
|
|
The body of the class statement is executed in a fresh environment
|
|
(basically, a new dictionary used as local namespace), and then C
|
|
is created. The following explains how C is created.
|
|
|
|
Assume B is a type object. Since type objects are objects, and
|
|
every object has a type, B has a type. Since B is itself a type,
|
|
we also call its type its metatype. B's metatype is accessible
|
|
via ``type(B)`` or ``B.__class__`` (the latter notation is new for types;
|
|
it is introduced in :pep:`252`). Let's say this metatype is M (for
|
|
Metatype). The class statement will create a new type, C. Since
|
|
C will be a type object just like B, we view the creation of C as
|
|
an instantiation of the metatype, M. The information that needs
|
|
to be provided for the creation of a subclass is:
|
|
|
|
- its name (in this example the string "C");
|
|
|
|
- its bases (a singleton tuple containing B);
|
|
|
|
- the results of executing the class body, in the form of a
|
|
dictionary (for example
|
|
``{"var1": 1, "method1": <functionmethod1 at ...>, ...}``).
|
|
|
|
The class statement will result in the following call::
|
|
|
|
C = M("C", (B,), dict)
|
|
|
|
where dict is the dictionary resulting from execution of the
|
|
class body. In other words, the metatype (M) is called.
|
|
|
|
Note that even though the example has only one base, we still pass
|
|
in a (singleton) sequence of bases; this makes the interface
|
|
uniform with the multiple-inheritance case.
|
|
|
|
In current Python, this is called the "Don Beaudry hook" after its
|
|
inventor; it is an exceptional case that is only invoked when a
|
|
base class is not a regular class. For a regular base class (or
|
|
when no base class is specified), current Python calls
|
|
``PyClass_New()``, the C level factory function for classes, directly.
|
|
|
|
Under the new system this is changed so that Python **always**
|
|
determines a metatype and calls it as given above. When one or
|
|
more bases are given, the type of the first base is used as the
|
|
metatype; when no base is given, a default metatype is chosen. By
|
|
setting the default metatype to ``PyClass_Type``, the metatype of
|
|
"classic" classes, the classic behavior of the class statement is
|
|
retained. This default can be changed per module by setting the
|
|
global variable ``__metaclass__``.
|
|
|
|
There are two further refinements here. First, a useful feature
|
|
is to be able to specify a metatype directly. If the class
|
|
suite defines a variable ``__metaclass__``, that is the metatype
|
|
to call. (Note that setting ``__metaclass__`` at the module level
|
|
only affects class statements without a base class and without an
|
|
explicit ``__metaclass__`` declaration; but setting ``__metaclass__`` in a
|
|
class suite overrides the default metatype unconditionally.)
|
|
|
|
Second, with multiple bases, not all bases need to have the same
|
|
metatype. This is called a metaclass conflict [1]_. Some
|
|
metaclass conflicts can be resolved by searching through the set
|
|
of bases for a metatype that derives from all other given
|
|
metatypes. If such a metatype cannot be found, an exception is
|
|
raised and the class statement fails.
|
|
|
|
This conflict resolution can be implemented by the metatype
|
|
constructors: the class statement just calls the metatype of the first
|
|
base (or that specified by the ``__metaclass__`` variable), and this
|
|
metatype's constructor looks for the most derived metatype. If
|
|
that is itself, it proceeds; otherwise, it calls that metatype's
|
|
constructor. (Ultimate flexibility: another metatype might choose
|
|
to require that all bases have the same metatype, or that there's
|
|
only one base class, or whatever.)
|
|
|
|
(In [1]_, a new metaclass is automatically derived that is a
|
|
subclass of all given metaclasses. But since it is questionable
|
|
in Python how conflicting method definitions of the various
|
|
metaclasses should be merged, I don't think this is feasible.
|
|
Should the need arise, the user can derive such a metaclass
|
|
manually and specify it using the ``__metaclass__`` variable. It is
|
|
also possible to have a new metaclass that does this.)
|
|
|
|
Note that calling M requires that M itself has a type: the
|
|
meta-metatype. And the meta-metatype has a type, the
|
|
meta-meta-metatype. And so on. This is normally cut short at
|
|
some level by making a metatype be its own metatype. This is
|
|
indeed what happens in Python: the ``ob_type`` reference in
|
|
``PyType_Type`` is set to ``&PyType_Type``. In the absence of third party
|
|
metatypes, ``PyType_Type`` is the only metatype in the Python
|
|
interpreter.
|
|
|
|
(In a previous version of this PEP, there was one additional
|
|
meta-level, and there was a meta-metatype called "turtle". This
|
|
turned out to be unnecessary.)
|
|
|
|
In any case, the work for creating C is done by M's ``tp_new()`` slot.
|
|
It allocates space for an "extended" type structure, containing:
|
|
the type object; the auxiliary structures (as_sequence etc.); the
|
|
string object containing the type name (to ensure that this object
|
|
isn't deallocated while the type object is still referencing it); and
|
|
some auxiliary storage (to be described later). It initializes this
|
|
storage to zeros except for a few crucial slots (for example, tp_name
|
|
is set to point to the type name) and then sets the tp_base slot to
|
|
point to B. Then ``PyType_InitDict()`` is called to inherit B's slots.
|
|
Finally, C's ``tp_dict`` slot is updated with the contents of the
|
|
namespace dictionary (the third argument to the call to M).
|
|
|
|
|
|
Multiple inheritance
|
|
====================
|
|
|
|
The Python class statement supports multiple inheritance, and we
|
|
will also support multiple inheritance involving built-in types.
|
|
|
|
However, there are some restrictions. The C runtime architecture
|
|
doesn't make it feasible to have a meaningful subtype of two
|
|
different built-in types except in a few degenerate cases.
|
|
Changing the C runtime to support fully general multiple
|
|
inheritance would be too much of an upheaval of the code base.
|
|
|
|
The main problem with multiple inheritance from different built-in
|
|
types stems from the fact that the C implementation of built-in
|
|
types accesses structure members directly; the C compiler
|
|
generates an offset relative to the object pointer and that's
|
|
that. For example, the list and dictionary type structures each
|
|
declare a number of different but overlapping structure members.
|
|
A C function accessing an object expecting a list won't work when
|
|
passed a dictionary, and vice versa, and there's not much we could
|
|
do about this without rewriting all code that accesses lists and
|
|
dictionaries. This would be too much work, so we won't do this.
|
|
|
|
The problem with multiple inheritance is caused by conflicting
|
|
structure member allocations. Classes defined in Python normally
|
|
don't store their instance variables in structure members: they
|
|
are stored in an instance dictionary. This is the key to a
|
|
partial solution. Suppose we have the following two classes::
|
|
|
|
class A(dictionary):
|
|
def foo(self): pass
|
|
|
|
class B(dictionary):
|
|
def bar(self): pass
|
|
|
|
class C(A, B): pass
|
|
|
|
(Here, 'dictionary' is the type of built-in dictionary objects,
|
|
a.k.a. ``type({})`` or ``{}.__class__`` or ``types.DictType``.) If we look at
|
|
the structure layout, we find that an A instance has the layout
|
|
of a dictionary followed by the ``__dict__`` pointer, and a B instance
|
|
has the same layout; since there are no structure member layout
|
|
conflicts, this is okay.
|
|
|
|
Here's another example::
|
|
|
|
class X(object):
|
|
def foo(self): pass
|
|
|
|
class Y(dictionary):
|
|
def bar(self): pass
|
|
|
|
class Z(X, Y): pass
|
|
|
|
(Here, 'object' is the base for all built-in types; its structure
|
|
layout only contains the ``ob_refcnt`` and ``ob_type`` members.) This
|
|
example is more complicated, because the ``__dict__`` pointer for X
|
|
instances has a different offset than that for Y instances. Where
|
|
is the ``__dict__`` pointer for Z instances? The answer is that the
|
|
offset for the ``__dict__`` pointer is not hardcoded, it is stored in
|
|
the type object.
|
|
|
|
Suppose on a particular machine an 'object' structure is 8 bytes
|
|
long, and a 'dictionary' struct is 60 bytes, and an object pointer
|
|
is 4 bytes. Then an X structure is 12 bytes (an object structure
|
|
followed by a ``__dict__`` pointer), and a Y structure is 64 bytes (a
|
|
dictionary structure followed by a ``__dict__`` pointer). The Z
|
|
structure has the same layout as the Y structure in this example.
|
|
Each type object (X, Y and Z) has a "__dict__ offset" which is
|
|
used to find the ``__dict__`` pointer. Thus, the recipe for looking
|
|
up an instance variable is:
|
|
|
|
1. get the type of the instance
|
|
2. get the ``__dict__`` offset from the type object
|
|
3. add the ``__dict__`` offset to the instance pointer
|
|
4. look in the resulting address to find a dictionary reference
|
|
5. look up the instance variable name in that dictionary
|
|
|
|
Of course, this recipe can only be implemented in C, and I have
|
|
left out some details. But this allows us to use multiple
|
|
inheritance patterns similar to the ones we can use with classic
|
|
classes.
|
|
|
|
XXX I should write up the complete algorithm here to determine
|
|
base class compatibility, but I can't be bothered right now. Look
|
|
at ``best_base()`` in typeobject.c in the implementation mentioned
|
|
below.
|
|
|
|
|
|
MRO: Method resolution order (the lookup rule)
|
|
===============================================
|
|
|
|
With multiple inheritance comes the question of method resolution
|
|
order: the order in which a class or type and its bases are
|
|
searched looking for a method of a given name.
|
|
|
|
In classic Python, the rule is given by the following recursive
|
|
function, also known as the left-to-right depth-first rule::
|
|
|
|
def classic_lookup(cls, name):
|
|
if cls.__dict__.has_key(name):
|
|
return cls.__dict__[name]
|
|
for base in cls.__bases__:
|
|
try:
|
|
return classic_lookup(base, name)
|
|
except AttributeError:
|
|
pass
|
|
raise AttributeError, name
|
|
|
|
The problem with this becomes apparent when we consider a "diamond
|
|
diagram"::
|
|
|
|
class A:
|
|
^ ^ def save(self): ...
|
|
/ \
|
|
/ \
|
|
/ \
|
|
/ \
|
|
class B class C:
|
|
^ ^ def save(self): ...
|
|
\ /
|
|
\ /
|
|
\ /
|
|
\ /
|
|
class D
|
|
|
|
Arrows point from a subtype to its base ``type(s)``. This particular
|
|
diagram means B and C derive from A, and D derives from B and C
|
|
(and hence also, indirectly, from A).
|
|
|
|
Assume that C overrides the method ``save()``, which is defined in the
|
|
base A. (``C.save()`` probably calls ``A.save()`` and then saves some of
|
|
its own state.) B and D don't override ``save()``. When we invoke
|
|
``save()`` on a D instance, which method is called? According to the
|
|
classic lookup rule, ``A.save()`` is called, ignoring ``C.save()``!
|
|
|
|
This is not good. It probably breaks C (its state doesn't get
|
|
saved), defeating the whole purpose of inheriting from C in the
|
|
first place.
|
|
|
|
Why was this not a problem in classic Python? Diamond diagrams
|
|
are rarely found in classic Python class hierarchies. Most class
|
|
hierarchies use single inheritance, and multiple inheritance is
|
|
usually confined to mix-in classes. In fact, the problem shown
|
|
here is probably the reason why multiple inheritance is unpopular
|
|
in classic Python.
|
|
|
|
Why will this be a problem in the new system? The 'object' type
|
|
at the top of the type hierarchy defines a number of methods that
|
|
can usefully be extended by subtypes, for example ``__getattr__()``.
|
|
|
|
(Aside: in classic Python, the ``__getattr__()`` method is not really
|
|
the implementation for the get-attribute operation; it is a hook
|
|
that only gets invoked when an attribute cannot be found by normal
|
|
means. This has often been cited as a shortcoming -- some class
|
|
designs have a legitimate need for a ``__getattr__()`` method that
|
|
gets called for **all** attribute references. But then of course
|
|
this method has to be able to invoke the default implementation
|
|
directly. The most natural way is to make the default
|
|
implementation available as ``object.__getattr__(self, name)``.)
|
|
|
|
Thus, a classic class hierarchy like this::
|
|
|
|
class B class C:
|
|
^ ^ def __getattr__(self, name): ...
|
|
\ /
|
|
\ /
|
|
\ /
|
|
\ /
|
|
class D
|
|
|
|
will change into a diamond diagram under the new system::
|
|
|
|
object:
|
|
^ ^ __getattr__()
|
|
/ \
|
|
/ \
|
|
/ \
|
|
/ \
|
|
class B class C:
|
|
^ ^ def __getattr__(self, name): ...
|
|
\ /
|
|
\ /
|
|
\ /
|
|
\ /
|
|
class D
|
|
|
|
and while in the original diagram ``C.__getattr__()`` is invoked,
|
|
under the new system with the classic lookup rule,
|
|
``object.__getattr__()`` would be invoked!
|
|
|
|
Fortunately, there's a lookup rule that's better. It's a bit
|
|
difficult to explain, but it does the right thing in the diamond
|
|
diagram, and it is the same as the classic lookup rule when there
|
|
are no diamonds in the inheritance graph (when it is a tree).
|
|
|
|
The new lookup rule constructs a list of all classes in the
|
|
inheritance diagram in the order in which they will be searched.
|
|
This construction is done at class definition time to save time.
|
|
To explain the new lookup rule, let's first consider what such a
|
|
list would look like for the classic lookup rule. Note that in
|
|
the presence of diamonds the classic lookup visits some classes
|
|
multiple times. For example, in the ABCD diamond diagram above,
|
|
the classic lookup rule visits the classes in this order::
|
|
|
|
D, B, A, C, A
|
|
|
|
Note how A occurs twice in the list. The second occurrence is
|
|
redundant, since anything that could be found there would already
|
|
have been found when searching the first occurrence.
|
|
|
|
We use this observation to explain our new lookup rule. Using the
|
|
classic lookup rule, construct the list of classes that would be
|
|
searched, including duplicates. Now for each class that occurs in
|
|
the list multiple times, remove all occurrences except for the
|
|
last. The resulting list contains each ancestor class exactly
|
|
once (including the most derived class, D in the example).
|
|
|
|
Searching for methods in this order will do the right thing for
|
|
the diamond diagram. Because of the way the list is constructed,
|
|
it does not change the search order in situations where no diamond
|
|
is involved.
|
|
|
|
Isn't this backwards incompatible? Won't it break existing code?
|
|
It would, if we changed the method resolution order for all
|
|
classes. However, in Python 2.2, the new lookup rule will only be
|
|
applied to types derived from built-in types, which is a new
|
|
feature. Class statements without a base class create "classic
|
|
classes", and so do class statements whose base classes are
|
|
themselves classic classes. For classic classes the classic
|
|
lookup rule will be used. (To experiment with the new lookup rule
|
|
for classic classes, you will be able to specify a different
|
|
metaclass explicitly.) We'll also provide a tool that analyzes a
|
|
class hierarchy looking for methods that would be affected by a
|
|
change in method resolution order.
|
|
|
|
XXX Another way to explain the motivation for the new MRO, due to
|
|
Damian Conway: you never use the method defined in a base class if
|
|
it is defined in a derived class that you haven't explored yet
|
|
(using the old search order).
|
|
|
|
|
|
XXX To be done
|
|
==============
|
|
|
|
Additional topics to be discussed in this PEP:
|
|
|
|
- backwards compatibility issues!!!
|
|
|
|
- class methods and static methods
|
|
|
|
- cooperative methods and ``super()``
|
|
|
|
- mapping between type object slots (tp_foo) and special methods
|
|
(``__foo__``) (actually, this may belong in :pep:`252`)
|
|
|
|
- built-in names for built-in types (object, int, str, list etc.)
|
|
|
|
- ``__dict__`` and ``__dictoffset__``
|
|
|
|
- ``__slots__``
|
|
|
|
- the ``HEAPTYPE`` flag bit
|
|
|
|
- GC support
|
|
|
|
- API docs for all the new functions
|
|
|
|
- how to use ``__new__``
|
|
|
|
- writing metaclasses (using ``mro()`` etc.)
|
|
|
|
- high level user overview
|
|
|
|
open issues
|
|
-----------
|
|
|
|
- do we need ``__del__``?
|
|
|
|
- assignment to ``__dict__``, ``__bases__``
|
|
|
|
- inconsistent naming
|
|
(e.g. tp_dealloc/tp_new/tp_init/tp_alloc/tp_free)
|
|
|
|
- add builtin alias 'dict' for 'dictionary'?
|
|
|
|
- when subclasses of dict/list etc. are passed to system
|
|
functions, the ``__getitem__`` overrides (etc.) aren't always
|
|
used
|
|
|
|
|
|
Implementation
|
|
==============
|
|
|
|
A prototype implementation of this PEP (and for :pep:`252`) is
|
|
available from CVS, and in the series of Python 2.2 alpha and beta
|
|
releases. For some examples of the features described here, see
|
|
the file Lib/test/test_descr.py and the extension module
|
|
Modules/xxsubtype.c.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] "Putting Metaclasses to Work", by Ira R. Forman and Scott
|
|
H. Danforth, Addison-Wesley 1999.
|
|
(http://www.aw.com/product/0,2627,0201433052,00.html)
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|