2001-05-14 09:43:23 -04:00
|
|
|
|
PEP: 253
|
|
|
|
|
Title: Subtyping Built-in Types
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Author: guido@python.org (Guido van Rossum)
|
|
|
|
|
Status: Draft
|
|
|
|
|
Type: Standards Track
|
|
|
|
|
Python-Version: 2.2
|
|
|
|
|
Created: 14-May-2001
|
|
|
|
|
Post-History:
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
This PEP proposes additions to the type object API that will allow
|
|
|
|
|
the creation of subtypes of built-in types, in C and in Python.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Introduction
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
Traditionally, types in Python have been created statically, by
|
|
|
|
|
declaring a global variable of type PyTypeObject and initializing
|
2001-07-10 16:01:52 -04:00
|
|
|
|
it with a static initializer. The slots in the type object
|
2001-06-13 17:48:31 -04:00
|
|
|
|
describe all aspects of a Python type that are relevant to the
|
2001-07-10 16:01:52 -04:00
|
|
|
|
Python interpreter. A few slots contain dimensional information
|
2001-07-10 13:11:19 -04:00
|
|
|
|
(like the basic allocation size of instances), others contain
|
2001-07-10 16:01:52 -04:00
|
|
|
|
various flags, but most slots are pointers to functions to
|
2001-05-14 21:36:46 -04:00
|
|
|
|
implement various kinds of behaviors. A NULL pointer means that
|
|
|
|
|
the type does not implement the specific behavior; in that case
|
|
|
|
|
the system may provide a default behavior in that case or raise an
|
|
|
|
|
exception when the behavior is invoked. Some collections of
|
2001-07-18 09:44:57 -04:00
|
|
|
|
function pointers that are usually defined together are obtained
|
2001-06-13 17:48:31 -04:00
|
|
|
|
indirectly via a pointer to an additional structure containing
|
|
|
|
|
more function pointers.
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
While the details of initializing a PyTypeObject structure haven't
|
2001-06-13 17:48:31 -04:00
|
|
|
|
been documented as such, they are easily gleaned from the examples
|
2001-05-14 21:36:46 -04:00
|
|
|
|
in the source code, and I am assuming that the reader is
|
|
|
|
|
sufficiently familiar with the traditional way of creating new
|
|
|
|
|
Python types in C.
|
|
|
|
|
|
2001-06-11 16:07:37 -04:00
|
|
|
|
This PEP will introduce the following features:
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
- a type can be a factory function for its instances
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
- types can be subtyped in C
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
- types can be subtyped in Python with the class statement
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
- multiple inheritance from types is supported (insofar as
|
|
|
|
|
practical -- you still can't multiply inherit from list and
|
|
|
|
|
dictionary)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-18 09:44:57 -04:00
|
|
|
|
- the standard coercion functions (int, tuple, str etc.) will
|
2001-07-10 13:11:19 -04:00
|
|
|
|
be redefined to be the corresponding type objects, which serve
|
|
|
|
|
as their own factory functions
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
- a class statement can contain a __metaclass__ declaration,
|
|
|
|
|
specifying the metaclass to be used to create the new class
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
- a class statement can contain a __slots__ declaration,
|
|
|
|
|
specifying the specific names of the instance variables
|
|
|
|
|
supported
|
2001-06-11 16:07:37 -04:00
|
|
|
|
|
2001-07-05 15:00:02 -04:00
|
|
|
|
This PEP builds on PEP 252, which adds standard introspection to
|
2001-07-10 13:11:19 -04:00
|
|
|
|
types; for example, when a particular type object initializes the
|
|
|
|
|
tp_hash slot, that type object has a __hash__ method when
|
|
|
|
|
introspected. PEP 252 also adds a dictionary to type objects
|
|
|
|
|
which contains all methods. At the Python level, this dictionary
|
|
|
|
|
is read-only for built-in types; at the C level, it is accessible
|
|
|
|
|
directly (but it should not be modified except as part of
|
|
|
|
|
initialization).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
For binary compatibility, a flag bit in the tp_flags slot
|
|
|
|
|
indicates the existence of the various new slots in the type
|
|
|
|
|
object introduced below. Types that don't have the
|
2001-07-10 16:01:52 -04:00
|
|
|
|
Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags slot are assumed
|
2001-06-13 17:48:31 -04:00
|
|
|
|
to have NULL values for all the subtyping slots. (Warning: the
|
|
|
|
|
current implementation prototype is not yet consistent in its
|
|
|
|
|
checking of this flag bit. This should be fixed before the final
|
|
|
|
|
release.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
In current Python, a distinction is made between types and
|
2001-07-05 15:00:02 -04:00
|
|
|
|
classes. This PEP together with PEP 254 will remove that
|
2001-07-10 13:11:19 -04:00
|
|
|
|
distinction. However, for backwards compatibility the distinction
|
|
|
|
|
will probably remain for years to come, and without PEP 254, the
|
|
|
|
|
distinction is still large: types ultimately have a built-in type
|
|
|
|
|
as a base class, while classes ultimately derive from a
|
|
|
|
|
user-defined class. Therefore, in the rest of this PEP, I will
|
|
|
|
|
use the word type whenever I can -- including base type or
|
|
|
|
|
supertype, derived type or subtype, and metatype. However,
|
|
|
|
|
sometimes the terminology necessarily blends, for example an
|
2001-06-14 16:48:43 -04:00
|
|
|
|
object's type is given by its __class__ attribute, and subtyping
|
|
|
|
|
in Python is spelled with a class statement. If further
|
|
|
|
|
distinction is necessary, user-defined classes can be referred to
|
|
|
|
|
as "classic" classes.
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
About metatypes
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
Inevitably the discussion comes to metatypes (or metaclasses).
|
|
|
|
|
Metatypes are nothing new in Python: Python has always been able
|
|
|
|
|
to talk about the type of a type:
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
>>> a = 0
|
|
|
|
|
>>> type(a)
|
|
|
|
|
<type 'int'>
|
|
|
|
|
>>> type(type(a))
|
|
|
|
|
<type 'type'>
|
|
|
|
|
>>> type(type(type(a)))
|
|
|
|
|
<type 'type'>
|
|
|
|
|
>>>
|
|
|
|
|
|
|
|
|
|
In this example, type(a) is a "regular" type, and type(type(a)) is
|
|
|
|
|
a metatype. While as distributed all types have the same metatype
|
2001-06-13 17:48:31 -04:00
|
|
|
|
(PyType_Type, which is also its own metatype), this is not a
|
|
|
|
|
requirement, and in fact a useful and relevant 3rd party extension
|
|
|
|
|
(ExtensionClasses by Jim Fulton) creates an additional metatype.
|
2001-07-10 13:11:19 -04:00
|
|
|
|
The type of classic classes, known as types.ClassType, can also be
|
|
|
|
|
considered a distinct metatype.
|
|
|
|
|
|
|
|
|
|
A feature closely connected to metatypes is the "Don Beaudry
|
|
|
|
|
hook", which says that if a metatype is callable, its instances
|
|
|
|
|
(which are regular types) can be subclassed (really subtyped)
|
|
|
|
|
using a Python class statement. I will use this rule to support
|
|
|
|
|
subtyping of built-in types, and in fact it greatly simplifies the
|
|
|
|
|
logic of class creation to always simply call the metatype. When
|
|
|
|
|
no base class is specified, a default metatype is called -- the
|
|
|
|
|
default metatype is the "ClassType" object, so the class statement
|
|
|
|
|
will behave as before in the normal case. (This default can be
|
|
|
|
|
changed per module by setting the global variable __metaclass__.)
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
Python uses the concept of metatypes or metaclasses in a different
|
|
|
|
|
way than Smalltalk. In Smalltalk-80, there is a hierarchy of
|
|
|
|
|
metaclasses that mirrors the hierarchy of regular classes,
|
|
|
|
|
metaclasses map 1-1 to classes (except for some funny business at
|
|
|
|
|
the root of the hierarchy), and each class statement creates both
|
|
|
|
|
a regular class and its metaclass, putting class methods in the
|
|
|
|
|
metaclass and instance methods in the regular class.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
Nice though this may be in the context of Smalltalk, it's not
|
|
|
|
|
compatible with the traditional use of metatypes in Python, and I
|
|
|
|
|
prefer to continue in the Python way. This means that Python
|
|
|
|
|
metatypes are typically written in C, and may be shared between
|
|
|
|
|
many regular types. (It will be possible to subtype metatypes in
|
2001-07-11 17:26:08 -04:00
|
|
|
|
Python, so it won't be absolutely necessary to write C to use
|
|
|
|
|
metatypes; but the power of Python metatypes will be limited. For
|
|
|
|
|
example, Python code will never be allowed to allocate raw memory
|
|
|
|
|
and initialize it at will.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
Metatypes determine various *policies* for types,such as what
|
2001-06-13 17:48:31 -04:00
|
|
|
|
happens when a type is called, how dynamic types are (whether a
|
|
|
|
|
type's __dict__ can be modified after it is created), what the
|
|
|
|
|
method resolution order is, how instance attributes are looked
|
|
|
|
|
up, and so on.
|
|
|
|
|
|
|
|
|
|
I'll argue that left-to-right depth-first is not the best
|
|
|
|
|
solution when you want to get the most use from multiple
|
|
|
|
|
inheritance.
|
|
|
|
|
|
|
|
|
|
I'll argue that with multiple inheritance, the metatype of the
|
|
|
|
|
subtype must be a descendant of the metatypes of all base types.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
I'll come back to metatypes later.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
Making a type a factory for its instances
|
|
|
|
|
|
|
|
|
|
Traditionally, for each type there is at least one C factory
|
|
|
|
|
function that creates instances of the type (PyTuple_New(),
|
|
|
|
|
PyInt_FromLong() and so on). These factory functions take care of
|
2001-05-14 21:36:46 -04:00
|
|
|
|
both allocating memory for the object and initializing that
|
2001-06-13 17:48:31 -04:00
|
|
|
|
memory. As of Python 2.0, they also have to interface with the
|
2001-05-14 21:36:46 -04:00
|
|
|
|
garbage collection subsystem, if the type chooses to participate
|
|
|
|
|
in garbage collection (which is optional, but strongly recommended
|
|
|
|
|
for so-called "container" types: types that may contain arbitrary
|
|
|
|
|
references to other objects, and hence may participate in
|
|
|
|
|
reference cycles).
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
In this proposal, type objects can be factory functions for their
|
|
|
|
|
instances, making the types directly callable from Python. This
|
|
|
|
|
mimics the way classes are instantiated. Of course, the C APIs
|
|
|
|
|
for creating instances of various built-in types will remain valid
|
|
|
|
|
and probably the most common; and not all types will become their
|
|
|
|
|
own factory functions.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
The type object has a new slot, tp_new, which can act as a factory
|
|
|
|
|
for instances of the type. Types are made callable by providing a
|
|
|
|
|
tp_call slot in PyType_Type (the metatype); the slot
|
|
|
|
|
implementation function looks for the tp_new slot of the type that
|
|
|
|
|
is being called.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
(Confusion alert: the tp_call slot of a regular type object (such
|
|
|
|
|
as PyInt_Type or PyList_Type) defines what happens when
|
|
|
|
|
*instances* of that type are called; in particular, the tp_call
|
|
|
|
|
slot in the function type, PyFunction_Type, is the key to making
|
|
|
|
|
functions callable. As another example, PyInt_Type.tp_call is
|
|
|
|
|
NULL, because integers are not callable. The new paradigm makes
|
|
|
|
|
*type objects* callable. Since type objects are instances of
|
|
|
|
|
their metatype (PyType_Type), the metatype's tp_call slot
|
|
|
|
|
(PyType_Type.tp_call) points to a function that is invoked when
|
|
|
|
|
any type object is called. Now, since each type has do do
|
|
|
|
|
something different to create an instance of itself,
|
|
|
|
|
PyType_Type.tp_call immediately defers to the tp_new slot of the
|
|
|
|
|
type that is being called. To add to the confusion, PyType_Type
|
|
|
|
|
itself is also callable: its tp_new slot creates a new type. This
|
|
|
|
|
is used by the class statement (via the Don Beaudry hook, see
|
|
|
|
|
above). And what makes PyType_Type callable? The tp_call slot of
|
|
|
|
|
*its* metatype -- but since it is its own metatype, that is its
|
|
|
|
|
own tp_call slot!)
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
If the type's tp_new slot is NULL, an exception is raised.
|
|
|
|
|
Otherwise, the tp_new slot is called. The signature for the
|
|
|
|
|
tp_new slot is
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
PyObject *tp_new(PyTypeObject *type,
|
|
|
|
|
PyObject *args,
|
|
|
|
|
PyObject *kwds)
|
|
|
|
|
|
|
|
|
|
where 'type' is the type whose tp_new slot is called, and 'args'
|
|
|
|
|
and 'kwds' are the sequential and keyword arguments to the call,
|
|
|
|
|
passed unchanged from tp_call. (The 'type' argument is used in
|
|
|
|
|
combination with inheritance, see below.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
There are no constraints on the object type that is returned,
|
|
|
|
|
although by convention it should be an instance of the given
|
|
|
|
|
type. It is not necessary that a new object is returned; a
|
|
|
|
|
reference to an existing object is fine too. The return value
|
|
|
|
|
should always be a new reference, owned by the caller.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
One the tp_new slot has returned an object, further initialization
|
|
|
|
|
is attempted by calling the tp_init() slot of the resulting
|
|
|
|
|
object's type, if not NULL. This has the following signature:
|
|
|
|
|
|
|
|
|
|
PyObject *tp_init(PyObject *self,
|
|
|
|
|
PyObject *args,
|
|
|
|
|
PyObject *kwds)
|
|
|
|
|
|
|
|
|
|
It corresponds more closely to the __init__() method of classic
|
|
|
|
|
classes, and in fact is mapped to that by the slot/special-method
|
|
|
|
|
correspondence rules. The difference in responsibilities between
|
|
|
|
|
the tp_new() slot and the tp_init() slot lies in the invariants
|
|
|
|
|
they ensure. The tp_new() slot should ensure only the most
|
|
|
|
|
essential invariants, without which the C code that implements the
|
2001-07-18 09:44:57 -04:00
|
|
|
|
object's would break. The tp_init() slot should be used for
|
2001-07-10 13:11:19 -04:00
|
|
|
|
overridable user-specific initializations. Take for example the
|
|
|
|
|
dictionary type. The implementation has an internal pointer to a
|
|
|
|
|
hash table which should never be NULL. This invariant is taken
|
|
|
|
|
care of by the tp_new() slot for dictionaries. The dictionary
|
|
|
|
|
tp_init() slot, on the other hand, could be used to give the
|
|
|
|
|
dictionary an initial set of keys and values based on the
|
|
|
|
|
arguments passed in.
|
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
Note that for immutable object types, the initialization cannot be
|
|
|
|
|
done by the tp_init() slot: this would provide the Python user
|
2001-07-18 09:44:57 -04:00
|
|
|
|
with a way to change the initialization. Therefore, immutable
|
2001-07-10 16:01:52 -04:00
|
|
|
|
objects typically have an empty tp_init() implementation and do
|
|
|
|
|
all their initialization in their tp_new() slot.
|
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
You may wonder why the tp_new() slot shouldn't call the tp_init()
|
|
|
|
|
slot itself. The reason is that in certain circumstances (like
|
|
|
|
|
support for persistent objects), it is important to be able to
|
|
|
|
|
create an object of a particular type without initializing it any
|
|
|
|
|
further than necessary. This may conveniently be done by calling
|
|
|
|
|
the tp_new() slot without calling tp_init(). It is also possible
|
|
|
|
|
that tp_init() is not called, or called more than once -- its
|
|
|
|
|
operation should be robust even in these anomalous cases.
|
|
|
|
|
|
|
|
|
|
For some objects, tp_new() may return an existing object. For
|
|
|
|
|
example, the factory function for integers caches the integers -1
|
|
|
|
|
throug 99. This is permissible only when the type argument to
|
|
|
|
|
tp_new() is the type that defined the tp_new() function (in the
|
|
|
|
|
example, if type == &PyInt_Type), and when the tp_init() slot for
|
|
|
|
|
this type does nothing. If the type argument differs, the
|
|
|
|
|
tp_new() call is initiated by by a derived type's tp_new() to
|
|
|
|
|
create the object and initialize the base type portion of the
|
|
|
|
|
object; in this case tp_new() should always return a new object
|
|
|
|
|
(or raise an exception).
|
|
|
|
|
|
|
|
|
|
There's a third slot related to object creation: tp_alloc(). Its
|
|
|
|
|
responsibility is to allocate the memory for the object,
|
2001-07-10 16:01:52 -04:00
|
|
|
|
initialize the reference count (ob_refcnt) and the type pointer
|
|
|
|
|
(ob_type), and initialize the rest of the object to all zeros. It
|
|
|
|
|
should also register the object with the garbage collection
|
|
|
|
|
subsystem if the type supports garbage collection. This slot
|
|
|
|
|
exists so that derived types can override the memory allocation
|
2001-07-11 15:09:28 -04:00
|
|
|
|
policy (like which heap is being used) separately from the
|
2001-07-10 16:01:52 -04:00
|
|
|
|
initialization code. The signature is:
|
2001-07-10 13:11:19 -04:00
|
|
|
|
|
|
|
|
|
PyObject *tp_alloc(PyTypeObject *type, int nitems)
|
|
|
|
|
|
|
|
|
|
The type argument is the type of the new object. The nitems
|
|
|
|
|
argument is normally zero, except for objects with a variable
|
|
|
|
|
allocation size (basically strings, tuples, and longs). The
|
|
|
|
|
allocation size is given by the following expression:
|
|
|
|
|
|
|
|
|
|
type->tp_basicsize + nitems * type->tp_itemsize
|
|
|
|
|
|
|
|
|
|
This slot is only used for subclassable types. The tp_new()
|
|
|
|
|
function of the base class must call the tp_alloc() slot of the
|
|
|
|
|
type passed in as its first argument. It is the tp_new()
|
|
|
|
|
function's responsibility to calculate the number of items. The
|
2001-07-10 16:01:52 -04:00
|
|
|
|
tp_alloc() slot will set the ob_size member of the new object if
|
|
|
|
|
the type->tp_itemsize member is nonzero.
|
|
|
|
|
|
|
|
|
|
(Note: in certain debugging compilation modes, the type structure
|
|
|
|
|
used to have members named tp_alloc and a tp_free slot already,
|
|
|
|
|
counters for the number of allocations and deallocations. These
|
|
|
|
|
are renamed to tp_allocs and tp_deallocs.)
|
2001-07-10 13:11:19 -04:00
|
|
|
|
|
|
|
|
|
XXX The keyword arguments are currently not passed to tp_new();
|
|
|
|
|
its kwds argument is always NULL. This is a relic from a previous
|
|
|
|
|
revision and should probably be fixed. Both tp_new() and
|
|
|
|
|
tp_init() should receive exactly the same arguments, and both
|
|
|
|
|
should check that the arguments are acceptable, because they may
|
|
|
|
|
be called independently.
|
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
Standard implementations for tp_alloc() and tp_new() are
|
|
|
|
|
available. PyType_GenericAlloc() allocates an object from the
|
|
|
|
|
standard heap and initializes it properly. It uses the above
|
|
|
|
|
formula to determine the amount of memory to allocate, and takes
|
|
|
|
|
care of GC registration. The only reason not to use this
|
2001-07-18 09:44:57 -04:00
|
|
|
|
implementation would be to allocate objects from a different heap
|
2001-07-10 16:01:52 -04:00
|
|
|
|
(as is done by some very small frequently used objects like ints
|
|
|
|
|
and tuples). PyType_GenericNew() adds very little: it just calls
|
|
|
|
|
the type's tp_alloc() slot with zero for nitems. But for mutable
|
|
|
|
|
types that do all their initialization in their tp_init() slot,
|
|
|
|
|
this may be just the ticket.
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
Preparing a type for subtyping
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
The idea behind subtyping is very similar to that of single
|
|
|
|
|
inheritance in C++. A base type is described by a structure
|
2001-07-10 16:01:52 -04:00
|
|
|
|
declaration (similar to the C++ class declaration) plus a type
|
|
|
|
|
object (similar to the C++ vtable). A derived type can extend the
|
|
|
|
|
structure (but must leave the names, order and type of the members
|
2001-05-14 21:36:46 -04:00
|
|
|
|
of the base structure unchanged) and can override certain slots in
|
2001-07-10 16:01:52 -04:00
|
|
|
|
the type object, leaving others the same. (Unlike C++ vtables,
|
|
|
|
|
all Python type objects have the same memory lay-out.)
|
|
|
|
|
|
|
|
|
|
The base type must do the following:
|
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- Add the flag value Py_TPFLAGS_BASETYPE to tp_flags.
|
|
|
|
|
|
|
|
|
|
- Declare and use tp_new(), tp_alloc() and optional tp_init()
|
|
|
|
|
slots.
|
|
|
|
|
|
|
|
|
|
- Declare and use tp_dealloc() and tp_free().
|
|
|
|
|
|
|
|
|
|
- Export its object structure declaration.
|
|
|
|
|
|
|
|
|
|
- Export a subtyping-aware type-checking macro.
|
2001-07-10 16:01:52 -04:00
|
|
|
|
|
|
|
|
|
The requirements and signatures for tp_new(), tp_alloc() and
|
|
|
|
|
tp_init() have already been discussed above: tp_alloc() should
|
|
|
|
|
allocate the memory and initialize it to mostly zeros; tp_new()
|
|
|
|
|
should call the tp_alloc() slot and then proceed to do the
|
|
|
|
|
minimally required initialization; tp_init() should be used for
|
|
|
|
|
more extensive initialization of mutable objects.
|
|
|
|
|
|
|
|
|
|
It should come as no surprise that there are similar conventions
|
|
|
|
|
at the end of an object's lifetime. The slots involved are
|
|
|
|
|
tp_dealloc() (familiar to all who have ever implemented a Python
|
|
|
|
|
extension type) and tp_free(), the new kid on he block. (The
|
|
|
|
|
names aren't quite symmetric; tp_free() corresponds to tp_alloc(),
|
|
|
|
|
which is fine, but tp_dealloc() corresponds to tp_new(). Maybe
|
|
|
|
|
the tp_dealloc slot should be renamed?)
|
|
|
|
|
|
|
|
|
|
The tp_free() slot should be used to free the memory and
|
|
|
|
|
unregister the object with the garbage collection subsystem, and
|
|
|
|
|
can be overridden by a derived class; tp_dealloc() should
|
2001-07-11 15:09:28 -04:00
|
|
|
|
deinitialize the object (usually by calling Py_XDECREF() for
|
|
|
|
|
various sub-objects) and then call tp_free() to deallocate the
|
|
|
|
|
memory. The signature for tp_dealloc() is the same as it always
|
|
|
|
|
was:
|
2001-07-10 16:01:52 -04:00
|
|
|
|
|
|
|
|
|
void tp_dealloc(PyObject *object)
|
|
|
|
|
|
|
|
|
|
The signature for tp_free() is the same:
|
|
|
|
|
|
|
|
|
|
void tp_free(PyObject *object)
|
|
|
|
|
|
2001-07-18 09:44:57 -04:00
|
|
|
|
(In a previous version of this PEP, there was also a role reserved
|
2001-07-10 16:01:52 -04:00
|
|
|
|
for the tp_clear() slot. This turned out to be a bad idea.)
|
|
|
|
|
|
2001-07-11 17:26:08 -04:00
|
|
|
|
To be usefully subtyped in C, a type must export the structure
|
|
|
|
|
declaration for its instances through a header file, as it is
|
|
|
|
|
needed to derive a subtype. The type object for the base type
|
|
|
|
|
must also be exported.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 13:11:19 -04:00
|
|
|
|
If the base type has a type-checking macro (like PyDict_Check()),
|
2001-07-10 16:01:52 -04:00
|
|
|
|
this macro should be made to recognize subtypes. This can be done
|
|
|
|
|
by using the new PyObject_TypeCheck(object, type) macro, which
|
|
|
|
|
calls a function that follows the base class links.
|
|
|
|
|
|
|
|
|
|
The PyObject_TypeCheck() macro contains a slight optimization: it
|
|
|
|
|
first compares object->ob_type directly to the type argument, and
|
|
|
|
|
if this is a match, bypasses the function call. This should make
|
|
|
|
|
it fast enough for most situations.
|
|
|
|
|
|
|
|
|
|
Note that this change in the type-checking macro means that C
|
|
|
|
|
functions that require an instance of the base type may be invoked
|
|
|
|
|
with instances of the derived type. Before enabling subtyping of
|
|
|
|
|
a particular type, its code should be checked to make sure that
|
|
|
|
|
this won't break anything.
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Creating a subtype of a built-in type in C
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
The simplest form of subtyping is subtyping in C. It is the
|
|
|
|
|
simplest form because we can require the C code to be aware of
|
|
|
|
|
some of the problems, and it's acceptable for C code that doesn't
|
|
|
|
|
follow the rules to dump core. For added simplicity, it is
|
|
|
|
|
limited to single inheritance.
|
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
Let's assume we're deriving from a mutable base type whose
|
|
|
|
|
tp_itemsize is zero. The subtype code is not GC-aware, although
|
|
|
|
|
it may inherit GC-awareness from the base type (this is
|
|
|
|
|
automatic). The base type's allocation uses the standard heap.
|
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
The derived type begins by declaring a type structure which
|
|
|
|
|
contains the base type's structure. For example, here's the type
|
|
|
|
|
structure for a subtype of the built-in list type:
|
|
|
|
|
|
|
|
|
|
typedef struct {
|
|
|
|
|
PyListObject list;
|
|
|
|
|
int state;
|
|
|
|
|
} spamlistobject;
|
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
Note that the base type structure member (here PyListObject) must
|
|
|
|
|
be the first member of the structure; any following members are
|
|
|
|
|
additions. Also note that the base type is not referenced via a
|
|
|
|
|
pointer; the actual contents of its structure must be included!
|
|
|
|
|
(The goal is for the memory lay out of the beginning of the
|
|
|
|
|
subtype instance to be the same as that of the base type
|
2001-05-14 21:36:46 -04:00
|
|
|
|
instance.)
|
|
|
|
|
|
|
|
|
|
Next, the derived type must declare a type object and initialize
|
|
|
|
|
it. Most of the slots in the type object may be initialized to
|
|
|
|
|
zero, which is a signal that the base type slot must be copied
|
2001-07-10 16:01:52 -04:00
|
|
|
|
into it. Some slots that must be initialized properly:
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- The object header must be filled in as usual; the type should
|
|
|
|
|
be &PyType_Type.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- The tp_basicsize slot must be set to the size of the subtype
|
|
|
|
|
instance struct (in the above example:
|
|
|
|
|
sizeof(spamlistobject)).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- The tp_base slot must be set to the address of the base type's
|
|
|
|
|
type object.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- If the derived slot defines any pointer members, the
|
|
|
|
|
tp_dealloc slot function requires special attention, see
|
|
|
|
|
below; otherwise, it can be set to zero, to inherit the base
|
|
|
|
|
type's deallocation function.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- The tp_flags slot must be set to the usual Py_TPFLAGS_DEFAULT
|
|
|
|
|
value.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- The tp_name slot must be set; it is recommended to set tp_doc
|
|
|
|
|
as well (these are not inherited).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
If the subtype defines no additional structure members (it only
|
|
|
|
|
defines new behavior, no new data), the tp_basicsize and the
|
|
|
|
|
tp_dealloc slots may be left set to zero.
|
2001-06-14 09:37:45 -04:00
|
|
|
|
|
|
|
|
|
The subtype's tp_dealloc slot deserves special attention. If the
|
2001-07-10 16:01:52 -04:00
|
|
|
|
derived type defines no additional pointer members that need to be
|
2001-06-14 09:37:45 -04:00
|
|
|
|
DECREF'ed or freed when the object is deallocated, it can be set
|
2001-07-10 16:01:52 -04:00
|
|
|
|
to zero. Otherwise, the subtype's tp_dealloc() function must call
|
|
|
|
|
Py_XDECREF() for any PyObject * members and the correct memory
|
2001-06-14 09:37:45 -04:00
|
|
|
|
freeing function for any other pointers it owns, and then call the
|
2001-07-10 16:01:52 -04:00
|
|
|
|
base class's tp_dealloc() slot. This call has to be made via the
|
|
|
|
|
base type's type structure, for example, when deriving from the
|
2001-07-10 13:11:19 -04:00
|
|
|
|
standard list type:
|
2001-06-14 09:37:45 -04:00
|
|
|
|
|
|
|
|
|
PyList_Type.tp_dealloc(self);
|
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
If the subtype wants to use a different allocation heap than the
|
|
|
|
|
base type, the subtype must override both the tp_alloc() and the
|
|
|
|
|
tp_free() slots. These will be called by the base class's
|
|
|
|
|
tp_new() and tp_dealloc() slots, respectively.
|
|
|
|
|
|
2001-07-11 17:26:08 -04:00
|
|
|
|
To complete the initialization of the type, PyType_InitDict() must
|
|
|
|
|
be called. This replaces slots initialized to zero in the subtype
|
|
|
|
|
with the value of the corresponding base type slots. (It also
|
|
|
|
|
fills in tp_dict, the type's dictionary, and does various other
|
|
|
|
|
initializations necessary for type objects.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
A subtype is not usable until PyType_InitDict() is called for it;
|
|
|
|
|
this is best done during module initialization, assuming the
|
|
|
|
|
subtype belongs to a module. An alternative for subtypes added to
|
|
|
|
|
the Python core (which don't live in a particular module) would be
|
|
|
|
|
to initialize the subtype in their constructor function. It is
|
2001-07-10 16:01:52 -04:00
|
|
|
|
allowed to call PyType_InitDict() more than once; the second and
|
2001-07-11 17:26:08 -04:00
|
|
|
|
further calls have no effect. To avoid unnecessary calls, a test
|
|
|
|
|
for tp_dict==NULL can be made.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:01:52 -04:00
|
|
|
|
(During initialization of the Python interpreter, some types are
|
|
|
|
|
actually used before they are initialized. As long as the slots
|
|
|
|
|
that are actually needed are initialized, especially tp_dealloc,
|
|
|
|
|
this works, but it is fragile and not recommended as a general
|
|
|
|
|
practice.)
|
|
|
|
|
|
|
|
|
|
To create a subtype instance, the subtype's tp_new() slot is
|
|
|
|
|
called. This should first call the base type's tp_new() slot and
|
|
|
|
|
then initialize the subtype's additional data members. To further
|
|
|
|
|
initialize the instance, the tp_init() slot is typically called.
|
|
|
|
|
Note that the tp_new() slot should *not* call the tp_init() slot;
|
|
|
|
|
this is up to tp_new()'s caller (typically a factory function).
|
|
|
|
|
There are circumstances where it is appropriate not to call
|
|
|
|
|
tp_init().
|
|
|
|
|
|
|
|
|
|
If a subtype defines a tp_init() slot, the tp_init() slot should
|
|
|
|
|
normally first call the base type's tp_init() slot.
|
|
|
|
|
|
|
|
|
|
(XXX There should be a paragraph or two about argument passing
|
|
|
|
|
here.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Subtyping in Python
|
|
|
|
|
|
|
|
|
|
The next step is to allow subtyping of selected built-in types
|
|
|
|
|
through a class statement in Python. Limiting ourselves to single
|
|
|
|
|
inheritance for now, here is what happens for a simple class
|
|
|
|
|
statement:
|
|
|
|
|
|
|
|
|
|
class C(B):
|
|
|
|
|
var1 = 1
|
|
|
|
|
def method1(self): pass
|
|
|
|
|
# etc.
|
|
|
|
|
|
2001-07-18 09:44:57 -04:00
|
|
|
|
The body of the class statement is executed in a fresh environment
|
2001-05-14 21:36:46 -04:00
|
|
|
|
(basically, a new dictionary used as local namespace), and then C
|
|
|
|
|
is created. The following explains how C is created.
|
|
|
|
|
|
|
|
|
|
Assume B is a type object. Since type objects are objects, and
|
2001-07-10 16:46:24 -04:00
|
|
|
|
every object has a type, B has a type. Since B is itself a type,
|
|
|
|
|
we also call its type its metatype. B's metatype is accessible
|
|
|
|
|
via type(B) or B.__class__ (the latter notation is new for types;
|
|
|
|
|
it is introduced in PEP 252). Let's say this metatype is M (for
|
2001-05-14 21:36:46 -04:00
|
|
|
|
Metatype). The class statement will create a new type, C. Since
|
|
|
|
|
C will be a type object just like B, we view the creation of C as
|
|
|
|
|
an instantiation of the metatype, M. The information that needs
|
2001-07-10 16:46:24 -04:00
|
|
|
|
to be provided for the creation of a subclass is:
|
|
|
|
|
|
|
|
|
|
- its name (in this example the string "C");
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- its bases (a singleton tuple containing B);
|
|
|
|
|
|
|
|
|
|
- the results of executing the class body, in the form of a
|
|
|
|
|
dictionary (for example {"var1": 1, "method1": <function
|
|
|
|
|
method1 at ...>, ...}).
|
|
|
|
|
|
|
|
|
|
The class statement will result in the following call:
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
C = M("C", (B,), dict)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
(where dict is the dictionary resulting from execution of the
|
2001-06-14 16:48:43 -04:00
|
|
|
|
class body). In other words, the metatype (M) is called.
|
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
Note that even though the example has only one base, we still pass
|
|
|
|
|
in a (singleton) sequence of bases; this makes the interface
|
|
|
|
|
uniform with the multiple-inheritance case.
|
2001-06-14 16:48:43 -04:00
|
|
|
|
|
|
|
|
|
In current Python, this is called the "Don Beaudry hook" after its
|
|
|
|
|
inventor; it is an exceptional case that is only invoked when a
|
|
|
|
|
base class is not a regular class. For a regular base class (or
|
|
|
|
|
when no base class is specified), current Python calls
|
|
|
|
|
PyClass_New(), the C level factory function for classes, directly.
|
2001-07-10 16:46:24 -04:00
|
|
|
|
|
|
|
|
|
Under the new system this is changed so that Python *always*
|
|
|
|
|
determines a metatype and calls it as given above. When one or
|
|
|
|
|
more bases are given, the type of the first base is used as the
|
|
|
|
|
metatype; when no base is given, a default metatype is chosen. By
|
|
|
|
|
setting the default metatype to PyClass_Type, the metatype of
|
2001-06-14 16:48:43 -04:00
|
|
|
|
"classic" classes, the classic behavior of the class statement is
|
2001-07-10 16:46:24 -04:00
|
|
|
|
retained. This default can be changed per module by setting the
|
|
|
|
|
global variable __metaclass__.
|
2001-06-14 16:48:43 -04:00
|
|
|
|
|
|
|
|
|
There are two further refinements here. First, a useful feature
|
|
|
|
|
is to be able to specify a metatype directly. If the class
|
|
|
|
|
statement defines a variable __metaclass__, that is the metatype
|
2001-07-10 16:46:24 -04:00
|
|
|
|
to call. (Note that setting __metaclass__ at the module level
|
|
|
|
|
only affects class statements without a base class and without an
|
|
|
|
|
explicit __metaclass__ declaration; but setting __metaclass__ in a
|
|
|
|
|
class statement overrides the default metatype unconditionally.)
|
2001-06-14 16:48:43 -04:00
|
|
|
|
|
|
|
|
|
Second, with multiple bases, not all bases need to have the same
|
|
|
|
|
metatype. This is called a metaclass conflict [1]. Some
|
|
|
|
|
metaclass conflicts can be resolved by searching through the set
|
|
|
|
|
of bases for a metatype that derives from all other given
|
|
|
|
|
metatypes. If such a metatype cannot be found, an exception is
|
|
|
|
|
raised and the class statement fails.
|
|
|
|
|
|
|
|
|
|
This conflict resultion can be implemented in the metatypes
|
|
|
|
|
itself: the class statement just calls the metatype of the first
|
2001-07-10 16:46:24 -04:00
|
|
|
|
base (or that specified by the __metaclass__ variable), and this
|
|
|
|
|
metatype's constructor looks for the most derived metatype. If
|
|
|
|
|
that is itself, it proceeds; otherwise, it calls that metatype's
|
|
|
|
|
constructor. (Ultimate flexibility: another metatype might choose
|
|
|
|
|
to require that all bases have the same metatype, or that there's
|
|
|
|
|
only one base class, or whatever.)
|
|
|
|
|
|
|
|
|
|
(In [1], a new metaclass is automatically derived that is a
|
|
|
|
|
subclass of all given metaclasses. But since it is questionable
|
|
|
|
|
in Python how conflicting method definitions of the various
|
|
|
|
|
metaclasses should be merged, I don't think this is feasible.
|
|
|
|
|
Should the need arise, the user can derive such a metaclass
|
|
|
|
|
manually and specify it using the __metaclass__ variable. It is
|
|
|
|
|
also possible to have a new metaclass that does this.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
Note that calling M requires that M itself has a type: the
|
2001-07-10 16:46:24 -04:00
|
|
|
|
meta-metatype. And the meta-metatype has a type, the
|
|
|
|
|
meta-meta-metatype. And so on. This is normally cut short at
|
|
|
|
|
some level by making a metatype be its own metatype. This is
|
|
|
|
|
indeed what happens in Python: the ob_type reference in
|
|
|
|
|
PyType_Type is set to &PyType_Type. In the absence of third party
|
|
|
|
|
metatypes, PyType_Type is the only metatype in the Python
|
|
|
|
|
interpreter.
|
|
|
|
|
|
|
|
|
|
(In a previous version of this PEP, there was one additional
|
|
|
|
|
meta-level, and there was a meta-metatype called "turtle". This
|
|
|
|
|
turned out to be unnecessary.)
|
|
|
|
|
|
|
|
|
|
In any case, the work for creating C is done by M's tp_new() slot.
|
|
|
|
|
It allocates space for an "extended" type structure, which
|
2001-05-14 21:36:46 -04:00
|
|
|
|
contains space for: the type object; the auxiliary structures
|
|
|
|
|
(as_sequence etc.); the string object containing the type name (to
|
|
|
|
|
ensure that this object isn't deallocated while the type object is
|
|
|
|
|
still referencing it); and some more auxiliary storage (to be
|
|
|
|
|
described later). It initializes this storage to zeros except for
|
2001-07-10 13:11:19 -04:00
|
|
|
|
a few crucial slots (for example, tp_name is set to point to the
|
|
|
|
|
type name) and then sets the tp_base slot to point to B. Then
|
2001-05-14 21:36:46 -04:00
|
|
|
|
PyType_InitDict() is called to inherit B's slots. Finally, C's
|
|
|
|
|
tp_dict slot is updated with the contents of the namespace
|
|
|
|
|
dictionary (the third argument to the call to M).
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
|
|
|
|
|
2001-07-11 17:26:08 -04:00
|
|
|
|
Multiple inheritance
|
2001-07-11 15:09:28 -04:00
|
|
|
|
|
|
|
|
|
The Python class statement supports multiple inheritance, and we
|
|
|
|
|
will also support multiple inheritance involving built-in types.
|
|
|
|
|
|
|
|
|
|
However, there are some restrictions. The C runtime architecture
|
|
|
|
|
doesn't make it feasible to have a meaningful subtype of two
|
|
|
|
|
different built-in types except in a few degenerate cases.
|
|
|
|
|
Changing the C runtime to support fully general multiple
|
|
|
|
|
inheritance would be too much of an upheaval of the code base.
|
|
|
|
|
|
|
|
|
|
The main problem with multiple inheritance from different built-in
|
|
|
|
|
types stems from the fact that the C implementation of built-in
|
|
|
|
|
types accesses structure members directly; the C compiler
|
|
|
|
|
generates an offset relative to the object pointer and that's
|
|
|
|
|
that. For example, the list and dictionary type structures each
|
|
|
|
|
declare a number of different but overlapping structure members.
|
|
|
|
|
A C function accessing an object expecting a list won't work when
|
|
|
|
|
passed a dictionary, and vice versa, and there's not much we could
|
|
|
|
|
do about this without rewriting all code that accesses lists and
|
|
|
|
|
dictionaries. This would be too much work, so we won't do this.
|
|
|
|
|
|
|
|
|
|
The problem with multiple inheritance is caused by conflicting
|
|
|
|
|
structure member allocations. Classes defined in Python normally
|
|
|
|
|
don't store their instance variables in structure members: they
|
|
|
|
|
are stored in an instance dictionary. This is the key to a
|
|
|
|
|
partial solution. Suppose we have the following two classes:
|
|
|
|
|
|
|
|
|
|
class A(dictionary):
|
|
|
|
|
def foo(self): pass
|
|
|
|
|
|
|
|
|
|
class B(dictionary):
|
|
|
|
|
def bar(self): pass
|
|
|
|
|
|
|
|
|
|
class C(A, B): pass
|
|
|
|
|
|
|
|
|
|
(Here, 'dictionary' is the type of built-in dictionary objects,
|
|
|
|
|
a.k.a. type({}) or {}.__class__ or types.DictType.) If we look at
|
|
|
|
|
the structure lay-out, we find that an A instance has the lay-out
|
|
|
|
|
of a dictionary followed by the __dict__ pointer, and a B instance
|
|
|
|
|
has the same lay-out; since there are no structure member lay-out
|
|
|
|
|
conflicts, this is okay.
|
|
|
|
|
|
|
|
|
|
Here's another example:
|
|
|
|
|
|
|
|
|
|
class X(object):
|
|
|
|
|
def foo(self): pass
|
|
|
|
|
|
|
|
|
|
class Y(dictionary):
|
|
|
|
|
def bar(self): pass
|
|
|
|
|
|
|
|
|
|
class Z(X, Y): pass
|
|
|
|
|
|
|
|
|
|
(Here, 'object' is the base for all built-in types; its structure
|
|
|
|
|
lay-out only contains the ob_refcnt and ob_type members.) This
|
|
|
|
|
example is more complicated, because the __dict__ pointer for X
|
|
|
|
|
instances has a different offset than that for Y instances. Where
|
|
|
|
|
is the __dict__ pointer for Z instances? The answer is that the
|
|
|
|
|
offset for the __dict__ pointer is not hardcoded, it is stored in
|
|
|
|
|
the type object.
|
|
|
|
|
|
|
|
|
|
Suppose on a particular machine an 'object' structure is 8 bytes
|
|
|
|
|
long, and a 'dictionary' struct is 60 bytes, and an object pointer
|
|
|
|
|
is 4 bytes. Then an X structure is 12 bytes (an object structure
|
|
|
|
|
followed by a __dict__ pointer), and a Y structure is 64 bytes (a
|
|
|
|
|
dictionary structure followed by a __dict__ pointer). The Z
|
|
|
|
|
structure has the same lay-out as the Y structure in this example.
|
|
|
|
|
Each type object (X, Y and Z) has a "__dict__ offset" which is
|
|
|
|
|
used to find the __dict__ pointer. Thus, the recipe for looking
|
|
|
|
|
up an instance variable is:
|
|
|
|
|
|
|
|
|
|
1. get the type of the instance
|
|
|
|
|
2. get the __dict__ offset from the type object
|
|
|
|
|
3. add the __dict__ offset to the instance pointer
|
|
|
|
|
4. look in the resulting address to find a dictionary reference
|
|
|
|
|
5. look up the instance variable name in that dictionary
|
|
|
|
|
|
|
|
|
|
Of course, this recipe can only be implemented in C, and I have
|
|
|
|
|
left out some details. But this allows us to use multiple
|
|
|
|
|
inheritance patterns similar to the ones we can use with classic
|
|
|
|
|
classes.
|
|
|
|
|
|
|
|
|
|
XXX I should write up the complete algorithm here to determine
|
|
|
|
|
base class compatibility, but I can't be bothered right now. Look
|
|
|
|
|
at best_base() in typeobject.c in the implementation mentioned
|
|
|
|
|
below.
|
|
|
|
|
|
|
|
|
|
|
2001-07-11 17:26:08 -04:00
|
|
|
|
Method resolution order (the lookup rule)
|
|
|
|
|
|
|
|
|
|
With multiple inheritance comes the question of method resolution
|
|
|
|
|
order: the order in which a class or type and its bases are
|
|
|
|
|
searched looking for a method of a given name.
|
|
|
|
|
|
|
|
|
|
In classic Python, the rule is given by the following recursive
|
|
|
|
|
function, also known as the left-to-right depth-first rule:
|
|
|
|
|
|
|
|
|
|
def classic_lookup(cls, name):
|
|
|
|
|
if cls.__dict__.has_key(name):
|
|
|
|
|
return cls.__dict__[name]
|
|
|
|
|
for base in cls.__bases__:
|
|
|
|
|
try:
|
|
|
|
|
return classic_lookup(base, name)
|
|
|
|
|
except AttributeError:
|
|
|
|
|
pass
|
|
|
|
|
raise AttributeError, name
|
|
|
|
|
|
|
|
|
|
The problem with this becomes apparent when we consider a "diamond
|
|
|
|
|
diagram":
|
|
|
|
|
|
|
|
|
|
class A:
|
|
|
|
|
^ ^ def save(self): ...
|
|
|
|
|
/ \
|
|
|
|
|
/ \
|
|
|
|
|
/ \
|
|
|
|
|
/ \
|
|
|
|
|
class B class C:
|
|
|
|
|
^ ^ def save(self): ...
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
class D
|
|
|
|
|
|
|
|
|
|
Arrows point from a subtype to its base type(s). This particular
|
|
|
|
|
diagram means B and C derive from A, and D derives from B and C
|
|
|
|
|
(and hence also, indirectly, from A).
|
|
|
|
|
|
|
|
|
|
Assume that C overrides the method save(), which is defined in the
|
|
|
|
|
base A. (C.save() probably calls A.save() and then saves some of
|
|
|
|
|
its own state.) B and D don't override save(). When we invoke
|
|
|
|
|
save() on a D instance, which method is called? According to the
|
|
|
|
|
classic lookup rule, A.save() is called, ignoring C.save()!
|
|
|
|
|
|
|
|
|
|
This is not good. It probably breaks C (its state doesn't get
|
|
|
|
|
saved), defeating the whole purpose of inheriting from C in the
|
|
|
|
|
first place.
|
|
|
|
|
|
|
|
|
|
Why was this not a problem in classic Python? Diamond diagrams is
|
|
|
|
|
found rarely in classic Python class hierarchies. Most class
|
|
|
|
|
hierarchies use single inheritance, and multiple inheritance is
|
|
|
|
|
usually confined to mix-in classes. In fact, the problem shown
|
|
|
|
|
here is probably the reason why multiple inheritance is impopular
|
|
|
|
|
in classic Python.
|
|
|
|
|
|
|
|
|
|
Why will this be a problem in the new system? The 'object' type
|
|
|
|
|
at the top of the type hierarchy defines a number of methods that
|
|
|
|
|
can usefully be extended by subtypes, for example __getattr__().
|
|
|
|
|
|
|
|
|
|
(Aside: in classic Python, the __getattr__() method is not really
|
|
|
|
|
the implementation for the get-attribute operation; it is a hook
|
|
|
|
|
that only gets invoked when an attribute cannot be found by normal
|
|
|
|
|
means. This has often been cited as a shortcoming -- some class
|
|
|
|
|
designs have a legitimate need for a __getattr__() method that
|
|
|
|
|
gets called for *all* attribute references. But then of course
|
|
|
|
|
this method has to be able to invoke the default implementation
|
|
|
|
|
directly. The most natural way is to make the default
|
|
|
|
|
implementation available as object.__getattr__(self, name).)
|
|
|
|
|
|
|
|
|
|
Thus, a classic class hierarchy like this:
|
|
|
|
|
|
|
|
|
|
class B class C:
|
|
|
|
|
^ ^ def __getattr__(self, name): ...
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
class D
|
|
|
|
|
|
|
|
|
|
will change into a diamond diagram under the new system:
|
|
|
|
|
|
|
|
|
|
object:
|
|
|
|
|
^ ^ __getattr__()
|
|
|
|
|
/ \
|
|
|
|
|
/ \
|
|
|
|
|
/ \
|
|
|
|
|
/ \
|
|
|
|
|
class B class C:
|
|
|
|
|
^ ^ def __getattr__(self, name): ...
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
\ /
|
|
|
|
|
class D
|
|
|
|
|
|
|
|
|
|
and while in the original diagram C.__getattr__() is invoked,
|
|
|
|
|
under the new system with the classic lookup rule,
|
|
|
|
|
object.__getattr__() would be invoked!
|
|
|
|
|
|
|
|
|
|
Fortunately, there's a lookup rule that's better. It's a bit
|
|
|
|
|
difficult to explain, but it does the right thing in the diamond
|
|
|
|
|
diagram, and it is the same as the classic lookup rule when there
|
|
|
|
|
are no diamonds in the inheritance graph (when it is a tree).
|
|
|
|
|
|
|
|
|
|
The new lookup rule constructs a list of all classes in the
|
|
|
|
|
inheritance diagram in the order in which they will be searched.
|
|
|
|
|
This construction is done at class definition time to save time.
|
|
|
|
|
To explain the new lookup rule, let's first consider what such a
|
|
|
|
|
list would look like for the classic lookup rule. Note that in
|
|
|
|
|
the presence of diamonds the classic lookup visits some classes
|
|
|
|
|
multiple times. For example, in the ABCD diamond diagram above,
|
|
|
|
|
the classic lookup rule visits the classes in this order:
|
|
|
|
|
|
|
|
|
|
D, B, A, C, A
|
|
|
|
|
|
|
|
|
|
Note how A occurs twice in the list. The second occurrence is
|
|
|
|
|
redundant, since anything that could be found there would already
|
|
|
|
|
have been found when searching the first occurrence.
|
|
|
|
|
|
|
|
|
|
We use this observation to explain our new lookup rule. Using the
|
|
|
|
|
classic lookup rule, construct the list of classes that would be
|
|
|
|
|
searched, including duplicates. Now for each class that occurs in
|
|
|
|
|
the list multiple times, remove all occurrences except for the
|
|
|
|
|
last. The resulting list contains each ancestor class exactly
|
|
|
|
|
once (including the most derived class, D in the example).
|
|
|
|
|
|
|
|
|
|
Searching for methods in this order will do the right thing for
|
|
|
|
|
the diamond diagram. Because of the way the list is constructed,
|
|
|
|
|
it does not change the search order in situations where no diamond
|
|
|
|
|
is involved.
|
|
|
|
|
|
|
|
|
|
Isn't this backwards incompatible? Won't it break existing code?
|
|
|
|
|
It would, if we changed the method resolution order for all
|
|
|
|
|
classes. However, in Python 2.2, the new lookup rule will only be
|
|
|
|
|
applied to types derived from built-in types, which is a new
|
|
|
|
|
feature. Class statements without a base class create "classic
|
|
|
|
|
classes", and so do class statements whose base classes are
|
|
|
|
|
themselves classic classes. For classic classes the classic
|
|
|
|
|
lookup rule will be used. (To experiment with the new lookup rule
|
|
|
|
|
for classic classes, you will be able to specify a different
|
|
|
|
|
metaclass explicitly.) We'll also provide a tool that analyzes a
|
|
|
|
|
class hierarchy looking for methods that would be affected by a
|
|
|
|
|
change in method resolution order.
|
|
|
|
|
|
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
XXX To be done
|
|
|
|
|
|
|
|
|
|
Additional topics to be discussed in this PEP:
|
2001-06-11 16:07:37 -04:00
|
|
|
|
|
2001-07-11 17:26:08 -04:00
|
|
|
|
- backwards compatibility issues!!!
|
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- class methods and static methods
|
2001-06-11 16:07:37 -04:00
|
|
|
|
|
2001-07-13 15:48:07 -04:00
|
|
|
|
- cooperative methods and super()
|
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- mapping between type object slots (tp_foo) and special methods
|
2001-07-11 15:09:28 -04:00
|
|
|
|
(__foo__) (actually, this may belong in PEP 252)
|
2001-07-10 16:46:24 -04:00
|
|
|
|
|
|
|
|
|
- built-in names for built-in types (object, int, str, list etc.)
|
|
|
|
|
|
2001-07-11 15:09:28 -04:00
|
|
|
|
- __dict__
|
2001-07-10 16:46:24 -04:00
|
|
|
|
|
|
|
|
|
- __slots__
|
|
|
|
|
|
|
|
|
|
- the HEAPTYPE and DYNAMICTYPE flag bits
|
|
|
|
|
|
2001-07-11 15:09:28 -04:00
|
|
|
|
- GC support
|
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
- API docs for all the new functions
|
|
|
|
|
|
2001-07-11 17:26:08 -04:00
|
|
|
|
- using __new__
|
|
|
|
|
|
|
|
|
|
- writing metaclasses (using mro() etc.)
|
|
|
|
|
|
2001-07-11 15:09:28 -04:00
|
|
|
|
- high level user overview
|
|
|
|
|
|
2001-07-10 16:46:24 -04:00
|
|
|
|
|
|
|
|
|
Implementation
|
|
|
|
|
|
|
|
|
|
A prototype implementation of this PEP (and for PEP 252) is
|
|
|
|
|
available from CVS as a branch named "descr-branch". To
|
|
|
|
|
experiment with this implementation, proceed to check out Python
|
|
|
|
|
from CVS according to the instructions at
|
|
|
|
|
http://sourceforge.net/cvs/?group_id=5470 but add the arguments
|
|
|
|
|
"-r descr-branch" to the cvs checkout command. (You can also
|
|
|
|
|
start with an existing checkout and do "cvs update -r
|
|
|
|
|
descr-branch".) For some examples of the features described here,
|
|
|
|
|
see the file Lib/test/test_descr.py and the extension module
|
|
|
|
|
Modules/xxsubtype.c.
|
2001-06-11 16:07:37 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
|
|
|
|
|
[1] "Putting Metaclasses to Work", by Ira R. Forman and Scott
|
|
|
|
|
H. Danforth, Addison-Wesley 1999.
|
|
|
|
|
(http://www.aw.com/product/0,2627,0201433052,00.html)
|
|
|
|
|
|
|
|
|
|
|
2001-05-14 09:43:23 -04:00
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
End:
|