PEP: 253
Title: Subtyping Built-in Types
Version: $Revision$
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 14-May-2001
Post-History:

Abstract

This PEP proposes additions to the type object API that will allow
the creation of subtypes of built-in types, in C and in Python.

Introduction

Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The slots in the type object
describe all aspects of a Python type that are relevant to the
Python interpreter. A few slots contain dimensional information
(like the basic allocation size of instances), others contain
various flags, but most slots are pointers to functions to
implement various kinds of behaviors. A NULL pointer means that
the type does not implement the specific behavior; in that case
the system may provide a default behavior or raise an exception
when the behavior is invoked. Some collections of function
pointers that are usually defined together are obtained
indirectly via a pointer to an additional structure containing
more function pointers.

While the details of initializing a PyTypeObject structure haven't
been documented as such, they are easily gleaned from the examples
in the source code, and I am assuming that the reader is
sufficiently familiar with the traditional way of creating new
Python types in C.

This PEP will introduce the following features:

- a type can be a factory function for its instances

- types can be subtyped in C

- types can be subtyped in Python with the class statement

- multiple inheritance from types is supported (insofar as
  practical -- you still can't multiply inherit from list and
  dictionary)

- the standard coercion functions (int, tuple, str etc.) will
  be redefined to be the corresponding type objects, which serve
  as their own factory functions

- a class statement can contain a __metaclass__ declaration,
  specifying the metaclass to be used to create the new class

- a class statement can contain a __slots__ declaration,
  specifying the specific names of the instance variables
  supported

- there will be a standard type hierarchy (maybe)

This PEP builds on PEP 252, which adds standard introspection to
types; for example, when a particular type object initializes the
tp_hash slot, that type object has a __hash__ method when
introspected. PEP 252 also adds a dictionary to type objects
which contains all methods. At the Python level, this dictionary
is read-only for built-in types; at the C level, it is accessible
directly (but it should not be modified except as part of
initialization).

For binary compatibility, a flag bit in the tp_flags slot
indicates the existence of the various new slots in the type
object introduced below. Types that don't have the
Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags slot are assumed
to have NULL values for all the subtyping slots. (Warning: the
current implementation prototype is not yet consistent in its
checking of this flag bit. This should be fixed before the final
release.)

In current Python, a distinction is made between types and
classes. This PEP together with PEP 254 will remove that
distinction. However, for backwards compatibility the distinction
will probably remain for years to come, and without PEP 254, the
distinction is still large: types ultimately have a built-in type
as a base class, while classes ultimately derive from a
user-defined class. Therefore, in the rest of this PEP, I will
use the word type whenever I can -- including base type or
supertype, derived type or subtype, and metatype. However,
sometimes the terminology necessarily blends, for example an
object's type is given by its __class__ attribute, and subtyping
in Python is spelled with a class statement. If further
distinction is necessary, user-defined classes can be referred to
as "classic" classes.

About metatypes

Inevitably the discussion comes to metatypes (or metaclasses).
Metatypes are nothing new in Python: Python has always been able
to talk about the type of a type:

    >>> a = 0
    >>> type(a)
    <type 'int'>
    >>> type(type(a))
    <type 'type'>
    >>> type(type(type(a)))
    <type 'type'>
    >>>

In this example, type(a) is a "regular" type, and type(type(a)) is
a metatype. While as distributed all types have the same metatype
(PyType_Type, which is also its own metatype), this is not a
requirement, and in fact a useful and relevant 3rd party extension
(ExtensionClasses by Jim Fulton) creates an additional metatype.
The type of classic classes, known as types.ClassType, can also be
considered a distinct metatype.

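The relationship between regular types and metatypes can be tried
out directly. The following is a minimal sketch in present-day
Python 3, which is an assumption: the interpreter this PEP targets
prints <type 'int'> and spells metaclass declarations differently.
The names Meta and Widget are hypothetical.

```python
# Hypothetical names: Meta and Widget exist only for this illustration.
class Meta(type):
    """An additional metatype, derived from the default one."""


class Widget(metaclass=Meta):
    """A regular type whose metatype is Meta rather than type."""


a = 0
print(type(a) is int)           # type(a) is a regular type
print(type(type(a)) is type)    # its type is the default metatype
print(type(type) is type)       # which is its own metatype
print(type(Widget) is Meta)     # Widget has a distinct metatype
print(type(Meta) is type)       # Meta is itself an instance of type
```

Every comparison holds, which shows that several metatypes can
coexist while still sharing the default metatype at the top.
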
A feature closely connected to metatypes is the "Don Beaudry
hook", which says that if a metatype is callable, its instances
(which are regular types) can be subclassed (really subtyped)
using a Python class statement. I will use this rule to support
subtyping of built-in types, and in fact it greatly simplifies the
logic of class creation to always simply call the metatype. When
no base class is specified, a default metatype is called -- the
default metatype is the "ClassType" object, so the class statement
will behave as before in the normal case. (This default can be
changed per module by setting the global variable __metaclass__.)

Python uses the concept of metatypes or metaclasses in a different
way than Smalltalk. In Smalltalk-80, there is a hierarchy of
metaclasses that mirrors the hierarchy of regular classes,
metaclasses map 1-1 to classes (except for some funny business at
the root of the hierarchy), and each class statement creates both
a regular class and its metaclass, putting class methods in the
metaclass and instance methods in the regular class.

Nice though this may be in the context of Smalltalk, it's not
compatible with the traditional use of metatypes in Python, and I
prefer to continue in the Python way. This means that Python
metatypes are typically written in C, and may be shared between
many regular types. (It will be possible to subtype metatypes in
Python, so it won't be absolutely necessary to write C in order to
use metatypes; but the power of Python metatypes will be limited.
For example, Python code will never be allowed to allocate raw
memory and initialize it at will.)

Metatypes determine various *policies* for types, such as what
happens when a type is called, how dynamic types are (whether a
type's __dict__ can be modified after it is created), what the
method resolution order is, how instance attributes are looked
up, and so on.

I'll argue that left-to-right depth-first is not the best
solution when you want to get the most use from multiple
inheritance.

I'll argue that with multiple inheritance, the metatype of the
subtype must be a descendant of the metatypes of all base types.

I'll come back to metatypes later.

Making a type a factory for its instances

Traditionally, for each type there is at least one C factory
function that creates instances of the type (PyTuple_New(),
PyInt_FromLong() and so on). These factory functions take care of
both allocating memory for the object and initializing that
memory. As of Python 2.0, they also have to interface with the
garbage collection subsystem, if the type chooses to participate
in garbage collection (which is optional, but strongly recommended
for so-called "container" types: types that may contain arbitrary
references to other objects, and hence may participate in
reference cycles).

In this proposal, type objects can be factory functions for their
instances, making the types directly callable from Python. This
mimics the way classes are instantiated. Of course, the C APIs
for creating instances of various built-in types will remain valid
and probably the most common; and not all types will become their
own factory functions.

The type object has a new slot, tp_new, which can act as a factory
for instances of the type. Types are made callable by providing a
tp_call slot in PyType_Type (the metatype); the slot
implementation function looks for the tp_new slot of the type that
is being called.

(Confusion alert: the tp_call slot of a regular type object (such
as PyInt_Type or PyList_Type) defines what happens when
*instances* of that type are called; in particular, the tp_call
slot in the function type, PyFunction_Type, is the key to making
functions callable. As another example, PyInt_Type.tp_call is
NULL, because integers are not callable. The new paradigm makes
*type objects* callable. Since type objects are instances of
their metatype (PyType_Type), the metatype's tp_call slot
(PyType_Type.tp_call) points to a function that is invoked when
any type object is called. Now, since each type has to do
something different to create an instance of itself,
PyType_Type.tp_call immediately defers to the tp_new slot of the
type that is being called. To add to the confusion, PyType_Type
itself is also callable: its tp_new slot creates a new type. This
is used by the class statement (via the Don Beaudry hook, see
above). And what makes PyType_Type callable? The tp_call slot of
*its* metatype -- but since it is its own metatype, that is its
own tp_call slot!)

If the type's tp_new slot is NULL, an exception is raised.
Otherwise, the tp_new slot is called. The signature for the
tp_new slot is

    PyObject *tp_new(PyTypeObject *type,
                     PyObject *args,
                     PyObject *kwds)

where 'type' is the type whose tp_new slot is called, and 'args'
and 'kwds' are the sequential and keyword arguments to the call,
passed unchanged from tp_call. (The 'type' argument is used in
combination with inheritance, see below.)

There are no constraints on the object type that is returned,
although by convention it should be an instance of the given
type. It is not necessary that a new object is returned; a
reference to an existing object is fine too. The return value
should always be a new reference, owned by the caller.

Once the tp_new slot has returned an object, further initialization
is attempted by calling the tp_init() slot of the resulting
object's type, if not NULL. This has the following signature:

    PyObject *tp_init(PyObject *self,
                      PyObject *args,
                      PyObject *kwds)

It corresponds more closely to the __init__() method of classic
classes, and in fact is mapped to that by the slot/special-method
correspondence rules. The difference in responsibilities between
the tp_new() slot and the tp_init() slot lies in the invariants
they ensure. The tp_new() slot should ensure only the most
essential invariants, without which the C code that implements the
objects would break. The tp_init() slot should be used for
overridable user-specific initializations. Take for example the
dictionary type. The implementation has an internal pointer to a
hash table which should never be NULL. This invariant is taken
care of by the tp_new() slot for dictionaries. The dictionary
tp_init() slot, on the other hand, could be used to give the
dictionary an initial set of keys and values based on the
arguments passed in.

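The division of labour between tp_call, tp_new and tp_init can be
modelled at the Python level. The sketch below is written for
present-day Python 3 and is only an approximation of the C logic;
call_type and Table are hypothetical names, and __new__ and __init__
stand in for the tp_new and tp_init slots.

```python
def call_type(cls, *args, **kwds):
    """Rough Python model of what calling a type object does."""
    obj = cls.__new__(cls, *args, **kwds)      # role of tp_new
    # Modern CPython runs __init__ only when tp_new returned an
    # instance of cls; the draft text speaks of the result's own type.
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwds)  # role of tp_init
    return obj


class Table:
    """Hypothetical dictionary-like type."""

    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)
        self._slots = {}                # essential invariant: never None
        return self

    def __init__(self, initial=()):
        self._slots.update(initial)     # overridable, user-level setup


t = call_type(Table, {"a": 1})
```

Calling Table({"a": 1}) directly takes the same path through the
built-in metatype, so both spellings should yield equivalent objects.
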
Note that for immutable object types, the initialization cannot be
done by the tp_init() slot: this would provide the Python user
with a way to change the initialization. Therefore, immutable
objects typically have an empty tp_init() implementation and do
all their initialization in their tp_new() slot.

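A Python-level analogue, assuming present-day Python 3 where float
can be subtyped with a class statement, shows why the value of an
immutable object has to be fixed in __new__. Meters and its unit
table are hypothetical.

```python
class Meters(float):
    """Hypothetical immutable subtype of float."""

    def __new__(cls, value, unit="m"):
        # The float value must be fixed here; nothing can change it later.
        scale = {"m": 1.0, "km": 1000.0}[unit]
        return super().__new__(cls, float(value) * scale)

    def __init__(self, value, unit="m"):
        # Deliberately empty: the object is already fully initialized.
        pass


d = Meters(2, "km")
d.__init__(5, "m")      # calling the initializer again changes nothing
```

Because __init__ does nothing, a user who calls it a second time has
no way to alter the stored value.
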
You may wonder why the tp_new() slot shouldn't call the tp_init()
slot itself. The reason is that in certain circumstances (like
support for persistent objects), it is important to be able to
create an object of a particular type without initializing it any
further than necessary. This may conveniently be done by calling
the tp_new() slot without calling tp_init(). It is also possible
that tp_init() is not called, or called more than once -- its
operation should be robust even in these anomalous cases.

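The persistence case can be sketched as follows, again in
present-day Python 3. Account and the restore() helper are
hypothetical; the point is only that __new__ can be called on its own
and that __init__ tolerates being skipped or repeated.

```python
class Account:
    """Hypothetical type with a cheap, repeatable initializer."""

    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)
        self.history = []           # essential invariant set by __new__
        return self

    def __init__(self, owner="anonymous"):
        self.owner = owner          # safe to run zero, one or many times


def restore(cls, state):
    """Hypothetical loader: rebuild an object from saved state."""
    obj = cls.__new__(cls)          # allocate, but skip __init__
    obj.__dict__.update(state)
    return obj


acct = restore(Account, {"owner": "guido", "history": ["opened"]})
fresh = Account.__new__(Account)    # usable even though never initialized
```
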
For some objects, tp_new() may return an existing object. For
example, the factory function for integers caches the integers -1
through 99. This is permissible only when the type argument to
tp_new() is the type that defined the tp_new() function (in the
example, if type == &PyInt_Type), and when the tp_init() slot for
this type does nothing. If the type argument differs, the
tp_new() call is initiated by a derived type's tp_new() to
create the object and initialize the base type portion of the
object; in this case tp_new() should always return a new object
(or raise an exception).

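The caching rule can be illustrated with a small Python 3 sketch.
SmallInt and Tagged are hypothetical classes, not the real integer
implementation; the cache range merely mirrors the -1 through 99
example.

```python
class SmallInt:
    """Hypothetical integer-like type that shares common values."""
    _cache = {}

    def __new__(cls, value):
        # Sharing is only safe when the exact type is requested; a call
        # made on behalf of a derived type must get a fresh object.
        cacheable = cls is SmallInt and -1 <= value <= 99
        if cacheable and value in SmallInt._cache:
            return SmallInt._cache[value]
        self = super().__new__(cls)
        self.value = value
        if cacheable:
            SmallInt._cache[value] = self
        return self


class Tagged(SmallInt):
    """Derived type: its instances must never come from the cache."""
```

With these definitions, SmallInt(5) is SmallInt(5) holds, while two
calls to Tagged(5) produce two distinct Tagged objects.
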
There's a third slot related to object creation: tp_alloc(). Its
responsibility is to allocate the memory for the object,
initialize the reference count (ob_refcnt) and the type pointer
(ob_type), and initialize the rest of the object to all zeros. It
should also register the object with the garbage collection
subsystem if the type supports garbage collection. This slot
exists so that derived types can override the memory allocation
policy (e.g. which heap is being used) separately from the
initialization code. The signature is:

    PyObject *tp_alloc(PyTypeObject *type, int nitems)

The type argument is the type of the new object. The nitems
argument is normally zero, except for objects with a variable
allocation size (basically strings, tuples, and longs). The
allocation size is given by the following expression:

    type->tp_basicsize + nitems * type->tp_itemsize

This slot is only used for subclassable types. The tp_new()
function of the base class must call the tp_alloc() slot of the
type passed in as its first argument. It is the tp_new()
function's responsibility to calculate the number of items. The
tp_alloc() slot will set the ob_size member of the new object if
the type->tp_itemsize member is nonzero.

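The size expression can be evaluated from Python, assuming a CPython
interpreter that exposes the two slots as the __basicsize__ and
__itemsize__ attributes of type objects. allocation_size() is a
hypothetical helper; the result ignores any garbage collection header
the allocator may add.

```python
def allocation_size(tp, nitems=0):
    """Bytes requested for one instance, per the expression above."""
    return tp.__basicsize__ + nitems * tp.__itemsize__


fixed = allocation_size(float)        # tp_itemsize is 0 for float
empty = allocation_size(tuple, 0)     # header only
triple = allocation_size(tuple, 3)    # header plus three item slots
print(fixed, empty, triple)
```

The printed numbers depend on the platform and interpreter build, so
none are quoted here.
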
(Note: in certain debugging compilation modes, the type structure
used to have members named tp_alloc and tp_free already, serving
as counters for the number of allocations and deallocations. These
are renamed to tp_allocs and tp_deallocs.)

XXX The keyword arguments are currently not passed to tp_new();
its kwds argument is always NULL. This is a relic from a previous
revision and should probably be fixed. Both tp_new() and
tp_init() should receive exactly the same arguments, and both
should check that the arguments are acceptable, because they may
be called independently.

Standard implementations for tp_alloc() and tp_new() are
available. PyType_GenericAlloc() allocates an object from the
standard heap and initializes it properly. It uses the above
formula to determine the amount of memory to allocate, and takes
care of GC registration. The only reason not to use this
implementation would be to allocate objects from a different heap
(as is done by some very small frequently used objects like ints
and tuples). PyType_GenericNew() adds very little: it just calls
the type's tp_alloc() slot with zero for nitems. But for mutable
types that do all their initialization in their tp_init() slot,
this may be just the ticket.

Preparing a type for subtyping

The idea behind subtyping is very similar to that of single
inheritance in C++. A base type is described by a structure
declaration (similar to the C++ class declaration) plus a type
object (similar to the C++ vtable). A derived type can extend the
structure (but must leave the names, order and type of the members
of the base structure unchanged) and can override certain slots in
the type object, leaving others the same. (Unlike C++ vtables,
all Python type objects have the same memory layout.)

The base type must do the following:

- Add the flag value Py_TPFLAGS_BASETYPE to tp_flags.

- Declare and use tp_new(), tp_alloc() and optional tp_init() slots.

- Declare and use tp_dealloc() and tp_free().

- Export its object structure declaration.

- Export a subtyping-aware type-checking macro.

The requirements and signatures for tp_new(), tp_alloc() and
tp_init() have already been discussed above: tp_alloc() should
allocate the memory and initialize it to mostly zeros; tp_new()
should call the tp_alloc() slot and then proceed to do the
minimally required initialization; tp_init() should be used for
more extensive initialization of mutable objects.

It should come as no surprise that there are similar conventions
at the end of an object's lifetime. The slots involved are
tp_dealloc() (familiar to all who have ever implemented a Python
extension type) and tp_free(), the new kid on the block. (The
names aren't quite symmetric; tp_free() corresponds to tp_alloc(),
which is fine, but tp_dealloc() corresponds to tp_new(). Maybe
the tp_dealloc slot should be renamed?)

The tp_free() slot should be used to free the memory and
unregister the object with the garbage collection subsystem, and
can be overridden by a derived class; tp_dealloc() should
deinitialize the object (e.g. by calling Py_XDECREF() for various
sub-objects) and then call tp_free() to deallocate the memory.
The signature for tp_dealloc() is the same as it always was:

    void tp_dealloc(PyObject *object)

The signature for tp_free() is the same:

    void tp_free(PyObject *object)

(In a previous version of this PEP, there was also a role reserved
for the tp_clear() slot. This turned out to be a bad idea.)

In order to be usefully subtyped in C, a type must export the
structure declaration for its instances through a header file, as
it is needed in order to derive a subtype. The type object for
the base type must also be exported.

If the base type has a type-checking macro (like PyDict_Check()),
this macro should be made to recognize subtypes. This can be done
by using the new PyObject_TypeCheck(object, type) macro, which
calls a function that follows the base class links.

The PyObject_TypeCheck() macro contains a slight optimization: it
first compares object->ob_type directly to the type argument, and
if this is a match, bypasses the function call. This should make
it fast enough for most situations.

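The check described above can be modelled in Python 3. This is a
sketch under the single-inheritance assumption used in this part of
the PEP; type_check() and is_subtype() are hypothetical names, and the
__base__ attribute stands in for the tp_base link. Present-day
interpreters consult the full method resolution order instead.

```python
def is_subtype(tp, base):
    """Follow the single-inheritance base links from tp upward."""
    while tp is not None:
        if tp is base:
            return True
        tp = tp.__base__        # becomes None once we step past object
    return False


def type_check(obj, tp):
    """Model of PyObject_TypeCheck(): exact match first, then the walk."""
    return type(obj) is tp or is_subtype(type(obj), tp)


class SpamList(list):           # hypothetical subtype of list
    pass


print(type_check([], list))             # exact match, no walk needed
print(type_check(SpamList(), list))     # found by following base links
print(type_check([], SpamList))         # a base is not a subtype
```
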
Note that this change in the type-checking macro means that C
functions that require an instance of the base type may be invoked
with instances of the derived type. Before enabling subtyping of
a particular type, its code should be checked to make sure that
this won't break anything.

Creating a subtype of a built-in type in C

The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of
some of the problems, and it's acceptable for C code that doesn't
follow the rules to dump core. For added simplicity, it is
limited to single inheritance.

Let's assume we're deriving from a mutable base type whose
tp_itemsize is zero. The subtype code is not GC-aware, although
it may inherit GC-awareness from the base type (this is
automatic). The base type's allocation uses the standard heap.

The derived type begins by declaring a type structure which
contains the base type's structure. For example, here's the type
structure for a subtype of the built-in list type:

    typedef struct {
        PyListObject list;
        int state;
    } spamlistobject;

Note that the base type structure member (here PyListObject) must
be the first member of the structure; any following members are
additions. Also note that the base type is not referenced via a
pointer; the actual contents of its structure must be included!
(The goal is for the memory layout of the beginning of the
subtype instance to be the same as that of the base type
instance.)

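The layout requirement can be checked without writing C by
describing comparable structures with the ctypes module. This is an
illustrative sketch only: BaseObject is a hypothetical stand-in with
made-up fields, not the real PyListObject.

```python
import ctypes


class BaseObject(ctypes.Structure):
    """Stand-in for a base type's instance struct (hypothetical fields)."""
    _fields_ = [("refcnt", ctypes.c_ssize_t),
                ("length", ctypes.c_ssize_t)]


class SpamObject(ctypes.Structure):
    """Derived struct: the base struct itself comes first, by value."""
    _fields_ = [("base", BaseObject),
                ("state", ctypes.c_int)]


spam = SpamObject(BaseObject(1, 7), 42)

# Reinterpret the same memory as the base struct, as C code would
# when handed a pointer to the derived instance.
as_base = ctypes.cast(ctypes.pointer(spam),
                      ctypes.POINTER(BaseObject)).contents
as_base.length = 9      # writes through to the derived instance
```

Because the base struct sits at offset zero, code that only knows
about BaseObject reads and writes the right bytes of a SpamObject.
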
Next, the derived type must declare a type object and initialize
it. Most of the slots in the type object may be initialized to
zero, which is a signal that the base type slot must be copied
into it. Some slots that must be initialized properly:

- The object header must be filled in as usual; the type should be
  &PyType_Type.

- The tp_basicsize slot must be set to the size of the subtype
  instance struct (in the above example: sizeof(spamlistobject)).

- The tp_base slot must be set to the address of the base type's
  type object.

- If the derived type defines any pointer members, the tp_dealloc
  slot function requires special attention, see below; otherwise,
  it can be set to zero, to inherit the base type's deallocation
  function.

- The tp_flags slot must be set to the usual Py_TPFLAGS_DEFAULT
  value.

- The tp_name slot must be set; it is recommended to set tp_doc
  as well (these are not inherited).

If the subtype defines no additional structure members (it only
defines new behavior, no new data), the tp_basicsize and the
tp_dealloc slots may be left set to zero.

The subtype's tp_dealloc slot deserves special attention. If the
derived type defines no additional pointer members that need to be
DECREF'ed or freed when the object is deallocated, it can be set
to zero. Otherwise, the subtype's tp_dealloc() function must call
Py_XDECREF() for any PyObject * members and the correct memory
freeing function for any other pointers it owns, and then call the
base class's tp_dealloc() slot. This call has to be made via the
base type's type structure, for example, when deriving from the
standard list type:

    PyList_Type.tp_dealloc(self);

If the subtype wants to use a different allocation heap than the
base type, the subtype must override both the tp_alloc() and the
tp_free() slots. These will be called by the base class's
tp_new() and tp_dealloc() slots, respectively.

In order to complete the initialization of the type,
PyType_InitDict() must be called. This replaces slots initialized
to zero in the subtype with the value of the corresponding base
type slots. (It also fills in tp_dict, the type's dictionary, and
does various other initializations necessary for type objects.)

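The slot-copying step can be pictured with a toy model in Python.
Nothing here is the real PyType_InitDict(): type objects are plain
dictionaries, None plays the role of a zero slot, strings stand in
for function pointers, and inherit_slots() is a hypothetical helper.

```python
def inherit_slots(subtype, base):
    """Fill each empty (None) slot of subtype from the base type."""
    for name, value in base.items():
        if subtype.get(name) is None:
            subtype[name] = value
    return subtype


# Strings stand in for C function pointers in this toy model.
list_slots = {"tp_new": "list_new", "tp_dealloc": "list_dealloc",
              "tp_free": "generic_free", "tp_name": "list"}
spam_slots = {"tp_new": None, "tp_dealloc": "spam_dealloc",
              "tp_free": None, "tp_name": "spamlist"}

inherit_slots(spam_slots, list_slots)
first_pass = dict(spam_slots)
inherit_slots(spam_slots, list_slots)   # a second call changes nothing
```

Slots the subtype filled in itself (tp_dealloc and tp_name here) are
left alone, and repeating the call is harmless.
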
A subtype is not usable until PyType_InitDict() is called for it;
this is best done during module initialization, assuming the
subtype belongs to a module. An alternative for subtypes added to
the Python core (which don't live in a particular module) would be
to initialize the subtype in their constructor function. It is
allowed to call PyType_InitDict() more than once; the second and
further calls have no effect. In order to avoid unnecessary
calls, a test for tp_dict==NULL can be made.

(During initialization of the Python interpreter, some types are
actually used before they are initialized. As long as the slots
that are actually needed are initialized, especially tp_dealloc,
this works, but it is fragile and not recommended as a general
practice.)

To create a subtype instance, the subtype's tp_new() slot is
called. This should first call the base type's tp_new() slot and
then initialize the subtype's additional data members. To further
initialize the instance, the tp_init() slot is typically called.
Note that the tp_new() slot should *not* call the tp_init() slot;
this is up to tp_new()'s caller (typically a factory function).
There are circumstances where it is appropriate not to call
tp_init().

If a subtype defines a tp_init() slot, the tp_init() slot should
normally first call the base type's tp_init() slot.

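The same calling discipline can be shown at the Python level, where
__new__ and __init__ play the roles of tp_new() and tp_init(). The
SpamList class below is a hypothetical Python 3 analogue of the
spamlistobject example, not the C code itself.

```python
class SpamList(list):
    """Python-level analogue of the spamlistobject example."""

    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)     # base type's tp_new() first
        self.state = 0                  # then the subtype's own member
        return self                     # note: no call to __init__ here

    def __init__(self, items=(), state=0):
        super().__init__(items)         # base type's tp_init() first
        self.state = state


s = SpamList([1, 2, 3], state=5)
raw = SpamList.__new__(SpamList)        # valid even without __init__
```
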
(XXX There should be a paragraph or two about argument passing
|
|
|
|
|
here.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Subtyping in Python
|
|
|
|
|
|
|
|
|
|
The next step is to allow subtyping of selected built-in types
|
|
|
|
|
through a class statement in Python. Limiting ourselves to single
|
|
|
|
|
inheritance for now, here is what happens for a simple class
|
|
|
|
|
statement:
|
|
|
|
|
|
|
|
|
|
class C(B):
|
|
|
|
|
var1 = 1
|
|
|
|
|
def method1(self): pass
|
|
|
|
|
# etc.
|
|
|
|
|
|
|
|
|
|
The body of the class statement is executed in a fresh environment
(basically, a new dictionary used as local namespace), and then C
is created. The following explains how C is created.
|
|
|
|
|
|
|
|
|
|
Assume B is a type object. Since type objects are objects, and
every object has a type, B has a type. B's type is accessible via
type(B) or B.__class__ (the latter notation is new for types; it
is introduced in PEP 252). Let's say B's type is M (for
Metatype). The class statement will create a new type, C. Since
C will be a type object just like B, we view the creation of C as
an instantiation of the metatype, M. The information that needs
to be provided for the creation of C is: its name (in this example
the string "C"); the list of base classes (a singleton tuple
containing B); and the results of executing the class body, in the
form of a dictionary (for example {"var1": 1, "method1": <function
method1 at ...>, ...}).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
I propose to rig the class statement to make the following call:
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
    C = M("C", (B,), dict)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
(where dict is the dictionary resulting from execution of the
class body). In other words, the metatype (M) is called.
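
The call above can be tried out directly. The following sketch
assumes a Python version in which ordinary classes are type objects
(so that type(B) is the built-in "type"); the variable is named
"namespace" rather than "dict" only to avoid shadowing the built-in.

```python
class B:
    pass


# B's metatype; also reachable as B.__class__.
M = type(B)


def method1(self):
    return "method1 called"


# The result of "executing the class body", spelled out by hand.
namespace = {"var1": 1, "method1": method1}

# What the class statement is proposed to do behind the scenes.
C = M("C", (B,), namespace)
```

The resulting C behaves like a class written with a class statement:
it has the name "C", the single base B, and the attributes taken
from the namespace dictionary.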
|
|
|
|
|
|
|
|
|
|
Note that even though we currently require there to be exactly one
base class, we still pass in a (singleton) sequence of base
classes; this makes it possible to support multiple inheritance
later (or for types with a different metaclass!) without changing
this interface.
|
|
|
|
|
|
|
|
|
|
In current Python, this is called the "Don Beaudry hook" after its
inventor; it is an exceptional case that is only invoked when a
base class is not a regular class. For a regular base class (or
when no base class is specified), current Python calls
PyClass_New(), the C-level factory function for classes, directly.
I propose to change this so that Python *always* determines a
metaclass and calls it as given above. When one or more bases are
given, the type of the first base is used as the metatype;
when no base class is given, a default metaclass is chosen. By
setting the default metaclass to PyClass_Type, the metatype of
"classic" classes, the classic behavior of the class statement is
retained.
|
|
|
|
|
|
|
|
|
|
There are two further refinements here. First, a useful feature
is to be able to specify a metatype directly. If the class
statement defines a variable __metaclass__, that is the metatype
to call.
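
A sketch of specifying a metatype directly. Note the assumption:
the example uses the Python 3 spelling, where the metatype is given
as a "metaclass" keyword in the class statement instead of a
__metaclass__ variable in the class body. The names Meta and
created_by are hypothetical.

```python
class Meta(type):
    def __new__(mcls, name, bases, namespace):
        # Create the class as usual, then record which metatype built it.
        cls = super().__new__(mcls, name, bases, namespace)
        cls.created_by = mcls.__name__
        return cls


# Python 3 spelling; the text above describes a __metaclass__ variable.
class C(metaclass=Meta):
    var1 = 1
```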
|
|
|
|
|
|
|
|
|
|
Second, with multiple bases, not all bases need to have the same
metatype. This is called a metaclass conflict [1]. Some
metaclass conflicts can be resolved by searching through the set
of bases for a metatype that derives from all other given
metatypes. If such a metatype cannot be found, an exception is
raised and the class statement fails.
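
The search described above can be sketched as a small Python
function. This is an illustration of the rule as stated, not the
interpreter's actual code; the function name
most_derived_metatype and the classes M1, M2, M3, A, B and D are
hypothetical, and the example classes use Python 3 syntax.

```python
def most_derived_metatype(bases):
    """Return the metatype of some base that derives from all the others."""
    candidates = [type(base) for base in bases]
    for candidate in candidates:
        if all(issubclass(candidate, other) for other in candidates):
            return candidate
    raise TypeError("metaclass conflict: no metatype derives from all others")


class M1(type):
    pass


class M2(type):
    pass


class M3(M1, M2):
    pass


class A(metaclass=M1):
    pass


class B(metaclass=M2):
    pass


class D(metaclass=M3):
    pass
```

With these definitions, most_derived_metatype((A, B, D)) yields M3,
because M3 derives from both M1 and M2, while
most_derived_metatype((A, B)) raises TypeError, because neither M1
nor M2 derives from the other.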
|
|
|
|
|
|
|
|
|
|
This conflict resolution can be implemented in the metatypes
themselves: the class statement just calls the metatype of the first
base, and this metatype's constructor looks for the most derived
metatype. If that is itself, it proceeds; otherwise, it calls
that metatype's constructor. (Ultimate flexibility: another
metatype might choose to require that all bases have the same
metatype, or that there's only one base class, or whatever.)
|
|
|
|
|
|
|
|
|
|
(Theoretically, it might be possible to automatically derive a new
metatype that is a subtype of all given metatypes; but since it is
questionable how conflicting method definitions of the various
metatypes should be merged, I don't think this is useful or
feasible. Should the need arise, the user can derive such a
metatype and specify it using the __metaclass__ variable. It is
also possible to have a new metatype that does this.)
|
|
|
|
|
|
|
|
|
|
HIRO
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
Note that calling M requires that M itself has a type: the
meta-metatype. In the current implementation, I have introduced a
new type object for this purpose, named turtle because of my
fondness for the phrase "turtles all the way down". However, I now
believe that it would be better if M were its own metatype, just
like before. This can be accomplished by making M's tp_call slot
slightly more flexible.
|
|
|
|
|
|
|
|
|
|
In any case, the work for creating C is done by M's tp_construct
slot. It allocates space for an "extended" type structure, which
contains space for: the type object; the auxiliary structures
(as_sequence etc.); the string object containing the type name (to
ensure that this object isn't deallocated while the type object is
still referencing it); and some more auxiliary storage (to be
described later). It initializes this storage to zeros except for
a few crucial slots (for example, tp_name is set to point to the
type name) and then sets the tp_base slot to point to B. Then
PyType_InitDict() is called to inherit B's slots. Finally, C's
tp_dict slot is updated with the contents of the namespace
dictionary (the third argument to the call to M).
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
|
|
|
|
|
2001-06-11 16:07:37 -04:00
|
|
|
|
Implementation
|
|
|
|
|
|
|
|
|
|
A prototype implementation of this PEP is available from CVS as a
branch named "descr-branch". To experiment with this
implementation, proceed to check out Python from CVS according to
the instructions at http://sourceforge.net/cvs/?group_id=5470 but
add the arguments "-r descr-branch" to the cvs checkout command.
(You can also start with an existing checkout and do "cvs update
-r descr-branch".) For some examples of the features described
here, see the file Lib/test/test_descr.py and the extension module
Modules/spam.c.
|
|
|
|
|
|
2001-07-05 15:00:02 -04:00
|
|
|
|
Note: the code in this branch is for PEP 252, PEP 253, and
PEP 254.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
|
|
|
|
|
[1] "Putting Metaclasses to Work", by Ira R. Forman and Scott
    H. Danforth, Addison-Wesley 1999.
    (http://www.aw.com/product/0,2627,0201433052,00.html)
|
|
|
|
|
|
|
|
|
|
|
2001-05-14 09:43:23 -04:00
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
Junk text (to be reused somewhere above)
|
|
|
|
|
|
|
|
|
|
The deallocation mechanism chosen should match the allocation
mechanism: an allocation policy should prescribe both the
allocation and deallocation mechanism. And again, planning ahead
for subtyping would be nice. But the available mechanisms are
different. The deallocation function has always been part of the
type structure, as tp_dealloc, which combines the
"uninitialization" with deallocation. This was good enough for
the traditional situation, where it matched the combined
allocation and initialization of the creation function. But now
imagine a type whose creation function uses a special free list
for allocation. Its deallocation function puts the object's
memory back on the same free list. But when allocation and
creation are separate, the object may have been allocated from the
regular heap, and it would be wrong (in some cases disastrous) if
it were placed on the free list by the deallocation function.
|
|
|
|
|
|
|
|
|
|
A solution would be for the tp_construct function to somehow mark
whether the object was allocated from the special free list, so
that the tp_dealloc function can choose the right deallocation
method (assuming that the only two alternatives are a special free
list or the regular heap). A variant that doesn't require space
for an allocation flag bit would be to have two type objects,
identical in the contents of all their slots except for their
deallocation slot. But this requires that all type-checking code
(like PyDict_Check()) recognizes both types. We'll come back to
this solution in the context of subtyping. Another alternative is
to require the metatype's tp_call to leave the allocation to the
tp_construct method, by passing in a NULL pointer. But this
doesn't work once we allow subtyping.
|
|
|
|
|
|
|
|
|
|
Eventually, when we add any form of subtyping, we'll have to
separate deallocation from uninitialization. The way to do this
is to add a separate slot to the type object that does the
uninitialization without the deallocation. Fortunately, there is
already such a slot: tp_clear, currently used by the garbage
collection subsystem. A simple rule makes this slot reusable as
an uninitialization slot: for types that support separate allocation
and initialization, tp_clear must be defined (even if the object
doesn't support garbage collection) and it must DECREF all
contained objects and FREE all other memory areas the object owns.
It must also be reentrant: it must be possible to clear an already
cleared object. The easiest way to do this is to replace all
pointers DECREFed or FREEd with NULL pointers.
|
|
|
|
|
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
|
|
|
|
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
|