2001-05-14 09:43:23 -04:00
|
|
|
|
PEP: 253
|
|
|
|
|
Title: Subtyping Built-in Types
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Author: guido@python.org (Guido van Rossum)
|
|
|
|
|
Status: Draft
|
|
|
|
|
Type: Standards Track
|
|
|
|
|
Python-Version: 2.2
|
|
|
|
|
Created: 14-May-2001
|
|
|
|
|
Post-History:
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
This PEP proposes additions to the type object API that will allow
|
|
|
|
|
the creation of subtypes of built-in types, in C and in Python.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Introduction
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
Traditionally, types in Python have been created statically, by
|
|
|
|
|
declaring a global variable of type PyTypeObject and initializing
|
|
|
|
|
it with a static initializer. The fields in the type object
|
2001-06-13 17:48:31 -04:00
|
|
|
|
describe all aspects of a Python type that are relevant to the
|
2001-05-14 21:36:46 -04:00
|
|
|
|
Python interpreter. A few fields contain dimensional information
|
|
|
|
|
(e.g. the basic allocation size of instances), others contain
|
|
|
|
|
various flags, but most fields are pointers to functions to
|
|
|
|
|
implement various kinds of behaviors. A NULL pointer means that
|
|
|
|
|
the type does not implement the specific behavior; in that case
|
|
|
|
|
the system may provide a default behavior in that case or raise an
|
|
|
|
|
exception when the behavior is invoked. Some collections of
|
|
|
|
|
functions pointers that are usually defined together are obtained
|
2001-06-13 17:48:31 -04:00
|
|
|
|
indirectly via a pointer to an additional structure containing
|
|
|
|
|
more function pointers.
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
While the details of initializing a PyTypeObject structure haven't
|
2001-06-13 17:48:31 -04:00
|
|
|
|
been documented as such, they are easily gleaned from the examples
|
2001-05-14 21:36:46 -04:00
|
|
|
|
in the source code, and I am assuming that the reader is
|
|
|
|
|
sufficiently familiar with the traditional way of creating new
|
|
|
|
|
Python types in C.
|
|
|
|
|
|
2001-06-11 16:07:37 -04:00
|
|
|
|
This PEP will introduce the following features:
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
- a type can be a factory function for its instances
|
|
|
|
|
|
|
|
|
|
- types can be subtyped in C
|
|
|
|
|
|
|
|
|
|
- types can be subtyped in Python with the class statement
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
- multiple inheritance from types is supported (insofar as
|
|
|
|
|
practical)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
- the standard coercions functions (int, tuple, str etc.) will be
|
|
|
|
|
redefined to be the corresponding type objects, which serve as
|
|
|
|
|
their own factory functions
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
- there will be a standard type hierarchy
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
- a class statement can contain a metaclass declaration,
|
|
|
|
|
specifying the metaclass to be used to create the new class
|
2001-06-11 16:07:37 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
- a class statement can contain a slots declaration, specifying
|
|
|
|
|
the specific names of the instance variables supported
|
2001-06-11 16:07:37 -04:00
|
|
|
|
|
|
|
|
|
This PEP builds on pep-0252, which adds standard introspection to
|
2001-06-13 17:48:31 -04:00
|
|
|
|
types; e.g., when the type object defines the tp_hash slot, the
|
|
|
|
|
type object has a __hash__ method. pep-0252 also adds a
|
|
|
|
|
dictionary to type objects which contains all methods. At the
|
|
|
|
|
Python level, this dictionary is read-only for built-in types; at
|
|
|
|
|
the C level, it is accessible directly (but it should not be
|
|
|
|
|
modified except as part of initialization).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
For binary compatibility, a flag bit in the tp_flags slot
|
|
|
|
|
indicates the existence of the various new slots in the type
|
|
|
|
|
object introduced below. Types that don't have the
|
|
|
|
|
Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags field are assumed
|
|
|
|
|
to have NULL values for all the subtyping slots. (Warning: the
|
|
|
|
|
current implementation prototype is not yet consistent in its
|
|
|
|
|
checking of this flag bit. This should be fixed before the final
|
|
|
|
|
release.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
In current Python, a distinction is made between types and
|
|
|
|
|
classes. This PEP together with pep-0254 will remove that
|
|
|
|
|
distinction. However, for backwards compatibility there will
|
|
|
|
|
probably remain a bit of a distinction for years to come, and
|
|
|
|
|
without pep-0254, the distinction is still large: types ultimately
|
|
|
|
|
have a built-in type as a base class, while classes ultimately
|
|
|
|
|
derive from a user-defined class. Therefore, in the rest of this
|
|
|
|
|
PEP, I will use the word type whenever I can -- including base
|
|
|
|
|
type or supertype, derived type or subtype, and metatype.
|
|
|
|
|
However, sometimes the terminology necessarily blends, e.g. an
|
|
|
|
|
object's type is given by its __class__ attribute, and subtyping
|
|
|
|
|
in Python is spelled with a class statement. If further
|
|
|
|
|
distinction is necessary, user-defined classes can be referred to
|
|
|
|
|
as "classic" classes.
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
About metatypes
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
Inevitably the following discussion will come to mention metatypes
|
|
|
|
|
(or metaclasses). Metatypes are nothing new in Python: Python has
|
|
|
|
|
always been able to talk about the type of a type:
|
|
|
|
|
|
|
|
|
|
>>> a = 0
|
|
|
|
|
>>> type(a)
|
|
|
|
|
<type 'int'>
|
|
|
|
|
>>> type(type(a))
|
|
|
|
|
<type 'type'>
|
|
|
|
|
>>> type(type(type(a)))
|
|
|
|
|
<type 'type'>
|
|
|
|
|
>>>
|
|
|
|
|
|
|
|
|
|
In this example, type(a) is a "regular" type, and type(type(a)) is
|
|
|
|
|
a metatype. While as distributed all types have the same metatype
|
2001-06-13 17:48:31 -04:00
|
|
|
|
(PyType_Type, which is also its own metatype), this is not a
|
|
|
|
|
requirement, and in fact a useful and relevant 3rd party extension
|
|
|
|
|
(ExtensionClasses by Jim Fulton) creates an additional metatype.
|
|
|
|
|
|
|
|
|
|
A related feature is the "Don Beaudry hook", which says that if a
|
|
|
|
|
metatype is callable, its instances (which are regular types) can
|
|
|
|
|
be subclassed (really subtyped) using a Python class statement.
|
|
|
|
|
I will use this rule to support subtyping of built-in types, and
|
|
|
|
|
in fact it greatly simplifies the logic of class creation to
|
|
|
|
|
always simply call the metatype. When no base class is specified,
|
|
|
|
|
a default metatype is called -- the default metatype is the
|
|
|
|
|
"ClassType" object, so the class statement will behave as before
|
|
|
|
|
in the normal case.
|
|
|
|
|
|
|
|
|
|
Python uses the concept of metatypes or metaclasses in a different
|
|
|
|
|
way than Smalltalk. In Smalltalk-80, there is a hierarchy of
|
|
|
|
|
metaclasses that mirrors the hierarchy of regular classes,
|
|
|
|
|
metaclasses map 1-1 to classes (except for some funny business at
|
|
|
|
|
the root of the hierarchy), and each class statement creates both
|
|
|
|
|
a regular class and its metaclass, putting class methods in the
|
|
|
|
|
metaclass and instance methods in the regular class.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
Nice though this may be in the context of Smalltalk, it's not
|
|
|
|
|
compatible with the traditional use of metatypes in Python, and I
|
|
|
|
|
prefer to continue in the Python way. This means that Python
|
|
|
|
|
metatypes are typically written in C, and may be shared between
|
|
|
|
|
many regular types. (It will be possible to subtype metatypes in
|
|
|
|
|
Python, so it won't be absolutely necessary to write C in order to
|
|
|
|
|
use metatypes; but the power of Python metatypes will be limited,
|
|
|
|
|
e.g. Python code will never be allowed to allocate raw memory and
|
|
|
|
|
initialize it at will.)
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
Metatypes determine various *policies* for types, e.g. what
|
|
|
|
|
happens when a type is called, how dynamic types are (whether a
|
|
|
|
|
type's __dict__ can be modified after it is created), what the
|
|
|
|
|
method resolution order is, how instance attributes are looked
|
|
|
|
|
up, and so on.
|
|
|
|
|
|
|
|
|
|
I'll argue that left-to-right depth-first is not the best
|
|
|
|
|
solution when you want to get the most use from multiple
|
|
|
|
|
inheritance.
|
|
|
|
|
|
|
|
|
|
I'll argue that with multiple inheritance, the metatype of the
|
|
|
|
|
subtype must be a descendant of the metatypes of all base types.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
I'll come back to metatypes later.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
Making a type a factory for its instances
|
|
|
|
|
|
|
|
|
|
Traditionally, for each type there is at least one C factory
|
|
|
|
|
function that creates instances of the type (PyTuple_New(),
|
|
|
|
|
PyInt_FromLong() and so on). These factory functions take care of
|
2001-05-14 21:36:46 -04:00
|
|
|
|
both allocating memory for the object and initializing that
|
2001-06-13 17:48:31 -04:00
|
|
|
|
memory. As of Python 2.0, they also have to interface with the
|
2001-05-14 21:36:46 -04:00
|
|
|
|
garbage collection subsystem, if the type chooses to participate
|
|
|
|
|
in garbage collection (which is optional, but strongly recommended
|
|
|
|
|
for so-called "container" types: types that may contain arbitrary
|
|
|
|
|
references to other objects, and hence may participate in
|
|
|
|
|
reference cycles).
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
In this proposal, type objects can be factory functions for their
|
|
|
|
|
instances, making the types directly callable from Python. This
|
|
|
|
|
mimics the way classes are instantiated. Of course, the C APIs
|
|
|
|
|
for creating instances of various built-in types will remain valid
|
|
|
|
|
and probably the most common; and not all types will become their
|
|
|
|
|
own factory functions.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
The type object has a new slot, tp_new, which can act as a factory
|
|
|
|
|
for instances of the type. Types are made callable by providing a
|
|
|
|
|
tp_call slot in PyType_Type (the metatype); the slot
|
|
|
|
|
implementation function looks for the tp_new slot of the type that
|
|
|
|
|
is being called.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
If the type's tp_new slot is NULL, an exception is raised.
|
|
|
|
|
Otherwise, the tp_new slot is called. The signature for the
|
|
|
|
|
tp_new slot is
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
PyObject *tp_new(PyTypeObject *type,
|
|
|
|
|
PyObject *args,
|
|
|
|
|
PyObject *kwds)
|
|
|
|
|
|
|
|
|
|
where 'type' is the type whose tp_new slot is called, and 'args'
|
|
|
|
|
and 'kwds' are the sequential and keyword arguments to the call,
|
|
|
|
|
passed unchanged from tp_call. (The 'type' argument is used in
|
|
|
|
|
combination with inheritance, see below.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
There are no constraints on the object type that is returned,
|
|
|
|
|
although by convention it should be an instance of the given
|
|
|
|
|
type. It is not necessary that a new object is returned; a
|
|
|
|
|
reference to an existing object is fine too. The return value
|
|
|
|
|
should always be a new reference, owned by the caller.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
Requirements for a type to allow subtyping
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
The simplest form of subtyping is subtyping in C. It is the
|
|
|
|
|
simplest form because we can require the C code to be aware of the
|
|
|
|
|
various problems, and it's acceptable for C code that doesn't
|
2001-06-13 17:48:31 -04:00
|
|
|
|
follow the rules to dump core. For added simplicity, it is
|
|
|
|
|
limited to single inheritance.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
The idea behind subtyping is very similar to that of single
|
|
|
|
|
inheritance in C++. A base type is described by a structure
|
|
|
|
|
declaration plus a type object. A derived type can extend the
|
|
|
|
|
structure (but must leave the names, order and type of the fields
|
|
|
|
|
of the base structure unchanged) and can override certain slots in
|
|
|
|
|
the type object, leaving others the same.
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
Most issues have to do with construction and destruction of
|
|
|
|
|
instances of derived types.
|
|
|
|
|
|
|
|
|
|
Creation of a new object is separated into allocation and
|
|
|
|
|
initialization: allocation allocates the memory, and
|
|
|
|
|
initialization fill it with appropriate initial values. The
|
|
|
|
|
separation is needed for the convenience of subtypes.
|
|
|
|
|
Instantiation of a subtype goes as follows:
|
|
|
|
|
|
|
|
|
|
1. allocate memory for the whole (subtype) instance
|
|
|
|
|
2. initialize the base type
|
|
|
|
|
3. initialize the subtype's instance variables
|
|
|
|
|
|
|
|
|
|
If allocation and initialization were done by the same function,
|
|
|
|
|
you would need a way to tell the base type's constructor to
|
|
|
|
|
allocate additional memory for the subtype's instance variables,
|
|
|
|
|
and there would be no way to change the allocation method for a
|
|
|
|
|
subtype (without giving up on calling the base type to initialize
|
|
|
|
|
its part of the instance structure).
|
|
|
|
|
|
|
|
|
|
A similar reasoning applies to destruction: if a subtype changes
|
|
|
|
|
the instance allocator (e.g. to use a different heap), it must
|
|
|
|
|
also change the instance deallocator; but it must still call on
|
|
|
|
|
the base type's destructor to DECREF the base type's instance
|
|
|
|
|
variables.
|
|
|
|
|
|
|
|
|
|
In this proposal, I assign stricter meanings to two existing
|
|
|
|
|
slots for deallocation and deinitialization, and I add two new
|
|
|
|
|
slots for allocation and initialization.
|
|
|
|
|
|
|
|
|
|
The tp_clear slot gets the new task of deinitializing an object so
|
|
|
|
|
that all that remains to be done is free its memory. Originally,
|
|
|
|
|
all it had to do was clear object references. The difference is
|
|
|
|
|
subtle: the list and dictionary objects contain references to an
|
|
|
|
|
additional heap-allocated piece of memory that isn't freed by
|
|
|
|
|
tp_clear in Python 2.1, but which must be freed by tp_clear under
|
|
|
|
|
this proposal. It should be safe to call tp_clear repeatedly on
|
|
|
|
|
the same object. If an object contains no references to other
|
|
|
|
|
objects or heap-allocated memory, the tp_clear slot may be NULL.
|
|
|
|
|
|
|
|
|
|
The only additional requirement for the tp_dealloc slot is that it
|
|
|
|
|
should do the right thing whether or not tp_clear has been called.
|
|
|
|
|
|
|
|
|
|
The new slots are tp_alloc for allocation and tp_init for
|
|
|
|
|
initialization. Their signatures:
|
|
|
|
|
|
|
|
|
|
PyObject *tp_alloc(PyTypeObject *type,
|
|
|
|
|
PyObject *args,
|
|
|
|
|
PyObject *kwds)
|
|
|
|
|
|
|
|
|
|
int tp_init(PyObject *self,
|
|
|
|
|
PyObject *args,
|
|
|
|
|
PyObject *kwds)
|
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
[XXX We'll have to rename tp_alloc to something else, because in
|
|
|
|
|
debug mode there's already a tp_alloc field.]
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
The arguments for tp_alloc are the same as for tp_new, described
|
|
|
|
|
above. The arguments for tp_init are the same except that the
|
|
|
|
|
first argument is replaced with the instance to be initialized.
|
|
|
|
|
Its return value is 0 for success or -1 for failure.
|
|
|
|
|
|
|
|
|
|
It is possible that tp_init is called more than once or not at
|
|
|
|
|
all. The implementation should allow this usage. The object may
|
|
|
|
|
be non-functional until tp_init is called, and a second call to
|
|
|
|
|
tp_init may raise an exception, but it should not be possible to
|
|
|
|
|
cause a core dump or memory leakage this way.
|
|
|
|
|
|
|
|
|
|
Because tp_init is in a sense optional, tp_alloc is required to do
|
2001-06-14 16:48:43 -04:00
|
|
|
|
*some* initialization of the object. It must initialize ob_refcnt
|
|
|
|
|
to 1 and ob_type to its type argument. It should zero out the
|
|
|
|
|
rest of the object.
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
The constructor arguments are passed to tp_alloc so that for
|
|
|
|
|
variable-size objects (like tuples and strings) it knows to
|
|
|
|
|
allocate the right amount of memory.
|
|
|
|
|
|
|
|
|
|
For immutable types, tp_alloc may have to do the full
|
|
|
|
|
initialization; otherwise, different calls to tp_init might cause
|
|
|
|
|
an immutable object to be modified, which is considered a grave
|
|
|
|
|
offense in Python (unlike in Fortran :-).
|
|
|
|
|
|
|
|
|
|
Not every type can serve as a base type. The assumption is made
|
|
|
|
|
that if a type has a non-NULL value in its tp_init slot, it is
|
|
|
|
|
ready to be subclassed; otherwise, it is not, and using it as a
|
|
|
|
|
base class will raise an exception.
|
|
|
|
|
|
|
|
|
|
In order to be usefully subtyped in C, a type must also export the
|
|
|
|
|
structure declaration for its instances through a header file, as
|
|
|
|
|
it is needed in order to derive a subtype. The type object for
|
|
|
|
|
the base type must also be exported.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
If the base type has a type-checking macro (e.g. PyDict_Check()),
|
2001-06-13 17:48:31 -04:00
|
|
|
|
this macro probably should be changed to recognize subtypes. This
|
|
|
|
|
can be done by using the new PyObject_TypeCheck(object, type)
|
|
|
|
|
macro, which calls a function that follows the base class links.
|
|
|
|
|
|
|
|
|
|
(An argument against changing the type-checking macro could be
|
|
|
|
|
that the type check is used frequently and a function call would
|
|
|
|
|
slow things down too much, but I find this hard to believe. One
|
|
|
|
|
could also fear that a subtype might break an invariant assumed by
|
|
|
|
|
the support functions of the base type. Usually it is best to
|
|
|
|
|
change the base type to remove this reliance, at least to the
|
|
|
|
|
point of raising an exception rather than dumping core when the
|
|
|
|
|
invariant is broken.)
|
|
|
|
|
|
|
|
|
|
Here are the inteactions between, tp_alloc, tp_clear, tp_dealloc
|
|
|
|
|
and subtypes; all assuming that the base type defines tp_init
|
|
|
|
|
(otherwise it cannot be subtyped anyway):
|
|
|
|
|
|
|
|
|
|
- If the base type's allocation scheme doesn't use the standard
|
|
|
|
|
heap, it should not define tp_alloc. This is a signal for the
|
|
|
|
|
subclass to provide its own tp_alloc *and* tp_dealloc
|
|
|
|
|
implementation (probably using the standard heap).
|
|
|
|
|
|
|
|
|
|
- If the base type's tp_dealloc does anything besides calling
|
|
|
|
|
PyObject_DEL() (typically, calling Py_XDECREF() on contained
|
|
|
|
|
objects or freeing dependent memory blocks), it should define a
|
|
|
|
|
tp_clear that does the same without calling PyObject_DEL(), and
|
|
|
|
|
which checks for zero pointers before and zeros the pointers
|
|
|
|
|
afterwards, so that calling tp_clear more than once or calling
|
|
|
|
|
tp_dealloc after tp_clear will not attempt to DECREF or free the
|
|
|
|
|
same object/memory twice. (It should also be allowed to
|
|
|
|
|
continue using the object after tp_clear -- tp_clear should
|
|
|
|
|
simply reset the object to its pristine state.)
|
|
|
|
|
|
|
|
|
|
- If the derived type overrides tp_alloc, it should also override
|
|
|
|
|
tp_dealloc, and tp_dealloc should call the derived type's
|
|
|
|
|
tp_clear if non-NULL (or its own tp_clear).
|
|
|
|
|
|
|
|
|
|
- If the derived type overrides tp_clear, it should call the base
|
|
|
|
|
type's tp_clear if non-NULL.
|
|
|
|
|
|
|
|
|
|
- If the base type defines tp_init as well as tp_new, its tp_new
|
|
|
|
|
should be inheritable: it should call the tp_alloc and the
|
|
|
|
|
tp_init of the type passed in as its first argument.
|
|
|
|
|
|
|
|
|
|
- If the base type defines tp_init as well as tp_alloc, its
|
|
|
|
|
tp_alloc should be inheritable: it should look in the
|
|
|
|
|
tp_basicsize slot of the type passed in for the amount of memory
|
|
|
|
|
to allocate, and it should initialize all allocated bytes to
|
|
|
|
|
zero.
|
|
|
|
|
|
|
|
|
|
- For types whose tp_itemsize is nonzero, the allocation size used
|
|
|
|
|
in tp_alloc should be tp_basicsize + n*tp_itemsize, rounded up
|
|
|
|
|
to the next integral multiple of sizeof(PyObject *), where n is
|
|
|
|
|
the number of items determined by the arguments to tp_alloc.
|
|
|
|
|
|
|
|
|
|
- Things are further complicated by the garbage collection API.
|
|
|
|
|
This affects tp_basicsize, and the actions to be taken by
|
|
|
|
|
tp_alloc. tp_alloc should look at the Py_TPFLAGS_GC flag bit in
|
|
|
|
|
the tp_flags field of the type passed in, and not assume that
|
|
|
|
|
this is the same as the corresponding bit in the base type. (In
|
|
|
|
|
part, the GC API is at fault; Neil Schemenauer has a patch that
|
|
|
|
|
fixes the API, but it is currently backwards incompatible.)
|
|
|
|
|
|
|
|
|
|
Note: the rules here are very complicated -- probably too
|
|
|
|
|
complicated. It may be better to give up on subtyping immutable
|
|
|
|
|
types, types with custom allocators, and types with variable size
|
|
|
|
|
allocation (such as int, string and tuple) -- then the rules can
|
|
|
|
|
be much simplified because you can assume allocation on the
|
|
|
|
|
standard heap, no requirement beyond zeroing memory in tp_alloc,
|
|
|
|
|
and no variable length allocation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Creating a subtype of a built-in type in C
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
Let's assume we're deriving from a mutable base type whose
|
|
|
|
|
tp_itemsize is zero. The subtype code is not GC-aware, although
|
|
|
|
|
it may inherit GC-awareness from the base type (this is
|
|
|
|
|
automatic). The base type's allocation uses the standard heap.
|
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
The derived type begins by declaring a type structure which
|
|
|
|
|
contains the base type's structure. For example, here's the type
|
|
|
|
|
structure for a subtype of the built-in list type:
|
|
|
|
|
|
|
|
|
|
typedef struct {
|
|
|
|
|
PyListObject list;
|
|
|
|
|
int state;
|
|
|
|
|
} spamlistobject;
|
|
|
|
|
|
|
|
|
|
Note that the base type structure field (here PyListObject) must
|
|
|
|
|
be the first field in the structure; any following fields are
|
|
|
|
|
extension fields. Also note that the base type is not referenced
|
|
|
|
|
via a pointer; the actual contents of its structure must be
|
|
|
|
|
included! (The goal is for the memory lay out of the beginning of
|
|
|
|
|
the subtype instance to be the same as that of the base type
|
|
|
|
|
instance.)
|
|
|
|
|
|
|
|
|
|
Next, the derived type must declare a type object and initialize
|
|
|
|
|
it. Most of the slots in the type object may be initialized to
|
|
|
|
|
zero, which is a signal that the base type slot must be copied
|
|
|
|
|
into it. Some fields that must be initialized properly:
|
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
- The object header must be filled in as usual; the type should be
|
|
|
|
|
&PyType_Type.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
- The tp_basicsize field must be set to the size of the subtype
|
|
|
|
|
instance struct (in the above example: sizeof(spamlistobject)).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
- The tp_base field must be set to the address of the base type's
|
|
|
|
|
type object.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
- If the derived slot defines any pointer fields, the tp_dealloc
|
|
|
|
|
slot function requires special attention, see below; otherwise,
|
|
|
|
|
it can be set to zero, to inherit the base type's deallocation
|
|
|
|
|
function.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
- The tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT
|
|
|
|
|
value.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
- The tp_name field must be set; it is recommended to set tp_doc
|
|
|
|
|
as well (these are not inherited).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
Exception: if the subtype defines no additional fields in its
|
|
|
|
|
structure (i.e., it only defines new behavior, no new data), the
|
2001-06-14 09:37:45 -04:00
|
|
|
|
tp_basicsize and the tp_dealloc fields may be set to zero.
|
|
|
|
|
|
|
|
|
|
In order to complete the initialization of the type,
|
2001-05-14 21:36:46 -04:00
|
|
|
|
PyType_InitDict() must be called. This replaces zero slots in the
|
2001-06-14 09:37:45 -04:00
|
|
|
|
subtype with the value of the corresponding base type slots. (It
|
|
|
|
|
also fills in tp_dict, the type's dictionary, and does various
|
|
|
|
|
other initializations necessary for type objects.)
|
|
|
|
|
|
|
|
|
|
The subtype's tp_dealloc slot deserves special attention. If the
|
|
|
|
|
derived type defines no additional pointers that need to be
|
|
|
|
|
DECREF'ed or freed when the object is deallocated, it can be set
|
|
|
|
|
to zero. Otherwise, the subtype's deallocation function must call
|
|
|
|
|
Py_XDECREF() for any PyObject * fields and the correct memory
|
|
|
|
|
freeing function for any other pointers it owns, and then call the
|
|
|
|
|
base class's tp_dealloc slot. Because deallocation functions
|
|
|
|
|
typically are not exported, this call has to be made via the base
|
|
|
|
|
type's type structure, e.g., when deriving from the standard list
|
|
|
|
|
type:
|
|
|
|
|
|
|
|
|
|
PyList_Type.tp_dealloc(self);
|
|
|
|
|
|
|
|
|
|
(If the subtype uses a different allocation heap than the base
|
|
|
|
|
type, the subtype must call the base type's tp_clear() slot
|
|
|
|
|
instead, followed by a call to free the object's memory from the
|
|
|
|
|
appropriate heap, e.g. PyObject_DEL(self) if the subtype uses the
|
|
|
|
|
standard heap. But in this case subtyping is not recommended.)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
A subtype is not usable until PyType_InitDict() is called for it;
|
|
|
|
|
this is best done during module initialization, assuming the
|
|
|
|
|
subtype belongs to a module. An alternative for subtypes added to
|
|
|
|
|
the Python core (which don't live in a particular module) would be
|
|
|
|
|
to initialize the subtype in their constructor function. It is
|
|
|
|
|
allowed to call PyType_InitDict() more than once, the second and
|
|
|
|
|
further calls have no effect. In order to avoid unnecessary
|
|
|
|
|
calls, a test for tp_dict==NULL can be made.
|
|
|
|
|
|
2001-06-14 09:37:45 -04:00
|
|
|
|
To create a subtype instance, the base type's tp_alloc slot must
|
|
|
|
|
be called with the subtype as its first argument. Then, if the
|
|
|
|
|
base type has a tp_init slot, that must be called to initialize
|
|
|
|
|
the base portion of the instance; finally the subtype's own fields
|
|
|
|
|
must be initialized. After allocation, the initialization can
|
|
|
|
|
also be done by calling the subtype's tp_init slot, assuming this
|
|
|
|
|
correctly calls its base type's tp_init slot.
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Subtyping in Python
|
|
|
|
|
|
|
|
|
|
The next step is to allow subtyping of selected built-in types
|
|
|
|
|
through a class statement in Python. Limiting ourselves to single
|
|
|
|
|
inheritance for now, here is what happens for a simple class
|
|
|
|
|
statement:
|
|
|
|
|
|
|
|
|
|
class C(B):
|
|
|
|
|
var1 = 1
|
|
|
|
|
def method1(self): pass
|
|
|
|
|
# etc.
|
|
|
|
|
|
|
|
|
|
The body of the class statement is executes in a fresh environment
|
|
|
|
|
(basically, a new dictionary used as local namespace), and then C
|
|
|
|
|
is created. The following explains how C is created.
|
|
|
|
|
|
|
|
|
|
Assume B is a type object. Since type objects are objects, and
|
|
|
|
|
every object has a type, B has a type. B's type is accessible via
|
|
|
|
|
type(B) or B.__class__ (the latter notation is new for types; it
|
2001-06-11 16:07:37 -04:00
|
|
|
|
is introduced in pep-0252). Let's say B's type is M (for
|
2001-05-14 21:36:46 -04:00
|
|
|
|
Metatype). The class statement will create a new type, C. Since
|
|
|
|
|
C will be a type object just like B, we view the creation of C as
|
|
|
|
|
an instantiation of the metatype, M. The information that needs
|
|
|
|
|
to be provided for the creation of C is: its name (in this example
|
|
|
|
|
the string "C"); the list of base classes (a singleton tuple
|
|
|
|
|
containing B); and the results of executing the class body, in the
|
2001-06-14 16:48:43 -04:00
|
|
|
|
form of a dictionary (e.g. {"var1": 1, "method1": <function
|
|
|
|
|
method1 at ...>, ...}).
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
I propose to rig the class statement to make the following call:
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
2001-06-14 16:48:43 -04:00
|
|
|
|
C = M("C", (B,), dict)
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
(where dict is the dictionary resulting from execution of the
|
2001-06-14 16:48:43 -04:00
|
|
|
|
class body). In other words, the metatype (M) is called.
|
|
|
|
|
|
|
|
|
|
Note that even though we currently require there to be exactly one
|
|
|
|
|
base class, we still pass in a (singleton) sequence of base
|
|
|
|
|
classes; this makes it possible to support multiple inheritance
|
|
|
|
|
later (or for types with a different metaclass!) without changing
|
|
|
|
|
this interface.
|
|
|
|
|
|
|
|
|
|
In current Python, this is called the "Don Beaudry hook" after its
|
|
|
|
|
inventor; it is an exceptional case that is only invoked when a
|
|
|
|
|
base class is not a regular class. For a regular base class (or
|
|
|
|
|
when no base class is specified), current Python calls
|
|
|
|
|
PyClass_New(), the C level factory function for classes, directly.
|
|
|
|
|
I propose to change this so that Python *always* determines a
|
|
|
|
|
metaclass and calls it as given above. When one or more bases are
|
|
|
|
|
given, the type of the first base is used as the metatype;
|
|
|
|
|
when no base class is given, a default metaclass is chosen. By
|
|
|
|
|
setting the default metaclass to PyClass_Type, the metatype of
|
|
|
|
|
"classic" classes, the classic behavior of the class statement is
|
|
|
|
|
retained.
|
|
|
|
|
|
|
|
|
|
There are two further refinements here. First, a useful feature
|
|
|
|
|
is to be able to specify a metatype directly. If the class
|
|
|
|
|
statement defines a variable __metaclass__, that is the metatype
|
|
|
|
|
to call.
|
|
|
|
|
|
|
|
|
|
Second, with multiple bases, not all bases need to have the same
|
|
|
|
|
metatype. This is called a metaclass conflict [1]. Some
|
|
|
|
|
metaclass conflicts can be resolved by searching through the set
|
|
|
|
|
of bases for a metatype that derives from all other given
|
|
|
|
|
metatypes. If such a metatype cannot be found, an exception is
|
|
|
|
|
raised and the class statement fails.
|
|
|
|
|
|
|
|
|
|
This conflict resultion can be implemented in the metatypes
|
|
|
|
|
itself: the class statement just calls the metatype of the first
|
|
|
|
|
base, and this metatype's constructor looks for the most derived
|
|
|
|
|
metatype. If that is itself, it proceeds; otherwise, it calls
|
|
|
|
|
that metatype's constructor. (Ultimate flexibility: another
|
|
|
|
|
metatype might choose to require that all bases have the same
|
|
|
|
|
metatype, or that there's only one base class, or whatever.)
|
|
|
|
|
|
|
|
|
|
(Theoretically, it might be possible to automatically derive a new
|
|
|
|
|
metatype that is a subtype of all given metatypes; but since it is
|
|
|
|
|
questionable how conflicting method definitions of the various
|
|
|
|
|
metatypes should be merged, I don't think this is useful or
|
|
|
|
|
feasible. Should the need arise, the user can derive such a
|
|
|
|
|
metatype and specify it using the __metaclass__ variable. It is
|
|
|
|
|
also possible to have a new metatype that does this.)
|
|
|
|
|
|
|
|
|
|
HIRO
|
2001-05-14 21:36:46 -04:00
|
|
|
|
|
|
|
|
|
Note that calling M requires that M itself has a type: the
|
|
|
|
|
meta-metatype. In the current implementation, I have introduced a
|
|
|
|
|
new type object for this purpose, named turtle because of my
|
|
|
|
|
fondness of the phrase "turtles all the way down". However I now
|
|
|
|
|
believe that it would be better if M were its own metatype, just
|
|
|
|
|
like before. This can be accomplished by making M's tp_call slot
|
|
|
|
|
slightly more flexible.
|
|
|
|
|
|
|
|
|
|
In any case, the work for creating C is done by M's tp_construct
|
|
|
|
|
slot. It allocates space for an "extended" type structure, which
|
|
|
|
|
contains space for: the type object; the auxiliary structures
|
|
|
|
|
(as_sequence etc.); the string object containing the type name (to
|
|
|
|
|
ensure that this object isn't deallocated while the type object is
|
|
|
|
|
still referencing it); and some more auxiliary storage (to be
|
|
|
|
|
described later). It initializes this storage to zeros except for
|
|
|
|
|
a few crucial slots (e.g. tp_name is set to point to the type
|
|
|
|
|
name) and then sets the tp_base slot to point to B. Then
|
|
|
|
|
PyType_InitDict() is called to inherit B's slots. Finally, C's
|
|
|
|
|
tp_dict slot is updated with the contents of the namespace
|
|
|
|
|
dictionary (the third argument to the call to M).
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
|
|
|
|
|
2001-06-11 16:07:37 -04:00
|
|
|
|
Implementation
|
|
|
|
|
|
|
|
|
|
A prototype implementation of this PEP is available from CVS as a
|
|
|
|
|
branch named "descr-branch". To experiment with this
|
|
|
|
|
implementation, proceed to check out Python from CVS according to
|
|
|
|
|
the instructions at http://sourceforge.net/cvs/?group_id=5470 but
|
|
|
|
|
add the arguments "-r descr-branch" to the cvs checkout command.
|
|
|
|
|
(You can also start with an existing checkout and do "cvs update
|
|
|
|
|
-r descr-branch".) For some examples of the features described
|
|
|
|
|
here, see the file Lib/test/test_descr.py and the extension module
|
|
|
|
|
Modules/spam.c.
|
|
|
|
|
|
|
|
|
|
Note: the code in this branch is for pep-0252, pep-0253, and
|
|
|
|
|
pep-254.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
|
|
|
|
|
[1] "Putting Metaclasses to Work", by Ira R. Forman and Scott
|
|
|
|
|
H. Danforth, Addison-Wesley 1999.
|
|
|
|
|
(http://www.aw.com/product/0,2627,0201433052,00.html)
|
|
|
|
|
|
|
|
|
|
|
2001-05-14 09:43:23 -04:00
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
2001-06-13 17:48:31 -04:00
|
|
|
|
|
|
|
|
|
Junk text (to be reused somewhere above)
|
|
|
|
|
|
|
|
|
|
The deallocation mechanism chosen should match the allocation
|
|
|
|
|
mechanism: an allocation policy should prescribe both the
|
|
|
|
|
allocation and deallocation mechanism. And again, planning ahead
|
|
|
|
|
for subtyping would be nice. But the available mechanisms are
|
|
|
|
|
different. The deallocation function has always been part of the
|
|
|
|
|
type structure, as tp_dealloc, which combines the
|
|
|
|
|
"uninitialization" with deallocation. This was good enough for
|
|
|
|
|
the traditional situation, where it matched the combined
|
|
|
|
|
allocation and initialization of the creation function. But now
|
|
|
|
|
imagine a type whose creation function uses a special free list
|
|
|
|
|
for allocation. It's deallocation function puts the object's
|
|
|
|
|
memory back on the same free list. But when allocation and
|
|
|
|
|
creation are separate, the object may have been allocated from the
|
|
|
|
|
regular heap, and it would be wrong (in some cases disastrous) if
|
|
|
|
|
it were placed on the free list by the deallocation function.
|
|
|
|
|
|
|
|
|
|
A solution would be for the tp_construct function to somehow mark
|
|
|
|
|
whether the object was allocated from the special free list, so
|
|
|
|
|
that the tp_dealloc function can choose the right deallocation
|
|
|
|
|
method (assuming that the only two alternatives are a special free
|
|
|
|
|
list or the regular heap). A variant that doesn't require space
|
|
|
|
|
for an allocation flag bit would be to have two type objects,
|
|
|
|
|
identical in the contents of all their slots except for their
|
|
|
|
|
deallocation slot. But this requires that all type-checking code
|
|
|
|
|
(e.g. the PyDict_Check()) recognizes both types. We'll come back
|
|
|
|
|
to this solution in the context of subtyping. Another alternative
|
|
|
|
|
is to require the metatype's tp_call to leave the allocation to
|
|
|
|
|
the tp_construct method, by passing in a NULL pointer. But this
|
|
|
|
|
doesn't work once we allow subtyping.
|
|
|
|
|
|
|
|
|
|
Eventually, when we add any form of subtyping, we'll have to
|
|
|
|
|
separate deallocation from uninitialization. The way to do this
|
|
|
|
|
is to add a separate slot to the type object that does the
|
|
|
|
|
uninitialization without the deallocation. Fortunately, there is
|
|
|
|
|
already such a slot: tp_clear, currently used by the garbage
|
|
|
|
|
collection subsystem. A simple rule makes this slot reusable as
|
|
|
|
|
an uninitialization: for types that support separate allocation
|
|
|
|
|
and initialization, tp_clear must be defined (even if the object
|
|
|
|
|
doesn't support garbage collection) and it must DECREF all
|
|
|
|
|
contained objects and FREE all other memory areas the object owns.
|
|
|
|
|
It must also be reentrant: it must be possible to clear an already
|
|
|
|
|
cleared object. The easiest way to do this is to replace all
|
|
|
|
|
pointers DECREFed or FREEd with NULL pointers.
|
|
|
|
|
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
End:
|