2001-05-14 09:43:23 -04:00
|
|
|
|
PEP: 253
|
|
|
|
|
Title: Subtyping Built-in Types
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Author: guido@python.org (Guido van Rossum)
|
|
|
|
|
Status: Draft
|
|
|
|
|
Type: Standards Track
|
|
|
|
|
Python-Version: 2.2
|
|
|
|
|
Created: 14-May-2001
|
|
|
|
|
Post-History:
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
|
|
|
|
|
This PEP proposes ways for creating subtypes of existing built-in
|
2001-05-14 21:36:46 -04:00
|
|
|
|
types, either in C or in Python. The text is currently long and
|
|
|
|
|
rambling; I'll go over it again later to make it shorter.
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
Traditionally, types in Python have been created statically, by
|
|
|
|
|
declaring a global variable of type PyTypeObject and initializing
|
|
|
|
|
it with a static initializer. The fields in the type object
|
|
|
|
|
describe all aspects of a Python object that are relevant to the
|
|
|
|
|
Python interpreter. A few fields contain dimensional information
|
|
|
|
|
(e.g. the basic allocation size of instances), others contain
|
|
|
|
|
various flags, but most fields are pointers to functions to
|
|
|
|
|
implement various kinds of behaviors. A NULL pointer means that
|
|
|
|
|
the type does not implement the specific behavior; in that case
|
|
|
|
|
the system may provide a default behavior in that case or raise an
|
|
|
|
|
exception when the behavior is invoked. Some collections of
|
|
|
|
|
functions pointers that are usually defined together are obtained
|
|
|
|
|
indirectly via a pointer to an additional structure containing.
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
2001-05-14 21:36:46 -04:00
|
|
|
|
While the details of initializing a PyTypeObject structure haven't
|
|
|
|
|
been documented as such, they are easily glanced from the examples
|
|
|
|
|
in the source code, and I am assuming that the reader is
|
|
|
|
|
sufficiently familiar with the traditional way of creating new
|
|
|
|
|
Python types in C.
|
|
|
|
|
|
|
|
|
|
This PEP will introduce the following optional features to types:
|
|
|
|
|
|
|
|
|
|
- create an instance of a type by calling it
|
|
|
|
|
|
|
|
|
|
- create a subtype in C by specifying a base type pointer
|
|
|
|
|
|
|
|
|
|
- create a subtype in Python using a class statement
|
|
|
|
|
|
|
|
|
|
- multiple inheritance
|
|
|
|
|
|
|
|
|
|
This PEP builds on PEP 252, which adds standard introspection to
|
|
|
|
|
types; in particular, types are assumed to have e.g. a __hash__
|
|
|
|
|
method when the type object defines the tp_hash slot. PEP 252 also
|
|
|
|
|
adds a dictionary to type objects which contains all methods. At
|
|
|
|
|
the Python level, this dictionary is read-only; at the C level, it
|
|
|
|
|
is accessible directly (but modifying it is not recommended except
|
|
|
|
|
as part of initialization).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Metatypes
|
|
|
|
|
|
|
|
|
|
Inevitably the following discussion will come to mention metatypes
|
|
|
|
|
(or metaclasses). Metatypes are nothing new in Python: Python has
|
|
|
|
|
always been able to talk about the type of a type:
|
|
|
|
|
|
|
|
|
|
>>> a = 0
|
|
|
|
|
>>> type(a)
|
|
|
|
|
<type 'int'>
|
|
|
|
|
>>> type(type(a))
|
|
|
|
|
<type 'type'>
|
|
|
|
|
>>> type(type(type(a)))
|
|
|
|
|
<type 'type'>
|
|
|
|
|
>>>
|
|
|
|
|
|
|
|
|
|
In this example, type(a) is a "regular" type, and type(type(a)) is
|
|
|
|
|
a metatype. While as distributed all types have the same metatype
|
|
|
|
|
(which is also its own metatype), this is not a requirement, and
|
|
|
|
|
in fact a useful 3rd party extension (ExtensionClasses by Jim
|
|
|
|
|
Fulton) creates an additional metatype. A related feature is the
|
|
|
|
|
"Don Beaudry hook", which says that if a metatype is callable, its
|
|
|
|
|
instances (which are regular types) can be subclassed (really
|
|
|
|
|
subtyped) using a Python class statement. We will use this rule
|
|
|
|
|
to support subtyping of built-in types, and in the process we will
|
|
|
|
|
introduce some additional metatypes, and a "metametatype". (The
|
|
|
|
|
metametatype is nothing unusual; Python's type system allows any
|
|
|
|
|
number of metalevels.)
|
|
|
|
|
|
|
|
|
|
Note that Python uses the concept of metatypes or metaclasses in a
|
|
|
|
|
different way than Smalltalk. In Smalltalk-80, there is a
|
|
|
|
|
hierarchy of metaclasses that mirrors the hierarchy of regular
|
|
|
|
|
classes, metaclasses map 1-1 to classes (except for some funny
|
|
|
|
|
business at the root of the hierarchy), and each class statement
|
|
|
|
|
creates both a regular class and its metaclass, putting class
|
|
|
|
|
methods in the metaclass and instance methods in the regular
|
|
|
|
|
class.
|
|
|
|
|
|
|
|
|
|
Nice though this may be in the context of Smalltalk, it's not
|
|
|
|
|
compatible with the traditional use of metatypes in Python, and I
|
|
|
|
|
prefer to continue in the Python way. This means that Python
|
|
|
|
|
metatypes are typically written in C, and may be shared between
|
|
|
|
|
many regular types. (It will be possible to subtype metatypes in
|
|
|
|
|
Python, so it won't be absolutely necessary to write C in order to
|
|
|
|
|
use metatypes; but the power of Python metatypes will be limited,
|
|
|
|
|
e.g. Python code will never be allowed to allocate raw memory and
|
|
|
|
|
initialize it at will.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Instantiation by calling the type object
|
|
|
|
|
|
|
|
|
|
Traditionally, for each type there is at least one C function that
|
|
|
|
|
creates instances of the type. This function has to take care of
|
|
|
|
|
both allocating memory for the object and initializing that
|
|
|
|
|
memory. As of Python 2.0, it also has to interface with the
|
|
|
|
|
garbage collection subsystem, if the type chooses to participate
|
|
|
|
|
in garbage collection (which is optional, but strongly recommended
|
|
|
|
|
for so-called "container" types: types that may contain arbitrary
|
|
|
|
|
references to other objects, and hence may participate in
|
|
|
|
|
reference cycles).
|
|
|
|
|
|
|
|
|
|
If we're going to implement subtyping, we must separate allocation
|
|
|
|
|
and initialization: typically, the most derived subtype is in
|
|
|
|
|
charge of allocation (and hence deallocation!), but in most cases
|
|
|
|
|
each base type's initializer (constructor) must still be called,
|
|
|
|
|
from the "most base" type to the most derived type.
|
|
|
|
|
|
|
|
|
|
But let's first get the interface for instantiation right. If we
|
|
|
|
|
call an object, the tp_call slot if its type gets invoked. Thus,
|
|
|
|
|
if we call a type, this invokes the tp_call slot of the type's
|
|
|
|
|
type: in other words, the tp_call slot of the metatype.
|
|
|
|
|
Traditionally this has been a NULL pointer, meaning that types
|
|
|
|
|
can't be called. Now we're adding a tp_call slot to the metatype,
|
|
|
|
|
which makes all types "callable" in a trivial sense. But
|
|
|
|
|
obviously the metatype's tp_call implementation doesn't know how
|
|
|
|
|
to initialize individual types. So the type defines a new slot,
|
|
|
|
|
tp_construct, which is invoked by the metatype's tp_call slot. If
|
|
|
|
|
the tp_construct slot is NULL, the metatype's tp_call issues a
|
|
|
|
|
nice error message: the type isn't callable.
|
|
|
|
|
|
|
|
|
|
We already know that tp_construct is responsible for initializing
|
|
|
|
|
the object (this will be important for subtyping too). Who should
|
|
|
|
|
be responsible for allocation of the new object? Either the
|
|
|
|
|
metatype's tp_call can allocate the object, or the type's
|
|
|
|
|
tp_construct can allocate it. The solution is copied from typical
|
|
|
|
|
C++ implementations: if the metatype's tp_call allocates storage
|
|
|
|
|
for the object it passes the storage as a pointer to the type's
|
|
|
|
|
tp_construct; if the metatype's tp_call does not allocate storage,
|
|
|
|
|
it passes a NULL pointer to the type's tp_call in which case the
|
|
|
|
|
type allocates the storage itself. This moves the policy decision
|
|
|
|
|
to the metatype, and different metatypes may have different
|
|
|
|
|
policies. The mechanisms are fixed though: either the metatype's
|
|
|
|
|
tp_call allocates storage, or the type's tp_construct allocates.
|
|
|
|
|
|
|
|
|
|
The deallocation mechanism chosen should match the allocation
|
|
|
|
|
mechanism: an allocation policy should prescribe both the
|
|
|
|
|
allocation and deallocation mechanism. And again, planning ahead
|
|
|
|
|
for subtyping would be nice. But the available mechanisms are
|
|
|
|
|
different. The deallocation function has always been part of the
|
|
|
|
|
type structure, as tp_dealloc, which combines the
|
|
|
|
|
"uninitialization" with deallocation. This was good enough for
|
|
|
|
|
the traditional situation, where it matched the combined
|
|
|
|
|
allocation and initialization of the creation function. But now
|
|
|
|
|
imagine a type whose creation function uses a special free list
|
|
|
|
|
for allocation. It's deallocation function puts the object's
|
|
|
|
|
memory back on the same free list. But when allocation and
|
|
|
|
|
creation are separate, the object may have been allocated from the
|
|
|
|
|
regular heap, and it would be wrong (in some cases disastrous) if
|
|
|
|
|
it were placed on the free list by the deallocation function.
|
|
|
|
|
|
|
|
|
|
A solution would be for the tp_construct function to somehow mark
|
|
|
|
|
whether the object was allocated from the special free list, so
|
|
|
|
|
that the tp_dealloc function can choose the right deallocation
|
|
|
|
|
method (assuming that the only two alternatives are a special free
|
|
|
|
|
list or the regular heap). A variant that doesn't require space
|
|
|
|
|
for an allocation flag bit would be to have two type objects,
|
|
|
|
|
identical in the contents of all their slots except for their
|
|
|
|
|
deallocation slot. But this requires that all type-checking code
|
|
|
|
|
(e.g. the PyDict_Check()) recognizes both types. We'll come back
|
|
|
|
|
to this solution in the context of subtyping. Another alternative
|
|
|
|
|
is to require the metatype's tp_call to leave the allocation to
|
|
|
|
|
the tp_construct method, by passing in a NULL pointer. But this
|
|
|
|
|
doesn't work once we allow subtyping.
|
|
|
|
|
|
|
|
|
|
Eventually, when we add any form of subtyping, we'll have to
|
|
|
|
|
separate deallocation from uninitialization. The way to do this
|
|
|
|
|
is to add a separate slot to the type object that does the
|
|
|
|
|
uninitialization without the deallocation. Fortunately, there is
|
|
|
|
|
already such a slot: tp_clear, currently used by the garbage
|
|
|
|
|
collection subsystem. A simple rule makes this slot reusable as
|
|
|
|
|
an uninitialization: for types that support separate allocation
|
|
|
|
|
and initialization, tp_clear must be defined (even if the object
|
|
|
|
|
doesn't support garbage collection) and it must DECREF all
|
|
|
|
|
contained objects and FREE all other memory areas the object owns.
|
|
|
|
|
It must also be reentrant: it must be possible to clear an already
|
|
|
|
|
cleared object. The easiest way to do this is to replace all
|
|
|
|
|
pointers DECREFed or FREEd with NULL pointers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Subtyping in C
|
|
|
|
|
|
|
|
|
|
The simplest form of subtyping is subtyping in C. It is the
|
|
|
|
|
simplest form because we can require the C code to be aware of the
|
|
|
|
|
various problems, and it's acceptable for C code that doesn't
|
|
|
|
|
follow the rules to dump core; while for Python subtyping we would
|
|
|
|
|
need to catch all errors before they become core dumps.
|
|
|
|
|
|
|
|
|
|
The idea behind subtyping is very similar to that of single
|
|
|
|
|
inheritance in C++. A base type is described by a structure
|
|
|
|
|
declaration plus a type object. A derived type can extend the
|
|
|
|
|
structure (but must leave the names, order and type of the fields
|
|
|
|
|
of the base structure unchanged) and can override certain slots in
|
|
|
|
|
the type object, leaving others the same.
|
|
|
|
|
|
|
|
|
|
Not every type can serve as a base type. The base type must
|
|
|
|
|
support separation of allocation and initialization by having a
|
|
|
|
|
tp_construct slot that can be called with a preallocated object,
|
|
|
|
|
and it must support uninitialization without deallocation by
|
|
|
|
|
having a tp_clear slot as described above. The derived type must
|
|
|
|
|
also export the structure declaration for its instances through a
|
|
|
|
|
header file, as it is needed in order to derive a subtype. The
|
|
|
|
|
type object for the base type must also be exported.
|
|
|
|
|
|
|
|
|
|
If the base type has a type-checking macro (e.g. PyDict_Check()),
|
|
|
|
|
this macro may be changed to recognize subtypes. This can be done
|
|
|
|
|
by using the new PyObject_TypeCheck(object, type) macro, which
|
|
|
|
|
calls a function that follows the base class links. There are
|
|
|
|
|
arguments for and against changing the type-checking macro in this
|
|
|
|
|
way. The argument for the change should be clear: it allows
|
|
|
|
|
subtypes to be used in places where the base type is required,
|
|
|
|
|
which is often the prime attraction of subtyping (as opposed to
|
|
|
|
|
sharing implementation). An argument against changing the
|
|
|
|
|
type-checking macro could be that the type check is used
|
|
|
|
|
frequently and a function call would slow things down too much
|
|
|
|
|
(hard to believe); or one could fear that a subtype might break an
|
|
|
|
|
invariant assumed by the support functions of the base type.
|
|
|
|
|
Sometimes it would be wise to change the base type to remove this
|
|
|
|
|
reliance; other times, it would be better to require that derived
|
|
|
|
|
types (implemented in C) maintain the invariants.
|
|
|
|
|
|
|
|
|
|
The derived type begins by declaring a type structure which
|
|
|
|
|
contains the base type's structure. For example, here's the type
|
|
|
|
|
structure for a subtype of the built-in list type:
|
|
|
|
|
|
|
|
|
|
typedef struct {
|
|
|
|
|
PyListObject list;
|
|
|
|
|
int state;
|
|
|
|
|
} spamlistobject;
|
|
|
|
|
|
|
|
|
|
Note that the base type structure field (here PyListObject) must
|
|
|
|
|
be the first field in the structure; any following fields are
|
|
|
|
|
extension fields. Also note that the base type is not referenced
|
|
|
|
|
via a pointer; the actual contents of its structure must be
|
|
|
|
|
included! (The goal is for the memory lay out of the beginning of
|
|
|
|
|
the subtype instance to be the same as that of the base type
|
|
|
|
|
instance.)
|
|
|
|
|
|
|
|
|
|
Next, the derived type must declare a type object and initialize
|
|
|
|
|
it. Most of the slots in the type object may be initialized to
|
|
|
|
|
zero, which is a signal that the base type slot must be copied
|
|
|
|
|
into it. Some fields that must be initialized properly:
|
|
|
|
|
|
|
|
|
|
- the object header must be filled in as usual; the type should be
|
|
|
|
|
PyType_Type
|
|
|
|
|
|
|
|
|
|
- the tp_basicsize field must be set to the size of the subtype
|
|
|
|
|
instances
|
|
|
|
|
|
|
|
|
|
- the tp_base field must be set to the address of the base type's
|
|
|
|
|
type object
|
|
|
|
|
|
|
|
|
|
- the tp_dealloc slot function must be a deallocation function for
|
|
|
|
|
the subtype
|
|
|
|
|
|
|
|
|
|
- the tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT
|
|
|
|
|
value
|
|
|
|
|
|
|
|
|
|
- the tp_name field must be set (otherwise it will be inherited,
|
|
|
|
|
which is wrong)
|
|
|
|
|
|
|
|
|
|
Exception: if the subtype defines no additional fields in its
|
|
|
|
|
structure (i.e., it only defines new behavior, no new data), the
|
|
|
|
|
tp_basicsize and the tp_dealloc fields may be set to zero. In
|
|
|
|
|
order to complete the initialization of the type,
|
|
|
|
|
PyType_InitDict() must be called. This replaces zero slots in the
|
|
|
|
|
subtype with the value of the corresponding base type slots. It
|
|
|
|
|
also fills in tp_dict, the type's dictionary; this is more a
|
|
|
|
|
matter of PEP 252.
|
|
|
|
|
|
|
|
|
|
The subtype's tp_dealloc slot deserves special attention. It must
|
|
|
|
|
uninitialize and deallocate the object in an orderly manner: first
|
|
|
|
|
it must uninitialize the fields added by the extension type; then
|
|
|
|
|
it must call the base type's tp_clear function; finally it must
|
|
|
|
|
deallocate the memory of the object. Usually, the base type's
|
|
|
|
|
tp_clear function has no global name; it is permissible to call it
|
|
|
|
|
via the base type's tp_clear slot, e.g. PyListType.tp_clear(obj).
|
|
|
|
|
Only if it is known that the base type uses the same allocation
|
|
|
|
|
method as the subtype and the subtype requires no uninitialization
|
|
|
|
|
(e.g. it adds no data fields or all its data fields are numbers)
|
|
|
|
|
is it permissible to leave tp_dealloc set to zero in the subtype's
|
|
|
|
|
type object; it will be copied from the base type.
|
|
|
|
|
|
|
|
|
|
A subtype is not usable until PyType_InitDict() is called for it;
|
|
|
|
|
this is best done during module initialization, assuming the
|
|
|
|
|
subtype belongs to a module. An alternative for subtypes added to
|
|
|
|
|
the Python core (which don't live in a particular module) would be
|
|
|
|
|
to initialize the subtype in their constructor function. It is
|
|
|
|
|
allowed to call PyType_InitDict() more than once, the second and
|
|
|
|
|
further calls have no effect. In order to avoid unnecessary
|
|
|
|
|
calls, a test for tp_dict==NULL can be made.
|
|
|
|
|
|
|
|
|
|
If the subtype itself should be subtypable (usually desirable), it
|
|
|
|
|
should follow the same rules are given above for base types: have
|
|
|
|
|
a tp_construct that accepts a preallocated object and calls the
|
|
|
|
|
base type's tp_construct, and have a tp_clear that calls the base
|
|
|
|
|
type's tp_clear.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Subtyping in Python
|
|
|
|
|
|
|
|
|
|
The next step is to allow subtyping of selected built-in types
|
|
|
|
|
through a class statement in Python. Limiting ourselves to single
|
|
|
|
|
inheritance for now, here is what happens for a simple class
|
|
|
|
|
statement:
|
|
|
|
|
|
|
|
|
|
class C(B):
|
|
|
|
|
var1 = 1
|
|
|
|
|
def method1(self): pass
|
|
|
|
|
# etc.
|
|
|
|
|
|
|
|
|
|
The body of the class statement is executes in a fresh environment
|
|
|
|
|
(basically, a new dictionary used as local namespace), and then C
|
|
|
|
|
is created. The following explains how C is created.
|
|
|
|
|
|
|
|
|
|
Assume B is a type object. Since type objects are objects, and
|
|
|
|
|
every object has a type, B has a type. B's type is accessible via
|
|
|
|
|
type(B) or B.__class__ (the latter notation is new for types; it
|
|
|
|
|
is introduced in PEP 252). Let's say B's type is M (for
|
|
|
|
|
Metatype). The class statement will create a new type, C. Since
|
|
|
|
|
C will be a type object just like B, we view the creation of C as
|
|
|
|
|
an instantiation of the metatype, M. The information that needs
|
|
|
|
|
to be provided for the creation of C is: its name (in this example
|
|
|
|
|
the string "C"); the list of base classes (a singleton tuple
|
|
|
|
|
containing B); and the results of executing the class body, in the
|
|
|
|
|
form of a dictionary (e.g. {"var1": 1, "method1": <function...>,
|
|
|
|
|
...}).
|
|
|
|
|
|
|
|
|
|
According to the Don Beaudry hook, the following call is made:
|
|
|
|
|
|
|
|
|
|
C = M("C", (B,), dict)
|
|
|
|
|
|
|
|
|
|
(where dict is the dictionary resulting from execution of the
|
|
|
|
|
class body). In other words, the metatype (M) is called. Note
|
|
|
|
|
that even though we currently require there to be exactly one base
|
|
|
|
|
class, we still pass in a (singleton) sequence of base classes;
|
|
|
|
|
this makes it possible to support multiple inheritance later (or
|
|
|
|
|
for types with a different metaclass!) without changing this
|
|
|
|
|
interface.
|
|
|
|
|
|
|
|
|
|
Note that calling M requires that M itself has a type: the
|
|
|
|
|
meta-metatype. In the current implementation, I have introduced a
|
|
|
|
|
new type object for this purpose, named turtle because of my
|
|
|
|
|
fondness of the phrase "turtles all the way down". However I now
|
|
|
|
|
believe that it would be better if M were its own metatype, just
|
|
|
|
|
like before. This can be accomplished by making M's tp_call slot
|
|
|
|
|
slightly more flexible.
|
|
|
|
|
|
|
|
|
|
In any case, the work for creating C is done by M's tp_construct
|
|
|
|
|
slot. It allocates space for an "extended" type structure, which
|
|
|
|
|
contains space for: the type object; the auxiliary structures
|
|
|
|
|
(as_sequence etc.); the string object containing the type name (to
|
|
|
|
|
ensure that this object isn't deallocated while the type object is
|
|
|
|
|
still referencing it); and some more auxiliary storage (to be
|
|
|
|
|
described later). It initializes this storage to zeros except for
|
|
|
|
|
a few crucial slots (e.g. tp_name is set to point to the type
|
|
|
|
|
name) and then sets the tp_base slot to point to B. Then
|
|
|
|
|
PyType_InitDict() is called to inherit B's slots. Finally, C's
|
|
|
|
|
tp_dict slot is updated with the contents of the namespace
|
|
|
|
|
dictionary (the third argument to the call to M).
|
2001-05-14 09:43:23 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
End:
|