PEP: 253
Title: Subtyping Built-in Types
Version: $Revision$
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 14-May-2001
Post-History:

Abstract

This PEP proposes ways of creating subtypes of existing built-in
types, either in C or in Python. The text is currently long and
rambling; I'll go over it again later to make it shorter.

Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The fields in the type object
describe all aspects of a Python object that are relevant to the
Python interpreter. A few fields contain dimensional information
(e.g. the basic allocation size of instances), others contain
various flags, but most fields are pointers to functions to
implement various kinds of behaviors. A NULL pointer means that
the type does not implement the specific behavior; in that case
the system may provide a default behavior or raise an exception
when the behavior is invoked. Some collections of function
pointers that are usually defined together are obtained indirectly
via a pointer to an additional structure containing more function
pointers.

While the details of initializing a PyTypeObject structure haven't
been documented as such, they are easily gleaned from the examples
in the source code, and I am assuming that the reader is
sufficiently familiar with the traditional way of creating new
Python types in C.

This PEP will introduce the following optional features to types:

- create an instance of a type by calling it

- create a subtype in C by specifying a base type pointer

- create a subtype in Python using a class statement

- multiple inheritance

This PEP builds on PEP 252, which adds standard introspection to
types; in particular, types are assumed to have e.g. a __hash__
method when the type object defines the tp_hash slot. PEP 252 also
adds a dictionary to type objects which contains all methods. At
the Python level, this dictionary is read-only; at the C level, it
is accessible directly (but modifying it is not recommended except
as part of initialization).


Metatypes

Inevitably the following discussion will come to mention metatypes
(or metaclasses). Metatypes are nothing new in Python: Python has
always been able to talk about the type of a type:

    >>> a = 0
    >>> type(a)
    <type 'int'>
    >>> type(type(a))
    <type 'type'>
    >>> type(type(type(a)))
    <type 'type'>
    >>>

In this example, type(a) is a "regular" type, and type(type(a)) is
a metatype. While, as distributed, all types have the same metatype
(which is also its own metatype), this is not a requirement, and
in fact a useful third-party extension (ExtensionClasses by Jim
Fulton) creates an additional metatype. A related feature is the
"Don Beaudry hook", which says that if a metatype is callable, its
instances (which are regular types) can be subclassed (really
subtyped) using a Python class statement. We will use this rule
to support subtyping of built-in types, and in the process we will
introduce some additional metatypes, and a "metametatype". (The
metametatype is nothing unusual; Python's type system allows any
number of metalevels.)

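As a minimal sketch of an additional metatype, assume a current
Python 3 interpreter (an assumption: there, type objects print as
<class ...> rather than <type ...>, and a metatype is selected with
the metaclass= keyword). The names Meta and Regular are hypothetical
and appear nowhere in this proposal:

```python
class Meta(type):
    """A hypothetical second metatype, alongside the built-in `type`."""


class Regular(metaclass=Meta):
    """A regular type whose metatype is Meta rather than `type`."""


a = 0
# int is a regular type; its metatype is `type`, which is its own metatype.
levels = [type(a), type(type(a)), type(type(type(a)))]
```

Here type(Regular) is Meta, while type(Meta) is again the built-in
type, so the chain of metalevels still ends at a type that is its
own metatype.
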
Note that Python uses the concept of metatypes or metaclasses in a
different way than Smalltalk. In Smalltalk-80, there is a
hierarchy of metaclasses that mirrors the hierarchy of regular
classes, metaclasses map 1-1 to classes (except for some funny
business at the root of the hierarchy), and each class statement
creates both a regular class and its metaclass, putting class
methods in the metaclass and instance methods in the regular
class.

Nice though this may be in the context of Smalltalk, it's not
compatible with the traditional use of metatypes in Python, and I
prefer to continue in the Python way. This means that Python
metatypes are typically written in C, and may be shared between
many regular types. (It will be possible to subtype metatypes in
Python, so it won't be absolutely necessary to write C in order to
use metatypes; but the power of Python metatypes will be limited,
e.g. Python code will never be allowed to allocate raw memory and
initialize it at will.)


Instantiation by calling the type object

Traditionally, for each type there is at least one C function that
creates instances of the type. This function has to take care of
both allocating memory for the object and initializing that
memory. As of Python 2.0, it also has to interface with the
garbage collection subsystem, if the type chooses to participate
in garbage collection (which is optional, but strongly recommended
for so-called "container" types: types that may contain arbitrary
references to other objects, and hence may participate in
reference cycles).

If we're going to implement subtyping, we must separate allocation
and initialization: typically, the most derived subtype is in
charge of allocation (and hence deallocation!), but in most cases
each base type's initializer (constructor) must still be called,
from the "most base" type to the most derived type.

But let's first get the interface for instantiation right. If we
call an object, the tp_call slot of its type gets invoked. Thus,
if we call a type, this invokes the tp_call slot of the type's
type: in other words, the tp_call slot of the metatype.
Traditionally this has been a NULL pointer, meaning that types
can't be called. Now we're adding a tp_call slot to the metatype,
which makes all types "callable" in a trivial sense. But
obviously the metatype's tp_call implementation doesn't know how
to initialize individual types. So the type defines a new slot,
tp_construct, which is invoked by the metatype's tp_call slot. If
the tp_construct slot is NULL, the metatype's tp_call issues a
nice error message: the type isn't callable.

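The dispatch just described can be modelled in Python, as a sketch
only. A metaclass __call__ method stands in for the metatype's
tp_call slot, and a hypothetical construct class method stands in
for tp_construct. Neither name is part of this proposal, and the
code assumes Python 3 syntax:

```python
class MetaType(type):
    """Stand-in for the metatype; __call__ plays the role of tp_call."""

    def __call__(cls, *args, **kwargs):
        # Look up the type's (hypothetical) tp_construct analogue.
        construct = getattr(cls, "construct", None)
        if construct is None:
            # An empty slot means the type cannot be instantiated by calling.
            raise TypeError(f"type {cls.__name__!r} isn't callable")
        return construct(*args, **kwargs)


class Point(metaclass=MetaType):
    @classmethod
    def construct(cls, x, y):
        self = object.__new__(cls)  # allocate
        self.x, self.y = x, y       # initialize
        return self


class Opaque(metaclass=MetaType):
    """Defines no construct slot, so calling the type raises TypeError."""
```

Calling Point(1, 2) goes through MetaType.__call__ and ends up in
Point.construct, while calling Opaque() raises TypeError.
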
We already know that tp_construct is responsible for initializing
the object (this will be important for subtyping too). Who should
be responsible for allocation of the new object? Either the
metatype's tp_call can allocate the object, or the type's
tp_construct can allocate it. The solution is copied from typical
C++ implementations: if the metatype's tp_call allocates storage
for the object it passes the storage as a pointer to the type's
tp_construct; if the metatype's tp_call does not allocate storage,
it passes a NULL pointer to the type's tp_construct, in which case
the type allocates the storage itself. This moves the policy
decision to the metatype, and different metatypes may have
different policies. The mechanisms are fixed though: either the
metatype's tp_call allocates storage, or the type's tp_construct
allocates.

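The two allocation policies might be sketched as follows. This is
an illustration under assumptions, not the proposed C interface:
None stands in for the NULL pointer, object.__new__ stands in for
raw allocation, and the names AllocatingMeta, DeferringMeta,
Constructible, construct and allocated_by are all hypothetical:

```python
class AllocatingMeta(type):
    """Policy 1: the metatype's call allocates and passes the storage."""

    def __call__(cls, *args):
        storage = object.__new__(cls)
        return cls.construct(storage, *args)


class DeferringMeta(type):
    """Policy 2: the metatype passes None (NULL); the type allocates."""

    def __call__(cls, *args):
        return cls.construct(None, *args)


class Constructible:
    @classmethod
    def construct(cls, storage, value):
        if storage is None:
            # No preallocated storage was passed in: allocate it here.
            storage = object.__new__(cls)
            storage.allocated_by = "type"
        else:
            storage.allocated_by = "metatype"
        storage.value = value  # initialization happens in either case
        return storage


class A(Constructible, metaclass=AllocatingMeta):
    pass


class D(Constructible, metaclass=DeferringMeta):
    pass
```

The same construct function serves both metatypes; only the
metatype decides which branch is taken.
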
The deallocation mechanism chosen should match the allocation
mechanism: an allocation policy should prescribe both the
allocation and deallocation mechanism. And again, planning ahead
for subtyping would be nice. But the available mechanisms are
different. The deallocation function has always been part of the
type structure, as tp_dealloc, which combines the
"uninitialization" with deallocation. This was good enough for
the traditional situation, where it matched the combined
allocation and initialization of the creation function. But now
imagine a type whose creation function uses a special free list
for allocation. Its deallocation function puts the object's
memory back on the same free list. But when allocation and
initialization are separate, the object may have been allocated
from the regular heap, and it would be wrong (in some cases
disastrous) if it were placed on the free list by the deallocation
function.

A solution would be for the tp_construct function to somehow mark
whether the object was allocated from the special free list, so
that the tp_dealloc function can choose the right deallocation
method (assuming that the only two alternatives are a special free
list or the regular heap). A variant that doesn't require space
for an allocation flag bit would be to have two type objects,
identical in the contents of all their slots except for their
deallocation slot. But this requires that all type-checking code
(e.g. PyDict_Check()) recognize both types. We'll come back to
this solution in the context of subtyping. Another alternative
is to require the metatype's tp_call to leave the allocation to
the tp_construct method, by passing in a NULL pointer. But this
doesn't work once we allow subtyping.

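The first alternative, an allocation flag, could look roughly like
this. It is a toy model in Python rather than C: free_list, Node,
allocate, deallocate and the from_free_list flag are hypothetical
names, and dropping a reference stands in for returning memory to
the regular heap:

```python
free_list = []  # hypothetical per-type free list of recycled storage


class Node:
    __slots__ = ("value", "from_free_list")


def allocate(use_free_list):
    """Allocate a Node and record where its storage came from."""
    if use_free_list:
        node = free_list.pop() if free_list else Node()
        node.from_free_list = True
    else:
        node = Node()
        node.from_free_list = False
    node.value = None
    return node


def deallocate(node):
    """Uninitialize, then release the storage the way it was obtained."""
    node.value = None
    if node.from_free_list:
        free_list.append(node)  # recycle only what the free list handed out
    # Otherwise the reference is simply dropped and the heap reclaims it.
```

The flag costs one field per instance, which is the overhead the
two-type-object variant tries to avoid.
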
Eventually, when we add any form of subtyping, we'll have to
separate deallocation from uninitialization. The way to do this
is to add a separate slot to the type object that does the
uninitialization without the deallocation. Fortunately, there is
already such a slot: tp_clear, currently used by the garbage
collection subsystem. A simple rule makes this slot reusable as
an uninitialization function: for types that support separate
allocation and initialization, tp_clear must be defined (even if
the object doesn't support garbage collection) and it must DECREF
all contained objects and FREE all other memory areas the object
owns. It must also be reentrant: it must be possible to clear an
already cleared object. The easiest way to do this is to replace
all pointers DECREFed or FREEd with NULL pointers.

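The reentrancy rule can be illustrated with a small Python model,
offered as a sketch only. None plays the role of the NULL pointer,
and the Buffer class, its clear method and the released counter are
hypothetical:

```python
class Buffer:
    def __init__(self, items):
        self.items = list(items)      # contained objects (would be DECREFed)
        self.scratch = bytearray(16)  # other owned memory (would be FREEd)
        self.released = 0             # counts how many resources were released

    def clear(self):
        """Uninitialize without deallocating; safe on a cleared object."""
        if self.items is not None:
            self.items = None    # replace the released pointer with "NULL"
            self.released += 1
        if self.scratch is not None:
            self.scratch = None
            self.released += 1
```

Calling clear() a second time finds only None fields and releases
nothing further.
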

Subtyping in C

The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of the
various problems, and it's acceptable for C code that doesn't
follow the rules to dump core; while for Python subtyping we would
need to catch all errors before they become core dumps.

The idea behind subtyping is very similar to that of single
inheritance in C++. A base type is described by a structure
declaration plus a type object. A derived type can extend the
structure (but must leave the names, order and type of the fields
of the base structure unchanged) and can override certain slots in
the type object, leaving others the same.

Not every type can serve as a base type. The base type must
support separation of allocation and initialization by having a
tp_construct slot that can be called with a preallocated object,
and it must support uninitialization without deallocation by
having a tp_clear slot as described above. The base type must
also export the structure declaration for its instances through a
header file, as it is needed in order to derive a subtype. The
type object for the base type must also be exported.

If the base type has a type-checking macro (e.g. PyDict_Check()),
this macro may be changed to recognize subtypes. This can be done
by using the new PyObject_TypeCheck(object, type) macro, which
calls a function that follows the base class links. There are
arguments for and against changing the type-checking macro in this
way. The argument for the change should be clear: it allows
subtypes to be used in places where the base type is required,
which is often the prime attraction of subtyping (as opposed to
sharing implementation). An argument against changing the
type-checking macro could be that the type check is used
frequently and a function call would slow things down too much
(hard to believe); or one could fear that a subtype might break an
invariant assumed by the support functions of the base type.
Sometimes it would be wise to change the base type to remove this
reliance; other times, it would be better to require that derived
types (implemented in C) maintain the invariants.

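What "following the base class links" means can be shown with a
short Python model. This is a sketch, not the C macro: it assumes
single inheritance and uses the __base__ attribute of Python 3 type
objects as the analogue of the tp_base link; type_check and
SpamList are hypothetical names:

```python
def type_check(obj, target):
    """Return True if type(obj) is target or derives from it."""
    t = type(obj)
    while t is not None:
        if t is target:
            return True
        t = t.__base__  # step to the base type; None above `object`
    return False


class SpamList(list):
    """A subtype of the built-in list type."""
```

With this check a SpamList instance is accepted wherever a list is
required, but a plain list is not accepted where a SpamList is
required.
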
The derived type begins by declaring a type structure which
contains the base type's structure. For example, here's the type
structure for a subtype of the built-in list type:

    typedef struct {
        PyListObject list;
        int state;
    } spamlistobject;

Note that the base type structure field (here PyListObject) must
be the first field in the structure; any following fields are
extension fields. Also note that the base type is not referenced
via a pointer; the actual contents of its structure must be
included! (The goal is for the memory layout of the beginning of
the subtype instance to be the same as that of the base type
instance.)

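The layout requirement can be checked from Python with the ctypes
module, as a sketch under assumptions. ListHeader is a hypothetical
two-field stand-in for the base structure (it is not the real
PyListObject layout), and SpamList mirrors the spamlistobject
declaration above:

```python
import ctypes


class ListHeader(ctypes.Structure):
    """Hypothetical stand-in for the base type's instance structure."""
    _fields_ = [("refcount", ctypes.c_ssize_t), ("size", ctypes.c_ssize_t)]


class SpamList(ctypes.Structure):
    """Subtype layout: the base structure first, by value, then extensions."""
    _fields_ = [("list", ListHeader), ("state", ctypes.c_int)]


spam = SpamList(ListHeader(1, 5), 42)

# Code that only knows the base layout can reinterpret a pointer to the
# subtype instance as a pointer to the base structure.
as_base = ctypes.cast(ctypes.pointer(spam), ctypes.POINTER(ListHeader)).contents
```

Because the base structure is embedded by value at offset zero,
as_base.size reads the same memory as spam.list.size.
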
Next, the derived type must declare a type object and initialize
it. Most of the slots in the type object may be initialized to
zero, which is a signal that the base type slot must be copied
into it. Some fields that must be initialized properly:

- the object header must be filled in as usual; the type should be
  PyType_Type

- the tp_basicsize field must be set to the size of the subtype
  instances

- the tp_base field must be set to the address of the base type's
  type object

- the tp_dealloc slot function must be a deallocation function for
  the subtype

- the tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT
  value

- the tp_name field must be set (otherwise it will be inherited,
  which is wrong)

Exception: if the subtype defines no additional fields in its
structure (i.e., it only defines new behavior, no new data), the
tp_basicsize and the tp_dealloc fields may be set to zero. In
order to complete the initialization of the type,
PyType_InitDict() must be called. This replaces zero slots in the
subtype with the value of the corresponding base type slots. It
also fills in tp_dict, the type's dictionary; this is more a
matter of PEP 252.

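The slot-copying step can be pictured with plain dictionaries. This
is only a model of the idea, not of PyType_InitDict() itself: None
stands in for a zero slot, the function name inherit_slots is
hypothetical, and the slot values (including the size 40) are made
up for illustration:

```python
def inherit_slots(slots, base_slots):
    """Fill every empty (None) slot of the subtype from the base type."""
    for name, value in base_slots.items():
        if slots.get(name) is None:
            slots[name] = value
    return slots


base = {
    "tp_name": "list",
    "tp_basicsize": 40,            # made-up size, for illustration only
    "tp_dealloc": "list_dealloc",
    "tp_repr": "list_repr",
}

# A subtype that adds behavior but no data: size and dealloc left at zero.
sub = {
    "tp_name": "spamlist",
    "tp_basicsize": None,
    "tp_dealloc": None,
    "tp_repr": None,
}

inherit_slots(sub, base)
```

Slots the subtype did set, such as tp_name, are left alone; running
the function again changes nothing.
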
The subtype's tp_dealloc slot deserves special attention. It must
uninitialize and deallocate the object in an orderly manner: first
it must uninitialize the fields added by the extension type; then
it must call the base type's tp_clear function; finally it must
deallocate the memory of the object. Usually, the base type's
tp_clear function has no global name; it is permissible to call it
via the base type's tp_clear slot, e.g. PyList_Type.tp_clear(obj).
Only if it is known that the base type uses the same allocation
method as the subtype and the subtype requires no uninitialization
(e.g. it adds no data fields or all its data fields are numbers)
is it permissible to leave tp_dealloc set to zero in the subtype's
type object; it will be copied from the base type.

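The required order of operations is sketched below in Python. It
only records the sequence of steps in a list; no memory is really
released, and the names BaseList, SpamList, clear, dealloc and
events are hypothetical stand-ins for the C-level slots:

```python
events = []  # records the order in which the steps run


class BaseList:
    def clear(self):
        """Stand-in for the base type's tp_clear."""
        events.append("base clear")


class SpamList(BaseList):
    def __init__(self):
        self.state = object()  # a field added by the subtype

    def dealloc(self):
        """Stand-in for the subtype's tp_dealloc."""
        self.state = None                    # 1. uninitialize own fields
        events.append("subtype fields cleared")
        BaseList.clear(self)                 # 2. call the base type's clear
        events.append("memory released")     # 3. finally free the memory


spam = SpamList()
spam.dealloc()
```

Calling the base clear through the class, as in
BaseList.clear(self), mirrors calling through the base type's
tp_clear slot when the function has no global name.
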
A subtype is not usable until PyType_InitDict() is called for it;
this is best done during module initialization, assuming the
subtype belongs to a module. An alternative for subtypes added to
the Python core (which don't live in a particular module) would be
to initialize the subtype in its constructor function. It is
allowed to call PyType_InitDict() more than once; the second and
further calls have no effect. In order to avoid unnecessary
calls, a test for tp_dict==NULL can be made.

If the subtype itself should be subtypable (usually desirable), it
should follow the same rules as given above for base types: have
a tp_construct that accepts a preallocated object and calls the
base type's tp_construct, and have a tp_clear that calls the base
type's tp_clear.


Subtyping in Python

The next step is to allow subtyping of selected built-in types
through a class statement in Python. Limiting ourselves to single
inheritance for now, here is what happens for a simple class
statement:

    class C(B):
        var1 = 1
        def method1(self): pass
        # etc.

The body of the class statement is executed in a fresh environment
(basically, a new dictionary used as local namespace), and then C
is created. The following explains how C is created.

Assume B is a type object. Since type objects are objects, and
every object has a type, B has a type. B's type is accessible via
type(B) or B.__class__ (the latter notation is new for types; it
is introduced in PEP 252). Let's say B's type is M (for
Metatype). The class statement will create a new type, C. Since
C will be a type object just like B, we view the creation of C as
an instantiation of the metatype, M. The information that needs
to be provided for the creation of C is: its name (in this example
the string "C"); the list of base classes (a singleton tuple
containing B); and the results of executing the class body, in the
form of a dictionary (e.g. {"var1": 1, "method1": <function...>,
...}).

According to the Don Beaudry hook, the following call is made:

    C = M("C", (B,), dict)

(where dict is the dictionary resulting from execution of the
class body). In other words, the metatype (M) is called. Note
that even though we currently require there to be exactly one base
class, we still pass in a (singleton) sequence of base classes;
this makes it possible to support multiple inheritance later (or
for types with a different metaclass!) without changing this
interface.

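In a current Python 3 interpreter (an assumption; this PEP targets
Python 2.2) the same three-argument call can be made by hand. The
sketch below takes the built-in list type as B and builds the
namespace dictionary explicitly instead of executing a class body;
the names namespace and method1 are illustrative:

```python
B = list
M = type(B)  # the metatype of B; for the built-in list this is `type`


def method1(self):
    """An ordinary function that becomes a method of C."""
    return len(self)


# What executing the class body would have produced.
namespace = {"var1": 1, "method1": method1}

# Calling the metatype creates the new type.
C = M("C", (B,), namespace)
```

The resulting C is a subtype of list: C([1, 2, 3]).method1()
returns 3, and C.var1 is 1.
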
Note that calling M requires that M itself has a type: the
meta-metatype. In the current implementation, I have introduced a
new type object for this purpose, named turtle because of my
fondness for the phrase "turtles all the way down". However I now
believe that it would be better if M were its own metatype, just
like before. This can be accomplished by making M's tp_call slot
slightly more flexible.

In any case, the work for creating C is done by M's tp_construct
slot. It allocates space for an "extended" type structure, which
contains space for: the type object; the auxiliary structures
(as_sequence etc.); the string object containing the type name (to
ensure that this object isn't deallocated while the type object is
still referencing it); and some more auxiliary storage (to be
described later). It initializes this storage to zeros except for
a few crucial slots (e.g. tp_name is set to point to the type
name) and then sets the tp_base slot to point to B. Then
PyType_InitDict() is called to inherit B's slots. Finally, C's
tp_dict slot is updated with the contents of the namespace
dictionary (the third argument to the call to M).


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: