Another intermediate update. I've rewritten the requirements for a
base type to be subtypable.  Needs way more work!
Guido van Rossum 2001-06-13 21:48:31 +00:00
parent 137a35ac05
commit b54a36962d
1 changed files with 317 additions and 134 deletions

@ -10,14 +10,16 @@ Post-History:
Abstract
This PEP proposes ways for creating subtypes of existing built-in
types, either in C or in Python. The text is currently long and
rambling; I'll go over it again later to make it shorter.
This PEP proposes additions to the type object API that will allow
the creation of subtypes of built-in types, in C and in Python.
Introduction
Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The fields in the type object
describe all aspects of a Python object that are relevant to the
describe all aspects of a Python type that are relevant to the
Python interpreter. A few fields contain dimensional information
(e.g. the basic allocation size of instances), others contain
various flags, but most fields are pointers to functions to
@ -26,39 +28,57 @@ Abstract
the system may provide a default behavior in that case or raise an
exception when the behavior is invoked. Some collections of
function pointers that are usually defined together are obtained
indirectly via a pointer to an additional structure containing.
indirectly via a pointer to an additional structure containing
more function pointers.
While the details of initializing a PyTypeObject structure haven't
been documented as such, they are easily glanced from the examples
been documented as such, they are easily gleaned from the examples
in the source code, and I am assuming that the reader is
sufficiently familiar with the traditional way of creating new
Python types in C.
This PEP will introduce the following features:
- a type, like a class, can be a factory for its instances
- a type can be a factory function for its instances
- types can be subtyped in C by specifying a base type pointer
- types can be subtyped in C
- types can be subtyped in Python using the class statement
- types can be subtyped in Python with the class statement
- multiple inheritance from types (insofar as practical)
- multiple inheritance from types is supported (insofar as
practical)
- the standard coercions (int, tuple, str etc.) will be the
corresponding type objects
- the standard coercion functions (int, tuple, str etc.) will be
redefined to be the corresponding type objects, which serve as
their own factory functions
- a standard type hierarchy
- there will be a standard type hierarchy
- a class statement can contain a metaclass declaration,
specifying the metaclass to be used to create the new class
- a class statement can contain a slots declaration, specifying
the specific names of the instance variables supported
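A minimal sketch of how several of the features listed above look from Python. It assumes the syntax of present-day Python 3; this draft does not fix the spelling of the metaclass or slots declarations. The names Stack, Point, Meta and Tagged are hypothetical.

```python
# Subtyping a built-in type with the class statement.
class Stack(list):
    def push(self, item):
        self.append(item)


# A slots declaration names the instance variables that are supported.
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


# A metaclass declaration chooses the metatype used to create the class.
class Meta(type):
    pass


class Tagged(metaclass=Meta):
    pass


s = Stack()
s.push(1)

p = Point(1, 2)
try:
    p.z = 3              # "z" is not among the declared slots
    slot_error = False
except AttributeError:
    slot_error = True
```

Because Stack is a real subtype of list, code that requires a list can accept a Stack unchanged.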
This PEP builds on pep-0252, which adds standard introspection to
types; in particular, types are assumed to have e.g. a __hash__
method when the type object defines the tp_hash slot. pep-0252 also
adds a dictionary to type objects which contains all methods. At
the Python level, this dictionary is read-only; at the C level, it
is accessible directly (but modifying it is not recommended except
as part of initialization).
types; e.g., when the type object defines the tp_hash slot, the
type object has a __hash__ method. pep-0252 also adds a
dictionary to type objects which contains all methods. At the
Python level, this dictionary is read-only for built-in types; at
the C level, it is accessible directly (but it should not be
modified except as part of initialization).
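A short illustration of the introspection described above, assuming a Python version that implements pep-0252 (the sketch uses Python 3 syntax). The variable names are hypothetical.

```python
# A type whose C type object fills the tp_hash slot exposes __hash__.
has_hash = hasattr(int, "__hash__")
same_result = int.__hash__(42) == hash(42)

# The per-type dictionary contains the methods ...
has_add = "__add__" in int.__dict__

# ... and is read-only from Python for built-in types.
try:
    int.__dict__["spam"] = 1
    read_only = False
except TypeError:
    read_only = True
```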
For binary compatibility, a flag bit in the tp_flags slot
indicates the existence of the various new slots in the type
object introduced below. Types that don't have the
Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags field are assumed
to have NULL values for all the subtyping slots. (Warning: the
current implementation prototype is not yet consistent in its
checking of this flag bit. This should be fixed before the final
release.)
Metatypes
About metatypes
Inevitably the following discussion will come to mention metatypes
(or metaclasses). Metatypes are nothing new in Python: Python has
@ -75,26 +95,27 @@ Metatypes
In this example, type(a) is a "regular" type, and type(type(a)) is
a metatype. While as distributed all types have the same metatype
(which is also its own metatype), this is not a requirement, and
in fact a useful 3rd party extension (ExtensionClasses by Jim
Fulton) creates an additional metatype. A related feature is the
"Don Beaudry hook", which says that if a metatype is callable, its
instances (which are regular types) can be subclassed (really
subtyped) using a Python class statement. We will use this rule
to support subtyping of built-in types, and in fact it greatly
simplifies the logic of class creation to always simply call the
metatype. When no base class is specified, a default metatype is
called -- the default metatype is the "ClassType" object, so the
class statement will behave as before in the normal case.
(PyType_Type, which is also its own metatype), this is not a
requirement, and in fact a useful and relevant 3rd party extension
(ExtensionClasses by Jim Fulton) creates an additional metatype.
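The relationship between an object, its type and the metatype can be checked interactively. This is a small sketch in Python 3 syntax, where the built-in name type refers to PyType_Type; the variable names are hypothetical.

```python
a = 1
regular_type = type(a)            # the type of an ordinary object
metatype = type(regular_type)     # the type of that type
meta_of_meta = type(metatype)     # the metatype is its own metatype
```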
Python uses the concept of metatypes or metaclasses in a
different way than Smalltalk. In Smalltalk-80, there is a
hierarchy of metaclasses that mirrors the hierarchy of regular
classes, metaclasses map 1-1 to classes (except for some funny
business at the root of the hierarchy), and each class statement
creates both a regular class and its metaclass, putting class
methods in the metaclass and instance methods in the regular
class.
A related feature is the "Don Beaudry hook", which says that if a
metatype is callable, its instances (which are regular types) can
be subclassed (really subtyped) using a Python class statement.
I will use this rule to support subtyping of built-in types, and
in fact it greatly simplifies the logic of class creation to
always simply call the metatype. When no base class is specified,
a default metatype is called -- the default metatype is the
"ClassType" object, so the class statement will behave as before
in the normal case.
Python uses the concept of metatypes or metaclasses in a different
way than Smalltalk. In Smalltalk-80, there is a hierarchy of
metaclasses that mirrors the hierarchy of regular classes,
metaclasses map 1-1 to classes (except for some funny business at
the root of the hierarchy), and each class statement creates both
a regular class and its metaclass, putting class methods in the
metaclass and instance methods in the regular class.
Nice though this may be in the context of Smalltalk, it's not
compatible with the traditional use of metatypes in Python, and I
@ -106,98 +127,75 @@ Metatypes
e.g. Python code will never be allowed to allocate raw memory and
initialize it at will.)
Metatypes determine various *policies* for types, e.g. what
happens when a type is called, how dynamic types are (whether a
type's __dict__ can be modified after it is created), what the
method resolution order is, how instance attributes are looked
up, and so on.
Instantiation by calling the type object
I'll argue that left-to-right depth-first is not the best
solution when you want to get the most use from multiple
inheritance.
Traditionally, for each type there is at least one C function that
creates instances of the type (e.g. PyInt_FromLong(),
PyTuple_New() and so on). This function has to take care of
I'll argue that with multiple inheritance, the metatype of the
subtype must be a descendant of the metatypes of all base types.
I'll come back to metatypes later.
Making a type a factory for its instances
Traditionally, for each type there is at least one C factory
function that creates instances of the type (PyTuple_New(),
PyInt_FromLong() and so on). These factory functions take care of
both allocating memory for the object and initializing that
memory. As of Python 2.0, it also has to interface with the
memory. As of Python 2.0, they also have to interface with the
garbage collection subsystem, if the type chooses to participate
in garbage collection (which is optional, but strongly recommended
for so-called "container" types: types that may contain arbitrary
references to other objects, and hence may participate in
reference cycles).
If we're going to implement subtyping, we must separate allocation
and initialization: typically, the most derived subtype is in
charge of allocation (and hence deallocation!), but in most cases
each base type's initializer (constructor) must still be called,
from the "most base" type to the most derived type.
In this proposal, type objects can be factory functions for their
instances, making the types directly callable from Python. This
mimics the way classes are instantiated. Of course, the C APIs
for creating instances of various built-in types will remain valid
and probably the most common; and not all types will become their
own factory functions.
But let's first get the interface for instantiation right. If we
call an object, the tp_call slot of its type gets invoked. Thus,
if we call a type, this invokes the tp_call slot of the type's
type: in other words, the tp_call slot of the metatype.
Traditionally this has been a NULL pointer, meaning that types
can't be called. Now we're adding a tp_call slot to the metatype,
which makes all types "callable" in a trivial sense. But
obviously the metatype's tp_call implementation doesn't know how
to initialize the instances of individual types. So the type
defines a new slot, tp_new, which is invoked by the metatype's
tp_call slot. If the tp_new slot is NULL, the metatype's tp_call
issues a nice error message: the type isn't callable.
The type object has a new slot, tp_new, which can act as a factory
for instances of the type. Types are made callable by providing a
tp_call slot in PyType_Type (the metatype); the slot
implementation function looks for the tp_new slot of the type that
is being called.
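Seen from Python, the mechanism might look like the sketch below. It assumes a Python version in which this proposal is implemented (Python 3 syntax); the variable names are hypothetical.

```python
# Calling a type object creates an instance of that type.
n = int("42")
t = tuple("ab")

# The call is handled by the metatype: this spells out the same dispatch.
m = type(int).__call__(int, "42")

# A type without a tp_new slot cannot be called; an exception is raised.
generator_type = type(x for x in ())
try:
    generator_type()
    not_callable = False
except TypeError:
    not_callable = True
```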
This mechanism gives the maximum freedom to the type: a type's
tp_new doesn't necessarily have to return a new object, or even an
object that is an instance of the type (although the latter should
be rare).
If the type's tp_new slot is NULL, an exception is raised.
Otherwise, the tp_new slot is called. The signature for the
tp_new slot is
PyObject *tp_new(PyTypeObject *type,
PyObject *args,
PyObject *kwds)
The deallocation mechanism chosen should match the allocation
mechanism: an allocation policy should prescribe both the
allocation and deallocation mechanism. And again, planning ahead
for subtyping would be nice. But the available mechanisms are
different. The deallocation function has always been part of the
type structure, as tp_dealloc, which combines the
"uninitialization" with deallocation. This was good enough for
the traditional situation, where it matched the combined
allocation and initialization of the creation function. But now
imagine a type whose creation function uses a special free list
for allocation. Its deallocation function puts the object's
memory back on the same free list. But when allocation and
creation are separate, the object may have been allocated from the
regular heap, and it would be wrong (in some cases disastrous) if
it were placed on the free list by the deallocation function.
where 'type' is the type whose tp_new slot is called, and 'args'
and 'kwds' are the sequential and keyword arguments to the call,
passed unchanged from tp_call. (The 'type' argument is used in
combination with inheritance, see below.)
A solution would be for the tp_construct function to somehow mark
whether the object was allocated from the special free list, so
that the tp_dealloc function can choose the right deallocation
method (assuming that the only two alternatives are a special free
list or the regular heap). A variant that doesn't require space
for an allocation flag bit would be to have two type objects,
identical in the contents of all their slots except for their
deallocation slot. But this requires that all type-checking code
(e.g. the PyDict_Check()) recognizes both types. We'll come back
to this solution in the context of subtyping. Another alternative
is to require the metatype's tp_call to leave the allocation to
the tp_construct method, by passing in a NULL pointer. But this
doesn't work once we allow subtyping.
Eventually, when we add any form of subtyping, we'll have to
separate deallocation from uninitialization. The way to do this
is to add a separate slot to the type object that does the
uninitialization without the deallocation. Fortunately, there is
already such a slot: tp_clear, currently used by the garbage
collection subsystem. A simple rule makes this slot reusable as
an uninitialization slot: for types that support separate allocation
and initialization, tp_clear must be defined (even if the object
doesn't support garbage collection) and it must DECREF all
contained objects and FREE all other memory areas the object owns.
It must also be reentrant: it must be possible to clear an already
cleared object. The easiest way to do this is to replace all
pointers DECREFed or FREEd with NULL pointers.
There are no constraints on the object type that is returned,
although by convention it should be an instance of the given
type. It is not necessary that a new object is returned; a
reference to an existing object is fine too. The return value
should always be a new reference, owned by the caller.
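As an illustration of a factory that hands back an existing object, here is a sketch at the Python level. It assumes that the __new__ method of later Python versions plays the role of tp_new; the Symbol class is hypothetical.

```python
class Symbol:
    """Hypothetical type whose factory reuses one instance per name."""

    _interned = {}

    def __new__(cls, name):
        try:
            # Return a reference to an existing object when there is one.
            return cls._interned[name]
        except KeyError:
            obj = super().__new__(cls)
            obj.name = name
            cls._interned[name] = obj
            return obj


first = Symbol("x")
second = Symbol("x")
other = Symbol("y")
```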
Subtyping in C
Requirements for a type to allow subtyping
The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of the
various problems, and it's acceptable for C code that doesn't
follow the rules to dump core; while for Python subtyping we would
need to catch all errors before they become core dumps.
follow the rules to dump core. For added simplicity, it is
limited to single inheritance.
The idea behind subtyping is very similar to that of single
inheritance in C++. A base type is described by a structure
@ -206,31 +204,169 @@ Subtyping in C
of the base structure unchanged) and can override certain slots in
the type object, leaving others the same.
Not every type can serve as a base type. The base type must
support separation of allocation and initialization by having a
tp_construct slot that can be called with a preallocated object,
and it must support uninitialization without deallocation by
having a tp_clear slot as described above. The derived type must
also export the structure declaration for its instances through a
header file, as it is needed in order to derive a subtype. The
type object for the base type must also be exported.
Most issues have to do with construction and destruction of
instances of derived types.
Creation of a new object is separated into allocation and
initialization: allocation allocates the memory, and
initialization fills it with appropriate initial values. The
separation is needed for the convenience of subtypes.
Instantiation of a subtype goes as follows:
1. allocate memory for the whole (subtype) instance
2. initialize the base type
3. initialize the subtype's instance variables
If allocation and initialization were done by the same function,
you would need a way to tell the base type's constructor to
allocate additional memory for the subtype's instance variables,
and there would be no way to change the allocation method for a
subtype (without giving up on calling the base type to initialize
its part of the instance structure).
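The three instantiation steps can be traced with a Python-level sketch. It assumes that __new__ stands in for allocation and __init__ for initialization, as in later Python versions; Base, Derived and the events list are hypothetical.

```python
events = []


class Base:
    def __new__(cls, *args):
        # Step 1: memory is allocated for the whole (subtype) instance;
        # cls is the most derived type, not necessarily Base.
        events.append("allocate " + cls.__name__)
        return super().__new__(cls)

    def __init__(self, x):
        # Step 2: the base type initializes its part of the instance.
        events.append("init Base")
        self.x = x


class Derived(Base):
    def __init__(self, x, y):
        super().__init__(x)
        # Step 3: the subtype initializes its own instance variables.
        events.append("init Derived")
        self.y = y


d = Derived(1, 2)
```

Because allocation receives the most derived type, the base type never needs to be told how much extra room the subtype requires.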
A similar reasoning applies to destruction: if a subtype changes
the instance allocator (e.g. to use a different heap), it must
also change the instance deallocator; but it must still call on
the base type's destructor to DECREF the base type's instance
variables.
In this proposal, I assign stricter meanings to two existing
slots for deallocation and deinitialization, and I add two new
slots for allocation and initialization.
The tp_clear slot gets the new task of deinitializing an object so
that all that remains to be done is free its memory. Originally,
all it had to do was clear object references. The difference is
subtle: the list and dictionary objects contain references to an
additional heap-allocated piece of memory that isn't freed by
tp_clear in Python 2.1, but which must be freed by tp_clear under
this proposal. It should be safe to call tp_clear repeatedly on
the same object. If an object contains no references to other
objects or heap-allocated memory, the tp_clear slot may be NULL.
The only additional requirement for the tp_dealloc slot is that it
should do the right thing whether or not tp_clear has been called.
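The contract for tp_clear and tp_dealloc can be modeled with a toy class. This is only a Python stand-in for C code, not the real slot machinery: Buffer, clear(), dealloc() and the frees counter are all hypothetical.

```python
class Buffer:
    """Toy model of an object owning a reference and a memory block."""

    def __init__(self, owner, size):
        self.owner = owner              # models a contained object reference
        self.block = bytearray(size)    # models a heap-allocated memory area
        self.frees = 0                  # counts how often the block was freed

    def clear(self):
        # Models tp_clear: release everything, then null the pointers so
        # that a repeated call finds nothing left to release.
        if self.block is not None:
            self.frees += 1
            self.block = None
        self.owner = None

    def dealloc(self):
        # Models tp_dealloc: correct whether or not clear() ran before.
        self.clear()


b = Buffer(owner=object(), size=16)
b.clear()
b.clear()      # a repeated clear is harmless
b.dealloc()    # and so is deallocation after clearing
```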
The new slots are tp_alloc for allocation and tp_init for
initialization. Their signatures:
PyObject *tp_alloc(PyTypeObject *type,
PyObject *args,
PyObject *kwds)
int tp_init(PyObject *self,
PyObject *args,
PyObject *kwds)
The arguments for tp_alloc are the same as for tp_new, described
above. The arguments for tp_init are the same except that the
first argument is replaced with the instance to be initialized.
Its return value is 0 for success or -1 for failure.
It is possible that tp_init is called more than once or not at
all. The implementation should allow this usage. The object may
be non-functional until tp_init is called, and a second call to
tp_init may raise an exception, but it should not be possible to
cause a core dump or memory leakage this way.
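The list type in later Python versions shows what tolerating zero or several tp_init calls can look like. The sketch assumes Python 3; the variable names are hypothetical.

```python
# Initialization skipped: the object is allocated but tp_init never runs.
raw = list.__new__(list)
raw.append(1)

# Initialization repeated: the second call resets and refills the list.
items = [1, 2, 3]
items.__init__("ab")
```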
Because tp_init is in a sense optional, tp_alloc is required to do
*some* initialization of the object. It is required to initialize
ob_refcnt to 1 and ob_type to its type argument. To be safe, it
should probably zero out the rest of the object.
The constructor arguments are passed to tp_alloc so that for
variable-size objects (like tuples and strings) it knows to
allocate the right amount of memory.
For immutable types, tp_alloc may have to do the full
initialization; otherwise, different calls to tp_init might cause
an immutable object to be modified, which is considered a grave
offense in Python (unlike in Fortran :-).
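A sketch of an immutable subtype whose value is fixed at creation time. It assumes the later Python spelling, in which the work is done in __new__ rather than in a separate allocation slot; the Pair class is hypothetical.

```python
class Pair(tuple):
    def __new__(cls, x, y):
        # The value must be supplied at creation; a tuple cannot be
        # filled in afterwards.
        return super().__new__(cls, (x, y))


p = Pair(1, 2)
p.__init__(9, 9)    # a later initialization call leaves the value alone
```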
Not every type can serve as a base type. The assumption is made
that if a type has a non-NULL value in its tp_init slot, it is
ready to be subclassed; otherwise, it is not, and using it as a
base class will raise an exception.
In order to be usefully subtyped in C, a type must also export the
structure declaration for its instances through a header file, as
it is needed in order to derive a subtype. The type object for
the base type must also be exported.
If the base type has a type-checking macro (e.g. PyDict_Check()),
this macro may be changed to recognize subtypes. This can be done
by using the new PyObject_TypeCheck(object, type) macro, which
calls a function that follows the base class links. There are
arguments for and against changing the type-checking macro in this
way. The argument for the change should be clear: it allows
subtypes to be used in places where the base type is required,
which is often the prime attraction of subtyping (as opposed to
sharing implementation). An argument against changing the
type-checking macro could be that the type check is used
frequently and a function call would slow things down too much
(hard to believe); or one could fear that a subtype might break an
invariant assumed by the support functions of the base type.
Sometimes it would be wise to change the base type to remove this
reliance; other times, it would be better to require that derived
types (implemented in C) maintain the invariants.
this macro probably should be changed to recognize subtypes. This
can be done by using the new PyObject_TypeCheck(object, type)
macro, which calls a function that follows the base class links.
(An argument against changing the type-checking macro could be
that the type check is used frequently and a function call would
slow things down too much, but I find this hard to believe. One
could also fear that a subtype might break an invariant assumed by
the support functions of the base type. Usually it is best to
change the base type to remove this reliance, at least to the
point of raising an exception rather than dumping core when the
invariant is broken.)
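The idea of a type check that follows the base class links can be sketched in Python. This only models the behavior described for PyObject_TypeCheck under single inheritance and is not its implementation; type_check and Registry are hypothetical names.

```python
def type_check(obj, base):
    """Return True if type(obj) is base or derives from it."""
    t = type(obj)
    while t is not None:
        if t is base:
            return True
        t = t.__base__      # follow the link to the base type
    return False


class Registry(dict):
    pass


r = Registry()
```

With a check of this kind, a Registry instance passes wherever a dict is required, while a plain dict still fails a check for Registry.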
Here are the interactions between tp_alloc, tp_clear, tp_dealloc
and subtypes; all assuming that the base type defines tp_init
(otherwise it cannot be subtyped anyway):
- If the base type's allocation scheme doesn't use the standard
heap, it should not define tp_alloc. This is a signal for the
subclass to provide its own tp_alloc *and* tp_dealloc
implementation (probably using the standard heap).
- If the base type's tp_dealloc does anything besides calling
PyObject_DEL() (typically, calling Py_XDECREF() on contained
objects or freeing dependent memory blocks), it should define a
tp_clear that does the same without calling PyObject_DEL(), and
which checks for zero pointers before and zeros the pointers
afterwards, so that calling tp_clear more than once or calling
tp_dealloc after tp_clear will not attempt to DECREF or free the
same object/memory twice. (It should also be allowed to
continue using the object after tp_clear -- tp_clear should
simply reset the object to its pristine state.)
- If the derived type overrides tp_alloc, it should also override
tp_dealloc, and tp_dealloc should call the derived type's
tp_clear if non-NULL (or its own tp_clear).
- If the derived type overrides tp_clear, it should call the base
type's tp_clear if non-NULL.
- If the base type defines tp_init as well as tp_new, its tp_new
should be inheritable: it should call the tp_alloc and the
tp_init of the type passed in as its first argument.
- If the base type defines tp_init as well as tp_alloc, its
tp_alloc should be inheritable: it should look in the
tp_basicsize slot of the type passed in for the amount of memory
to allocate, and it should initialize all allocated bytes to
zero.
- For types whose tp_itemsize is nonzero, the allocation size used
in tp_alloc should be tp_basicsize + n*tp_itemsize, rounded up
to the next integral multiple of sizeof(PyObject *), where n is
the number of items determined by the arguments to tp_alloc.
- Things are further complicated by the garbage collection API.
This affects tp_basicsize, and the actions to be taken by
tp_alloc. tp_alloc should look at the Py_TPFLAGS_GC flag bit in
the tp_flags field of the type passed in, and not assume that
this is the same as the corresponding bit in the base type. (In
part, the GC API is at fault; Neil Schemenauer has a patch that
fixes the API, but it is currently backwards incompatible.)
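The size rule for types with a nonzero tp_itemsize, given in the list above, is plain arithmetic. The sketch below works it through in Python; alloc_size and POINTER_SIZE are hypothetical names, and struct.calcsize("P") stands in for sizeof(PyObject *).

```python
import struct

POINTER_SIZE = struct.calcsize("P")     # size of a C pointer on this platform


def alloc_size(basicsize, itemsize, n):
    """tp_basicsize + n*tp_itemsize, rounded up to a pointer-size multiple."""
    raw = basicsize + n * itemsize
    return (raw + POINTER_SIZE - 1) // POINTER_SIZE * POINTER_SIZE


size = alloc_size(24, 1, 5)
```

For example, with 8-byte pointers, a basic size of 24 and five 1-byte items give 29 bytes, which rounds up to 32.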
Note: the rules here are very complicated -- probably too
complicated. It may be better to give up on subtyping immutable
types, types with custom allocators, and types with variable size
allocation (such as int, string and tuple) -- then the rules can
be much simplified because you can assume allocation on the
standard heap, no requirement beyond zeroing memory in tp_alloc,
and no variable length allocation.
Creating a subtype of a built-in type in C
The derived type begins by declaring a type structure which
contains the base type's structure. For example, here's the type
@ -400,6 +536,53 @@ Copyright
This document has been placed in the public domain.
Junk text (to be reused somewhere above)
The deallocation mechanism chosen should match the allocation
mechanism: an allocation policy should prescribe both the
allocation and deallocation mechanism. And again, planning ahead
for subtyping would be nice. But the available mechanisms are
different. The deallocation function has always been part of the
type structure, as tp_dealloc, which combines the
"uninitialization" with deallocation. This was good enough for
the traditional situation, where it matched the combined
allocation and initialization of the creation function. But now
imagine a type whose creation function uses a special free list
for allocation. Its deallocation function puts the object's
memory back on the same free list. But when allocation and
creation are separate, the object may have been allocated from the
regular heap, and it would be wrong (in some cases disastrous) if
it were placed on the free list by the deallocation function.
A solution would be for the tp_construct function to somehow mark
whether the object was allocated from the special free list, so
that the tp_dealloc function can choose the right deallocation
method (assuming that the only two alternatives are a special free
list or the regular heap). A variant that doesn't require space
for an allocation flag bit would be to have two type objects,
identical in the contents of all their slots except for their
deallocation slot. But this requires that all type-checking code
(e.g. the PyDict_Check()) recognizes both types. We'll come back
to this solution in the context of subtyping. Another alternative
is to require the metatype's tp_call to leave the allocation to
the tp_construct method, by passing in a NULL pointer. But this
doesn't work once we allow subtyping.
Eventually, when we add any form of subtyping, we'll have to
separate deallocation from uninitialization. The way to do this
is to add a separate slot to the type object that does the
uninitialization without the deallocation. Fortunately, there is
already such a slot: tp_clear, currently used by the garbage
collection subsystem. A simple rule makes this slot reusable as
an uninitialization slot: for types that support separate allocation
and initialization, tp_clear must be defined (even if the object
doesn't support garbage collection) and it must DECREF all
contained objects and FREE all other memory areas the object owns.
It must also be reentrant: it must be possible to clear an already
cleared object. The easiest way to do this is to replace all
pointers DECREFed or FREEd with NULL pointers.
Local Variables:
mode: indented-text