Add a lot of text. A looooooot of text. Way too much rambling. And
it isn't even finished. I'll do that later. But at least there's some text here now...

parent d1e2189144
commit 8ce73a97d3

361 pep-0253.txt

@@ -11,11 +11,366 @@ Post-History:
Abstract

This PEP proposes ways for creating subtypes of existing built-in
types, either in C or in Python. The text is currently long and
rambling; I'll go over it again later to make it shorter.

Introduction

Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The fields in the type object
describe all aspects of a Python object that are relevant to the
Python interpreter. A few fields contain dimensional information
(e.g. the basic allocation size of instances), others contain
various flags, but most fields are pointers to functions to
implement various kinds of behaviors. A NULL pointer means that
the type does not implement the specific behavior; in that case
the system may provide a default behavior or raise an exception
when the behavior is invoked. Some collections of function
pointers that are usually defined together are obtained
indirectly via a pointer to an additional structure containing
more function pointers.

[XXX to be done.]

While the details of initializing a PyTypeObject structure haven't
been documented as such, they are easily gleaned from the examples
in the source code, and I am assuming that the reader is
sufficiently familiar with the traditional way of creating new
Python types in C.

This PEP will introduce the following optional features to types:

- create an instance of a type by calling it

- create a subtype in C by specifying a base type pointer

- create a subtype in Python using a class statement

- multiple inheritance

This PEP builds on PEP 252, which adds standard introspection to
types; in particular, types are assumed to have e.g. a __hash__
method when the type object defines the tp_hash slot. PEP 252 also
adds a dictionary to type objects which contains all methods. At
the Python level, this dictionary is read-only; at the C level, it
is accessible directly (but modifying it is not recommended except
as part of initialization).


Metatypes

Inevitably the following discussion will come to mention metatypes
(or metaclasses). Metatypes are nothing new in Python: Python has
always been able to talk about the type of a type:

    >>> a = 0
    >>> type(a)
    <type 'int'>
    >>> type(type(a))
    <type 'type'>
    >>> type(type(type(a)))
    <type 'type'>
    >>>

In this example, type(a) is a "regular" type, and type(type(a)) is
a metatype. While as distributed all types have the same metatype
(which is also its own metatype), this is not a requirement, and
in fact a useful 3rd party extension (ExtensionClasses by Jim
Fulton) creates an additional metatype. A related feature is the
"Don Beaudry hook", which says that if a metatype is callable, its
instances (which are regular types) can be subclassed (really
subtyped) using a Python class statement. We will use this rule
to support subtyping of built-in types, and in the process we will
introduce some additional metatypes, and a "metametatype". (The
metametatype is nothing unusual; Python's type system allows any
number of metalevels.)
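
As a rough illustration of the metalevels described above, here is
a minimal sketch. It assumes a present-day Python 3 interpreter,
whose "metaclass=" syntax postdates this draft, and the names Meta
and Spam are hypothetical.

```python
# Assumes modern Python 3; the draft itself predates this syntax.
a = 0
regular_type = type(a)          # int, a "regular" type
metatype = type(regular_type)   # type, the default metatype

# The default metatype is its own metatype, so the chain stops here.
assert metatype is type
assert type(metatype) is metatype


class Meta(type):
    """A hypothetical additional metatype (illustrative name only)."""


class Spam(metaclass=Meta):
    """A regular type whose metatype is Meta rather than type."""


print(type(Spam()).__name__)   # Spam
print(type(Spam).__name__)     # Meta
print(type(Meta).__name__)     # type
```

Meta plays the role of an additional metatype, and type(Meta) the
role of the "metametatype".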

Note that Python uses the concept of metatypes or metaclasses in a
different way than Smalltalk. In Smalltalk-80, there is a
hierarchy of metaclasses that mirrors the hierarchy of regular
classes, metaclasses map 1-1 to classes (except for some funny
business at the root of the hierarchy), and each class statement
creates both a regular class and its metaclass, putting class
methods in the metaclass and instance methods in the regular
class.

Nice though this may be in the context of Smalltalk, it's not
compatible with the traditional use of metatypes in Python, and I
prefer to continue in the Python way. This means that Python
metatypes are typically written in C, and may be shared between
many regular types. (It will be possible to subtype metatypes in
Python, so it won't be absolutely necessary to write C in order to
use metatypes; but the power of Python metatypes will be limited,
e.g. Python code will never be allowed to allocate raw memory and
initialize it at will.)


Instantiation by calling the type object

Traditionally, for each type there is at least one C function that
creates instances of the type. This function has to take care of
both allocating memory for the object and initializing that
memory. As of Python 2.0, it also has to interface with the
garbage collection subsystem, if the type chooses to participate
in garbage collection (which is optional, but strongly recommended
for so-called "container" types: types that may contain arbitrary
references to other objects, and hence may participate in
reference cycles).

If we're going to implement subtyping, we must separate allocation
and initialization: typically, the most derived subtype is in
charge of allocation (and hence deallocation!), but in most cases
each base type's initializer (constructor) must still be called,
from the "most base" type to the most derived type.
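
The initialization order can be sketched in Python. This is only
an analogy: it uses __init__ and super() from present-day Python
rather than the slots proposed here, and the class names are
hypothetical.

```python
calls = []  # records the order in which the initializers run


class Base:
    def __init__(self):
        calls.append("Base")


class Middle(Base):
    def __init__(self):
        super().__init__()      # initialize the base part first
        calls.append("Middle")


class Derived(Middle):
    def __init__(self):
        super().__init__()
        calls.append("Derived")


obj = Derived()   # a single allocation, made for the most derived type
print(calls)      # ['Base', 'Middle', 'Derived']
```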

But let's first get the interface for instantiation right. If we
call an object, the tp_call slot of its type gets invoked. Thus,
if we call a type, this invokes the tp_call slot of the type's
type: in other words, the tp_call slot of the metatype.
Traditionally this has been a NULL pointer, meaning that types
can't be called. Now we're adding a tp_call slot to the metatype,
which makes all types "callable" in a trivial sense. But
obviously the metatype's tp_call implementation doesn't know how
to initialize individual types. So the type defines a new slot,
tp_construct, which is invoked by the metatype's tp_call slot. If
the tp_construct slot is NULL, the metatype's tp_call issues a
nice error message: the type isn't callable.
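
The dispatch just described can be modeled in a few lines of
Python. This is a toy sketch, not the C implementation: TypeModel,
metatype_tp_call and point_construct are hypothetical names, None
stands in for a NULL slot, and the error text is made up.

```python
class TypeModel:
    """Toy stand-in for a type object; only tp_construct is modeled."""

    def __init__(self, name, tp_construct=None):
        self.name = name
        self.tp_construct = tp_construct   # None plays the role of NULL


def metatype_tp_call(tp, *args):
    """Toy metatype tp_call: delegate to the type's tp_construct."""
    if tp.tp_construct is None:
        raise TypeError(f"type '{tp.name}' isn't callable")
    return tp.tp_construct(tp, *args)


def point_construct(tp, x, y):
    # Initialization logic lives in the type, not in the metatype.
    return {"type": tp.name, "x": x, "y": y}


Point = TypeModel("Point", tp_construct=point_construct)
Opaque = TypeModel("Opaque")               # no tp_construct slot

print(metatype_tp_call(Point, 1, 2))
try:
    metatype_tp_call(Opaque)
except TypeError as exc:
    print(exc)
```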

We already know that tp_construct is responsible for initializing
the object (this will be important for subtyping too). Who should
be responsible for allocation of the new object? Either the
metatype's tp_call can allocate the object, or the type's
tp_construct can allocate it. The solution is copied from typical
C++ implementations: if the metatype's tp_call allocates storage
for the object it passes the storage as a pointer to the type's
tp_construct; if the metatype's tp_call does not allocate storage,
it passes a NULL pointer to the type's tp_construct, in which case
the type allocates the storage itself. This moves the policy
decision to the metatype, and different metatypes may have
different policies. The mechanisms are fixed though: either the
metatype's tp_call allocates storage, or the type's tp_construct
allocates.
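
The two allocation policies can be sketched as follows. This is a
toy model under stated assumptions: storage is a dictionary, None
stands in for the NULL pointer, and every function name is
hypothetical.

```python
def allocate(tp_name):
    """Toy allocator: return fresh, uninitialized 'storage'."""
    return {"type": tp_name, "allocated_by": None, "value": None}


def counter_construct(storage, start):
    """Toy tp_construct: allocate only if no storage was passed in."""
    if storage is None:                      # NULL: the type allocates
        storage = allocate("Counter")
        storage["allocated_by"] = "type"
    storage["value"] = start                 # initialize in either case
    return storage


def allocating_metatype_call(construct, *args):
    """Policy A: the metatype's tp_call allocates the storage."""
    storage = allocate("Counter")
    storage["allocated_by"] = "metatype"
    return construct(storage, *args)


def delegating_metatype_call(construct, *args):
    """Policy B: the metatype's tp_call passes NULL to the type."""
    return construct(None, *args)


a = allocating_metatype_call(counter_construct, 10)
b = delegating_metatype_call(counter_construct, 10)
print(a["allocated_by"], b["allocated_by"])   # metatype type
```

Both policies share the same tp_construct; only the metatype's
call function differs, which is the point of moving the decision
there.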

The deallocation mechanism chosen should match the allocation
mechanism: an allocation policy should prescribe both the
allocation and deallocation mechanism. And again, planning ahead
for subtyping would be nice. But the available mechanisms are
different. The deallocation function has always been part of the
type structure, as tp_dealloc, which combines the
"uninitialization" with deallocation. This was good enough for
the traditional situation, where it matched the combined
allocation and initialization of the creation function. But now
imagine a type whose creation function uses a special free list
for allocation. Its deallocation function puts the object's
memory back on the same free list. But when allocation and
initialization are separate, the object may have been allocated
from the regular heap, and it would be wrong (in some cases
disastrous) if it were placed on the free list by the deallocation
function.

A solution would be for the tp_construct function to somehow mark
whether the object was allocated from the special free list, so
that the tp_dealloc function can choose the right deallocation
method (assuming that the only two alternatives are a special free
list or the regular heap). A variant that doesn't require space
for an allocation flag bit would be to have two type objects,
identical in the contents of all their slots except for their
deallocation slot. But this requires that all type-checking code
(e.g. PyDict_Check()) recognizes both types. We'll come back
to this solution in the context of subtyping. Another alternative
is to require the metatype's tp_call to leave the allocation to
the tp_construct method, by passing in a NULL pointer. But this
doesn't work once we allow subtyping.

Eventually, when we add any form of subtyping, we'll have to
separate deallocation from uninitialization. The way to do this
is to add a separate slot to the type object that does the
uninitialization without the deallocation. Fortunately, there is
already such a slot: tp_clear, currently used by the garbage
collection subsystem. A simple rule makes this slot reusable for
uninitialization: for types that support separate allocation
and initialization, tp_clear must be defined (even if the object
doesn't support garbage collection) and it must DECREF all
contained objects and FREE all other memory areas the object owns.
It must also be reentrant: it must be possible to clear an already
cleared object. The easiest way to do this is to replace all
pointers DECREFed or FREEd with NULL pointers.
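
The reentrancy rule can be illustrated with a small Python model.
It is a sketch only: Resource and Container are hypothetical
classes, release() stands in for DECREF or FREE, and None stands
in for a NULL pointer.

```python
class Resource:
    """Toy stand-in for an owned object or memory area."""

    def __init__(self):
        self.releases = 0

    def release(self):
        self.releases += 1


class Container:
    """Toy object owning two resources, with a tp_clear-style method."""

    def __init__(self):
        self.item = Resource()
        self.buffer = Resource()

    def clear(self):
        # Replace each pointer with None *before* releasing it, so a
        # second (or re-entrant) call finds nothing left to release.
        item, self.item = self.item, None
        if item is not None:
            item.release()
        buffer, self.buffer = self.buffer, None
        if buffer is not None:
            buffer.release()


c = Container()
item, buffer = c.item, c.buffer
c.clear()
c.clear()    # clearing an already cleared object is harmless
print(item.releases, buffer.releases)   # 1 1
```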


Subtyping in C

The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of the
various problems, and it's acceptable for C code that doesn't
follow the rules to dump core; while for Python subtyping we would
need to catch all errors before they become core dumps.

The idea behind subtyping is very similar to that of single
inheritance in C++. A base type is described by a structure
declaration plus a type object. A derived type can extend the
structure (but must leave the names, order and type of the fields
of the base structure unchanged) and can override certain slots in
the type object, leaving others the same.

Not every type can serve as a base type. The base type must
support separation of allocation and initialization by having a
tp_construct slot that can be called with a preallocated object,
and it must support uninitialization without deallocation by
having a tp_clear slot as described above. The base type must
also export the structure declaration for its instances through a
header file, as it is needed in order to derive a subtype. The
type object for the base type must also be exported.

If the base type has a type-checking macro (e.g. PyDict_Check()),
this macro may be changed to recognize subtypes. This can be done
by using the new PyObject_TypeCheck(object, type) macro, which
calls a function that follows the base class links. There are
arguments for and against changing the type-checking macro in this
way. The argument for the change should be clear: it allows
subtypes to be used in places where the base type is required,
which is often the prime attraction of subtyping (as opposed to
sharing implementation). An argument against changing the
type-checking macro could be that the type check is used
frequently and a function call would slow things down too much
(hard to believe); or one could fear that a subtype might break an
invariant assumed by the support functions of the base type.
Sometimes it would be wise to change the base type to remove this
reliance; other times, it would be better to require that derived
types (implemented in C) maintain the invariants.
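
A check that follows base class links can be sketched in Python.
The type_check function below is a hypothetical model of the
function behind PyObject_TypeCheck(), not its implementation; it
relies on the __base__ attribute of present-day Python types and
on single inheritance.

```python
def type_check(obj, tp):
    """Return True if type(obj) is tp or derives from it."""
    t = type(obj)
    while t is not None:
        if t is tp:
            return True
        t = t.__base__      # becomes None once we walk past object
    return False


class SpamList(list):
    pass


print(type(SpamList()) is list)        # False: exact check rejects it
print(type_check(SpamList(), list))    # True: base links accept it
print(type_check([], SpamList))        # False
```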

The derived type begins by declaring a type structure which
contains the base type's structure. For example, here's the type
structure for a subtype of the built-in list type:

    typedef struct {
        PyListObject list;
        int state;
    } spamlistobject;

Note that the base type structure field (here PyListObject) must
be the first field in the structure; any following fields are
extension fields. Also note that the base type is not referenced
via a pointer; the actual contents of its structure must be
included! (The goal is for the memory layout of the beginning of
the subtype instance to be the same as that of the base type
instance.)
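
The layout requirement can be checked with the standard ctypes
module. This is a sketch with hypothetical structures: BaseObject
merely stands in for PyListObject and does not reproduce its real
fields.

```python
import ctypes


class BaseObject(ctypes.Structure):
    """Hypothetical base structure standing in for PyListObject."""
    _fields_ = [("refcount", ctypes.c_ssize_t),
                ("length", ctypes.c_ssize_t)]


class SpamObject(ctypes.Structure):
    """Subtype structure: base embedded first, then extension fields."""
    _fields_ = [("base", BaseObject),      # by value, not via a pointer
                ("state", ctypes.c_int)]


spam = SpamObject()
spam.base.length = 3
spam.state = 42

# The embedded base sits at offset 0, so a pointer to the subtype
# instance can be reinterpreted as a pointer to the base structure.
as_base = ctypes.cast(ctypes.pointer(spam),
                      ctypes.POINTER(BaseObject)).contents
print(SpamObject.base.offset, as_base.length)   # 0 3
```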

Next, the derived type must declare a type object and initialize
it. Most of the slots in the type object may be initialized to
zero, which is a signal that the base type slot must be copied
into it. Some fields that must be initialized properly:

- the object header must be filled in as usual; the type should be
  PyType_Type

- the tp_basicsize field must be set to the size of the subtype
  instances

- the tp_base field must be set to the address of the base type's
  type object

- the tp_dealloc slot function must be a deallocation function for
  the subtype

- the tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT
  value

- the tp_name field must be set (otherwise it will be inherited,
  which is wrong)

Exception: if the subtype defines no additional fields in its
structure (i.e., it only defines new behavior, no new data), the
tp_basicsize and the tp_dealloc fields may be set to zero. In
order to complete the initialization of the type,
PyType_InitDict() must be called. This replaces zero slots in the
subtype with the value of the corresponding base type slots. It
also fills in tp_dict, the type's dictionary; this is more a
matter of PEP 252.
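
The slot-copying step can be modeled in Python. This is a toy
sketch, not PyType_InitDict() itself: types are dictionaries,
None stands in for a zero slot, init_type is a hypothetical name,
and the size 40 is an arbitrary made-up value.

```python
def init_type(subtype):
    """Toy slot inheritance: fill empty slots from the base type."""
    base = subtype["tp_base"]
    for slot, value in base.items():
        if slot in ("tp_name", "tp_base"):
            continue                      # never inherited in this sketch
        if subtype.get(slot) is None:     # None plays the role of zero
            subtype[slot] = value
    return subtype


list_type = {"tp_name": "list", "tp_base": None, "tp_basicsize": 40,
             "tp_dealloc": "list_dealloc", "tp_repr": "list_repr"}

# A subtype that adds behavior only, so tp_basicsize and tp_dealloc
# are left zero and get filled in from the base type.
spamlist_type = {"tp_name": "spamlist", "tp_base": list_type,
                 "tp_basicsize": None, "tp_dealloc": None,
                 "tp_repr": "spamlist_repr"}

init_type(spamlist_type)
print(spamlist_type["tp_basicsize"], spamlist_type["tp_dealloc"],
      spamlist_type["tp_repr"])
# 40 list_dealloc spamlist_repr
```

Slots the subtype filled in itself (here tp_repr) are left alone,
and a second call changes nothing.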

The subtype's tp_dealloc slot deserves special attention. It must
uninitialize and deallocate the object in an orderly manner: first
it must uninitialize the fields added by the extension type; then
it must call the base type's tp_clear function; finally it must
deallocate the memory of the object. Usually, the base type's
tp_clear function has no global name; it is permissible to call it
via the base type's tp_clear slot, e.g. PyList_Type.tp_clear(obj).
Only if it is known that the base type uses the same allocation
method as the subtype and the subtype requires no uninitialization
(e.g. it adds no data fields or all its data fields are numbers)
is it permissible to leave tp_dealloc set to zero in the subtype's
type object; it will be copied from the base type.
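
The required order of the three teardown steps can be sketched in
Python. All names here are hypothetical stand-ins; the functions
only record what a real C implementation would do at each step.

```python
events = []   # records the order of the teardown steps


def list_tp_clear(obj):
    """Toy base-type tp_clear: release what the base part owns."""
    obj["items"] = None
    events.append("base tp_clear")


def free(obj):
    """Toy deallocator standing in for returning the memory."""
    events.append("free")


def spamlist_dealloc(obj):
    """Toy subtype tp_dealloc following the three steps in order."""
    obj["state"] = None                 # 1. uninitialize extension fields
    events.append("subtype fields")
    list_tp_clear(obj)                  # 2. call the base type's tp_clear
    free(obj)                           # 3. deallocate the memory


spamlist_dealloc({"items": [1, 2], "state": 7})
print(events)   # ['subtype fields', 'base tp_clear', 'free']
```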

A subtype is not usable until PyType_InitDict() is called for it;
this is best done during module initialization, assuming the
subtype belongs to a module. An alternative for subtypes added to
the Python core (which don't live in a particular module) would be
to initialize the subtype in their constructor function. It is
allowed to call PyType_InitDict() more than once; the second and
further calls have no effect. In order to avoid unnecessary
calls, a test for tp_dict==NULL can be made.

If the subtype itself should be subtypable (usually desirable), it
should follow the same rules as given above for base types: have
a tp_construct that accepts a preallocated object and calls the
base type's tp_construct, and have a tp_clear that calls the base
type's tp_clear.


Subtyping in Python

The next step is to allow subtyping of selected built-in types
through a class statement in Python. Limiting ourselves to single
inheritance for now, here is what happens for a simple class
statement:

    class C(B):
        var1 = 1
        def method1(self): pass
        # etc.

The body of the class statement is executed in a fresh environment
(basically, a new dictionary used as local namespace), and then C
is created. The following explains how C is created.

Assume B is a type object. Since type objects are objects, and
every object has a type, B has a type. B's type is accessible via
type(B) or B.__class__ (the latter notation is new for types; it
is introduced in PEP 252). Let's say B's type is M (for
Metatype). The class statement will create a new type, C. Since
C will be a type object just like B, we view the creation of C as
an instantiation of the metatype, M. The information that needs
to be provided for the creation of C is: its name (in this example
the string "C"); the list of base classes (a singleton tuple
containing B); and the results of executing the class body, in the
form of a dictionary (e.g. {"var1": 1, "method1": <function...>,
...}).

According to the Don Beaudry hook, the following call is made:

    C = M("C", (B,), dict)

(where dict is the dictionary resulting from execution of the
class body). In other words, the metatype (M) is called. Note
that even though we currently require there to be exactly one base
class, we still pass in a (singleton) sequence of base classes;
this makes it possible to support multiple inheritance later (or
for types with a different metaclass!) without changing this
interface.
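
The call to the metatype can be tried directly in present-day
Python 3, where every class is a type object. This is a sketch
under that assumption; B and the namespace contents are
hypothetical stand-ins for the class statement shown earlier.

```python
class B:
    """Stand-in base class; any class would do."""


# What executing the class body would leave behind.
namespace = {"var1": 1, "method1": lambda self: None}

M = type(B)                      # B's metatype (here the built-in type)
C = M("C", (B,), namespace)      # the call made for the class statement

print(C.__name__, C.__bases__ == (B,), C.var1)   # C True 1
print(isinstance(C(), B))                        # True
```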

Note that calling M requires that M itself has a type: the
meta-metatype. In the current implementation, I have introduced a
new type object for this purpose, named turtle because of my
fondness for the phrase "turtles all the way down". However I now
believe that it would be better if M were its own metatype, just
like before. This can be accomplished by making M's tp_call slot
slightly more flexible.

In any case, the work for creating C is done by M's tp_construct
slot. It allocates space for an "extended" type structure, which
contains space for: the type object; the auxiliary structures
(as_sequence etc.); the string object containing the type name (to
ensure that this object isn't deallocated while the type object is
still referencing it); and some more auxiliary storage (to be
described later). It initializes this storage to zeros except for
a few crucial slots (e.g. tp_name is set to point to the type
name) and then sets the tp_base slot to point to B. Then
PyType_InitDict() is called to inherit B's slots. Finally, C's
tp_dict slot is updated with the contents of the namespace
dictionary (the third argument to the call to M).


Copyright