Another intermediate checkin. Removed a lot of lies about an older
idea for what tp_alloc() should be.
Guido van Rossum 2001-07-10 20:01:52 +00:00
parent 15299026e7
commit 14f1593cc7
1 changed files with 141 additions and 210 deletions


@@ -18,11 +18,11 @@ Introduction
Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The slots in the type object
describe all aspects of a Python type that are relevant to the
Python interpreter. A few slots contain dimensional information
(like the basic allocation size of instances), others contain
various flags, but most slots are pointers to functions to
implement various kinds of behaviors. A NULL pointer means that
the type does not implement the specific behavior; in that case
the system may provide a default behavior or raise an
@@ -74,7 +74,7 @@ Introduction
For binary compatibility, a flag bit in the tp_flags slot
indicates the existence of the various new slots in the type
object introduced below. Types that don't have the
Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags slot are assumed
to have NULL values for all the subtyping slots. (Warning: the
current implementation prototype is not yet consistent in its
checking of this flag bit. This should be fixed before the final
@@ -251,6 +251,12 @@ Making a type a factory for its instances
dictionary an initial set of keys and values based on the
arguments passed in.
Note that for immutable object types, the initialization cannot be
done by the tp_init() slot: this would provide the Python user
with a way to change the initialization. Therefore, immutable
objects typically have an empty tp_init() implementation and do
all their initialization in their tp_new() slot.
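As a minimal sketch of this rule, assume a hypothetical immutable toy_point type that models only a reference count and two coordinates. The toy_* names are invented for illustration and are not part of the Python C API. All initialization happens in the tp_new()-style function, and the tp_init()-style function is deliberately empty:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical immutable toy type; not part of the Python C API. */
typedef struct {
    long ob_refcnt;
    int x, y;               /* never changed after toy_point_new() */
} toy_point;

/* tp_new()-style: allocate AND perform all initialization. */
static toy_point *toy_point_new(int x, int y)
{
    toy_point *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->ob_refcnt = 1;
    p->x = x;
    p->y = y;
    return p;
}

/* tp_init()-style: deliberately empty, so calling it again cannot
   modify an existing (immutable) object. */
static int toy_point_init(toy_point *self, int x, int y)
{
    assert(self != NULL);
    (void)x;
    (void)y;
    return 0;               /* 0 means success */
}
```

Because the init function ignores its arguments, a second call from Python code cannot alter a point that already exists.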
You may wonder why the tp_new() slot shouldn't call the tp_init()
slot itself. The reason is that in certain circumstances (like
support for persistent objects), it is important to be able to
@@ -273,13 +279,13 @@ Making a type a factory for its instances
There's a third slot related to object creation: tp_alloc(). Its
responsibility is to allocate the memory for the object,
initialize the reference count (ob_refcnt) and the type pointer
(ob_type), and initialize the rest of the object to all zeros. It
should also register the object with the garbage collection
subsystem if the type supports garbage collection. This slot
exists so that derived types can override the memory allocation
policy (e.g. which heap is being used) separately from the
initialization code. The signature is:
PyObject *tp_alloc(PyTypeObject *type, int nitems)
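A minimal sketch of the duties listed above, using hypothetical toy structures. toy_object, toy_type and toy_alloc are illustrative names, not the real PyObject, PyTypeObject or any CPython function. The size computation tp_basicsize + nitems * tp_itemsize is an assumption of this sketch, and GC registration is left out:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model; the toy_* names are illustrative only. */
typedef struct toy_type toy_type;

typedef struct {
    long ob_refcnt;        /* reference count */
    toy_type *ob_type;     /* type pointer */
} toy_object;

struct toy_type {
    size_t tp_basicsize;   /* bytes for the fixed part of an instance */
    size_t tp_itemsize;    /* bytes per item; 0 for fixed-size types */
};

/* A tp_alloc-style function: zeroed memory, refcount 1, type pointer set.
   Assumes tp_basicsize >= sizeof(toy_object); no GC registration. */
static toy_object *toy_alloc(toy_type *type, int nitems)
{
    size_t size;
    toy_object *obj;

    assert(type != NULL && nitems >= 0);
    size = type->tp_basicsize + (size_t)nitems * type->tp_itemsize;
    obj = calloc(1, size);     /* calloc zero-fills the whole object */
    if (obj == NULL)
        return NULL;
    obj->ob_refcnt = 1;
    obj->ob_type = type;
    return obj;
}
```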
@@ -294,8 +300,13 @@ Making a type a factory for its instances
function of the base class must call the tp_alloc() slot of the
type passed in as its first argument. It is the tp_new()
function's responsibility to calculate the number of items. The
tp_alloc() slot will set the ob_size member of the new object if
the type->tp_itemsize member is nonzero.
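A hedged sketch of an inheritable tp_new(), again with hypothetical toy structures. The base type's new function calls the tp_alloc slot of whatever type it receives, so a subtype with a larger tp_basicsize gets a correctly sized object, and ob_size is filled in only for variable-size types. For brevity, the item count is passed in directly instead of being computed from constructor arguments:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of a variable-size object; not the Python API. */
typedef struct toy_type toy_type;

typedef struct {
    long ob_refcnt;
    toy_type *ob_type;
    long ob_size;                  /* number of items */
} toy_varobject;

struct toy_type {
    size_t tp_basicsize;
    size_t tp_itemsize;
    toy_varobject *(*tp_alloc)(toy_type *type, int nitems);
};

static toy_varobject *toy_generic_alloc(toy_type *type, int nitems)
{
    size_t size = type->tp_basicsize + (size_t)nitems * type->tp_itemsize;
    toy_varobject *obj = calloc(1, size);
    if (obj == NULL)
        return NULL;
    obj->ob_refcnt = 1;
    obj->ob_type = type;
    if (type->tp_itemsize != 0)    /* ob_size only for variable-size types */
        obj->ob_size = nitems;
    return obj;
}

/* Base type's tp_new: calls the tp_alloc slot of the type it was GIVEN,
   so a subtype passed in here gets its own allocation. */
static toy_varobject *toy_base_new(toy_type *type, int nitems)
{
    assert(nitems >= 0);
    return type->tp_alloc(type, nitems);
}
```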
(Note: in certain debugging compilation modes, the type structure
used to have members named tp_alloc and tp_free already: counters
for the number of allocations and deallocations. These have been
renamed to tp_allocs and tp_deallocs.)
XXX The keyword arguments are currently not passed to tp_new();
its kwds argument is always NULL. This is a relic from a previous
@@ -304,189 +315,99 @@ Making a type a factory for its instances
should check that the arguments are acceptable, because they may
be called independently.
Standard implementations for tp_alloc() and tp_new() are
available. PyType_GenericAlloc() allocates an object from the
standard heap and initializes it properly. It uses the above
formula to determine the amount of memory to allocate, and takes
care of GC registration. The only reason not to use this
implementation would be to allocate objects from a different heap
(as is done by some very small frequently used objects like ints
and tuples). PyType_GenericNew() adds very little: it just calls
the type's tp_alloc() slot with zero for nitems. But for mutable
types that do all their initialization in their tp_init() slot,
this may be just the ticket.
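In the same hypothetical toy model, generic versions of the two slots might look like this. toy_generic_alloc() and toy_generic_new() only mimic the roles of PyType_GenericAlloc() and PyType_GenericNew(), and they do no GC registration:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the generic slots; hypothetical, not CPython code. */
typedef struct toy_type toy_type;
typedef struct { long ob_refcnt; toy_type *ob_type; } toy_object;

struct toy_type {
    size_t tp_basicsize;
    size_t tp_itemsize;
    toy_object *(*tp_alloc)(toy_type *type, int nitems);
};

/* Generic allocator: standard heap (calloc), zeroed, header initialized. */
static toy_object *toy_generic_alloc(toy_type *type, int nitems)
{
    toy_object *obj = calloc(1, type->tp_basicsize
                                + (size_t)nitems * type->tp_itemsize);
    if (obj != NULL) {
        obj->ob_refcnt = 1;
        obj->ob_type = type;
    }
    return obj;
}

/* Generic tp_new: adds almost nothing, just tp_alloc with nitems == 0.
   Suits mutable types whose real setup happens in tp_init. */
static toy_object *toy_generic_new(toy_type *type)
{
    assert(type->tp_alloc != NULL);
    return type->tp_alloc(type, 0);
}
```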
Preparing a type for subtyping
The idea behind subtyping is very similar to that of single
inheritance in C++. A base type is described by a structure
declaration (similar to the C++ class declaration) plus a type
object (similar to the C++ vtable). A derived type can extend the
structure (but must leave the names, order and type of the members
of the base structure unchanged) and can override certain slots in
the type object, leaving others the same. (Unlike C++ vtables,
all Python type objects have the same memory layout.)
The base type must do the following:
- Add the flag value Py_TPFLAGS_BASETYPE to tp_flags.
- Declare and use tp_new(), tp_alloc() and optional tp_init() slots.
- Declare and use tp_dealloc() and tp_free().
- Export its object structure declaration.
- Export a subtyping-aware type-checking macro.
The requirements and signatures for tp_new(), tp_alloc() and
tp_init() have already been discussed above: tp_alloc() should
allocate the memory and initialize it to mostly zeros; tp_new()
should call the tp_alloc() slot and then proceed to do the
minimally required initialization; tp_init() should be used for
more extensive initialization of mutable objects.
It should come as no surprise that there are similar conventions
at the end of an object's lifetime. The slots involved are
tp_dealloc() (familiar to all who have ever implemented a Python
extension type) and tp_free(), the new kid on the block. (The
names aren't quite symmetric; tp_free() corresponds to tp_alloc(),
which is fine, but tp_dealloc() corresponds to tp_new(). Maybe
the tp_dealloc slot should be renamed?)
The tp_free() slot should be used to free the memory and
unregister the object with the garbage collection subsystem, and
can be overridden by a derived class; tp_dealloc() should
deinitialize the object (e.g. by calling Py_XDECREF() for various
sub-objects) and then call tp_free() to deallocate the memory.
The signature for tp_dealloc() is the same as it always was:
void tp_dealloc(PyObject *object)
The signature for tp_free() is the same:
void tp_free(PyObject *object)
(In a previous version of this PEP, there was also a role reserved
for the tp_clear() slot. This turned out to be a bad idea.)
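The division of labor between tp_dealloc() and tp_free() can be sketched with hypothetical toy types. toy_xdecref() stands in for Py_XDECREF(), and a counter records tp_free() calls for demonstration only. The container's deallocator drops its owned reference and then frees its memory through the tp_free slot of the object's own type:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model; toy_xdecref() stands in for Py_XDECREF(). */
typedef struct toy_object toy_object;

typedef struct {
    void (*tp_dealloc)(toy_object *self);  /* deinitialize, then tp_free */
    void (*tp_free)(toy_object *self);     /* release the memory only */
} toy_type;

struct toy_object {
    long ob_refcnt;
    toy_type *ob_type;
};

static int toy_free_calls;                 /* demo counter */

static void toy_free(toy_object *self)
{
    toy_free_calls++;
    free(self);
}

static void toy_xdecref(toy_object *o)
{
    if (o != NULL && --o->ob_refcnt == 0)
        o->ob_type->tp_dealloc(o);
}

/* A leaf object owns nothing, so it goes straight to tp_free. */
static void toy_leaf_dealloc(toy_object *self)
{
    self->ob_type->tp_free(self);
}

/* A box owns one reference to another object. */
typedef struct {
    toy_object head;                       /* must be the first member */
    toy_object *item;
} toy_box;

static void toy_box_dealloc(toy_object *self)
{
    toy_box *box = (toy_box *)self;
    assert(self->ob_refcnt == 0);
    toy_xdecref(box->item);                /* deinitialize ... */
    box->item = NULL;
    self->ob_type->tp_free(self);          /* ... then free via tp_free */
}
```

A subtype that wants a different heap only has to replace tp_free; the deinitialization code stays shared.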
In order to be usefully subtyped in C, a type must export the
structure declaration for its instances through a header file, as
it is needed in order to derive a subtype. The type object for
the base type must also be exported.
If the base type has a type-checking macro (like PyDict_Check()),
this macro should be made to recognize subtypes. This can be done
by using the new PyObject_TypeCheck(object, type) macro, which
calls a function that follows the base class links.
The PyObject_TypeCheck() macro contains a slight optimization: it
first compares object->ob_type directly to the type argument, and
if this is a match, bypasses the function call. This should make
it fast enough for most situations.
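A sketch of the macro's shape, using hypothetical toy type objects linked through tp_base. TOY_TYPECHECK and toy_is_subtype() are invented names, not the CPython implementation. The exact-type comparison comes first, and only on a mismatch does the macro call the function that walks the base links:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type objects linked through tp_base. */
typedef struct toy_type {
    const char *tp_name;
    const struct toy_type *tp_base;   /* NULL for a root type */
} toy_type;

typedef struct {
    long ob_refcnt;
    const toy_type *ob_type;
} toy_object;

/* Slow path: follow the base class links. */
static int toy_is_subtype(const toy_type *a, const toy_type *b)
{
    assert(b != NULL);
    for (; a != NULL; a = a->tp_base)
        if (a == b)
            return 1;
    return 0;
}

/* Fast path: an exact match avoids the function call entirely. */
#define TOY_TYPECHECK(ob, tp) \
    ((ob)->ob_type == (tp) || toy_is_subtype((ob)->ob_type, (tp)))
```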
Note that this change in the type-checking macro means that C
functions that require an instance of the base type may be invoked
with instances of the derived type. Before enabling subtyping of
a particular type, its code should be checked to make sure that
this won't break anything.
Creating a subtype of a built-in type in C
The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of
some of the problems, and it's acceptable for C code that doesn't
follow the rules to dump core. For added simplicity, it is
limited to single inheritance.
Let's assume we're deriving from a mutable base type whose
tp_itemsize is zero. The subtype code is not GC-aware, although
it may inherit GC-awareness from the base type (this is
@@ -501,85 +422,95 @@ Creating a subtype of a built-in type in C
int state;
} spamlistobject;
Note that the base type structure member (here PyListObject) must
be the first member of the structure; any following members are
additions. Also note that the base type is not referenced via a
pointer; the actual contents of its structure must be included!
(The goal is for the memory layout of the beginning of the
subtype instance to be the same as that of the base type
instance.)
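A compilable sketch of the layout rule, with hypothetical toy_listobject and toy_spamlistobject structs standing in for PyListObject and spamlistobject. Because the base struct is the first member and is embedded by value, a function written for the base type can be handed a pointer to the subtype instance:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyListObject. */
typedef struct {
    long ob_refcnt;
    void *ob_type;
    int ob_size;
} toy_listobject;

/* Base struct embedded by value as the FIRST member. */
typedef struct {
    toy_listobject list;   /* base type part: must come first */
    int state;             /* additions follow */
} toy_spamlistobject;

/* A base-type function that knows nothing about the subtype. */
static int toy_list_size(const toy_listobject *op)
{
    assert(op != NULL);
    return op->ob_size;
}
```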
Next, the derived type must declare a type object and initialize
it. Most of the slots in the type object may be initialized to
zero, which is a signal that the base type slot must be copied
into it. Some slots that must be initialized properly:
- The object header must be filled in as usual; the type should be
&PyType_Type.
- The tp_basicsize slot must be set to the size of the subtype
instance struct (in the above example: sizeof(spamlistobject)).
- The tp_base slot must be set to the address of the base type's
type object.
- If the derived type defines any pointer members, the tp_dealloc
slot function requires special attention, see below; otherwise,
it can be set to zero, to inherit the base type's deallocation
function.
- The tp_flags slot must be set to the usual Py_TPFLAGS_DEFAULT
value.
- The tp_name slot must be set; it is recommended to set tp_doc
as well (these are not inherited).
If the subtype defines no additional structure members (it only
defines new behavior, no new data), the tp_basicsize and the
tp_dealloc slots may be left set to zero.
The subtype's tp_dealloc slot deserves special attention. If the
derived type defines no additional pointer members that need to be
DECREF'ed or freed when the object is deallocated, it can be set
to zero. Otherwise, the subtype's tp_dealloc() function must call
Py_XDECREF() for any PyObject * members and the correct memory
freeing function for any other pointers it owns, and then call the
base class's tp_dealloc() slot. This call has to be made via the
base type's type structure, for example, when deriving from the
standard list type:
PyList_Type.tp_dealloc(self);
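A hedged sketch of such a subtype deallocator, using hypothetical toy structs in which toy_list_type plays the part of PyList_Type. The subtype frees only what it added and then delegates to the base type's tp_dealloc slot, which releases the base members and calls tp_free:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; toy_list_type plays the part of PyList_Type. */
typedef struct toy_object toy_object;

typedef struct {
    void (*tp_dealloc)(toy_object *self);
    void (*tp_free)(toy_object *self);
} toy_type;

struct toy_object {
    long ob_refcnt;
    toy_type *ob_type;
};

typedef struct { toy_object head; int *items; } toy_list;     /* base */
typedef struct { toy_list list; char *name; } toy_spamlist;  /* subtype */

static int toy_list_dealloc_calls;   /* demo counter */

static void toy_plain_free(toy_object *self) { free(self); }

/* Base deallocator: release base members, then tp_free of the real type. */
static void toy_list_dealloc(toy_object *self)
{
    toy_list *op = (toy_list *)self;
    free(op->items);
    op->items = NULL;
    toy_list_dealloc_calls++;
    self->ob_type->tp_free(self);
}

static toy_type toy_list_type = { toy_list_dealloc, toy_plain_free };

/* Subtype deallocator: free what the subtype added, then call the base
   through its type structure. */
static void toy_spamlist_dealloc(toy_object *self)
{
    toy_spamlist *op = (toy_spamlist *)self;
    assert(self != NULL);
    free(op->name);
    op->name = NULL;
    toy_list_type.tp_dealloc(self);
}

static toy_type toy_spamlist_type = { toy_spamlist_dealloc, toy_plain_free };
```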
If the subtype wants to use a different allocation heap than the
base type, the subtype must override both the tp_alloc() and the
tp_free() slots. These will be called by the base class's
tp_new() and tp_dealloc() slots, respectively.
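The mechanism can be sketched with hypothetical toy types and a one-slot static "pool" standing in for a different heap. The base type's new and dealloc functions never name an allocator. They go through the tp_alloc and tp_free slots of the object's actual type, so overriding both slots is enough to move subtype instances to the other heap:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model; the one-slot static pool merely stands in for
   a different heap and is not meant as a real allocator. */
typedef struct toy_type toy_type;
typedef struct { long ob_refcnt; toy_type *ob_type; } toy_object;

struct toy_type {
    size_t tp_basicsize;
    toy_object *(*tp_alloc)(toy_type *type, int nitems);
    void (*tp_free)(toy_object *self);
};

/* Base type's allocator pair: the standard heap. */
static toy_object *toy_heap_alloc(toy_type *type, int nitems)
{
    toy_object *o;
    (void)nitems;                         /* fixed-size objects only */
    o = calloc(1, type->tp_basicsize);
    if (o != NULL) {
        o->ob_refcnt = 1;
        o->ob_type = type;
    }
    return o;
}

static void toy_heap_free(toy_object *self) { free(self); }

/* Subtype's allocator pair: a single preallocated slot. */
static union { toy_object o; char bytes[64]; } toy_pool;
static int toy_pool_in_use;

static toy_object *toy_pool_alloc(toy_type *type, int nitems)
{
    (void)nitems;
    if (toy_pool_in_use || type->tp_basicsize > sizeof toy_pool)
        return NULL;
    toy_pool_in_use = 1;
    memset(&toy_pool, 0, sizeof toy_pool);
    toy_pool.o.ob_refcnt = 1;
    toy_pool.o.ob_type = type;
    return &toy_pool.o;
}

static void toy_pool_free(toy_object *self)
{
    assert(self == &toy_pool.o);
    toy_pool_in_use = 0;
}

/* Base tp_new()/tp_dealloc(): they only use the slots of the actual type. */
static toy_object *toy_base_new(toy_type *type)
{
    return type->tp_alloc(type, 0);
}

static void toy_base_dealloc(toy_object *self)
{
    self->ob_type->tp_free(self);
}
```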
In order to complete the initialization of the type,
PyType_InitDict() must be called. This replaces slots initialized
to zero in the subtype with the value of the corresponding base
type slots. (It also fills in tp_dict, the type's dictionary, and
does various other initializations necessary for type objects.)
A subtype is not usable until PyType_InitDict() is called for it;
this is best done during module initialization, assuming the
subtype belongs to a module. An alternative for subtypes added to
the Python core (which don't live in a particular module) would be
to initialize the subtype in their constructor function. It is
allowed to call PyType_InitDict() more than once; the second and
further calls have no effect. In order to avoid unnecessary
calls, a test for tp_dict==NULL can be made.
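A minimal sketch of the slot-inheritance step, assuming a hypothetical toy type object with only a few slots and an initialized flag in place of the tp_dict==NULL test. The real PyType_InitDict() does considerably more, including setting up tp_dict. Readying the base before copying from it is an assumption of this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type object with just a few slots. */
typedef struct toy_type {
    const char *tp_name;
    struct toy_type *tp_base;
    size_t tp_basicsize;
    int (*tp_len)(void *self);
    long (*tp_hash)(void *self);
    int initialized;                   /* replaces the tp_dict==NULL test */
} toy_type;

static int toy_base_len(void *self) { (void)self; return 3; }
static long toy_base_hash(void *self) { (void)self; return 17; }
static long toy_sub_hash(void *self) { (void)self; return 42; }

/* Sketch of the slot-copying part of PyType_InitDict(). */
static void toy_init_dict(toy_type *type)
{
    toy_type *base;

    assert(type != NULL);
    if (type->initialized)             /* further calls have no effect */
        return;
    base = type->tp_base;
    if (base != NULL) {
        toy_init_dict(base);           /* assumption: ready the base first */
        if (type->tp_basicsize == 0)   /* zero means "inherit" */
            type->tp_basicsize = base->tp_basicsize;
        if (type->tp_len == NULL)
            type->tp_len = base->tp_len;
        if (type->tp_hash == NULL)
            type->tp_hash = base->tp_hash;
    }
    type->initialized = 1;
}
```

Slots the subtype sets itself (tp_hash in the usage below) are left alone; only zero slots are filled from the base.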
(During initialization of the Python interpreter, some types are
actually used before they are initialized. As long as the slots
that are actually needed are initialized, especially tp_dealloc,
this works, but it is fragile and not recommended as a general
practice.)
To create a subtype instance, the subtype's tp_new() slot is
called. This should first call the base type's tp_new() slot and
then initialize the subtype's additional data members. To further
initialize the instance, the tp_init() slot is typically called.
Note that the tp_new() slot should *not* call the tp_init() slot;
this is up to tp_new()'s caller (typically a factory function).
There are circumstances where it is appropriate not to call
tp_init().
If a subtype defines a tp_init() slot, the tp_init() slot should
normally first call the base type's tp_init() slot.
(XXX There should be a paragraph or two about argument passing
here.)
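Since argument passing is still open, here is a sketch with plain int arguments and hypothetical toy_base/toy_sub structs. For brevity, the base's new function takes the instance size instead of a type object. The subtype's new function calls the base's first and then sets its own members without calling the init function. The subtype's init function calls the base's init function first:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy base type and subtype; not the Python C API. */
typedef struct { long ob_refcnt; int size; int filled; } toy_base;
typedef struct { toy_base base; int state; int extra; } toy_sub;

/* Base tp_new: allocate (sized by the caller here) and do minimal setup. */
static toy_base *toy_base_new(size_t basicsize)
{
    toy_base *self = calloc(1, basicsize);
    if (self != NULL)
        self->ob_refcnt = 1;
    return self;
}

static int toy_base_init(toy_base *self, int size)
{
    if (size < 0)
        return -1;                 /* failure */
    self->size = size;
    self->filled = 1;
    return 0;
}

/* Subtype tp_new: base tp_new first, then the subtype's own members.
   It does NOT call tp_init; that is left to the caller. */
static toy_sub *toy_sub_new(void)
{
    toy_sub *self = (toy_sub *)toy_base_new(sizeof(toy_sub));
    if (self != NULL)
        self->state = 1;
    return self;
}

/* Subtype tp_init: base tp_init first, then its own initialization. */
static int toy_sub_init(toy_sub *self, int size, int extra)
{
    assert(self != NULL);
    if (toy_base_init(&self->base, size) < 0)
        return -1;
    self->extra = extra;
    return 0;
}
```

A factory function would call toy_sub_new() and then toy_sub_init(); a caller that must skip initialization (for example when restoring a persistent object) can stop after the first call.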
Subtyping in Python