Another intermediate checkin. Removed a lot of lies about an older
idea for what tp_alloc() should be.
Guido van Rossum 2001-07-10 20:01:52 +00:00
parent 15299026e7
commit 14f1593cc7
1 changed files with 141 additions and 210 deletions


@ -18,11 +18,11 @@ Introduction
Traditionally, types in Python have been created statically, by Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The fields in the type object it with a static initializer. The slots in the type object
describe all aspects of a Python type that are relevant to the describe all aspects of a Python type that are relevant to the
Python interpreter. A few fields contain dimensional information Python interpreter. A few slots contain dimensional information
(like the basic allocation size of instances), others contain (like the basic allocation size of instances), others contain
various flags, but most fields are pointers to functions to various flags, but most slots are pointers to functions to
implement various kinds of behaviors. A NULL pointer means that implement various kinds of behaviors. A NULL pointer means that
the type does not implement the specific behavior; in that case the type does not implement the specific behavior; in that case
the system may provide a default behavior or raise an the system may provide a default behavior or raise an
@ -74,7 +74,7 @@ Introduction
For binary compatibility, a flag bit in the tp_flags slot For binary compatibility, a flag bit in the tp_flags slot
indicates the existence of the various new slots in the type indicates the existence of the various new slots in the type
object introduced below. Types that don't have the object introduced below. Types that don't have the
Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags field are assumed Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags slot are assumed
to have NULL values for all the subtyping slots. (Warning: the to have NULL values for all the subtyping slots. (Warning: the
current implementation prototype is not yet consistent in its current implementation prototype is not yet consistent in its
checking of this flag bit. This should be fixed before the final checking of this flag bit. This should be fixed before the final
@ -251,6 +251,12 @@ Making a type a factory for its instances
dictionary an initial set of keys and values based on the dictionary an initial set of keys and values based on the
arguments passed in. arguments passed in.
Note that for immutable object types, the initialization cannot be
done by the tp_init() slot: this would provide the Python user
with a way to change the initialization. Therefore, immutable
objects typically have an empty tp_init() implementation and do
all their initialization in their tp_new() slot.
You may wonder why the tp_new() slot shouldn't call the tp_init() You may wonder why the tp_new() slot shouldn't call the tp_init()
slot itself. The reason is that in certain circumstances (like slot itself. The reason is that in certain circumstances (like
support for persistent objects), it is important to be able to support for persistent objects), it is important to be able to
@ -273,13 +279,13 @@ Making a type a factory for its instances
There's a third slot related to object creation: tp_alloc(). Its There's a third slot related to object creation: tp_alloc(). Its
responsibility is to allocate the memory for the object, responsibility is to allocate the memory for the object,
initialize the reference count and type pointer field, and initialize the reference count (ob_refcnt) and the type pointer
initialize the rest of the object to all zeros. It should also (ob_type), and initialize the rest of the object to all zeros. It
register the object with the garbage collection subsystem if the should also register the object with the garbage collection
type supports garbage collection. This slot exists so that subsystem if the type supports garbage collection. This slot
derived types can override the memory allocation policy exists so that derived types can override the memory allocation
(e.g. which heap is being used) separately from the initialization policy (e.g. which heap is being used) separately from the
code. The signature is: initialization code. The signature is:
PyObject *tp_alloc(PyTypeObject *type, int nitems) PyObject *tp_alloc(PyTypeObject *type, int nitems)
@ -294,8 +300,13 @@ Making a type a factory for its instances
function of the base class must call the tp_alloc() slot of the function of the base class must call the tp_alloc() slot of the
type passed in as its first argument. It is the tp_new() type passed in as its first argument. It is the tp_new()
function's responsibility to calculate the number of items. The function's responsibility to calculate the number of items. The
tp_alloc() slot will set the ob_size field of the new object if tp_alloc() slot will set the ob_size member of the new object if
the type->tp_itemsize field is nonzero. the type->tp_itemsize member is nonzero.
(Note: in certain debugging compilation modes, the type structure
already had members named tp_alloc and tp_free, which were
counters for the number of allocations and deallocations. These
have been renamed to tp_allocs and tp_deallocs.)
XXX The keyword arguments are currently not passed to tp_new(); XXX The keyword arguments are currently not passed to tp_new();
its kwds argument is always NULL. This is a relic from a previous its kwds argument is always NULL. This is a relic from a previous
@ -304,189 +315,99 @@ Making a type a factory for its instances
should check that the arguments are acceptable, because they may should check that the arguments are acceptable, because they may
be called independently. be called independently.
Standard implementations for tp_alloc() and tp_new() are
available. PyType_GenericAlloc() allocates an object from the
standard heap and initializes it properly. It uses the above
formula to determine the amount of memory to allocate, and takes
care of GC registration. The only reason not to use this
implementation would be to allocate objects from a different heap
(as is done by some very small frequently used objects like ints
and tuples). PyType_GenericNew() adds very little: it just calls
the type's tp_alloc() slot with zero for nitems. But for mutable
types that do all their initialization in their tp_init() slot,
this may be just the ticket.
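The division of labor between the two generic functions can be sketched in plain C. This is a simplified model, not the CPython implementation: the model_* names and struct layouts are hypothetical stand-ins for PyObject and PyTypeObject, the size formula is assumed to be tp_basicsize + nitems*tp_itemsize rounded up to a multiple of the pointer size, and GC registration is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for PyObject and PyTypeObject. */
typedef struct model_type model_type;

typedef struct {
    int ob_refcnt;
    model_type *ob_type;
} model_object;

typedef struct {
    model_object head;
    int ob_size;                /* only used when tp_itemsize != 0 */
} model_varobject;

struct model_type {
    size_t tp_basicsize;        /* size of the fixed part of an instance */
    size_t tp_itemsize;         /* size of one variable-length item, or 0 */
    model_object *(*tp_alloc)(model_type *type, int nitems);
};

/* Round n up to the next multiple of the pointer size. */
static size_t round_up(size_t n)
{
    size_t a = sizeof(void *);
    return (n + a - 1) / a * a;
}

/* Model of a generic tp_alloc(): standard heap, all zeros, header set. */
static model_object *model_generic_alloc(model_type *type, int nitems)
{
    assert(nitems >= 0);
    size_t size = round_up(type->tp_basicsize
                           + (size_t)nitems * type->tp_itemsize);
    model_object *obj = calloc(1, size);    /* zero-filled memory */
    if (obj == NULL)
        return NULL;
    obj->ob_refcnt = 1;
    obj->ob_type = type;
    if (type->tp_itemsize != 0)
        ((model_varobject *)obj)->ob_size = nitems;
    return obj;
}

/* Model of a generic tp_new(): call the type's own tp_alloc() slot
   with zero items and leave everything else to tp_init(). */
static model_object *model_generic_new(model_type *type)
{
    return type->tp_alloc(type, 0);
}
```

Note that model_generic_new() goes through type->tp_alloc rather than calling model_generic_alloc() directly; that indirection is what lets a derived type swap in a different heap.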
Requirements for a type to allow subtyping
The simplest form of subtyping is subtyping in C. It is the Preparing a type for subtyping
simplest form because we can require the C code to be aware of the
various problems, and it's acceptable for C code that doesn't
follow the rules to dump core. For added simplicity, it is
limited to single inheritance.
The idea behind subtyping is very similar to that of single The idea behind subtyping is very similar to that of single
inheritance in C++. A base type is described by a structure inheritance in C++. A base type is described by a structure
declaration plus a type object. A derived type can extend the declaration (similar to the C++ class declaration) plus a type
structure (but must leave the names, order and type of the fields object (similar to the C++ vtable). A derived type can extend the
structure (but must leave the names, order and type of the members
of the base structure unchanged) and can override certain slots in of the base structure unchanged) and can override certain slots in
the type object, leaving others the same. the type object, leaving others the same. (Unlike C++ vtables,
all Python type objects have the same memory layout.)
Most issues have to do with construction and destruction of The base type must do the following:
instances of derived types.
Creation of a new object is separated into allocation and - Add the flag value Py_TPFLAGS_BASETYPE to tp_flags.
initialization: allocation allocates the memory, and - Declare and use tp_new(), tp_alloc() and optional tp_init() slots.
initialization fill it with appropriate initial values. The - Declare and use tp_dealloc() and tp_free().
separation is needed for the convenience of subtypes. - Export its object structure declaration.
Instantiation of a subtype goes as follows: - Export a subtyping-aware type-checking macro.
1. allocate memory for the whole (subtype) instance The requirements and signatures for tp_new(), tp_alloc() and
2. initialize the base type tp_init() have already been discussed above: tp_alloc() should
3. initialize the subtype's instance variables allocate the memory and initialize it to mostly zeros; tp_new()
should call the tp_alloc() slot and then proceed to do the
minimally required initialization; tp_init() should be used for
more extensive initialization of mutable objects.
If allocation and initialization were done by the same function, It should come as no surprise that there are similar conventions
you would need a way to tell the base type's constructor to at the end of an object's lifetime. The slots involved are
allocate additional memory for the subtype's instance variables, tp_dealloc() (familiar to all who have ever implemented a Python
and there would be no way to change the allocation method for a extension type) and tp_free(), the new kid on the block. (The
subtype (without giving up on calling the base type to initialize names aren't quite symmetric; tp_free() corresponds to tp_alloc(),
its part of the instance structure). which is fine, but tp_dealloc() corresponds to tp_new(). Maybe
the tp_dealloc slot should be renamed?)
A similar reasoning applies to destruction: if a subtype changes The tp_free() slot should be used to free the memory and
the instance allocator (for example to use a different heap), it unregister the object with the garbage collection subsystem, and
must also change the instance deallocator; but it must still call can be overridden by a derived class; tp_dealloc() should
on the base type's destructor to DECREF the base type's instance deinitialize the object (e.g. by calling Py_XDECREF() for various
variables. sub-objects) and then call tp_free() to deallocate the memory.
The signature for tp_dealloc() is the same as it always was:
In this proposal, I assign stricter meanings to two existing void tp_dealloc(PyObject *object)
slots for deallocation and deinitialization, and I add two new
slots for allocation and initialization.
The tp_clear slot gets the new task of deinitializing an object so The signature for tp_free() is the same:
that all that remains to be done is free its memory. Originally,
all it had to do was clear object references. The difference is
subtle: the list and dictionary objects contain references to an
additional heap-allocated piece of memory that isn't freed by
tp_clear in Python 2.1, but which must be freed by tp_clear under
this proposal. It should be safe to call tp_clear repeatedly on
the same object. If an object contains no references to other
objects or heap-allocated memory, the tp_clear slot may be NULL.
The only additional requirement for the tp_dealloc slot is that it void tp_free(PyObject *object)
should do the right thing whether or not tp_clear has been called.
The new slots are tp_alloc for allocation and tp_init for (In a previous version of this PEP, there was also a role reserved
initialization. Their signatures: for the tp_clear() slot. This turned out to be a bad idea.)
PyObject *tp_alloc(PyTypeObject *type, In order to be usefully subtyped in C, a type must export the
PyObject *args,
PyObject *kwds)
int tp_init(PyObject *self,
PyObject *args,
PyObject *kwds)
[XXX We'll have to rename tp_alloc to something else, because in
debug mode there's already a tp_alloc field.]
The arguments for tp_alloc are the same as for tp_new, described
above. The arguments for tp_init are the same except that the
first argument is replaced with the instance to be initialized.
Its return value is 0 for success or -1 for failure.
It is possible that tp_init is called more than once or not at
all. The implementation should allow this usage. The object may
be non-functional until tp_init is called, and a second call to
tp_init may raise an exception, but it should not be possible to
cause a core dump or memory leakage this way.
Because tp_init is in a sense optional, tp_alloc is required to do
*some* initialization of the object. It must initialize ob_refcnt
to 1 and ob_type to its type argument. It should zero out the
rest of the object.
The constructor arguments are passed to tp_alloc so that for
variable-size objects (like tuples and strings) it knows to
allocate the right amount of memory.
For immutable types, tp_alloc may have to do the full
initialization; otherwise, different calls to tp_init might cause
an immutable object to be modified, which is considered a grave
offense in Python (unlike in Fortran :-).
Not every type can serve as a base type. The assumption is made
that if a type has a non-NULL value in its tp_init slot, it is
ready to be subclassed; otherwise, it is not, and using it as a
base class will raise an exception.
In order to be usefully subtyped in C, a type must also export the
structure declaration for its instances through a header file, as structure declaration for its instances through a header file, as
it is needed in order to derive a subtype. The type object for it is needed in order to derive a subtype. The type object for
the base type must also be exported. the base type must also be exported.
If the base type has a type-checking macro (like PyDict_Check()), If the base type has a type-checking macro (like PyDict_Check()),
this macro probably should be changed to recognize subtypes. This this macro should be made to recognize subtypes. This can be done
can be done by using the new PyObject_TypeCheck(object, type) by using the new PyObject_TypeCheck(object, type) macro, which
macro, which calls a function that follows the base class links. calls a function that follows the base class links.
(An argument against changing the type-checking macro could be The PyObject_TypeCheck() macro contains a slight optimization: it
that the type check is used frequently and a function call would first compares object->ob_type directly to the type argument, and
slow things down too much, but I find this hard to believe. One if this is a match, bypasses the function call. This should make
could also fear that a subtype might break an invariant assumed by it fast enough for most situations.
the support functions of the base type. Usually it is best to
change the base type to remove this reliance, at least to the
point of raising an exception rather than dumping core when the
invariant is broken.)
Here are the interactions between tp_alloc, tp_clear, tp_dealloc Note that this change in the type-checking macro means that C
and subtypes; all assuming that the base type defines tp_init functions that require an instance of the base type may be invoked
(otherwise it cannot be subtyped anyway): with instances of the derived type. Before enabling subtyping of
a particular type, its code should be checked to make sure that
- If the base type's allocation scheme doesn't use the standard this won't break anything.
heap, it should not define tp_alloc. This is a signal for the
subclass to provide its own tp_alloc *and* tp_dealloc
implementation (probably using the standard heap).
- If the base type's tp_dealloc does anything besides calling
PyObject_DEL() (typically, calling Py_XDECREF() on contained
objects or freeing dependent memory blocks), it should define a
tp_clear that does the same without calling PyObject_DEL(), and
which checks for zero pointers before and zeros the pointers
afterwards, so that calling tp_clear more than once or calling
tp_dealloc after tp_clear will not attempt to DECREF or free the
same object/memory twice. (It should also be allowed to
continue using the object after tp_clear -- tp_clear should
simply reset the object to its pristine state.)
- If the derived type overrides tp_alloc, it should also override
tp_dealloc, and tp_dealloc should call the derived type's
tp_clear if non-NULL (or its own tp_clear).
- If the derived type overrides tp_clear, it should call the base
type's tp_clear if non-NULL.
- If the base type defines tp_init as well as tp_new, its tp_new
should be inheritable: it should call the tp_alloc and the
tp_init of the type passed in as its first argument.
- If the base type defines tp_init as well as tp_alloc, its
tp_alloc should be inheritable: it should look in the
tp_basicsize slot of the type passed in for the amount of memory
to allocate, and it should initialize all allocated bytes to
zero.
- For types whose tp_itemsize is nonzero, the allocation size used
in tp_alloc should be tp_basicsize + n*tp_itemsize, rounded up
to the next integral multiple of sizeof(PyObject *), where n is
the number of items determined by the arguments to tp_alloc.
- Things are further complicated by the garbage collection API.
This affects tp_basicsize, and the actions to be taken by
tp_alloc. tp_alloc should look at the Py_TPFLAGS_GC flag bit in
the tp_flags field of the type passed in, and not assume that
this is the same as the corresponding bit in the base type. (In
part, the GC API is at fault; Neil Schemenauer has a patch that
fixes the API, but it is currently backwards incompatible.)
Note: the rules here are very complicated -- probably too
complicated. It may be better to give up on subtyping immutable
types, types with custom allocators, and types with variable size
allocation (such as int, string and tuple) -- then the rules can
be much simplified because you can assume allocation on the
standard heap, no requirement beyond zeroing memory in tp_alloc,
and no variable length allocation.
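The subtype-aware type check described above (an exact comparison first, then a walk along the base class links) might look roughly like this. It is only a model: the model_* names and the MODEL_TYPE_CHECK macro are hypothetical and are not the real PyObject_TypeCheck() definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for PyTypeObject and PyObject. */
typedef struct model_type {
    const char *tp_name;
    struct model_type *tp_base;     /* NULL for a type without a base */
} model_type;

typedef struct {
    int ob_refcnt;
    model_type *ob_type;
} model_object;

/* Slow path: follow the chain of base types until a match is found. */
static int model_is_subtype(const model_type *a, const model_type *b)
{
    assert(b != NULL);
    for (; a != NULL; a = a->tp_base)
        if (a == b)
            return 1;
    return 0;
}

/* Fast path first: an exact type match bypasses the function call. */
#define MODEL_TYPE_CHECK(ob, tp) \
    ((ob)->ob_type == (tp) || model_is_subtype((ob)->ob_type, (tp)))
```

With such a macro, a check written for the base type also accepts instances of any type derived from it, which is exactly why the base type's C functions need to be reviewed before subtyping is enabled.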
Creating a subtype of a built-in type in C Creating a subtype of a built-in type in C
The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of
some of the problems, and it's acceptable for C code that doesn't
follow the rules to dump core. For added simplicity, it is
limited to single inheritance.
Let's assume we're deriving from a mutable base type whose Let's assume we're deriving from a mutable base type whose
tp_itemsize is zero. The subtype code is not GC-aware, although tp_itemsize is zero. The subtype code is not GC-aware, although
it may inherit GC-awareness from the base type (this is it may inherit GC-awareness from the base type (this is
@ -501,85 +422,95 @@ Creating a subtype of a built-in type in C
int state; int state;
} spamlistobject; } spamlistobject;
Note that the base type structure field (here PyListObject) must Note that the base type structure member (here PyListObject) must
be the first field in the structure; any following fields are be the first member of the structure; any following members are
extension fields. Also note that the base type is not referenced additions. Also note that the base type is not referenced via a
via a pointer; the actual contents of its structure must be pointer; the actual contents of its structure must be included!
included! (The goal is for the memory layout of the beginning of (The goal is for the memory layout of the beginning of the
the subtype instance to be the same as that of the base type subtype instance to be the same as that of the base type
instance.) instance.)
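The layout requirement can be checked with a small stand-alone sketch. The structs below are hypothetical stand-ins (model_listobject is not the real PyListObject); the point is only that embedding the base struct by value as the first member makes a pointer to the subtype instance usable wherever a pointer to the base instance is expected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the base type's instance structure. */
typedef struct {
    int ob_refcnt;
    void *ob_type;
    int ob_size;
    void **ob_item;
} model_listobject;

/* The subtype embeds the base structure by value, as its first member;
   any following members are additions. */
typedef struct {
    model_listobject list;
    int state;
} model_spamlistobject;

/* Base-type code only ever looks at the leading model_listobject part. */
static int model_list_size(const model_listobject *self)
{
    assert(self != NULL);
    return self->ob_size;
}
```

Because C guarantees that a pointer to a structure also points to its first member, offsetof(model_spamlistobject, list) is zero and the cast from model_spamlistobject * to model_listobject * is safe.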
Next, the derived type must declare a type object and initialize Next, the derived type must declare a type object and initialize
it. Most of the slots in the type object may be initialized to it. Most of the slots in the type object may be initialized to
zero, which is a signal that the base type slot must be copied zero, which is a signal that the base type slot must be copied
into it. Some fields that must be initialized properly: into it. Some slots that must be initialized properly:
- The object header must be filled in as usual; the type should be - The object header must be filled in as usual; the type should be
&PyType_Type. &PyType_Type.
- The tp_basicsize field must be set to the size of the subtype - The tp_basicsize slot must be set to the size of the subtype
instance struct (in the above example: sizeof(spamlistobject)). instance struct (in the above example: sizeof(spamlistobject)).
- The tp_base field must be set to the address of the base type's - The tp_base slot must be set to the address of the base type's
type object. type object.
- If the derived type defines any pointer fields, the tp_dealloc - If the derived type defines any pointer members, the tp_dealloc
slot function requires special attention, see below; otherwise, slot function requires special attention, see below; otherwise,
it can be set to zero, to inherit the base type's deallocation it can be set to zero, to inherit the base type's deallocation
function. function.
- The tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT - The tp_flags slot must be set to the usual Py_TPFLAGS_DEFAULT
value. value.
- The tp_name field must be set; it is recommended to set tp_doc - The tp_name slot must be set; it is recommended to set tp_doc
as well (these are not inherited). as well (these are not inherited).
Exception: if the subtype defines no additional fields in its If the subtype defines no additional structure members (it only
structure (it only defines new behavior, no new data), the defines new behavior, no new data), the tp_basicsize and the
tp_basicsize and the tp_dealloc fields may be set to zero. tp_dealloc slots may be left set to zero.
In order to complete the initialization of the type,
PyType_InitDict() must be called. This replaces zero slots in the
subtype with the value of the corresponding base type slots. (It
also fills in tp_dict, the type's dictionary, and does various
other initializations necessary for type objects.)
The subtype's tp_dealloc slot deserves special attention. If the The subtype's tp_dealloc slot deserves special attention. If the
derived type defines no additional pointers that need to be derived type defines no additional pointer members that need to be
DECREF'ed or freed when the object is deallocated, it can be set DECREF'ed or freed when the object is deallocated, it can be set
to zero. Otherwise, the subtype's deallocation function must call to zero. Otherwise, the subtype's tp_dealloc() function must call
Py_XDECREF() for any PyObject * fields and the correct memory Py_XDECREF() for any PyObject * members and the correct memory
freeing function for any other pointers it owns, and then call the freeing function for any other pointers it owns, and then call the
base class's tp_dealloc slot. Because deallocation functions base class's tp_dealloc() slot. This call has to be made via the
typically are not exported, this call has to be made via the base base type's type structure, for example, when deriving from the
type's type structure, for example, when deriving from the
standard list type: standard list type:
PyList_Type.tp_dealloc(self); PyList_Type.tp_dealloc(self);
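A self-contained model of this deallocation chain is sketched below. All names are hypothetical and the structs are simplified stand-ins for the real ones. The sketch assumes the convention described earlier: each tp_dealloc() releases what its own level owns, the subtype then calls the base type's tp_dealloc() through the base type object, and the base finally calls the tp_free() slot of the instance's actual type.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct model_type model_type;
typedef struct { int ob_refcnt; model_type *ob_type; } model_object;

struct model_type {
    model_type *tp_base;
    void (*tp_dealloc)(model_object *self);
    void (*tp_free)(model_object *self);
};

typedef struct { model_object head; char *items; } model_list;   /* base */
typedef struct { model_list list; char *note; } model_spamlist;  /* subtype */

static int model_frees;     /* counts tp_free() calls, for illustration */

static void model_heap_free(model_object *self)
{
    model_frees++;
    free(self);
}

/* Base tp_dealloc(): release owned data, then free the memory through
   the tp_free() slot of the instance's own type, not the base's. */
static void model_list_dealloc(model_object *self)
{
    assert(self != NULL);
    free(((model_list *)self)->items);
    self->ob_type->tp_free(self);
}

static model_type model_list_type = {
    NULL, model_list_dealloc, model_heap_free
};

/* Subtype tp_dealloc(): release the additional members first, then
   call the base tp_dealloc() via the base type's type structure. */
static void model_spamlist_dealloc(model_object *self)
{
    assert(self != NULL);
    free(((model_spamlist *)self)->note);
    model_list_type.tp_dealloc(self);
}

static model_type model_spamlist_type = {
    &model_list_type, model_spamlist_dealloc, model_heap_free
};
```

A subtype that wanted a different heap would replace model_heap_free in its own type object (together with the matching allocation function); neither tp_dealloc() function would have to change.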
(If the subtype uses a different allocation heap than the base If the subtype wants to use a different allocation heap than the
type, the subtype must call the base type's tp_clear() slot base type, the subtype must override both the tp_alloc() and the
instead, followed by a call to free the object's memory from the tp_free() slots. These will be called by the base class's
appropriate heap, such as PyObject_DEL(self) if the subtype uses tp_new() and tp_dealloc() slots, respectively.
the standard heap. But in this case subtyping is not
recommended.) In order to complete the initialization of the type,
PyType_InitDict() must be called. This replaces slots initialized
to zero in the subtype with the value of the corresponding base
type slots. (It also fills in tp_dict, the type's dictionary, and
does various other initializations necessary for type objects.)
A subtype is not usable until PyType_InitDict() is called for it; A subtype is not usable until PyType_InitDict() is called for it;
this is best done during module initialization, assuming the this is best done during module initialization, assuming the
subtype belongs to a module. An alternative for subtypes added to subtype belongs to a module. An alternative for subtypes added to
the Python core (which don't live in a particular module) would be the Python core (which don't live in a particular module) would be
to initialize the subtype in their constructor function. It is to initialize the subtype in their constructor function. It is
allowed to call PyType_InitDict() more than once, the second and allowed to call PyType_InitDict() more than once; the second and
further calls have no effect. In order to avoid unnecessary further calls have no effect. In order to avoid unnecessary
calls, a test for tp_dict==NULL can be made. calls, a test for tp_dict==NULL can be made.
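The slot-filling behavior described here can be modeled in a few lines. This is a sketch under stated assumptions, not the actual PyType_InitDict(): the model_* names are hypothetical, only two slots are shown, the base type is assumed to be initialized already, and a dummy marker stands in for the real type dictionary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for PyTypeObject. */
typedef struct model_type {
    struct model_type *tp_base;
    size_t tp_basicsize;
    void (*tp_dealloc)(void *self);
    void *tp_dict;              /* NULL until the type is initialized */
} model_type;

static int model_dict_marker;   /* stands in for a real dictionary */

static void model_base_dealloc(void *self)
{
    (void)self;                 /* nothing to release in this model */
}

/* Replace slots left at zero with the base type's values, then mark
   the type as initialized.  Second and later calls have no effect. */
static int model_type_init(model_type *type)
{
    assert(type != NULL);
    if (type->tp_dict != NULL)
        return 0;
    if (type->tp_base != NULL) {
        const model_type *base = type->tp_base;
        if (type->tp_basicsize == 0)
            type->tp_basicsize = base->tp_basicsize;
        if (type->tp_dealloc == NULL)
            type->tp_dealloc = base->tp_dealloc;
    }
    type->tp_dict = &model_dict_marker;
    return 0;
}
```

The tp_dict test at the top is the same check a caller can make to skip the call entirely.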
To create a subtype instance, the base type's tp_alloc slot must (During initialization of the Python interpreter, some types are
be called with the subtype as its first argument. Then, if the actually used before they are initialized. As long as the slots
base type has a tp_init slot, that must be called to initialize that are actually needed are initialized, especially tp_dealloc,
the base portion of the instance; finally the subtype's own fields this works, but it is fragile and not recommended as a general
must be initialized. After allocation, the initialization can practice.)
also be done by calling the subtype's tp_init slot, assuming this
correctly calls its base type's tp_init slot. To create a subtype instance, the subtype's tp_new() slot is
called. This should first call the base type's tp_new() slot and
then initialize the subtype's additional data members. To further
initialize the instance, the tp_init() slot is typically called.
Note that the tp_new() slot should *not* call the tp_init() slot;
this is up to tp_new()'s caller (typically a factory function).
There are circumstances where it is appropriate not to call
tp_init().
If a subtype defines a tp_init() slot, the tp_init() slot should
normally first call the base type's tp_init() slot.
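The creation sequence can be sketched as follows. This is a model with hypothetical names and a reduced signature (a single int instead of args and kwds), not the real slot signatures. It shows the subtype's tp_new() calling the base tp_new() and then setting its own members, tp_new() not calling tp_init(), and a factory function calling both.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct model_type model_type;
typedef struct { int ob_refcnt; model_type *ob_type; } model_object;

struct model_type {
    size_t tp_basicsize;
    model_object *(*tp_new)(model_type *type);
    int (*tp_init)(model_object *self, int arg);
};

typedef struct { model_object head; int length; } model_list;   /* base */
typedef struct { model_list list; int state; } model_spamlist;  /* subtype */

/* Base tp_new(): allocate according to the type passed in, so that a
   subtype instance gets room for its additional members. */
static model_object *model_list_new(model_type *type)
{
    assert(type->tp_basicsize >= sizeof(model_list));
    model_object *self = calloc(1, type->tp_basicsize);
    if (self == NULL)
        return NULL;
    self->ob_refcnt = 1;
    self->ob_type = type;
    return self;
}

static int model_list_init(model_object *self, int arg)
{
    ((model_list *)self)->length = arg;
    return 0;
}

/* Subtype tp_new(): base tp_new() first, then the subtype's members.
   It does not call tp_init(). */
static model_object *model_spamlist_new(model_type *type)
{
    model_object *self = model_list_new(type);
    if (self != NULL)
        ((model_spamlist *)self)->state = -1;
    return self;
}

/* Subtype tp_init(): normally calls the base tp_init() first. */
static int model_spamlist_init(model_object *self, int arg)
{
    if (model_list_init(self, arg) < 0)
        return -1;
    ((model_spamlist *)self)->state = 0;
    return 0;
}

static model_type model_spamlist_type = {
    sizeof(model_spamlist), model_spamlist_new, model_spamlist_init
};

/* Factory function: the caller of tp_new() decides about tp_init(). */
static model_object *model_call(model_type *type, int arg)
{
    model_object *self = type->tp_new(type);
    if (self == NULL)
        return NULL;
    if (type->tp_init != NULL && type->tp_init(self, arg) < 0) {
        free(self);
        return NULL;
    }
    return self;
}
```

Code that must create an object without running tp_init() (the persistent-object case mentioned earlier) would call the tp_new() slot directly instead of going through model_call().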
(XXX There should be a paragraph or two about argument passing
here.)
Subtyping in Python Subtyping in Python