Intermediate checkin (documented tp_new, tp_init, tp_alloc properly).

Guido van Rossum 2001-07-10 17:11:19 +00:00
parent a921257c6a
commit 15299026e7
1 changed files with 162 additions and 60 deletions


@ -21,7 +21,7 @@ Introduction
it with a static initializer. The fields in the type object
describe all aspects of a Python type that are relevant to the
Python interpreter. A few fields contain dimensional information
(e.g. the basic allocation size of instances), others contain
(like the basic allocation size of instances), others contain
various flags, but most fields are pointers to functions to
implement various kinds of behaviors. A NULL pointer means that
the type does not implement the specific behavior; in that case
@ -39,34 +39,37 @@ Introduction
This PEP will introduce the following features:
- a type can be a factory function for its instances
- a type can be a factory function for its instances
- types can be subtyped in C
- types can be subtyped in C
- types can be subtyped in Python with the class statement
- types can be subtyped in Python with the class statement
- multiple inheritance from types is supported (insofar as
practical)
- multiple inheritance from types is supported (insofar as
practical -- you still can't multiply inherit from list and
dictionary)
- the standard coercions functions (int, tuple, str etc.) will be
redefined to be the corresponding type objects, which serve as
their own factory functions
- the standard coercion functions (int, tuple, str etc.) will
be redefined to be the corresponding type objects, which serve
as their own factory functions
- there will be a standard type hierarchy
- a class statement can contain a __metaclass__ declaration,
specifying the metaclass to be used to create the new class
- a class statement can contain a metaclass declaration,
specifying the metaclass to be used to create the new class
- a class statement can contain a __slots__ declaration,
specifying the specific names of the instance variables
supported
- a class statement can contain a slots declaration, specifying
the specific names of the instance variables supported
- there will be a standard type hierarchy (maybe)
This PEP builds on PEP 252, which adds standard introspection to
types; e.g., when the type object defines the tp_hash slot, the
type object has a __hash__ method. PEP 252 also adds a
dictionary to type objects which contains all methods. At the
Python level, this dictionary is read-only for built-in types; at
the C level, it is accessible directly (but it should not be
modified except as part of initialization).
types; for example, when a particular type object initializes the
tp_hash slot, that type object has a __hash__ method when
introspected. PEP 252 also adds a dictionary to type objects
which contains all methods. At the Python level, this dictionary
is read-only for built-in types; at the C level, it is accessible
directly (but it should not be modified except as part of
initialization).
For binary compatibility, a flag bit in the tp_flags slot
indicates the existence of the various new slots in the type
@ -79,14 +82,14 @@ Introduction
In current Python, a distinction is made between types and
classes. This PEP together with PEP 254 will remove that
distinction. However, for backwards compatibility there will
probably remain a bit of a distinction for years to come, and
without PEP 254, the distinction is still large: types ultimately
have a built-in type as a base class, while classes ultimately
derive from a user-defined class. Therefore, in the rest of this
PEP, I will use the word type whenever I can -- including base
type or supertype, derived type or subtype, and metatype.
However, sometimes the terminology necessarily blends, e.g. an
distinction. However, for backwards compatibility the distinction
will probably remain for years to come, and without PEP 254, the
distinction is still large: types ultimately have a built-in type
as a base class, while classes ultimately derive from a
user-defined class. Therefore, in the rest of this PEP, I will
use the word type whenever I can -- including base type or
supertype, derived type or subtype, and metatype. However,
sometimes the terminology necessarily blends; for example, an
object's type is given by its __class__ attribute, and subtyping
in Python is spelled with a class statement. If further
distinction is necessary, user-defined classes can be referred to
@ -95,9 +98,9 @@ Introduction
About metatypes
Inevitably the following discussion will come to mention metatypes
(or metaclasses). Metatypes are nothing new in Python: Python has
always been able to talk about the type of a type:
Inevitably the discussion comes to metatypes (or metaclasses).
Metatypes are nothing new in Python: Python has always been able
to talk about the type of a type:
>>> a = 0
>>> type(a)
@ -113,16 +116,19 @@ About metatypes
(PyType_Type, which is also its own metatype), this is not a
requirement, and in fact a useful and relevant 3rd party extension
(ExtensionClasses by Jim Fulton) creates an additional metatype.
The type of classic classes, known as types.ClassType, can also be
considered a distinct metatype.
A related feature is the "Don Beaudry hook", which says that if a
metatype is callable, its instances (which are regular types) can
be subclassed (really subtyped) using a Python class statement.
I will use this rule to support subtyping of built-in types, and
in fact it greatly simplifies the logic of class creation to
always simply call the metatype. When no base class is specified,
a default metatype is called -- the default metatype is the
"ClassType" object, so the class statement will behave as before
in the normal case.
A feature closely connected to metatypes is the "Don Beaudry
hook", which says that if a metatype is callable, its instances
(which are regular types) can be subclassed (really subtyped)
using a Python class statement. I will use this rule to support
subtyping of built-in types, and in fact it greatly simplifies the
logic of class creation to always simply call the metatype. When
no base class is specified, a default metatype is called -- the
default metatype is the "ClassType" object, so the class statement
will behave as before in the normal case. (This default can be
changed per module by setting the global variable __metaclass__.)
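As a rough illustration of the "always simply call the metatype" rule, here is a minimal sketch. It assumes a current Python 3 interpreter, where classic classes and ClassType no longer exist, the default metatype is type, and a metatype is selected with the metaclass keyword rather than a __metaclass__ variable. The names Meta, Base and Derived are hypothetical.

```python
# Hypothetical metatype that records the name of every type it creates.
created = []

class Meta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        created.append(name)
        return cls

# Explicit choice of metatype (Python 3 spelling).
class Base(metaclass=Meta):
    pass

# No metatype named here: the class statement takes it from the base
# class and calls it, which is the Don Beaudry hook in action.
class Derived(Base):
    pass

# Every type has a type of its own; type is its own metatype.
metatype_of_int = type(int)
metatype_of_type = type(type)
```

Both class statements end up calling Meta, which is why the second one needs no declaration of its own.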
Python uses the concept of metatypes or metaclasses in a different
way than Smalltalk. In Smalltalk-80, there is a hierarchy of
@ -138,11 +144,11 @@ About metatypes
metatypes are typically written in C, and may be shared between
many regular types. (It will be possible to subtype metatypes in
Python, so it won't be absolutely necessary to write C in order to
use metatypes; but the power of Python metatypes will be limited,
e.g. Python code will never be allowed to allocate raw memory and
initialize it at will.)
use metatypes; but the power of Python metatypes will be limited.
For example, Python code will never be allowed to allocate raw
memory and initialize it at will.)
Metatypes determine various *policies* for types, e.g. what
Metatypes determine various *policies* for types, such as what
happens when a type is called, how dynamic types are (whether a
type's __dict__ can be modified after it is created), what the
method resolution order is, how instance attributes are looked
@ -184,6 +190,25 @@ Making a type a factory for its instances
implementation function looks for the tp_new slot of the type that
is being called.
(Confusion alert: the tp_call slot of a regular type object (such
as PyInt_Type or PyList_Type) defines what happens when
*instances* of that type are called; in particular, the tp_call
slot in the function type, PyFunction_Type, is the key to making
functions callable. As another example, PyInt_Type.tp_call is
NULL, because integers are not callable. The new paradigm makes
*type objects* callable. Since type objects are instances of
their metatype (PyType_Type), the metatype's tp_call slot
(PyType_Type.tp_call) points to a function that is invoked when
any type object is called. Now, since each type has to do
something different to create an instance of itself,
PyType_Type.tp_call immediately defers to the tp_new slot of the
type that is being called. To add to the confusion, PyType_Type
itself is also callable: its tp_new slot creates a new type. This
is used by the class statement (via the Don Beaudry hook, see
above). And what makes PyType_Type callable? The tp_call slot of
*its* metatype -- but since it is its own metatype, that is its
own tp_call slot!)
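The distinction drawn in the confusion alert can be observed from Python. The sketch below assumes a current Python 3 interpreter, where this proposal is already in effect; the function f is a hypothetical placeholder.

```python
def f():
    return None

# tp_call on a regular type decides whether its *instances* are callable.
int_instance_callable = callable(3)     # PyInt_Type.tp_call is NULL
function_callable = callable(f)         # the function type defines tp_call

# Type objects are callable because their metatype defines tp_call.
int_type_callable = callable(int)

# Calling int(...) goes through the metatype's call slot, which defers
# to int's own tp_new; the explicit spelling gives the same result.
value = type.__call__(int, "42")

# type is callable too, through the tp_call slot of its metatype: itself.
type_callable = callable(type)
type_is_own_metatype = type(type) is type
```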
If the type's tp_new slot is NULL, an exception is raised.
Otherwise, the tp_new slot is called. The signature for the
tp_new slot is
@ -203,6 +228,82 @@ Making a type a factory for its instances
reference to an existing object is fine too. The return value
should always be a new reference, owned by the caller.
Once the tp_new slot has returned an object, further initialization
is attempted by calling the tp_init() slot of the resulting
object's type, if not NULL. This has the following signature:
PyObject *tp_init(PyObject *self,
PyObject *args,
PyObject *kwds)
It corresponds more closely to the __init__() method of classic
classes, and in fact is mapped to that by the slot/special-method
correspondence rules. The difference in responsibilities between
the tp_new() slot and the tp_init() slot lies in the invariants
they ensure. The tp_new() slot should ensure only the most
essential invariants, without which the C code that implements the
object would break. The tp_init() slot should be used for
overridable user-specific initializations. Take for example the
dictionary type. The implementation has an internal pointer to a
hash table which should never be NULL. This invariant is taken
care of by the tp_new() slot for dictionaries. The dictionary
tp_init() slot, on the other hand, could be used to give the
dictionary an initial set of keys and values based on the
arguments passed in.
You may wonder why the tp_new() slot shouldn't call the tp_init()
slot itself. The reason is that in certain circumstances (like
support for persistent objects), it is important to be able to
create an object of a particular type without initializing it any
further than necessary. This may conveniently be done by calling
the tp_new() slot without calling tp_init(). It is also possible
that tp_init() is not called, or called more than once -- its
operation should be robust even in these anomalous cases.
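The division of labour between the two slots can be modelled in Python, where tp_new and tp_init surface as __new__ and __init__. This is a sketch only: the Account class and the call_type helper are hypothetical, and call_type is a simplified model of the metatype's tp_call as described above, not the actual C implementation.

```python
class Account:
    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)
        self.history = []       # essential invariant: always a list
        return self

    def __init__(self, owner="nobody"):
        self.owner = owner      # overridable, user-level initialization

def call_type(cls, *args, **kwds):
    """Simplified model of what calling a type does."""
    obj = cls.__new__(cls, *args, **kwds)
    # Initialize through the type of the object that was returned.
    type(obj).__init__(obj, *args, **kwds)
    return obj

normal = call_type(Account, "alice")

# Create an object without initializing it further, as a persistence
# mechanism might do before restoring saved state.
bare = Account.__new__(Account)

# A second initialization must not break the object.
normal.__init__("bob")
```

Because the list is set up in __new__, even the bare object satisfies the invariant the rest of the code relies on.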
For some objects, tp_new() may return an existing object. For
example, the factory function for integers caches the integers -1
through 99. This is permissible only when the type argument to
tp_new() is the type that defined the tp_new() function (in the
example, if type == &PyInt_Type), and when the tp_init() slot for
this type does nothing. If the type argument differs, the
tp_new() call is initiated by a derived type's tp_new() to
create the object and initialize the base type portion of the
object; in this case tp_new() should always return a new object
(or raise an exception).
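The caching rule can be sketched in Python. SmallInt and Tagged are hypothetical classes, not the real integer type; the range -1 through 99 is taken from the text above. The point is that the cache is consulted only when the type argument is exactly the defining type.

```python
class SmallInt:
    _cache = {}

    def __new__(cls, value=0):
        # Returning an existing object is allowed only for the exact type.
        cacheable = cls is SmallInt and -1 <= value <= 99
        if cacheable and value in SmallInt._cache:
            return SmallInt._cache[value]
        self = super().__new__(cls)
        self.value = value
        if cacheable:
            SmallInt._cache[value] = self
        return self

class Tagged(SmallInt):
    def __new__(cls, value=0):
        # The base __new__ sees cls is Tagged, so it must build a new object.
        self = super().__new__(cls, value)
        self.tag = None
        return self

a, b = SmallInt(5), SmallInt(5)
big1, big2 = SmallInt(500), SmallInt(500)
t1, t2 = Tagged(5), Tagged(5)
```

Neither class defines __init__, which matches the requirement that initialization does nothing for a type whose tp_new may hand back a shared object.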
There's a third slot related to object creation: tp_alloc(). Its
responsibility is to allocate the memory for the object,
initialize the reference count and type pointer field, and
initialize the rest of the object to all zeros. It should also
register the object with the garbage collection subsystem if the
type supports garbage collection. This slot exists so that
derived types can override the memory allocation policy
(e.g. which heap is being used) separately from the initialization
code. The signature is:
PyObject *tp_alloc(PyTypeObject *type, int nitems)
The type argument is the type of the new object. The nitems
argument is normally zero, except for objects with a variable
allocation size (basically strings, tuples, and longs). The
allocation size is given by the following expression:
type->tp_basicsize + nitems * type->tp_itemsize
This slot is only used for subclassable types. The tp_new()
function of the base class must call the tp_alloc() slot of the
type passed in as its first argument. It is the tp_new()
function's responsibility to calculate the number of items. The
tp_alloc() slot will set the ob_size field of the new object if
the type->tp_itemsize field is nonzero.
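The size expression can be evaluated from Python, because current CPython exposes the two fields as the type attributes __basicsize__ and __itemsize__. The helper alloc_size is hypothetical and only mirrors the arithmetic; it does not allocate anything, and the actual byte counts vary by platform and interpreter version.

```python
def alloc_size(tp, nitems=0):
    """Bytes requested for an instance, per the expression in the text."""
    return tp.__basicsize__ + nitems * tp.__itemsize__

fixed = alloc_size(float)       # nitems is zero for fixed-size objects
empty = alloc_size(tuple, 0)    # variable-size object with no items
triple = alloc_size(tuple, 3)   # room for three item slots
```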
XXX The keyword arguments are currently not passed to tp_new();
its kwds argument is always NULL. This is a relic from a previous
revision and should probably be fixed. Both tp_new() and
tp_init() should receive exactly the same arguments, and both
should check that the arguments are acceptable, because they may
be called independently.
Requirements for a type to allow subtyping
@ -240,9 +341,9 @@ Requirements for a type to allow subtyping
its part of the instance structure).
A similar reasoning applies to destruction: if a subtype changes
the instance allocator (e.g. to use a different heap), it must
also change the instance deallocator; but it must still call on
the base type's destructor to DECREF the base type's instance
the instance allocator (for example to use a different heap), it
must also change the instance deallocator; but it must still call
on the base type's destructor to DECREF the base type's instance
variables.
In this proposal, I assign stricter meanings to two existing
@ -311,7 +412,7 @@ Requirements for a type to allow subtyping
it is needed in order to derive a subtype. The type object for
the base type must also be exported.
If the base type has a type-checking macro (e.g. PyDict_Check()),
If the base type has a type-checking macro (like PyDict_Check()),
this macro probably should be changed to recognize subtypes. This
can be done by using the new PyObject_TypeCheck(object, type)
macro, which calls a function that follows the base class links.
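The idea of a check that follows the base class links can be modelled in Python. The function type_check and the class MyDict are hypothetical, and the sketch assumes a simple single-inheritance chain reached through the __base__ attribute; it is not the code behind PyObject_TypeCheck().

```python
def type_check(obj, tp):
    """Return True if obj's type is tp or derives from it via base links."""
    t = type(obj)
    while t is not None:
        if t is tp:
            return True
        t = t.__base__          # None once we step past object
    return False

class MyDict(dict):
    pass

exact = type_check({}, dict)
subtype = type_check(MyDict(), dict)
unrelated = type_check([], dict)
```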
@ -434,7 +535,7 @@ Creating a subtype of a built-in type in C
as well (these are not inherited).
Exception: if the subtype defines no additional fields in its
structure (i.e., it only defines new behavior, no new data), the
structure (it only defines new behavior, no new data), the
tp_basicsize and the tp_dealloc fields may be set to zero.
In order to complete the initialization of the type,
@ -451,16 +552,17 @@ Creating a subtype of a built-in type in C
freeing function for any other pointers it owns, and then call the
base class's tp_dealloc slot. Because deallocation functions
typically are not exported, this call has to be made via the base
type's type structure, e.g., when deriving from the standard list
type:
type's type structure, for example, when deriving from the
standard list type:
PyList_Type.tp_dealloc(self);
(If the subtype uses a different allocation heap than the base
type, the subtype must call the base type's tp_clear() slot
instead, followed by a call to free the object's memory from the
appropriate heap, e.g. PyObject_DEL(self) if the subtype uses the
standard heap. But in this case subtyping is not recommended.)
appropriate heap, such as PyObject_DEL(self) if the subtype uses
the standard heap. But in this case subtyping is not
recommended.)
A subtype is not usable until PyType_InitDict() is called for it;
this is best done during module initialization, assuming the
@ -506,7 +608,7 @@ Subtyping in Python
to be provided for the creation of C is: its name (in this example
the string "C"); the list of base classes (a singleton tuple
containing B); and the results of executing the class body, in the
form of a dictionary (e.g. {"var1": 1, "method1": <function
form of a dictionary (for example {"var1": 1, "method1": <function
method1 at ...>, ...}).
I propose to rig the class statement to make the following call:
@ -580,8 +682,8 @@ Subtyping in Python
ensure that this object isn't deallocated while the type object is
still referencing it); and some more auxiliary storage (to be
described later). It initializes this storage to zeros except for
a few crucial slots (e.g. tp_name is set to point to the type
name) and then sets the tp_base slot to point to B. Then
a few crucial slots (for example, tp_name is set to point to the
type name) and then sets the tp_base slot to point to B. Then
PyType_InitDict() is called to inherit B's slots. Finally, C's
tp_dict slot is updated with the contents of the namespace
dictionary (the third argument to the call to M).
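The three-argument call can be tried directly. The sketch below assumes a current Python 3 interpreter, where the metatype of an ordinary class is type; B, method1 and var1 echo the running example, and the body of method1 is made up for illustration.

```python
class B:
    pass

def method1(self):
    return self.var1

# The result of executing the class body, as a dictionary.
namespace = {"var1": 1, "method1": method1}

M = type(B)                     # metatype taken from the base
C = M("C", (B,), namespace)     # the call the class statement arranges

instance = C()
```

The resulting C behaves like a class written with a class statement: it has B as its base and the namespace entries as attributes.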
@ -641,10 +743,10 @@ Junk text (to be reused somewhere above)
for an allocation flag bit would be to have two type objects,
identical in the contents of all their slots except for their
deallocation slot. But this requires that all type-checking code
(e.g. the PyDict_Check()) recognizes both types. We'll come back
to this solution in the context of subtyping. Another alternative
is to require the metatype's tp_call to leave the allocation to
the tp_construct method, by passing in a NULL pointer. But this
(like PyDict_Check()) recognizes both types. We'll come back to
this solution in the context of subtyping. Another alternative is
to require the metatype's tp_call to leave the allocation to the
tp_construct method, by passing in a NULL pointer. But this
doesn't work once we allow subtyping.
Eventually, when we add any form of subtyping, we'll have to