Intermediate checkin (documented tp_new, tp_init, tp_alloc properly).
parent a921257c6a
commit 15299026e7
222
pep-0253.txt
@@ -21,7 +21,7 @@ Introduction
 it with a static initializer. The fields in the type object
 describe all aspects of a Python type that are relevant to the
 Python interpreter. A few fields contain dimensional information
-(e.g. the basic allocation size of instances), others contain
+(like the basic allocation size of instances), others contain
 various flags, but most fields are pointers to functions to
 implement various kinds of behaviors. A NULL pointer means that
 the type does not implement the specific behavior; in that case
@@ -39,34 +39,37 @@ Introduction

 This PEP will introduce the following features:

-- a type can be a factory function for its instances
+- a type can be a factory function for its instances

-- types can be subtyped in C
+- types can be subtyped in C

-- types can be subtyped in Python with the class statement
+- types can be subtyped in Python with the class statement

-- multiple inheritance from types is supported (insofar as
-practical)
+- multiple inheritance from types is supported (insofar as
+practical -- you still can't multiply inherit from list and
+dictionary)

-- the standard coercions functions (int, tuple, str etc.) will be
-redefined to be the corresponding type objects, which serve as
-their own factory functions
+- the standard coercions functions (int, tuple, str etc.) will
+be redefined to be the corresponding type objects, which serve
+as their own factory functions

-- there will be a standard type hierarchy
+- a class statement can contain a __metaclass__ declaration,
+specifying the metaclass to be used to create the new class

-- a class statement can contain a metaclass declaration,
-specifying the metaclass to be used to create the new class
+- a class statement can contain a __slots__ declaration,
+specifying the specific names of the instance variables
+supported

-- a class statement can contain a slots declaration, specifying
-the specific names of the instance variables supported
+- there will be a standard type hierarchy (maybe)

 This PEP builds on PEP 252, which adds standard introspection to
-types; e.g., when the type object defines the tp_hash slot, the
-type object has a __hash__ method. PEP 252 also adds a
-dictionary to type objects which contains all methods. At the
-Python level, this dictionary is read-only for built-in types; at
-the C level, it is accessible directly (but it should not be
-modified except as part of initialization).
+types; for example, when a particular type object initializes the
+tp_hash slot, that type object has a __hash__ method when
+introspected. PEP 252 also adds a dictionary to type objects
+which contains all methods. At the Python level, this dictionary
+is read-only for built-in types; at the C level, it is accessible
+directly (but it should not be modified except as part of
+initialization).

 For binary compatibility, a flag bit in the tp_flags slot
 indicates the existence of the various new slots in the type
@@ -79,14 +82,14 @@ Introduction

 In current Python, a distinction is made between types and
 classes. This PEP together with PEP 254 will remove that
-distinction. However, for backwards compatibility there will
-probably remain a bit of a distinction for years to come, and
-without PEP 254, the distinction is still large: types ultimately
-have a built-in type as a base class, while classes ultimately
-derive from a user-defined class. Therefore, in the rest of this
-PEP, I will use the word type whenever I can -- including base
-type or supertype, derived type or subtype, and metatype.
-However, sometimes the terminology necessarily blends, e.g. an
+distinction. However, for backwards compatibility the distinction
+will probably remain for years to come, and without PEP 254, the
+distinction is still large: types ultimately have a built-in type
+as a base class, while classes ultimately derive from a
+user-defined class. Therefore, in the rest of this PEP, I will
+use the word type whenever I can -- including base type or
+supertype, derived type or subtype, and metatype. However,
+sometimes the terminology necessarily blends, for example an
 object's type is given by its __class__ attribute, and subtyping
 in Python is spelled with a class statement. If further
 distinction is necessary, user-defined classes can be referred to
@@ -95,9 +98,9 @@ Introduction

 About metatypes

-Inevitably the following discussion will come to mention metatypes
-(or metaclasses). Metatypes are nothing new in Python: Python has
-always been able to talk about the type of a type:
+Inevitably the discussion comes to metatypes (or metaclasses).
+Metatypes are nothing new in Python: Python has always been able
+to talk about the type of a type:

 >>> a = 0
 >>> type(a)
@@ -113,16 +116,19 @@ About metatypes
 (PyType_Type, which is also its own metatype), this is not a
 requirement, and in fact a useful and relevant 3rd party extension
 (ExtensionClasses by Jim Fulton) creates an additional metatype.
+The type of classic classes, known as types.ClassType, can also be
+considered a distinct metatype.

-A related feature is the "Don Beaudry hook", which says that if a
-metatype is callable, its instances (which are regular types) can
-be subclassed (really subtyped) using a Python class statement.
-I will use this rule to support subtyping of built-in types, and
-in fact it greatly simplifies the logic of class creation to
-always simply call the metatype. When no base class is specified,
-a default metatype is called -- the default metatype is the
-"ClassType" object, so the class statement will behave as before
-in the normal case.
+A feature closely connected to metatypes is the "Don Beaudry
+hook", which says that if a metatype is callable, its instances
+(which are regular types) can be subclassed (really subtyped)
+using a Python class statement. I will use this rule to support
+subtyping of built-in types, and in fact it greatly simplifies the
+logic of class creation to always simply call the metatype. When
+no base class is specified, a default metatype is called -- the
+default metatype is the "ClassType" object, so the class statement
+will behave as before in the normal case. (This default can be
+changed per module by setting the global variable __metaclass__.)

 Python uses the concept of metatypes or metaclasses in a different
 way than Smalltalk. In Smalltalk-80, there is a hierarchy of
@@ -138,11 +144,11 @@ About metatypes
 metatypes are typically written in C, and may be shared between
 many regular types. (It will be possible to subtype metatypes in
 Python, so it won't be absolutely necessary to write C in order to
-use metatypes; but the power of Python metatypes will be limited,
-e.g. Python code will never be allowed to allocate raw memory and
-initialize it at will.)
+use metatypes; but the power of Python metatypes will be limited.
+For example, Python code will never be allowed to allocate raw
+memory and initialize it at will.)

-Metatypes determine various *policies* for types, e.g. what
+Metatypes determine various *policies* for types, such as what
 happens when a type is called, how dynamic types are (whether a
 type's __dict__ can be modified after it is created), what the
 method resolution order is, how instance attributes are looked
@@ -184,6 +190,25 @@ Making a type a factory for its instances
 implementation function looks for the tp_new slot of the type that
 is being called.

+(Confusion alert: the tp_call slot of a regular type object (such
+as PyInt_Type or PyList_Type) defines what happens when
+*instances* of that type are called; in particular, the tp_call
+slot in the function type, PyFunction_Type, is the key to making
+functions callable. As another example, PyInt_Type.tp_call is
+NULL, because integers are not callable. The new paradigm makes
+*type objects* callable. Since type objects are instances of
+their metatype (PyType_Type), the metatype's tp_call slot
+(PyType_Type.tp_call) points to a function that is invoked when
+any type object is called. Now, since each type has to do
+something different to create an instance of itself,
+PyType_Type.tp_call immediately defers to the tp_new slot of the
+type that is being called. To add to the confusion, PyType_Type
+itself is also callable: its tp_new slot creates a new type. This
+is used by the class statement (via the Don Beaudry hook, see
+above). And what makes PyType_Type callable? The tp_call slot of
+*its* metatype -- but since it is its own metatype, that is its
+own tp_call slot!)
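
The call chain in this paragraph can be observed from Python. The following is a minimal sketch, assuming a present-day CPython 3 interpreter (which postdates this check-in), where the C slots tp_call and tp_new surface as __call__ and __new__:

```python
# Calling a type goes through the metatype's __call__ (tp_call),
# which defers to the called type's __new__ (tp_new).
via_call = int("42")
via_metatype = type(int).__call__(int, "42")  # roughly what int("42") does
via_new = int.__new__(int, "42")              # the tp_new step on its own

# Instances of int are not callable: int has no tp_call of its own.
instance_callable = callable(5)

# type is its own metatype, so its own __call__ is what makes it callable.
type_is_own_metatype = type(type) is type
```

All three spellings produce the same integer; only the first is how code is normally written.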
+
 If the type's tp_new slot is NULL, an exception is raised.
 Otherwise, the tp_new slot is called. The signature for the
 tp_new slot is
@@ -203,6 +228,82 @@ Making a type a factory for its instances
 reference to an existing object is fine too. The return value
 should always be a new reference, owned by the caller.

+Once the tp_new slot has returned an object, further initialization
+is attempted by calling the tp_init() slot of the resulting
+object's type, if not NULL. This has the following signature:
+
+PyObject *tp_init(PyObject *self,
+                  PyObject *args,
+                  PyObject *kwds)
+
+It corresponds more closely to the __init__() method of classic
+classes, and in fact is mapped to that by the slot/special-method
+correspondence rules. The difference in responsibilities between
+the tp_new() slot and the tp_init() slot lies in the invariants
+they ensure. The tp_new() slot should ensure only the most
+essential invariants, without which the C code that implements the
+objects would break. The tp_init() slot should be used for
+overridable user-specific initializations. Take for example the
+dictionary type. The implementation has an internal pointer to a
+hash table which should never be NULL. This invariant is taken
+care of by the tp_new() slot for dictionaries. The dictionary
+tp_init() slot, on the other hand, could be used to give the
+dictionary an initial set of keys and values based on the
+arguments passed in.
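
The division of labor between the two slots maps onto __new__ and __init__ at the Python level. The sketch below assumes Python 3; Table and QuietTable are hypothetical stand-ins for the dictionary example, not the real dict implementation:

```python
class Table:
    """Hypothetical stand-in for the dictionary example."""

    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)
        self._store = {}  # essential invariant: the storage is never None
        return self

    def __init__(self, items=()):
        # Overridable, user-level initialization from the arguments.
        for key, value in items:
            self._store[key] = value


class QuietTable(Table):
    def __init__(self, items=()):
        pass  # a subtype may replace __init__ entirely


t = Table([("a", 1)])
q = QuietTable([("a", 1)])  # the invariant set up by __new__ still holds
```

Because the invariant lives in __new__, a subtype that overrides __init__ cannot leave the object in a state its methods cannot handle.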
+
+You may wonder why the tp_new() slot shouldn't call the tp_init()
+slot itself. The reason is that in certain circumstances (like
+support for persistent objects), it is important to be able to
+create an object of a particular type without initializing it any
+further than necessary. This may conveniently be done by calling
+the tp_new() slot without calling tp_init(). It is also possible
+that tp_init() is not called, or called more than once -- its
+operation should be robust even in these anomalous cases.
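
A short illustration of both points, again as a Python 3 sketch with a hypothetical Account class: the object is created through __new__ alone, as a persistence layer might do before restoring saved state, and __init__ is written so that a repeated call is harmless.

```python
class Account:
    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)
        self.entries = []  # minimal valid state, set before any __init__
        return self

    def __init__(self, entries=()):
        # Replaces the contents, so calling it again is safe.
        self.entries[:] = list(entries)


# Create without initializing, then restore state by hand.
raw = Account.__new__(Account)
entries_before_restore = list(raw.entries)
raw.entries.extend([10, 20])

# Normal construction, followed by a second __init__ call.
acct = Account([1, 2])
acct.__init__([3])
```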
+
+For some objects, tp_new() may return an existing object. For
+example, the factory function for integers caches the integers -1
+through 99. This is permissible only when the type argument to
+tp_new() is the type that defined the tp_new() function (in the
+example, if type == &PyInt_Type), and when the tp_init() slot for
+this type does nothing. If the type argument differs, the
+tp_new() call is initiated by a derived type's tp_new() to
+create the object and initialize the base type portion of the
+object; in this case tp_new() should always return a new object
+(or raise an exception).
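
The caching rule can be sketched in Python 3 as follows. Small and Tagged are hypothetical classes; the range -1 through 99 is taken from the integer example above, and the cache is consulted only when the requested type is exactly the type that defines __new__:

```python
class Small:
    _cache = {}

    def __new__(cls, value):
        # Sharing is allowed only for the defining type itself.
        cacheable = cls is Small and -1 <= value <= 99
        if cacheable and value in Small._cache:
            return Small._cache[value]
        self = super().__new__(cls)
        self.value = value
        if cacheable:
            Small._cache[value] = self
        return self


class Tagged(Small):
    pass


shared = Small(5) is Small(5)            # cached instance is reused
large_shared = Small(1000) is Small(1000)  # outside the cached range
sub_a, sub_b = Tagged(5), Tagged(5)      # subtype always gets a new object
```

Small defines no __init__, which matches the condition that initialization must do nothing when an existing object may be returned.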
+
+There's a third slot related to object creation: tp_alloc(). Its
+responsibility is to allocate the memory for the object,
+initialize the reference count and type pointer field, and
+initialize the rest of the object to all zeros. It should also
+register the object with the garbage collection subsystem if the
+type supports garbage collection. This slot exists so that
+derived types can override the memory allocation policy
+(e.g. which heap is being used) separately from the initialization
+code. The signature is:
+
+PyObject *tp_alloc(PyTypeObject *type, int nitems)
+
+The type argument is the type of the new object. The nitems
+argument is normally zero, except for objects with a variable
+allocation size (basically strings, tuples, and longs). The
+allocation size is given by the following expression:
+
+type->tp_basicsize + nitems * type->tp_itemsize
+
+This slot is only used for subclassable types. The tp_new()
+function of the base class must call the tp_alloc() slot of the
+type passed in as its first argument. It is the tp_new()
+function's responsibility to calculate the number of items. The
+tp_alloc() slot will set the ob_size field of the new object if
+the type->tp_itemsize field is nonzero.
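
The size expression can be evaluated from Python, since CPython exposes the two fields as the type attributes __basicsize__ and __itemsize__. This is a sketch assuming CPython 3; alloc_size is a hypothetical helper, and the actual numbers vary by platform and version:

```python
def alloc_size(tp, nitems=0):
    """Evaluate tp_basicsize + nitems * tp_itemsize for a type object."""
    return tp.__basicsize__ + nitems * tp.__itemsize__


# float has a fixed allocation size: its item size is zero.
fixed = alloc_size(float)

# tuple is variable-sized: each extra item adds one item size.
growth = alloc_size(tuple, 3) - alloc_size(tuple, 0)
```

The result need not equal sys.getsizeof() for an instance, which can include per-object overhead beyond this expression.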
+
+XXX The keyword arguments are currently not passed to tp_new();
+its kwds argument is always NULL. This is a relic from a previous
+revision and should probably be fixed. Both tp_new() and
+tp_init() should receive exactly the same arguments, and both
+should check that the arguments are acceptable, because they may
+be called independently.
+

 Requirements for a type to allow subtyping

@@ -240,9 +341,9 @@ Requirements for a type to allow subtyping
 its part of the instance structure).

 A similar reasoning applies to destruction: if a subtype changes
-the instance allocator (e.g. to use a different heap), it must
-also change the instance deallocator; but it must still call on
-the base type's destructor to DECREF the base type's instance
+the instance allocator (for example to use a different heap), it
+must also change the instance deallocator; but it must still call
+on the base type's destructor to DECREF the base type's instance
 variables.

 In this proposal, I assign stricter meanings to two existing
@@ -311,7 +412,7 @@ Requirements for a type to allow subtyping
 it is needed in order to derive a subtype. The type object for
 the base type must also be exported.

-If the base type has a type-checking macro (e.g. PyDict_Check()),
+If the base type has a type-checking macro (like PyDict_Check()),
 this macro probably should be changed to recognize subtypes. This
 can be done by using the new PyObject_TypeCheck(object, type)
 macro, which calls a function that follows the base class links.
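
The difference between an exact-type test and one that follows base class links can be shown at the Python level. This is a Python 3 sketch; Registry and type_check are hypothetical names, and type_check only walks the single-inheritance __base__ chain, so it is an analogue of the idea rather than the C macro itself:

```python
class Registry(dict):
    pass


r = Registry()
exact_match = type(r) is dict        # exact-type test: rejects the subtype
follows_bases = isinstance(r, dict)  # follows base links: accepts it


def type_check(obj, tp):
    """Walk the __base__ chain of type(obj) looking for tp."""
    t = type(obj)
    while t is not None:
        if t is tp:
            return True
        t = t.__base__
    return False
```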
@@ -434,7 +535,7 @@ Creating a subtype of a built-in type in C
 as well (these are not inherited).

 Exception: if the subtype defines no additional fields in its
-structure (i.e., it only defines new behavior, no new data), the
+structure (it only defines new behavior, no new data), the
 tp_basicsize and the tp_dealloc fields may be set to zero.

 In order to complete the initialization of the type,
@@ -451,16 +552,17 @@ Creating a subtype of a built-in type in C
 freeing function for any other pointers it owns, and then call the
 base class's tp_dealloc slot. Because deallocation functions
 typically are not exported, this call has to be made via the base
-type's type structure, e.g., when deriving from the standard list
-type:
+type's type structure, for example, when deriving from the
+standard list type:

 PyList_Type.tp_dealloc(self);

 (If the subtype uses a different allocation heap than the base
 type, the subtype must call the base type's tp_clear() slot
 instead, followed by a call to free the object's memory from the
-appropriate heap, e.g. PyObject_DEL(self) if the subtype uses the
-standard heap. But in this case subtyping is not recommended.)
+appropriate heap, such as PyObject_DEL(self) if the subtype uses
+the standard heap. But in this case subtyping is not
+recommended.)

 A subtype is not usable until PyType_InitDict() is called for it;
 this is best done during module initialization, assuming the
@@ -506,7 +608,7 @@ Subtyping in Python
 to be provided for the creation of C is: its name (in this example
 the string "C"); the list of base classes (a singleton tuple
 containing B); and the results of executing the class body, in the
-form of a dictionary (e.g. {"var1": 1, "method1": <function
+form of a dictionary (for example {"var1": 1, "method1": <function
 method1 at ...>, ...}).

 I propose to rig the class statement to make the following call:
@@ -580,8 +682,8 @@ Subtyping in Python
 ensure that this object isn't deallocated while the type object is
 still referencing it); and some more auxiliary storage (to be
 described later). It initializes this storage to zeros except for
-a few crucial slots (e.g. tp_name is set to point to the type
-name) and then sets the tp_base slot to point to B. Then
+a few crucial slots (for example, tp_name is set to point to the
+type name) and then sets the tp_base slot to point to B. Then
 PyType_InitDict() is called to inherit B's slots. Finally, C's
 tp_dict slot is updated with the contents of the namespace
 dictionary (the third argument to the call to M).
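
The three-argument call to the metatype M can be made directly from Python. The sketch below assumes Python 3, where M for an ordinary class B is the built-in type; the names var1 and method1 follow the example dictionary mentioned in this section:

```python
class B:
    pass


def method1(self):
    return self.var1


M = type(B)  # the metatype of the base class; here this is `type`
C = M("C", (B,), {"var1": 1, "method1": method1})

c = C()
result = c.method1()
```

The name, the tuple of bases, and the namespace dictionary end up as C.__name__, C.__bases__, and the contents of C.__dict__ respectively.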
@@ -641,10 +743,10 @@ Junk text (to be reused somewhere above)
 for an allocation flag bit would be to have two type objects,
 identical in the contents of all their slots except for their
 deallocation slot. But this requires that all type-checking code
-(e.g. the PyDict_Check()) recognizes both types. We'll come back
-to this solution in the context of subtyping. Another alternative
-is to require the metatype's tp_call to leave the allocation to
-the tp_construct method, by passing in a NULL pointer. But this
+(like PyDict_Check()) recognizes both types. We'll come back to
+this solution in the context of subtyping. Another alternative is
+to require the metatype's tp_call to leave the allocation to the
+tp_construct method, by passing in a NULL pointer. But this
 doesn't work once we allow subtyping.

 Eventually, when we add any form of subtyping, we'll have to