Another intermediate update. I've rewritten the requirements for a
base type to be subtypable. Needs way more work!
2001-06-13 21:48:31 +00:00 · 2001-06-13 21:48:31 +00:00 · b54a36962d
parent 137a35ac05
commit b54a36962d
1 changed file with 317 additions and 134 deletions
--- a/pep-0253.txt
+++ b/pep-0253.txt
@@ -10,14 +10,16 @@ Post-History:

 Abstract

-    This PEP proposes ways for creating subtypes of existing built-in
-    types, either in C or in Python.  The text is currently long and
-    rambling; I'll go over it again later to make it shorter.
+    This PEP proposes additions to the type object API that will allow
+    the creation of subtypes of built-in types, in C and in Python.
+
+
+Introduction

    Traditionally, types in Python have been created statically, by
    declaring a global variable of type PyTypeObject and initializing
    it with a static initializer.  The fields in the type object
-    describe all aspects of a Python object that are relevant to the
+    describe all aspects of a Python type that are relevant to the
    Python interpreter.  A few fields contain dimensional information
    (e.g. the basic allocation size of instances), others contain
    various flags, but most fields are pointers to functions to
@@ -26,39 +28,57 @@ Abstract
    the system may provide a default behavior in that case or raise an
    exception when the behavior is invoked.  Some collections of
    functions pointers that are usually defined together are obtained
-    indirectly via a pointer to an additional structure containing.
+    indirectly via a pointer to an additional structure containing
+    more function pointers.

    While the details of initializing a PyTypeObject structure haven't
-    been documented as such, they are easily glanced from the examples
+    been documented as such, they are easily gleaned from the examples
    in the source code, and I am assuming that the reader is
    sufficiently familiar with the traditional way of creating new
    Python types in C.

    This PEP will introduce the following features:

-    - a type, like a class, can be a factory for its instances
+    - a type can be a factory function for its instances

-    - types can be subtyped in C by specifying a base type pointer
+    - types can be subtyped in C

-    - types can be subtyped in Python using the class statement
+    - types can be subtyped in Python with the class statement

-    - multiple inheritance from types (insofar as practical)
+    - multiple inheritance from types is supported (insofar as
+      practical)

-    - the standard coercions (int, tuple, str etc.) will be the
-      corresponding type objects
+    - the standard coercion functions (int, tuple, str etc.) will be
+      redefined to be the corresponding type objects, which serve as
+      their own factory functions

-    - a standard type hierarchy
+    - there will be a standard type hierarchy
+
+    - a class statement can contain a metaclass declaration,
+      specifying the metaclass to be used to create the new class
+
+    - a class statement can contain a slots declaration, specifying
+      the specific names of the instance variables supported

    This PEP builds on pep-0252, which adds standard introspection to
-    types; in particular, types are assumed to have e.g. a __hash__
-    method when the type object defines the tp_hash slot.  pep-0252 also
-    adds a dictionary to type objects which contains all methods.  At
-    the Python level, this dictionary is read-only; at the C level, it
-    is accessible directly (but modifying it is not recommended except
-    as part of initialization).
+    types; e.g., when the type object defines the tp_hash slot, the
+    type object has a __hash__ method.  pep-0252 also adds a
+    dictionary to type objects which contains all methods.  At the
+    Python level, this dictionary is read-only for built-in types; at
+    the C level, it is accessible directly (but it should not be
+    modified except as part of initialization).
+
+    For binary compatibility, a flag bit in the tp_flags slot
+    indicates the existence of the various new slots in the type
+    object introduced below.  Types that don't have the
+    Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags field are assumed
+    to have NULL values for all the subtyping slots.  (Warning: the
+    current implementation prototype is not yet consistent in its
+    checking of this flag bit.  This should be fixed before the final
+    release.)


-Metatypes
+About metatypes

    Inevitably the following discussion will come to mention metatypes
    (or metaclasses).  Metatypes are nothing new in Python: Python has
@@ -75,26 +95,27 @@ Metatypes

    In this example, type(a) is a "regular" type, and type(type(a)) is
    a metatype.  While as distributed all types have the same metatype
-    (which is also its own metatype), this is not a requirement, and
-    in fact a useful 3rd party extension (ExtensionClasses by Jim
-    Fulton) creates an additional metatype.  A related feature is the
-    "Don Beaudry hook", which says that if a metatype is callable, its
-    instances (which are regular types) can be subclassed (really
-    subtyped) using a Python class statement.  We will use this rule
-    to support subtyping of built-in types, and in fact it greatly
-    simplifies the logic of class creation to always simply call the
-    metatype.  When no base class is specified, a default metatype is
-    called -- the default metatype is the "ClassType" object, so the
-    class statement will behave as before in the normal case.
+    (PyType_Type, which is also its own metatype), this is not a
+    requirement, and in fact a useful and relevant 3rd party extension
+    (ExtensionClasses by Jim Fulton) creates an additional metatype.

-    Python uses the concept of metatypes or metaclasses in a
-    different way than Smalltalk.  In Smalltalk-80, there is a
-    hierarchy of metaclasses that mirrors the hierarchy of regular
-    classes, metaclasses map 1-1 to classes (except for some funny
-    business at the root of the hierarchy), and each class statement
-    creates both a regular class and its metaclass, putting class
-    methods in the metaclass and instance methods in the regular
-    class.
+    A related feature is the "Don Beaudry hook", which says that if a
+    metatype is callable, its instances (which are regular types) can
+    be subclassed (really subtyped) using a Python class statement.
+    I will use this rule to support subtyping of built-in types, and
+    in fact it greatly simplifies the logic of class creation to
+    always simply call the metatype.  When no base class is specified,
+    a default metatype is called -- the default metatype is the
+    "ClassType" object, so the class statement will behave as before
+    in the normal case.
+
+    Python uses the concept of metatypes or metaclasses in a different
+    way than Smalltalk.  In Smalltalk-80, there is a hierarchy of
+    metaclasses that mirrors the hierarchy of regular classes,
+    metaclasses map 1-1 to classes (except for some funny business at
+    the root of the hierarchy), and each class statement creates both
+    a regular class and its metaclass, putting class methods in the
+    metaclass and instance methods in the regular class.

    Nice though this may be in the context of Smalltalk, it's not
    compatible with the traditional use of metatypes in Python, and I
@@ -106,98 +127,75 @@ Metatypes
    e.g. Python code will never be allowed to allocate raw memory and
    initialize it at will.)

+    Metatypes determine various *policies* for types, e.g. what
+    happens when a type is called, how dynamic types are (whether a
+    type's __dict__ can be modified after it is created), what the
+    method resolution order is, how instance attributes are looked
+    up, and so on.

-Instantiation by calling the type object
+    I'll argue that left-to-right depth-first is not the best method
+    resolution order when you want to get the most use from multiple
+    inheritance.

-    Traditionally, for each type there is at least one C function that
-    creates instances of the type (e.g. PyInt_FromLong(),
-    PyTuple_New() and so on).  This function has to take care of
+    I'll argue that with multiple inheritance, the metatype of the
+    subtype must be a descendant of the metatypes of all base types.
+
+    I'll come back to metatypes later.
+
+
+Making a type a factory for its instances
+
+    Traditionally, for each type there is at least one C factory
+    function that creates instances of the type (PyTuple_New(),
+    PyInt_FromLong() and so on).  These factory functions take care of
    both allocating memory for the object and initializing that
-    memory.  As of Python 2.0, it also has to interface with the
+    memory.  As of Python 2.0, they also have to interface with the
    garbage collection subsystem, if the type chooses to participate
    in garbage collection (which is optional, but strongly recommended
    for so-called "container" types: types that may contain arbitrary
    references to other objects, and hence may participate in
    reference cycles).

-    If we're going to implement subtyping, we must separate allocation
-    and initialization: typically, the most derived subtype is in
-    charge of allocation (and hence deallocation!), but in most cases
-    each base type's initializer (constructor) must still be called,
-    from the "most base" type to the most derived type.
+    In this proposal, type objects can be factory functions for their
+    instances, making the types directly callable from Python.  This
+    mimics the way classes are instantiated.  Of course, the C APIs
+    for creating instances of various built-in types will remain valid
+    and probably the most common; and not all types will become their
+    own factory functions.

-    But let's first get the interface for instantiation right.  If we
-    call an object, the tp_call slot if its type gets invoked.  Thus,
-    if we call a type, this invokes the tp_call slot of the type's
-    type: in other words, the tp_call slot of the metatype.
-    Traditionally this has been a NULL pointer, meaning that types
-    can't be called.  Now we're adding a tp_call slot to the metatype,
-    which makes all types "callable" in a trivial sense.  But
-    obviously the metatype's tp_call implementation doesn't know how
-    to initialize the instances of individual types.  So the type
-    defines a new slot, tp_new, which is invoked by the metatype's
-    tp_call slot.  If the tp_new slot is NULL, the metatype's tp_call
-    issues a nice error message: the type isn't callable.
+    The type object has a new slot, tp_new, which can act as a factory
+    for instances of the type.  Types are made callable by providing a
+    tp_call slot in PyType_Type (the metatype); the slot
+    implementation function looks for the tp_new slot of the type that
+    is being called.

-    This mechanism gives the maximum freedom to the type: a type's
-    tp_new doesn't necessarily have to return a new object, or even an
-    object that is an instance of the type (although the latter should
-    be rare).
+    If the type's tp_new slot is NULL, an exception is raised.
+    Otherwise, the tp_new slot is called.  The signature for the
+    tp_new slot is

-    HIRO
+        PyObject *tp_new(PyTypeObject *type,
+                         PyObject *args,
+                         PyObject *kwds)

-    The deallocation mechanism chosen should match the allocation
-    mechanism: an allocation policy should prescribe both the
-    allocation and deallocation mechanism.  And again, planning ahead
-    for subtyping would be nice.  But the available mechanisms are
-    different.  The deallocation function has always been part of the
-    type structure, as tp_dealloc, which combines the
-    "uninitialization" with deallocation.  This was good enough for
-    the traditional situation, where it matched the combined
-    allocation and initialization of the creation function.  But now
-    imagine a type whose creation function uses a special free list
-    for allocation.  It's deallocation function puts the object's
-    memory back on the same free list.  But when allocation and
-    creation are separate, the object may have been allocated from the
-    regular heap, and it would be wrong (in some cases disastrous) if
-    it were placed on the free list by the deallocation function.
+    where 'type' is the type whose tp_new slot is called, and 'args'
+    and 'kwds' are the positional and keyword arguments to the call,
+    passed unchanged from tp_call.  (The 'type' argument is used in
+    combination with inheritance, see below.)

-    A solution would be for the tp_construct function to somehow mark
-    whether the object was allocated from the special free list, so
-    that the tp_dealloc function can choose the right deallocation
-    method (assuming that the only two alternatives are a special free
-    list or the regular heap).  A variant that doesn't require space
-    for an allocation flag bit would be to have two type objects,
-    identical in the contents of all their slots except for their
-    deallocation slot.  But this requires that all type-checking code
-    (e.g. the PyDict_Check()) recognizes both types.  We'll come back
-    to this solution in the context of subtyping.  Another alternative
-    is to require the metatype's tp_call to leave the allocation to
-    the tp_construct method, by passing in a NULL pointer.  But this
-    doesn't work once we allow subtyping.
-
-    Eventually, when we add any form of subtyping, we'll have to
-    separate deallocation from uninitialization.  The way to do this
-    is to add a separate slot to the type object that does the
-    uninitialization without the deallocation.  Fortunately, there is
-    already such a slot: tp_clear, currently used by the garbage
-    collection subsystem.  A simple rule makes this slot reusable as
-    an uninitialization: for types that support separate allocation
-    and initialization, tp_clear must be defined (even if the object
-    doesn't support garbage collection) and it must DECREF all
-    contained objects and FREE all other memory areas the object owns.
-    It must also be reentrant: it must be possible to clear an already
-    cleared object.  The easiest way to do this is to replace all
-    pointers DECREFed or FREEd with NULL pointers.
+    There are no constraints on the object type that is returned,
+    although by convention it should be an instance of the given
+    type.  It is not necessary that a new object is returned; a
+    reference to an existing object is fine too.  The return value
+    should always be a new reference, owned by the caller.


-Subtyping in C
+Requirements for a type to allow subtyping

    The simplest form of subtyping is subtyping in C.  It is the
    simplest form because we can require the C code to be aware of the
    various problems, and it's acceptable for C code that doesn't
-    follow the rules to dump core; while for Python subtyping we would
-    need to catch all errors before they become core dumps.
+    follow the rules to dump core.  For added simplicity, it is
+    limited to single inheritance.

    The idea behind subtyping is very similar to that of single
    inheritance in C++.  A base type is described by a structure
@@ -206,31 +204,169 @@ Subtyping in C
    of the base structure unchanged) and can override certain slots in
    the type object, leaving others the same.

-    Not every type can serve as a base type.  The base type must
-    support separation of allocation and initialization by having a
-    tp_construct slot that can be called with a preallocated object,
-    and it must support uninitialization without deallocation by
-    having a tp_clear slot as described above.  The derived type must
-    also export the structure declaration for its instances through a
-    header file, as it is needed in order to derive a subtype.  The
-    type object for the base type must also be exported.
+    Most issues have to do with construction and destruction of
+    instances of derived types.
+
+    Creation of a new object is separated into allocation and
+    initialization: allocation allocates the memory, and
+    initialization fills it with appropriate initial values.  The
+    separation is needed for the convenience of subtypes.
+    Instantiation of a subtype goes as follows:
+
+        1. allocate memory for the whole (subtype) instance
+        2. initialize the base type
+        3. initialize the subtype's instance variables
+
+    If allocation and initialization were done by the same function,
+    you would need a way to tell the base type's constructor to
+    allocate additional memory for the subtype's instance variables,
+    and there would be no way to change the allocation method for a
+    subtype (without giving up on calling the base type to initialize
+    its part of the instance structure).
+
+    Similar reasoning applies to destruction: if a subtype changes
+    the instance allocator (e.g. to use a different heap), it must
+    also change the instance deallocator; but it must still call on
+    the base type's destructor to DECREF the base type's instance
+    variables.
+
+    In this proposal, I assign stricter meanings to two existing
+    slots for deallocation and deinitialization, and I add two new
+    slots for allocation and initialization.
+
+    The tp_clear slot gets the new task of deinitializing an object so
+    that all that remains to be done is free its memory.  Originally,
+    all it had to do was clear object references.  The difference is
+    subtle: the list and dictionary objects contain references to an
+    additional heap-allocated piece of memory that isn't freed by
+    tp_clear in Python 2.1, but which must be freed by tp_clear under
+    this proposal. It should be safe to call tp_clear repeatedly on
+    the same object.  If an object contains no references to other
+    objects or heap-allocated memory, the tp_clear slot may be NULL.
+
+    The only additional requirement for the tp_dealloc slot is that it
+    should do the right thing whether or not tp_clear has been called.
+
+    The new slots are tp_alloc for allocation and tp_init for
+    initialization.  Their signatures:
+
+        PyObject *tp_alloc(PyTypeObject *type,
+                           PyObject *args,
+                           PyObject *kwds)
+
+        int tp_init(PyObject *self,
+                    PyObject *args,
+                    PyObject *kwds)
+
+    The arguments for tp_alloc are the same as for tp_new, described
+    above.  The arguments for tp_init are the same except that the
+    first argument is replaced with the instance to be initialized.
+    Its return value is 0 for success or -1 for failure.
+
+    It is possible that tp_init is called more than once or not at
+    all.  The implementation should allow this usage.  The object may
+    be non-functional until tp_init is called, and a second call to
+    tp_init may raise an exception, but it should not be possible to
+    cause a core dump or memory leakage this way.
+
+    Because tp_init is in a sense optional, tp_alloc is required to do
+    *some* initialization of the object.  It is required to initialize
+    ob_refcnt to 1 and ob_type to its type argument.  To be safe, it
+    should probably zero out the rest of the object.
+
+    The constructor arguments are passed to tp_alloc so that for
+    variable-size objects (like tuples and strings) it knows to
+    allocate the right amount of memory.
+
+    For immutable types, tp_alloc may have to do the full
+    initialization; otherwise, different calls to tp_init might cause
+    an immutable object to be modified, which is considered a grave
+    offense in Python (unlike in Fortran :-).
+
+    Not every type can serve as a base type.  The assumption is made
+    that if a type has a non-NULL value in its tp_init slot, it is
+    ready to be subclassed; otherwise, it is not, and using it as a
+    base class will raise an exception.
+
+    In order to be usefully subtyped in C, a type must also export the
+    structure declaration for its instances through a header file, as
+    it is needed in order to derive a subtype.  The type object for
+    the base type must also be exported.

    If the base type has a type-checking macro (e.g. PyDict_Check()),
-    this macro may be changed to recognize subtypes.  This can be done
-    by using the new PyObject_TypeCheck(object, type) macro, which
-    calls a function that follows the base class links.  There are
-    arguments for and against changing the type-checking macro in this
-    way.  The argument for the change should be clear: it allows
-    subtypes to be used in places where the base type is required,
-    which is often the prime attraction of subtyping (as opposed to
-    sharing implementation).  An argument against changing the
-    type-checking macro could be that the type check is used
-    frequently and a function call would slow things down too much
-    (hard to believe); or one could fear that a subtype might break an
-    invariant assumed by the support functions of the base type.
-    Sometimes it would be wise to change the base type to remove this
-    reliance; other times, it would be better to require that derived
-    types (implemented in C) maintain the invariants.
+    this macro probably should be changed to recognize subtypes.  This
+    can be done by using the new PyObject_TypeCheck(object, type)
+    macro, which calls a function that follows the base class links.
+
+    (An argument against changing the type-checking macro could be
+    that the type check is used frequently and a function call would
+    slow things down too much, but I find this hard to believe.  One
+    could also fear that a subtype might break an invariant assumed by
+    the support functions of the base type.  Usually it is best to
+    change the base type to remove this reliance, at least to the
+    point of raising an exception rather than dumping core when the
+    invariant is broken.)
+
+    Here are the interactions between tp_alloc, tp_clear, tp_dealloc
+    and subtypes; all assuming that the base type defines tp_init
+    (otherwise it cannot be subtyped anyway):
+
+    - If the base type's allocation scheme doesn't use the standard
+      heap, it should not define tp_alloc.  This is a signal for the
+      subclass to provide its own tp_alloc *and* tp_dealloc
+      implementation (probably using the standard heap).
+
+    - If the base type's tp_dealloc does anything besides calling
+      PyObject_DEL() (typically, calling Py_XDECREF() on contained
+      objects or freeing dependent memory blocks), it should define a
+      tp_clear that does the same without calling PyObject_DEL(), and
+      which checks for NULL pointers before releasing them and sets them
+      to NULL afterwards, so that calling tp_clear more than once or calling
+      tp_dealloc after tp_clear will not attempt to DECREF or free the
+      same object/memory twice.  (It should also be allowed to
+      continue using the object after tp_clear -- tp_clear should
+      simply reset the object to its pristine state.)
+
+    - If the derived type overrides tp_alloc, it should also override
+      tp_dealloc, and tp_dealloc should call the derived type's
+      tp_clear if non-NULL (or its own tp_clear).
+
+    - If the derived type overrides tp_clear, it should call the base
+      type's tp_clear if non-NULL.
+
+    - If the base type defines tp_init as well as tp_new, its tp_new
+      should be inheritable: it should call the tp_alloc and the
+      tp_init of the type passed in as its first argument.
+
+    - If the base type defines tp_init as well as tp_alloc, its
+      tp_alloc should be inheritable: it should look in the
+      tp_basicsize slot of the type passed in for the amount of memory
+      to allocate, and it should initialize all allocated bytes to
+      zero.
+
+    - For types whose tp_itemsize is nonzero, the allocation size used
+      in tp_alloc should be tp_basicsize + n*tp_itemsize, rounded up
+      to the next integral multiple of sizeof(PyObject *), where n is
+      the number of items determined by the arguments to tp_alloc.
+
+    - Things are further complicated by the garbage collection API.
+      This affects tp_basicsize, and the actions to be taken by
+      tp_alloc.  tp_alloc should look at the Py_TPFLAGS_GC flag bit in
+      the tp_flags field of the type passed in, and not assume that
+      this is the same as the corresponding bit in the base type.  (In
+      part, the GC API is at fault; Neil Schemenauer has a patch that
+      fixes the API, but it is currently backwards incompatible.)
+
+    Note: the rules here are very complicated -- probably too
+    complicated.  It may be better to give up on subtyping immutable
+    types, types with custom allocators, and types with variable size
+    allocation (such as int, string and tuple) -- then the rules can
+    be much simplified because you can assume allocation on the
+    standard heap, no requirement beyond zeroing memory in tp_alloc,
+    and no variable length allocation.
+
+
+Creating a subtype of a built-in type in C

    The derived type begins by declaring a type structure which
    contains the base type's structure.  For example, here's the type
@@ -400,6 +536,53 @@ Copyright

    This document has been placed in the public domain.

+
+Junk text (to be reused somewhere above)
+
+    The deallocation mechanism chosen should match the allocation
+    mechanism: an allocation policy should prescribe both the
+    allocation and deallocation mechanism.  And again, planning ahead
+    for subtyping would be nice.  But the available mechanisms are
+    different.  The deallocation function has always been part of the
+    type structure, as tp_dealloc, which combines the
+    "uninitialization" with deallocation.  This was good enough for
+    the traditional situation, where it matched the combined
+    allocation and initialization of the creation function.  But now
+    imagine a type whose creation function uses a special free list
+    for allocation.  Its deallocation function puts the object's
+    memory back on the same free list.  But when allocation and
+    creation are separate, the object may have been allocated from the
+    regular heap, and it would be wrong (in some cases disastrous) if
+    it were placed on the free list by the deallocation function.
+
+    A solution would be for the tp_construct function to somehow mark
+    whether the object was allocated from the special free list, so
+    that the tp_dealloc function can choose the right deallocation
+    method (assuming that the only two alternatives are a special free
+    list or the regular heap).  A variant that doesn't require space
+    for an allocation flag bit would be to have two type objects,
+    identical in the contents of all their slots except for their
+    deallocation slot.  But this requires that all type-checking code
+    (e.g. the PyDict_Check()) recognizes both types.  We'll come back
+    to this solution in the context of subtyping.  Another alternative
+    is to require the metatype's tp_call to leave the allocation to
+    the tp_construct method, by passing in a NULL pointer.  But this
+    doesn't work once we allow subtyping.
+
+    Eventually, when we add any form of subtyping, we'll have to
+    separate deallocation from uninitialization.  The way to do this
+    is to add a separate slot to the type object that does the
+    uninitialization without the deallocation.  Fortunately, there is
+    already such a slot: tp_clear, currently used by the garbage
+    collection subsystem.  A simple rule makes this slot reusable as
+    an uninitialization: for types that support separate allocation
+    and initialization, tp_clear must be defined (even if the object
+    doesn't support garbage collection) and it must DECREF all
+    contained objects and FREE all other memory areas the object owns.
+    It must also be reentrant: it must be possible to clear an already
+    cleared object.  The easiest way to do this is to replace all
+    pointers DECREFed or FREEd with NULL pointers.
+

 Local Variables:
 mode: indented-text
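The tp_new rules in the diff above can be modeled in a small, self-contained C sketch. Those rules are: calling a type goes through the metatype's tp_call, a NULL tp_new makes the type uncallable, and tp_new may return an existing object as long as it returns a new reference. Everything here is hypothetical. Object, TypeObject, type_call, shared_new and last_error are illustrative stand-ins, not the real Python.h structures or API. The args/kwds parameters are dropped for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the proposed slots; NOT the real Python.h
   structures, and the args/kwds parameters are omitted. */
typedef struct TypeObject TypeObject;

typedef struct {
    long ob_refcnt;
    TypeObject *ob_type;
} Object;

struct TypeObject {
    const char *tp_name;
    Object *(*tp_new)(TypeObject *type);
};

/* Stand-in for the interpreter's "current exception". */
static const char *last_error = NULL;

/* Model of the metatype's tp_call: calling a type object forwards to
   the tp_new slot of the type being called. */
static Object *
type_call(TypeObject *type)
{
    assert(type != NULL);
    if (type->tp_new == NULL) {
        last_error = "type is not callable";  /* real code raises here */
        return NULL;
    }
    return type->tp_new(type);
}

/* A tp_new that hands out one shared instance instead of a fresh one.
   It must still return a new reference, so it increments the count. */
static TypeObject SharedType;
static Object shared_instance = { 1, &SharedType };

static Object *
shared_new(TypeObject *type)
{
    (void)type;
    shared_instance.ob_refcnt++;
    return &shared_instance;
}

static TypeObject SharedType = { "shared", shared_new };

/* A type whose tp_new is NULL cannot be called. */
static TypeObject UncallableType = { "uncallable", NULL };
```

Two calls to type_call(&SharedType) return the same pointer with the reference count raised by one each time. Calling UncallableType returns NULL and records an error.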
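A self-contained C model can illustrate the three-step instantiation of a subtype described in the diff. The steps are: allocate the whole instance, initialize the base part, then initialize the subtype part. The same model shows the "inheritable" tp_new and tp_alloc rules. All names below are hypothetical: Object, TypeObject, generic_new, generic_alloc, BaseType and SubType. The structures are not the real Python.h ones, and args/kwds are reduced to a single long argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of the proposed slots; NOT the real Python.h
   structures.  The args/kwds pair is reduced to a single long argument. */
typedef struct TypeObject TypeObject;

typedef struct {
    long ob_refcnt;
    TypeObject *ob_type;
} Object;

struct TypeObject {
    const char *tp_name;
    size_t tp_basicsize;
    TypeObject *tp_base;
    Object *(*tp_new)(TypeObject *type, long arg);
    Object *(*tp_alloc)(TypeObject *type, long arg);
    int (*tp_init)(Object *self, long arg);
};

/* Inheritable tp_alloc: takes the size from the type passed in, zeroes
   everything, and sets only ob_refcnt and ob_type. */
static Object *
generic_alloc(TypeObject *type, long arg)
{
    Object *self = calloc(1, type->tp_basicsize);
    (void)arg;  /* only variable-size types would need the arguments */
    if (self == NULL)
        return NULL;
    self->ob_refcnt = 1;
    self->ob_type = type;
    return self;
}

/* Inheritable tp_new: calls the tp_alloc and tp_init of the type passed
   in, so a subtype that inherits it still gets its own size and init. */
static Object *
generic_new(TypeObject *type, long arg)
{
    Object *self = type->tp_alloc(type, arg);   /* step 1: allocate */
    if (self != NULL && type->tp_init(self, arg) < 0) {
        free(self);     /* real code would DECREF instead */
        return NULL;
    }
    return self;
}

/* Base type: the object header followed by one field. */
typedef struct {
    Object head;
    long count;
} BaseObject;

static int
base_init(Object *self, long arg)
{
    ((BaseObject *)self)->count = arg;
    return 0;
}

/* Subtype: the base structure comes first, so a pointer to it can be
   used wherever a BaseObject * is expected. */
typedef struct {
    BaseObject base;
    long doubled;
} SubObject;

static int
sub_init(Object *self, long arg)
{
    if (base_init(self, arg) < 0)           /* step 2: base part */
        return -1;
    ((SubObject *)self)->doubled = 2 * arg; /* step 3: subtype part */
    return 0;
}

static TypeObject BaseType = {
    "base", sizeof(BaseObject), NULL, generic_new, generic_alloc, base_init
};

/* The subtype inherits tp_new and tp_alloc, overriding only tp_init. */
static TypeObject SubType = {
    "sub", sizeof(SubObject), &BaseType, generic_new, generic_alloc, sub_init
};
```

Because generic_new and generic_alloc consult the type passed in rather than a fixed type, SubType can reuse both slots unchanged and still get an instance of the right size.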
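The diff states that tp_clear must be safe to call repeatedly and that tp_dealloc must work whether or not tp_clear has run. The usual way to meet both requirements is to check each owned pointer for NULL and reset it to NULL after releasing it. The sketch below shows that pattern on a toy container. Obj, obj_decref, Container, container_clear and container_dealloc are hypothetical names, and the structures are not real Python objects.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object standing in for a PyObject (hypothetical). */
typedef struct Obj {
    long refcnt;
} Obj;

static void
obj_decref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0)
        free(o);
}

/* A container instance: one owned reference plus a private heap block,
   similar to a list's separately allocated item array. */
typedef struct {
    Obj *item;      /* owned reference, may be NULL */
    char *buffer;   /* owned heap memory, may be NULL */
} Container;

/* tp_clear as proposed: release everything the instance owns except its
   own memory.  Each pointer is checked first and set to NULL, so a
   second call, or tp_dealloc afterwards, does nothing twice. */
static int
container_clear(Container *self)
{
    if (self->item != NULL) {
        Obj *tmp = self->item;
        self->item = NULL;      /* clear the field before releasing */
        obj_decref(tmp);
    }
    if (self->buffer != NULL) {
        free(self->buffer);
        self->buffer = NULL;
    }
    return 0;
}

/* tp_dealloc: correct whether or not tp_clear has already been called. */
static void
container_dealloc(Container *self)
{
    container_clear(self);
    free(self);
}
```

After container_clear the container is back in its zeroed, pristine state. This matches the requirement that the object may still be used after tp_clear.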
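For variable-size types, the diff sets the allocation size at tp_basicsize + n*tp_itemsize, rounded up to the next multiple of sizeof(PyObject *). That rule is plain integer arithmetic. In the sketch below, var_alloc_size is a hypothetical helper name. PyObject is declared only as an opaque stand-in so that sizeof(PyObject *) can be taken; it is not the real Python.h definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in; only the size of a pointer to it is used. */
typedef struct PyObject PyObject;

/* Bytes an inheritable tp_alloc would request for an instance with n
   items: basicsize + n*itemsize, rounded up to a pointer multiple. */
static size_t
var_alloc_size(size_t basicsize, size_t itemsize, size_t n)
{
    const size_t align = sizeof(PyObject *);
    size_t size;

    /* Guard against size_t overflow in basicsize + n*itemsize. */
    assert(itemsize == 0 || n <= (SIZE_MAX - basicsize) / itemsize);
    size = basicsize + n * itemsize;
    return (size + align - 1) / align * align;
}
```

For example, with 8-byte pointers, a basic size of 24 plus 3 one-byte items gives 27 bytes, which rounds up to 32.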