Another intermediate checkin. Removed a lot of lies about an older idea for what tp_alloc() should be.
2001-07-10 20:01:52 +00:00 · 2001-07-10 20:01:52 +00:00 · 14f1593cc7
parent 15299026e7
commit 14f1593cc7
1 changed file with 141 additions and 210 deletions
--- a/pep-0253.txt
+++ b/pep-0253.txt
@@ -18,11 +18,11 @@ Introduction

    Traditionally, types in Python have been created statically, by
    declaring a global variable of type PyTypeObject and initializing
-    it with a static initializer.  The fields in the type object
+    it with a static initializer.  The slots in the type object
    describe all aspects of a Python type that are relevant to the
-    Python interpreter.  A few fields contain dimensional information
+    Python interpreter.  A few slots contain dimensional information
    (like the basic allocation size of instances), others contain
-    various flags, but most fields are pointers to functions to
+    various flags, but most slots are pointers to functions to
    implement various kinds of behaviors.  A NULL pointer means that
    the type does not implement the specific behavior; in that case
    the system may provide a default behavior or raise an
@@ -74,7 +74,7 @@ Introduction
    For binary compatibility, a flag bit in the tp_flags slot
    indicates the existence of the various new slots in the type
    object introduced below.  Types that don't have the
-    Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags field are assumed
+    Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags slot are assumed
    to have NULL values for all the subtyping slots.  (Warning: the
    current implementation prototype is not yet consistent in its
    checking of this flag bit.  This should be fixed before the final
@@ -251,6 +251,12 @@ Making a type a factory for its instances
    dictionary an initial set of keys and values based on the
    arguments passed in.

+    Note that for immutable object types, the initialization cannot be
+    done by the tp_init() slot: this would provide the Python user
+    with a way to change the initialization.  Therefore, immutable
+    objects typically have an empty tp_init() implementation and do
+    all their initialization in their tp_new() slot.
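
A minimal sketch of the point about immutable types, using a toy model rather than the real Python C API. The names toy_int, toy_int_new() and toy_int_init() are hypothetical; they only illustrate why the value is fixed in the tp_new()-like step while the tp_init()-like step does nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical immutable object: its value is set once, at creation. */
typedef struct {
    int ob_refcnt;
    long value;
} toy_int;

/* Plays the role of tp_new(): allocates and fully initializes. */
static toy_int *toy_int_new(long value)
{
    toy_int *self = calloc(1, sizeof(*self));
    if (self == NULL)
        return NULL;
    self->ob_refcnt = 1;
    self->value = value;
    return self;
}

/* Plays the role of tp_init(): deliberately empty, so calling it
   again later cannot change the value of an existing object. */
static int toy_int_init(toy_int *self, long value)
{
    assert(self != NULL);
    (void)value;    /* ignored on purpose */
    return 0;
}
```

If toy_int_init() assigned to self->value, any caller able to invoke it a second time could mutate an object that is supposed to be immutable.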
+
    You may wonder why the tp_new() slot shouldn't call the tp_init()
    slot itself.  The reason is that in certain circumstances (like
    support for persistent objects), it is important to be able to
@@ -273,13 +279,13 @@ Making a type a factory for its instances

    There's a third slot related to object creation: tp_alloc().  Its
    responsibility is to allocate the memory for the object,
-    initialize the reference count and type pointer field, and
-    initialize the rest of the object to all zeros.  It should also
-    register the object with the garbage collection subsystem if the
-    type supports garbage collection.  This slot exists so that
-    derived types can override the memory allocation policy
-    (e.g. which heap is being used) separately from the initialization
-    code.  The signature is:
+    initialize the reference count (ob_refcnt) and the type pointer
+    (ob_type), and initialize the rest of the object to all zeros.  It
+    should also register the object with the garbage collection
+    subsystem if the type supports garbage collection.  This slot
+    exists so that derived types can override the memory allocation
+    policy (e.g. which heap is being used) separately from the
+    initialization code.  The signature is:

        PyObject *tp_alloc(PyTypeObject *type, int nitems)
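
The responsibilities listed for tp_alloc() can be sketched with a toy model. This is not the real implementation: toy_type, toy_object, toy_round_up() and toy_generic_alloc() are hypothetical names, garbage collection registration is left out, and the size rule (tp_basicsize + n*tp_itemsize, rounded up to a multiple of the pointer size) is taken from the text this commit removes further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sizing slots of a type object. */
typedef struct toy_type {
    const char *tp_name;
    size_t tp_basicsize;   /* size of the fixed part of an instance */
    size_t tp_itemsize;    /* size of one item, 0 for fixed-size types */
} toy_type;

/* Hypothetical stand-in for a variable-size object header. */
typedef struct toy_object {
    int ob_refcnt;
    toy_type *ob_type;
    int ob_size;           /* only set when tp_itemsize is nonzero */
} toy_object;

/* Round n up to the next multiple of the pointer size. */
static size_t toy_round_up(size_t n)
{
    size_t align = sizeof(void *);
    return (n + align - 1) / align * align;
}

/* Model of a tp_alloc() slot: allocate, zero, then fill in the header. */
static toy_object *toy_generic_alloc(toy_type *type, int nitems)
{
    assert(type != NULL && nitems >= 0);
    assert(type->tp_basicsize >= sizeof(toy_object));
    size_t size = toy_round_up(type->tp_basicsize
                               + (size_t)nitems * type->tp_itemsize);
    toy_object *op = calloc(1, size);   /* memory starts out all zeros */
    if (op == NULL)
        return NULL;
    op->ob_refcnt = 1;                  /* one reference, owned by the caller */
    op->ob_type = type;
    if (type->tp_itemsize != 0)
        op->ob_size = nitems;
    return op;
}
```

Because the function reads the sizes from the type passed in, rather than from a fixed base type, the same allocator can serve derived types with larger instance structures.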

@@ -294,8 +300,13 @@ Making a type a factory for its instances
    function of the base class must call the tp_alloc() slot of the
    type passed in as its first argument.  It is the tp_new()
    function's responsibility to calculate the number of items.  The
-    tp_alloc() slot will set the ob_size field of the new object if
-    the type->tp_itemsize field is nonzero.
+    tp_alloc() slot will set the ob_size member of the new object if
+    the type->tp_itemsize member is nonzero.
+
+    (Note: in certain debugging compilation modes, the type structure
+    used to have members named tp_alloc and tp_free already, serving as
+    counters for the number of allocations and deallocations.  These
+    are renamed to tp_allocs and tp_deallocs.)

    XXX The keyword arguments are currently not passed to tp_new();
    its kwds argument is always NULL.  This is a relic from a previous
@@ -304,189 +315,99 @@ Making a type a factory for its instances
    should check that the arguments are acceptable, because they may
    be called independently.

+    Standard implementations for tp_alloc() and tp_new() are
+    available.  PyType_GenericAlloc() allocates an object from the
+    standard heap and initializes it properly.  It uses the above
+    formula to determine the amount of memory to allocate, and takes
+    care of GC registration.  The only reason not to use this
+    implementation would be to allocate objects from a different heap
+    (as is done by some very small frequently used objects like ints
+    and tuples).  PyType_GenericNew() adds very little: it just calls
+    the type's tp_alloc() slot with zero for nitems.  But for mutable
+    types that do all their initialization in their tp_init() slot,
+    this may be just the ticket.

-Requirements for a type to allow subtyping

-    The simplest form of subtyping is subtyping in C.  It is the
-    simplest form because we can require the C code to be aware of the
-    various problems, and it's acceptable for C code that doesn't
-    follow the rules to dump core.  For added simplicity, it is
-    limited to single inheritance.
+Preparing a type for subtyping

    The idea behind subtyping is very similar to that of single
    inheritance in C++.  A base type is described by a structure
-    declaration plus a type object.  A derived type can extend the
-    structure (but must leave the names, order and type of the fields
+    declaration (similar to the C++ class declaration) plus a type
+    object (similar to the C++ vtable).  A derived type can extend the
+    structure (but must leave the names, order and type of the members
    of the base structure unchanged) and can override certain slots in
-    the type object, leaving others the same.
+    the type object, leaving others the same.  (Unlike C++ vtables,
+    all Python type objects have the same memory layout.)

-    Most issues have to do with construction and destruction of
-    instances of derived types.
+    The base type must do the following:

-    Creation of a new object is separated into allocation and
-    initialization: allocation allocates the memory, and
-    initialization fills it with appropriate initial values.  The
-    separation is needed for the convenience of subtypes.
-    Instantiation of a subtype goes as follows:
+    - Add the flag value Py_TPFLAGS_BASETYPE to tp_flags.
+    - Declare and use tp_new(), tp_alloc() and optional tp_init() slots.
+    - Declare and use tp_dealloc() and tp_free().
+    - Export its object structure declaration.
+    - Export a subtyping-aware type-checking macro.

-        1. allocate memory for the whole (subtype) instance
-        2. initialize the base type
-        3. initialize the subtype's instance variables
+    The requirements and signatures for tp_new(), tp_alloc() and
+    tp_init() have already been discussed above: tp_alloc() should
+    allocate the memory and initialize it to mostly zeros; tp_new()
+    should call the tp_alloc() slot and then proceed to do the
+    minimally required initialization; tp_init() should be used for
+    more extensive initialization of mutable objects.

-    If allocation and initialization were done by the same function,
-    you would need a way to tell the base type's constructor to
-    allocate additional memory for the subtype's instance variables,
-    and there would be no way to change the allocation method for a
-    subtype (without giving up on calling the base type to initialize
-    its part of the instance structure).
+    It should come as no surprise that there are similar conventions
+    at the end of an object's lifetime.  The slots involved are
+    tp_dealloc() (familiar to all who have ever implemented a Python
+    extension type) and tp_free(), the new kid on the block.  (The
+    names aren't quite symmetric; tp_free() corresponds to tp_alloc(),
+    which is fine, but tp_dealloc() corresponds to tp_new().  Maybe
+    the tp_dealloc slot should be renamed?)

-    A similar reasoning applies to destruction: if a subtype changes
-    the instance allocator (for example to use a different heap), it
-    must also change the instance deallocator; but it must still call
-    on the base type's destructor to DECREF the base type's instance
-    variables.
+    The tp_free() slot should be used to free the memory and
+    unregister the object with the garbage collection subsystem, and
+    can be overridden by a derived class; tp_dealloc() should
+    deinitialize the object (e.g. by calling Py_XDECREF() for various
+    sub-objects) and then call tp_free() to deallocate the memory.
+    The signature for tp_dealloc() is the same as it always was:

-    In this proposal, I assign stricter meanings to two existing
-    slots for deallocation and deinitialization, and I add two new
-    slots for allocation and initialization.
+        void tp_dealloc(PyObject *object)

-    The tp_clear slot gets the new task of deinitializing an object so
-    that all that remains to be done is free its memory.  Originally,
-    all it had to do was clear object references.  The difference is
-    subtle: the list and dictionary objects contain references to an
-    additional heap-allocated piece of memory that isn't freed by
-    tp_clear in Python 2.1, but which must be freed by tp_clear under
-    this proposal. It should be safe to call tp_clear repeatedly on
-    the same object.  If an object contains no references to other
-    objects or heap-allocated memory, the tp_clear slot may be NULL.
+    The signature for tp_free() is the same:

-    The only additional requirement for the tp_dealloc slot is that it
-    should do the right thing whether or not tp_clear has been called.
+        void tp_free(PyObject *object)

-    The new slots are tp_alloc for allocation and tp_init for
-    initialization.  Their signatures:
+    (In a previous version of this PEP, there was also a role reserved
+    for the tp_clear() slot.  This turned out to be a bad idea.)
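
The division of labour between tp_dealloc() and tp_free() described above can be sketched as follows. This is a toy model with hypothetical names (toy_type, toy_object, toy_dealloc(), toy_default_free(), toy_pool_free()), not the real API; unregistering from the garbage collector is omitted, and the two counters exist only so the example can be checked.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_object toy_object;

/* Hypothetical stand-in for the two destruction slots. */
typedef struct toy_type {
    void (*tp_dealloc)(toy_object *);  /* deinitialize, then call tp_free */
    void (*tp_free)(toy_object *);     /* release the memory only */
} toy_type;

struct toy_object {
    int ob_refcnt;
    toy_type *ob_type;
    toy_object *item;      /* one owned sub-object, may be NULL */
};

static int toy_default_free_calls = 0;  /* instrumentation only */
static int toy_pool_free_calls = 0;     /* instrumentation only */

static void toy_decref(toy_object *op);

/* Frees memory that came from the standard heap. */
static void toy_default_free(toy_object *op)
{
    toy_default_free_calls++;
    free(op);
}

/* What a derived type with its own allocator might install instead. */
static void toy_pool_free(toy_object *op)
{
    toy_pool_free_calls++;
    free(op);   /* a real pool would return the block to its free list */
}

/* Releases sub-objects, then defers to the tp_free of the object's type. */
static void toy_dealloc(toy_object *op)
{
    toy_object *item = op->item;
    op->item = NULL;
    if (item != NULL)
        toy_decref(item);
    op->ob_type->tp_free(op);
}

static void toy_decref(toy_object *op)
{
    assert(op != NULL && op->ob_refcnt > 0);
    if (--op->ob_refcnt == 0)
        op->ob_type->tp_dealloc(op);
}

/* Creates an object; takes over the caller's reference to item. */
static toy_object *toy_new(toy_type *type, toy_object *item)
{
    toy_object *op = calloc(1, sizeof(*op));
    if (op == NULL)
        return NULL;
    op->ob_refcnt = 1;
    op->ob_type = type;
    op->item = item;
    return op;
}
```

The derived type reuses toy_dealloc() unchanged and only swaps the tp_free entry, which is the kind of override the text has in mind for a different heap.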

-        PyObject *tp_alloc(PyTypeObject *type,
-                           PyObject *args,
-                           PyObject *kwds)
-
-        int tp_init(PyObject *self,
-                    PyObject *args,
-                    PyObject *kwds)
-
-    [XXX We'll have to rename tp_alloc to something else, because in
-    debug mode there's already a tp_alloc field.]
-
-    The arguments for tp_alloc are the same as for tp_new, described
-    above.  The arguments for tp_init are the same except that the
-    first argument is replaced with the instance to be initialized.
-    Its return value is 0 for success or -1 for failure.
-
-    It is possible that tp_init is called more than once or not at
-    all.  The implementation should allow this usage.  The object may
-    be non-functional until tp_init is called, and a second call to
-    tp_init may raise an exception, but it should not be possible to
-    cause a core dump or memory leakage this way.
-
-    Because tp_init is in a sense optional, tp_alloc is required to do
-    *some* initialization of the object.  It must initialize ob_refcnt
-    to 1 and ob_type to its type argument.  It should zero out the
-    rest of the object.
-
-    The constructor arguments are passed to tp_alloc so that for
-    variable-size objects (like tuples and strings) it knows to
-    allocate the right amount of memory.
-
-    For immutable types, tp_alloc may have to do the full
-    initialization; otherwise, different calls to tp_init might cause
-    an immutable object to be modified, which is considered a grave
-    offense in Python (unlike in Fortran :-).
-
-    Not every type can serve as a base type.  The assumption is made
-    that if a type has a non-NULL value in its tp_init slot, it is
-    ready to be subclassed; otherwise, it is not, and using it as a
-    base class will raise an exception.
-
-    In order to be usefully subtyped in C, a type must also export the
+    In order to be usefully subtyped in C, a type must export the
    structure declaration for its instances through a header file, as
    it is needed in order to derive a subtype.  The type object for
    the base type must also be exported.

    If the base type has a type-checking macro (like PyDict_Check()),
-    this macro probably should be changed to recognize subtypes.  This
-    can be done by using the new PyObject_TypeCheck(object, type)
-    macro, which calls a function that follows the base class links.
+    this macro should be made to recognize subtypes.  This can be done
+    by using the new PyObject_TypeCheck(object, type) macro, which
+    calls a function that follows the base class links.

-    (An argument against changing the type-checking macro could be
-    that the type check is used frequently and a function call would
-    slow things down too much, but I find this hard to believe.  One
-    could also fear that a subtype might break an invariant assumed by
-    the support functions of the base type.  Usually it is best to
-    change the base type to remove this reliance, at least to the
-    point of raising an exception rather than dumping core when the
-    invariant is broken.)
+    The PyObject_TypeCheck() macro contains a slight optimization: it
+    first compares object->ob_type directly to the type argument, and
+    if this is a match, bypasses the function call.  This should make
+    it fast enough for most situations.
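
A minimal sketch of the fast path just described, again as a toy model. TOY_TYPE_CHECK, toy_is_subtype() and the toy structures are hypothetical names chosen for illustration; the counter exists only to show when the function call is actually made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a type object with a base class link. */
typedef struct toy_type {
    const char *tp_name;
    struct toy_type *tp_base;   /* NULL for a root type */
} toy_type;

typedef struct {
    int ob_refcnt;
    toy_type *ob_type;
} toy_object;

static int toy_slow_path_calls = 0;   /* instrumentation only */

/* Slow path: follow the base class links from a until b is found. */
static int toy_is_subtype(const toy_type *a, const toy_type *b)
{
    assert(b != NULL);
    toy_slow_path_calls++;
    for (; a != NULL; a = a->tp_base) {
        if (a == b)
            return 1;
    }
    return 0;
}

/* Fast path first: an exact type match skips the function call.
   Note that op is evaluated twice, so it must not have side effects. */
#define TOY_TYPE_CHECK(op, tp) \
    ((op)->ob_type == (tp) || toy_is_subtype((op)->ob_type, (tp)))
```

Instances of the exact base type, which are presumably the common case, never leave the macro; only subtype instances and mismatches pay for the walk along tp_base.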

-    Here are the interactions between tp_alloc, tp_clear, tp_dealloc
-    and subtypes; all assuming that the base type defines tp_init
-    (otherwise it cannot be subtyped anyway):
-
-    - If the base type's allocation scheme doesn't use the standard
-      heap, it should not define tp_alloc.  This is a signal for the
-      subclass to provide its own tp_alloc *and* tp_dealloc
-      implementation (probably using the standard heap).
-
-    - If the base type's tp_dealloc does anything besides calling
-      PyObject_DEL() (typically, calling Py_XDECREF() on contained
-      objects or freeing dependent memory blocks), it should define a
-      tp_clear that does the same without calling PyObject_DEL(), and
-      which checks for zero pointers before and zeros the pointers
-      afterwards, so that calling tp_clear more than once or calling
-      tp_dealloc after tp_clear will not attempt to DECREF or free the
-      same object/memory twice.  (It should also be allowed to
-      continue using the object after tp_clear -- tp_clear should
-      simply reset the object to its pristine state.)
-
-    - If the derived type overrides tp_alloc, it should also override
-      tp_dealloc, and tp_dealloc should call the derived type's
-      tp_clear if non-NULL (or its own tp_clear).
-
-    - If the derived type overrides tp_clear, it should call the base
-      type's tp_clear if non-NULL.
-
-    - If the base type defines tp_init as well as tp_new, its tp_new
-      should be inheritable: it should call the tp_alloc and the
-      tp_init of the type passed in as its first argument.
-
-    - If the base type defines tp_init as well as tp_alloc, its
-      tp_alloc should be inheritable: it should look in the
-      tp_basicsize slot of the type passed in for the amount of memory
-      to allocate, and it should initialize all allocated bytes to
-      zero.
-
-    - For types whose tp_itemsize is nonzero, the allocation size used
-      in tp_alloc should be tp_basicsize + n*tp_itemsize, rounded up
-      to the next integral multiple of sizeof(PyObject *), where n is
-      the number of items determined by the arguments to tp_alloc.
-
-    - Things are further complicated by the garbage collection API.
-      This affects tp_basicsize, and the actions to be taken by
-      tp_alloc.  tp_alloc should look at the Py_TPFLAGS_GC flag bit in
-      the tp_flags field of the type passed in, and not assume that
-      this is the same as the corresponding bit in the base type.  (In
-      part, the GC API is at fault; Neil Schemenauer has a patch that
-      fixes the API, but it is currently backwards incompatible.)
-
-    Note: the rules here are very complicated -- probably too
-    complicated.  It may be better to give up on subtyping immutable
-    types, types with custom allocators, and types with variable size
-    allocation (such as int, string and tuple) -- then the rules can
-    be much simplified because you can assume allocation on the
-    standard heap, no requirement beyond zeroing memory in tp_alloc,
-    and no variable length allocation.
+    Note that this change in the type-checking macro means that C
+    functions that require an instance of the base type may be invoked
+    with instances of the derived type.  Before enabling subtyping of
+    a particular type, its code should be checked to make sure that
+    this won't break anything.


 Creating a subtype of a built-in type in C

+    The simplest form of subtyping is subtyping in C.  It is the
+    simplest form because we can require the C code to be aware of
+    some of the problems, and it's acceptable for C code that doesn't
+    follow the rules to dump core.  For added simplicity, it is
+    limited to single inheritance.
+
    Let's assume we're deriving from a mutable base type whose
    tp_itemsize is zero.  The subtype code is not GC-aware, although
    it may inherit GC-awareness from the base type (this is
@@ -501,85 +422,95 @@ Creating a subtype of a built-in type in C
        int state;
    } spamlistobject;

-    Note that the base type structure field (here PyListObject) must
-    be the first field in the structure; any following fields are
-    extension fields.  Also note that the base type is not referenced
-    via a pointer; the actual contents of its structure must be
-    included! (The goal is for the memory lay out of the beginning of
-    the subtype instance to be the same as that of the base type
+    Note that the base type structure member (here PyListObject) must
+    be the first member of the structure; any following members are
+    additions.  Also note that the base type is not referenced via a
+    pointer; the actual contents of its structure must be included!
+    (The goal is for the memory layout of the beginning of the
+    subtype instance to be the same as that of the base type
    instance.)

    Next, the derived type must declare a type object and initialize
    it.  Most of the slots in the type object may be initialized to
    zero, which is a signal that the base type slot must be copied
-    into it.  Some fields that must be initialized properly:
+    into it.  Some slots that must be initialized properly:

    - The object header must be filled in as usual; the type should be
      &PyType_Type.

-    - The tp_basicsize field must be set to the size of the subtype
+    - The tp_basicsize slot must be set to the size of the subtype
      instance struct (in the above example: sizeof(spamlistobject)).

-    - The tp_base field must be set to the address of the base type's
+    - The tp_base slot must be set to the address of the base type's
      type object.

-    - If the derived slot defines any pointer fields, the tp_dealloc
+    - If the derived type defines any pointer members, the tp_dealloc
      slot function requires special attention, see below; otherwise,
      it can be set to zero, to inherit the base type's deallocation
      function.

-    - The tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT
+    - The tp_flags slot must be set to the usual Py_TPFLAGS_DEFAULT
      value.

-    - The tp_name field must be set; it is recommended to set tp_doc
+    - The tp_name slot must be set; it is recommended to set tp_doc
      as well (these are not inherited).

-    Exception: if the subtype defines no additional fields in its
-    structure (it only defines new behavior, no new data), the
-    tp_basicsize and the tp_dealloc fields may be set to zero.
-
-    In order to complete the initialization of the type,
-    PyType_InitDict() must be called.  This replaces zero slots in the
-    subtype with the value of the corresponding base type slots.  (It
-    also fills in tp_dict, the type's dictionary, and does various
-    other initializations necessary for type objects.)
+    If the subtype defines no additional structure members (it only
+    defines new behavior, no new data), the tp_basicsize and the
+    tp_dealloc slots may be left set to zero.

    The subtype's tp_dealloc slot deserves special attention.  If the
-    derived type defines no additional pointers that need to be
+    derived type defines no additional pointer members that need to be
    DECREF'ed or freed when the object is deallocated, it can be set
-    to zero.  Otherwise, the subtype's deallocation function must call
-    Py_XDECREF() for any PyObject * fields and the correct memory
+    to zero.  Otherwise, the subtype's tp_dealloc() function must call
+    Py_XDECREF() for any PyObject * members and the correct memory
    freeing function for any other pointers it owns, and then call the
-    base class's tp_dealloc slot.  Because deallocation functions
-    typically are not exported, this call has to be made via the base
-    type's type structure, for example, when deriving from the
+    base class's tp_dealloc() slot.  This call has to be made via the
+    base type's type structure, for example, when deriving from the
    standard list type:

        PyList_Type.tp_dealloc(self);

-    (If the subtype uses a different allocation heap than the base
-    type, the subtype must call the base type's tp_clear() slot
-    instead, followed by a call to free the object's memory from the
-    appropriate heap, such as PyObject_DEL(self) if the subtype uses
-    the standard heap.  But in this case subtyping is not
-    recommended.)
+    If the subtype wants to use a different allocation heap than the
+    base type, the subtype must override both the tp_alloc() and the
+    tp_free() slots.  These will be called by the base class's
+    tp_new() and tp_dealloc() slots, respectively.
+
+    In order to complete the initialization of the type,
+    PyType_InitDict() must be called.  This replaces slots initialized
+    to zero in the subtype with the value of the corresponding base
+    type slots.  (It also fills in tp_dict, the type's dictionary, and
+    does various other initializations necessary for type objects.)

    A subtype is not usable until PyType_InitDict() is called for it;
    this is best done during module initialization, assuming the
    subtype belongs to a module.  An alternative for subtypes added to
    the Python core (which don't live in a particular module) would be
    to initialize the subtype in their constructor function.  It is
-    allowed to call PyType_InitDict() more than once, the second and
+    allowed to call PyType_InitDict() more than once; the second and
    further calls have no effect.  In order to avoid unnecessary
    calls, a test for tp_dict==NULL can be made.

-    To create a subtype instance, the base type's tp_alloc slot must
-    be called with the subtype as its first argument.  Then, if the
-    base type has a tp_init slot, that must be called to initialize
-    the base portion of the instance; finally the subtype's own fields
-    must be initialized.  After allocation, the initialization can
-    also be done by calling the subtype's tp_init slot, assuming this
-    correctly calls its base type's tp_init slot.
+    (During initialization of the Python interpreter, some types are
+    actually used before they are initialized.  As long as the slots
+    that are actually needed are initialized, especially tp_dealloc,
+    this works, but it is fragile and not recommended as a general
+    practice.)
+
+    To create a subtype instance, the subtype's tp_new() slot is
+    called.  This should first call the base type's tp_new() slot and
+    then initialize the subtype's additional data members.  To further
+    initialize the instance, the tp_init() slot is typically called.
+    Note that the tp_new() slot should *not* call the tp_init() slot;
+    this is up to tp_new()'s caller (typically a factory function).
+    There are circumstances where it is appropriate not to call
+    tp_init().
+
+    If a subtype defines a tp_init() slot, the tp_init() slot should
+    normally first call the base type's tp_init() slot.
+
+    (XXX There should be a paragraph or two about argument passing
+    here.)
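
The creation sequence for a subtype instance can be sketched with a toy model. Everything here is hypothetical (toy_type, toy_list, toy_spamlist, toy_call() and the int argument standing in for args and kwds); it only illustrates the calling order: the subtype's tp_new() calls the base type's tp_new() with the subtype as argument, the base tp_new() uses the tp_alloc() of the type passed in, and tp_init() is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_type toy_type;
typedef struct { int ob_refcnt; toy_type *ob_type; } toy_object;

/* Hypothetical stand-in for the creation slots of a type object. */
struct toy_type {
    size_t tp_basicsize;
    toy_type *tp_base;
    toy_object *(*tp_alloc)(toy_type *, int nitems);
    toy_object *(*tp_new)(toy_type *);
    int (*tp_init)(toy_object *, int arg);
};

/* The base layout, and a subtype that embeds it as its first member. */
typedef struct { toy_object head; int length; } toy_list;
typedef struct { toy_list list; int state; } toy_spamlist;

static toy_object *toy_alloc(toy_type *type, int nitems)
{
    (void)nitems;   /* fixed-size types only in this sketch */
    assert(type->tp_basicsize >= sizeof(toy_object));
    toy_object *op = calloc(1, type->tp_basicsize);
    if (op != NULL) {
        op->ob_refcnt = 1;
        op->ob_type = type;
    }
    return op;
}

/* Base tp_new(): allocates via the type passed in, not via toy_list_type. */
static toy_object *toy_list_new(toy_type *type)
{
    return type->tp_alloc(type, 0);
}

static int toy_list_init(toy_object *self, int arg)
{
    ((toy_list *)self)->length = arg;
    return 0;
}

static toy_type toy_list_type = {
    sizeof(toy_list), NULL, toy_alloc, toy_list_new, toy_list_init
};

/* Subtype tp_new(): base tp_new() first, then its own members.
   It does not call tp_init(); that is up to the caller. */
static toy_object *toy_spamlist_new(toy_type *type)
{
    toy_object *self = toy_list_type.tp_new(type);
    if (self != NULL)
        ((toy_spamlist *)self)->state = -1;
    return self;
}

/* Subtype tp_init(): calls the base type's tp_init() first. */
static int toy_spamlist_init(toy_object *self, int arg)
{
    if (toy_list_type.tp_init(self, arg) < 0)
        return -1;
    ((toy_spamlist *)self)->state = 0;
    return 0;
}

static toy_type toy_spamlist_type = {
    sizeof(toy_spamlist), &toy_list_type, toy_alloc,
    toy_spamlist_new, toy_spamlist_init
};

/* Hypothetical factory: the caller that pairs tp_new() with tp_init(). */
static toy_object *toy_call(toy_type *type, int arg)
{
    toy_object *self = type->tp_new(type);
    if (self != NULL && type->tp_init != NULL
        && type->tp_init(self, arg) < 0) {
        free(self);
        return NULL;
    }
    return self;
}
```

Calling toy_spamlist_type.tp_new() directly yields an allocated but only minimally initialized instance, which corresponds to the circumstances mentioned above where tp_init() is intentionally skipped.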


 Subtyping in Python