Add a lot of text. A looooooot of text. Way too much rambling. And

it isn't even finished. I'll do that later. But at least there's some text here now...
2001-05-15 01:36:46 +00:00 · 2001-05-15 01:36:46 +00:00 · 8ce73a97d3
parent d1e2189144
commit 8ce73a97d3
1 changed files with 358 additions and 3 deletions
--- a/pep-0253.txt
+++ b/pep-0253.txt
@@ -11,11 +11,366 @@ Post-History:
 Abstract

    This PEP proposes ways for creating subtypes of existing built-in
-    types, either in C or in Python.
+    types, either in C or in Python.  The text is currently long and
+    rambling; I'll go over it again later to make it shorter.

-Introduction
+    Traditionally, types in Python have been created statically, by
+    declaring a global variable of type PyTypeObject and initializing
+    it with a static initializer.  The fields in the type object
+    describe all aspects of a Python object that are relevant to the
+    Python interpreter.  A few fields contain dimensional information
+    (e.g. the basic allocation size of instances), others contain
+    various flags, but most fields are pointers to functions to
+    implement various kinds of behaviors.  A NULL pointer means that
+    the type does not implement the specific behavior; in that case
+    the system may provide a default behavior or raise an
+    exception when the behavior is invoked.  Some collections of
+    function pointers that are usually defined together are obtained
+    indirectly via a pointer to an additional structure containing
+    them.

-    [XXX to be done.]
+    While the details of initializing a PyTypeObject structure haven't
+    been documented as such, they are easily gleaned from the examples
+    in the source code, and I am assuming that the reader is
+    sufficiently familiar with the traditional way of creating new
+    Python types in C.
+
+    This PEP will introduce the following optional features to types:
+
+    - create an instance of a type by calling it
+
+    - create a subtype in C by specifying a base type pointer
+
+    - create a subtype in Python using a class statement
+
+    - multiple inheritance
+
+    This PEP builds on PEP 252, which adds standard introspection to
+    types; in particular, types are assumed to have e.g. a __hash__
+    method when the type object defines the tp_hash slot.  PEP 252 also
+    adds a dictionary to type objects which contains all methods.  At
+    the Python level, this dictionary is read-only; at the C level, it
+    is accessible directly (but modifying it is not recommended except
+    as part of initialization).
+
+
+Metatypes
+
+    Inevitably the following discussion will come to mention metatypes
+    (or metaclasses).  Metatypes are nothing new in Python: Python has
+    always been able to talk about the type of a type:
+
+    >>> a = 0
+    >>> type(a)
+    <type 'int'>
+    >>> type(type(a))
+    <type 'type'>
+    >>> type(type(type(a)))
+    <type 'type'>
+    >>> 
+
+    In this example, type(a) is a "regular" type, and type(type(a)) is
+    a metatype.  While as distributed all types have the same metatype
+    (which is also its own metatype), this is not a requirement, and
+    in fact a useful 3rd party extension (ExtensionClasses by Jim
+    Fulton) creates an additional metatype.  A related feature is the
+    "Don Beaudry hook", which says that if a metatype is callable, its
+    instances (which are regular types) can be subclassed (really
+    subtyped) using a Python class statement.  We will use this rule
+    to support subtyping of built-in types, and in the process we will
+    introduce some additional metatypes, and a "metametatype". (The
+    metametatype is nothing unusual; Python's type system allows any
+    number of metalevels.)
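
The metalevels described above can be sketched in present-day Python 3. This is an illustration only: the `metaclass=` keyword postdates this text, and the names Meta, A and B are hypothetical.

```python
# Sketch in modern Python 3; Meta, A and B are illustrative names only.
class Meta(type):
    """An additional metatype, itself an instance of the default metatype."""


class A(metaclass=Meta):
    """A regular type whose metatype is Meta."""


class B(A):
    """Subclassing A with a class statement reuses A's metatype."""


# Walk up the metalevels: type of B, type of that, and so on.
levels = [type(B), type(type(B)), type(type(type(B)))]
print([t.__name__ for t in levels])  # ['Meta', 'type', 'type']
```

The default metatype is its own metatype, so the chain stops growing after the second step.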
+
+    Note that Python uses the concept of metatypes or metaclasses in a
+    different way than Smalltalk.  In Smalltalk-80, there is a
+    hierarchy of metaclasses that mirrors the hierarchy of regular
+    classes, metaclasses map 1-1 to classes (except for some funny
+    business at the root of the hierarchy), and each class statement
+    creates both a regular class and its metaclass, putting class
+    methods in the metaclass and instance methods in the regular
+    class.
+
+    Nice though this may be in the context of Smalltalk, it's not
+    compatible with the traditional use of metatypes in Python, and I
+    prefer to continue in the Python way.  This means that Python
+    metatypes are typically written in C, and may be shared between
+    many regular types. (It will be possible to subtype metatypes in
+    Python, so it won't be absolutely necessary to write C in order to
+    use metatypes; but the power of Python metatypes will be limited,
+    e.g. Python code will never be allowed to allocate raw memory and
+    initialize it at will.)
+
+
+Instantiation by calling the type object
+
+    Traditionally, for each type there is at least one C function that
+    creates instances of the type.  This function has to take care of
+    both allocating memory for the object and initializing that
+    memory.  As of Python 2.0, it also has to interface with the
+    garbage collection subsystem, if the type chooses to participate
+    in garbage collection (which is optional, but strongly recommended
+    for so-called "container" types: types that may contain arbitrary
+    references to other objects, and hence may participate in
+    reference cycles).
+
+    If we're going to implement subtyping, we must separate allocation
+    and initialization: typically, the most derived subtype is in
+    charge of allocation (and hence deallocation!), but in most cases
+    each base type's initializer (constructor) must still be called,
+    from the "most base" type to the most derived type.
+
+    But let's first get the interface for instantiation right.  If we
+    call an object, the tp_call slot of its type gets invoked.  Thus,
+    if we call a type, this invokes the tp_call slot of the type's
+    type: in other words, the tp_call slot of the metatype.
+    Traditionally this has been a NULL pointer, meaning that types
+    can't be called.  Now we're adding a tp_call slot to the metatype,
+    which makes all types "callable" in a trivial sense.  But
+    obviously the metatype's tp_call implementation doesn't know how
+    to initialize individual types.  So the type defines a new slot,
+    tp_construct, which is invoked by the metatype's tp_call slot.  If
+    the tp_construct slot is NULL, the metatype's tp_call issues a
+    nice error message: the type isn't callable.
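
The dispatch described above can be approximated in modern Python 3, where `__call__` on a metatype plays the role of the metatype's tp_call slot. The `construct` hook below is a hypothetical stand-in for the proposed tp_construct slot, not an existing API.

```python
# Sketch in modern Python 3. The metatype's tp_call is modeled by
# Meta.__call__; "construct" is a hypothetical stand-in for tp_construct.
class Meta(type):
    def __call__(cls, *args, **kwargs):
        # Calling a type lands here: in the metatype, not in the type.
        construct = getattr(cls, "construct", None)
        if construct is None:
            raise TypeError(f"type {cls.__name__!r} isn't callable")
        return construct(*args, **kwargs)


class Point(metaclass=Meta):
    @classmethod
    def construct(cls, x, y):
        self = object.__new__(cls)  # allocate
        self.x, self.y = x, y       # initialize
        return self


class Opaque(metaclass=Meta):
    pass  # no construct hook, so calling Opaque raises TypeError
```

Calling `Point(1, 2)` goes through `Meta.__call__`, which delegates to the type's own hook; `Opaque()` fails with the error message issued by the metatype.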
+
+    We already know that tp_construct is responsible for initializing
+    the object (this will be important for subtyping too).  Who should
+    be responsible for allocation of the new object? Either the
+    metatype's tp_call can allocate the object, or the type's
+    tp_construct can allocate it.  The solution is copied from typical
+    C++ implementations: if the metatype's tp_call allocates storage
+    for the object it passes the storage as a pointer to the type's
+    tp_construct; if the metatype's tp_call does not allocate storage,
+    it passes a NULL pointer to the type's tp_construct, in which case the
+    type allocates the storage itself.  This moves the policy decision
+    to the metatype, and different metatypes may have different
+    policies.  The mechanisms are fixed though: either the metatype's
+    tp_call allocates storage, or the type's tp_construct allocates.
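
The two allocation policies can be modeled with a toy in which a dict stands in for raw storage and None for a NULL pointer. All function names here are hypothetical; the sketch only shows where the policy decision lives.

```python
# Toy model: a dict stands in for raw storage, None for a NULL pointer.
# All names here are hypothetical.
def point_construct(storage, x, y):
    """Type-level constructor: allocate only if no storage was passed in."""
    if storage is None:
        storage = {"allocated_by": "type"}
    storage["x"], storage["y"] = x, y
    return storage


def preallocating_call(construct, *args):
    """Metatype policy 1: the metatype allocates, the type initializes."""
    return construct({"allocated_by": "metatype"}, *args)


def deferring_call(construct, *args):
    """Metatype policy 2: pass NULL and let the type allocate."""
    return construct(None, *args)


a = preallocating_call(point_construct, 1, 2)
b = deferring_call(point_construct, 3, 4)
```

The same constructor serves both policies; only the caller decides who allocates.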
+
+    The deallocation mechanism chosen should match the allocation
+    mechanism: an allocation policy should prescribe both the
+    allocation and deallocation mechanism.  And again, planning ahead
+    for subtyping would be nice.  But the available mechanisms are
+    different.  The deallocation function has always been part of the
+    type structure, as tp_dealloc, which combines the
+    "uninitialization" with deallocation.  This was good enough for
+    the traditional situation, where it matched the combined
+    allocation and initialization of the creation function.  But now
+    imagine a type whose creation function uses a special free list
+    for allocation.  Its deallocation function puts the object's
+    memory back on the same free list.  But when allocation and
+    initialization are separate, the object may have been allocated from the
+    regular heap, and it would be wrong (in some cases disastrous) if
+    it were placed on the free list by the deallocation function.
+
+    A solution would be for the tp_construct function to somehow mark
+    whether the object was allocated from the special free list, so
+    that the tp_dealloc function can choose the right deallocation
+    method (assuming that the only two alternatives are a special free
+    list or the regular heap).  A variant that doesn't require space
+    for an allocation flag bit would be to have two type objects,
+    identical in the contents of all their slots except for their
+    deallocation slot.  But this requires that all type-checking code
+    (e.g. the PyDict_Check() macro) recognizes both types.  We'll come back
+    to this solution in the context of subtyping.  Another alternative
+    is to require the metatype's tp_call to leave the allocation to
+    the tp_construct method, by passing in a NULL pointer.  But this
+    doesn't work once we allow subtyping.
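
The first option, marking where the storage came from, might look like the following toy model. The flag name and both lists are assumptions made for illustration; dicts again stand in for raw memory.

```python
# Toy model of the "allocation flag" idea; all names are hypothetical.
free_list = []   # the type's private pool of recycled storage
heap_frees = []  # records storage handed back to the regular heap


def construct(storage=None):
    """Allocate from the free list unless storage was preallocated."""
    if storage is None:
        storage = free_list.pop() if free_list else {}
        storage["from_free_list"] = True
    else:
        storage["from_free_list"] = False
    return storage


def dealloc(storage):
    """Return storage to wherever it came from, guided by the flag."""
    if storage.pop("from_free_list"):
        free_list.append(storage)
    else:
        heap_frees.append(storage)
```

Storage that the constructor allocated itself returns to the free list, while preallocated storage goes back to the heap, which avoids the mismatch described above.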
+
+    Eventually, when we add any form of subtyping, we'll have to
+    separate deallocation from uninitialization.  The way to do this
+    is to add a separate slot to the type object that does the
+    uninitialization without the deallocation.  Fortunately, there is
+    already such a slot: tp_clear, currently used by the garbage
+    collection subsystem.  A simple rule makes this slot reusable as
+    an uninitialization slot: for types that support separate allocation
+    and initialization, tp_clear must be defined (even if the object
+    doesn't support garbage collection) and it must DECREF all
+    contained objects and FREE all other memory areas the object owns.
+    It must also be reentrant: it must be possible to clear an already
+    cleared object.  The easiest way to do this is to replace all
+    pointers DECREFed or FREEd with NULL pointers.
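
The reentrancy rule can be illustrated with a small model in which None plays the role of a NULL pointer and a list records what would be DECREFed or FREEd. The class and attribute names are hypothetical.

```python
# Toy model of a reentrant clear; names are hypothetical.
class Node:
    def __init__(self, payload, child):
        self.payload = payload
        self.child = child
        self.released = []  # stands in for DECREF/FREE calls

    def clear(self):
        """Drop owned references; safe to call on an already cleared object."""
        for name in ("payload", "child"):
            value = getattr(self, name)
            if value is not None:
                setattr(self, name, None)  # NULL out the field first
                self.released.append(value)
```

A second call to `clear()` finds only None fields and releases nothing, so each owned reference is released exactly once.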
+
+
+Subtyping in C
+
+    The simplest form of subtyping is subtyping in C.  It is the
+    simplest form because we can require the C code to be aware of the
+    various problems, and it's acceptable for C code that doesn't
+    follow the rules to dump core; while for Python subtyping we would
+    need to catch all errors before they become core dumps.
+
+    The idea behind subtyping is very similar to that of single
+    inheritance in C++.  A base type is described by a structure
+    declaration plus a type object.  A derived type can extend the
+    structure (but must leave the names, order and type of the fields
+    of the base structure unchanged) and can override certain slots in
+    the type object, leaving others the same.
+
+    Not every type can serve as a base type.  The base type must
+    support separation of allocation and initialization by having a
+    tp_construct slot that can be called with a preallocated object,
+    and it must support uninitialization without deallocation by
+    having a tp_clear slot as described above.  The base type must
+    also export the structure declaration for its instances through a
+    header file, as it is needed in order to derive a subtype.  The
+    type object for the base type must also be exported.
+
+    If the base type has a type-checking macro (e.g. PyDict_Check()),
+    this macro may be changed to recognize subtypes.  This can be done
+    by using the new PyObject_TypeCheck(object, type) macro, which
+    calls a function that follows the base class links.  There are
+    arguments for and against changing the type-checking macro in this
+    way.  The argument for the change should be clear: it allows
+    subtypes to be used in places where the base type is required,
+    which is often the prime attraction of subtyping (as opposed to
+    sharing implementation).  An argument against changing the
+    type-checking macro could be that the type check is used
+    frequently and a function call would slow things down too much
+    (hard to believe); or one could fear that a subtype might break an
+    invariant assumed by the support functions of the base type.
+    Sometimes it would be wise to change the base type to remove this
+    reliance; other times, it would be better to require that derived
+    types (implemented in C) maintain the invariants.
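
A check that follows the base class links can be sketched in modern Python 3 using the `__base__` attribute of type objects. The function below is an illustration of the idea for single inheritance, not the proposed C macro; `type_check` and `SpamList` are hypothetical names.

```python
# Sketch of a subtype-aware type check for single inheritance.
# type_check and SpamList are hypothetical names.
def type_check(obj, tp):
    """Return True if obj's type is tp or derives from it."""
    t = type(obj)
    while t is not None:
        if t is tp:
            return True
        t = t.__base__  # follow the base class link; None past object
    return False


class SpamList(list):
    pass
```

With this check, a `SpamList` instance is accepted wherever a list is required, while a plain list is not accepted as a `SpamList`.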
+
+    The derived type begins by declaring a type structure which
+    contains the base type's structure.  For example, here's the type
+    structure for a subtype of the built-in list type:
+
+    typedef struct {
+        PyListObject list;
+        int state;
+    } spamlistobject;
+
+    Note that the base type structure field (here PyListObject) must
+    be the first field in the structure; any following fields are
+    extension fields.  Also note that the base type is not referenced
+    via a pointer; the actual contents of its structure must be
+    included! (The goal is for the memory layout of the beginning of
+    the subtype instance to be the same as that of the base type
+    instance.)
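
The layout requirement can be checked from Python with the standard ctypes module. The two structures below are simplified stand-ins (their field names are assumptions, not the real PyListObject layout); the point is only that an embedded base structure at offset zero lets a pointer to the subtype be read as a pointer to the base.

```python
import ctypes


# Simplified stand-ins; field names are assumptions, not real CPython layouts.
class BaseObject(ctypes.Structure):
    _fields_ = [("refcount", ctypes.c_ssize_t), ("size", ctypes.c_ssize_t)]


class SpamObject(ctypes.Structure):
    # The base structure is embedded by value as the first field.
    _fields_ = [("base", BaseObject), ("state", ctypes.c_int)]


spam = SpamObject(BaseObject(1, 7), 42)

# Reinterpret the subtype instance as a base instance, as C code would.
as_base = ctypes.cast(ctypes.pointer(spam), ctypes.POINTER(BaseObject)).contents
```

Here `SpamObject.base.offset` is 0 and the extension field starts right after the base structure, so `as_base.size` reads the same memory as `spam.base.size`.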
+
+    Next, the derived type must declare a type object and initialize
+    it.  Most of the slots in the type object may be initialized to
+    zero, which is a signal that the base type slot must be copied
+    into it.  Some fields that must be initialized properly:
+
+    - the object header must be filled in as usual; the type should be
+      PyType_Type
+
+    - the tp_basicsize field must be set to the size of the subtype
+      instances
+
+    - the tp_base field must be set to the address of the base type's
+      type object
+
+    - the tp_dealloc slot function must be a deallocation function for
+      the subtype
+
+    - the tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT
+      value
+
+    - the tp_name field must be set (otherwise it will be inherited,
+      which is wrong)
+
+    Exception: if the subtype defines no additional fields in its
+    structure (i.e., it only defines new behavior, no new data), the
+    tp_basicsize and the tp_dealloc fields may be set to zero.  In
+    order to complete the initialization of the type,
+    PyType_InitDict() must be called.  This replaces zero slots in the
+    subtype with the value of the corresponding base type slots.  It
+    also fills in tp_dict, the type's dictionary; this is more a
+    matter of PEP 252.
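
The slot-copying step can be modeled with plain dictionaries, using None for a zero slot. This is a toy under stated assumptions: `init_type` is a hypothetical name, the slot values are placeholders, and the early return mimics a tp_dict==NULL guard.

```python
# Toy model of slot inheritance. Slot names follow the text; init_type
# and the slot values are hypothetical.
def init_type(subtype):
    """Fill each zero (None) slot from the base type; repeat calls do nothing."""
    if subtype["tp_dict"] is not None:
        return  # already initialized
    base = subtype["tp_base"]
    for slot, value in base.items():
        if slot != "tp_dict" and subtype.get(slot) is None:
            subtype[slot] = value
    subtype["tp_dict"] = {}  # would hold the type's methods


list_type = {"tp_name": "list", "tp_basicsize": 40, "tp_base": None,
             "tp_clear": "list_clear", "tp_dict": {}}
spamlist_type = {"tp_name": "spamlist", "tp_basicsize": 48,
                 "tp_base": list_type, "tp_clear": None, "tp_dict": None}
init_type(spamlist_type)
```

Slots the subtype set explicitly (name, size, base) survive, while the zero tp_clear slot picks up the base type's value.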
+
+    The subtype's tp_dealloc slot deserves special attention.  It must
+    uninitialize and deallocate the object in an orderly manner: first
+    it must uninitialize the fields added by the extension type; then
+    it must call the base type's tp_clear function; finally it must
+    deallocate the memory of the object.  Usually, the base type's
+    tp_clear function has no global name; it is permissible to call it
+    via the base type's tp_clear slot, e.g. PyList_Type.tp_clear(obj).
+    Only if it is known that the base type uses the same allocation
+    method as the subtype and the subtype requires no uninitialization
+    (e.g. it adds no data fields or all its data fields are numbers)
+    is it permissible to leave tp_dealloc set to zero in the subtype's
+    type object; it will be copied from the base type.
+
+    A subtype is not usable until PyType_InitDict() is called for it;
+    this is best done during module initialization, assuming the
+    subtype belongs to a module.  An alternative for subtypes added to
+    the Python core (which don't live in a particular module) would be
+    to initialize the subtype in their constructor function.  It is
+    allowed to call PyType_InitDict() more than once; the second and
+    further calls have no effect.  In order to avoid unnecessary
+    calls, a test for tp_dict==NULL can be made.
+
+    If the subtype itself should be subtypable (usually desirable), it
+    should follow the same rules as given above for base types: have
+    a tp_construct that accepts a preallocated object and calls the
+    base type's tp_construct, and have a tp_clear that calls the base
+    type's tp_clear.
+
+
+Subtyping in Python
+
+    The next step is to allow subtyping of selected built-in types
+    through a class statement in Python.  Limiting ourselves to single
+    inheritance for now, here is what happens for a simple class
+    statement:
+
+    class C(B):
+        var1 = 1
+        def method1(self): pass
+        # etc.
+
+    The body of the class statement is executed in a fresh environment
+    (basically, a new dictionary used as local namespace), and then C
+    is created.  The following explains how C is created.
+
+    Assume B is a type object.  Since type objects are objects, and
+    every object has a type, B has a type.  B's type is accessible via
+    type(B) or B.__class__ (the latter notation is new for types; it
+    is introduced in PEP 252).  Let's say B's type is M (for
+    Metatype).  The class statement will create a new type, C.  Since
+    C will be a type object just like B, we view the creation of C as
+    an instantiation of the metatype, M.  The information that needs
+    to be provided for the creation of C is: its name (in this example
+    the string "C"); the list of base classes (a singleton tuple
+    containing B); and the results of executing the class body, in the
+    form of a dictionary (e.g. {"var1": 1, "method1": <function...>,
+    ...}).
+
+    According to the Don Beaudry hook, the following call is made:
+
+    C = M("C", (B,), dict)
+
+    (where dict is the dictionary resulting from execution of the
+    class body).  In other words, the metatype (M) is called.  Note
+    that even though we currently require there to be exactly one base
+    class, we still pass in a (singleton) sequence of base classes;
+    this makes it possible to support multiple inheritance later (or
+    for types with a different metaclass!) without changing this
+    interface.
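
The call above can be reproduced by hand in modern Python 3, where the three-argument form of the default metatype is available. The sketch runs a class body in a fresh dictionary and then calls the metatype; the variable names `body` and `namespace` are illustrative.

```python
# Sketch in modern Python 3: build C by hand instead of with a class statement.
class B:
    """Base class; its type plays the role of the metatype M."""


# Run the class body in a fresh dictionary used as the local namespace.
body = "var1 = 1\ndef method1(self): pass\n"
namespace = {}
exec(body, {}, namespace)

M = type(B)                  # the metatype of the base class
C = M("C", (B,), namespace)  # name, tuple of bases, body namespace
```

The resulting C behaves like a class created by the class statement: it has B as its only base and carries `var1` and `method1` from the namespace.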
+
+    Note that calling M requires that M itself has a type: the
+    meta-metatype.  In the current implementation, I have introduced a
+    new type object for this purpose, named turtle because of my
+    fondness for the phrase "turtles all the way down".  However I now
+    believe that it would be better if M were its own metatype, just
+    like before.  This can be accomplished by making M's tp_call slot
+    slightly more flexible.
+
+    In any case, the work for creating C is done by M's tp_construct
+    slot.  It allocates space for an "extended" type structure, which
+    contains space for: the type object; the auxiliary structures
+    (as_sequence etc.); the string object containing the type name (to
+    ensure that this object isn't deallocated while the type object is
+    still referencing it); and some more auxiliary storage (to be
+    described later).  It initializes this storage to zeros except for
+    a few crucial slots (e.g. tp_name is set to point to the type
+    name) and then sets the tp_base slot to point to B.  Then
+    PyType_InitDict() is called to inherit B's slots.  Finally, C's
+    tp_dict slot is updated with the contents of the namespace
+    dictionary (the third argument to the call to M).


 Copyright