Intermediate checkin (documented tp_new, tp_init, tp_alloc properly).

2001-07-10 17:11:19 +00:00 · 2001-07-10 17:11:19 +00:00 · 15299026e7
parent a921257c6a
commit 15299026e7
1 changed files with 162 additions and 60 deletions
--- a/pep-0253.txt
+++ b/pep-0253.txt
@@ -21,7 +21,7 @@ Introduction
    it with a static initializer.  The fields in the type object
    describe all aspects of a Python type that are relevant to the
    Python interpreter.  A few fields contain dimensional information
-    (e.g. the basic allocation size of instances), others contain
+    (like the basic allocation size of instances), others contain
    various flags, but most fields are pointers to functions to
    implement various kinds of behaviors.  A NULL pointer means that
    the type does not implement the specific behavior; in that case
@@ -39,34 +39,37 @@ Introduction

    This PEP will introduce the following features:

-    - a type can be a factory function for its instances
+      - a type can be a factory function for its instances

-    - types can be subtyped in C
+      - types can be subtyped in C

-    - types can be subtyped in Python with the class statement
+      - types can be subtyped in Python with the class statement

-    - multiple inheritance from types is supported (insofar as
-      practical)
+      - multiple inheritance from types is supported (insofar as
+        practical -- you still can't multiply inherit from list and
+        dictionary)

-    - the standard coercions functions (int, tuple, str etc.) will be
-      redefined to be the corresponding type objects, which serve as
-      their own factory functions
+      - the standard coercion functions (int, tuple, str etc.) will
+        be redefined to be the corresponding type objects, which serve
+        as their own factory functions

-    - there will be a standard type hierarchy
+      - a class statement can contain a __metaclass__ declaration,
+        specifying the metaclass to be used to create the new class

-    - a class statement can contain a metaclass declaration,
-      specifying the metaclass to be used to create the new class
+      - a class statement can contain a __slots__ declaration,
+        specifying the specific names of the instance variables
+        supported

-    - a class statement can contain a slots declaration, specifying
-      the specific names of the instance variables supported
+      - there will be a standard type hierarchy (maybe)

    This PEP builds on PEP 252, which adds standard introspection to
-    types; e.g., when the type object defines the tp_hash slot, the
-    type object has a __hash__ method.  PEP 252 also adds a
-    dictionary to type objects which contains all methods.  At the
-    Python level, this dictionary is read-only for built-in types; at
-    the C level, it is accessible directly (but it should not be
-    modified except as part of initialization).
+    types; for example, when a particular type object initializes the
+    tp_hash slot, that type object has a __hash__ method when
+    introspected.  PEP 252 also adds a dictionary to type objects
+    which contains all methods.  At the Python level, this dictionary
+    is read-only for built-in types; at the C level, it is accessible
+    directly (but it should not be modified except as part of
+    initialization).

    For binary compatibility, a flag bit in the tp_flags slot
    indicates the existence of the various new slots in the type
@@ -79,14 +82,14 @@ Introduction

    In current Python, a distinction is made between types and
    classes.  This PEP together with PEP 254 will remove that
-    distinction.  However, for backwards compatibility there will
-    probably remain a bit of a distinction for years to come, and
-    without PEP 254, the distinction is still large: types ultimately
-    have a built-in type as a base class, while classes ultimately
-    derive from a user-defined class.  Therefore, in the rest of this
-    PEP, I will use the word type whenever I can -- including base
-    type or supertype, derived type or subtype, and metatype.
-    However, sometimes the terminology necessarily blends, e.g. an
+    distinction.  However, for backwards compatibility the distinction
+    will probably remain for years to come, and without PEP 254, the
+    distinction is still large: types ultimately have a built-in type
+    as a base class, while classes ultimately derive from a
+    user-defined class.  Therefore, in the rest of this PEP, I will
+    use the word type whenever I can -- including base type or
+    supertype, derived type or subtype, and metatype.  However,
+    sometimes the terminology necessarily blends; for example, an
    object's type is given by its __class__ attribute, and subtyping
    in Python is spelled with a class statement.  If further
    distinction is necessary, user-defined classes can be referred to
@@ -95,9 +98,9 @@ Introduction

 About metatypes

-    Inevitably the following discussion will come to mention metatypes
-    (or metaclasses).  Metatypes are nothing new in Python: Python has
-    always been able to talk about the type of a type:
+    Inevitably the discussion comes to metatypes (or metaclasses).
+    Metatypes are nothing new in Python: Python has always been able
+    to talk about the type of a type:

    >>> a = 0
    >>> type(a)
@@ -113,16 +116,19 @@ About metatypes
    (PyType_Type, which is also its own metatype), this is not a
    requirement, and in fact a useful and relevant 3rd party extension
    (ExtensionClasses by Jim Fulton) creates an additional metatype.
+    The type of classic classes, known as types.ClassType, can also be
+    considered a distinct metatype.

-    A related feature is the "Don Beaudry hook", which says that if a
-    metatype is callable, its instances (which are regular types) can
-    be subclassed (really subtyped) using a Python class statement.
-    I will use this rule to support subtyping of built-in types, and
-    in fact it greatly simplifies the logic of class creation to
-    always simply call the metatype.  When no base class is specified,
-    a default metatype is called -- the default metatype is the
-    "ClassType" object, so the class statement will behave as before
-    in the normal case.
+    A feature closely connected to metatypes is the "Don Beaudry
+    hook", which says that if a metatype is callable, its instances
+    (which are regular types) can be subclassed (really subtyped)
+    using a Python class statement.  I will use this rule to support
+    subtyping of built-in types, and in fact it greatly simplifies the
+    logic of class creation to always simply call the metatype.  When
+    no base class is specified, a default metatype is called -- the
+    default metatype is the "ClassType" object, so the class statement
+    will behave as before in the normal case.  (This default can be
+    changed per module by setting the global variable __metaclass__.)

    Python uses the concept of metatypes or metaclasses in a different
    way than Smalltalk.  In Smalltalk-80, there is a hierarchy of
@@ -138,11 +144,11 @@ About metatypes
    metatypes are typically written in C, and may be shared between
    many regular types. (It will be possible to subtype metatypes in
    Python, so it won't be absolutely necessary to write C in order to
-    use metatypes; but the power of Python metatypes will be limited,
-    e.g. Python code will never be allowed to allocate raw memory and
-    initialize it at will.)
+    use metatypes; but the power of Python metatypes will be limited.
+    For example, Python code will never be allowed to allocate raw
+    memory and initialize it at will.)

-    Metatypes determine various *policies* for types, e.g. what
+    Metatypes determine various *policies* for types, such as what
    happens when a type is called, how dynamic types are (whether a
    type's __dict__ can be modified after it is created), what the
    method resolution order is, how instance attributes are looked
@@ -184,6 +190,25 @@ Making a type a factory for its instances
    implementation function looks for the tp_new slot of the type that
    is being called.

+    (Confusion alert: the tp_call slot of a regular type object (such
+    as PyInt_Type or PyList_Type) defines what happens when
+    *instances* of that type are called; in particular, the tp_call
+    slot in the function type, PyFunction_Type, is the key to making
+    functions callable.  As another example, PyInt_Type.tp_call is
+    NULL, because integers are not callable.  The new paradigm makes
+    *type objects* callable.  Since type objects are instances of
+    their metatype (PyType_Type), the metatype's tp_call slot
+    (PyType_Type.tp_call) points to a function that is invoked when
+    any type object is called.  Now, since each type has to do
+    something different to create an instance of itself,
+    PyType_Type.tp_call immediately defers to the tp_new slot of the
+    type that is being called.  To add to the confusion, PyType_Type
+    itself is also callable: its tp_new slot creates a new type.  This
+    is used by the class statement (via the Don Beaudry hook, see
+    above).  And what makes PyType_Type callable?  The tp_call slot of
+    *its* metatype -- but since it is its own metatype, that is its
+    own tp_call slot!)
+
    If the type's tp_new slot is NULL, an exception is raised.
    Otherwise, the tp_new slot is called.  The signature for the
    tp_new slot is
@@ -203,6 +228,82 @@ Making a type a factory for its instances
    reference to an existing object is fine too.  The return value
    should always be a new reference, owned by the caller.

+    Once the tp_new slot has returned an object, further initialization
+    is attempted by calling the tp_init() slot of the resulting
+    object's type, if not NULL.  This has the following signature:
+
+        PyObject *tp_init(PyObject *self,
+                          PyObject *args,
+                          PyObject *kwds)
+
+    It corresponds more closely to the __init__() method of classic
+    classes, and in fact is mapped to that by the slot/special-method
+    correspondence rules.  The difference in responsibilities between
+    the tp_new() slot and the tp_init() slot lies in the invariants
+    they ensure.  The tp_new() slot should ensure only the most
+    essential invariants, without which the C code that implements the
+    object would break.  The tp_init() slot should be used for
+    overridable user-specific initializations.  Take for example the
+    dictionary type.  The implementation has an internal pointer to a
+    hash table which should never be NULL.  This invariant is taken
+    care of by the tp_new() slot for dictionaries.  The dictionary
+    tp_init() slot, on the other hand, could be used to give the
+    dictionary an initial set of keys and values based on the
+    arguments passed in.
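To make this division of labor concrete, here is a minimal sketch in plain C.  It is a self-contained toy model, not the CPython API: ToyType, ToyObject, toy_type_call(), ToyDict and its slot functions are hypothetical names, arguments are reduced to a single int, and tp_init returns an int status purely for simplicity.  toy_type_call() plays the role of PyType_Type.tp_call: it fails if tp_new is NULL, otherwise it calls tp_new and then the tp_init slot of the resulting object's type, if that slot is set.  The dict-like type's tp_new guarantees a non-NULL table; its tp_init only fills it in.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of the creation protocol; these are not the
   real CPython structures, names, or signatures. */
typedef struct ToyType ToyType;

typedef struct {
    ToyType *ob_type;
} ToyObject;

struct ToyType {
    const char *tp_name;
    ToyObject *(*tp_new)(ToyType *type, int arg);
    int (*tp_init)(ToyObject *self, int arg);   /* 0 = ok, -1 = error */
    void (*tp_dealloc)(ToyObject *self);
};

/* Stands in for PyType_Type.tp_call: calling a type defers to its tp_new
   slot, then runs tp_init of the resulting object's type, if present. */
ToyObject *toy_type_call(ToyType *type, int arg)
{
    ToyObject *obj;

    if (type->tp_new == NULL)
        return NULL;                  /* the real code raises an exception */
    obj = type->tp_new(type, arg);
    if (obj == NULL)
        return NULL;
    if (obj->ob_type->tp_init != NULL && obj->ob_type->tp_init(obj, arg) < 0) {
        if (obj->ob_type->tp_dealloc != NULL)
            obj->ob_type->tp_dealloc(obj);
        return NULL;
    }
    return obj;
}

/* A dict-like toy: tp_new guarantees that table is never NULL; tp_init
   gives the object its user-visible contents from the argument. */
enum { TOY_DICT_SLOTS = 8 };

typedef struct {
    ToyObject head;
    int *table;
    int used;
} ToyDict;

static ToyObject *toydict_new(ToyType *type, int arg)
{
    ToyDict *d;

    (void)arg;                        /* tp_new ignores user arguments here */
    d = calloc(1, sizeof *d);
    if (d == NULL)
        return NULL;
    d->table = calloc(TOY_DICT_SLOTS, sizeof *d->table);
    if (d->table == NULL) {
        free(d);
        return NULL;
    }
    d->head.ob_type = type;
    return &d->head;
}

static int toydict_init(ToyObject *self, int arg)
{
    ToyDict *d = (ToyDict *)self;

    assert(d->table != NULL);         /* invariant established by tp_new */
    if (arg < 0 || arg > TOY_DICT_SLOTS)
        return -1;
    for (int i = 0; i < arg; i++)
        d->table[i] = i * 10;
    d->used = arg;
    return 0;
}

static void toydict_dealloc(ToyObject *self)
{
    free(((ToyDict *)self)->table);
    free(self);
}

ToyType ToyDict_Type = {
    .tp_name = "toydict",
    .tp_new = toydict_new,
    .tp_init = toydict_init,
    .tp_dealloc = toydict_dealloc,
};
```

Because toy_type_call() looks up tp_init through obj->ob_type rather than through the type that was called, a tp_new that hands back an object of another type gets that type's initializer, in line with the rule stated above.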
+
+    You may wonder why the tp_new() slot shouldn't call the tp_init()
+    slot itself.  The reason is that in certain circumstances (like
+    support for persistent objects), it is important to be able to
+    create an object of a particular type without initializing it any
+    further than necessary.  This may conveniently be done by calling
+    the tp_new() slot without calling tp_init().  It is also possible
+    that tp_init() is not called, or called more than once -- its
+    operation should be robust even in these anomalous cases.
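A small sketch of what "robust" might mean in practice, again as a hypothetical toy (ToyBuffer and its functions are invented for illustration and belong to no real API).  The tp_new analogue leaves the object empty but valid, so it can be used even if initialization is skipped; the tp_init analogue replaces, rather than leaks, whatever an earlier call stored, and leaves the old contents alone if it fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object, not a CPython type: a heap-allocated string
   holder whose contents are set by its "init" function. */
typedef struct {
    char *data;     /* NULL means empty; every function must accept that */
    size_t len;
} ToyBuffer;

/* tp_new analogue: establish only the minimal invariants, so the result
   is valid even if init is never called (as when restoring a persistent
   object). */
ToyBuffer *toybuffer_new(void)
{
    return calloc(1, sizeof(ToyBuffer));    /* data == NULL, len == 0 */
}

/* tp_init analogue: safe to call repeatedly, because it releases what a
   previous call stored; on failure the old contents are kept. */
int toybuffer_init(ToyBuffer *self, const char *initial)
{
    size_t n = strlen(initial);
    char *copy = malloc(n + 1);

    if (copy == NULL)
        return -1;
    memcpy(copy, initial, n + 1);
    free(self->data);                       /* free(NULL) is a no-op */
    self->data = copy;
    self->len = n;
    return 0;
}

size_t toybuffer_len(const ToyBuffer *self)
{
    assert(self->data != NULL || self->len == 0);
    return self->len;                       /* fine for an uninitialized one */
}

void toybuffer_dealloc(ToyBuffer *self)
{
    free(self->data);
    free(self);
}
```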
+
+    For some objects, tp_new() may return an existing object.  For
+    example, the factory function for integers caches the integers -1
+    through 99.  This is permissible only when the type argument to
+    tp_new() is the type that defined the tp_new() function (in the
+    example, if type == &PyInt_Type), and when the tp_init() slot for
+    this type does nothing.  If the type argument differs, the
+    tp_new() call is initiated by a derived type's tp_new() to
+    create the object and initialize the base type portion of the
+    object; in this case tp_new() should always return a new object
+    (or raise an exception).
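The caching rule can be sketched as follows.  This is a hypothetical toy, not the real integer implementation: ToyType, ToyInt, toyint_new() and the tp_basicsize-based allocation are assumptions made for illustration, and the reference count is a bare counter.  The cache is consulted only when type is exactly &ToyInt_Type; a call made on behalf of a subtype always gets a fresh object, sized for the subtype, because the subtype's tp_new will go on to initialize its own fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; not CPython's PyIntObject or PyInt_Type. */
typedef struct ToyType {
    const char *tp_name;
    struct ToyType *tp_base;
    size_t tp_basicsize;
} ToyType;

typedef struct {
    long ob_refcnt;
    ToyType *ob_type;
    long ob_ival;
} ToyInt;

ToyType ToyInt_Type = { "int", NULL, sizeof(ToyInt) };

enum { SMALL_MIN = -1, SMALL_MAX = 99 };   /* the -1 through 99 range above */
static ToyInt *small_ints[SMALL_MAX - SMALL_MIN + 1];

static ToyInt *toyint_alloc(ToyType *type, long value)
{
    ToyInt *v;

    assert(type->tp_basicsize >= sizeof(ToyInt));
    v = calloc(1, type->tp_basicsize);     /* subtype-sized, zero-filled */
    if (v == NULL)
        return NULL;
    v->ob_refcnt = 1;
    v->ob_type = type;
    v->ob_ival = value;
    return v;
}

/* Returns a new reference.  Sharing an existing object is allowed only
   when the caller asked for exactly ToyInt_Type. */
ToyInt *toyint_new(ToyType *type, long value)
{
    if (type == &ToyInt_Type && value >= SMALL_MIN && value <= SMALL_MAX) {
        ToyInt **slot = &small_ints[value - SMALL_MIN];
        if (*slot == NULL) {
            *slot = toyint_alloc(type, value);   /* reference held by cache */
            if (*slot == NULL)
                return NULL;
        }
        (*slot)->ob_refcnt++;                    /* new reference for caller */
        return *slot;
    }
    return toyint_alloc(type, value);            /* always a fresh object */
}
```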
+
+    There's a third slot related to object creation: tp_alloc().  Its
+    responsibility is to allocate the memory for the object,
+    initialize the reference count and type pointer field, and
+    initialize the rest of the object to all zeros.  It should also
+    register the object with the garbage collection subsystem if the
+    type supports garbage collection.  This slot exists so that
+    derived types can override the memory allocation policy
+    (for example, which heap is used) separately from the initialization
+    code.  The signature is:
+
+        PyObject *tp_alloc(PyTypeObject *type, int nitems)
+
+    The type argument is the type of the new object.  The nitems
+    argument is normally zero, except for objects with a variable
+    allocation size (basically strings, tuples, and longs).  The
+    allocation size is given by the following expression:
+
+        type->tp_basicsize  +  nitems * type->tp_itemsize
+
+    This slot is only used for subclassable types.  The tp_new()
+    function of the base class must call the tp_alloc() slot of the
+    type passed in as its first argument.  It is the tp_new()
+    function's responsibility to calculate the number of items.  The
+    tp_alloc() slot will set the ob_size field of the new object if
+    the type->tp_itemsize field is nonzero.
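A rough sketch of a default tp_alloc that follows this description, written against a hypothetical toy layout (ToyVarObject, ToyType and toy_generic_alloc() are illustrative names, not CPython's).  The garbage-collection registration step is left out, and the overflow check is an addition of this sketch rather than something the text above specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy header, loosely modeled on a variable-size object. */
typedef struct ToyType ToyType;

typedef struct {
    long ob_refcnt;
    ToyType *ob_type;
    long ob_size;             /* only meaningful when tp_itemsize != 0 */
} ToyVarObject;

struct ToyType {
    const char *tp_name;
    size_t tp_basicsize;      /* includes the header above */
    size_t tp_itemsize;       /* 0 for fixed-size types */
};

/* Allocate tp_basicsize + nitems * tp_itemsize bytes, zero them, and set
   the reference count, type pointer and (for variable-size types) ob_size.
   GC registration is deliberately omitted from this toy. */
ToyVarObject *toy_generic_alloc(ToyType *type, int nitems)
{
    size_t size;
    ToyVarObject *obj;

    assert(nitems >= 0);
    assert(type->tp_basicsize >= sizeof(ToyVarObject));
    if (type->tp_itemsize != 0 &&
        (size_t)nitems > (SIZE_MAX - type->tp_basicsize) / type->tp_itemsize)
        return NULL;                      /* size computation would overflow */
    size = type->tp_basicsize + (size_t)nitems * type->tp_itemsize;
    obj = calloc(1, size);                /* the rest of the object is zeros */
    if (obj == NULL)
        return NULL;
    obj->ob_refcnt = 1;
    obj->ob_type = type;
    if (type->tp_itemsize != 0)
        obj->ob_size = nitems;
    return obj;
}
```

In this sketch a base type's tp_new would pass along the type it was given, so a subtype with a larger tp_basicsize automatically gets room for its extra fields.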
+
+    XXX The keyword arguments are currently not passed to tp_new();
+    its kwds argument is always NULL.  This is a relic from a previous
+    revision and should probably be fixed.  Both tp_new() and
+    tp_init() should receive exactly the same arguments, and both
+    should check that the arguments are acceptable, because they may
+    be called independently.
+

 Requirements for a type to allow subtyping

@@ -240,9 +341,9 @@ Requirements for a type to allow subtyping
    its part of the instance structure).

    A similar reasoning applies to destruction: if a subtype changes
-    the instance allocator (e.g. to use a different heap), it must
-    also change the instance deallocator; but it must still call on
-    the base type's destructor to DECREF the base type's instance
+    the instance allocator (for example to use a different heap), it
+    must also change the instance deallocator; but it must still call
+    on the base type's destructor to DECREF the base type's instance
    variables.

    In this proposal, I assign stricter meanings to two existing
@@ -311,7 +412,7 @@ Requirements for a type to allow subtyping
    it is needed in order to derive a subtype.  The type object for
    the base type must also be exported.

-    If the base type has a type-checking macro (e.g. PyDict_Check()),
+    If the base type has a type-checking macro (like PyDict_Check()),
    this macro probably should be changed to recognize subtypes.  This
    can be done by using the new PyObject_TypeCheck(object, type)
    macro, which calls a function that follows the base class links.
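The subtype-aware check can be pictured with this sketch.  It assumes single inheritance through one tp_base pointer, and ToyType, ToyObject, toy_type_is_subtype(), TOY_TYPE_CHECK() and ToyDict_Check() are hypothetical stand-ins, not the real PyObject_TypeCheck() machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type objects with a single tp_base link each. */
typedef struct ToyType {
    const char *tp_name;
    struct ToyType *tp_base;     /* NULL for a root type */
} ToyType;

typedef struct {
    ToyType *ob_type;
} ToyObject;

/* Walk the tp_base chain: true if type is base or derives from it. */
int toy_type_is_subtype(const ToyType *type, const ToyType *base)
{
    assert(base != NULL);
    for (; type != NULL; type = type->tp_base) {
        if (type == base)
            return 1;
    }
    return 0;
}

/* Fast path on an exact match, then fall back to the chain walk.
   Note that ob is evaluated twice, as in many C macros. */
#define TOY_TYPE_CHECK(ob, tp) \
    ((ob)->ob_type == (tp) || toy_type_is_subtype((ob)->ob_type, (tp)))

/* A type-checking macro rewritten to accept subtypes. */
#define ToyDict_Check(ob) TOY_TYPE_CHECK((ob), &ToyDict_Type)

ToyType ToyDict_Type = { "dict", NULL };
```

In this sketch the exact-type comparison keeps the common case as cheap as an old-style pointer comparison; only objects of other types pay for the walk.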
@@ -434,7 +535,7 @@ Creating a subtype of a built-in type in C
      as well (these are not inherited).

    Exception: if the subtype defines no additional fields in its
-    structure (i.e., it only defines new behavior, no new data), the
+    structure (it only defines new behavior, no new data), the
    tp_basicsize and the tp_dealloc fields may be set to zero.

    In order to complete the initialization of the type,
@@ -451,16 +552,17 @@ Creating a subtype of a built-in type in C
    freeing function for any other pointers it owns, and then call the
    base class's tp_dealloc slot.  Because deallocation functions
    typically are not exported, this call has to be made via the base
-    type's type structure, e.g., when deriving from the standard list
-    type:
+    type's type structure, for example, when deriving from the
+    standard list type:

        PyList_Type.tp_dealloc(self);

    (If the subtype uses a different allocation heap than the base
    type, the subtype must call the base type's tp_clear() slot
    instead, followed by a call to free the object's memory from the
-    appropriate heap, e.g. PyObject_DEL(self) if the subtype uses the
-    standard heap.  But in this case subtyping is not recommended.)
+    appropriate heap, such as PyObject_DEL(self) if the subtype uses
+    the standard heap.  But in this case subtyping is not
+    recommended.)
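A sketch of that chaining, using hypothetical toy types in place of PyList_Type and a real list subtype (ToyList, ToyLabeledList, their deallocators and the labels_freed counter are invented for illustration).  The base deallocator is static, as deallocators typically are, so the subtype reaches it only through ToyList_Type.tp_dealloc.  The subtype frees its own pointer first and leaves freeing the object's memory to the base type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy structures; not the real PyObject or PyTypeObject. */
typedef struct ToyObject ToyObject;

typedef struct ToyType {
    const char *tp_name;
    void (*tp_dealloc)(ToyObject *self);
} ToyType;

struct ToyObject {
    ToyType *ob_type;
};

/* Base "list" type: owns an item array.  Its deallocator is static (not
   exported), so subtypes can reach it only via ToyList_Type. */
typedef struct {
    ToyObject head;
    int *items;
} ToyList;

static void toylist_dealloc(ToyObject *self)
{
    free(((ToyList *)self)->items);
    free(self);                        /* releases the whole object */
}

ToyType ToyList_Type = { "list", toylist_dealloc };

/* Subtype: the base structure first, then one extra owned pointer. */
typedef struct {
    ToyList base;
    char *label;
} ToyLabeledList;

static int labels_freed = 0;           /* instrumentation for the example */

static void labeledlist_dealloc(ToyObject *self)
{
    assert(self != NULL);
    free(((ToyLabeledList *)self)->label);   /* subtype's own pointer */
    labels_freed++;
    ToyList_Type.tp_dealloc(self);           /* then the base type's part */
}

ToyType ToyLabeledList_Type = { "labeledlist", labeledlist_dealloc };

ToyObject *labeledlist_create(void)
{
    ToyLabeledList *l = calloc(1, sizeof *l);

    if (l == NULL)
        return NULL;
    l->base.items = malloc(4 * sizeof *l->base.items);
    l->label = malloc(8);
    if (l->base.items == NULL || l->label == NULL) {
        free(l->base.items);
        free(l->label);
        free(l);
        return NULL;
    }
    l->base.head.ob_type = &ToyLabeledList_Type;
    return &l->base.head;
}
```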

    A subtype is not usable until PyType_InitDict() is called for it;
    this is best done during module initialization, assuming the
@@ -506,7 +608,7 @@ Subtyping in Python
    to be provided for the creation of C is: its name (in this example
    the string "C"); the list of base classes (a singleton tuple
    containing B); and the results of executing the class body, in the
-    form of a dictionary (e.g. {"var1": 1, "method1": <function
+    form of a dictionary (for example {"var1": 1, "method1": <function
    method1 at ...>, ...}).

    I propose to rig the class statement to make the following call:
@@ -580,8 +682,8 @@ Subtyping in Python
    ensure that this object isn't deallocated while the type object is
    still referencing it); and some more auxiliary storage (to be
    described later).  It initializes this storage to zeros except for
-    a few crucial slots (e.g. tp_name is set to point to the type
-    name) and then sets the tp_base slot to point to B.  Then
+    a few crucial slots (for example, tp_name is set to point to the
+    type name) and then sets the tp_base slot to point to B.  Then
    PyType_InitDict() is called to inherit B's slots.  Finally, C's
    tp_dict slot is updated with the contents of the namespace
    dictionary (the third argument to the call to M).
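The construction steps described here can be sketched with a hypothetical toy type object.  ToyType, toy_type_ready() and toy_make_type() are illustrative names; toy_type_ready() mimics only the slot-inheritance part attributed to PyType_InitDict(), and the namespace-dictionary step is left out.  The new type starts out all zeros, gets its name and base, and then copies every slot it left NULL from the base.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy type object with two behavior slots. */
typedef struct ToyType {
    const char *tp_name;
    struct ToyType *tp_base;
    long (*tp_hash)(void *self);
    const char *(*tp_repr)(void *self);
    int ready;                      /* nonzero once slots are inherited */
} ToyType;

/* Copy every slot the type left NULL from its base, readying the base
   first; slots the type defines itself are kept. */
void toy_type_ready(ToyType *type)
{
    ToyType *base = type->tp_base;

    if (type->ready)
        return;
    if (base != NULL) {
        assert(base != type);
        toy_type_ready(base);
        if (type->tp_hash == NULL)
            type->tp_hash = base->tp_hash;
        if (type->tp_repr == NULL)
            type->tp_repr = base->tp_repr;
    }
    type->ready = 1;
}

/* Zero-initialize a new type object, fill in the crucial slots, then
   inherit from the base. */
ToyType *toy_make_type(const char *name, ToyType *base)
{
    ToyType *type = calloc(1, sizeof *type);   /* every slot NULL or 0 */

    if (type == NULL)
        return NULL;
    type->tp_name = name;
    type->tp_base = base;
    toy_type_ready(type);
    return type;
}

/* A statically initialized base type to derive from. */
static long base_hash(void *self) { (void)self; return 42; }
static const char *base_repr(void *self) { (void)self; return "<base>"; }

ToyType Base_Type = { "base", NULL, base_hash, base_repr, 0 };
```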
@@ -641,10 +743,10 @@ Junk text (to be reused somewhere above)
    for an allocation flag bit would be to have two type objects,
    identical in the contents of all their slots except for their
    deallocation slot.  But this requires that all type-checking code
-    (e.g. the PyDict_Check()) recognizes both types.  We'll come back
-    to this solution in the context of subtyping.  Another alternative
-    is to require the metatype's tp_call to leave the allocation to
-    the tp_construct method, by passing in a NULL pointer.  But this
+    (like PyDict_Check()) recognizes both types.  We'll come back to
+    this solution in the context of subtyping.  Another alternative is
+    to require the metatype's tp_call to leave the allocation to the
+    tp_construct method, by passing in a NULL pointer.  But this
    doesn't work once we allow subtyping.

    Eventually, when we add any form of subtyping, we'll have to