Add a section on static methods and class methods.

Add a very uncooked section on the C API.
2001-07-13 21:04:00 +00:00 · 2001-07-13 21:04:00 +00:00 · c22397f038
parent bf4b7e8c90
commit c22397f038
1 changed files with 252 additions and 19 deletions
--- a/pep-0252.txt
+++ b/pep-0252.txt
@@ -88,13 +88,13 @@ Introspection APIs
    names supported by the object.

    The type API is sometimes combined with a __dict__ that works the
-    same was as for instances (e.g., for function objects in Python
-    2.1, f.__dict__ contains f's dynamic attributes, while
+    same way as for instances (for example, for function objects in
+    Python 2.1, f.__dict__ contains f's dynamic attributes, while
    f.__members__ lists the names of f's statically defined
    attributes).

    Some caution must be exercised: some objects don't list their
-    "intrinsic" attributes (e.g. __dict__ and __doc__) in __members__,
+    "intrinsic" attributes (like __dict__ and __doc__) in __members__,
    while others do; sometimes attribute names occur both in
    __members__ or __methods__ and as keys in __dict__, in which case
    it's anybody's guess whether the value found in __dict__ is used
@@ -154,7 +154,7 @@ Specification of the class-based introspection API
    inspection of obj.__dict__.

    In the discussion below, I distinguish two kinds of objects:
-    regular objects (e.g. lists, ints, functions) and meta-objects.
+    regular objects (like lists, ints, functions) and meta-objects.
    Types and classes are meta-objects.  Meta-objects are also regular
    objects, but we're mostly interested in them because they are
    referenced by the __class__ attribute of regular objects (or by
@@ -248,10 +248,10 @@ Specification of the class-based introspection API

       Rationale: we can't have a simple rule like "static overrides
       dynamic" or "dynamic overrides static", because some static
-       attributes indeed override dynamic attributes, e.g. a key
-       '__class__' in an instance's __dict__ is ignored in favor of
-       the statically defined __class__ pointer, but on the other hand
-       most keys in inst.__dict__ override attributes defined in
+       attributes indeed override dynamic attributes; for example, a
+       key '__class__' in an instance's __dict__ is ignored in favor
+       of the statically defined __class__ pointer, but on the other
+       hand most keys in inst.__dict__ override attributes defined in
       inst.__class__.  Presence of a __set__ method on a descriptor
       indicates that this is a data descriptor.  (Even read-only data
       descriptors have a __set__ method: it always raises an
@@ -275,9 +275,9 @@ Specification of the class-based introspection API
       There is little prior art in Python for designing the
       descriptor's interface, neither for getting/setting the value
       nor for describing the attribute otherwise, except some trivial
-       properties (e.g. it's reasonable to assume that __name__ and
-       __doc__ should be the attribute's name and docstring).  I will
-       propose such an API below.
+       properties (it's reasonable to assume that __name__ and __doc__
+       should be the attribute's name and docstring).  I will propose
+       such an API below.

       If an object found in the meta-object's __dict__ is not an
       attribute descriptor, backward compatibility dictates certain
@@ -335,24 +335,257 @@ Specification of the attribute descriptor API
      effect is the same as when T is omitted), or None.  When X is
      None, this should be a method descriptor, and the result is an
      *unbound* method restricted to objects whose type is (a
-      descendent of) T.  (For methods, this is called a "binding"
-      operation, even if X==None.  Exactly what is returned by the
-      binding operation depends on the semantics of the descriptor;
-      for example, class methods ignore the instance and bind to the
-      type instead.)
+      descendent of) T.  Such an unbound method is a descriptor
+      itself.  For methods, this is called a "binding" operation, even
+      if X==None.  Exactly what is returned by the binding operation
+      depends on the semantics of the descriptor; for example, static
+      methods (see below) ignore both the instance and the type, while
+      class methods ignore the instance and bind to the type instead.

    - __set__(): a function of two arguments that sets the attribute
      value on the object.  If the attribute is read-only, this method
      raises a TypeError exception.  (Not an AttributeError!)
      Example: C.ivar.set(x, y) ~~ x.ivar = y.

-    Method attributes may also be callable; in this case they act as
-    unbound method.  Example: C.meth(C(), x) ~~ C().meth(x).
+
+Static methods and class methods
+
+    The descriptor API makes it possible to add static methods and
+    class methods.  Static methods are easy to describe: they behave
+    pretty much like static methods in C++ or Java.  Here's an
+    example:
+
+      class C:
+
+          def foo(x, y):
+              print "staticmethod", x, y
+          foo = staticmethod(foo)
+
+      C.foo(1, 2)
+      c = C()
+      c.foo(1, 2)
+
+    Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with
+    two arguments, and print "staticmethod 1 2".  No "self" is declared in
+    the definition of foo(), and no instance is required in the call.
+
+    The line "foo = staticmethod(foo)" in the class statement is the
+    crucial element: this makes foo() a static method.  The built-in
+    staticmethod() wraps its function argument in a special kind of
+    descriptor whose __get__() method returns the original function
+    unchanged.  Without this, the __get__() method of standard
+    function objects would have created a bound method object for
+    'c.foo' and an unbound method object for 'C.foo'.
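The wrapping behaviour described above can be imitated in pure Python. This is only a sketch under stated assumptions: `StaticMethodSketch` is a hypothetical name rather than the built-in implementation, the code uses Python 3 syntax instead of the Python 2.2 syntax of the surrounding text, and `foo()` returns a tuple instead of printing so the result is easy to inspect.

```python
class StaticMethodSketch:
    """Hypothetical stand-in for the built-in staticmethod()."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Ignore both the instance and the type and hand back the
        # original function unchanged, so no binding takes place.
        return self.func


class C:
    def foo(x, y):
        return ("staticmethod", x, y)
    foo = StaticMethodSketch(foo)


c = C()
via_class = C.foo(1, 2)      # lookup on the class
via_instance = c.foo(1, 2)   # lookup on an instance
```

Both lookups go through `__get__()` and yield the same plain function, which is why neither call needs an instance or an implicit first argument.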
+
+    Class methods use a similar pattern to declare methods that
+    receive an implicit first argument that is the *class* for which
+    they are invoked.  This has no C++ or Java equivalent, and is not
+    quite the same as what class methods are in Smalltalk, but may
+    serve a similar purpose.  (Python also has real metaclasses, and
+    perhaps methods defined in a metaclass have more right to the name
+    "class method"; but I expect that most programmers won't be using
+    metaclasses.)  Here's an example:
+
+      class C:
+
+          def foo(x, y):
+              print "classmethod", x, y
+          foo = classmethod(foo)
+
+      C.foo(1)
+      c = C()
+      c.foo(1)
+
+    Both the call C.foo(1) and the call c.foo(1) end up calling foo()
+    with *two* arguments, and print "classmethod __main__.C 1".  The
+    first argument of foo() is implied, and it is the class, even if
+    the method was invoked via an instance.  Now let's continue the
+    example:
+
+      class D(C):
+          pass
+
+      D.foo(1)
+      d = D()
+      d.foo(1)
+
+    This prints "classmethod __main__.D 1" both times; in other words,
+    the class passed as the first argument of foo() is the class
+    involved in the call, not the class involved in the definition of
+    foo().
+
+    But notice this:
+
+      class E(C):
+          def foo(x, y): # override C.foo
+              print "E.foo() called"
+              C.foo(y)
+          foo = classmethod(foo)
+
+      E.foo(1)
+      e = E()
+      e.foo(1)
+
+    In this example, the call to C.foo() from E.foo() will see class C
+    as its first argument, not class E.  This is to be expected, since
+    the call specifies the class C.  But it stresses the difference
+    between these class methods and methods defined in metaclasses
+    (where an upcall to a metamethod would pass the target class as an
+    explicit first argument).  If you don't understand this, don't
+    worry, you're not alone.
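The three class-method examples can be collected into one runnable sketch. It assumes Python 3 syntax and the built-in classmethod(), returns tuples instead of printing, names the implicit first argument `cls` for readability, and assumes that E.foo is itself wrapped with classmethod(); none of these choices come from the text above.

```python
class C:
    def foo(cls, y):
        return ("classmethod", cls.__name__, y)
    foo = classmethod(foo)


class D(C):
    pass


class E(C):
    def foo(cls, y):  # override C.foo
        # The upcall names C explicitly, so C.foo() receives C, not E.
        return ("E.foo() called", cls.__name__) + C.foo(y)
    foo = classmethod(foo)


results = [C.foo(1), C().foo(1), D.foo(1), D().foo(1), E.foo(1), E().foo(1)]
```

The calls through D receive D as the implicit argument, while the upcall inside E.foo() reports C, matching the behaviour discussed above.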


 C API

-    XXX
+    XXX The following is VERY rough text that I wrote with a different
+    audience in mind; I'll have to go through this to edit it more.
+    XXX It also doesn't go into enough detail for the C API.
+
+    A built-in type can declare special data attributes in two ways:
+    using a struct memberlist (defined in structmember.h) or a struct
+    getsetlist (defined in descrobject.h).  The struct memberlist is
+    an old mechanism put to new use: each attribute has a descriptor
+    record including its name, an enum giving its type (various C
+    types are supported as well as PyObject *), an offset from the
+    start of the instance, and a read-only flag.
+
+    The struct getsetlist mechanism is new, and intended for cases
+    that don't fit in that mold, because they either require
+    additional checking, or are plain calculated attributes.  Each
+    attribute here has a name, a getter C function pointer, a setter C
+    function pointer, and a context pointer.  The function pointers
+    are optional, so that for example setting the setter function
+    pointer to NULL makes a read-only attribute.  The context pointer
+    is intended to pass auxiliary information to generic getter/setter
+    functions, but I haven't found a need for this yet.
+
+    Note that there is also a similar mechanism to declare built-in
+    methods: these are PyMethodDef structures, which contain a name
+    and a C function pointer (and some flags for the calling
+    convention).
+
+    Traditionally, built-in types have had to define their own
+    tp_getattro and tp_setattro slot functions to make these attribute
+    definitions work (PyMethodDef and struct memberlist are quite
+    old).  There are convenience functions that take an array of
+    PyMethodDef or memberlist structures, an object, and an attribute
+    name, and return or set the attribute if found in the list, or
+    raise an exception if not found.  But these convenience functions
+    had to be explicitly called by the tp_getattro or tp_setattro
+    method of the specific type, and they did a linear search of the
+    array using strcmp() to find the array element describing the
+    requested attribute.
+
+    I now have a brand spanking new generic mechanism that improves
+    this situation substantially.
+
+    - Pointers to arrays of PyMethodDef, memberlist, getsetlist
+      structures are part of the new type object (tp_methods,
+      tp_members, tp_getset).
+
+    - At type initialization time (in PyType_InitDict()), for each
+      entry in those three arrays, a descriptor object is created and
+      placed in a dictionary that belongs to the type (tp_dict).
+
+    - Descriptors are very lean objects that mostly point to the
+      corresponding structure.  An implementation detail is that all
+      descriptors share the same object type, and a discriminator
+      field tells what kind of descriptor it is (method, member, or
+      getset).
+
+    - As explained in PEP 252, descriptors have a get() method that
+      takes an object argument and returns that object's attribute;
+      descriptors for writable attributes also have a set() method
+      that takes an object and a value and sets that object's
+      attribute.  Note that the get() method also serves as a bind()
+      operation for methods, binding the unbound method implementation
+      to the object.
+
+    - Instead of providing their own tp_getattro and tp_setattro
+      implementation, almost all built-in objects now place
+      PyObject_GenericGetAttr and (if they have any writable
+      attributes) PyObject_GenericSetAttr in their tp_getattro and
+      tp_setattro slots.  (Or, they can leave these NULL, and inherit
+      them from the default base object, if they arrange for an
+      explicit call to PyType_InitDict() for the type before the first
+      instance is created.)
+
+    - In the simplest case, PyObject_GenericGetAttr() does exactly one
+      dictionary lookup: it looks up the attribute name in the type's
+      dictionary (obj->ob_type->tp_dict).  Upon success, there are two
+      possibilities: the descriptor has a get method, or it doesn't.
+      For speed, the get and set methods are type slots: tp_descr_get
+      and tp_descr_set.  If the tp_descr_get slot is non-NULL, it is
+      called, passing the object as its only argument, and the return
+      value from this call is the result of the getattr operation.  If
+      the tp_descr_get slot is NULL, as a fallback the descriptor
+      itself is returned (compare class attributes that are not
+      methods but simple values).
+
+    - PyObject_GenericSetAttr() works very similarly but uses the
+      tp_descr_set slot and calls it with the object and the new
+      attribute value; if the tp_descr_set slot is NULL, an
+      AttributeError is raised.
+
+    - But now for a more complicated case.  The approach described
+      above is suitable for most built-in objects such as lists,
+      strings, numbers.  However, some object types have a dictionary
+      in each instance that can store arbitrary attributes.  In fact,
+      when you use a class statement to subtype an existing built-in
+      type, you automatically get such a dictionary (unless you
+      explicitly turn it off, using another advanced feature,
+      __slots__).  Let's call this the instance dict, to distinguish
+      it from the type dict.
+
+    - In the more complicated case, there's a conflict between names
+      stored in the instance dict and names stored in the type dict.
+      If both dicts have an entry with the same key, which one should
+      we return?  Looking at classic Python for guidance, I find
+      conflicting rules: for class instances, the instance dict
+      overrides the class dict, *except* for the special attributes
+      (like __dict__ and __class__), which have priority over the
+      instance dict.
+
+    - I resolved this with the following set of rules, implemented in
+      PyObject_GenericGetAttr():
+
+      1. Look in the type dict.  If you find a *data* descriptor, use
+         its get() method to produce the result.  This takes care of
+         special attributes like __dict__ and __class__.
+
+      2. Look in the instance dict.  If you find anything, that's it.
+         (This takes care of the requirement that normally the
+         instance dict overrides the class dict.)
+
+      3. Look in the type dict again (in reality this uses the saved
+         result from step 1, of course).  If you find a descriptor,
+         use its get() method; if you find something else, that's it;
+         if it's not there, raise AttributeError.
+
+      This requires a classification of descriptors into data and
+      nondata descriptors.  The current implementation quite sensibly
+      classifies member and getset descriptors as data (even if they
+      are read-only!)  and method descriptors as nondata.
+      Non-descriptors (like function pointers or plain values) are
+      also classified as nondata.
+
+    - This scheme has one drawback: in what I assume to be the most
+      common case, referencing an instance variable stored in the
+      instance dict, it does *two* dictionary lookups, whereas the
+      classic scheme did a quick test for attributes starting with two
+      underscores plus a single dictionary lookup.  (Although the
+      implementation is sadly structured as instance_getattr() calling
+      instance_getattr1() calling instance_getattr2() which finally
+      calls PyDict_GetItem(), and the underscore test calls
+      PyString_AsString() rather than inlining this.  I wonder if
+      optimizing the snot out of this might not be a good idea to
+      speed up Python 2.2, if we weren't going to rip it all out. :-)
+
+    - A benchmark verifies that in fact this is as fast as classic
+      instance variable lookup, so I'm no longer worried.
+
+    - Modification for dynamic types: steps 1 and 3 look in the
+      dictionary of the type and all its base classes (in MRO
+      sequence, of course).
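The three lookup rules, including the MRO modification, can be modelled in pure Python. This is a rough sketch and not the C code of PyObject_GenericGetAttr(): the names `generic_getattr_sketch`, `is_data_descriptor` and `Point` are hypothetical, the code uses Python 3 syntax, and it assumes that "data descriptor" means the descriptor's type defines __set__, as stated in the rationale earlier in the PEP.

```python
def is_data_descriptor(value):
    # Assumption: a data descriptor is one whose type defines __set__.
    return hasattr(type(value), "__set__")


def generic_getattr_sketch(obj, name):
    """Hypothetical model of the three lookup steps."""
    missing = object()
    found = missing
    # Steps 1 and 3 share one search of the type dicts in MRO order.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            found = vars(klass)[name]
            break

    # Step 1: a data descriptor in the type dict always wins.
    if found is not missing and is_data_descriptor(found):
        return type(found).__get__(found, obj, type(obj))

    # Step 2: otherwise the instance dict overrides the type dict.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]

    # Step 3: fall back to the saved result of the type dict search.
    if found is not missing:
        getter = getattr(type(found), "__get__", None)
        if getter is not None:
            return getter(found, obj, type(obj))  # nondata descriptor
        return found                              # plain value
    raise AttributeError(name)


class Point:
    kind = "plain value"            # non-descriptor in the type dict

    def __init__(self):
        self.x = 1                  # stored in the instance dict

    def norm(self):                 # function: nondata descriptor
        return abs(self.x)

    def double(self):               # function: nondata descriptor
        return 2 * self.x

    @property
    def ro(self):                   # read-only, but still a data descriptor
        return "from property"


p = Point()
p.__dict__["ro"] = "shadow"         # ignored: step 1 finds the property
p.__dict__["norm"] = "shadow"       # used: step 2 beats the nondata descriptor
```

In this model the read-only property still shadows the instance dict entry, while the plain function `norm` is overridden by it, which mirrors the data versus nondata classification described above.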


 Discussion