From c22397f038e3d854495bc0dba7fd702a43ce7de0 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Fri, 13 Jul 2001 21:04:00 +0000 Subject: [PATCH] Add a section on static methods and class methods. Add a very uncooked section on the C API. --- pep-0252.txt | 271 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 252 insertions(+), 19 deletions(-) diff --git a/pep-0252.txt b/pep-0252.txt index 274319b1e..1170eeafe 100644 --- a/pep-0252.txt +++ b/pep-0252.txt @@ -88,13 +88,13 @@ Introspection APIs names supported by the object. The type API is sometimes combined by a __dict__ that works the - same was as for instances (e.g., for function objects in Python - 2.1, f.__dict__ contains f's dynamic attributes, while + same was as for instances (for example for function objects in + Python 2.1, f.__dict__ contains f's dynamic attributes, while f.__members__ lists the names of f's statically defined attributes). Some caution must be exercised: some objects don't list theire - "intrinsic" attributes (e.g. __dict__ and __doc__) in __members__, + "intrinsic" attributes (like __dict__ and __doc__) in __members__, while others do; sometimes attribute names that occur both in __members__ or __methods__ and as keys in __dict__, in which case it's anybody's guess whether the value found in __dict__ is used @@ -154,7 +154,7 @@ Specification of the class-based introspection API inspection of obj.__dict__. In the discussion below, I distinguish two kinds of objects: - regular objects (e.g. lists, ints, functions) and meta-objects. + regular objects (like lists, ints, functions) and meta-objects. Types and classes and meta-objects. 
Meta-objects are also regular objects, but we're mostly interested in them because they are referenced by the __class__ attribute of regular objects (or by @@ -248,10 +248,10 @@ Specification of the class-based introspection API Rationale: we can't have a simples rule like "static overrides dynamic" or "dynamic overrides static", because some static - attributes indeed override dynamic attributes, e.g. a key - '__class__' in an instance's __dict__ is ignored in favor of - the statically defined __class__ pointer, but on the other hand - most keys in inst.__dict__ override attributes defined in + attributes indeed override dynamic attributes; for example, a + key '__class__' in an instance's __dict__ is ignored in favor + of the statically defined __class__ pointer, but on the other + hand most keys in inst.__dict__ override attributes defined in inst.__class__. Presence of a __set__ method on a descriptor indicates that this is a data descriptor. (Even read-only data descriptors have a __set__ method: it always raises an @@ -275,9 +275,9 @@ Specification of the class-based introspection API There is little prior art in Python for designing the descriptor's interface, neither for getting/setting the value nor for describing the attribute otherwise, except some trivial - properties (e.g. it's reasonable to assume that __name__ and - __doc__ should be the attribute's name and docstring). I will - propose such an API below. + properties (it's reasonable to assume that __name__ and __doc__ + should be the attribute's name and docstring). I will propose + such an API below. If an object found in the meta-object's __dict__ is not an attribute descriptor, backward compatibility dictates certain @@ -335,24 +335,257 @@ Specification of the attribute descriptor API effect is the same as when T is omitted), or None. When X is None, this should be a method descriptor, and the result is an *unbound* method restricted to objects whose type is (a - descendent of) T. 
(For methods, this is called a "binding" - operation, even if X==None. Exactly what is returned by the - binding operation depends on the semantics of the descriptor; - for example, class methods ignore the instance and bind to the - type instead.) + descendent of) T. Such an unbound method is a descriptor + itself. For methods, this is called a "binding" operation, even + if X==None. Exactly what is returned by the binding operation + depends on the semantics of the descriptor; for example, static + methods and class methods (see below) ignore the instance and + bind to the type instead. - __set__(): a function of two arguments that sets the attribute value on the object. If the attribute is read-only, this method raises a TypeError exception. (Not an AttributeError!) Example: C.ivar.set(x, y) ~~ x.ivar = y. - Method attributes may also be callable; in this case they act as - unbound method. Example: C.meth(C(), x) ~~ C().meth(x). + +Static methods and class methods + + The descriptor API makes it possible to add static methods and + class methods. Static methods are easy to describe: they behave + pretty much like static methods in C++ or Java. Here's an + example: + + class C: + + def foo(x, y): + print "staticmethod", x, y + foo = staticmethod(foo) + + C.foo(1, 2) + c = C() + c.foo(1, 2) + + Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with + two arguments, and print "staticmethod 1 2". No "self" is declared in + the definition of foo(), and no instance is required in the call. + + The line "foo = staticmethod(foo)" in the class statement is the + crucial element: this makes foo() a static method. The built-in + staticmethod() wraps its function argument in a special kind of + descriptor whose __get__() method returns the original function + unchanged. Without this, the __get__() method of standard + function objects would have created a bound method object for + 'c.foo' and an unbound method object for 'C.foo'. 
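To illustrate the staticmethod() paragraph above, here is a minimal sketch of a descriptor whose __get__() returns the wrapped function unchanged. It is written in present-day Python syntax rather than the 2.2-era syntax of the patch, and the class name StaticMethodSketch is hypothetical: it only mimics the behavior the text describes and is not the built-in's actual implementation.

```python
class StaticMethodSketch:
    """Hypothetical stand-in for staticmethod(); the name is an assumption."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Ignore both the instance and the type and hand back the
        # original function, so no bound or unbound method is created.
        return self.func


class C:
    def foo(x, y):
        return ("staticmethod", x, y)
    foo = StaticMethodSketch(foo)


via_class = C.foo(1, 2)       # looked up on the class
via_instance = C().foo(1, 2)  # looked up on an instance
```

Both lookups go through __get__() and receive the same plain function, which is why neither call supplies an implicit first argument.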
+ + Class methods use a similar pattern to declare methods that + receive an implicit first argument that is the *class* for which + they are invoked. This has no C++ or Java equivalent, and is not + quite the same as what class methods are in Smalltalk, but may + serve a similar purpose. (Python also has real metaclasses, and + perhaps methods defined in a metaclass have more right to the name + "class method"; but I expect that most programmers won't be using + metaclasses.) Here's an example: + + class C: + + def foo(x, y): + print "classmethod", x, y + foo = classmethod(foo) + + C.foo(1) + c = C() + c.foo(1) + + Both the call C.foo(1) and the call c.foo(1) end up calling foo() + with *two* arguments, and print "classmethod __main__.C 1". The + first argument of foo() is implied, and it is the class, even if + the method was invoked via an instance. Now let's continue the + example: + + class D(C): + pass + + D.foo(1) + d = D() + d.foo(1) + + This prints "classmethod __main__.D 1" both times; in other words, + the class passed as the first argument of foo() is the class + involved in the call, not the class involved in the definition of + foo(). + + But notice this: + + class E(C): + def foo(x, y): # override C.foo + print "E.foo() called" + C.foo(y) + + E.foo(1) + e = E() + e.foo(1) + + In this example, the call to C.foo() from E.foo() will see class C + as its first argument, not class E. This is to be expected, since + the call specifies the class C. But it stresses the difference + between these class methods and methods defined in metaclasses + (where an upcall to a metamethod would pass the target class as an + explicit first argument). If you don't understand this, don't + worry, you're not alone. C API - XXX + XXX The following is VERY rough text that I wrote with a different + audience in mind; I'll have to go through this to edit it more. + XXX It also doesn't go into enough detail for the C API. 
+ + A built-in type can declare special data attributes in two ways: + using a struct memberlist (defined in structmember.h) or a struct + getsetlist (defined in descrobject.h). The struct memberlist is + an old mechanism put to new use: each attribute has a descriptor + record including its name, an enum giving its type (various C + types are supported as well as PyObject *), an offset from the + start of the instance, and a read-only flag. + + The struct getsetlist mechanism is new, and intended for cases + that don't fit in that mold, because they either require + additional checking, or are plain calculated attributes. Each + attribute here has a name, a getter C function pointer, a setter C + function pointer, and a context pointer. The function pointers + are optional, so that for example setting the setter function + pointer to NULL makes a read-only attribute. The context pointer + is intended to pass auxiliary information to generic getter/setter + functions, but I haven't found a need for this yet. + + Note that there is also a similar mechanism to declare built-in + methods: these are PyMethodDef structures, which contain a name + and a C function pointer (and some flags for the calling + convention). + + Traditionally, built-in types have had to define their own + tp_getattro and tp_setattro slot functions to make these attribute + definitions work (PyMethodDef and struct memberlist are quite + old). There are convenience functions that take an array of + PyMethodDef or memberlist structures, an object, and an attribute + name, and return or set the attribute if found in the list, or + raise an exception if not found. But these convenience functions + had to be explicitly called by the tp_getattro or tp_setattro + method of the specific type, and they did a linear search of the + array using strcmp() to find the array element describing the + requested attribute. 
+ + I now have a brand spanking new generic mechanism that improves + this situation substantially. + + - Pointers to arrays of PyMethodDef, memberlist, and getsetlist + structures are part of the new type object (tp_methods, + tp_members, tp_getset). + + - At type initialization time (in PyType_InitDict()), for each + entry in those three arrays, a descriptor object is created and + placed in a dictionary that belongs to the type (tp_dict). + + - Descriptors are very lean objects that mostly point to the + corresponding structure. An implementation detail is that all + descriptors share the same object type, and a discriminator + field tells what kind of descriptor it is (method, member, or + getset). + + - As explained earlier in this PEP, descriptors have a get() method that + takes an object argument and returns that object's attribute; + descriptors for writable attributes also have a set() method + that takes an object and a value and sets that object's + attribute. Note that the get() method also serves as a bind() + operation for methods, binding the unbound method implementation + to the object. + + - Instead of providing their own tp_getattro and tp_setattro + implementation, almost all built-in objects now place + PyObject_GenericGetAttr and (if they have any writable + attributes) PyObject_GenericSetAttr in their tp_getattro and + tp_setattro slots. (Or, they can leave these NULL, and inherit + them from the default base object, if they arrange for an + explicit call to PyType_InitDict() for the type before the first + instance is created.) + + - In the simplest case, PyObject_GenericGetAttr() does exactly one + dictionary lookup: it looks up the attribute name in the type's + dictionary (obj->ob_type->tp_dict). Upon success, there are two + possibilities: the descriptor has a get method, or it doesn't. + For speed, the get and set methods are type slots: tp_descr_get + and tp_descr_set. 
If the tp_descr_get slot is non-NULL, it is + called, passing the object as its only argument, and the return + value from this call is the result of the getattr operation. If + the tp_descr_get slot is NULL, as a fallback the descriptor + itself is returned (compare class attributes that are not + methods but simple values). + + - PyObject_GenericSetAttr() works very similarly but uses the + tp_descr_set slot and calls it with the object and the new + attribute value; if the tp_descr_set slot is NULL, an + AttributeError is raised. + + - But now for a more complicated case. The approach described + above is suitable for most built-in objects such as lists, + strings, numbers. However, some object types have a dictionary + in each instance that can store arbitrary attributes. In fact, + when you use a class statement to subtype an existing built-in + type, you automatically get such a dictionary (unless you + explicitly turn it off, using another advanced feature, + __slots__). Let's call this the instance dict, to distinguish + it from the type dict. + + - In the more complicated case, there's a conflict between names + stored in the instance dict and names stored in the type dict. + If both dicts have an entry with the same key, which one should + we return? Looking at classic Python for guidance, I find + conflicting rules: for class instances, the instance dict + overrides the class dict, *except* for the special attributes + (like __dict__ and __class__), which have priority over the + instance dict. + + - I resolved this with the following set of rules, implemented in + PyObject_GenericGetAttr(): + + 1. Look in the type dict. If you find a *data* descriptor, use + its get() method to produce the result. This takes care of + special attributes like __dict__ and __class__. + + 2. Look in the instance dict. If you find anything, that's it. + (This takes care of the requirement that normally the + instance dict overrides the class dict.) + + 3. 
Look in the type dict again (in reality this uses the saved + result from step 1, of course). If you find a descriptor, + use its get() method; if you find something else, that's it; + if it's not there, raise AttributeError. + + This requires a classification of descriptors into data and + nondata descriptors. The current implementation quite sensibly + classifies member and getset descriptors as data (even if they + are read-only!) and method descriptors as nondata. + Non-descriptors (like function pointers or plain values) are + also classified as non-data. + + - This scheme has one drawback: in what I assume to be the most + common case, referencing an instance variable stored in the + instance dict, it does *two* dictionary lookups, whereas the + classic scheme did a quick test for attributes starting with two + underscores plus a single dictionary lookup. (Although the + implementation is sadly structured as instance_getattr() calling + instance_getattr1() calling instance_getattr2() which finally + calls PyDict_GetItem(), and the underscore test calls + PyString_AsString() rather than inlining this. I wonder if + optimizing the snot out of this might not be a good idea to + speed up Python 2.2, if we weren't going to rip it all out. :-) + + - A benchmark verifies that in fact this is as fast as classic + instance variable lookup, so I'm no longer worried. + + - Modification for dynamic types: steps 1 and 3 look in the + dictionary of the type and all its base classes (in MRO + sequence, of course). Discussion
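The three-step lookup rule given for PyObject_GenericGetAttr() in the C API section can be sketched in Python as follows. This is an illustration under stated assumptions, not the C implementation: the function name generic_getattr_sketch and the class Demo are hypothetical, the syntax is present-day Python, "data descriptor" is taken to mean "the descriptor's type defines __set__" as the text specifies, and the MRO walk reflects the "modification for dynamic types" bullet.

```python
_MISSING = object()


def generic_getattr_sketch(obj, name):
    """Hypothetical rendering of the three-step rule; not the C code."""
    tp = type(obj)

    # Step 1: search the type dict of the type and its bases in MRO order.
    found = _MISSING
    for klass in tp.__mro__:
        if name in vars(klass):
            found = vars(klass)[name]
            break
    if found is not _MISSING:
        descr_type = type(found)
        # A data descriptor is recognized by the presence of __set__.
        if hasattr(descr_type, "__set__") and hasattr(descr_type, "__get__"):
            return descr_type.__get__(found, obj, tp)

    # Step 2: the instance dict, if the object has one.
    try:
        inst_dict = vars(obj)
    except TypeError:
        inst_dict = {}
    if name in inst_dict:
        return inst_dict[name]

    # Step 3: reuse the saved result from step 1.
    if found is not _MISSING:
        descr_type = type(found)
        if hasattr(descr_type, "__get__"):
            return descr_type.__get__(found, obj, tp)
        return found
    raise AttributeError(name)


class Demo:
    plain = "class value"

    @property
    def ro(self):
        return "from property"

    def meth(self):
        return "called"


d = Demo()
d.__dict__["plain"] = "instance value"  # shadows a plain class attribute
d.__dict__["ro"] = "shadow attempt"     # loses to the data descriptor
d.__dict__["meth"] = "instance wins"    # beats the nondata descriptor
```

With these definitions, the read-only property still takes priority over the instance dict (it counts as a data descriptor even though it cannot be set), while the plain value and the method are both overridden by instance dict entries.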