Add a section on static methods and class methods.

Add a very uncooked section on the C API.
Guido van Rossum 2001-07-13 21:04:00 +00:00


@@ -88,13 +88,13 @@ Introspection APIs
names supported by the object.
The type API is sometimes combined with a __dict__ that works the
same way as for instances (for example, for function objects in
Python 2.1, f.__dict__ contains f's dynamic attributes, while
f.__members__ lists the names of f's statically defined
attributes).
Some caution must be exercised: some objects don't list their
"intrinsic" attributes (like __dict__ and __doc__) in __members__,
while others do; sometimes attribute names occur both in
__members__ or __methods__ and as keys in __dict__, in which case
it's anybody's guess whether the value found in __dict__ is used
@@ -154,7 +154,7 @@ Specification of the class-based introspection API
inspection of obj.__dict__.
In the discussion below, I distinguish two kinds of objects:
regular objects (like lists, ints, functions) and meta-objects.
Types and classes are meta-objects. Meta-objects are also regular
objects, but we're mostly interested in them because they are
referenced by the __class__ attribute of regular objects (or by
@@ -248,10 +248,10 @@ Specification of the class-based introspection API
Rationale: we can't have a simple rule like "static overrides
dynamic" or "dynamic overrides static", because some static
attributes indeed override dynamic attributes; for example, a
key '__class__' in an instance's __dict__ is ignored in favor
of the statically defined __class__ pointer, but on the other
hand most keys in inst.__dict__ override attributes defined in
inst.__class__. Presence of a __set__ method on a descriptor
indicates that this is a data descriptor. (Even read-only data
descriptors have a __set__ method: it always raises an
@@ -275,9 +275,9 @@ Specification of the class-based introspection API
There is little prior art in Python for designing the
descriptor's interface, neither for getting/setting the value
nor for describing the attribute otherwise, except some trivial
properties (it's reasonable to assume that __name__ and __doc__
should be the attribute's name and docstring). I will propose
such an API below.
If an object found in the meta-object's __dict__ is not an
attribute descriptor, backward compatibility dictates certain
@@ -335,24 +335,257 @@ Specification of the attribute descriptor API
effect is the same as when T is omitted), or None. When X is
None, this should be a method descriptor, and the result is an
*unbound* method restricted to objects whose type is (a
descendant of) T. Such an unbound method is a descriptor
itself. For methods, this is called a "binding" operation, even
if X==None. Exactly what is returned by the binding operation
depends on the semantics of the descriptor; for example, static
methods and class methods (see below) ignore the instance and
bind to the type instead.
- __set__(): a function of two arguments that sets the attribute
value on the object. If the attribute is read-only, this method
raises a TypeError exception. (Not an AttributeError!)
Example: C.ivar.set(x, y) ~~ x.ivar = y.
Method attributes may also be callable; in this case they act as
unbound methods. Example: C.meth(C(), x) ~~ C().meth(x).
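The __get__/__set__ pair above maps directly onto Python's descriptor protocol. Here is a minimal sketch in modern Python 3 syntax (the names Ivar, ReadOnlyIvar, and Record are made up for illustration), showing a writable data descriptor and a read-only one whose __set__ raises TypeError as specified:

```python
class Ivar:
    """Descriptor backing an instance variable stored under a fixed key."""
    def __init__(self, name):
        self.__name__ = name          # the attribute's name, per the proposal

    def __get__(self, obj, objtype=None):
        if obj is None:               # X is None: accessed on the class
            return self               # fall back to the descriptor itself
        return obj.__dict__[self.__name__]

    def __set__(self, obj, value):
        obj.__dict__[self.__name__] = value


class ReadOnlyIvar(Ivar):
    """Even read-only data descriptors have a __set__: it always raises."""
    def __set__(self, obj, value):
        raise TypeError("read-only attribute")  # TypeError, not AttributeError


class Record:
    x = Ivar("x")
    y = ReadOnlyIvar("y")

r = Record()
r.x = 42                              # goes through Ivar.__set__
print(r.x)                            # prints: 42
try:
    r.y = 1
except TypeError as exc:
    print("TypeError:", exc)
```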
Static methods and class methods
The descriptor API makes it possible to add static methods and
class methods. Static methods are easy to describe: they behave
pretty much like static methods in C++ or Java. Here's an
example:
class C:
    def foo(x, y):
        print "staticmethod", x, y
    foo = staticmethod(foo)

C.foo(1, 2)
c = C()
c.foo(1, 2)
Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with
two arguments, and print "staticmethod 1 2". No "self" is declared in
the definition of foo(), and no instance is required in the call.
The line "foo = staticmethod(foo)" in the class statement is the
crucial element: this makes foo() a static method. The built-in
staticmethod() wraps its function argument in a special kind of
descriptor whose __get__() method returns the original function
unchanged. Without this, the __get__() method of standard
function objects would have created a bound method object for
'c.foo' and an unbound method object for 'C.foo'.
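What the staticmethod() wrapper does can be sketched in pure Python (modern Python 3 syntax; the real built-in is implemented in C, and my_staticmethod is a hypothetical name):

```python
class my_staticmethod:
    """Sketch of the staticmethod() wrapper: __get__ returns the
    wrapped function unchanged, so no binding ever happens."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        return self.func              # ignore both instance and class


class C:
    def foo(x, y):
        print("staticmethod", x, y)
    foo = my_staticmethod(foo)

C.foo(1, 2)                           # prints: staticmethod 1 2
c = C()
c.foo(1, 2)                           # prints: staticmethod 1 2
```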
Class methods use a similar pattern to declare methods that
receive an implicit first argument that is the *class* for which
they are invoked. This has no C++ or Java equivalent, and is not
quite the same as what class methods are in Smalltalk, but may
serve a similar purpose. (Python also has real metaclasses, and
perhaps methods defined in a metaclass have more right to the name
"class method"; but I expect that most programmers won't be using
metaclasses.) Here's an example:
class C:
    def foo(x, y):
        print "classmethod", x, y
    foo = classmethod(foo)

C.foo(1)
c = C()
c.foo(1)
Both the call C.foo(1) and the call c.foo(1) end up calling foo()
with *two* arguments, and print "classmethod __main__.C 1". The
first argument of foo() is implied, and it is the class, even if
the method was invoked via an instance. Now let's continue the
example:
class D(C):
    pass

D.foo(1)
d = D()
d.foo(1)
This prints "classmethod __main__.D 1" both times; in other words,
the class passed as the first argument of foo() is the class
involved in the call, not the class involved in the definition of
foo().
But notice this:
class E(C):
    def foo(x, y): # override C.foo
        print "E.foo() called"
        C.foo(y)

E.foo(1)
e = E()
e.foo(1)
In this example, the call to C.foo() from E.foo() will see class C
as its first argument, not class E. This is to be expected, since
the call specifies the class C. But it stresses the difference
between these class methods and methods defined in metaclasses
(where an upcall to a metamethod would pass the target class as an
explicit first argument). If you don't understand this, don't
worry, you're not alone.
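A pure-Python sketch of the classmethod() wrapper's behavior (modern Python 3 syntax; my_classmethod is a hypothetical name, and this version prints the bare class name rather than the module-qualified one). Note how binding substitutes the class used in the lookup for the instance:

```python
class my_classmethod:
    """Sketch of the classmethod() wrapper: binding passes the class
    involved in the call as the implicit first argument."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        def bound(*args):
            return self.func(objtype, *args)
        return bound


class C:
    def foo(x, y):
        print("classmethod", x.__name__, y)
    foo = my_classmethod(foo)

class D(C):
    pass

C.foo(1)                              # prints: classmethod C 1
D().foo(1)                            # prints: classmethod D 1
```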
C API
XXX
XXX The following is VERY rough text that I wrote with a different
audience in mind; I'll have to go through this to edit it more.
XXX It also doesn't go into enough detail for the C API.
A built-in type can declare special data attributes in two ways:
using a struct memberlist (defined in structmember.h) or a struct
getsetlist (defined in descrobject.h). The struct memberlist is
an old mechanism put to new use: each attribute has a descriptor
record including its name, an enum giving its type (various C
types are supported as well as PyObject *), an offset from the
start of the instance, and a read-only flag.
The struct getsetlist mechanism is new, and intended for cases
that don't fit in that mold, because they either require
additional checking, or are plain calculated attributes. Each
attribute here has a name, a getter C function pointer, a setter C
function pointer, and a context pointer. The function pointers
are optional, so that for example setting the setter function
pointer to NULL makes a read-only attribute. The context pointer
is intended to pass auxiliary information to generic getter/setter
functions, but I haven't found a need for this yet.
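The getsetlist idea can be sketched as a pure-Python descriptor (hypothetical names, modern Python 3 syntax): a record holding optional getter and setter functions plus a context value, where a missing setter yields a read-only attribute:

```python
class getset:
    """Sketch of a getsetlist-style entry: optional getter/setter
    function pointers plus a context value for generic functions."""
    def __init__(self, get=None, set=None, closure=None):
        self.get, self.set, self.closure = get, set, closure

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.get is None:
            raise AttributeError("attribute is not readable")
        return self.get(obj, self.closure)

    def __set__(self, obj, value):
        if self.set is None:          # NULL setter -> read-only attribute
            raise TypeError("read-only attribute")
        self.set(obj, value, self.closure)


class Point:
    def __init__(self, x, y):
        self._x, self._y = x, y
    # A plain calculated attribute: no setter supplied, so read-only.
    norm2 = getset(get=lambda obj, ctx: obj._x ** 2 + obj._y ** 2)

p = Point(3, 4)
print(p.norm2)                        # prints: 25
```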
Note that there is also a similar mechanism to declare built-in
methods: these are PyMethodDef structures, which contain a name
and a C function pointer (and some flags for the calling
convention).
Traditionally, built-in types have had to define their own
tp_getattro and tp_setattro slot functions to make these attribute
definitions work (PyMethodDef and struct memberlist are quite
old). There are convenience functions that take an array of
PyMethodDef or memberlist structures, an object, and an attribute
name, and return or set the attribute if found in the list, or
raise an exception if not found. But these convenience functions
had to be explicitly called by the tp_getattro or tp_setattro
method of the specific type, and they did a linear search of the
array using strcmp() to find the array element describing the
requested attribute.
I now have a brand spanking new generic mechanism that improves
this situation substantially.
- Pointers to arrays of PyMethodDef, memberlist, getsetlist
structures are part of the new type object (tp_methods,
tp_members, tp_getset).
- At type initialization time (in PyType_InitDict()), for each
entry in those three arrays, a descriptor object is created and
placed in a dictionary that belongs to the type (tp_dict).
- Descriptors are very lean objects that mostly point to the
corresponding structure. An implementation detail is that all
descriptors share the same object type, and a discriminator
field tells what kind of descriptor it is (method, member, or
getset).
- As explained in PEP 252, descriptors have a get() method that
takes an object argument and returns that object's attribute;
descriptors for writable attributes also have a set() method
that takes an object and a value and sets that object's
attribute. Note that the get() method also serves as a bind()
operation for methods, binding the unbound method implementation
to the object.
- Instead of providing their own tp_getattro and tp_setattro
implementation, almost all built-in objects now place
PyObject_GenericGetAttr and (if they have any writable
attributes) PyObject_GenericSetAttr in their tp_getattro and
tp_setattro slots. (Or, they can leave these NULL, and inherit
them from the default base object, if they arrange for an
explicit call to PyType_InitDict() for the type before the first
instance is created.)
- In the simplest case, PyObject_GenericGetAttr() does exactly one
dictionary lookup: it looks up the attribute name in the type's
dictionary (obj->ob_type->tp_dict). Upon success, there are two
possibilities: the descriptor has a get method, or it doesn't.
For speed, the get and set methods are type slots: tp_descr_get
and tp_descr_set. If the tp_descr_get slot is non-NULL, it is
called, passing the object as its only argument, and the return
value from this call is the result of the getattr operation. If
the tp_descr_get slot is NULL, as a fallback the descriptor
itself is returned (compare class attributes that are not
methods but simple values).
- PyObject_GenericSetAttr() works very similarly but uses the
tp_descr_set slot and calls it with the object and the new
attribute value; if the tp_descr_set slot is NULL, an
AttributeError is raised.
- But now for a more complicated case. The approach described
above is suitable for most built-in objects such as lists,
strings, numbers. However, some object types have a dictionary
in each instance that can store arbitrary attributes. In fact,
when you use a class statement to subtype an existing built-in
type, you automatically get such a dictionary (unless you
explicitly turn it off, using another advanced feature,
__slots__). Let's call this the instance dict, to distinguish
it from the type dict.
- In the more complicated case, there's a conflict between names
stored in the instance dict and names stored in the type dict.
If both dicts have an entry with the same key, which one should
we return? Looking at classic Python for guidance, I find
conflicting rules: for class instances, the instance dict
overrides the class dict, *except* for the special attributes
(like __dict__ and __class__), which have priority over the
instance dict.
- I resolved this with the following set of rules, implemented in
PyObject_GenericGetAttr():
1. Look in the type dict. If you find a *data* descriptor, use
its get() method to produce the result. This takes care of
special attributes like __dict__ and __class__.
2. Look in the instance dict. If you find anything, that's it.
(This takes care of the requirement that normally the
instance dict overrides the class dict.)
3. Look in the type dict again (in reality this uses the saved
result from step 1, of course). If you find a descriptor,
use its get() method; if you find something else, that's it;
if it's not there, raise AttributeError.
This requires a classification of descriptors into data and
nondata descriptors. The current implementation quite sensibly
classifies member and getset descriptors as data (even if they
are read-only!) and method descriptors as nondata.
Non-descriptors (like function pointers or plain values) are
also classified as nondata.
- This scheme has one drawback: in what I assume to be the most
common case, referencing an instance variable stored in the
instance dict, it does *two* dictionary lookups, whereas the
classic scheme did a quick test for attributes starting with two
underscores plus a single dictionary lookup. (Although the
implementation is sadly structured as instance_getattr() calling
instance_getattr1() calling instance_getattr2() which finally
calls PyDict_GetItem(), and the underscore test calls
PyString_AsString() rather than inlining this. I wonder if
optimizing the snot out of this might not be a good idea to
speed up Python 2.2, if we weren't going to rip it all out. :-)
- A benchmark verifies that in fact this is as fast as classic
instance variable lookup, so I'm no longer worried.
- Modification for dynamic types: step 1 and 3 look in the
dictionary of the type and all its base classes (in MRO
sequence, of course).
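The three-step lookup rule can be sketched in pure Python (modern syntax; generic_getattr is a hypothetical stand-in for the C function, and the MRO scan anticipates the dynamic-types modification above):

```python
_missing = object()                   # sentinel for "not found"

def generic_getattr(obj, name):
    """Sketch of the lookup order implemented by PyObject_GenericGetAttr()."""
    tp = type(obj)
    # Step 1 (saved for reuse in step 3): look in the type dict,
    # scanning the bases in MRO sequence.
    descr = _missing
    for klass in tp.__mro__:
        if name in klass.__dict__:
            descr = klass.__dict__[name]
            break
    # A *data* descriptor (one with a set slot) wins outright; this
    # takes care of special attributes like __dict__ and __class__.
    if descr is not _missing and hasattr(type(descr), "__set__"):
        return type(descr).__get__(descr, obj, tp)
    # Step 2: the instance dict normally overrides the type dict.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]
    # Step 3: reuse the saved type-dict result.
    if descr is not _missing:
        if hasattr(type(descr), "__get__"):   # nondata descriptor: bind it
            return type(descr).__get__(descr, obj, tp)
        return descr                          # plain value: return as is
    raise AttributeError(name)
```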
Discussion