Add a section on static methods and class methods.
Add a very uncooked section on the C API.
parent
bf4b7e8c90
commit
c22397f038
271
pep-0252.txt
|
@ -88,13 +88,13 @@ Introspection APIs
|
|||
names supported by the object.
|
||||
|
||||
The type API is sometimes combined with a __dict__ that works the
|
||||
same way as for instances (e.g., for function objects in Python
|
||||
2.1, f.__dict__ contains f's dynamic attributes, while
|
||||
same way as for instances (for example for function objects in
|
||||
Python 2.1, f.__dict__ contains f's dynamic attributes, while
|
||||
f.__members__ lists the names of f's statically defined
|
||||
attributes).
|
||||
|
||||
Some caution must be exercised: some objects don't list their
|
||||
"intrinsic" attributes (e.g. __dict__ and __doc__) in __members__,
|
||||
"intrinsic" attributes (like __dict__ and __doc__) in __members__,
|
||||
while others do; sometimes attribute names occur both in
|
||||
__members__ or __methods__ and as keys in __dict__, in which case
|
||||
it's anybody's guess whether the value found in __dict__ is used
|
||||
|
@ -154,7 +154,7 @@ Specification of the class-based introspection API
|
|||
inspection of obj.__dict__.
|
||||
|
||||
In the discussion below, I distinguish two kinds of objects:
|
||||
regular objects (e.g. lists, ints, functions) and meta-objects.
|
||||
regular objects (like lists, ints, functions) and meta-objects.
|
||||
Types and classes are meta-objects. Meta-objects are also regular
|
||||
objects, but we're mostly interested in them because they are
|
||||
referenced by the __class__ attribute of regular objects (or by
|
||||
|
@ -248,10 +248,10 @@ Specification of the class-based introspection API
|
|||
|
||||
Rationale: we can't have a simple rule like "static overrides
|
||||
dynamic" or "dynamic overrides static", because some static
|
||||
attributes indeed override dynamic attributes, e.g. a key
|
||||
'__class__' in an instance's __dict__ is ignored in favor of
|
||||
the statically defined __class__ pointer, but on the other hand
|
||||
most keys in inst.__dict__ override attributes defined in
|
||||
attributes indeed override dynamic attributes; for example, a
|
||||
key '__class__' in an instance's __dict__ is ignored in favor
|
||||
of the statically defined __class__ pointer, but on the other
|
||||
hand most keys in inst.__dict__ override attributes defined in
|
||||
inst.__class__. Presence of a __set__ method on a descriptor
|
||||
indicates that this is a data descriptor. (Even read-only data
|
||||
descriptors have a __set__ method: it always raises an
|
||||
|
@ -275,9 +275,9 @@ Specification of the class-based introspection API
|
|||
There is little prior art in Python for designing the
|
||||
descriptor's interface, neither for getting/setting the value
|
||||
nor for describing the attribute otherwise, except some trivial
|
||||
properties (e.g. it's reasonable to assume that __name__ and
|
||||
__doc__ should be the attribute's name and docstring). I will
|
||||
propose such an API below.
|
||||
properties (it's reasonable to assume that __name__ and __doc__
|
||||
should be the attribute's name and docstring). I will propose
|
||||
such an API below.
|
||||
|
||||
If an object found in the meta-object's __dict__ is not an
|
||||
attribute descriptor, backward compatibility dictates certain
|
||||
|
@ -335,24 +335,257 @@ Specification of the attribute descriptor API
|
|||
effect is the same as when T is omitted), or None. When X is
|
||||
None, this should be a method descriptor, and the result is an
|
||||
*unbound* method restricted to objects whose type is (a
|
||||
descendent of) T. (For methods, this is called a "binding"
|
||||
operation, even if X==None. Exactly what is returned by the
|
||||
binding operation depends on the semantics of the descriptor;
|
||||
for example, class methods ignore the instance and bind to the
|
||||
type instead.)
|
||||
descendent of) T. Such an unbound method is a descriptor
|
||||
itself. For methods, this is called a "binding" operation, even
|
||||
if X==None. Exactly what is returned by the binding operation
|
||||
depends on the semantics of the descriptor; for example, static
|
||||
methods and class methods (see below) ignore the instance and
|
||||
bind to the type instead.
|
||||
|
||||
- __set__(): a function of two arguments that sets the attribute
|
||||
value on the object. If the attribute is read-only, this method
|
||||
raises a TypeError exception. (Not an AttributeError!)
|
||||
Example: C.ivar.set(x, y) ~~ x.ivar = y.
|
||||
|
||||
Method attributes may also be callable; in this case they act as
|
||||
unbound methods. Example: C.meth(C(), x) ~~ C().meth(x).
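As a rough illustration of the binding operation described above, the following sketch uses modern Python 3 spelling. That is an assumption: the text above predates it, and Python 3 no longer has separate unbound method objects. It only shows that calling a function's __get__() with an instance yields a bound method equivalent to ordinary attribute access.

```python
class C:
    def meth(self, x):
        return ("meth", x)


c = C()

# Fetch the raw function from the class dict, bypassing attribute lookup.
descr = C.__dict__["meth"]

# "Binding": __get__(X, T) with an instance X returns a bound method.
bound = descr.__get__(c, C)

# All three spellings end up calling the same function with the same arguments.
print(bound(5), c.meth(5), C.meth(c, 5))
```

The explicit `C.meth(c, 5)` form corresponds to the "C.meth(C(), x) ~~ C().meth(x)" equivalence stated above.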
|
||||
|
||||
Static methods and class methods
|
||||
|
||||
The descriptor API makes it possible to add static methods and
|
||||
class methods. Static methods are easy to describe: they behave
|
||||
pretty much like static methods in C++ or Java. Here's an
|
||||
example:
|
||||
|
||||
    class C:

        def foo(x, y):
            print "staticmethod", x, y
        foo = staticmethod(foo)

    C.foo(1, 2)
    c = C()
    c.foo(1, 2)
|
||||
|
||||
Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with
|
||||
two arguments, and print "staticmethod 1 2". No "self" is declared in
|
||||
the definition of foo(), and no instance is required in the call.
|
||||
|
||||
The line "foo = staticmethod(foo)" in the class statement is the
|
||||
crucial element: this makes foo() a static method. The built-in
|
||||
staticmethod() wraps its function argument in a special kind of
|
||||
descriptor whose __get__() method returns the original function
|
||||
unchanged. Without this, the __get__() method of standard
|
||||
function objects would have created a bound method object for
|
||||
'c.foo' and an unbound method object for 'C.foo'.
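To make the role of __get__() concrete, here is a minimal pure-Python sketch of such a wrapper. The class name StaticMethodSketch is hypothetical and this is not the actual built-in implementation; it is written in modern Python 3 syntax and returns values instead of printing so the result is easy to check.

```python
class StaticMethodSketch:
    """Hypothetical stand-in for the built-in staticmethod()."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Ignore both the instance and the type: hand back the
        # original function unchanged, so nothing gets bound.
        return self.func


class C:
    def foo(x, y):
        return ("staticmethod", x, y)
    foo = StaticMethodSketch(foo)


c = C()
print(C.foo(1, 2))
print(c.foo(1, 2))
```

Both calls reach foo() with exactly two arguments, because the wrapper's __get__() never inserts an instance.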
|
||||
|
||||
Class methods use a similar pattern to declare methods that
|
||||
receive an implicit first argument that is the *class* for which
|
||||
they are invoked. This has no C++ or Java equivalent, and is not
|
||||
quite the same as what class methods are in Smalltalk, but may
|
||||
serve a similar purpose. (Python also has real metaclasses, and
|
||||
perhaps methods defined in a metaclass have more right to the name
|
||||
"class method"; but I expect that most programmers won't be using
|
||||
metaclasses.) Here's an example:
|
||||
|
||||
    class C:

        def foo(x, y):
            print "classmethod", x, y
        foo = classmethod(foo)

    C.foo(1)
    c = C()
    c.foo(1)
|
||||
|
||||
Both the call C.foo(1) and the call c.foo(1) end up calling foo()
|
||||
with *two* arguments, and print "classmethod __main__.C 1". The
|
||||
first argument of foo() is implied, and it is the class, even if
|
||||
the method was invoked via an instance. Now let's continue the
|
||||
example:
|
||||
|
||||
    class D(C):
        pass

    D.foo(1)
    d = D()
    d.foo(1)
|
||||
|
||||
This prints "classmethod __main__.D 1" both times; in other words,
|
||||
the class passed as the first argument of foo() is the class
|
||||
involved in the call, not the class involved in the definition of
|
||||
foo().
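The behavior just described can be approximated with a small descriptor written in Python. This is only a sketch under assumptions: ClassMethodSketch is a hypothetical name, the code uses modern Python 3 syntax, and the real classmethod() is implemented in C.

```python
import types


class ClassMethodSketch:
    """Hypothetical stand-in for the built-in classmethod()."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Bind to the class involved in the lookup, never to the instance.
        if objtype is None:
            objtype = type(obj)
        return types.MethodType(self.func, objtype)


class C:
    def foo(x, y):
        return ("classmethod", x.__name__, y)
    foo = ClassMethodSketch(foo)


class D(C):
    pass


print(C.foo(1), C().foo(1))
print(D.foo(1), D().foo(1))
```

The class passed as the implicit first argument is whichever class the attribute was looked up through, which is why D reports itself even though foo() is defined in C.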
|
||||
|
||||
But notice this:
|
||||
|
||||
    class E(C):
        def foo(x, y): # override C.foo
            print "E.foo() called"
            C.foo(y)
        foo = classmethod(foo)

    E.foo(1)
    e = E()
    e.foo(1)
|
||||
|
||||
In this example, the call to C.foo() from E.foo() will see class C
|
||||
as its first argument, not class E. This is to be expected, since
|
||||
the call specifies the class C. But it stresses the difference
|
||||
between these class methods and methods defined in metaclasses
|
||||
(where an upcall to a metamethod would pass the target class as an
|
||||
explicit first argument). If you don't understand this, don't
|
||||
worry, you're not alone.
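The difference may be easier to see side by side. The sketch below is an illustration in modern Python 3 syntax, not part of the proposal; the names Meta, MetaE and K are hypothetical. The first half repeats the class method upcall, the second half shows a metaclass upcall that passes the target class explicitly.

```python
class C:
    def foo(cls, y):
        return ("classmethod", cls.__name__, y)
    foo = classmethod(foo)


class E(C):
    def foo(cls, y):
        # The upcall names C, so C (not E) becomes the implicit argument.
        return ("E.foo", cls.__name__) + C.foo(y)
    foo = classmethod(foo)


class Meta(type):
    def foo(cls, y):
        return ("metamethod", cls.__name__, y)


class MetaE(Meta):
    def foo(cls, y):
        # The upcall passes the target class explicitly, so it is preserved.
        return ("MetaE.foo",) + Meta.foo(cls, y)


class K(metaclass=MetaE):
    pass


print(E.foo(1))
print(K.foo(1))
```

In the class method version the original target class E is lost in the upcall; in the metaclass version K survives because it is handed along as an ordinary argument.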
|
||||
|
||||
|
||||
C API
|
||||
|
||||
XXX
|
||||
XXX The following is VERY rough text that I wrote with a different
|
||||
audience in mind; I'll have to go through this to edit it more.
|
||||
XXX It also doesn't go into enough detail for the C API.
|
||||
|
||||
A built-in type can declare special data attributes in two ways:
|
||||
using a struct memberlist (defined in structmember.h) or a struct
|
||||
getsetlist (defined in descrobject.h). The struct memberlist is
|
||||
an old mechanism put to new use: each attribute has a descriptor
|
||||
record including its name, an enum giving its type (various C
|
||||
types are supported as well as PyObject *), an offset from the
|
||||
start of the instance, and a read-only flag.
|
||||
|
||||
The struct getsetlist mechanism is new, and intended for cases
|
||||
that don't fit in that mold, because they either require
|
||||
additional checking, or are plain calculated attributes. Each
|
||||
attribute here has a name, a getter C function pointer, a setter C
|
||||
function pointer, and a context pointer. The function pointers
|
||||
are optional, so that for example setting the setter function
|
||||
pointer to NULL makes a read-only attribute. The context pointer
|
||||
is intended to pass auxiliary information to generic getter/setter
|
||||
functions, but I haven't found a need for this yet.
|
||||
|
||||
Note that there is also a similar mechanism to declare built-in
|
||||
methods: these are PyMethodDef structures, which contain a name
|
||||
and a C function pointer (and some flags for the calling
|
||||
convention).
|
||||
|
||||
Traditionally, built-in types have had to define their own
|
||||
tp_getattro and tp_setattro slot functions to make these attribute
|
||||
definitions work (PyMethodDef and struct memberlist are quite
|
||||
old). There are convenience functions that take an array of
|
||||
PyMethodDef or memberlist structures, an object, and an attribute
|
||||
name, and return or set the attribute if found in the list, or
|
||||
raise an exception if not found. But these convenience functions
|
||||
had to be explicitly called by the tp_getattro or tp_setattro
|
||||
method of the specific type, and they did a linear search of the
|
||||
array using strcmp() to find the array element describing the
|
||||
requested attribute.
|
||||
|
||||
I now have a brand spanking new generic mechanism that improves
|
||||
this situation substantially.
|
||||
|
||||
- Pointers to arrays of PyMethodDef, memberlist, getsetlist
|
||||
structures are part of the new type object (tp_methods,
|
||||
tp_members, tp_getset).
|
||||
|
||||
- At type initialization time (in PyType_InitDict()), for each
|
||||
entry in those three arrays, a descriptor object is created and
|
||||
placed in a dictionary that belongs to the type (tp_dict).
|
||||
|
||||
- Descriptors are very lean objects that mostly point to the
|
||||
corresponding structure. An implementation detail is that all
|
||||
descriptors share the same object type, and a discriminator
|
||||
field tells what kind of descriptor it is (method, member, or
|
||||
getset).
|
||||
|
||||
- As explained in PEP 252, descriptors have a get() method that
|
||||
takes an object argument and returns that object's attribute;
|
||||
descriptors for writable attributes also have a set() method
|
||||
that takes an object and a value and sets that object's
|
||||
attribute. Note that the get() method also serves as a bind()
|
||||
operation for methods, binding the unbound method implementation
|
||||
to the object.
|
||||
|
||||
- Instead of providing their own tp_getattro and tp_setattro
|
||||
implementation, almost all built-in objects now place
|
||||
PyObject_GenericGetAttr and (if they have any writable
|
||||
attributes) PyObject_GenericSetAttr in their tp_getattro and
|
||||
tp_setattro slots. (Or, they can leave these NULL, and inherit
|
||||
them from the default base object, if they arrange for an
|
||||
explicit call to PyType_InitDict() for the type before the first
|
||||
instance is created.)
|
||||
|
||||
- In the simplest case, PyObject_GenericGetAttr() does exactly one
|
||||
dictionary lookup: it looks up the attribute name in the type's
|
||||
dictionary (obj->ob_type->tp_dict). Upon success, there are two
|
||||
possibilities: the descriptor has a get method, or it doesn't.
|
||||
For speed, the get and set methods are type slots: tp_descr_get
|
||||
and tp_descr_set. If the tp_descr_get slot is non-NULL, it is
|
||||
called, passing the object as its only argument, and the return
|
||||
value from this call is the result of the getattr operation. If
|
||||
the tp_descr_get slot is NULL, as a fallback the descriptor
|
||||
itself is returned (compare class attributes that are not
|
||||
methods but simple values).
|
||||
|
||||
- PyObject_GenericSetAttr() works very similarly but uses the
|
||||
tp_descr_set slot and calls it with the object and the new
|
||||
attribute value; if the tp_descr_set slot is NULL, an
|
||||
AttributeError is raised.
|
||||
|
||||
- But now for a more complicated case. The approach described
|
||||
above is suitable for most built-in objects such as lists,
|
||||
strings, numbers. However, some object types have a dictionary
|
||||
in each instance that can store arbitrary attributes. In fact,
|
||||
when you use a class statement to subtype an existing built-in
|
||||
type, you automatically get such a dictionary (unless you
|
||||
explicitly turn it off, using another advanced feature,
|
||||
__slots__). Let's call this the instance dict, to distinguish
|
||||
it from the type dict.
|
||||
|
||||
- In the more complicated case, there's a conflict between names
|
||||
stored in the instance dict and names stored in the type dict.
|
||||
If both dicts have an entry with the same key, which one should
|
||||
we return? Looking at classic Python for guidance, I find
|
||||
conflicting rules: for class instances, the instance dict
|
||||
overrides the class dict, *except* for the special attributes
|
||||
(like __dict__ and __class__), which have priority over the
|
||||
instance dict.
|
||||
|
||||
- I resolved this with the following set of rules, implemented in
|
||||
PyObject_GenericGetAttr():
|
||||
|
||||
1. Look in the type dict. If you find a *data* descriptor, use
|
||||
its get() method to produce the result. This takes care of
|
||||
special attributes like __dict__ and __class__.
|
||||
|
||||
2. Look in the instance dict. If you find anything, that's it.
|
||||
(This takes care of the requirement that normally the
|
||||
instance dict overrides the class dict.)
|
||||
|
||||
3. Look in the type dict again (in reality this uses the saved
|
||||
result from step 1, of course). If you find a descriptor,
|
||||
use its get() method; if you find something else, that's it;
|
||||
if it's not there, raise AttributeError.
|
||||
|
||||
This requires a classification of descriptors in data and
|
||||
nondata descriptors. The current implementation quite sensibly
|
||||
classifies member and getset descriptors as data (even if they
|
||||
are read-only!) and method descriptors as nondata.
|
||||
Non-descriptors (like function pointers or plain values) are
|
||||
also classified as non-data.
|
||||
|
||||
- This scheme has one drawback: in what I assume to be the most
|
||||
common case, referencing an instance variable stored in the
|
||||
instance dict, it does *two* dictionary lookups, whereas the
|
||||
classic scheme did a quick test for attributes starting with two
|
||||
underscores plus a single dictionary lookup. (Although the
|
||||
implementation is sadly structured as instance_getattr() calling
|
||||
instance_getattr1() calling instance_getattr2() which finally
|
||||
calls PyDict_GetItem(), and the underscore test calls
|
||||
PyString_AsString() rather than inlining this. I wonder if
|
||||
optimizing the snot out of this might not be a good idea to
|
||||
speed up Python 2.2, if we weren't going to rip it all out. :-)
|
||||
|
||||
- A benchmark verifies that in fact this is as fast as classic
|
||||
instance variable lookup, so I'm no longer worried.
|
||||
|
||||
- Modification for dynamic types: steps 1 and 3 look in the
|
||||
dictionary of the type and all its base classes (in MRO
|
||||
sequence, of course).
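The three lookup rules listed above, together with the MRO modification, can be sketched in Python. This is an approximation under stated assumptions, not the C code of PyObject_GenericGetAttr(): the names generic_getattr, lookup_in_type and _MISSING are hypothetical, the code is modern Python 3, and "data descriptor" is taken to mean that the descriptor's type defines both __get__ and __set__.

```python
_MISSING = object()


def lookup_in_type(tp, name):
    """Search the dicts of tp and its bases in MRO order."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattr(obj, name):
    tp = type(obj)
    found = lookup_in_type(tp, name)  # result is reused in step 3
    found_type = type(found)
    has_get = found is not _MISSING and hasattr(found_type, "__get__")

    # Step 1: a data descriptor in the type dict wins outright.
    if has_get and hasattr(found_type, "__set__"):
        return found_type.__get__(found, obj, tp)

    # Step 2: otherwise the instance dict overrides the type dict.
    inst_dict = getattr(obj, "__dict__", None)
    if inst_dict is not None and name in inst_dict:
        return inst_dict[name]

    # Step 3: fall back to the type dict (nondata descriptor or plain value).
    if has_get:
        return found_type.__get__(found, obj, tp)
    if found is not _MISSING:
        return found
    raise AttributeError(name)


class P:
    plain = 42
    prop = property(lambda self: "from property")

    def meth(self):
        return "type-level meth"


class Q(P):
    pass


q = Q()
# Plant conflicting keys directly in the instance dict.
q.__dict__.update(prop="shadow", meth="instance value", __class__="bogus")

print(generic_getattr(q, "prop"))       # data descriptor beats instance dict
print(generic_getattr(q, "meth"))       # instance dict beats nondata descriptor
print(generic_getattr(q, "__class__"))  # special attribute stays authoritative
```

Read-only attributes such as the property above still count as data descriptors here because the property type defines __set__, which matches the classification given in the text.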
|
||||
|
||||
|
||||
Discussion
|
||||
|