PEP: 252
Title: Making Types Look More Like Classes
Version: $Revision$
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 19-Apr-2001
Post-History:

Abstract

This PEP proposes changes to the introspection API for types that
make them look more like classes, and their instances more like
class instances. For example, type(x) will be equivalent to
x.__class__ for most built-in types. When C is x.__class__,
x.meth(a) will generally be equivalent to C.meth(x, a), and
C.__dict__ contains x's methods and other attributes.

This PEP also introduces a new approach to specifying attributes,
using attribute descriptors, or descriptors for short.
Descriptors unify and generalize several different common
mechanisms used for describing attributes: a descriptor can
describe a method, a typed field in the object structure, or a
generalized attribute represented by getter and setter functions.

Based on the generalized descriptor API, this PEP also introduces
a way to declare class methods and static methods.


Introduction

One of Python's oldest language warts is the difference between
classes and types. For example, you can't directly subclass the
dictionary type, and the introspection interface for finding out
what methods and instance variables an object has is different for
types and for classes.

Healing the class/type split is a big effort, because it affects
many aspects of how Python is implemented. This PEP concerns
itself with making the introspection API for types look the same
as that for classes. Other PEPs will propose making classes look
more like types, and subclassing from built-in types; these topics
are not on the table for this PEP.


Introspection APIs

Introspection concerns itself with finding out what attributes an
object has. Python's very general getattr/setattr API makes it
impossible to guarantee that there always is a way to get a list
of all attributes supported by a specific object, but in practice
two conventions have appeared that together work for almost all
objects. I'll call them the class-based introspection API and the
type-based introspection API; class API and type API for short.

The class-based introspection API is used primarily for class
instances; it is also used by Jim Fulton's ExtensionClasses. It
assumes that all data attributes of an object x are stored in the
dictionary x.__dict__, and that all methods and class variables
can be found by inspection of x's class, written as x.__class__.
Classes have a __dict__ attribute, which yields a dictionary
containing methods and class variables defined by the class
itself, and a __bases__ attribute, which is a tuple of base
classes that must be inspected recursively. Some assumptions here
are:

- attributes defined in the instance dict override attributes
  defined by the object's class;

- attributes defined in a derived class override attributes
  defined in a base class;

- attributes in an earlier base class (meaning occurring earlier
  in __bases__) override attributes in a later base class.

(The last two rules together are often summarized as the
left-to-right, depth-first rule for attribute search. This is the
classic Python attribute lookup rule. Note that PEP 253 will
propose to change the attribute lookup order, and if accepted,
this PEP will follow suit.)

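The three assumptions above can be checked with a small sketch.
The classes Base1, Base2 and Derived are hypothetical names chosen
for illustration, and the code assumes a current Python 3
interpreter rather than the Python 2.2 this PEP targets. For a
hierarchy like this one, without a shared user-defined base, the
classic rule and the newer lookup order give the same answers.

```python
# Hypothetical classes, used only to illustrate the three assumptions.
class Base1:
    color = "red"
    shape = "round"

class Base2:
    shape = "square"
    size = "large"

class Derived(Base1, Base2):
    color = "blue"          # derived class overrides Base1.color

x = Derived()
x.size = "small"            # stored in x.__dict__, overrides Base2.size

print(x.__dict__)                 # data attributes of the instance
print(x.__class__.__bases__)      # bases, inspected left to right
print(x.color, x.shape, x.size)   # values after applying the rules
```

Here x.shape comes from Base1 because Base1 occurs before Base2 in
__bases__.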
The type-based introspection API is supported in one form or
another by most built-in objects. It uses two special attributes,
__members__ and __methods__. The __methods__ attribute, if
present, is a list of method names supported by the object. The
__members__ attribute, if present, is a list of data attribute
names supported by the object.

The type API is sometimes combined with a __dict__ that works the
same as for instances (for example for function objects in
Python 2.1, f.__dict__ contains f's dynamic attributes, while
f.__members__ lists the names of f's statically defined
attributes).

Some caution must be exercised: some objects don't list their
"intrinsic" attributes (like __dict__ and __doc__) in __members__,
while others do; sometimes attribute names occur both in
__members__ or __methods__ and as keys in __dict__, in which case
it's anybody's guess whether the value found in __dict__ is used
or not.

The type API has never been carefully specified. It is part of
Python folklore, and most third party extensions support it
because they follow examples that support it. Also, any type that
uses Py_FindMethod() and/or PyMember_Get() in its tp_getattr
handler supports it, because these two functions special-case the
attribute names __methods__ and __members__, respectively.

Jim Fulton's ExtensionClasses ignore the type API, and instead
emulate the class API, which is more powerful. In this PEP, I
propose to phase out the type API in favor of supporting the class
API for all types.

One argument in favor of the class API is that it doesn't require
you to create an instance in order to find out which attributes a
type supports; this in turn is useful for documentation
processors. For example, the socket module exports the SocketType
object, but this currently doesn't tell us what methods are
defined on socket objects. Using the class API, SocketType would
show exactly what the methods for socket objects are, and we can
even extract their docstrings, without creating a socket. (Since
this is a C extension module, the source-scanning approach to
docstring extraction isn't feasible in this case.)


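As a rough illustration of what a documentation processor could do
with the class API, the sketch below collects method names and the
first line of each docstring from a type without creating an
instance. The helper name method_docs is hypothetical. The code
assumes a current Python 3 interpreter, where the socket type is
spelled socket.socket, and it uses the __mro__ attribute (which
comes from PEP 253, not from this PEP) in place of a recursive walk
over __bases__.

```python
import socket

def method_docs(tp):
    """Map public callable attribute names of a type to a doc summary."""
    docs = {}
    # Visit the most basic class first so that derived definitions win.
    for klass in reversed(tp.__mro__):
        for name, value in vars(klass).items():
            if name.startswith("_") or not callable(value):
                continue
            doc = getattr(value, "__doc__", None) or ""
            docs[name] = doc.strip().split("\n")[0]
    return docs

# No socket object is created here; only the type is inspected.
docs = method_docs(socket.socket)
for name in sorted(docs):
    print(name, "-", docs[name])
```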
Specification of the class-based introspection API

Objects may have two kinds of attributes: static and dynamic. The
names and sometimes other properties of static attributes are
knowable by inspection of the object's type or class, which is
accessible through obj.__class__ or type(obj). (I'm using type
and class interchangeably; a clumsy but descriptive term that fits
both is "meta-object".)

(XXX static and dynamic are not great terms to use here, because
"static" attributes may actually behave quite dynamically, and
because they have nothing to do with static class members in C++
or Java. Barry suggests using immutable and mutable instead, but
those words already have precise and different meanings in
slightly different contexts, so I think that would still be
confusing.)

Examples of dynamic attributes are instance variables of class
instances, module attributes, etc. Examples of static attributes
are the methods of built-in objects like lists and dictionaries,
and the attributes of frame and code objects (f.f_code,
c.co_filename, etc.). When an object with dynamic attributes
exposes these through its __dict__ attribute, __dict__ is a static
attribute.

The names and values of dynamic properties are typically stored in
a dictionary, and this dictionary is typically accessible as
obj.__dict__. The rest of this specification is more concerned
with discovering the names and properties of static attributes
than with dynamic attributes; the latter are easily discovered by
inspection of obj.__dict__.

In the discussion below, I distinguish two kinds of objects:
regular objects (like lists, ints, functions) and meta-objects.
Types and classes are meta-objects. Meta-objects are also regular
objects, but we're mostly interested in them because they are
referenced by the __class__ attribute of regular objects (or by
the __bases__ attribute of other meta-objects).

The class introspection API consists of the following elements:

- the __class__ and __dict__ attributes on regular objects;

- the __bases__ and __dict__ attributes on meta-objects;

- precedence rules;

- attribute descriptors.

Together, these not only tell us about *all* attributes defined by
a meta-object, but they also help us calculate the value of a
specific attribute of a given object.

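To make the first two elements concrete, here is a minimal sketch
that gathers attribute names using only __class__, __dict__ and
__bases__. The function names static_attribute_names and
all_attribute_names and the class Point are hypothetical, and the
code assumes a current Python 3 interpreter. It collects names
only; it does not apply the precedence rules.

```python
def static_attribute_names(obj):
    """Collect names defined by obj's meta-object and all of its bases."""
    names = set()
    seen = set()

    def visit(meta):
        if id(meta) in seen:    # a base may be reachable along several paths
            return
        seen.add(id(meta))
        names.update(getattr(meta, "__dict__", {}).keys())
        # An absent __bases__ is treated as an empty sequence of bases.
        for base in getattr(meta, "__bases__", ()):
            visit(base)

    visit(obj.__class__)
    return names

def all_attribute_names(obj):
    """Static names plus the dynamic names in obj.__dict__, if any."""
    dynamic = set(getattr(obj, "__dict__", {}).keys())
    return static_attribute_names(obj) | dynamic

class Point:
    def norm(self):
        return abs(self.x)

p = Point()
p.x = 3
print(sorted(all_attribute_names(p)))
```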
1. The __dict__ attribute on regular objects

   A regular object may have a __dict__ attribute. If it does,
   this should be a mapping (not necessarily a dictionary)
   supporting at least __getitem__(), keys(), and has_key(). This
   gives the dynamic attributes of the object. The keys in the
   mapping give attribute names, and the corresponding values give
   their values.

   Typically, the value of an attribute with a given name is the
   same object as the value corresponding to that name as a key in
   the __dict__. In other words, obj.__dict__['spam'] is obj.spam.
   (But see the precedence rules below; a static attribute with
   the same name *may* override the dictionary item.)

2. The __class__ attribute on regular objects

   A regular object usually has a __class__ attribute. If it
   does, this references a meta-object. A meta-object can define
   static attributes for the regular object whose __class__ it
   is. This is normally done through the following mechanism:

3. The __dict__ attribute on meta-objects

   A meta-object may have a __dict__ attribute, of the same form
   as the __dict__ attribute for regular objects (a mapping but
   not necessarily a dictionary). If it does, the keys of the
   meta-object's __dict__ are names of static attributes for the
   corresponding regular object. The values are attribute
   descriptors; we'll explain these later. An unbound method is a
   special case of an attribute descriptor.

   Because a meta-object is also a regular object, the items in a
   meta-object's __dict__ correspond to attributes of the
   meta-object; however, some transformation may be applied, and
   bases (see below) may define additional dynamic attributes. In
   other words, mobj.spam is not always mobj.__dict__['spam'].
   (This rule contains a loophole because for classes, if
   C.__dict__['spam'] is a function, C.spam is an unbound method
   object.)

4. The __bases__ attribute on meta-objects

   A meta-object may have a __bases__ attribute. If it does, this
   should be a sequence (not necessarily a tuple) of other
   meta-objects, the bases. An absent __bases__ is equivalent to
   an empty sequence of bases. There must never be a cycle in the
   relationship between meta-objects defined by __bases__
   attributes; in other words, the __bases__ attributes define a
   directed acyclic graph, with arcs pointing from derived
   meta-objects to their base meta-objects. (It is not
   necessarily a tree, since multiple classes can have the same
   base class.) The __dict__ attributes of a meta-object in the
   inheritance graph supply attribute descriptors for the regular
   object whose __class__ attribute points to the root of the
   inheritance tree (which is not the same as the root of the
   inheritance hierarchy -- rather more the opposite, at the
   bottom given how inheritance trees are typically drawn).
   Descriptors are first searched in the dictionary of the root
   meta-object, then in its bases, according to a precedence rule
   (see the next paragraph).

5. Precedence rules

   When two meta-objects in the inheritance graph for a given
   regular object both define an attribute descriptor with the
   same name, the search order is up to the meta-object. This
   allows different meta-objects to define different search
   orders. In particular, classic classes use the old
   left-to-right depth-first rule, while new-style classes use a
   more advanced rule (see the section on method resolution order
   in PEP 253).

   When a dynamic attribute (one defined in a regular object's
   __dict__) has the same name as a static attribute (one defined
   by a meta-object in the inheritance graph rooted at the regular
   object's __class__), the static attribute has precedence if it
   is a descriptor that defines a __set__ method (see below);
   otherwise (if there is no __set__ method) the dynamic attribute
   has precedence. In other words, for data attributes (those
   with a __set__ method), the static definition overrides the
   dynamic definition, but for other attributes, dynamic overrides
   static.

   Rationale: we can't have a simple rule like "static overrides
   dynamic" or "dynamic overrides static", because some static
   attributes indeed override dynamic attributes; for example, a
   key '__class__' in an instance's __dict__ is ignored in favor
   of the statically defined __class__ pointer, but on the other
   hand most keys in inst.__dict__ override attributes defined in
   inst.__class__. Presence of a __set__ method on a descriptor
   indicates that this is a data descriptor. (Even read-only data
   descriptors have a __set__ method: it always raises an
   exception.) Absence of a __set__ method on a descriptor
   indicates that the descriptor isn't interested in intercepting
   assignment, and then the classic rule applies: an instance
   variable with the same name as a method hides the method until
   it is deleted.

6. Attribute descriptors

   This is where it gets interesting -- and messy. Attribute
   descriptors (descriptors for short) are stored in the
   meta-object's __dict__ (or in the __dict__ of one of its
   ancestors), and have two uses: a descriptor can be used to get
   or set the corresponding attribute value on the (regular,
   non-meta) object, and it has an additional interface that
   describes the attribute for documentation and introspection
   purposes.

   There is little prior art in Python for designing the
   descriptor's interface, neither for getting/setting the value
   nor for describing the attribute otherwise, except some trivial
   properties (it's reasonable to assume that __name__ and __doc__
   should be the attribute's name and docstring). I will propose
   such an API below.

   If an object found in the meta-object's __dict__ is not an
   attribute descriptor, backward compatibility dictates certain
   minimal semantics. This basically means that if it is a Python
   function or an unbound method, the attribute is a method;
   otherwise, it is the default value for a dynamic data
   attribute. Backwards compatibility also dictates that (in the
   absence of a __setattr__ method) it is legal to assign to an
   attribute corresponding to a method, and that this creates a
   data attribute shadowing the method for this particular
   instance. However, these semantics are only required for
   backwards compatibility with regular classes.

The introspection API is a read-only API. We don't define the
effect of assignment to any of the special attributes (__dict__,
__class__ and __bases__), nor the effect of assignment to the
items of a __dict__. Generally, such assignments should be
considered off-limits. A future PEP may define some semantics for
some such assignments. (Especially because currently instances
support assignment to __class__ and __dict__, and classes support
assignment to __bases__ and __dict__.)


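The precedence rule for dynamic versus static attributes (item 5)
can be demonstrated with two small descriptor classes. DataDescr
and NonDataDescr are hypothetical names, and the sketch assumes a
current Python 3 interpreter. Writing into x.__dict__ directly is
done here only to set up the name clash.

```python
class DataDescr:
    """Hypothetical data descriptor: defines both __get__ and __set__."""
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        # Read-only data descriptors still define __set__; it just raises.
        raise AttributeError("read-only attribute")

class NonDataDescr:
    """Hypothetical nondata descriptor: defines only __get__."""
    def __get__(self, obj, objtype=None):
        return "from nondata descriptor"

class C:
    d = DataDescr()
    n = NonDataDescr()

x = C()
x.__dict__["d"] = "from instance dict"
x.__dict__["n"] = "from instance dict"
x.__dict__["__class__"] = int   # ignored in favor of the real class

print(x.d)           # static wins: the descriptor has __set__
print(x.n)           # dynamic wins: the descriptor has no __set__
print(x.__class__)   # still C
```

Deleting x.__dict__["n"] makes the nondata descriptor visible
again, which is the "hides the method until it is deleted"
behavior described in the rationale.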
Specification of the attribute descriptor API

Attribute descriptors may have the following attributes. In the
examples, x is an object, C is x.__class__, x.meth() is a method,
and x.ivar is a data attribute or instance variable. All
attributes are optional -- a specific attribute may or may not be
present on a given descriptor. An absent attribute means that the
corresponding information is not available or the corresponding
functionality is not implemented.

- __name__: the attribute name. Because of aliasing and renaming,
  the attribute may (additionally or exclusively) be known under a
  different name, but this is the name under which it was born.
  Example: C.meth.__name__ == 'meth'.

- __doc__: the attribute's documentation string. This may be
  None.

- __objclass__: the class that declared this attribute. The
  descriptor only applies to objects that are instances of this
  class (this includes instances of its subclasses). Example:
  C.meth.__objclass__ is C.

- __get__(): a function callable with one or two arguments that
  retrieves the attribute value from an object. This is also
  referred to as a "binding" operation, because it may return a
  "bound method" object in the case of method descriptors. The
  first argument, X, is the object from which the attribute must
  be retrieved or to which it must be bound. When X is None, the
  optional second argument, T, should be a meta-object and the
  binding operation may return an *unbound* method restricted to
  instances of T. When both X and T are specified, X should be an
  instance of T. Exactly what is returned by the binding
  operation depends on the semantics of the descriptor; for
  example, static methods and class methods (see below) ignore the
  instance and bind to the type instead.

- __set__(): a function of two arguments that sets the attribute
  value on the object. If the attribute is read-only, this method
  may raise a TypeError or AttributeError exception (both are
  allowed, because both are historically found for undefined or
  unsettable attributes). Example:
  C.ivar.__set__(x, y) ~~ x.ivar = y.


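The attributes listed above can be observed on descriptors that
built-in types already provide. The following sketch assumes a
current Python 3 interpreter; it fetches the descriptors from the
type's __dict__ so that no binding happens first. Which exception
a read-only descriptor raises is left open by the text above, so
the sketch accepts either one.

```python
a = ["tic", "tac"]
meth = list.__dict__["append"]      # method descriptor
print(meth.__name__)                # the name it was born with
print(meth.__objclass__)            # the class that declared it
bound = meth.__get__(a, list)       # the binding operation
bound("toe")                        # same as a.append("toe")

z = 3 + 4j
real = complex.__dict__["real"]     # descriptor for a read-only field
print(real.__get__(z, complex))     # same as z.real

set_failed = False
try:
    real.__set__(z, 1.0)            # same as z.real = 1.0
except (TypeError, AttributeError):
    set_failed = True
print("assignment rejected:", set_failed)
```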
Static methods and class methods

The descriptor API makes it possible to add static methods and
class methods. Static methods are easy to describe: they behave
pretty much like static methods in C++ or Java. Here's an
example:

    class C:

        def foo(x, y):
            print "staticmethod", x, y
        foo = staticmethod(foo)

    C.foo(1, 2)
    c = C()
    c.foo(1, 2)

Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with
two arguments, and print "staticmethod 1 2". No "self" is declared in
the definition of foo(), and no instance is required in the call.

The line "foo = staticmethod(foo)" in the class statement is the
crucial element: this makes foo() a static method. The built-in
staticmethod() wraps its function argument in a special kind of
descriptor whose __get__() method returns the original function
unchanged. Without this, the __get__() method of standard
function objects would have created a bound method object for
'c.foo' and an unbound method object for 'C.foo'.

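The wrapping described above can be imitated in a few lines of
Python. StaticMethodSketch is a hypothetical stand-in, not the
built-in staticmethod(), and the sketch assumes a current Python 3
interpreter; foo() returns a tuple instead of printing so that the
result is easy to inspect.

```python
class StaticMethodSketch:
    """Illustrative stand-in for the built-in staticmethod()."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Ignore both the instance and the type: no binding takes place.
        return self.func

class C:
    def foo(x, y):
        return ("staticmethod", x, y)
    foo = StaticMethodSketch(foo)

c = C()
print(C.foo(1, 2))
print(c.foo(1, 2))
```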
(XXX Barry suggests to use "sharedmethod" instead of
"staticmethod", because the word static is being overloaded in so
many ways already. But I'm not sure if shared conveys the right
meaning.)

Class methods use a similar pattern to declare methods that
receive an implicit first argument that is the *class* for which
they are invoked. This has no C++ or Java equivalent, and is not
quite the same as what class methods are in Smalltalk, but may
serve a similar purpose. According to Armin Rigo, they are
similar to "virtual class methods" in Borland Pascal dialect
Delphi. (Python also has real metaclasses, and perhaps methods
defined in a metaclass have more right to the name "class method";
but I expect that most programmers won't be using metaclasses.)
Here's an example:

    class C:

        def foo(cls, y):
            print "classmethod", cls, y
        foo = classmethod(foo)

    C.foo(1)
    c = C()
    c.foo(1)

Both the call C.foo(1) and the call c.foo(1) end up calling foo()
with *two* arguments, and print "classmethod __main__.C 1". The
first argument of foo() is implied, and it is the class, even if
the method was invoked via an instance. Now let's continue the
example:

    class D(C):
        pass

    D.foo(1)
    d = D()
    d.foo(1)

This prints "classmethod __main__.D 1" both times; in other words,
the class passed as the first argument of foo() is the class
involved in the call, not the class involved in the definition of
foo().

But notice this:

    class E(C):
        def foo(cls, y): # override C.foo
            print "E.foo() called"
            C.foo(y)
        foo = classmethod(foo)

    E.foo(1)
    e = E()
    e.foo(1)

In this example, the call to C.foo() from E.foo() will see class C
as its first argument, not class E. This is to be expected, since
the call specifies the class C. But it stresses the difference
between these class methods and methods defined in metaclasses,
where an upcall to a metamethod would pass the target class as an
explicit first argument. (If you don't understand this, don't
worry, you're not alone.) Note that calling cls.foo(y) would be a
mistake -- it would cause infinite recursion. Also note that you
can't specify an explicit 'cls' argument to a class method. If
you want this (e.g. the __new__ method in PEP 253 requires this),
use a static method with a class as its explicit first argument
instead.


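The class method behavior, including the upcall from E.foo(), can
be reproduced with a descriptor written in Python.
ClassMethodSketch is a hypothetical stand-in for the built-in
classmethod(); it assumes a current Python 3 interpreter and uses
functools.partial to supply the implied first argument. The
methods return tuples instead of printing.

```python
import functools

class ClassMethodSketch:
    """Illustrative stand-in for the built-in classmethod()."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Bind to the class, even when reached through an instance.
        return functools.partial(self.func, objtype)

class C:
    def foo(cls, y):
        return (cls.__name__, y)
    foo = ClassMethodSketch(foo)

class D(C):
    pass

class E(C):
    def foo(cls, y):            # override C.foo
        # The upcall names C explicitly, so C.foo() receives C, not E.
        return ("E.foo",) + C.foo(y)
    foo = ClassMethodSketch(foo)

print(C.foo(1), C().foo(1))
print(D.foo(1), D().foo(1))
print(E.foo(1), E().foo(1))
```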
C API

XXX The following is VERY rough text that I wrote with a different
audience in mind; I'll have to go through this to edit it more.
XXX It also doesn't go into enough detail for the C API.

A built-in type can declare special data attributes in two ways:
using a struct memberlist (defined in structmember.h) or a struct
getsetlist (defined in descrobject.h). The struct memberlist is
an old mechanism put to new use: each attribute has a descriptor
record including its name, an enum giving its type (various C
types are supported as well as PyObject *), an offset from the
start of the instance, and a read-only flag.

The struct getsetlist mechanism is new, and intended for cases
that don't fit in that mold, because they either require
additional checking, or are plain calculated attributes. Each
attribute here has a name, a getter C function pointer, a setter C
function pointer, and a context pointer. The function pointers
are optional, so that for example setting the setter function
pointer to NULL makes a read-only attribute. The context pointer
is intended to pass auxiliary information to generic getter/setter
functions, but I haven't found a need for this yet.

Note that there is also a similar mechanism to declare built-in
methods: these are PyMethodDef structures, which contain a name
and a C function pointer (and some flags for the calling
convention).

Traditionally, built-in types have had to define their own
tp_getattro and tp_setattro slot functions to make these attribute
definitions work (PyMethodDef and struct memberlist are quite
old). There are convenience functions that take an array of
PyMethodDef or memberlist structures, an object, and an attribute
name, and return or set the attribute if found in the list, or
raise an exception if not found. But these convenience functions
had to be explicitly called by the tp_getattro or tp_setattro
method of the specific type, and they did a linear search of the
array using strcmp() to find the array element describing the
requested attribute.

I now have a brand spanking new generic mechanism that improves
this situation substantially.

- Pointers to arrays of PyMethodDef, memberlist, getsetlist
  structures are part of the new type object (tp_methods,
  tp_members, tp_getset).

- At type initialization time (in PyType_InitDict()), for each
  entry in those three arrays, a descriptor object is created and
  placed in a dictionary that belongs to the type (tp_dict).

- Descriptors are very lean objects that mostly point to the
  corresponding structure. An implementation detail is that all
  descriptors share the same object type, and a discriminator
  field tells what kind of descriptor it is (method, member, or
  getset).

- As explained above, descriptors have a get() method that
  takes an object argument and returns that object's attribute;
  descriptors for writable attributes also have a set() method
  that takes an object and a value and sets that object's
  attribute. Note that the get() method also serves as a bind()
  operation for methods, binding the unbound method implementation
  to the object.

- Instead of providing their own tp_getattro and tp_setattro
  implementation, almost all built-in objects now place
  PyObject_GenericGetAttr and (if they have any writable
  attributes) PyObject_GenericSetAttr in their tp_getattro and
  tp_setattro slots. (Or, they can leave these NULL, and inherit
  them from the default base object, if they arrange for an
  explicit call to PyType_InitDict() for the type before the first
  instance is created.)

- In the simplest case, PyObject_GenericGetAttr() does exactly one
  dictionary lookup: it looks up the attribute name in the type's
  dictionary (obj->ob_type->tp_dict). Upon success, there are two
  possibilities: the descriptor has a get method, or it doesn't.
  For speed, the get and set methods are type slots: tp_descr_get
  and tp_descr_set. If the tp_descr_get slot is non-NULL, it is
  called, passing the object as its only argument, and the return
  value from this call is the result of the getattr operation. If
  the tp_descr_get slot is NULL, as a fallback the descriptor
  itself is returned (compare class attributes that are not
  methods but simple values).

- PyObject_GenericSetAttr() works very similarly but uses the
  tp_descr_set slot and calls it with the object and the new
  attribute value; if the tp_descr_set slot is NULL, an
  AttributeError is raised.

- But now for a more complicated case. The approach described
  above is suitable for most built-in objects such as lists,
  strings, numbers. However, some object types have a dictionary
  in each instance that can store arbitrary attributes. In fact,
  when you use a class statement to subtype an existing built-in
  type, you automatically get such a dictionary (unless you
  explicitly turn it off, using another advanced feature,
  __slots__). Let's call this the instance dict, to distinguish
  it from the type dict.

- In the more complicated case, there's a conflict between names
  stored in the instance dict and names stored in the type dict.
  If both dicts have an entry with the same key, which one should
  we return? Looking at classic Python for guidance, I find
  conflicting rules: for class instances, the instance dict
  overrides the class dict, *except* for the special attributes
  (like __dict__ and __class__), which have priority over the
  instance dict.

- I resolved this with the following set of rules, implemented in
  PyObject_GenericGetAttr():

  1. Look in the type dict. If you find a *data* descriptor, use
     its get() method to produce the result. This takes care of
     special attributes like __dict__ and __class__.

  2. Look in the instance dict. If you find anything, that's it.
     (This takes care of the requirement that normally the
     instance dict overrides the class dict.)

  3. Look in the type dict again (in reality this uses the saved
     result from step 1, of course). If you find a descriptor,
     use its get() method; if you find something else, that's it;
     if it's not there, raise AttributeError.

  This requires a classification of descriptors as data and
  nondata descriptors. The current implementation quite sensibly
  classifies member and getset descriptors as data (even if they
  are read-only!) and method descriptors as nondata.
  Non-descriptors (like function pointers or plain values) are
  also classified as non-data (!).

- This scheme has one drawback: in what I assume to be the most
  common case, referencing an instance variable stored in the
  instance dict, it does *two* dictionary lookups, whereas the
  classic scheme did a quick test for attributes starting with two
  underscores plus a single dictionary lookup. (Although the
  implementation is sadly structured as instance_getattr() calling
  instance_getattr1() calling instance_getattr2() which finally
  calls PyDict_GetItem(), and the underscore test calls
  PyString_AsString() rather than inlining this. I wonder if
  optimizing the snot out of this might not be a good idea to
  speed up Python 2.2, if we weren't going to rip it all out. :-)

- A benchmark verifies that in fact this is as fast as classic
  instance variable lookup, so I'm no longer worried.

- Modification for dynamic types: steps 1 and 3 look in the
  dictionary of the type and all its base classes (in MRO
  sequence, of course).


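The three-step rule from the list above, together with the
modification for base classes, can be paraphrased in Python. This
is a sketch under stated assumptions, not the C code of
PyObject_GenericGetAttr(): the names generic_getattr, find_in_type
and the class P are hypothetical, the code assumes a current
Python 3 interpreter, and the built-in property type is used only
as a convenient example of a data descriptor.

```python
_MISSING = object()

def find_in_type(tp, name):
    """Search the dict of the type and of its bases, in MRO order."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING

def generic_getattr(obj, name):
    """Python paraphrase of the three-step rule; not the real C code."""
    tp = type(obj)
    descr = find_in_type(tp, name)
    # Descriptor methods are looked up on the descriptor's own type,
    # mirroring the tp_descr_get and tp_descr_set slots.
    getter = getattr(type(descr), "__get__", None)
    is_data = hasattr(type(descr), "__set__")

    # Step 1: a data descriptor found in the type dict wins.
    if descr is not _MISSING and getter is not None and is_data:
        return getter(descr, obj, tp)

    # Step 2: otherwise look in the instance dict, if there is one.
    inst_dict = getattr(obj, "__dict__", None)
    if inst_dict is not None and name in inst_dict:
        return inst_dict[name]

    # Step 3: reuse the result saved from step 1.
    if descr is not _MISSING:
        if getter is not None:
            return getter(descr, obj, tp)
        return descr            # a plain value stored in the type dict
    raise AttributeError(name)

class P:
    kind = "point"              # plain value: nondata

    def norm(self):             # function: nondata descriptor
        return abs(self.x)

    @property
    def double(self):           # property: data descriptor
        return 2 * self.x

p = P()
p.x = -3
p.__dict__["double"] = "shadow"         # ignored: data descriptor wins
p.__dict__["kind"] = "instance value"   # wins over the class attribute

for name in ("x", "kind", "double", "__class__"):
    print(name, generic_getattr(p, name))
```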
Discussion

XXX


Examples

Let's look at lists. In classic Python, the method names of
lists were available as the __methods__ attribute of list objects:

    >>> [].__methods__
    ['append', 'count', 'extend', 'index', 'insert', 'pop',
    'remove', 'reverse', 'sort']
    >>>

Under the new proposal, the __methods__ attribute no longer exists:

    >>> [].__methods__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'list' object has no attribute '__methods__'
    >>>

Instead, you can get the same information from the list type:

    >>> T = [].__class__
    >>> T
    <type 'list'>
    >>> dir(T) # like T.__dict__.keys(), but sorted
    ['__add__', '__class__', '__contains__', '__eq__', '__ge__',
    '__getattr__', '__getitem__', '__getslice__', '__gt__',
    '__iadd__', '__imul__', '__init__', '__le__', '__len__',
    '__lt__', '__mul__', '__ne__', '__new__', '__radd__',
    '__repr__', '__rmul__', '__setitem__', '__setslice__', 'append',
    'count', 'extend', 'index', 'insert', 'pop', 'remove',
    'reverse', 'sort']
    >>>

The new introspection API gives more information than the old one:
in addition to the regular methods, it also shows the methods that
are normally invoked through special notations, e.g. __iadd__
(+=), __len__ (len), __ne__ (!=). You can invoke any method from
this list directly:

    >>> a = ['tic', 'tac']
    >>> T.__len__(a) # same as len(a)
    2
    >>> T.append(a, 'toe') # same as a.append('toe')
    >>> a
    ['tic', 'tac', 'toe']
    >>>

This is just like it is for user-defined classes.

Notice a familiar yet surprising name in the list: __init__. This
is the domain of PEP 253.


Backwards compatibility

XXX


Warnings and Errors

XXX


Implementation

A partial implementation of this PEP is available from CVS as a
branch named "descr-branch". To experiment with this
implementation, proceed to check out Python from CVS according to
the instructions at http://sourceforge.net/cvs/?group_id=5470 but
add the arguments "-r descr-branch" to the cvs checkout command.
(You can also start with an existing checkout and do "cvs update
-r descr-branch".) For some examples of the features described
here, see the file Lib/test/test_descr.py.

Note: the code in this branch goes way beyond this PEP; it is also
the experimentation area for PEP 253 (Subtyping Built-in Types).


References

XXX


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: