PEP: 3119
Title: Introducing Abstract Base Classes
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum <guido@python.org>, Talin <talin@acm.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2007
Post-History: Not yet posted

Abstract
========

**THIS IS A WORK IN PROGRESS! DON'T REVIEW YET!**

This is a proposal to add Abstract Base Class (ABC) support to Python
3000. It proposes:

* An "ABC support framework" which defines a metaclass, a base class,
  a decorator, and some helpers that make it easy to define ABCs.
  This will be added as a new library module named "abc".

* Specific ABCs for containers and iterators, to be added to the
  collections module.

* Specific ABCs for numbers, to be added to a new module, yet to be
  named.

* Guidelines for writing additional ABCs.

Much of the thinking that went into the proposal is not about the
specific mechanism of ABCs, as contrasted with Interfaces or Generic
Functions (GFs), but about clarifying philosophical issues like "what
makes a set", "what makes a mapping" and "what makes a sequence".

Rationale
=========

In the domain of object-oriented programming, the usage patterns for
interacting with an object can be divided into two basic categories,
which are 'invocation' and 'inspection'.

Invocation means interacting with an object by invoking its methods.
Usually this is combined with polymorphism, so that invoking a given
method may run different code depending on the type of an object.

Inspection means the ability for external code (outside of the object's
methods) to examine the type or properties of that object, and make
decisions on how to treat that object based on that information.

Both usage patterns serve the same general end, which is to be able to
support the processing of diverse and potentially novel objects in a
uniform way, but at the same time allowing processing decisions to be
customized for each different type of object.

In classical OOP theory, invocation is the preferred usage pattern, and
inspection is actively discouraged, being considered a relic of an
earlier, procedural programming style. However, in practice this view is
simply too dogmatic and inflexible, and leads to a kind of design
rigidity that is very much at odds with the dynamic nature of a language
like Python.

In particular, there is often a need to process objects in a way that
wasn't anticipated by the creator of the object class. It is not always
the best solution to build into every object methods that satisfy the
needs of every possible user of that object. Moreover, there are many
powerful dispatch philosophies that are in direct contrast to the
classic OOP requirement of behavior being strictly encapsulated within
an object, examples being rule-driven or pattern-match-driven logic.

On the other hand, one of the criticisms of inspection by classic
OOP theorists is the lack of formalisms and the ad hoc nature of what is
being inspected. In a language such as Python, in which almost any
aspect of an object can be reflected and directly accessed by external
code, there are many different ways to test whether an object conforms
to a particular protocol or not. For example, if asking 'is this object
a mutable sequence container?', one can look for a base class of 'list',
or one can look for a method named '__getitem__'. But note that although
these tests may seem obvious, neither of them is correct: the first
generates false negatives (a mutable sequence need not derive from
'list'), and the second false positives (mappings also define
'__getitem__').

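To make the false negative and the false positive concrete, here is a
small sketch. It assumes Python 3 and uses ``collections.UserList`` (a
standard library wrapper class not mentioned above) as a stand-in for
a mutable sequence that does not derive from ``list``; the two helper
function names are hypothetical.

```python
from collections import UserList


def is_list_by_base(obj):
    # Test 1: look for a base class of 'list'.
    return isinstance(obj, list)


def is_list_by_method(obj):
    # Test 2: look for a method named '__getitem__'.
    return hasattr(obj, "__getitem__")


# A mutable sequence that wraps a list but does not inherit from it.
wrapped = UserList([1, 2, 3])
# A mapping: not a sequence at all, yet it defines __getitem__.
table = {"a": 1}

# Test 1 rejects a genuine mutable sequence: a false negative.
false_negative = not is_list_by_base(wrapped)
# Test 2 accepts a mapping: a false positive.
false_positive = is_list_by_method(table)
```

Both flags come out true, which is exactly the problem: each ad hoc
test gives the wrong answer for a perfectly ordinary object.
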
The generally agreed-upon remedy is to standardize the tests, and group
them into a formal arrangement. This is most easily done by associating
with each class a set of standard testable properties, either via the
inheritance mechanism or some other means. Each test carries with it a
set of promises: it contains a promise about the general behavior of the
class, and a promise as to what other class methods will be available.

This PEP proposes a particular strategy for organizing these tests known
as Abstract Base Classes, or ABCs. ABCs are simply Python classes that
are added into an object's inheritance tree to signal certain features
of that object to an external inspector. Tests are done using
``isinstance()``, and the presence of a particular ABC means that the
test has passed.

Like all other things in Python, these promises are in the nature of a
gentlemen's agreement, which means that the language does not attempt
to enforce that these promises are kept.

Specification
=============

The specification follows the four categories listed in the abstract:

* An "ABC support framework" which defines a metaclass, a base class,
  a decorator, and some helpers that make it easy to define ABCs.
  This will be added as a new library module named "abc".

* Specific ABCs for containers and iterators, to be added to the
  collections module.

* Specific ABCs for numbers, to be added to a new module, yet to be
  named.

* Guidelines for writing additional ABCs.

ABC Support Framework
---------------------

The abc module will define some utilities that help in defining ABCs.
These are:

``@abstractmethod``
    A decorator to be used to declare abstract methods. This should
    only be used in classes derived from ``Abstract`` (see below). A
    class containing at least one method declared with this decorator
    that hasn't been overridden yet cannot be instantiated. Such
    methods may be called from the overriding method in the subclass
    (using ``super`` or direct invocation).

``Abstract``
    A class implementing the constraint that it or its subclasses
    cannot be instantiated unless each abstract method has been
    overridden. Its metaclass is ``AbstractClass``. Note: being
    derived from ``Abstract`` does not make a class abstract; the
    abstract-ness is decided on a per-class basis, depending on
    whether all methods defined with ``@abstractmethod`` have been
    overridden.

``AbstractClass``
    The metaclass of Abstract (and all classes derived from it). Its
    purpose is to collect the information during the class
    construction stage. It derives from ``type``.

``AbstractInstantiationError``
    The exception raised when attempting to instantiate an abstract
    class. It derives from ``TypeError``.

A possible implementation would add an attribute
``__abstractmethod__`` to any method declared with
``@abstractmethod``, and add the names of all such abstract methods to
a class attribute named ``__abstractmethods__``. Then the
``Abstract.__new__()`` method would raise an exception if any abstract
methods exist on the class being instantiated. For details see [2]_.

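The possible implementation described above can be sketched roughly
as follows. This is only an illustration written against Python 3
syntax (``metaclass=`` keyword), not the prototype referenced in
[2]_; the way inherited abstract names are re-checked is my own
assumption, and the ``Shape`` and ``Square`` classes are hypothetical
examples.

```python
def abstractmethod(func):
    """Flag func as abstract via the __abstractmethod__ attribute."""
    func.__abstractmethod__ = True
    return func


class AbstractInstantiationError(TypeError):
    """Raised when attempting to instantiate an abstract class."""


class AbstractClass(type):
    """Metaclass collecting abstract method names at construction."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Candidates: names defined here plus names still abstract
        # in any base class.
        names = set(namespace)
        for base in bases:
            names.update(getattr(base, "__abstractmethods__", ()))
        # A name stays abstract only if the attribute now visible on
        # the class still carries the flag (i.e. was not overridden).
        cls.__abstractmethods__ = frozenset(
            n for n in names
            if getattr(getattr(cls, n, None), "__abstractmethod__", False)
        )
        return cls


class Abstract(metaclass=AbstractClass):
    def __new__(cls, *args, **kwargs):
        if cls.__abstractmethods__:
            missing = ", ".join(sorted(cls.__abstractmethods__))
            raise AbstractInstantiationError(
                "cannot instantiate %s; abstract methods: %s"
                % (cls.__name__, missing))
        return super().__new__(cls)


# Hypothetical usage: Shape is abstract, Square is concrete.
class Shape(Abstract):
    @abstractmethod
    def area(self):
        return 0


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        # The abstract method remains callable through super().
        return self.side ** 2 + super().area()
```

Here ``Shape()`` raises ``AbstractInstantiationError``, while
``Square(3)`` is created normally. Note that abstract-ness is decided
per class: ``Square`` derives from ``Abstract`` but is not abstract,
because its only abstract method has been overridden.
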
**Open issue:** perhaps ``abstractmethod`` and
``AbstractInstantiationError`` should become built-ins, ``Abstract``'s
functionality should be subsumed by ``object``, and
``AbstractClass``'s functionality should be merged into ``type``.

ABCs for Containers and Iterators
---------------------------------

The collections module will define ABCs necessary and sufficient to
work with sets, mappings, sequences, and some helper types such as
iterators and dictionary views.

The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
``__iter__`` returns an empty iterator. In general, the abstract
methods represent the behavior of an empty container of the indicated
type.

Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in
Python 2 has to be implemented anew by each iterator class).

No ABCs override ``__init__``, ``__new__``, ``__str__`` or
``__repr__``.

XXX define how set, frozenset, list, tuple, dict, bytes and str derive
from these.

One Trick Ponies
''''''''''''''''

These abstract classes represent single methods like ``__iter__`` or
``__len__``.

``Hashable``
    The base class for classes defining ``__hash__``. Its abstract
    ``__hash__`` method always returns 0, which is a valid (albeit
    inefficient) implementation. Note: being an instance of this
    class does not imply that an object is immutable; e.g. a tuple
    containing a list as a member is not immutable; its ``__hash__``
    method raises an exception.

``Iterable``
    The base class for classes defining ``__iter__``. Its abstract
    ``__iter__`` method returns an empty iterator.

``Iterator``
    The base class for classes defining ``__next__``. This derives
    from ``Iterable``. Its abstract ``__next__`` method raises
    ``StopIteration``. Its ``__iter__`` method returns ``self``, and
    is *not* abstract. (Note: this assumes PEP 3114 is implemented.)

``Finite``
    The base class for classes defining ``__len__``. Its abstract
    ``__len__`` method returns 0. Any ``__len__`` method should
    return an ``Integer`` (see "Numbers" below) >= 0. If class ``C``
    derives from ``Finite`` as well as from ``Iterable``, the
    invariant ``sum(1 for x in o) == len(o)`` should hold for any
    instance ``o`` of ``C``.

``Container``
    The base class for classes defining ``__contains__``. Its
    abstract ``__contains__`` method returns ``False``. Note:
    strictly speaking, there are three variants of this method's
    semantics. The first one is for sets and mappings, which is fast:
    O(1) or O(log N). The second one is for membership checking on
    sequences, which is slow: O(N). The third one is for subsequence
    checking on (character or byte) strings, which is also slow: O(N).
    Would it make sense to distinguish these? The signature of the
    third variant is different, since it takes a sequence (typically
    of the same type as the method's target) instead of an element.
    For now, I'm using the same type for all three.

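As an illustration of how these one-method classes fit together, here
is a rough sketch of three of them. Since the proposed ``Abstract``
base class does not exist yet, the sketch assumes the ``ABC`` and
``abstractmethod`` names from the ``abc`` module of later Python 3
releases as a stand-in; ``Countdown`` and ``Pair`` are hypothetical
example classes.

```python
from abc import ABC, abstractmethod


class Iterable(ABC):
    @abstractmethod
    def __iter__(self):
        # Behaves like an empty container: an empty iterator.
        return iter(())


class Iterator(Iterable):
    @abstractmethod
    def __next__(self):
        # Behaves like an exhausted iterator.
        raise StopIteration

    def __iter__(self):
        # Concrete: every iterator is its own iterator.
        return self


class Finite(ABC):
    @abstractmethod
    def __len__(self):
        return 0


class Countdown(Iterator):
    """Hypothetical iterator yielding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.remaining = start

    def __next__(self):
        if self.remaining <= 0:
            # Use the abstract method as the end of the super chain.
            return super().__next__()
        self.remaining -= 1
        return self.remaining + 1


class Pair(Iterable, Finite):
    """Hypothetical finite iterable holding exactly two items."""

    def __init__(self, first, second):
        self.items = (first, second)

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        return len(self.items)
```

``Countdown`` only has to supply ``__next__``; it inherits the
concrete ``__iter__``. ``Pair`` satisfies the ``Finite`` plus
``Iterable`` invariant: ``sum(1 for x in p) == len(p)``.
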
Sets
''''

These abstract classes represent various stages of "set-ness".

``Set``
    This is a finite, iterable container, i.e. a subclass of
    ``Finite``, ``Iterable`` and ``Container``. Not every subclass of
    those three classes is a set though! Sets have the additional
    property (though it is not expressed in code) that each element
    occurs only once (as can be determined by iteration), and in
    addition sets implement rich comparisons defined as
    subset/superset tests.

    Sets with different implementations can be compared safely,
    efficiently and correctly. Because ``Set`` derives from
    ``Finite``, ``__eq__`` takes a shortcut and returns ``False``
    immediately if two sets of unequal length are compared.
    Similarly, ``__le__`` returns ``False`` immediately if the first
    set has more members than the second set. Note that set inclusion
    implements only a partial ordering; e.g. {1, 2} and {1, 3} are not
    ordered (all three of ``<``, ``==`` and ``>`` return ``False`` for
    these arguments). Sets cannot be ordered relative to mappings or
    sequences, but they can be compared for equality (and then they
    always compare unequal).

    XXX Should we also implement the ``issubset`` and ``issuperset``
    methods found on the set type in Python 2 (which are apparently
    just aliases for ``__le__`` and ``__ge__``)?

    XXX Should this class also implement union, intersection,
    symmetric and asymmetric difference and/or the corresponding
    operators? The problem with those (unlike the comparison
    operators) is what should be the type of the return value. I'm
    tentatively leaving these out -- if you need them, you can test
    for a ``Set`` instance that implements e.g. ``__and__``. Some
    alternatives: make these abstract methods (even though the
    semantics apart from the type are well-defined); or make them
    concrete methods that return a specific concrete set type; or make
    them concrete methods that assume the class constructor always
    accepts an iterable of elements; or add a new class method that
    accepts an iterable of elements and that creates a new instance.
    (I originally considered a "view" alternative, but the problem is
    that computing ``len(a&b)`` requires iterating over ``a`` or
    ``b``, and that pretty much kills the idea.)

``HashableSet``
    This is a subclass of both ``Set`` and ``Hashable``. It
    implements a concrete ``__hash__`` method that subclasses should
    not override; or if they do, the subclass should compute the same
    hash value. This is so that sets with different implementations
    still hash to the same value, so they can be used interchangeably
    as dictionary keys. (A similar constraint exists on the hash
    values for different types of numbers and strings.)

``MutableSet``
    This is a subclass of ``Set`` implementing additional operations
    to add and remove elements. The supported methods have the
    semantics known from the ``set`` type in Python 2:

    ``.add(x)``
        Abstract method that adds the element ``x``, if it isn't
        already in the set.

    ``.remove(x)``
        Abstract method that removes the element ``x``; raises
        ``KeyError`` if ``x`` is not in the set.

    ``.discard(x)``
        Concrete method that removes the element ``x`` if it is
        a member of the set; implemented using ``__contains__``
        and ``remove``.

    ``.clear()``
        Abstract method that empties the set. (Making this concrete
        would just add a slow, cumbersome default implementation.)

    XXX Should we support all the operations implemented by the Python
    2 ``set`` type? I.e. union, update, __or__, __ror__, __ior__,
    intersection, intersection_update, __and__, __rand__, __iand__,
    difference, difference_update, __xor__, __rxor__, __ixor__,
    symmetric_difference, symmetric_difference_update, __sub__,
    __rsub__, __isub__. Note that in Python 2, ``a.update(b)`` is not
    exactly the same as ``a |= b``, since ``update()`` takes any
    iterable for an argument, while ``|=`` requires another set;
    similar for the other operators.

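The comparison shortcuts of ``Set`` and the concrete ``discard`` of
``MutableSet`` could look roughly like the sketch below. It again
assumes ``ABC`` and ``abstractmethod`` from the ``abc`` module of
later Python 3 releases as a stand-in for the proposed framework,
declares the three inherited one-method protocols directly on ``Set``
to stay self-contained, and uses a hypothetical list-backed
``ListSet`` as the concrete class.

```python
from abc import ABC, abstractmethod


class Set(ABC):
    @abstractmethod
    def __len__(self):
        return 0

    @abstractmethod
    def __iter__(self):
        return iter(())

    @abstractmethod
    def __contains__(self, x):
        return False

    def __le__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        # Shortcut: a bigger set cannot be a subset of a smaller one.
        if len(self) > len(other):
            return False
        return all(x in other for x in self)

    def __lt__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        return len(self) < len(other) and self.__le__(other)

    def __eq__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        # Shortcut: sets of unequal length are never equal.
        if len(self) != len(other):
            return False
        return self.__le__(other)


class MutableSet(Set):
    @abstractmethod
    def add(self, x):
        pass

    @abstractmethod
    def remove(self, x):
        raise KeyError(x)

    @abstractmethod
    def clear(self):
        pass

    def discard(self, x):
        # Concrete: built only from __contains__ and remove.
        if x in self:
            self.remove(x)


class ListSet(MutableSet):
    """Hypothetical set backed by a list (slow, but short)."""

    def __init__(self, elements=()):
        self._items = []
        for x in elements:
            self.add(x)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, x):
        return x in self._items

    def add(self, x):
        if x not in self._items:
            self._items.append(x)

    def remove(self, x):
        if x not in self._items:
            raise KeyError(x)
        self._items.remove(x)

    def clear(self):
        del self._items[:]
```

With ``a = ListSet([1, 2])`` and ``b = ListSet([1, 3])``, all three of
``a < b``, ``a == b`` and ``a > b`` are false, which shows the partial
ordering. The reflected operators ``>`` and ``>=`` work without extra
code because Python falls back to the other operand's ``__lt__`` and
``__le__``.
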
Mappings
''''''''

These abstract classes represent various stages of mapping-ness.

XXX Do we need BasicMapping and IterableMapping? Perhaps we should
just start with Mapping.

``BasicMapping``
    A subclass of ``Container`` defining the following methods:

    ``.__getitem__(key)``
        Abstract method that returns the value corresponding to
        ``key``, or raises ``KeyError``. The implementation always
        raises ``KeyError``.

    ``.get(key, default=None)``
        Concrete method returning ``self[key]`` if this does not raise
        ``KeyError``, and the ``default`` value if it does.

    ``.__contains__(key)``
        Concrete method returning ``True`` if ``self[key]`` does not
        raise ``KeyError``, and ``False`` if it does.

``IterableMapping``
    A subclass of ``BasicMapping`` and ``Iterable``. It defines no
    new methods. Iterating over such an object should return all the
    valid keys (i.e. those keys for which ``.__getitem__()`` returns a
    value), once each, and nothing else. It is possible that the
    iteration never ends.

``Mapping``
    A subclass of ``IterableMapping`` and ``Finite``. It defines
    concrete methods ``__eq__``, ``keys``, ``items``, ``values``. The
    length of such an object should be equal to the number of elements
    returned by iterating over the object until the end of the
    iterator is reached. Two mappings, even with different
    implementations, can be compared for equality, and are considered
    equal if and only if their items compare equal when converted to
    sets. The ``keys``, ``items`` and ``values`` methods return
    views; ``keys`` and ``items`` return ``Set`` views, ``values``
    returns a ``Container`` view. The following invariant should
    hold: ``m.items() == set(zip(m.keys(), m.values()))``.

``HashableMapping``
    A subclass of ``Mapping`` and ``Hashable``. The values should be
    instances of ``Hashable``. The concrete ``__hash__`` method
    should be equal to ``hash(m.items())``.

``MutableMapping``
    A subclass of ``Mapping`` that also implements some standard
    mutating methods. At least ``__setitem__``, ``__delitem__``,
    ``clear``, ``update``. XXX Also pop, popitem, setdefault?

XXX Should probably say something about mapping view types, too.

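The two concrete ``BasicMapping`` methods are simple permutations of
``__getitem__``, as the following sketch shows. It assumes ``ABC``
and ``abstractmethod`` from the ``abc`` module of later Python 3
releases as a stand-in for the proposed framework; ``Squares`` is a
hypothetical mapping with infinitely many keys, and the last lines
check the ``Mapping`` invariant against a built-in Python 3 ``dict``.

```python
from abc import ABC, abstractmethod


class BasicMapping(ABC):
    @abstractmethod
    def __getitem__(self, key):
        # Behaves like an empty mapping: no key is ever present.
        raise KeyError(key)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True


class Squares(BasicMapping):
    """Hypothetical mapping of each non-negative int n to n * n."""

    def __getitem__(self, key):
        if isinstance(key, int) and key >= 0:
            return key * key
        # Fall back on the abstract method, which raises KeyError.
        return super().__getitem__(key)


squares = Squares()
nine = squares.get(3)
fallback = squares.get(-1, "no such key")

# The Mapping invariant, checked on a built-in dict.
m = {"a": 1, "b": 2}
invariant_holds = m.items() == set(zip(m.keys(), m.values()))
```

``Squares`` is neither iterable nor finite, which is the kind of
object the ``BasicMapping`` level is meant to accommodate.
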
Sequences
'''''''''

These abstract classes represent various stages of sequence-ness.

``Sequence``
    A subclass of ``Iterable``, ``Finite``, ``Container``. It
    defines a new abstract method ``__getitem__`` that has a
    complicated signature: when called with an integer, it returns an
    element of the sequence or raises ``IndexError``; when called with
    a ``slice`` object, it returns another ``Sequence``. The concrete
    ``__iter__`` method iterates over the elements using
    ``__getitem__`` with integer arguments 0, 1, and so on, until
    ``IndexError`` is raised. The length should be equal to the
    number of values returned by the iterator.

    XXX Other candidate methods, which can all have default concrete
    implementations that only depend on ``__len__`` and
    ``__getitem__`` with an integer argument: __reversed__, index,
    count, __add__, __mul__, __eq__, __lt__, __le__.

``HashableSequence``
    A subclass of ``Sequence`` and ``Hashable``. The concrete
    ``__hash__`` method should implement the hashing algorithm used
    by tuples in Python 2.

``MutableSequence``
    A subclass of ``Sequence`` adding some standard mutating methods.
    Abstract mutating methods: ``__setitem__`` (for integer indices as
    well as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
    ``reverse``. Concrete mutating methods: ``extend``, ``pop``,
    ``remove``. Note: this does not define ``sort()`` -- that is only
    required to exist on genuine ``list`` instances.

    XXX What about ``+=`` and ``*=``?

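The concrete ``Sequence.__iter__`` and two of the concrete
``MutableSequence`` methods might be sketched as below. This assumes
``ABC`` and ``abstractmethod`` from the ``abc`` module of later
Python 3 releases as a stand-in for the proposed framework, and
assumes that ``pop`` and ``remove`` follow the semantics of the
built-in ``list``. Only the abstract methods needed for the example
are declared; slices and the other mutating methods are left out, and
``ListBacked`` is a hypothetical concrete class.

```python
from abc import ABC, abstractmethod


class Sequence(ABC):
    @abstractmethod
    def __len__(self):
        return 0

    @abstractmethod
    def __getitem__(self, index):
        # Behaves like an empty sequence: every index is out of range.
        raise IndexError(index)

    def __iter__(self):
        # Concrete: index with 0, 1, ... until IndexError is raised.
        index = 0
        while True:
            try:
                value = self[index]
            except IndexError:
                return
            yield value
            index += 1


class MutableSequence(Sequence):
    @abstractmethod
    def __delitem__(self, index):
        raise IndexError(index)

    def pop(self, index=-1):
        # Concrete: fetch the element, delete it, hand it back.
        value = self[index]
        del self[index]
        return value

    def remove(self, value):
        # Concrete: delete the first element equal to value.
        for index, item in enumerate(self):
            if item == value:
                del self[index]
                return
        raise ValueError(value)


class ListBacked(MutableSequence):
    """Hypothetical mutable sequence that delegates to a list."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __delitem__(self, index):
        del self._items[index]
```

``ListBacked`` never defines ``__iter__``, ``pop`` or ``remove``
itself; all three come from the abstract classes and rely only on
``__getitem__`` and ``__delitem__``.
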
ABCs for Numbers
----------------

XXX define: Number, Complex, Real, Rational, Integer. Do we have a
use case for Cardinal (Integer >= 0)? Do we need Indexable (converts
to Integer using __index__)?

Guidelines for Writing ABCs
---------------------------

XXX Use @abstractmethod and Abstract base class; define abstract
methods that could be useful as an end point when called via a super
chain; define concrete methods that are very simple permutations of
abstract methods (e.g. Mapping.get); keep abstract classes small, one
per use case instead of one per concept.

References
==========

.. [1] An Introduction to ABC's, by Talin
   (http://mail.python.org/pipermail/python-3000/2007-April/006614.html)

.. [2] Incomplete implementation prototype, by GvR
   (http://svn.python.org/view/sandbox/trunk/abc/)

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: