python-peps/pep-3119.txt

507 lines
21 KiB
Plaintext
Raw Normal View History

PEP: 3119
Title: Introducing Abstract Base Classes
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum <guido@python.org>, Talin <talin@acm.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2007
Post-History: Not yet posted
Abstract
========
**THIS IS A WORK IN PROGRESS! DON'T REVIEW YET!**
2007-04-18 13:32:21 -04:00
This is a proposal to add Abstract Base Class (ABC) support to Python
3000. It proposes:
* An "ABC support framework" which defines a metaclass, a base class,
a decorator, and some helpers that make it easy to define ABCs.
This will be added as a new library module named "abc".
* Specific ABCs for containers and iterators, to be added to the
collections module.
* Specific ABCs for numbers, to be added to a new module, yet to be
named.
* Guidelines for writing additional ABCs.
Much of the thinking that went into the proposal is not about the
specific mechanism of ABCs, as contrasted with Interfaces or Generic
Functions (GFs), but about clarifying philosophical issues like "what
makes a set", "what makes a mapping" and "what makes a sequence".
Rationale
=========
2007-04-18 13:43:05 -04:00
In the domain of object-oriented programming, the usage patterns for
interacting with an object can be divided into two basic categories,
which are 'invocation' and 'inspection'.
2007-04-18 13:43:05 -04:00
Invocation means interacting with an object by invoking its methods.
Usually this is combined with polymorphism, so that invoking a given
method may run different code depending on the type of an object.
2007-04-18 13:43:05 -04:00
Inspection means the ability for external code (outside of the object's
methods) to examine the type or properties of that object, and make
decisions on how to treat that object based on that information.
2007-04-18 13:43:05 -04:00
Both usage patterns serve the same general end, which is to be able to
support the processing of diverse and potentially novel objects in a
uniform way, but at the same time allowing processing decisions to be
customized for each different type of object.
2007-04-18 13:43:05 -04:00
In classical OOP theory, invocation is the preferred usage pattern, and
inspection is actively discouraged, being considered a relic of an
earlier, procedural programming style. However, in practice this view is
simply too dogmatic and inflexible, and leads to a kind of design
rigidity that is very much at odds with the dynamic nature of a language
like Python.
2007-04-18 13:43:05 -04:00
In particular, there is often a need to process objects in a way that
wasn't anticipated by the creator of the object class. It is not always
the best solution to build in to every object methods that satisfy the
needs of every possible user of that object. Moreover, there are many
powerful dispatch philosophies that are in direct contrast to the
classic OOP requirement of behavior being strictly encapsulated within
an object, examples being rule or pattern-match driven logic.
2007-04-18 13:43:05 -04:00
On the the other hand, one of the criticisms of inspection by classic
OOP theorists is the lack of formalisms and the ad hoc nature of what is
being inspected. In a language such as Python, in which almost any
aspect of an object can be reflected and directly accessed by external
code, there are many different ways to test whether an object conforms
to a particular protocol or not. For example, if asking 'is this object
a mutable sequence container?', one can look for a base class of 'list',
or one can look for a method named '__getitem__'. But note that although
these tests may seem obvious, neither of them are correct, as one
generates false negatives, and the other false positives.
2007-04-18 13:43:05 -04:00
The generally agreed-upon remedy is to standardize the tests, and group
them into a formal arrangement. This is most easily done by associating
with each class a set of standard testable properties, either via the
inheritance mechanism or some other means. Each test carries with it a
set of promises: it contains a promise about the general behavior of the
class, and a promise as to what other class methods will be available.
2007-04-18 13:43:05 -04:00
This PEP proposes a particular strategy for organizing these tests known
as Abstract Base Classes, or ABC. ABCs are simply Python classes that
are added into an object's inheritance tree to signal certain features
of that object to an external inspector. Tests are done using
isinstance(), and the presence of a particular ABC means that the test
has passed.
2007-04-18 13:43:05 -04:00
Like all other things in Python, these promises are in the nature of a
gentlemen's agreement - which means that the language does not attempt
to enforce that these promises are kept.
Specification
=============
The specification follows the four categories listed in the abstract:
* An "ABC support framework" which defines a metaclass, a base class,
a decorator, and some helpers that make it easy to define ABCs.
This will be added as a new library module named "abc", or (perhaps)
made built-in functionality.
* Specific ABCs for containers and iterators, to be added to the
collections module.
* Specific ABCs for numbers, to be added to a new module that is yet
to be named.
* Guidelines for writing additional ABCs.
ABC Support Framework
---------------------
The abc module will define some utilities that help defining ABCs.
These are:
``@abstractmethod``
A decorator used to declare abstract methods. This should only be
used with classes whose class is derived from ``Abstract`` below.
A class containing at least one method declared with this
decorator that hasn't been overridden yet cannot be instantiated.
Such a methods may be called from the overriding method in the
subclass (using ``super`` or direct invocation).
``Abstract``
A class implementing the constraint that it or its subclasses
cannot be instantiated unless each abstract method has been
overridden. Its metaclass is ``AbstractClass``. Note: being
derived from ``Abstract`` does not make a class abstract; the
abstract-ness is decided on a per-class basis, depending on
whether all methods defined with ``@abstractmethod`` have been
overridden.
``AbstractClass``
The metaclass of Abstract (and all classes derived from it). Its
purpose is to collect the information during the class
construction stage. It derives from ``type``.
``AbstractInstantiationError``
The exception raised when attempting to instantiate an abstract
class. It derives from ``TypeError``.
A possible implementation would add an attribute
``__abstractmethod__`` to any method declared with
``@abstractmethod``, and add the names of all such abstract methods to
a class attribute named ``__abstractmethods__``. Then the
``Abstract.__new__()`` method would raise an exception if any abstract
methods exist on the class being instantiated. For details see [2]_.
**Open issue:** perhaps ``abstractmethod`` and
``AbstractInstantiationError`` should become built-ins, ``Abstract``'s
functionality should be subsumed by ``object``, and
``AbstractClass``'s functionality should be merged into ``type``.
ABCs for Containers and Iterators
---------------------------------
The collections module will define ABCs necessary and sufficient to
work with sets, mappings, sequences, and some helper types such as
iterators and dictionary views.
The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
``__iter__`` returns an empty iterator. In general, the abstract
methods represent the behavior of an empty container of the indicated
type.
Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in
Python 2 has to be implemented anew by each iterator class).
No ABCs override ``__init__``, ``__new__``, ``__str__`` or
``__repr__``.
One Trick Ponies
''''''''''''''''
These abstract classes represent single methods like ``__iter__`` or
``__len__``. The ``Iterator`` class is included as well, even though
it has two prescribed methods.
``Hashable``
The base class for classes defining ``__hash__``. The
``__hash__`` method should return an ``Integer`` (see "Numbers"
below). The abstract ``__hash__`` method always returns 0, which
is a valid (albeit inefficient) implementation. **Invariant:** If
classes ``C1`` and ``C2`` both derive from ``Hashable``, the
invariant ``hash(o1) == hash(o2)`` must imply ``o1 == o2`` for all
instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``.
Note: being an instance of this class does not imply that an
object is immutable; e.g. a tuple containing a list as a member is
not immutable; its ``__hash__`` method raises an exception.
``Iterable``
The base class for classes defining ``__iter__``. The
``__iter__`` method should always return an instance of
``Iterator`` (see below). The abstract ``__iter__`` method
returns an empty iterator.
``Iterator``
The base class for classes defining ``__next__``. This derives
from ``Iterable``. The abstract ``__next__`` method raises
``StopIteration``. The concrete ``__iter__`` method returns
``self``. (Note: this assumes PEP 3114 is implemented.)
``Finite``
The base class for classes defining ``__len__``. The ``__len__``
method should return an ``Integer`` (see "Numbers" below) >= 0.
The abstract ``__len__`` method returns 0. **Invariant:** If a
class ``C`` derives from ``Finite`` as well as from ``Iterable``,
the invariant ``sum(1 for x in o) == len(o)`` should hold for any
instance ``o`` of ``C``.
``Container``
The base class for classes defining ``__contains__``. The
``__contains__` method should return a ``bool``.` The abstract
``__contains__`` method returns ``False``. **Invariant:** If a
class ``C`` derives from ``Container`` as well as from
``Iterable``, ``(x in o for x in o)`` should be a generator
yielding only True values for any instance ``o`` of ``C``.
Note: strictly speaking, there are three variants of this method's
semantics. The first one is for sets and mappings, which is fast:
O(1) or O(log N). The second one is for membership checking on
sequences, which is slow: O(N). The third one is for subsequence
checking on (character or byte) strings, which is also slow: O(N).
Would it make sense to distinguish these? The signature of the
third variant is different, since it takes a sequence (typically
of the same type as the method's target) intead of an element.
For now, I'm using the same type for all three. This means that
is is possible for ``x in o`` to be True even though ``x`` is
never yielded by ``iter(o)``.
Sets
''''
These abstract classes represent various stages of "set-ness". The
most fundamental set operation is the membership test, written as ``x
in s`` and implemented by ``s.__contains__(x)``. This is already
taken care of by the `Container`` class defined above. Therefore, we
define a set as finite, iterable container for which certain
invariants from mathematical set theory hold.
The built-in type ``set`` derives from ``MutableSet``. The built-in
type ``frozenset`` derives from ``HashableSet``.
You might wonder why we require a set to be finite -- surely certain
infinite sets can be represented just fine in Python. For example,
the set of even integers could be defined like this::
class EvenIntegers(Container):
def __contains__(self, x):
return x % 2 == 0
However, such sets have rather limited practical value, and deciding
whether one such set is a subset of another would be difficult in
general without using a symbolic algebra package. So I consider this
out of the scope of a pragmatic proposal like this.
``Set``
This is a finite, iterable container, i.e. a subclass of
``Finite``, ``Iterable`` and ``Container``. Not every subset of
those three classes is a set though! Sets have the additional
invariant that each element occurs only once (as can be determined
by iteration), and in addition sets implement rich comparisons
defined as subclass/superclass tests.
Sets with different implementations can be compared safely,
efficiently and correctly. Because ``Set`` derives from
``Finite``, ``__eq__`` takes a shortcut and returns ``False``
immediately if two sets of unequal length are compared.
Similarly, ``__le__`` returns ``False`` immediately if the first
set has more members than the second set. Note that set inclusion
implements only a partial ordering; e.g. ``{1, 2}`` and ``{1, 3}``
are not ordered (all three of ``<``, ``==`` and ``>`` return
``False`` for these arguments). Sets cannot be ordered relative
to mappings or sequences, but they can be compared for equality
(and then they always compare unequal).
**Open issue:** Should we also implement the ``issubset`` and
``issuperset`` methods found on the set type in Python 2? As
these are just aliases for ``__le__`` and ``__ge__``, I'm tempted
to leave these out.
**Open issue:** Should this class also implement the operators
and/or methods that compute union, intersection, symmetric and
asymmetric difference? The problem with those (unlike the
comparison operators) is what should be the type of the return
value. I'm tentatively leaving these out -- if you need them, you
can test for a ``Set`` instance that implements e.g. ``__and__``.
Some alternatives: make these abstract methods (even though the
semantics apart from the type are well-defined); or make them
concrete methods that return a specific concrete set type; or make
them concrete methods that assume the class constructor always
accepts an iterable of elements; or add a new class method that
accepts an iterable of elements and that creates a new instance.
(I originally considered a "view" alternative, but the problem is
that computing ``len(a&b)`` requires iterating over ``a`` or
``b``, and that pretty much kills the idea.)
``HashableSet``
This is a subclass of both ``Set`` and ``Hashable``. It
implements a concrete ``__hash__`` method that subclasses should
not override; or if they do, the subclass should compute the same
hash value. This is so that sets with different implementations
still hash to the same value, so they can be used interchangeably
as dictionary keys. (A similar constraint exists on the hash
values for different types of numbers and strings.)
``MutableSet``
This is a subclass of ``Set`` implementing additional operations
to add and remove elements. The supported methods have the
semantics known from the ``set`` type in Python 2:
``.add(x)``
Abstract method that adds the element ``x``, if it isn't
already in the set.
``.remove(x)``
Abstract method that removes the element ``x``; raises
``KeyError`` if ``x`` is not in the set.
``.discard(x)``
Concrete method that removes the element ``x`` if it is
a member of the set; implemented using ``__contains__``
and ``remove``.
``.clear()``
Abstract method that empties the set. (Making this concrete
would just add a slow, cumbersome default implementation.)
**Open issue:** Should we support all the operations implemented
by the Python 2 ``set`` type? I.e. union, update, __or__,
__ror__, __ior__, intersection, intersection_update, __and__,
__rand__, __iand__, difference, difference_update, __xor__,
__rxor__, __ixor__, symmetric_difference,
symmetric_difference_update, __sub__, __rsub__, __isub__. Note
that in Python 2, ``a.update(b)`` is not exactly the same as ``a
|= b``, since ``update()`` takes any iterable for an argument,
while ``|=`` requires another set; similar for the other
operators.
Mappings
''''''''
These abstract classes represent various stages of mapping-ness.
The built-in type ``dict`` derives from ``MutableMapping``.
XXX Do we need BasicMapping and IterableMapping? Perhaps we should
just start with Mapping.
``BasicMapping``
A subclass of ``Container`` defining the following methods:
``.__getitem__(key)``
Abstract method that returns the value corresponding to
``key``, or raises ``KeyError``. The implementation always
raises ``KeyError``.
``.get(key, default=None)``
Concrete method returning ``self[key]`` if this does not raise
``KeyError``, and the ``default`` value if it does.
``.__contains__()``
Concrete method returning ``True`` if ``self[key]`` does not
raise ``KeyError``, and ``False`` if it does.
``IterableMapping``
A subclass of ``BasicMapping`` and ``Iterable``. It defines no
new methods. Iterating over such an object should return all the
valid keys (i.e. those keys for which ``.__getitem__()`` returns a
value), once each, and nothing else. It is possible that the
iteration never ends.
``Mapping``
A subclass of ``IterableMapping`` and ``Finite``. It defines
concrete methods ``__eq__``, ``keys``, ``items``, ``values``. The
lengh of such an object should equal to the number of elements
returned by iterating over the object until the end of the
iterator is reached. Two mappings, even with different
implementations, can be compared for equality, and are considered
equal if and only iff their items compare equal when converted to
sets. The ``keys``, ``items`` and ``values`` methods return
views; ``keys`` and ``items`` return ``Set`` views, ``values``
returns a ``Container`` view. The following invariant should
hold: m.items() == set(zip(m.keys(), m.values())).
``HashableMapping``
A subclass of ``Mapping`` and ``Hashable``. The values should be
instances of ``Hashable``. The concrete ``__hash__`` method
should be equal to ``hash(m.items())``.
``MutableMapping``
A subclass of ``Mapping`` that also implements some standard
mutating methods. At least ``__setitem__``, ``__delitem__``,
``clear``, ``update``. XXX Also pop, popitem, setdefault?
XXX Should probably say something about mapping view types, too.
Sequences
'''''''''
These abstract classes represent various stages of sequence-ness.
The built-in ``list`` and ``bytes`` types derive from
``MutableSequence``. The built-in ``tuple`` and ``str`` types derive
from ``HashableSequence``.
``Sequence``
A subclass of ``Iterable``, ``Finite``, ``Container``. It
defines a new abstract method ``__getitem__`` that has a
complicated signature: when called with an integer, it returns an
element of the sequence or raises ``IndexError``; when called with
a ``slice`` object, it returns another ``Sequence``. The concrete
``__iter__`` method iterates over the elements using
``__getitem__`` with integer arguments 0, 1, and so on, until
``IndexError`` is raised. The length should be equal to the
number of values returned by the iterator.
XXX Other candidate methods, which can all have default concrete
implementations that only depend on ``__len__`` and
``__getitem__`` with an integer argument: __reversed__, index,
count, __add__, __mul__, __eq__, __lt__, __le__.
``HashableSequence``
A subclass of ``Sequence`` and ``Hashable``. The concrete
``__hash__`` method should implements the hashing algorithms used
by tuples in Python 2.
``MutableSequence``
A subclass of ``Sequence`` adding some standard mutating methods.
Abstract mutating methods: ``__setitem__`` (for integer indices as
well as slices), ``__delitem__`` (ditto), ``insert``, ``add``,
``reverse``. Concrete mutating methods: ``extend``, ``pop``,
``remove``. Note: this does not define ``sort()`` -- that is only
required to exist on genuine ``list`` instances.
2007-04-18 20:33:02 -04:00
XXX What about ``+=`` and ``*=``?
ABCs for Numbers
----------------
XXX define: Number, Complex, Real, Rational, Integer. Do we have a
use case for Cardinal (Integer >= 0)? Do we need Indexable (converts
to Integer using __index__).
Guidelines for Writing ABCs
---------------------------
XXX Use @abstractmethod and Abstract base class; define abstract
methods that could be useful as an end point when called via a super
chain; define concrete methods that are very simple permutations of
abstract methods (e.g. Mapping.get); keep abstract classes small, one
per use case instead of one per concept.
References
==========
.. [1] An Introduction to ABC's, by Talin
(http://mail.python.org/pipermail/python-3000/2007-April/006614.html)
.. [2] Incomplete implementation prototype, by GvR
(http://svn.python.org/view/sandbox/trunk/abc/)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: