python-peps/pep-3119.txt

701 lines
29 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

PEP: 3119
Title: Introducing Abstract Base Classes
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum <guido@python.org>, Talin <talin@acm.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2007
Post-History: Not yet posted
Abstract
========
This is a proposal to add Abstract Base Class (ABC) support to Python
3000. It proposes:
* An "ABC support framework" which defines a built-in decorator that
can be used to define abstract methods. A class containing an
abstract method that isn't overridden cannot be instantiated.
* Specific ABCs for containers and iterators, to be added to the
collections module.
Much of the thinking that went into the proposal is not about the
specific mechanism of ABCs, as contrasted with Interfaces or Generic
Functions (GFs), but about clarifying philosophical issues like "what
makes a set", "what makes a mapping" and "what makes a sequence".
Acknowledgements
----------------
Talin wrote the Rationale below [1]_ as well as most of the section on
ABCs vs. Interfaces. For that alone he deserves co-authorship. The
rest of the PEP uses "I" referring to the first author.
Rationale
=========
In the domain of object-oriented programming, the usage patterns for
interacting with an object can be divided into two basic categories,
which are 'invocation' and 'inspection'.
Invocation means interacting with an object by invoking its methods.
Usually this is combined with polymorphism, so that invoking a given
method may run different code depending on the type of an object.
Inspection means the ability for external code (outside of the
object's methods) to examine the type or properties of that object,
and make decisions on how to treat that object based on that
information.
Both usage patterns serve the same general end, which is to be able to
support the processing of diverse and potentially novel objects in a
uniform way, but at the same time allowing processing decisions to be
customized for each different type of object.
In classical OOP theory, invocation is the preferred usage pattern,
and inspection is actively discouraged, being considered a relic of an
earlier, procedural programming style. However, in practice this view
is simply too dogmatic and inflexible, and leads to a kind of design
rigidity that is very much at odds with the dynamic nature of a
language like Python.
In particular, there is often a need to process objects in a way that
wasn't anticipated by the creator of the object class. It is not
always the best solution to build in to every object methods that
satisfy the needs of every possible user of that object. Moreover,
there are many powerful dispatch philosophies that are in direct
contrast to the classic OOP requirement of behavior being strictly
encapsulated within an object, examples being rule or pattern-match
driven logic.
On the the other hand, one of the criticisms of inspection by classic
OOP theorists is the lack of formalisms and the ad hoc nature of what
is being inspected. In a language such as Python, in which almost any
aspect of an object can be reflected and directly accessed by external
code, there are many different ways to test whether an object conforms
to a particular protocol or not. For example, if asking 'is this
object a mutable sequence container?', one can look for a base class
of 'list', or one can look for a method named '__getitem__'. But note
that although these tests may seem obvious, neither of them are
correct, as one generates false negatives, and the other false
positives.
The generally agreed-upon remedy is to standardize the tests, and
group them into a formal arrangement. This is most easily done by
associating with each class a set of standard testable properties,
either via the inheritance mechanism or some other means. Each test
carries with it a set of promises: it contains a promise about the
general behavior of the class, and a promise as to what other class
methods will be available.
This PEP proposes a particular strategy for organizing these tests
known as Abstract Base Classes, or ABC. ABCs are simply Python
classes that are added into an object's inheritance tree to signal
certain features of that object to an external inspector. Tests are
done using isinstance(), and the presence of a particular ABC means
that the test has passed.
In addition, the ABCs define a minimal set of methods that establish
the characteristic behavior of the type. Code that discriminates
objects based on their ABC type can trust that those methods will
always be present. Each of these methods are accompanied by an
generalized abstract semantic definition that is described in the
documentation for the ABC. These standard semantic definitions are
not enforced, but are strongly recommended.
Like all other things in Python, these promises are in the nature of a
gentlemen's agreement, which in this case means that while the
language does enforce some of the promises made in the ABC, it is up
to the implementer of the concrete class to insure that the remaining
ones are kept.
Specification
=============
The specification follows the categories listed in the abstract:
* An "ABC support framework" which defines a built-in decorator that
make it easy to define ABCs, and mechanisms to support it.
* Specific ABCs for containers and iterators, to be added to the
collections module.
ABC Support Framework
---------------------
We define a new built-in decorator, ``@abstractmethod``, to be used to
declare abstract methods. A class containing at least one method
declared with this decorator that hasn't been overridden yet cannot be
instantiated. Such a methods may be called from the overriding method
in the subclass (using ``super`` or direct invocation). For example::
class A:
@abstractmethod
def foo(self): pass
A() # raises TypeError
class B(A):
pass
B() # raises TypeError
class C(A):
def foo(self): print(42)
C() # works
**Implementation:** The ``@abstractmethod`` decorator sets the
function attribute ``__isabstractmethod__`` to the value ``True``.
The ``type.__new__`` method computes the type attribute
``__abstractmethods__`` as the set of all method names that have an
``__isabstractmethod__`` attribute whose value is true. It does this
by combining the ``__abstractmethods__`` attributes of the base
classes, adding the names of all methods in the new class dict that
have a true ``__isabstractmethod__`` attribute, and removing the names
of all methods in the new class dict that don't have a true
``__isabstractmethod__`` attribute. If the resulting
``__abstractmethods__`` set is non-empty, the class is considered
abstract, and attempts to instantiate it will raise ``TypeError``.
(CPython can uses an internal flag ``Py_TPFLAGS_ABSTRACT`` to speed up
this check [6]_.)
**Discussion:** Unlike C++ or Java, abstract methods as defined here
may have an implementation. This implementation can be called via the
``super`` mechanism from the class that overrides it. This could be
useful as an end-point for a super-call in framework using a
cooperative multiple-inheritance [7]_, [8]_.
ABCs for Containers and Iterators
---------------------------------
The ``collections`` module will define ABCs necessary and sufficient
to work with sets, mappings, sequences, and some helper types such as
iterators and dictionary views.
The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
``__iter__`` returns an empty iterator. In general, the abstract
methods represent the behavior of an empty container of the indicated
type.
Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in
Python 2 has to be implemented anew by each iterator class).
No ABCs override ``__init__``, ``__new__``, ``__str__`` or
``__repr__``. Defining a standard constructor signature would
unnecessarily constrain custom container types, for example Patricia
trees or gdbm files. Defining a specific string representation for a
collection is similarly left up to individual implementations.
Ordering ABCs
'''''''''''''
These ABCs are closer to ``object`` in the ABC hierarchy.
``PartiallyOrdered``
This ABC defines the 4 inequality operations ``<``, ``<=``, ``>=``,
``>``. (Note that ``==`` and ``!=`` are defined by ``object``.)
Classes deriving from this ABC should satisfy weak invariants such
as ``a < b < c`` implies ``a < c`` but don't require that for any
two instances ``x`` and ``y`` exactly one of ``x < y``, ``x == y``
or ``x >= y`` apply.
``TotallyOrdered``
This ABC derives from ``PartiallyOrdered``. It adds no new
operations but implies a promise of stronger invariants. **Open
issues:** Should ``float`` derive from ``TotallyOrdered`` even
though for ``NaN`` this isn't strictly correct?
**Open issues:** Where should these live? The ``collections`` module
doesn't seem right.
One Trick Ponies
''''''''''''''''
These abstract classes represent single methods like ``__iter__`` or
``__len__``.
``Hashable``
The base class for classes defining ``__hash__``. The
``__hash__`` method should return an ``Integer`` (see "Numbers"
below). The abstract ``__hash__`` method always returns 0, which
is a valid (albeit inefficient) implementation. **Invariant:** If
classes ``C1`` and ``C2`` both derive from ``Hashable``, the
condition ``o1 == o2`` must imply ``hash(o1) == hash(o2)`` for all
instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``.
IOW, two objects shouldn't compare equal but have different hash
values.
Another constraint is that hashable objects, once created, should
never change their value (as compared by ``==``) or their hash
value. If a class cannot guarantee this, it should not derive
from ``Hashable``; if it cannot guarantee this for certain
instances only, ``__hash__`` for those instances should raise a
``TypeError`` exception.
Note: being an instance of this class does not imply that an
object is immutable; e.g. a tuple containing a list as a member is
not immutable; its ``__hash__`` method raises ``TypeError``.
``Iterable``
The base class for classes defining ``__iter__``. The
``__iter__`` method should always return an instance of
``Iterator`` (see below). The abstract ``__iter__`` method
returns an empty iterator.
``Iterator``
The base class for classes defining ``__next__``. This derives
from ``Iterable``. The abstract ``__next__`` method raises
``StopIteration``. The concrete ``__iter__`` method returns
``self``. (Note: this assumes PEP 3114 is implemented.)
``Sized``
The base class for classes defining ``__len__``. The ``__len__``
method should return an ``Integer`` (see "Numbers" below) >= 0.
The abstract ``__len__`` method returns 0. **Invariant:** If a
class ``C`` derives from ``Sized`` as well as from ``Iterable``,
the invariant ``sum(1 for x in o) == len(o)`` should hold for any
instance ``o`` of ``C``. **Open issues:** Is ``Sized`` the best
name? Proposed alternatives already tentatively rejected:
``Finite`` (nobody understood it), ``Lengthy``, ``Sizeable`` (both
too cute), ``Countable`` (the set of natural numbers is a
countable set in math), ``Enumerable`` (sounds like a sysnonym for
``Iterable``), ``Dimension``, ``Extent`` (sound like numbers to
me), ``Bounded`` (probably just as confusing as ``Fininte``).
``Container``
The base class for classes defining ``__contains__``. The
``__contains__`` method should return a ``bool``. The abstract
``__contains__`` method returns ``False``. **Invariant:** If a
class ``C`` derives from ``Container`` as well as from
``Iterable``, then ``(x in o for x in o)`` should be a generator
yielding only True values for any instance ``o`` of ``C``.
Note: strictly speaking, there are three variants of this method's
semantics. The first one is for sets and mappings, which is fast:
O(1) or O(log N). The second one is for membership checking on
sequences, which is slow: O(N). The third one is for subsequence
checking on (character or byte) strings, which is also slow: O(N).
Would it make sense to distinguish these? The signature of the
third variant is different, since it takes a sequence (typically
of the same type as the method's target) intead of an element.
For now, I'm using the same type for all three. This means that
is is possible for ``x in o`` to be True even though ``x`` is
never yielded by ``iter(o)``. A suggested name for the third form
is ``Searchable``.
Sets
''''
These abstract classes represent various stages of "set-ness". The
most fundamental set operation is the membership test, written as ``x
in s`` and implemented by ``s.__contains__(x)``. This is already
taken care of by the `Container`` class defined above. Therefore, we
define a set as finite, iterable container for which certain
invariants from mathematical set theory hold.
The built-in type ``set`` derives from ``MutableSet``. The built-in
type ``frozenset`` derives from ``HashableSet``.
You might wonder why we require a set to be finite -- surely certain
infinite sets can be represented just fine in Python. For example,
the set of even integers could be defined like this::
class EvenIntegers(Container):
def __contains__(self, x):
return x % 2 == 0
However, such sets have rather limited practical value, and deciding
whether one such set is a subset of another would be difficult in
general without using a symbolic algebra package. So I consider this
out of the scope of a pragmatic proposal like this.
``Set``
This is a finite, iterable, partially ordered container, i.e. a
subclass of ``Sized``, ``Iterable``, ``Container`` and
``PartiallyOrdered``. Not every subset of those three classes is
a set though! Sets have the additional invariant that each
element occurs only once (as can be determined by iteration), and
in addition sets define concrete operators that implement the
inequality operations as subclass/superclass tests.
Sets with different implementations can be compared safely,
efficiently and correctly. Because ``Set`` derives from
``Sized``, ``__eq__`` takes a shortcut and returns ``False``
immediately if two sets of unequal length are compared.
Similarly, ``__le__`` returns ``False`` immediately if the first
set has more members than the second set. Note that set inclusion
implements only a partial ordering; e.g. ``{1, 2}`` and ``{1, 3}``
are not ordered (all three of ``<``, ``==`` and ``>`` return
``False`` for these arguments). Sets cannot be ordered relative
to mappings or sequences, but they can be compared for equality
(and then they always compare unequal).
Note: the ``issubset`` and ``issuperset`` methods found on the set
type in Python 2 are not supported, as these are mostly just
aliases for ``__le__`` and ``__ge__``.
**Open issues:** Spell out the invariants and method definitions.
``ComposableSet``
This is a subclass of ``Set`` that defines abstract operators to
compute union, intersection, symmetric and asymmetric difference,
respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.
These operators should return instances of ``ComposableSet``. The
abstract implementations return no meaningful values but raise
``NotImplementedError``; this is because any generic
implementation would have to create new instances but the ABCs
don't (and shouldn't, IMO) provide an API for creating new
instances. **Invariants:** The implementations of these operators
should ensure that the results match the mathematical definition
of set composition.
**Open issues:** Should I spell out the invariants? Should we
define an API for creating new instances (e.g. a class method or a
fixed constructor signature)? Should we just pick a concrete
return type (e.g. ``set``)?
``HashableSet``
This is a subclass of both ``ComposableSet`` and ``Hashable``. It
implements a concrete ``__hash__`` method that subclasses should
not override; or if they do, the subclass should compute the same
hash value. This is so that sets with different implementations
still hash to the same value, so they can be used interchangeably
as dictionary keys. (A similar constraint exists on the hash
values for different types of numbers and strings.)
**Open issues:** Should I spell out the hash algorithm? Should
there be another ABC that derives from Set and Hashable (but not
from Composable)?
``MutableSet``
This is a subclass of ``ComposableSet`` implementing additional
operations to add and remove elements. The supported methods have
the semantics known from the ``set`` type in Python 2 (except
for ``discard``, which is modeled after Java):
``.add(x)``
Abstract method returning a ``bool`` that adds the element
``x`` if it isn't already in the set. It should return
``True`` if ``x`` was added, ``False`` if it was already
there. The abstract implementation raises
``NotImplementedError``.
``.discard(x)``
Abstract method returning a ``bool`` that removes the element
``x`` if present. It should return ``True`` if the element
was present and ``False`` if it wasn't. The abstract
implementation raises ``NotImplementedError``.
``.clear()``
Abstract method that empties the set. The abstract
implementation raises ``NotImplementedError``. (Making this
concrete would just add a slow, cumbersome default
implementation.)
``.pop()``
Concrete method that removes an arbitrary item. If the set is
empty, it raises ``KeyError``. The default implementation
removes the first item returned by the set's iterator.
``.toggle(x)``
Concrete method returning a ``bool`` that adds x to the set if
it wasn't there, but removes it if it was there. It should
return ``True`` if ``x`` was added, ``False`` if it was
removed.
This also supports the in-place mutating operations ``|=``,
``&=``, ``^=``, ``-=``. These are concrete methods whose right
operand can be an arbitrary ``Iterable``. It does not support the
named methods that perform (almost) the same operations.
Mappings
''''''''
These abstract classes represent various stages of mapping-ness. The
``Mapping`` class represents the most common read-only mapping API.
However, code *accepting* a mapping is encouraged to check for the
``BasicMapping`` ABC when iteration is not used. This allows for
certain "black-box" implementations that can look up values by key but
don't provide a convenient iteration API. A hypothetical example
would be an interface to a hierarchical filesystem, where keys are
pathnames relative to some root directory. Iterating over all
pathnames would presumably take forever, as would counting the number
of valid pathnames.
The built-in type ``dict`` derives from ``MutableMapping``.
``BasicMapping``
A subclass of ``Container`` defining the following methods:
``.__getitem__(key)``
Abstract method that returns the value corresponding to
``key``, or raises ``KeyError``. The implementation always
raises ``KeyError``.
``.get(key, default=None)``
Concrete method returning ``self[key]`` if this does not raise
``KeyError``, and the ``default`` value if it does.
``.__contains__()``
Concrete method returning ``True`` if ``self[key]`` does not
raise ``KeyError``, and ``False`` if it does.
``Mapping``
A subclass of ``BasicMapping``, ``iterable`` and ``Sized``. It
defines concrete methods ``__eq__``, ``keys``, ``items``,
``values``. Iterating over a mapping should return all the valid
keys (i.e. those keys for which ``.__getitem__()`` returns a
value), once each, and nothing else. The lengh of a mapping
should equal to the number of elements returned by iterating over
the object until the end of the iterator is reached (this is
implied by the invariant listed above for ``Sized``). Two
mappings, even with different implementations, can be compared for
equality, and are considered equal if and only iff their items
compare equal when converted to sets. The ``keys``, ``items`` and
``values`` methods return views; ``keys`` and ``items`` return
``Set`` views, ``values`` returns a ``Container`` view. The
following invariant should hold: m.items() == set(zip(m.keys(),
m.values())).
``HashableMapping``
A subclass of ``Mapping`` and ``Hashable``. The values should be
instances of ``Hashable``. The concrete ``__hash__`` method
should be equal to ``hash(m.items())``.
``MutableMapping``
A subclass of ``Mapping`` that also implements some standard
mutating methods. Abstract methods include ``__setitem__``,
``__delitem__``, ``clear``, ``update``. Concrete methods include
``pop``, ``popitem``. Note: ``setdefault`` is *not* included.
**Open issues:**
* We should say more about mapping view types.
Sequences
'''''''''
These abstract classes represent various stages of sequence-ness.
The built-in ``list`` and ``bytes`` types derive from
``MutableSequence``. The built-in ``tuple`` and ``str`` types derive
from ``HashableSequence``.
``Sequence``
A subclass of ``Iterable``, ``Sized``, ``Container``. It
defines a new abstract method ``__getitem__`` that has a
complicated signature: when called with an integer, it returns an
element of the sequence or raises ``IndexError``; when called with
a ``slice`` object, it returns another ``Sequence``. The concrete
``__iter__`` method iterates over the elements using
``__getitem__`` with integer arguments 0, 1, and so on, until
``IndexError`` is raised. The length should be equal to the
number of values returned by the iterator.
**Open issues:** Other candidate methods, which can all have
default concrete implementations that only depend on ``__len__``
and ``__getitem__`` with an integer argument: __reversed__, index,
count, __add__, __mul__, __eq__, __lt__, __le__.
``HashableSequence``
A subclass of ``Sequence`` and ``Hashable``. The concrete
``__hash__`` method should implements the hashing algorithms used
by tuples in Python 2.
``MutableSequence``
A subclass of ``Sequence`` adding some standard mutating methods.
Abstract mutating methods: ``__setitem__`` (for integer indices as
well as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
``reverse``. Concrete mutating methods: ``extend``, ``pop``,
``remove``. Concrete mutating operators: ``+=``, ``*=`` (these
mutate the object in place). Note: this does not define
``sort()`` -- that is only required to exist on genuine ``list``
instances.
Strings
-------
Python 3000 has two built-in string types: byte strings (``bytes``),
deriving from ``MutableSequence``, and (Unicode) character strings
(``str``), deriving from ``HashableSequence``.
**Open issues:** define the base interfaces for these so alternative
implementations and subclasses know what they are in for. This may be
the subject of a new PEP or PEPs (PEP 358 should be co-opted for the
``bytes`` type).
ABCs for Numbers
----------------
**Open issues:** Define: ``Number``, ``Complex``, ``Real``,
``Rational``, ``Integer``. Maybe also ``Cardinal`` (``Integer`` >=
0)? We probably also need ``Index``, which converts to ``Integer``
using ``__index__``. This should be moved out to a separate PEP.
Guidelines for Writing ABCs
---------------------------
Some suggestions for writing ABCs:
* Use the ``@abstractmethod`` decorator.
* Define abstract methods that could be useful as an end point when
called via a super chain.
* Define concrete methods that are very simple permutations of
abstract methods (e.g. ``Mapping.get``).
* Keep abstract classes small, one per use case instead of one per
concept.
ABCs vs. Alternatives
=====================
In this section I will attempt to compare and contrast ABCs to other
approaches that have been proposed.
ABCs vs. Duck Typing
--------------------
Does the introduction of ABCs mean the end of Duck Typing? I don't
think so. Python will not require that a class derives from
``BasicMapping`` or ``Sequence`` when it defines a ``__getitem__``
method, nor will the ``x[y]`` syntax require that ``x`` is an instance
of either ABC. You will still be able to assign any "file-like"
object to ``sys.stdout``, as long as it has a ``write`` method.
Of course, there will be some carrots to encourage users to derive
from the appropriate base classes; these vary from default
implementations for certain functionality to an improved ability to
distinguish between mappings and sequences. But there are no sticks.
If ``hasattr(x, __len__)`` works for you, great! ABCs are intended to
solve problems that don't have a good solution at all in Python 2,
such as distinguishing between mappings and sequences.
ABCs vs. Generic Functions
--------------------------
ABCs are compatible with Generic Functions (GFs). For example, my own
Generic Functions implementation [4]_ uses the classes (types) of the
arguments as the dispatch key, allowing derived classes to override
base classes. Since (from Python's perspective) ABCs are quite
ordinary classes, using an ABC in the default implementation for a GF
can be quite appropriate. For example, if I have an overloaded
``prettyprint`` function, it would make total sense to define
pretty-printing of sets like this::
@prettyprint.register(Set)
def pp_set(s):
return "{" + ... + "}" # Details left as an exercise
and implementations for specific subclasses of Set could be added
easily.
I believe ABCs also won't present any problems for RuleDispatch,
Phillip Eby's GF implementation in PEAK [5]_.
Of course, GF proponents might claim that GFs (and concrete, or
implementation, classes) are all you need. But even they will not
deny the usefulness of inheritance; and one can easily consider the
ABCs proposed in this PEP as optional implementation base classes;
there is no requirement that all user-defined mappings derive from
``BasicMapping``.
ABCs vs. Interfaces
-------------------
ABCs are not intrinsically incompatible with Interfaces, but there is
considerable overlap. For now, I'll leave it to proponents of
Interfaces to explain why Interfaces are better. I expect that much
of the work that went into e.g. defining the various shades of
"mapping-ness" and the nomenclature could easily be adapted for a
proposal to use Interfaces instead of ABCs.
"Interfaces" in this context refers to a set of proposals for
additional metadata elements attached to a class which are not part of
the regular class hierarchy, but do allow for certain types of
inheritance testing.
Such metadata would be designed, at least in some proposals, so as to
be easily mutable by an application, allowing application writers to
override the normal classification of an object.
The drawback to this idea of attaching mutable metadata to a class is
that classes are shared state, and mutating them may lead to conflicts
of intent. Additionally, the need to override the classification of
an object can be done more cleanly using generic functions: In the
simplest case, one can define a "category membership" generic function
that simply returns False in the base implementation, and then provide
overrides that return True for any classes of interest.
References
==========
.. [1] An Introduction to ABC's, by Talin
(http://mail.python.org/pipermail/python-3000/2007-April/006614.html)
.. [2] Incomplete implementation prototype, by GvR
(http://svn.python.org/view/sandbox/trunk/abc/)
.. [3] Possible Python 3K Class Tree?, wiki page created by Bill Janssen
(http://wiki.python.org/moin/AbstractBaseClasses)
.. [4] Generic Functions implementation, by GvR
(http://svn.python.org/view/sandbox/trunk/overload/)
.. [5] Charming Python: Scaling a new PEAK, by David Mertz
(http://www-128.ibm.com/developerworks/library/l-cppeak2/)
.. [6] Implementation of @abstractmethod
(http://python.org/sf/1706989)
.. [7] Unifying types and classes in Python 2.2, by GvR
(http://www.python.org/download/releases/2.2.3/descrintro/)
.. [8] "Putting Metaclasses to Work: A New Dimension in Object-Oriented
Programming", by Ira R. Forman and Scott H. Danforth
(http://www.amazon.com/gp/product/0201433052)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: