PEP: 3119
Title: Introducing Abstract Base Classes
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum, Talin
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2007
Post-History: Not yet posted


Abstract
========

**THIS IS A WORK IN PROGRESS! DON'T REVIEW YET!**

This is a proposal to add Abstract Base Class (ABC) support to Python
3000. It proposes:

* An "ABC support framework" which defines a metaclass, a base class,
  a decorator, and some helpers that make it easy to define ABCs.
  This will be added as a new library module named "abc".

* Specific ABCs for containers and iterators, to be added to the
  collections module.

* Specific ABCs for numbers, to be added to a new module, yet to be
  named.

* Guidelines for writing additional ABCs.

Much of the thinking that went into the proposal is not about the
specific mechanism of ABCs, as contrasted with Interfaces or Generic
Functions (GFs), but about clarifying philosophical issues like "what
makes a set", "what makes a mapping" and "what makes a sequence".


Acknowledgements
----------------

Talin wrote the Rationale below [1]_. For that alone he deserves
co-authorship. But the rest of the PEP uses "I" referring to the
first author.


Rationale
=========

In the domain of object-oriented programming, the usage patterns for
interacting with an object can be divided into two basic categories,
which are 'invocation' and 'inspection'.

Invocation means interacting with an object by invoking its methods.
Usually this is combined with polymorphism, so that invoking a given
method may run different code depending on the type of an object.

Inspection means the ability for external code (outside of the
object's methods) to examine the type or properties of that object,
and make decisions on how to treat that object based on that
information.

Both usage patterns serve the same general end, which is to be able to
support the processing of diverse and potentially novel objects in a
uniform way, but at the same time allowing processing decisions to be
customized for each different type of object.

In classical OOP theory, invocation is the preferred usage pattern,
and inspection is actively discouraged, being considered a relic of an
earlier, procedural programming style. However, in practice this view
is simply too dogmatic and inflexible, and leads to a kind of design
rigidity that is very much at odds with the dynamic nature of a
language like Python.

In particular, there is often a need to process objects in a way that
wasn't anticipated by the creator of the object class. It is not
always the best solution to build in to every object methods that
satisfy the needs of every possible user of that object. Moreover,
there are many powerful dispatch philosophies that are in direct
contrast to the classic OOP requirement of behavior being strictly
encapsulated within an object, examples being rule or pattern-match
driven logic.

On the other hand, one of the criticisms of inspection by classic OOP
theorists is the lack of formalisms and the ad hoc nature of what is
being inspected. In a language such as Python, in which almost any
aspect of an object can be reflected and directly accessed by external
code, there are many different ways to test whether an object conforms
to a particular protocol or not. For example, if asking 'is this
object a mutable sequence container?', one can look for a base class
of 'list', or one can look for a method named '__getitem__'. But note
that although these tests may seem obvious, neither of them is
correct, as one generates false negatives, and the other false
positives.

The generally agreed-upon remedy is to standardize the tests, and
group them into a formal arrangement.
This is most easily done by associating with each class a set of
standard testable properties, either via the inheritance mechanism or
some other means. Each test carries with it a set of promises: it
contains a promise about the general behavior of the class, and a
promise as to what other class methods will be available.

This PEP proposes a particular strategy for organizing these tests
known as Abstract Base Classes, or ABC. ABCs are simply Python classes
that are added into an object's inheritance tree to signal certain
features of that object to an external inspector. Tests are done using
isinstance(), and the presence of a particular ABC means that the test
has passed.

Like all other things in Python, these promises are in the nature of a
gentlemen's agreement, which means that the language does not attempt
to enforce that these promises are kept.


Specification
=============

The specification follows the four categories listed in the abstract:

* An "ABC support framework" which defines a metaclass, a base class,
  a decorator, and some helpers that make it easy to define ABCs.
  This will be added as a new library module named "abc", or
  (probably) made built-in functionality.

* Specific ABCs for containers and iterators, to be added to the
  collections module.

* Specific ABCs for numbers, to be added to a new module that is yet
  to be named.

* Guidelines for writing additional ABCs.


ABC Support Framework
---------------------

The abc module will define some utilities that help in defining ABCs.
These are:

``@abstractmethod``
    A decorator used to declare abstract methods. This should only be
    used in classes derived from ``Abstract`` below. A class
    containing at least one method declared with this decorator that
    hasn't been overridden yet cannot be instantiated. Such methods
    may be called from the overriding method in the subclass (using
    ``super`` or direct invocation).

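
A minimal sketch of how the decorator could cooperate with the
``Abstract`` base class, the ``AbstractClass`` metaclass and the
``AbstractInstantiationError`` exception described next. It assumes
Python 3 metaclass syntax and does the bookkeeping when each class is
created; the ``Shape`` and ``Square`` classes are hypothetical, and
this is an illustration rather than the prototype's code::

    class AbstractInstantiationError(TypeError):
        """Raised when an abstract class is instantiated."""


    def abstractmethod(func):
        """Mark func as abstract by setting a flag attribute on it."""
        func.__abstractmethod__ = True
        return func


    class AbstractClass(type):
        """Metaclass recording which abstract methods remain open."""

        def __new__(mcls, name, bases, namespace):
            cls = super().__new__(mcls, name, bases, namespace)
            # Candidates: names defined in this class body plus the
            # abstract names inherited from the base classes.
            names = set(namespace)
            for base in bases:
                names.update(getattr(base, "__abstractmethods__", ()))
            # A name stays abstract only if the attribute finally
            # found on the new class still carries the marker.
            cls.__abstractmethods__ = frozenset(
                n for n in names
                if getattr(getattr(cls, n, None),
                           "__abstractmethod__", False))
            return cls

        def __call__(cls, *args, **kwargs):
            # Instantiation only needs to look at the precomputed set.
            if cls.__abstractmethods__:
                missing = ", ".join(sorted(cls.__abstractmethods__))
                raise AbstractInstantiationError(
                    "cannot instantiate %s; abstract methods: %s"
                    % (cls.__name__, missing))
            return super().__call__(*args, **kwargs)


    class Abstract(metaclass=AbstractClass):
        """Base class; not abstract itself, it declares nothing."""


    class Shape(Abstract):          # hypothetical example class
        @abstractmethod
        def describe(self):
            return "shape"


    class Square(Shape):            # overrides the abstract method
        def describe(self):
            # The abstract method is a usable end point for super().
            return "square " + super().describe()

Under these assumptions ``Square()`` can be created while ``Shape()``
raises ``AbstractInstantiationError``, and ``Abstract()`` itself is
instantiable because abstract-ness is decided per class.
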
``Abstract``
    A class implementing the constraint that it or its subclasses
    cannot be instantiated unless each abstract method has been
    overridden. Its metaclass is ``AbstractClass``. Note: being
    derived from ``Abstract`` does not make a class abstract; the
    abstract-ness is decided on a per-class basis, depending on
    whether all methods defined with ``@abstractmethod`` have been
    overridden.

``AbstractClass``
    The metaclass of ``Abstract`` (and all classes derived from it).
    Its purpose is to collect the information during the class
    construction stage. It derives from ``type``.

``AbstractInstantiationError``
    The exception raised when attempting to instantiate an abstract
    class. It derives from ``TypeError``.

A possible implementation would add an attribute
``__abstractmethod__`` to any method declared with
``@abstractmethod``, and add the names of all such abstract methods to
a class attribute named ``__abstractmethods__``. Then the
``Abstract.__new__()`` method would raise an exception if any abstract
methods exist on the class being instantiated. For details see [2]_.
(However, this would incur a significant cost upon each instantiation.
A better approach would be to do most of the work in the metaclass.)

**Open issue:** Probably ``abstractmethod`` and
``AbstractInstantiationError`` should become built-ins, ``Abstract``'s
functionality should be subsumed by ``object``, and
``AbstractClass``'s functionality should be merged into ``type``.
This would require a more efficient implementation of the
instantiable-test sketched above.


ABCs for Containers and Iterators
---------------------------------

The collections module will define ABCs necessary and sufficient to
work with sets, mappings, sequences, and some helper types such as
iterators and dictionary views.

The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
``__iter__`` returns an empty iterator.
In general, the abstract methods represent the behavior of an empty
container of the indicated type.

Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in
Python 2 has to be implemented anew by each iterator class).

No ABCs override ``__init__``, ``__new__``, ``__str__`` or
``__repr__``.


One Trick Ponies
''''''''''''''''

These abstract classes represent single methods like ``__iter__`` or
``__len__``. The ``Iterator`` class is included as well, even though
it has two prescribed methods.

``Hashable``
    The base class for classes defining ``__hash__``. The
    ``__hash__`` method should return an ``Integer`` (see "Numbers"
    below). The abstract ``__hash__`` method always returns 0, which
    is a valid (albeit inefficient) implementation.

    **Invariant:** If classes ``C1`` and ``C2`` both derive from
    ``Hashable``, the condition ``o1 == o2`` must imply
    ``hash(o1) == hash(o2)`` for all instances ``o1`` of ``C1`` and
    all instances ``o2`` of ``C2``. In other words, two objects
    shouldn't compare equal but have different hash values.

    Another constraint is that hashable objects, once created, should
    never change their value (as compared by ``==``) or their hash
    value. If a class cannot guarantee this, it should not derive
    from ``Hashable``; if it cannot guarantee this for certain
    instances only, ``__hash__`` for those instances should raise an
    exception.

    Note: being an instance of this class does not imply that an
    object is immutable; e.g. a tuple containing a list as a member is
    not immutable; its ``__hash__`` method raises an exception.

``Iterable``
    The base class for classes defining ``__iter__``. The
    ``__iter__`` method should always return an instance of
    ``Iterator`` (see below). The abstract ``__iter__`` method
    returns an empty iterator.

``Iterator``
    The base class for classes defining ``__next__``. This derives
    from ``Iterable``.
    The abstract ``__next__`` method raises ``StopIteration``. The
    concrete ``__iter__`` method returns ``self``. (Note: this
    assumes PEP 3114 is implemented.)

``Sized``
    The base class for classes defining ``__len__``. The ``__len__``
    method should return an ``Integer`` (see "Numbers" below) >= 0.
    The abstract ``__len__`` method returns 0.

    **Invariant:** If a class ``C`` derives from ``Sized`` as well as
    from ``Iterable``, the invariant ``sum(1 for x in o) == len(o)``
    should hold for any instance ``o`` of ``C``.

    **Open issue:** Is ``Sized`` the best name? Proposed alternatives
    already tentatively rejected: ``Finite`` (nobody understood it),
    ``Lengthy``, ``Sizeable`` (both too cute), ``Countable`` (the set
    of natural numbers is a countable set in math), ``Enumerable``
    (sounds like a synonym for ``Iterable``), ``Dimension``,
    ``Extent`` (sound like numbers to me).

``Container``
    The base class for classes defining ``__contains__``. The
    ``__contains__`` method should return a ``bool``. The abstract
    ``__contains__`` method returns ``False``.

    **Invariant:** If a class ``C`` derives from ``Container`` as well
    as from ``Iterable``, then ``(x in o for x in o)`` should be a
    generator yielding only True values for any instance ``o`` of
    ``C``.

    Note: strictly speaking, there are three variants of this method's
    semantics. The first one is for sets and mappings, which is fast:
    O(1) or O(log N). The second one is for membership checking on
    sequences, which is slow: O(N). The third one is for subsequence
    checking on (character or byte) strings, which is also slow: O(N).
    Would it make sense to distinguish these? The signature of the
    third variant is different, since it takes a sequence (typically
    of the same type as the method's target) instead of an element.
    For now, I'm using the same type for all three. This means that
    it is possible for ``x in o`` to be True even though ``x`` is
    never yielded by ``iter(o)``.


Sets
''''

These abstract classes represent various stages of "set-ness".
The most fundamental set operation is the membership test, written as
``x in s`` and implemented by ``s.__contains__(x)``. This is already
taken care of by the ``Container`` class defined above. Therefore, we
define a set as a finite, iterable container for which certain
invariants from mathematical set theory hold.

The built-in type ``set`` derives from ``MutableSet``. The built-in
type ``frozenset`` derives from ``HashableSet``.

You might wonder why we require a set to be finite -- surely certain
infinite sets can be represented just fine in Python. For example,
the set of even integers could be defined like this::

    class EvenIntegers(Container):

        def __contains__(self, x):
            return x % 2 == 0

However, such sets have rather limited practical value, and deciding
whether one such set is a subset of another would be difficult in
general without using a symbolic algebra package. So I consider this
out of the scope of a pragmatic proposal like this.

``Set``
    This is a finite, iterable container, i.e. a subclass of
    ``Sized``, ``Iterable`` and ``Container``. Not every subclass of
    those three classes is a set though! Sets have the additional
    invariant that each element occurs only once (as can be determined
    by iteration), and in addition sets define concrete operators that
    implement rich comparisons defined as subset/superset tests.

    Sets with different implementations can be compared safely,
    efficiently and correctly. Because ``Set`` derives from
    ``Sized``, ``__eq__`` takes a shortcut and returns ``False``
    immediately if two sets of unequal length are compared.
    Similarly, ``__le__`` returns ``False`` immediately if the first
    set has more members than the second set. Note that set inclusion
    implements only a partial ordering; e.g. ``{1, 2}`` and ``{1, 3}``
    are not ordered (all three of ``<``, ``==`` and ``>`` return
    ``False`` for these arguments).
    Sets cannot be ordered relative to mappings or sequences, but they
    can be compared for equality (and then they always compare
    unequal).

    Note: the ``issubset`` and ``issuperset`` methods found on the set
    type in Python 2 are not supported, as these are mostly just
    aliases for ``__le__`` and ``__ge__``.

    **Open issues:** Should I spell out the invariants and method
    definitions?

``ComposableSet``
    This is a subclass of ``Set`` that defines abstract operators to
    compute union, intersection, symmetric and asymmetric difference,
    respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.
    These operators should return instances of ``ComposableSet``. The
    abstract implementations return no meaningful values but raise
    ``NotImplementedError``; this is because any generic
    implementation would have to create new instances but the ABCs
    don't (and shouldn't, IMO) provide an API for creating new
    instances.

    **Invariants:** The implementations of these operators should
    ensure that the results match the mathematical definition of set
    composition.

    **Open issues:** Should I spell out the invariants? Should we
    define an API for creating new instances (e.g. a class method or a
    fixed constructor signature)? Should we just pick a concrete
    return type (e.g. ``set``)? Should we add the ``copy`` method?

``HashableSet``
    This is a subclass of both ``ComposableSet`` and ``Hashable``. It
    implements a concrete ``__hash__`` method that subclasses should
    not override; or if they do, the subclass should compute the same
    hash value. This is so that sets with different implementations
    still hash to the same value, so they can be used interchangeably
    as dictionary keys. (A similar constraint exists on the hash
    values for different types of numbers and strings.)

    **Open issues:** Should I spell out the hash algorithm? Should
    there be another ABC that derives from ``Set`` and ``Hashable``
    (but not from ``ComposableSet``)?

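
The comparison rules given for ``Set`` above can be sketched as
follows. This is only an illustration under stated assumptions:
``SetSketch`` stands in for the proposed ``Set`` ABC, and ``ListSet``
and ``DictSet`` are hypothetical implementations invented for the
example; none of these names are part of the proposal::

    class SetSketch:
        """Hypothetical stand-in for the proposed Set ABC.

        Subclasses supply __len__, __iter__ and __contains__.
        """

        def __le__(self, other):
            if not isinstance(other, SetSketch):
                return NotImplemented
            # Shortcut: a larger set cannot be a subset of a smaller.
            if len(self) > len(other):
                return False
            return all(x in other for x in self)

        def __lt__(self, other):
            if not isinstance(other, SetSketch):
                return NotImplemented
            return len(self) < len(other) and self.__le__(other)

        def __eq__(self, other):
            if not isinstance(other, SetSketch):
                # Python then falls back to identity, so a set
                # compares unequal to mappings and sequences.
                return NotImplemented
            # Shortcut: sets of unequal length are never equal.
            return len(self) == len(other) and self.__le__(other)


    class ListSet(SetSketch):
        """Set stored as a list without duplicates."""

        def __init__(self, items):
            self._items = []
            for x in items:
                if x not in self._items:
                    self._items.append(x)

        def __len__(self):
            return len(self._items)

        def __iter__(self):
            return iter(self._items)

        def __contains__(self, x):
            return x in self._items


    class DictSet(SetSketch):
        """Set stored as the keys of a dict."""

        def __init__(self, items):
            self._items = dict.fromkeys(items)

        def __len__(self):
            return len(self._items)

        def __iter__(self):
            return iter(self._items)

        def __contains__(self, x):
            return x in self._items

With these definitions ``ListSet([1, 2]) == DictSet([2, 1, 1])`` holds
even though the implementations differ, while ``ListSet([1, 2])`` and
``DictSet([1, 3])`` are unordered: ``<``, ``==`` and ``>`` all return
``False``.
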
``MutableSet``
    This is a subclass of ``ComposableSet`` implementing additional
    operations to add and remove elements. The supported methods have
    the semantics known from the ``set`` type in Python 2:

    ``.add(x)``
        Abstract method that adds the element ``x``, if it isn't
        already in the set.

    ``.remove(x)``
        Abstract method that removes the element ``x``; raises
        ``KeyError`` if ``x`` is not in the set.

    ``.discard(x)``
        Concrete method that removes the element ``x`` if it is a
        member of the set; implemented using ``__contains__`` and
        ``remove``.

    ``.clear()``
        Abstract method that empties the set. (Making this concrete
        would just add a slow, cumbersome default implementation.)

    ``.pop()``
        Concrete method that removes an arbitrary item. If the set is
        empty, it raises ``KeyError``. The default implementation
        removes the first item returned by the set's iterator.

    This also supports the in-place mutating operations ``|=``,
    ``&=``, ``^=``, ``-=``. It does not support the named methods
    that perform (almost) the same operations, like ``update``, even
    though these don't have exactly the same rules (``update`` takes
    any iterable, while ``|=`` requires a set).

    **Open issues:** Should we unify ``remove`` and ``discard``, a la
    Java (which has a single method returning a boolean indicating
    whether it was removed or not)?


Mappings
''''''''

These abstract classes represent various stages of mapping-ness.

The built-in type ``dict`` derives from ``MutableMapping``.

``BasicMapping``
    A subclass of ``Container`` defining the following methods:

    ``.__getitem__(key)``
        Abstract method that returns the value corresponding to
        ``key``, or raises ``KeyError``. The implementation always
        raises ``KeyError``.

    ``.get(key, default=None)``
        Concrete method returning ``self[key]`` if this does not raise
        ``KeyError``, and the ``default`` value if it does.

    ``.__contains__(key)``
        Concrete method returning ``True`` if ``self[key]`` does not
        raise ``KeyError``, and ``False`` if it does.

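
The relationship between the three ``BasicMapping`` methods above can
be sketched as follows. ``BasicMappingSketch`` is a hypothetical
stand-in for the proposed ABC (it omits the ``Container`` base and the
``@abstractmethod`` marker), and ``Squares`` is an example class
invented here to show a mapping that is neither iterable nor sized::

    class BasicMappingSketch:
        """Hypothetical stand-in for the proposed BasicMapping ABC."""

        def __getitem__(self, key):
            # Abstract end point: behaves like an empty mapping.
            raise KeyError(key)

        def get(self, key, default=None):
            # Concrete: built purely on top of __getitem__.
            try:
                return self[key]
            except KeyError:
                return default

        def __contains__(self, key):
            # Concrete: a key is present if looking it up succeeds.
            try:
                self[key]
            except KeyError:
                return False
            return True


    class Squares(BasicMappingSketch):
        """Maps every non-negative int n to n * n."""

        def __getitem__(self, key):
            if isinstance(key, int) and key >= 0:
                return key * key
            # Defer to the end point, which raises KeyError.
            return super().__getitem__(key)

A subclass such as ``Squares`` only has to supply ``__getitem__``;
``get`` and the ``in`` operator then work without further code.
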
``IterableMapping``
    A subclass of ``BasicMapping`` and ``Iterable``. It defines no
    new methods. Iterating over such an object should return all the
    valid keys (i.e. those keys for which ``.__getitem__()`` returns a
    value), once each, and nothing else. It is possible that the
    iteration never ends.

``Mapping``
    A subclass of ``IterableMapping`` and ``Sized``. It defines
    concrete methods ``__eq__``, ``keys``, ``items``, ``values``. The
    length of such an object should be equal to the number of elements
    returned by iterating over the object until the end of the
    iterator is reached. Two mappings, even with different
    implementations, can be compared for equality, and are considered
    equal if and only if their items compare equal when converted to
    sets. The ``keys``, ``items`` and ``values`` methods return
    views; ``keys`` and ``items`` return ``Set`` views, ``values``
    returns a ``Container`` view. The following invariant should
    hold: ``m.items() == set(zip(m.keys(), m.values()))``.

``HashableMapping``
    A subclass of ``Mapping`` and ``Hashable``. The values should be
    instances of ``Hashable``. The concrete ``__hash__`` method
    should return a value equal to ``hash(m.items())``.

``MutableMapping``
    A subclass of ``Mapping`` that also implements some standard
    mutating methods. Abstract methods include ``__setitem__``,
    ``__delitem__``, ``clear``, ``update``. Concrete methods include
    ``pop``, ``popitem``. Note: ``setdefault`` is *not* included.

**Open issues:**

* Do we need ``BasicMapping`` and ``IterableMapping``? We should
  probably just start with ``Mapping``.

* We should say more about mapping view types.


Sequences
'''''''''

These abstract classes represent various stages of sequence-ness.

The built-in ``list`` and ``bytes`` types derive from
``MutableSequence``. The built-in ``tuple`` and ``str`` types derive
from ``HashableSequence``.

``Sequence``
    A subclass of ``Iterable``, ``Sized``, ``Container``.
    It defines a new abstract method ``__getitem__`` that has a
    complicated signature: when called with an integer, it returns an
    element of the sequence or raises ``IndexError``; when called with
    a ``slice`` object, it returns another ``Sequence``. The concrete
    ``__iter__`` method iterates over the elements using
    ``__getitem__`` with integer arguments 0, 1, and so on, until
    ``IndexError`` is raised. The length should be equal to the
    number of values returned by the iterator.

    **Open issues:** Other candidate methods, which can all have
    default concrete implementations that only depend on ``__len__``
    and ``__getitem__`` with an integer argument: ``__reversed__``,
    ``index``, ``count``, ``__add__``, ``__mul__``, ``__eq__``,
    ``__lt__``, ``__le__``.

``HashableSequence``
    A subclass of ``Sequence`` and ``Hashable``. The concrete
    ``__hash__`` method should implement the hashing algorithm used
    by tuples in Python 2.

``MutableSequence``
    A subclass of ``Sequence`` adding some standard mutating methods.
    Abstract mutating methods: ``__setitem__`` (for integer indices as
    well as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
    ``reverse``. Concrete mutating methods: ``extend``, ``pop``,
    ``remove``. Concrete mutating operators: ``+=``, ``*=`` (these
    mutate the object in place). Note: this does not define
    ``sort()`` -- that is only required to exist on genuine ``list``
    instances.


ABCs for Numbers
----------------

**Open issues:** Define: ``Number``, ``Complex``, ``Real``,
``Rational``, ``Integer``. Do we have a use case for ``Cardinal``
(``Integer`` >= 0)? Do we need ``Index`` (converts to ``Integer``
using ``__index__``)? Or is that just subsumed into ``Integer`` and
should we use ``__index__`` only at the C level?


Guidelines for Writing ABCs
---------------------------

Some suggestions:

* Use ``@abstractmethod`` and the ``Abstract`` base class.

* Define abstract methods that could be useful as an end point when
  called via a super chain.

* Define concrete methods that are very simple permutations of
  abstract methods (e.g. ``Mapping.get``).

* Keep abstract classes small, one per use case instead of one per
  concept.

* What else?


ABCs vs. Alternatives
=====================

In this section I will attempt to compare and contrast ABCs to other
approaches that have been proposed.


ABCs vs. Duck Typing
--------------------

Does the introduction of ABCs mean the end of Duck Typing? I don't
think so. Python will not require that a class derives from
``BasicMapping`` or ``Sequence`` when it defines a ``__getitem__``
method, nor will the ``x[y]`` syntax require that ``x`` is an instance
of either ABC. You will still be able to assign any "file-like"
object to ``sys.stdout``, as long as it has a ``write`` method.

Of course, there will be some carrots to encourage users to derive
from the appropriate base classes; these vary from default
implementations for certain functionality to an improved ability to
distinguish between mappings and sequences. But there are no sticks.
If ``hasattr(x, "__len__")`` works for you, great! ABCs are intended
to solve problems that don't have a good solution at all in Python 2,
such as distinguishing between mappings and sequences.


ABCs vs. Generic Functions
--------------------------

ABCs are compatible with Generic Functions (GFs). For example, my own
Generic Functions implementation [4]_ uses the classes (types) of the
arguments as the dispatch key, allowing derived classes to override
base classes. Since (from Python's perspective) ABCs are quite
ordinary classes, using an ABC in the default implementation for a GF
can be quite appropriate. For example, if I have an overloaded
``prettyprint`` function, it would make total sense to define
pretty-printing of sets like this::

    @prettyprint.register(Set)
    def pp_set(s):
        return "{" + ... + "}"  # Details left as an exercise

and implementations for specific subclasses of ``Set`` could be added
easily.

I believe ABCs also won't present any problems for RuleDispatch,
Phillip Eby's GF implementation in PEAK [5]_.

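
To make the dispatch idea concrete, here is a small self-contained
sketch. It does not use either of the GF implementations cited above:
``GenericFunction`` is a toy single-argument dispatcher written for
this example, ``Set`` is a placeholder for the proposed ABC, and
``SortedSet`` is a hypothetical subclass::

    class GenericFunction:
        """Toy generic function dispatching on one argument's type."""

        def __init__(self, default):
            self._default = default
            self._registry = {}

        def register(self, cls):
            def decorator(func):
                self._registry[cls] = func
                return func
            return decorator

        def __call__(self, arg):
            # Walk the MRO so the most derived registration wins.
            for cls in type(arg).__mro__:
                if cls in self._registry:
                    return self._registry[cls](arg)
            return self._default(arg)


    class Set:
        """Placeholder for the proposed Set ABC."""

        def __init__(self, items):
            self._items = list(dict.fromkeys(items))  # drop duplicates

        def __iter__(self):
            return iter(self._items)


    class SortedSet(Set):
        """Hypothetical subclass that iterates in sorted order."""

        def __iter__(self):
            return iter(sorted(self._items))


    @GenericFunction
    def prettyprint(obj):
        return repr(obj)


    @prettyprint.register(Set)
    def pp_set(s):
        return "{" + ", ".join(prettyprint(x) for x in s) + "}"


    @prettyprint.register(SortedSet)
    def pp_sorted_set(s):
        # A more specific registration overrides the one for Set.
        return "sorted" + pp_set(s)

Here any ``Set`` subclass without its own registration falls back to
``pp_set``, while ``SortedSet`` instances are routed to the more
specific implementation.
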
Of course, GF proponents might claim that GFs (and concrete, or
implementation, classes) are all you need. But even they will not
deny the usefulness of inheritance; and one can easily consider the
ABCs proposed in this PEP as optional implementation base classes;
there is no requirement that all user-defined mappings derive from
``BasicMapping``.


ABCs vs. Interfaces
-------------------

ABCs are not intrinsically incompatible with Interfaces, but there is
considerable overlap. For now, I'll leave it to proponents of
Interfaces to explain why Interfaces are better. I expect that much
of the work that went into e.g. defining the various shades of
"mapping-ness" and the nomenclature could easily be adapted for a
proposal to use Interfaces instead of ABCs.


Open Issues
===========

Apart from the open issues already sprinkled through the text above,
and the "category one" issue of deciding between ABCs, GFs and
Interfaces, there are some fairly large looming issues.

* Should we strive to use ABCs for *all* areas of Python? The wiki
  page for ABCs created by Bill Janssen [3]_ tries to be
  comprehensive: it defines everything from Comparable and Object to
  files. The current PEP tries to limit itself to three areas: ABC
  support (like the ``@abstractmethod`` decorator), collections types,
  and numbers. The proposed class hierarchy for new I/O described in
  PEP 3116 already includes de-facto ABCs; these can easily be
  upgraded to use the mechanisms from the current PEP if it is
  accepted. Perhaps Orderable would be a good concept to define in
  the current PEP; I don't expect we need to go further.

* Perhaps the numeric classes could be moved to a separate PEP; the
  issues there don't have much in common with the issues for
  collection types.

* What else?


References
==========

.. [1] An Introduction to ABC's, by Talin
   (http://mail.python.org/pipermail/python-3000/2007-April/006614.html)

.. [2] Incomplete implementation prototype, by GvR
   (http://svn.python.org/view/sandbox/trunk/abc/)

.. [3] Possible Python 3K Class Tree?, wiki page created by Bill Janssen
   (http://wiki.python.org/moin/AbstractBaseClasses)

.. [4] Generic Functions implementation, by GvR
   (http://svn.python.org/view/sandbox/trunk/overload/)

.. [5] Charming Python: Scaling a new PEAK, by David Mertz
   (http://www-128.ibm.com/developerworks/library/l-cppeak2/)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: