Streamlined the containers section.

Guido van Rossum 2007-05-11 23:36:47 +00:00
parent cef12733fb
commit 842b40e558
1 changed files with 128 additions and 205 deletions

@ -7,7 +7,7 @@ Status: Draft
Type: Standards Track Type: Standards Track
Content-Type: text/x-rst Content-Type: text/x-rst
Created: 18-Apr-2007 Created: 18-Apr-2007
Post-History: 26-Apr-2007 Post-History: 26-Apr-2007, 11-May-2007
Abstract Abstract
@ -30,6 +30,9 @@ specific mechanism of ABCs, as contrasted with Interfaces or Generic
Functions (GFs), but about clarifying philosophical issues like "what Functions (GFs), but about clarifying philosophical issues like "what
makes a set", "what makes a mapping" and "what makes a sequence". makes a set", "what makes a mapping" and "what makes a sequence".
There's also a companion PEP 3141, which defines ABCs for numeric
types.
Acknowledgements Acknowledgements
---------------- ----------------
@ -329,7 +332,7 @@ invocation). For example::
C() # works C() # works
**Notes:** The ``@abstractmethod`` decorator should only be used **Note:** The ``@abstractmethod`` decorator should only be used
inside a class body, and only for classes whose metaclass is (derived inside a class body, and only for classes whose metaclass is (derived
from) ``ABCMeta``. Dynamically adding abstract methods to a class, or from) ``ABCMeta``. Dynamically adding abstract methods to a class, or
attempting to modify the abstraction status of a method or class once attempting to modify the abstraction status of a method or class once
@ -337,6 +340,11 @@ it is created, are not supported. The ``@abstractmethod`` only
affects subclasses derived using regular inheritance; "virtual affects subclasses derived using regular inheritance; "virtual
subclasses" registered with the ``register()`` method are not affected. subclasses" registered with the ``register()`` method are not affected.
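The difference between regular inheritance and registration can be sketched as follows. This is a minimal illustration written against the ``abc`` module as it ships in current Python 3; the class names ``Shape``, ``Square`` and ``Circle`` are hypothetical and not part of the PEP.

```python
from abc import ABCMeta, abstractmethod


class Shape(metaclass=ABCMeta):  # hypothetical ABC, for illustration only
    @abstractmethod
    def area(self):
        return 0


class Square(Shape):
    # Regular subclass that does not override area(); it stays abstract.
    pass


class Circle:
    # Unrelated class; it never inherits from Shape.
    pass


# Make Circle a "virtual subclass" of Shape.
Shape.register(Circle)

try:
    Square()
    square_instantiable = True
except TypeError:
    # Abstract methods block instantiation of regular subclasses.
    square_instantiable = False

# The registered class is not checked for abstract methods at all.
circle = Circle()
```

Note that ``Circle`` passes ``isinstance`` checks against ``Shape`` even though it has no ``area`` method; registration is a promise by the programmer, not something the metaclass verifies.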
It has been suggested that we should also provide a way to define
abstract data attributes. As it is easy to add these in a later
stage, and as the use case is considerably less common (apart from
pure documentation), we punt on this for now.
**Implementation:** The ``@abstractmethod`` decorator sets the **Implementation:** The ``@abstractmethod`` decorator sets the
function attribute ``__isabstractmethod__`` to the value ``True``. function attribute ``__isabstractmethod__`` to the value ``True``.
The ``ABCMeta.__new__`` method computes the type attribute The ``ABCMeta.__new__`` method computes the type attribute
@ -358,19 +366,14 @@ may have an implementation. This implementation can be called via the
useful as an end-point for a super-call in a framework using useful as an end-point for a super-call in a framework using
cooperative multiple inheritance [7]_, [8]_. cooperative multiple inheritance [7]_, [8]_.
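An abstract method with a body acting as the end-point of a super chain might look like the sketch below. The classes ``Plugin``, ``A``, ``B`` and ``C`` are hypothetical names chosen for the illustration, and the zero-argument ``super()`` form assumes current Python 3.

```python
from abc import ABCMeta, abstractmethod


class Plugin(metaclass=ABCMeta):  # hypothetical framework base class
    @abstractmethod
    def setup(self, log):
        # Abstract, yet it has a body: the last stop of every super() chain.
        log.append("Plugin")


class A(Plugin):
    def setup(self, log):
        log.append("A")
        super().setup(log)


class B(Plugin):
    def setup(self, log):
        log.append("B")
        super().setup(log)


class C(A, B):
    def setup(self, log):
        log.append("C")
        super().setup(log)


log = []
C().setup(log)  # follows the MRO: C, A, B, Plugin
```

Because every class calls ``super().setup(log)`` unconditionally, none of them needs to know whether it is last in the method resolution order; the abstract implementation absorbs the final call.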
**Open issues:** Should we also provide a standard way to declare
abstract data attributes? If so, how should these be spelled?
Perhaps place ``@abstractattribute`` decorators on properties? Or use
an ``@attributes(name1, name2, ...)`` class decorator? Strawman:
let's back off on this; it's easy enough to add this later.
ABCs for Containers and Iterators ABCs for Containers and Iterators
--------------------------------- ---------------------------------
The ``collections`` module will define ABCs necessary and sufficient The ``collections`` module will define ABCs necessary and sufficient
to work with sets, mappings, sequences, and some helper types such as to work with sets, mappings, sequences, and some helper types such as
iterators and dictionary views. iterators and dictionary views. All ABCs have the above-mentioned
``ABCMeta`` as their metaclass.
The ABCs provide implementations of their abstract methods that are The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
@ -381,7 +384,8 @@ type.
Some ABCs also provide concrete (i.e. non-abstract) methods; for Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in itself, fulfilling an important invariant of iterators (which in
Python 2 has to be implemented anew by each iterator class). Python 2 has to be implemented anew by each iterator class). These
ABCs can be considered "mix-in" classes.
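As a sketch of the mix-in idea: in current Python 3 the ABC lives in ``collections.abc`` (the PEP text says ``collections``), and a subclass only has to supply ``__next__``. The ``Countdown`` class is a hypothetical example.

```python
from collections.abc import Iterator


class Countdown(Iterator):  # hypothetical iterator class
    def __init__(self, start):
        self.current = start

    def __next__(self):
        # The only method the subclass must write itself.
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


countdown = Countdown(3)
same_object = iter(countdown) is countdown  # __iter__ comes from the ABC
values = list(countdown)
```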
No ABCs defined in the PEP override ``__init__``, ``__new__``, No ABCs defined in the PEP override ``__init__``, ``__new__``,
``__str__`` or ``__repr__``. Defining a standard constructor ``__str__`` or ``__repr__``. Defining a standard constructor
@ -390,27 +394,19 @@ example Patricia trees or gdbm files. Defining a specific string
representation for a collection is similarly left up to individual representation for a collection is similarly left up to individual
implementations. implementations.
**Note:** There are no ABCs for ordering operations (``__lt__``,
Ordering ABCs ``__le__``, ``__ge__``, ``__gt__``). Defining these in a base class
''''''''''''' (abstract or not) runs into problems with the accepted type for the
second operand. For example, if class ``Ordering`` defined
These ABCs are closer to ``object`` in the ABC hierarchy. ``__lt__``, one would assume that for any ``Ordering`` instances ``x``
and ``y``, ``x < y`` would be defined (even if it just defines a
``PartiallyOrdered`` partial ordering). But this cannot be the case: If both ``list`` and
This ABC defines the 4 inequality operations ``<``, ``<=``, ``>=``, ``tuple`` derived from ``Ordering``, this would imply that ``[1, 2] <
``>``. (Note that ``==`` and ``!=`` are defined by ``object``.) (1, 2)`` should be defined (and presumably return False), while in
Classes deriving from this ABC should implement a partial order fact (in Python 3000!) such "mixed-mode comparisons" operations are
as defined in mathematics. [9]_ explicitly forbidden and raise ``TypeError``. See PEP 3100 and [14]_
for more information. (This is a special case of a more general issue
``TotallyOrdered`` with operations that take another argument of the same type:
This ABC derives from ``PartiallyOrdered``. It adds no new
operations but implies a promise of stronger invariants.
Classes deriving from this ABC should implement a total order
as defined in mathematics. [10]_
**Open issues:** Where should these live? The ``collections`` module
doesn't seem right, but making them built-ins seems a slippery slope
too.
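The behaviour that rules out an ordering ABC can be checked directly in Python 3. This is a small sketch using only built-in types; the variable names are illustrative.

```python
# Ordering comparisons between a list and a tuple are rejected ...
try:
    [1, 2] < (1, 2)
    ordering_raised = False
except TypeError:
    ordering_raised = True

# ... while equality across types is still defined and simply returns False.
mixed_equal = ([1, 2] == (1, 2))
```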
One Trick Ponies One Trick Ponies
@ -418,23 +414,22 @@ One Trick Ponies
These abstract classes represent single methods like ``__iter__`` or These abstract classes represent single methods like ``__iter__`` or
``__len__``. ``__len__``.
``Hashable`` ``Hashable``
The base class for classes defining ``__hash__``. The The base class for classes defining ``__hash__``. The
``__hash__`` method should return an ``Integer`` (see "Numbers" ``__hash__`` method should return an integer. The abstract
below). The abstract ``__hash__`` method always returns 0, which ``__hash__`` method always returns 0, which is a valid (albeit
is a valid (albeit inefficient) implementation. **Invariant:** If inefficient) implementation. **Invariant:** If classes ``C1`` and
classes ``C1`` and ``C2`` both derive from ``Hashable``, the ``C2`` both derive from ``Hashable``, the condition ``o1 == o2``
condition ``o1 == o2`` must imply ``hash(o1) == hash(o2)`` for all must imply ``hash(o1) == hash(o2)`` for all instances ``o1`` of
instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``. ``C1`` and all instances ``o2`` of ``C2``. IOW, two objects
IOW, two objects shouldn't compare equal but have different hash should never compare equal but have different hash values.
values.
Another constraint is that hashable objects, once created, should Another constraint is that hashable objects, once created, should
never change their value (as compared by ``==``) or their hash never change their value (as compared by ``==``) or their hash
value. If a class cannot guarantee this, it should not derive value. If a class cannot guarantee this, it should not derive
from ``Hashable``; if it cannot guarantee this for certain from ``Hashable``; if it cannot guarantee this for certain
instances only, ``__hash__`` for those instances should raise a instances, ``__hash__`` for those instances should raise a
``TypeError`` exception. ``TypeError`` exception.
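A minimal sketch of a class that honours the invariant, assuming the ``collections.abc`` location used by current Python 3. ``Point`` is a hypothetical value type whose hash is derived from exactly the data that ``__eq__`` compares.

```python
from collections.abc import Hashable


class Point:  # hypothetical immutable value type
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash the same tuple that __eq__ compares, so equal points
        # always produce equal hash values.
        return hash((self._x, self._y))


p1 = Point(1, 2)
p2 = Point(1, 2)

# A mutable built-in refuses to be hashed by raising TypeError.
try:
    hash([1, 2])
    list_hash_raised = False
except TypeError:
    list_hash_raised = True
```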
**Note:** being an instance of this class does not imply that an **Note:** being an instance of this class does not imply that an
@ -453,7 +448,11 @@ These abstract classes represent single methods like ``__iter__`` or
The base class for classes defining ``__next__``. This derives The base class for classes defining ``__next__``. This derives
from ``Iterable``. The abstract ``__next__`` method raises from ``Iterable``. The abstract ``__next__`` method raises
``StopIteration``. The concrete ``__iter__`` method returns ``StopIteration``. The concrete ``__iter__`` method returns
``self``. ``self``. Note the distinction between ``Iterable`` and
``Iterator``: an ``Iterable`` can be iterated over, i.e. supports
the ``__iter__`` method; an ``Iterator`` is what the built-in
function ``iter()`` returns, i.e. supports the ``__next__``
method.
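The distinction can be observed with a built-in list. This sketch assumes the ABCs as found in ``collections.abc`` in current Python 3.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iter = iter(numbers)

# A list can be iterated over, but it is not itself an iterator.
list_is_iterable = isinstance(numbers, Iterable)
list_is_iterator = isinstance(numbers, Iterator)

# What iter() returns supports __next__, and iterating it returns itself.
iter_is_iterator = isinstance(numbers_iter, Iterator)
iter_returns_self = iter(numbers_iter) is numbers_iter
first = next(numbers_iter)
```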
``Sized`` ``Sized``
The base class for classes defining ``__len__``. The ``__len__`` The base class for classes defining ``__len__``. The ``__len__``
@ -471,57 +470,54 @@ These abstract classes represent single methods like ``__iter__`` or
``Iterable``, then ``(x in o for x in o)`` should be a generator ``Iterable``, then ``(x in o for x in o)`` should be a generator
yielding only True values for any instance ``o`` of ``C``. yielding only True values for any instance ``o`` of ``C``.
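The invariant linking ``__contains__`` and ``__iter__`` can be written as a one-line check. The ``Evens`` class below is a hypothetical container, and the ABCs are taken from ``collections.abc`` as in current Python 3.

```python
from collections.abc import Container, Iterable


class Evens(Container, Iterable):  # hypothetical example class
    def __init__(self, limit):
        self._limit = limit

    def __iter__(self):
        return iter(range(0, self._limit, 2))

    def __contains__(self, x):
        return isinstance(x, int) and 0 <= x < self._limit and x % 2 == 0


o = Evens(10)

# Every element produced by iteration must also pass the membership test.
invariant_holds = all(x in o for x in o)
```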
**Open issues:** strictly speaking, there are three variants of **Open issues:** Conceivably, instead of using the ABCMeta metaclass,
this method's semantics. The first one is for sets and mappings, these classes could override ``__instancecheck__`` and
which is fast: O(1) or O(log N). The second one is for membership ``__subclasscheck__`` to check for the presence of the applicable
checking on sequences, which is slow: O(N). The third one is for special method; for example::
subsequence checking on (character or byte) strings, which is also
slow: O(N). Would it make sense to distinguish these? The class Sized(metaclass=ABCMeta):
signature of the third variant is different, since it takes a @abstractmethod
sequence (typically of the same type as the method's target) def __len__(self):
instead of an element. For now, I'm using the same type for all return 0
three. This means that it is possible for ``x in o`` to be True @classmethod
even though ``x`` is never yielded by ``iter(o)``. A suggested def __instancecheck__(cls, x):
name for the third form is ``Searchable`` (though people have return hasattr(x, "__len__")
objected against this name on the grounds that it has the wrong @classmethod
association). def __subclasscheck__(cls, C):
return hasattr(C, "__bases__") and hasattr(C, "__len__")
This has the advantage of not requiring explicit registration.
However, the semantics are hard to get exactly right given the confusing
semantics of instance attributes vs. class attributes, and that a
class is an instance of its metaclass; the check for ``__bases__`` is
only an approximation of the desired semantics. **Strawman:** Let's
do it, but let's arrange it in such a way that the registration API
also works.
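For comparison, the ``Sized`` ABC in current Python 3 (``collections.abc.Sized``) does recognise classes by the presence of ``__len__`` without registration, although it does so through a ``__subclasshook__`` class method rather than the overrides sketched in this draft. The classes ``Bag``, ``Opaque`` and ``Registered`` below are hypothetical.

```python
from collections.abc import Sized


class Bag:  # hypothetical; defines __len__ but never mentions Sized
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class Opaque:  # hypothetical; no __len__ at all
    pass


class Registered:  # hypothetical; no __len__, but registered explicitly
    pass


# The registration API keeps working alongside the structural check.
Sized.register(Registered)

bag_is_sized = isinstance(Bag([1, 2]), Sized)
opaque_is_sized = isinstance(Opaque(), Sized)
registered_is_sized = issubclass(Registered, Sized)
```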
Sets Sets
'''' ''''
These abstract classes represent various stages of "set-ness". The These abstract classes represent read-only sets and mutable sets. The
most fundamental set operation is the membership test, written as ``x most fundamental set operation is the membership test, written as ``x
in s`` and implemented by ``s.__contains__(x)``. This is already in s`` and implemented by ``s.__contains__(x)``. This operation is
taken care of by the ``Container`` class defined above. Therefore, we already defined by the ``Container`` class defined above. Therefore,
define a set as a sized, iterable container for which certain we define a set as a sized, iterable container for which certain
invariants from mathematical set theory hold. invariants from mathematical set theory hold.
The built-in type ``set`` derives from ``MutableSet``. The built-in The built-in type ``set`` derives from ``MutableSet``. The built-in
type ``frozenset`` derives from ``HashableSet``. type ``frozenset`` derives from ``Set`` and ``Hashable``.
You might wonder why we require a set to be sized -- surely certain
infinite sets can be represented just fine in Python. For example,
the set of even integers could be defined like this::
class EvenIntegers(Container):
def __contains__(self, x):
return x % 2 == 0
However, such sets have rather limited practical value, and deciding
whether one such set is a subset of another would be difficult in
general without using a symbolic algebra package. So I consider this
out of the scope of a pragmatic proposal like this.
``Set`` ``Set``
This is a sized, iterable, partially ordered container, i.e. a
subclass of ``Sized``, ``Iterable``, ``Container`` and This is a sized, iterable container, i.e., a subclass of
``PartiallyOrdered``. Not every subset of those three classes is ``Sized``, ``Iterable`` and ``Container``. Not every subclass of
a set though! Sets have the additional invariant that each those three classes is a set though! Sets have the additional
element occurs only once (as can be determined by iteration), and invariant that each element occurs only once (as can be determined
in addition sets define concrete operators that implement the by iteration), and in addition sets define concrete operators that
inequality operations as subclass/superclass tests. In general, implement the inequality operations as subclass/superclass tests.
the invariants for finite sets in mathematics hold. [11]_ In general, the invariants for finite sets in mathematics
hold. [11]_
Sets with different implementations can be compared safely, Sets with different implementations can be compared safely,
(usually) efficiently and correctly using the mathematical (usually) efficiently and correctly using the mathematical
@ -539,57 +535,33 @@ out of the scope of a pragmatic proposal like this.
can be compared to those for equality (and then they always can be compared to those for equality (and then they always
compare unequal). compare unequal).
This class also defines concrete operators to compute union,
intersection, symmetric and asymmetric difference, respectively
``__or__``, ``__and__``, ``__xor__`` and ``__sub__``. These
operators should return instances of ``Set``. The default
implementations call the overridable class method
``_from_iterable()`` with an iterable argument. This factory
method's default implementation returns a ``frozenset`` instance;
it may be overridden to return another appropriate ``Set``
subclass.
Finally, this class defines a concrete method ``_hash`` which
computes the hash value from the elements. Hashable subclasses of
``Set`` can implement ``__hash__`` by calling ``_hash`` or they
can reimplement the same algorithm more efficiently; but the
algorithm implemented should be the same. Currently the algorithm
is fully specified only by the source code [15]_.
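A sketch of how a concrete class might build on these mix-ins, using ``collections.abc.Set`` from current Python 3. The ``ListBasedSet`` class is hypothetical; it overrides ``_from_iterable()`` explicitly so the example does not depend on the default described above, and it implements ``__hash__`` by delegating to ``_hash()``.

```python
from collections.abc import Hashable, Set


class ListBasedSet(Set, Hashable):  # hypothetical Set implementation
    def __init__(self, iterable=()):
        self._elements = []
        for value in iterable:
            if value not in self._elements:
                self._elements.append(value)

    # The three abstract methods inherited via Set.
    def __iter__(self):
        return iter(self._elements)

    def __contains__(self, value):
        return value in self._elements

    def __len__(self):
        return len(self._elements)

    # Hashing is delegated to the mix-in algorithm.
    def __hash__(self):
        return self._hash()

    # Factory used by the inherited __or__, __and__, __xor__ and __sub__.
    @classmethod
    def _from_iterable(cls, iterable):
        return cls(iterable)


a = ListBasedSet("abc")
b = ListBasedSet("bcd")
union = a | b          # inherited operator, built via _from_iterable()
intersection = a & b   # likewise
```

Only three small methods plus the factory are written by hand; the operators and comparisons come from the ABC, and comparison with a ``frozenset`` holding the same elements succeeds.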
**Note:** the ``issubset`` and ``issuperset`` methods found on the **Note:** the ``issubset`` and ``issuperset`` methods found on the
set type in Python 2 are not supported, as these are mostly just set type in Python 2 are not supported, as these are mostly just
aliases for ``__le__`` and ``__ge__``. aliases for ``__le__`` and ``__ge__``.
**Open issues:** should we define comparison of instances of
different concrete set types this way?
``ComposableSet``
This is a subclass of ``Set`` that defines abstract operators to
compute union, intersection, symmetric and asymmetric difference,
respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.
These operators should return instances of ``ComposableSet``. The
abstract implementations return no meaningful values but raise
``NotImplementedError``; this is because any generic
implementation would have to create new instances but the ABCs
don't (and shouldn't, IMO) provide an API for creating new
instances. The implementations of these operators should ensure
that the results match the mathematical definition of set
composition. [11]_
**Open issues:** Should ``__or__`` and friends be abstract or
concrete methods? Making them abstract means that every
ComposableSet implementation must reimplement all of them. But
making them concrete begs the question of the actual return type:
since the ABC doesn't (and IMO shouldn't) define the constructor
signature for subclasses, the concrete implementations in the ABC
don't have an API to construct a new instance given an iterable.
Perhaps the right choice is to have a static concrete factory
function ``fromiterable`` which takes an iterable and returns
a ``ComposableSet`` instance. Subclasses can override this and
benefit from the default implementations of ``__or__`` etc.; or
they can override ``__or__`` if they want to.
``HashableSet``
This is a subclass of both ``ComposableSet`` and ``Hashable``. It
implements a concrete ``__hash__`` method that subclasses should
not override; or if they do, the subclass should compute the same
hash value. This is so that sets with different implementations
still hash to the same value, so they can be used interchangeably
as dictionary keys. (A similar constraint exists on the hash
values for different types of numbers and strings.)
**Open issues:** Spell out the hash algorithm. Should there be
another ABC that derives from Set and Hashable, but not from
Composable?
``MutableSet`` ``MutableSet``
This is a subclass of ``ComposableSet`` implementing additional
operations to add and remove elements. The supported methods have This is a subclass of ``Set`` implementing additional operations
the semantics known from the ``set`` type in Python 2 (except to add and remove elements. The supported methods have the
for ``discard``, which is modeled after Java): semantics known from the ``set`` type in Python 2 (except for
``discard``, which is modeled after Java):
``.add(x)`` ``.add(x)``
Abstract method returning a ``bool`` that adds the element Abstract method returning a ``bool`` that adds the element
@ -635,21 +607,18 @@ out of the scope of a pragmatic proposal like this.
Mappings Mappings
'''''''' ''''''''
These abstract classes represent various stages of mapping-ness. The These abstract classes represent read-only mappings and mutable
``Mapping`` class represents the most common read-only mapping API. mappings. The ``Mapping`` class represents the most common read-only
However, code *accepting* a mapping is encouraged to check for the mapping API.
``BasicMapping`` ABC when iteration is not used. This allows for
certain "black-box" implementations that can look up values by key but
don't provide a convenient iteration API. A hypothetical example
would be an interface to a hierarchical filesystem, where keys are
pathnames relative to some root directory. Iterating over all
pathnames would presumably take forever, as would counting the number
of valid pathnames.
The built-in type ``dict`` derives from ``MutableMapping``. The built-in type ``dict`` derives from ``MutableMapping``.
``BasicMapping`` ``Mapping``
A subclass of ``Container`` defining the following methods:
A subclass of ``Container``, ``Iterable`` and ``Sized``. The keys
of a mapping naturally form a set. The (key, value) pairs (which
must be tuples) are also referred to as items. The items also
form a set. Methods:
``.__getitem__(key)`` ``.__getitem__(key)``
Abstract method that returns the value corresponding to Abstract method that returns the value corresponding to
@ -664,25 +633,13 @@ The built-in type ``dict`` derives from ``MutableMapping``.
Concrete method returning ``True`` if ``self[key]`` does not Concrete method returning ``True`` if ``self[key]`` does not
raise ``KeyError``, and ``False`` if it does. raise ``KeyError``, and ``False`` if it does.
``Mapping``
A subclass of ``BasicMapping``, ``Iterable`` and ``Sized``. The
keys of a mapping naturally form a set. The (key, value) pairs
are also referred to as items. The items also form a set.
Methods:
``.__len__()`` ``.__len__()``
Abstract method returning the length of the key set. Abstract method returning the number of distinct keys (i.e.,
the length of the key set).
``.__iter__()`` ``.__iter__()``
Abstract method returning each key in the key set exactly once. Abstract method returning each key in the key set exactly once.
``.__eq__(obj)``
Concrete method for comparing mappings. Two mappings, even
with different implementations, can be compared for equality,
and are considered equal if and only if their item sets are
equal. **Open issues:** should we define comparison of
instances of different concrete mapping types this way?
``.keys()`` ``.keys()``
Concrete method returning the key set as a ``Set``. The Concrete method returning the key set as a ``Set``. The
default concrete implementation returns a "view" on the key default concrete implementation returns a "view" on the key
@ -712,11 +669,6 @@ The built-in type ``dict`` derives from ``MutableMapping``.
i.e. iterating over the items, keys and values should return i.e. iterating over the items, keys and values should return
results in the same order. results in the same order.
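A read-only mapping built from the three abstract methods might look like this sketch, based on ``collections.abc.Mapping`` in current Python 3. ``SquareTable`` is a hypothetical class; ``__contains__``, ``keys()``, ``values()``, ``items()`` and ``get()`` are all inherited.

```python
from collections.abc import Mapping


class SquareTable(Mapping):  # hypothetical read-only mapping
    def __init__(self, n):
        self._n = n

    def __getitem__(self, key):
        if isinstance(key, int) and 0 <= key < self._n:
            return key * key
        raise KeyError(key)

    def __len__(self):
        return self._n

    def __iter__(self):
        return iter(range(self._n))


table = SquareTable(4)
keys = list(table.keys())
values = list(table.values())
items = list(table.items())
```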
``HashableMapping``
A subclass of ``Mapping`` and ``Hashable``. The values should be
instances of ``Hashable``. The concrete ``__hash__`` method
should be equal to ``hash(m.items())``.
``MutableMapping`` ``MutableMapping``
A subclass of ``Mapping`` that also implements some standard A subclass of ``Mapping`` that also implements some standard
mutating methods. Abstract methods include ``__setitem__``, mutating methods. Abstract methods include ``__setitem__``,
@ -728,13 +680,15 @@ The built-in type ``dict`` derives from ``MutableMapping``.
Sequences Sequences
''''''''' '''''''''
These abstract classes represent various stages of sequence-ness. These abstract classes represent read-only sequences and mutable
sequences.
The built-in ``list`` and ``bytes`` types derive from The built-in ``list`` and ``bytes`` types derive from
``MutableSequence``. The built-in ``tuple`` and ``str`` types derive ``MutableSequence``. The built-in ``tuple`` and ``str`` types derive
from ``HashableSequence``. from ``Sequence`` and ``Hashable``.
``Sequence`` ``Sequence``
A subclass of ``Iterable``, ``Sized``, ``Container``. It A subclass of ``Iterable``, ``Sized``, ``Container``. It
defines a new abstract method ``__getitem__`` that has a somewhat defines a new abstract method ``__getitem__`` that has a somewhat
complicated signature: when called with an integer, it returns an complicated signature: when called with an integer, it returns an
@ -747,15 +701,11 @@ from ``HashableSequence``.
**Open issues:** Other candidate methods, which can all have **Open issues:** Other candidate methods, which can all have
default concrete implementations that only depend on ``__len__`` default concrete implementations that only depend on ``__len__``
and ``__getitem__`` with an integer argument: __reversed__, index, and ``__getitem__`` with an integer argument: ``__reversed__``,
count, __add__, __mul__, __eq__, __lt__, __le__. ``index``, ``count``, ``__add__``, ``__mul__``.
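To illustrate how such default implementations can rest on ``__len__`` and ``__getitem__`` alone, here is a sketch using ``collections.abc.Sequence`` from current Python 3, which supplies ``__reversed__``, ``index`` and ``count`` as concrete methods. The ``Squares`` class is hypothetical.

```python
from collections.abc import Sequence


class Squares(Sequence):  # hypothetical read-only sequence
    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields another sequence (here a plain list).
            return [self[i] for i in range(*index.indices(self._n))]
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


s = Squares(5)
backwards = list(reversed(s))  # inherited __reversed__
position = s.index(9)          # inherited index()
occurrences = s.count(4)       # inherited count()
```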
``HashableSequence``
A subclass of ``Sequence`` and ``Hashable``. The concrete
``__hash__`` method should implement the hashing algorithms used
by tuples in Python 2.
``MutableSequence`` ``MutableSequence``
A subclass of ``Sequence`` adding some standard mutating methods. A subclass of ``Sequence`` adding some standard mutating methods.
Abstract mutating methods: ``__setitem__`` (for integer indices as Abstract mutating methods: ``__setitem__`` (for integer indices as
well as slices), ``__delitem__`` (ditto), ``insert``, ``append``, well as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
@ -765,24 +715,14 @@ from ``HashableSequence``.
``sort()`` -- that is only required to exist on genuine ``list`` ``sort()`` -- that is only required to exist on genuine ``list``
instances. instances.
**Open issues:** If all the elements of a sequence are totally
ordered, the sequence itself can be totally ordered with respect to
other sequences containing corresponding items of the same type.
Should we reflect this by making ``Sequence`` derive from
``TotallyOrdered``? Or ``PartiallyOrdered``? Also, we could easily
define comparison of sequences of different types, so that e.g.
``(1, 2, 3) == [1, 2, 3]`` and ``(1, 2) < [1, 2, 3]``. Should we?
(It might imply ``["a", "b"] == "ab"`` and ``[1, 2] == b"\1\2"``.)
Strings Strings
------- -------
Python 3000 has two built-in string types: byte strings (``bytes``), Python 3000 will likely have at least two built-in string types: byte
deriving from ``MutableSequence``, and (Unicode) character strings strings (``bytes``), deriving from ``MutableSequence``, and (Unicode)
(``str``), deriving from ``HashableSequence``. They also derive from character strings (``str``), deriving from ``Sequence`` and
``TotallyOrdered``. If we were to introduce ``Searchable``, they ``Hashable``.
would also derive from that.
**Open issues:** define the base interfaces for these so alternative **Open issues:** define the base interfaces for these so alternative
implementations and subclasses know what they are in for. This may be implementations and subclasses know what they are in for. This may be
@ -790,29 +730,6 @@ the subject of a new PEP or PEPs (PEP 358 should be co-opted for the
``bytes`` type). ``bytes`` type).
Numbers
-------
ABCs for numerical types are defined in PEP 3141.
Guidelines for Writing ABCs
---------------------------
Some suggestions for writing ABCs:
* Use the ``@abstractmethod`` decorator.
* Define abstract methods that could be useful as an end point when
called via a super chain.
* Define concrete methods that are very simple permutations of
abstract methods (e.g. ``Mapping.get``).
* Keep abstract classes small, one per use case instead of one per
concept.
ABCs vs. Alternatives ABCs vs. Alternatives
===================== =====================
@ -940,6 +857,12 @@ References
.. [13] ABCMeta sample implementation .. [13] ABCMeta sample implementation
(http://svn.python.org/view/sandbox/trunk/abc/xyz.py) (http://svn.python.org/view/sandbox/trunk/abc/xyz.py)
.. [14] python-dev email ("Comparing heterogeneous types")
http://mail.python.org/pipermail/python-dev/2004-June/045111.html
.. [15] Function ``frozenset_hash()`` in Objects/setobject.c
(http://svn.python.org/view/python/trunk/Objects/setobject.c)
Copyright Copyright
========= =========