Streamlined the containers section.

Guido van Rossum 2007-05-11 23:36:47 +00:00
parent cef12733fb
commit 842b40e558
1 changed file with 128 additions and 205 deletions


@@ -7,7 +7,7 @@ Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2007
Post-History: 26-Apr-2007
Post-History: 26-Apr-2007, 11-May-2007
Abstract
@@ -30,6 +30,9 @@ specific mechanism of ABCs, as contrasted with Interfaces or Generic
Functions (GFs), but about clarifying philosophical issues like "what
makes a set", "what makes a mapping" and "what makes a sequence".
There's also a companion PEP 3141, which defines ABCs for numeric
types.
Acknowledgements
----------------
@@ -329,7 +332,7 @@ invocation). For example::
C() # works
**Notes:** The ``@abstractmethod`` decorator should only be used
**Note:** The ``@abstractmethod`` decorator should only be used
inside a class body, and only for classes whose metaclass is (derived
from) ``ABCMeta``. Dynamically adding abstract methods to a class, or
attempting to modify the abstraction status of a method or class once
@@ -337,6 +340,11 @@ it is created, are not supported. The ``@abstractmethod`` only
affects subclasses derived using regular inheritance; "virtual
subclasses" registered with the ``register()`` method are not affected.
It has been suggested that we should also provide a way to define
abstract data attributes. As it is easy to add these in a later
stage, and as the use case is considerably less common (apart from
pure documentation), we punt on this for now.
**Implementation:** The ``@abstractmethod`` decorator sets the
function attribute ``__isabstractmethod__`` to the value ``True``.
The ``ABCMeta.__new__`` method computes the type attribute
@@ -358,19 +366,14 @@ may have an implementation. This implementation can be called via the
useful as an end-point for a super-call in a framework using
cooperative multiple-inheritance [7]_, [8]_.
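A sketch of that pattern (the classes are invented for illustration);
the abstract ``validate`` body serves as the end-point of the
cooperative chain::

    from abc import ABCMeta, abstractmethod

    class Validator(metaclass=ABCMeta):
        @abstractmethod
        def validate(self, value):
            # End-point: no further super() call is needed here.
            return value

    class NonEmpty(Validator):
        def validate(self, value):
            assert value, "empty value"
            return super().validate(value)

    class Stripped(NonEmpty):
        def validate(self, value):
            return super().validate(value.strip())

    assert Stripped().validate("  spam  ") == "spam"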
**Open issues:** Should we also provide a standard way to declare
abstract data attributes? If so, how should these be spelled?
Perhaps place ``@abstractattribute`` decorators on properties? Or use
an ``@attributes(name1, name2, ...)`` class decorator? Strawman:
let's back off on this; it's easy enough to add this later.
ABCs for Containers and Iterators
---------------------------------
The ``collections`` module will define ABCs necessary and sufficient
to work with sets, mappings, sequences, and some helper types such as
iterators and dictionary views.
iterators and dictionary views. All ABCs have the above-mentioned
``ABCMeta`` as their metaclass.
The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
@@ -381,7 +384,8 @@ type.
Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in
Python 2 has to be implemented anew by each iterator class).
Python 2 has to be implemented anew by each iterator class). These
ABCs can be considered "mix-in" classes.
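For example, an iterator class only needs to supply ``__next__``; the
``__iter__`` method is inherited from the ``Iterator`` ABC as a
mix-in. A sketch (the import path uses the ``collections.abc``
spelling of today's standard library; the class is invented)::

    from collections.abc import Iterator

    class Countdown(Iterator):
        """Yields n, n-1, ..., 1; __iter__ comes from the Iterator mix-in."""
        def __init__(self, n):
            self.n = n
        def __next__(self):
            if self.n <= 0:
                raise StopIteration
            self.n -= 1
            return self.n + 1

    assert list(Countdown(3)) == [3, 2, 1]   # list() uses the mixed-in __iter__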
No ABCs defined in the PEP override ``__init__``, ``__new__``,
``__str__`` or ``__repr__``. Defining a standard constructor
@@ -390,27 +394,19 @@ example Patricia trees or gdbm files. Defining a specific string
representation for a collection is similarly left up to individual
implementations.
Ordering ABCs
'''''''''''''
These ABCs are closer to ``object`` in the ABC hierarchy.
``PartiallyOrdered``
This ABC defines the 4 inequality operations ``<``, ``<=``, ``>=``,
``>``. (Note that ``==`` and ``!=`` are defined by ``object``.)
Classes deriving from this ABC should implement a partial order
as defined in mathematics. [9]_
``TotallyOrdered``
This ABC derives from ``PartiallyOrdered``. It adds no new
operations but implies a promise of stronger invariants.
Classes deriving from this ABC should implement a total order
as defined in mathematics. [10]_
**Open issues:** Where should these live? The ``collections`` module
doesn't seem right, but making them built-ins seems a slippery slope
too.
**Note:** There are no ABCs for ordering operations (``__lt__``,
``__le__``, ``__ge__``, ``__gt__``). Defining these in a base class
(abstract or not) runs into problems with the accepted type for the
second operand. For example, if class ``Ordering`` defined
``__lt__``, one would assume that for any ``Ordering`` instances ``x``
and ``y``, ``x < y`` would be defined (even if it just defines a
partial ordering). But this cannot be the case: If both ``list`` and
``str`` derived from ``Ordering``, this would imply that ``[1, 2] <
(1, 2)`` should be defined (and presumably return False), while in
fact (in Python 3000!) such "mixed-mode comparison" operations are
explicitly forbidden and raise ``TypeError``. See PEP 3100 and [14]_
for more information. (This is a special case of a more general issue
with operations that take another argument of the same type:
One Trick Ponies
@@ -418,23 +414,22 @@ One Trick Ponies
These abstract classes represent single methods like ``__iter__`` or
``__len__``.
``Hashable``
The base class for classes defining ``__hash__``. The
``__hash__`` method should return an ``Integer`` (see "Numbers"
below). The abstract ``__hash__`` method always returns 0, which
is a valid (albeit inefficient) implementation. **Invariant:** If
classes ``C1`` and ``C2`` both derive from ``Hashable``, the
condition ``o1 == o2`` must imply ``hash(o1) == hash(o2)`` for all
instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``.
IOW, two objects shouldn't compare equal but have different hash
values.
``__hash__`` method should return an integer. The abstract
``__hash__`` method always returns 0, which is a valid (albeit
inefficient) implementation. **Invariant:** If classes ``C1`` and
``C2`` both derive from ``Hashable``, the condition ``o1 == o2``
must imply ``hash(o1) == hash(o2)`` for all instances ``o1`` of
``C1`` and all instances ``o2`` of ``C2``. IOW, two objects
should never compare equal but have different hash values.
Another constraint is that hashable objects, once created, should
never change their value (as compared by ``==``) or their hash
value. If a class cannot guarantee this, it should not derive
from ``Hashable``; if it cannot guarantee this for certain
instances only, ``__hash__`` for those instances should raise a
instances, ``__hash__`` for those instances should raise a
``TypeError`` exception.
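A sketch of that last rule, with an invented class whose instances
are hashable only when frozen (import path per today's
``collections.abc``)::

    from collections.abc import Hashable

    class MaybeFrozen(Hashable):
        def __init__(self, value, frozen=False):
            self.value = value
            self.frozen = frozen
        def __eq__(self, other):
            return isinstance(other, MaybeFrozen) and self.value == other.value
        def __hash__(self):
            if not self.frozen:
                # Mutable instances refuse to be hashed.
                raise TypeError("mutable MaybeFrozen is unhashable")
            return hash(self.value)   # equal values hash equal, as required

    hash(MaybeFrozen(42, frozen=True))   # works
    # hash(MaybeFrozen(42)) would raise TypeError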
**Note:** being an instance of this class does not imply that an
@@ -453,7 +448,11 @@ These abstract classes represent single methods like ``__iter__`` or
The base class for classes defining ``__next__``. This derives
from ``Iterable``. The abstract ``__next__`` method raises
``StopIteration``. The concrete ``__iter__`` method returns
``self``.
``self``. Note the distinction between ``Iterable`` and
``Iterator``: an ``Iterable`` can be iterated over, i.e. supports
the ``__iter__`` method; an ``Iterator`` is what the built-in
function ``iter()`` returns, i.e. supports the ``__next__``
method.
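A sketch of the distinction (invented classes; ``collections.abc``
import paths)::

    from collections.abc import Iterable, Iterator

    class Pair(Iterable):
        """An Iterable, not an Iterator: each __iter__ call gives a fresh iterator."""
        def __init__(self, a, b):
            self.a, self.b = a, b
        def __iter__(self):
            return iter((self.a, self.b))

    p = Pair(1, 2)
    it = iter(p)
    assert isinstance(p, Iterable) and not isinstance(p, Iterator)
    assert isinstance(it, Iterator) and next(it) == 1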
``Sized``
The base class for classes defining ``__len__``. The ``__len__``
@@ -471,57 +470,54 @@ These abstract classes represent single methods like ``__iter__`` or
``Iterable``, then ``(x in o for x in o)`` should be a generator
yielding only True values for any instance ``o`` of ``C``.
**Open issues:** strictly speaking, there are three variants of
this method's semantics. The first one is for sets and mappings,
which is fast: O(1) or O(log N). The second one is for membership
checking on sequences, which is slow: O(N). The third one is for
subsequence checking on (character or byte) strings, which is also
slow: O(N). Would it make sense to distinguish these? The
signature of the third variant is different, since it takes a
sequence (typically of the same type as the method's target)
instead of an element. For now, I'm using the same type for all
three. This means that it is possible for ``x in o`` to be True
even though ``x`` is never yielded by ``iter(o)``. A suggested
name for the third form is ``Searchable`` (though people have
objected against this name on the grounds that it has the wrong
association).
**Open issues:** Conceivably, instead of using the ABCMeta metaclass,
these classes could override ``__instancecheck__`` and
``__subclasscheck__`` to check for the presence of the applicable
special method; for example::
    class Sized(metaclass=ABCMeta):
        @abstractmethod
        def __len__(self):
            return 0

        @classmethod
        def __instancecheck__(cls, x):
            return hasattr(x, "__len__")

        @classmethod
        def __subclasscheck__(cls, C):
            return hasattr(C, "__bases__") and hasattr(C, "__len__")
This has the advantage of not requiring explicit registration.
However, the semantics are hard to get exactly right given the confusing
semantics of instance attributes vs. class attributes, and that a
class is an instance of its metaclass; the check for ``__bases__`` is
only an approximation of the desired semantics. **Strawman:** Let's
do it, but let's arrange it in such a way that the registration API
also works.
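The intended effect, shown with today's ``collections.abc.Sized``
(which realizes roughly this idea via ``ABCMeta`` and a
``__subclasshook__``): a class that merely defines ``__len__`` passes
both checks without any registration::

    from collections.abc import Sized

    class Ruler:
        """Not registered anywhere; merely defines __len__."""
        def __len__(self):
            return 30

    assert isinstance(Ruler(), Sized)
    assert issubclass(Ruler, Sized)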
Sets
''''
These abstract classes represent various stages of "set-ness". The
These abstract classes represent read-only sets and mutable sets. The
most fundamental set operation is the membership test, written as ``x
in s`` and implemented by ``s.__contains__(x)``. This is already
taken care of by the ``Container`` class defined above. Therefore, we
define a set as a sized, iterable container for which certain
in s`` and implemented by ``s.__contains__(x)``. This operation is
already defined by the ``Container`` class defined above. Therefore,
we define a set as a sized, iterable container for which certain
invariants from mathematical set theory hold.
The built-in type ``set`` derives from ``MutableSet``. The built-in
type ``frozenset`` derives from ``HashableSet``.
You might wonder why we require a set to be sized -- surely certain
infinite sets can be represented just fine in Python. For example,
the set of even integers could be defined like this::
    class EvenIntegers(Container):
        def __contains__(self, x):
            return x % 2 == 0
However, such sets have rather limited practical value, and deciding
whether one such set is a subset of another would be difficult in
general without using a symbolic algebra package. So I consider this
out of the scope of a pragmatic proposal like this.
type ``frozenset`` derives from ``Set`` and ``Hashable``.
``Set``
This is a sized, iterable, partially ordered container, i.e. a
subclass of ``Sized``, ``Iterable``, ``Container`` and
``PartiallyOrdered``. Not every subset of those three classes is
a set though! Sets have the additional invariant that each
element occurs only once (as can be determined by iteration), and
in addition sets define concrete operators that implement the
inequality operations as subclass/superclass tests. In general,
the invariants for finite sets in mathematics hold. [11]_
This is a sized, iterable container, i.e., a subclass of
``Sized``, ``Iterable`` and ``Container``. Not every subclass of
those three classes is a set though! Sets have the additional
invariant that each element occurs only once (as can be determined
by iteration), and in addition sets define concrete operators that
implement the inequality operations as subclass/superclass tests.
In general, the invariants for finite sets in mathematics
hold. [11]_
Sets with different implementations can be compared safely,
(usually) efficiently and correctly using the mathematical
@@ -539,57 +535,33 @@ out of the scope of a pragmatic proposal like this.
can be compared to those for equality (and then they always
compare unequal).
This class also defines concrete operators to compute union,
intersection, symmetric and asymmetric difference, respectively
``__or__``, ``__and__``, ``__xor__`` and ``__sub__``. These
operators should return instances of ``Set``. The default
implementations call the overridable class method
``_from_iterable()`` with an iterable argument. This factory
method's default implementation returns a ``frozenset`` instance;
it may be overridden to return another appropriate ``Set``
subclass.
Finally, this class defines a concrete method ``_hash`` which
computes the hash value from the elements. Hashable subclasses of
``Set`` can implement ``__hash__`` by calling ``_hash`` or they
can reimplement the same algorithm more efficiently; but the
algorithm implemented should be the same. Currently the algorithm
is fully specified only by the source code [15]_. (A sketch of a
subclass using ``_from_iterable`` and ``_hash`` appears at the end of
this entry.)
**Note:** the ``issubset`` and ``issuperset`` methods found on the
set type in Python 2 are not supported, as these are mostly just
aliases for ``__le__`` and ``__ge__``.
**Open issues:** should we define comparison of instances of
different concrete set types this way?
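A sketch of a small hashable set using the ``_from_iterable`` and
``_hash`` hooks described above (the class name and list-backed
storage are invented; import paths follow today's
``collections.abc``). Overriding ``_from_iterable`` makes the
inherited operators return instances of the subclass, and ``_hash``
supplies ``__hash__``::

    from collections.abc import Set, Hashable

    class ListSet(Set, Hashable):
        """A (slow) immutable set backed by a list of unique elements."""
        def __init__(self, iterable=()):
            self._elems = []
            for x in iterable:
                if x not in self._elems:
                    self._elems.append(x)
        @classmethod
        def _from_iterable(cls, iterable):
            return cls(iterable)        # so __or__ & co. return ListSet instances
        def __contains__(self, x):
            return x in self._elems
        def __iter__(self):
            return iter(self._elems)
        def __len__(self):
            return len(self._elems)
        __hash__ = Set._hash            # reuse the shared hash algorithm

    s = ListSet("abc") | ListSet("cde")                  # inherited __or__
    assert isinstance(s, ListSet) and len(s) == 5
    assert hash(ListSet("ab")) == hash(ListSet("ba"))    # order-independent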
``ComposableSet``
This is a subclass of ``Set`` that defines abstract operators to
compute union, intersection, symmetric and asymmetric difference,
respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.
These operators should return instances of ``ComposableSet``. The
abstract implementations return no meaningful values but raise
``NotImplementedError``; this is because any generic
implementation would have to create new instances but the ABCs
don't (and shouldn't, IMO) provide an API for creating new
instances. The implementations of these operators should ensure
that the results match the mathematical definition of set
composition. [11]_
**Open issues:** Should ``__or__`` and friends be abstract or
concrete methods? Making them abstract means that every
ComposableSet implementation must reimplement all of them. But
making them concrete begs the question of the actual return type:
since the ABC doesn't (and IMO shouldn't) define the constructor
signature for subclasses, the concrete implementations in the ABC
don't have an API to construct a new instance given an iterable.
Perhaps the right choice is to have a static concrete factory
function ``fromiterable`` which takes an iterable and returns
a ``ComposableSet`` instance. Subclasses can override this and
benefit from the default implementations of ``__or__`` etc.; or
they can override ``__or__`` if they want to.
``HashableSet``
This is a subclass of both ``ComposableSet`` and ``Hashable``. It
implements a concrete ``__hash__`` method that subclasses should
not override; or if they do, the subclass should compute the same
hash value. This is so that sets with different implementations
still hash to the same value, so they can be used interchangeably
as dictionary keys. (A similar constraint exists on the hash
values for different types of numbers and strings.)
**Open issues:** Spell out the hash algorithm. Should there be
another ABC that derives from Set and Hashable, but not from
Composable?
``MutableSet``
This is a subclass of ``ComposableSet`` implementing additional
operations to add and remove elements. The supported methods have
the semantics known from the ``set`` type in Python 2 (except
for ``discard``, which is modeled after Java):
This is a subclass of ``Set`` implementing additional operations
to add and remove elements. The supported methods have the
semantics known from the ``set`` type in Python 2 (except for
``discard``, which is modeled after Java):
``.add(x)``
Abstract method returning a ``bool`` that adds the element
@@ -635,21 +607,18 @@ out of the scope of a pragmatic proposal like this.
Mappings
''''''''
These abstract classes represent various stages of mapping-ness. The
``Mapping`` class represents the most common read-only mapping API.
However, code *accepting* a mapping is encouraged to check for the
``BasicMapping`` ABC when iteration is not used. This allows for
certain "black-box" implementations that can look up values by key but
don't provide a convenient iteration API. A hypothetical example
would be an interface to a hierarchical filesystem, where keys are
pathnames relative to some root directory. Iterating over all
pathnames would presumably take forever, as would counting the number
of valid pathnames.
These abstract classes represent read-only mappings and mutable
mappings. The ``Mapping`` class represents the most common read-only
mapping API.
The built-in type ``dict`` derives from ``MutableMapping``.
``BasicMapping``
A subclass of ``Container`` defining the following methods:
``Mapping``
A subclass of ``Container``, ``Iterable`` and ``Sized``. The keys
of a mapping naturally form a set. The (key, value) pairs (which
must be tuples) are also referred to as items. The items also
form a set. Methods:
``.__getitem__(key)``
Abstract method that returns the value corresponding to
@@ -664,25 +633,13 @@ The built-in type ``dict`` derives from ``MutableMapping``.
Concrete method returning ``True`` if ``self[key]`` does not
raise ``KeyError``, and ``False`` if it does.
``Mapping``
A subclass of ``BasicMapping``, ``Iterable`` and ``Sized``. The
keys of a mapping naturally form a set. The (key, value) pairs
are also referred to as items. The items also form a set.
Methods:
``.__len__()``
Abstract method returning the length of the key set.
Abstract method returning the number of distinct keys (i.e.,
the length of the key set).
``.__iter__()``
Abstract method returning each key in the key set exactly once.
``.__eq__(obj)``
Concrete method for comparing mappings. Two mappings, even
with different implementations, can be compared for equality,
and are considered equal if and only if their item sets are
equal. **Open issues:** should we define comparison of
instances of different concrete mapping types this way?
``.keys()``
Concrete method returning the key set as a ``Set``. The
default concrete implementation returns a "view" on the key
@@ -712,11 +669,6 @@ The built-in type ``dict`` derives from ``MutableMapping``.
i.e. iterating over the items, keys and values should return
results in the same order.
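A sketch of a minimal read-only mapping along these lines (invented
class; ``collections.abc`` import path). Only the three abstract
methods are supplied; ``get``, ``__contains__``, ``keys``, ``items``
and ``values`` come from the ABC::

    from collections.abc import Mapping

    class Environ(Mapping):
        """A read-only mapping over a fixed dict of settings."""
        def __init__(self, **settings):
            self._data = dict(settings)
        def __getitem__(self, key):
            return self._data[key]      # raises KeyError for missing keys
        def __len__(self):
            return len(self._data)
        def __iter__(self):
            return iter(self._data)

    env = Environ(debug=True, port=8080)
    assert env["port"] == 8080
    assert "debug" in env                    # concrete __contains__
    assert env.get("missing", 0) == 0        # concrete get()
    assert set(env.keys()) == {"debug", "port"}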
``HashableMapping``
A subclass of ``Mapping`` and ``Hashable``. The values should be
instances of ``Hashable``. The concrete ``__hash__`` method
should be equal to ``hash(m.items())``.
``MutableMapping``
A subclass of ``Mapping`` that also implements some standard
mutating methods. Abstract methods include ``__setitem__``,
@@ -728,13 +680,15 @@ The built-in type ``dict`` derives from ``MutableMapping``.
Sequences
'''''''''
These abstract classes represent various stages of sequence-ness.
These abstract classes represent read-only sequences and mutable
sequences.
The built-in ``list`` and ``bytes`` types derive from
``MutableSequence``. The built-in ``tuple`` and ``str`` types derive
from ``HashableSequence``.
from ``Sequence`` and ``Hashable``.
``Sequence``
A subclass of ``Iterable``, ``Sized``, ``Container``. It
defines a new abstract method ``__getitem__`` that has a somewhat
complicated signature: when called with an integer, it returns an
@@ -747,15 +701,11 @@ from ``HashableSequence``.
**Open issues:** Other candidate methods, which can all have
default concrete implementations that only depend on ``__len__``
and ``__getitem__`` with an integer argument: __reversed__, index,
count, __add__, __mul__, __eq__, __lt__, __le__.
``HashableSequence``
A subclass of ``Sequence`` and ``Hashable``. The concrete
``__hash__`` method should implement the hashing algorithm used
by tuples in Python 2.
and ``__getitem__`` with an integer argument: ``__reversed__``,
``index``, ``count``, ``__add__``, ``__mul__``.
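For reference, today's ``collections.abc.Sequence`` did end up
supplying ``__contains__``, ``__iter__``, ``__reversed__``, ``index``
and ``count`` as concrete methods (but not ``__add__`` or
``__mul__``). A sketch with an invented class::

    from collections.abc import Sequence

    class Squares(Sequence):
        """The first n perfect squares, computed on demand."""
        def __init__(self, n):
            self._n = n
        def __len__(self):
            return self._n
        def __getitem__(self, index):
            if isinstance(index, slice):
                return [self[i] for i in range(*index.indices(self._n))]
            if not -self._n <= index < self._n:
                raise IndexError(index)
            return (index % self._n) ** 2

    sq = Squares(5)                          # 0, 1, 4, 9, 16
    assert list(sq) == [0, 1, 4, 9, 16]      # concrete __iter__
    assert 9 in sq and sq.index(16) == 4     # concrete __contains__ and index()
    assert list(reversed(sq)) == [16, 9, 4, 1, 0]
    assert sq[1:3] == [1, 4]                 # slicing handled by __getitem__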
``MutableSequence``
A subclass of ``Sequence`` adding some standard mutating methods.
Abstract mutating methods: ``__setitem__`` (for integer indices as
well as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
@@ -765,24 +715,14 @@ from ``HashableSequence``.
``sort()`` -- that is only required to exist on genuine ``list``
instances.
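For comparison, in today's ``collections.abc.MutableSequence`` the
methods ``append``, ``extend``, ``pop``, ``remove`` and ``__iadd__``
ended up as concrete methods layered on the abstract ``insert``,
``__setitem__``, ``__delitem__``, ``__getitem__`` and ``__len__``. A
sketch with an invented class::

    from collections.abc import MutableSequence

    class Deck(MutableSequence):
        """A list-backed mutable sequence of cards."""
        def __init__(self, cards=()):
            self._cards = list(cards)
        def __getitem__(self, index):
            return self._cards[index]
        def __setitem__(self, index, value):
            self._cards[index] = value
        def __delitem__(self, index):
            del self._cards[index]
        def __len__(self):
            return len(self._cards)
        def insert(self, index, value):
            self._cards.insert(index, value)

    d = Deck(["7H", "8H"])
    d.append("9H")                       # concrete append(), built on insert()
    d += ["TH"]                          # concrete __iadd__(), built on extend()
    assert list(d) == ["7H", "8H", "9H", "TH"]
    assert d.pop() == "TH"               # concrete pop()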
**Open issues:** If all the elements of a sequence are totally
ordered, the sequence itself can be totally ordered with respect to
other sequences containing corresponding items of the same type.
Should we reflect this by making ``Sequence`` derive from
``TotallyOrdered``? Or ``PartiallyOrdered``? Also, we could easily
define comparison of sequences of different types, so that e.g.
``(1, 2, 3) == [1, 2, 3]`` and ``(1, 2) < [1, 2, 3]``. Should we?
(It might imply ``["a", "b"] == "ab"`` and ``[1, 2] == b"\1\2"``.)
Strings
-------
Python 3000 has two built-in string types: byte strings (``bytes``),
deriving from ``MutableSequence``, and (Unicode) character strings
(``str``), deriving from ``HashableSequence``. They also derive from
``TotallyOrdered``. If we were to introduce ``Searchable``, they
would also derive from that.
Python 3000 will likely have at least two built-in string types: byte
strings (``bytes``), deriving from ``MutableSequence``, and (Unicode)
character strings (``str``), deriving from ``Sequence`` and
``Hashable``.
**Open issues:** define the base interfaces for these so alternative
implementations and subclasses know what they are in for. This may be
@@ -790,29 +730,6 @@ the subject of a new PEP or PEPs (PEP 358 should be co-opted for the
``bytes`` type).
Numbers
-------
ABCs for numerical types are defined in PEP 3141.
Guidelines for Writing ABCs
---------------------------
Some suggestions for writing ABCs:
* Use the ``@abstractmethod`` decorator.
* Define abstract methods that could be useful as an end point when
called via a super chain.
* Define concrete methods that are very simple permutations of
abstract methods (e.g. ``Mapping.get``); see the sketch after this
list.
* Keep abstract classes small, one per use case instead of one per
concept.
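To illustrate the third guideline, a sketch of a concrete ``get``
expressed purely in terms of the abstract ``__getitem__`` (a
stripped-down ``BasicMapping``, for illustration only)::

    from abc import ABCMeta, abstractmethod

    class BasicMapping(metaclass=ABCMeta):
        @abstractmethod
        def __getitem__(self, key):
            raise KeyError(key)
        def get(self, key, default=None):
            # A trivial permutation of the abstract __getitem__.
            try:
                return self[key]
            except KeyError:
                return default

    class One(BasicMapping):
        def __getitem__(self, key):
            if key == "one":
                return 1
            raise KeyError(key)

    assert One().get("one") == 1
    assert One().get("two", 0) == 0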
ABCs vs. Alternatives
=====================
@@ -940,6 +857,12 @@ References
.. [13] ABCMeta sample implementation
(http://svn.python.org/view/sandbox/trunk/abc/xyz.py)
.. [14] python-dev email ("Comparing heterogeneous types")
http://mail.python.org/pipermail/python-dev/2004-June/045111.html
.. [15] Function ``frozenset_hash()`` in Objects/setobject.c
(http://svn.python.org/view/python/trunk/Objects/setobject.c)
Copyright
=========