Add lots of text. Still far from complete.

This commit is contained in:
Guido van Rossum 2007-04-19 00:21:49 +00:00
parent 2e759ec054
commit fc4f448947
1 changed files with 261 additions and 10 deletions

View File

@ -7,7 +7,7 @@ Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2007
Post-History: 18-Apr-2007
Post-History: Not yet posted
Abstract
@ -145,16 +145,266 @@ These are:
The exception raised when attempting to instantiate an abstract
class. It derives from ``TypeError``.
A possible implementation would add an attribute
``__abstractmethod__`` to any method declared with
``@abstractmethod``, and add the names of all such abstract methods to
a class attribute named ``__abstractmethods__``. Then the
``Abstract.__new__()`` method would raise an exception if any abstract
methods exist on the class being instantiated. For details see [2]_.
ABCs for Containers and Iterators
---------------------------------
XXX define: Iterable, Iterator, Sizeable (or Lengthy?), various Sets,
Mappings, Sequences. Also Hashable so we can define Immutable (which
adds to "read-only", does not subtract from "mutable", nor does
"mutable" add to "hashable" -- it adds to "read-only"). Also defines
how set, frozenset, list, tuple, dict, bytes and str derive from
these.
The collections module will define ABCs necessary and sufficient to
work with sets, mappings, sequences, and some helper types such as
iterators and dictionary views.
The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
``__iter__`` returns an empty iterator. In general, the abstract
methods represent the behavior of an empty container of the indicated
type.
Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in
Python 2 has to be implemented anew by each iterator class).
No ABCs override ``__init__``, ``__new__``, ``__str__`` or
``__repr__``.
XXX define how set, frozenset, list, tuple, dict, bytes and str derive
from these.
One Trick Ponies
''''''''''''''''
These abstract classes represent single methods like ``__iter__`` or
``__len__``.
``Hashable``
The base class for classes defining ``__hash__``. Its abstract
``__hash__`` method always returns 0, which is a valid (albeit
inefficient) implementation. Note: being an instance of this
class does not imply that an object is immutable; e.g. a tuple
containing a list as a member is not immutable; its ``__hash__``
method raises an exception.
``Iterable``
The base class for classes defining ``__iter__``. Its abstract
``__iter__`` method returns an empty iterator.
``Iterator``
The base class for classes defining ``__next__``. This derives
from ``Iterable``. Its abstract ``__next__`` method raises
StopIteration. Its ``__iter__`` method returns ``self``, and is
*not* abstract.
``Lengthy``
The base class for classes defining ``__len__``. Its abstract
``__len__`` method returns 0. (The name is perhaps too cute; but
the only alternatives I've come up with so far are ``Sizeable``,
which suffers from the same affliction, and ``Finite``, which
somehow is associated with numbers instead of sets in my mind.)
``Container``
The base class for classes defining ``__contains__`. Its abstract
``__contains__`` method returns ``False``. Note: strictly
speaking, there are three variants of this method's semantics.
The first one is for sets and mappings, which is fast: O(1) or
O(log N). The second one is for membership checking on sequences,
which is slow: O(N). The third one is for subsequence checking on
(character or byte) strings, which is also slow: O(N). Would it
make sense to distinguish these? The signature of the third
variant is different, since it takes a sequence (typically of the
same type as the method's target) intead of an element. For now,
I'm using the same type for all three.
Sets
''''
These abstract classes represent various stages of "set-ness".
``Set``
This is a finite, iterable container, i.e. a subclass of
``Lengthy``, ``Iterable`` and ``Container``. Not every subset of
those three classes is a set though! Sets have the additional
property (though it is not expressed in code) that each element
occurs only once (as can be determined by iteration), and in
addition sets implement rich comparisons defined as
subclass/superclass tests.
Sets with different implementations can be compared safely,
efficiently and correctly. Because ``Set`` derives from
``Lengthy``, ``__eq__`` takes a shortcut and returns ``False``
immediately if two sets of unequal length are compared.
Similarly, ``__le__`` returns ``False`` immediately if the first
set has more members than the second set. Note that set inclusion
implements only a partial ordering; e.g. {1, 2} and {1, 3} are not
ordered (all three of ``<``, ``==`` and ``>`` return ``False`` for
these arguments). Sets cannot be ordered relative to mappings or
sequences, but they can be compared for equality (and then they
always compare unequal).
XXX Should we also implement the ``issubset`` and ``issuperset``
methods found on the set type in Python 2 (which are apparently
just aliases for ``__le__`` and ``__ge__``)?
XXX Should this class also implement union, intersection,
symmetric and asymmetric difference and/or the corresponding
operators? The problem with those (unlike the comparison
operators) is what should be the type of the return value. I'm
tentatively leaving these out -- if you need them, you can test
for a ``Set`` instance that implements e.g. ``__and__``. Some
alternatives: make these abstract methods (even though the
semantics apart from the type are well-defined); or make them
concrete methods that return a specific concrete set type; or make
them concrete methods that assume the class constructor always
accepts an iterable of elements; or add a new class method that
accepts an iterable of elements and that creates a new instance.
(I originally considered a "view" alternative, but the problem is
that computing ``len(a&b)`` requires iterating over ``a`` or
``b``, and that pretty much kills the idea.)
``HashableSet``
This is a subclass of both ``Set`` and ``Hashable``. It
implements a concrete ``__hash__`` method that subclasses should
not override; or if they do, the subclass should compute the same
hash value. This is so that sets with different implementations
still hash to the same value, so they can be used interchangeably
as dictionary keys. (A similar constraint exists on the hash
values for different types of numbers and strings.)
``MutableSet``
This is a subclass of ``Set`` implementing additional operations
to add and remove elements. The supported methods have the
semantics known from the ``set`` type in Python 2:
``.add(x)``
Abstract method that adds the element ``x``, if it isn't
already in the set.
``.remove(x)``
Abstract method that removes the element ``x``; raises
``KeyError`` if ``x`` is not in the set.
``.discard(x)``
Concrete method that removes the element ``x`` if it is
a member of the set; implemented using ``__contains__``
and ``remove``.
``.clear()``
Abstract method that empties the set. (Making this concrete
would just add a slow, cumbersome default implementation.)
XXX Should we support all the operations implemented by the Python
2 ``set`` type? I.e. union, update, __or__, __ror__, __ior__,
intersection, intersection_update, __and__, __rand__, __iand__,
difference, difference_update, __xor__, __rxor__, __ixor__,
symmetric_difference, symmetric_difference_update, __sub__,
__rsub__, __isub__. Note that in Python 2, a.update(b) is not
exactly the same as a |= b, since update() takes any iterable for
an argument, while |= requires another set; similar for the other
operators.
Mappings
''''''''
These abstract classes represent various stages of mapping-ness.
XXX Do we need BasicMapping and IterableMapping? Perhaps we should
just start with Mapping.
``BasicMapping``
A subclass of ``Container`` defining the following methods:
``.__getitem__(key)``
Abstract method that returns the value corresponding to
``key``, or raises ``KeyError``. The implementation always
raises ``KeyError``.
``.get(key, default=None)``
Concrete method returning ``self[key]`` if this does not raise
``KeyError``, and the ``default`` value if it does.
``.__contains__()``
Concrete method returning ``True`` if ``self[key]`` does not
raise ``KeyError``, and ``False`` if it does.
``IterableMapping``
A subclass of ``BasicMapping`` and ``Iterable``. It defines no
new methods. Iterating over such an object should return all the
valid keys (i.e. those keys for which ``.__getitem__()`` returns a
value), once each, and nothing else. It is possible that the
iteration never ends.
``Mapping``
A subclass of ``IterableMapping`` and ``Lengthy``. It defines
concrete methods ``__eq__``, ``keys``, ``items``, ``values``. The
lengh of such an object should equal to the number of elements
returned by iterating over the object until the end of the
iterator is reached. Two mappings, even with different
implementations, can be compared for equality, and are considered
equal if and only iff their items compare equal when converted to
sets. The ``keys``, ``items`` and ``values`` methods return
views; ``keys`` and ``items`` return ``Set`` views, ``values``
returns a ``Container`` view. The following invariant should
hold: m.items() == set(zip(m.keys(), m.values())).
``HashableMapping``
A subclass of ``Mapping`` and ``Hashable``. The values should be
instances of ``Hashable``. The concrete ``__hash__`` method
should be equal to ``hash(m.items())``.
``MutableMapping``
A subclass of ``Mapping`` that also implements some standard
mutating methods. At least ``__setitem__``, ``__delitem__``,
``clear``, ``update``. XXX Also pop, popitem, setdefault?
XXX Should probably say something about mapping view types, too.
Sequences
'''''''''
These abstract classes represent various stages of sequence-ness.
``Sequence``
A subclass of ``Iterable``, ``Lengthy``, ``Container``. It
defines a new abstract method ``__getitem__`` that has a
complicated signature: when called with an integer, it returns an
element of the sequence or raises ``IndexError``; when called with
a ``slice`` object, it returns another ``Sequence``. The concrete
``__iter__`` method iterates over the elements using
``__getitem__`` with integer arguments 0, 1, and so on, until
``IndexError`` is raised. The length should be equal to the
number of values returned by the iterator.
XXX Other candidate methods, which can all have default concrete
implementations that only depend on ``__len__`` and
``__getitem__`` with an integer argument: __reversed__, index,
count, __add__, __mul__, __eq__, __lt__, __le__.
``HashableSequence``
A subclass of ``Sequence`` and ``Hashable``. The concrete
``__hash__`` method should implements the hashing algorithms used
by tuples in Python 2.
``MutableSequence``
A subclass of ``Sequence`` adding some standard mutating methods.
Abstract mutating methods: ``__setitem__`` (for integer indices as
well as slices), ``__delitem__`` (ditto), ``insert``, ``add``,
``reverse``. Concrete mutating methods: ``extend``, ``pop``,
``remove``. Note: this does not define ``sort()`` -- that is only
required to exist on genuine ``list`` instances.
XXX What about += and *=?
ABCs for Numbers
@ -168,10 +418,11 @@ to Integer using __index__).
Guidelines for Writing ABCs
---------------------------
XXX E.g. use @abstractmethod and Abstract base class; define abstract
XXX Use @abstractmethod and Abstract base class; define abstract
methods that could be useful as an end point when called via a super
chain; keep abstract classes small, one per use case instead of one
per concept.
chain; define concrete methods that are very simple permutations of
abstract methods (e.g. Mapping.get); keep abstract classes small, one
per use case instead of one per concept.
References