From 842b40e558edb61e8e9f80cb1dcf8eb9d5afd756 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Fri, 11 May 2007 23:36:47 +0000 Subject: [PATCH] Streamlined the containers section. --- pep-3119.txt | 333 ++++++++++++++++++++------------------------------- 1 file changed, 128 insertions(+), 205 deletions(-) diff --git a/pep-3119.txt b/pep-3119.txt index af32ca0b7..b2918e86d 100644 --- a/pep-3119.txt +++ b/pep-3119.txt @@ -7,7 +7,7 @@ Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 18-Apr-2007 -Post-History: 26-Apr-2007 +Post-History: 26-Apr-2007, 11-May-2007 Abstract @@ -30,6 +30,9 @@ specific mechanism of ABCs, as contrasted with Interfaces or Generic Functions (GFs), but about clarifying philosophical issues like "what makes a set", "what makes a mapping" and "what makes a sequence". +There's also a companion PEP 3141, which defines ABCs for numeric +types. + Acknowledgements ---------------- @@ -329,7 +332,7 @@ invocation). For example:: C() # works -**Notes:** The ``@abstractmethod`` decorator should only be used +**Note:** The ``@abstractmethod`` decorator should only be used inside a class body, and only for classes whose metaclass is (derived from) ``ABCMeta``. Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once @@ -337,6 +340,11 @@ it is created, are not supported. The ``@abstractmethod`` only affects subclasses derived using regular inheritance; "virtual subclasses" registered with the ``register()`` method are not affected. +It has been suggested that we should also provide a way to define +abstract data attributes. As it is easy to add these in a later +stage, and as the use case is considerably less common (apart from +pure documentation), we punt on this for now. + **Implementation:** The ``@abstractmethod`` decorator sets the function attribute ``__isabstractmethod__`` to the value ``True``. The ``ABCMeta.__new__`` method computes the type attribute @@ -358,19 +366,14 @@ may have an implementation. This implementation can be called via the useful as an end-point for a super-call in framework using a cooperative multiple-inheritance [7]_, [8]_. -**Open issues:** Should we also provide a standard way to declare -abstract data attributes? If so, how should these be spelled? -Perhaps place ``@abstractattribute`` decorators on properties? Or use -an ``@attributes(name1, name2, ...)`` class decorator? Strawman: -let's back off on this; it's easy enough to add this later. - ABCs for Containers and Iterators --------------------------------- The ``collections`` module will define ABCs necessary and sufficient to work with sets, mappings, sequences, and some helper types such as -iterators and dictionary views. +iterators and dictionary views. All ABCs have the above-mentioned +``ABCMeta`` as their metaclass. The ABCs provide implementations of their abstract methods that are technically valid but fairly useless; e.g. ``__hash__`` returns 0, and @@ -381,7 +384,8 @@ type. Some ABCs also provide concrete (i.e. non-abstract) methods; for example, the ``Iterator`` class has an ``__iter__`` method returning itself, fulfilling an important invariant of iterators (which in -Python 2 has to be implemented anew by each iterator class). +Python 2 has to be implemented anew by each iterator class). These +ABCs can be considered "mix-in" classes. No ABCs defined in the PEP override ``__init__``, ``__new__``, ``__str__`` or ``__repr__``. Defining a standard constructor @@ -390,27 +394,19 @@ example Patricia trees or gdbm files. Defining a specific string representation for a collection is similarly left up to individual implementations. - -Ordering ABCs -''''''''''''' - -These ABCs are closer to ``object`` in the ABC hierarchy. - -``PartiallyOrdered`` - This ABC defines the 4 inequality operations ``<``, ``<=``, ``>=``, - ``>``. (Note that ``==`` and ``!=`` are defined by ``object``.) - Classes deriving from this ABC should implement a partial order - as defined in mathematics. [9]_ - -``TotallyOrdered`` - This ABC derives from ``PartiallyOrdered``. It adds no new - operations but implies a promise of stronger invariants. - Classes deriving from this ABC should implement a total order - as defined in mathematics. [10]_ - -**Open issues:** Where should these live? The ``collections`` module -doesn't seem right, but making them built-ins seems a slippery slope -too. +**Note:** There are no ABCs for ordering operations (``__lt__``, +``__le__``, ``__ge__``, ``__gt__``). Defining these in a base class +(abstract or not) runs into problems with the accepted type for the +second operand. For example, if class ``Ordering`` defined +``__lt__``, one would assume that for any ``Ordering`` instances ``x`` +and ``y``, ``x < y`` would be defined (even if it just defines a +partial ordering). But this cannot be the case: If both ``list`` and +``str`` derived from ``Ordering``, this would imply that ``[1, 2] < +(1, 2)`` should be defined (and presumably return False), while in +fact (in Python 3000!) such "mixed-mode comparisons" operations are +explicitly forbidden and raise ``TypeError``. See PEP 3100 and [14]_ +for more information. (This is a special case of a more general issue +with operations that take another argument of the same type: One Trick Ponies @@ -418,23 +414,22 @@ One Trick Ponies These abstract classes represent single methods like ``__iter__`` or ``__len__``. - + ``Hashable`` The base class for classes defining ``__hash__``. The - ``__hash__`` method should return an ``Integer`` (see "Numbers" - below). The abstract ``__hash__`` method always returns 0, which - is a valid (albeit inefficient) implementation. **Invariant:** If - classes ``C1`` and ``C2`` both derive from ``Hashable``, the - condition ``o1 == o2`` must imply ``hash(o1) == hash(o2)`` for all - instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``. - IOW, two objects shouldn't compare equal but have different hash - values. + ``__hash__`` method should return an integer. The abstract + ``__hash__`` method always returns 0, which is a valid (albeit + inefficient) implementation. **Invariant:** If classes ``C1`` and + ``C2`` both derive from ``Hashable``, the condition ``o1 == o2`` + must imply ``hash(o1) == hash(o2)`` for all instances ``o1`` of + ``C1`` and all instances ``o2`` of ``C2``. IOW, two objects + should never compare equal but have different hash values. Another constraint is that hashable objects, once created, should never change their value (as compared by ``==``) or their hash value. If a class cannot guarantee this, it should not derive from ``Hashable``; if it cannot guarantee this for certain - instances only, ``__hash__`` for those instances should raise a + instances, ``__hash__`` for those instances should raise a ``TypeError`` exception. **Note:** being an instance of this class does not imply that an @@ -453,7 +448,11 @@ These abstract classes represent single methods like ``__iter__`` or The base class for classes defining ``__next__``. This derives from ``Iterable``. The abstract ``__next__`` method raises ``StopIteration``. The concrete ``__iter__`` method returns - ``self``. + ``self``. Note the distinction between ``Iterable`` and + ``Iterator``: an ``Iterable`` can be iterated over, i.e. supports + the ``__iter__`` methods; an ``Iterator`` is what the built-in + function ``iter()`` returns, i.e. supports the ``__next__`` + method. ``Sized`` The base class for classes defining ``__len__``. The ``__len__`` @@ -471,57 +470,54 @@ These abstract classes represent single methods like ``__iter__`` or ``Iterable``, then ``(x in o for x in o)`` should be a generator yielding only True values for any instance ``o`` of ``C``. - **Open issues:** strictly speaking, there are three variants of - this method's semantics. The first one is for sets and mappings, - which is fast: O(1) or O(log N). The second one is for membership - checking on sequences, which is slow: O(N). The third one is for - subsequence checking on (character or byte) strings, which is also - slow: O(N). Would it make sense to distinguish these? The - signature of the third variant is different, since it takes a - sequence (typically of the same type as the method's target) - intead of an element. For now, I'm using the same type for all - three. This means that is is possible for ``x in o`` to be True - even though ``x`` is never yielded by ``iter(o)``. A suggested - name for the third form is ``Searchable`` (though people have - objected against this name on the grounds that it has the wrong - association). +**Open issues:** Conceivably, instead of using the ABCMeta metaclass, +these classes could override ``__instancecheck__`` and +``__subclasscheck__`` to check for the presence of the applicable +special method; for example:: + + class Sized(metaclass=ABCMeta): + @abstractmethod + def __hash__(self): + return 0 + @classmethod + def __instancecheck__(cls, x): + return hasattr(x, "__len__") + @classmethod + def __subclasscheck__(cls, C): + return hasattr(C, "__bases__") and hasattr(C, "__len__") + +This has the advantage of not requiring explicit registration. +However, the semantics hard to get exactly right given the confusing +semantics of instance attributes vs. class attributes, and that a +class is an instance of its metaclass; the check for ``__bases__`` is +only an approximation of the desired semantics. **Strawman:** Let's +do it, but let's arrange it in such a way that the registration API +also works. Sets '''' -These abstract classes represent various stages of "set-ness". The +These abstract classes represent read-only sets and mutable sets. The most fundamental set operation is the membership test, written as ``x -in s`` and implemented by ``s.__contains__(x)``. This is already -taken care of by the `Container`` class defined above. Therefore, we -define a set as a sized, iterable container for which certain +in s`` and implemented by ``s.__contains__(x)``. This operation is +already defined by the `Container`` class defined above. Therefore, +we define a set as a sized, iterable container for which certain invariants from mathematical set theory hold. The built-in type ``set`` derives from ``MutableSet``. The built-in -type ``frozenset`` derives from ``HashableSet``. - -You might wonder why we require a set to be sized -- surely certain -infinite sets can be represented just fine in Python. For example, -the set of even integers could be defined like this:: - - class EvenIntegers(Container): - def __contains__(self, x): - return x % 2 == 0 - -However, such sets have rather limited practical value, and deciding -whether one such set is a subset of another would be difficult in -general without using a symbolic algebra package. So I consider this -out of the scope of a pragmatic proposal like this. +type ``frozenset`` derives from ``Set`` and ``Hashable``. ``Set`` - This is a sized, iterable, partially ordered container, i.e. a - subclass of ``Sized``, ``Iterable``, ``Container`` and - ``PartiallyOrdered``. Not every subset of those three classes is - a set though! Sets have the additional invariant that each - element occurs only once (as can be determined by iteration), and - in addition sets define concrete operators that implement the - inequality operations as subclass/superclass tests. In general, - the invariants for finite sets in mathematics hold. [11]_ + + This is a sized, iterable container, i.e., a subclass of + ``Sized``, ``Iterable`` and ``Container``. Not every subclass of + those three classes is a set though! Sets have the additional + invariant that each element occurs only once (as can be determined + by iteration), and in addition sets define concrete operators that + implement the inequality operations as subclass/superclass tests. + In general, the invariants for finite sets in mathematics + hold. [11]_ Sets with different implementations can be compared safely, (usually) efficiently and correctly using the mathematical @@ -539,57 +535,33 @@ out of the scope of a pragmatic proposal like this. can be compared to those for equality (and then they always compare unequal). + This class also defines concrete operators to compute union, + intersection, symmetric and asymmetric difference, respectively + ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``. These + operators should return instances of ``Set``. The default + implementations call the overridable class method + ``_from_iterable()`` with an iterable argument. This factory + method's default implementation returns a ``frozenset`` instance; + it may be overridden to return another appropriate ``Set`` + subclass. + + Finally, this class defines a concrete method ``_hash`` which + computes the hash value from the elements. Hashable subclasses of + ``Set`` can implement ``__hash__`` by calling ``_hash`` or they + can reimplement the same algorithm more efficiently; but the + algorithm implemented should be the same. Currently the algorithm + is fully specified only by the source code [15]_. + **Note:** the ``issubset`` and ``issuperset`` methods found on the set type in Python 2 are not supported, as these are mostly just aliases for ``__le__`` and ``__ge__``. - **Open issues:** should we define comparison of instances of - different concrete set types this way? - -``ComposableSet`` - This is a subclass of ``Set`` that defines abstract operators to - compute union, intersection, symmetric and asymmetric difference, - respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``. - These operators should return instances of ``ComposableSet``. The - abstract implementations return no meaningful values but raise - ``NotImplementedError``; this is because any generic - implementation would have to create new instances but the ABCs - don't (and shouldn't, IMO) provide an API for creating new - instances. The implementations of these operators should ensure - that the results match the mathematical definition of set - composition. [11]_ - - **Open issues:** Should ``__or__`` and friends be abstract or - concrete methods? Making them abstract means that every - ComposableSet implementation must reimplement all of them. But - making them concrete begs the question of the actual return type: - since the ABC doesn't (and IMO shouldn't) define the constructor - signature for subclasses, the concrete implementations in the ABC - don't have an API to construct a new instance given an iterable. - Perhaps the right choice is to have a static concrete factory - function ``fromiterable`` which takes an iterable and returns - a ``ComposableSet`` instance. Subclasses can override this and - benefit from the default implementations of ``__or__`` etc.; or - they can override ``__or__`` if they want to. - -``HashableSet`` - This is a subclass of both ``ComposableSet`` and ``Hashable``. It - implements a concrete ``__hash__`` method that subclasses should - not override; or if they do, the subclass should compute the same - hash value. This is so that sets with different implementations - still hash to the same value, so they can be used interchangeably - as dictionary keys. (A similar constraint exists on the hash - values for different types of numbers and strings.) - - **Open issues:** Spell out the hash algorithm. Should there be - another ABC that derives from Set and Hashable, but not from - Composable? - ``MutableSet`` - This is a subclass of ``ComposableSet`` implementing additional - operations to add and remove elements. The supported methods have - the semantics known from the ``set`` type in Python 2 (except - for ``discard``, which is modeled after Java): + + This is a subclass of ``Set`` implementing additional operations + to add and remove elements. The supported methods have the + semantics known from the ``set`` type in Python 2 (except for + ``discard``, which is modeled after Java): ``.add(x)`` Abstract method returning a ``bool`` that adds the element @@ -635,21 +607,18 @@ out of the scope of a pragmatic proposal like this. Mappings '''''''' -These abstract classes represent various stages of mapping-ness. The -``Mapping`` class represents the most common read-only mapping API. -However, code *accepting* a mapping is encouraged to check for the -``BasicMapping`` ABC when iteration is not used. This allows for -certain "black-box" implementations that can look up values by key but -don't provide a convenient iteration API. A hypothetical example -would be an interface to a hierarchical filesystem, where keys are -pathnames relative to some root directory. Iterating over all -pathnames would presumably take forever, as would counting the number -of valid pathnames. +These abstract classes represent read-only mappings and mutable +mappings. The ``Mapping`` class represents the most common read-only +mapping API. The built-in type ``dict`` derives from ``MutableMapping``. -``BasicMapping`` - A subclass of ``Container`` defining the following methods: +``Mapping`` + + A subclass of ``Container``, ``Iterable`` and ``Sized``. The keys + of a mapping naturally form a set. The (key, value) pairs (which + must be tuples) are also referred to as items. The items also + form a set. Methods: ``.__getitem__(key)`` Abstract method that returns the value corresponding to @@ -664,25 +633,13 @@ The built-in type ``dict`` derives from ``MutableMapping``. Concrete method returning ``True`` if ``self[key]`` does not raise ``KeyError``, and ``False`` if it does. -``Mapping`` - A subclass of ``BasicMapping``, ``Iterable`` and ``Sized``. The - keys of a mapping naturally form a set. The (key, value) pairs - are also referred to as items. The items also form a set. - Methods: - ``.__len__()`` - Abstract method returning the length of the key set. + Abstract method returning the number of distinct keys (i.e., + the length of the key set). ``.__iter__()`` Abstract method returning each key in the key set exactly once. - ``.__eq__(obj)`` - Concrete method for comparing mappings. Two mappings, even - with different implementations, can be compared for equality, - and are considered equal if and only if their item sets are - equal. **Open issues:** should we define comparison of - instances of different concrete mapping types this way? - ``.keys()`` Concrete method returning the key set as a ``Set``. The default concrete implementation returns a "view" on the key @@ -712,11 +669,6 @@ The built-in type ``dict`` derives from ``MutableMapping``. i.e. iterating over the items, keys and values should return results in the same order. -``HashableMapping`` - A subclass of ``Mapping`` and ``Hashable``. The values should be - instances of ``Hashable``. The concrete ``__hash__`` method - should be equal to ``hash(m.items())``. - ``MutableMapping`` A subclass of ``Mapping`` that also implements some standard mutating methods. Abstract methods include ``__setitem__``, @@ -728,13 +680,15 @@ The built-in type ``dict`` derives from ``MutableMapping``. Sequences ''''''''' -These abstract classes represent various stages of sequence-ness. +These abstract classes represent read-only sequences and mutable +sequences. The built-in ``list`` and ``bytes`` types derive from ``MutableSequence``. The built-in ``tuple`` and ``str`` types derive -from ``HashableSequence``. +from ``Sequence`` and ``Hashable``. ``Sequence`` + A subclass of ``Iterable``, ``Sized``, ``Container``. It defines a new abstract method ``__getitem__`` that has a somewhat complicated signature: when called with an integer, it returns an @@ -747,15 +701,11 @@ from ``HashableSequence``. **Open issues:** Other candidate methods, which can all have default concrete implementations that only depend on ``__len__`` - and ``__getitem__`` with an integer argument: __reversed__, index, - count, __add__, __mul__, __eq__, __lt__, __le__. - -``HashableSequence`` - A subclass of ``Sequence`` and ``Hashable``. The concrete - ``__hash__`` method should implements the hashing algorithms used - by tuples in Python 2. + and ``__getitem__`` with an integer argument: ``__reversed__``, + ``index``, ``count``, ``__add__``, ``__mul__``. ``MutableSequence`` + A subclass of ``Sequence`` adding some standard mutating methods. Abstract mutating methods: ``__setitem__`` (for integer indices as well as slices), ``__delitem__`` (ditto), ``insert``, ``append``, @@ -765,24 +715,14 @@ from ``HashableSequence``. ``sort()`` -- that is only required to exist on genuine ``list`` instances. -**Open issues:** If all the elements of a sequence are totally -ordered, the sequence itself can be totally ordered with respect to -other sequences containing corresponding items of the same type. -Should we reflect this by making ``Sequence`` derive from -``TotallyOrdered``? Or ``Partiallyordered``? Also, we could easily -define comparison of sequences of different types, so that e.g. -``(1, 2, 3) == [1, 2, 3]`` and ``(1, 2) < [1, 2, 3]``. Should we? -(It might imply ``["a", "b"] == "ab"`` and ``[1, 2] == b"\1\2"``.) - Strings ------- -Python 3000 has two built-in string types: byte strings (``bytes``), -deriving from ``MutableSequence``, and (Unicode) character strings -(``str``), deriving from ``HashableSequence``. They also derive from -``TotallyOrdered``. If we were to introduce ``Searchable``, they -would also derive from that. +Python 3000 will likely have at least two built-in string types: byte +strings (``bytes``), deriving from ``MutableSequence``, and (Unicode) +character strings (``str``), deriving from ``Sequence`` and +``Hashable``. **Open issues:** define the base interfaces for these so alternative implementations and subclasses know what they are in for. This may be @@ -790,29 +730,6 @@ the subject of a new PEP or PEPs (PEP 358 should be co-opted for the ``bytes`` type). -Numbers -------- - -ABCs for numerical types are defined in PEP 3141. - - -Guidelines for Writing ABCs ---------------------------- - -Some suggestions for writing ABCs: - -* Use the ``@abstractmethod`` decorator. - -* Define abstract methods that could be useful as an end point when - called via a super chain. - -* Define concrete methods that are very simple permutations of - abstract methods (e.g. ``Mapping.get``). - -* Keep abstract classes small, one per use case instead of one per - concept. - - ABCs vs. Alternatives ===================== @@ -940,6 +857,12 @@ References .. [13] ABCMeta sample implementation (http://svn.python.org/view/sandbox/trunk/abc/xyz.py) +.. [14] python-dev email ("Comparing heterogeneous types") + http://mail.python.org/pipermail/python-dev/2004-June/045111.html + +.. [15] Function ``frozenset_hash()`` in Object/setobject.c + (http://svn.python.org/view/python/trunk/Objects/setobject.c) + Copyright =========