Fix a blatant bug in the definition of hash().

Add ComposableSet, which defines union etc.
This commit is contained in:
Guido van Rossum 2007-04-19 22:47:08 +00:00
parent e06f400021
commit f35a9eab74
1 changed files with 55 additions and 36 deletions

View File

@ -208,8 +208,17 @@ it has two prescribed methods.
below). The abstract ``__hash__`` method always returns 0, which
is a valid (albeit inefficient) implementation. **Invariant:** If
classes ``C1`` and ``C2`` both derive from ``Hashable``, the
invariant ``hash(o1) == hash(o2)`` must imply ``o1 == o2`` for all
condition ``o1 == o2`` must imply ``hash(o1) == hash(o2)`` for all
instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``.
IOW, two objects shouldn't compare equal but have different hash
values.
Another constraint is that hashable objects, once created, should
never change their value (as compared by ``==``) or their hash
value. If a class cannot guarantee this, it should not derive
from ``Hashable``; if it cannot guarantee this for certain
instances only, ``__hash__`` for those instances should raise an
exception.
Note: being an instance of this class does not imply that an
object is immutable; e.g. a tuple containing a list as a member is
@ -287,8 +296,8 @@ out of the scope of a pragmatic proposal like this.
``Finite``, ``Iterable`` and ``Container``. Not every subset of
those three classes is a set though! Sets have the additional
invariant that each element occurs only once (as can be determined
by iteration), and in addition sets implement rich comparisons
defined as subclass/superclass tests.
by iteration), and in addition sets define concrete operators that
implement rich comparisons defined as subclass/superclass tests.
Sets with different implementations can be compared safely,
efficiently and correctly. Because ``Set`` derives from
@ -302,29 +311,33 @@ out of the scope of a pragmatic proposal like this.
to mappings or sequences, but they can be compared for equality
(and then they always compare unequal).
**Open issue:** Should we also implement the ``issubset`` and
``issuperset`` methods found on the set type in Python 2? As
these are just aliases for ``__le__`` and ``__ge__``, I'm tempted
to leave these out.
Note: the ``issubset`` and ``issuperset`` methods found on the set
type in Python 2 are not supported, as these are mostly just
aliases for ``__le__`` and ``__ge__``.
**Open issue:** Should this class also implement the operators
and/or methods that compute union, intersection, symmetric and
asymmetric difference? The problem with those (unlike the
comparison operators) is what should be the type of the return
value. I'm tentatively leaving these out -- if you need them, you
can test for a ``Set`` instance that implements e.g. ``__and__``.
Some alternatives: make these abstract methods (even though the
semantics apart from the type are well-defined); or make them
concrete methods that return a specific concrete set type; or make
them concrete methods that assume the class constructor always
accepts an iterable of elements; or add a new class method that
accepts an iterable of elements and that creates a new instance.
(I originally considered a "view" alternative, but the problem is
that computing ``len(a&b)`` requires iterating over ``a`` or
``b``, and that pretty much kills the idea.)
**Open Issues:** Should I spell out the invariants and method
definitions?
``ComposableSet``
This is a subclass of ``Set`` that defines abstract operators to
compute union, intersection, symmetric and asymmetric difference,
respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.
These operators should return instances of ``ComposableSet``. The
abstract implementations return no meaningful values but raise
``NotImplementedError``; this is because any generic
implementation would have to create new instances but the ABCs
don't (and shouldn't, IMO) provide an API for creating new
instances. **Invariants:** The implementations of these operators
should ensure that the results match the mathematical definition
of set composition.
**Open Issues:** Should I spell out the invariants? Should we
define an API for creating new instances (e.g. a class method or a
fixed constructor signature)? Should we just pick a concrete
return type (e.g. ``set``)? Should we add the ``copy`` method?
``HashableSet``
This is a subclass of both ``Set`` and ``Hashable``. It
This is a subclass of both ``ComposableSet`` and ``Hashable``. It
implements a concrete ``__hash__`` method that subclasses should
not override; or if they do, the subclass should compute the same
hash value. This is so that sets with different implementations
@ -332,17 +345,21 @@ out of the scope of a pragmatic proposal like this.
as dictionary keys. (A similar constraint exists on the hash
values for different types of numbers and strings.)
**Open Issues:** Should I spell out the hash algorithm? Should
there be another ABC that derives from Set and Hashable (but not
from Composable)?
``MutableSet``
This is a subclass of ``Set`` implementing additional operations
to add and remove elements. The supported methods have the
semantics known from the ``set`` type in Python 2:
This is a subclass of ``ComposableSet`` implementing additional
operations to add and remove elements. The supported methods have
the semantics known from the ``set`` type in Python 2:
``.add(x)``
Abstract method that adds the element ``x``, if it isn't
already in the set.
``.remove(x)``
Abstract method that removes the element ``x``; raises
``KeyError`` if ``x`` is not in the set.
@ -355,16 +372,18 @@ out of the scope of a pragmatic proposal like this.
Abstract method that empties the set. (Making this concrete
would just add a slow, cumbersome default implementation.)
**Open issue:** Should we support all the operations implemented
by the Python 2 ``set`` type? I.e. union, update, __or__,
__ror__, __ior__, intersection, intersection_update, __and__,
__rand__, __iand__, difference, difference_update, __xor__,
__rxor__, __ixor__, symmetric_difference,
symmetric_difference_update, __sub__, __rsub__, __isub__. Note
that in Python 2, ``a.update(b)`` is not exactly the same as ``a
|= b``, since ``update()`` takes any iterable for an argument,
**Open issues:** Should we support more operations implemented by
the Python 2 ``set`` type? E.g. pop, update, __ior__,
intersection_update, __iand__, difference_update, __ixor__,
symmetric_difference_update, __isub__. Should we unify ``remove``
and ``discard``, a la Java (which has a single method returning
a boolean indicating whether it was removed or not)?
Note that in Python 2, ``a.update(b)`` is not exactly the same as
``a |= b``, since ``update()`` takes any iterable for an argument,
while ``|=`` requires another set; similar for the other
operators.
operators. What to do about this? Do we really want the method
explosion that comes from this distinction?
Mappings