Greg Wilson's latest version

This commit is contained in:
Barry Warsaw 2000-12-14 17:11:17 +00:00
parent cc08f77942
commit 4158b00536
1 changed files with 78 additions and 22 deletions

View File

@ -16,38 +16,45 @@ Introduction
Rationale Rationale
Sets are a fundamental mathematical structure, and are commonly One of Python's greatest strengths as a teaching language is its
used to specify algorithms. They are much less frequently used in clarity. Its syntax and object model are so clean, and so simple,
implementations, even when they are the "right" structure. that it can serve as "executable pseudocode". Anything that makes
Programmers frequently use lists instead, even when the ordering it even better suited for this role will help increase its use in
information in lists is irrelevant, and by-value lookups are school and college courses.
frequent. (Most medium-sized C programs contain a depressing
number of start-to-end searches through malloc'd vectors to Sets are a fundamental mathematical structure, and are very
determine whether particular items are present or not...) commonly used in algorithm specifications. They are much less
frequently used in implementations, even when they are the "right"
structure. Programmers frequently use lists instead, even when
the ordering information in lists is irrelevant, and by-value
lookups are frequent. (Most medium-sized C programs contain a
depressing number of start-to-end searches through malloc'd
vectors to determine whether particular items are present or
not...)
Programmers are often told that they can implement sets as Programmers are often told that they can implement sets as
dictionaries with "don't care" values. Items can be added to dictionaries with "don't care" values. Items can be added to
these "sets" by assigning the "don't care" value to them; these "sets" by assigning the "don't care" value to them;
membership can be tested using "dict.has_key"; and items can be membership can be tested using "dict.has_key"; and items can be
deleted using "del". However, the three main binary operations deleted using "del". However, the other main operations on sets
on sets --- union, intersection, and difference --- are not (union, intersection, and difference) are not directly supported
directly supported by this representation, since their meaning is by this representation, since their meaning is ambiguous for
ambiguous for dictionaries containing key/value pairs. dictionaries containing key/value pairs.
Proposal Proposal
We propose adding a new built-in type to Python to represent sets. We propose adding a set type to Python. This type will be an
This type will be an unordered collection of unique values, just unordered collection of unique values, just as a dictionary is an
as a dictionary is an unordered collection of key/value pairs. unordered collection of key/value pairs. Constant sets will be
Constant sets will be represented using the usual mathematical represented using the usual mathematical notation, so that
notation, so that "{1, 2, 3}" will be a set of three integers. "{1, 2, 3}" will be a set of three integers.
In order to avoid ambiguity, the empty set will be written "{,}", In order to avoid ambiguity, the empty set will be written "{,}",
rather than "{}" (which is already used to represent empty rather than "{}" (which is already used to represent empty
dictionaries). We feel that this notation is as reasonable as the dictionaries). We feel that this notation is as reasonable as the
use of "(3,)" to represent single-element tuples; a more radical use of "(3,)" to represent single-element tuples; a more radical
alternative is discussed in the "Alternatives" section. strategy is discussed in the "Alternatives" section.
Iteration and comprehension will be implemented in the obvious Iteration and comprehension will be implemented in the obvious
ways, so that: ways, so that:
@ -64,7 +71,10 @@ Proposal
The binary operators '|', '&', '-', and "^" will implement set The binary operators '|', '&', '-', and "^" will implement set
union, intersection, difference, and symmetric difference. Their union, intersection, difference, and symmetric difference. Their
in-place equivalents will have the obvious semantics. in-place equivalents will have the obvious semantics. (We feel
that it is more sensible to overload the bitwise operators '|' and
'&', rather than the arithmetic operators '+' and "*', because
there is no arithmetic equivalent of '^'.)
The method "add" will add an element to a set. This is different The method "add" will add an element to a set. This is different
from set union, as the following example shows: from set union, as the following example shows:
@ -83,14 +93,21 @@ Proposal
using "del": using "del":
>>> S = {1, 2, 3} >>> S = {1, 2, 3}
>>> S.remove(3)
>>> S
{1, 2}
>>> del S[1] >>> del S[1]
>>> S >>> S
{2, 3}
>>> S.remove(3)
{2} {2}
The "KeyError" exception will be raised if an attempt is made to The "KeyError" exception will be raised if an attempt is made to
remove an element which is not in a set. remove an element which is not in a set. This definition of "del"
is consistent with that used for dictionaries:
>>> D = {1:2, 3:4}
>>> del D[1]
>>> D
{3:4}
A new method "dict.keyset" will return the keys of a dictionary as A new method "dict.keyset" will return the keys of a dictionary as
a set. A corresponding method "dict.valueset" will return the a set. A corresponding method "dict.valueset" will return the
@ -101,8 +118,47 @@ Proposal
handle sets as input. handle sets as input.
Open Issues
One major issue remains to be resolved: will sets be allowed to
contain mutable values, or will their values be required to
immutable (as dictionary keys are)? The disadvantages of allowing
only immutable values are clear --- if nothing else, it would
prevent users from creating sets of sets.
However, no efficient implementation of sets of mutable values has
yet been suggested. Hashing approaches will obviously fail (which
is why mutable values are not allowed to be dictionary keys).
Even simple-minded implementations, such as storing the set's
values in a list, can give incorrect results, as the following
example shows:
>>> a = [1, 2]
>>> b = [3, 4]
>>> S = [a, b]
>>> a[0:2] = [3, 4]
>>> S
[[3, 4], [3, 4]]
One way to solve this problem would be to add observer/observable
functionality to every data structure in Python, so that
structures would know to update themselves when their contained
values mutated. This is clearly impractical given the current
code base, and the performance penalties (in both memory and
execution time) would probably be unacceptable anyway.
Alternatives Alternatives
A more conservative alternative to this proposal would be to add a
new built-in class "Set", rather than adding new syntax for direct
expression of sets. On the positive side, this would not require
any changes to the Python language definition. On the negative
side, people would then not be able to write Python programs using
the same notation as they would use on a whiteboard. We feel that
the more Python supports standard pre-existing notation, the
greater the chances of it being adopted as a teaching language.
A radical alternative to the (admittedly clumsy) notation "{,}" is A radical alternative to the (admittedly clumsy) notation "{,}" is
to re-define "{}" to be the empty collection, rather than the to re-define "{}" to be the empty collection, rather than the
empty dictionary. Operations which made this object non-empty empty dictionary. Operations which made this object non-empty