PEP: 218
Title: Adding a Built-In Set Object Type
Version: $Revision$
Author: gvwilson@nevex.com (Greg Wilson)
Status: Draft
Type: Standards Track
Python-Version: 2.1
Created: 31-Jul-2000
Post-History:


Introduction

This PEP proposes adding sets as a built-in type in Python.


Rationale

One of Python's greatest strengths as a teaching language is its
clarity. Its syntax and object model are so clean, and so simple,
that it can serve as "executable pseudocode". Anything that makes
it even better suited for this role will help increase its use in
school and college courses.

Sets are a fundamental mathematical structure, and are very
commonly used in algorithm specifications. They are much less
frequently used in implementations, even when they are the "right"
structure. Programmers frequently use lists instead, even when
the ordering information in lists is irrelevant, and by-value
lookups are frequent. (Most medium-sized C programs contain a
depressing number of start-to-end searches through malloc'd
vectors to determine whether particular items are present or
not...)

Programmers are often told that they can implement sets as
dictionaries with "don't care" values. Items can be added to
these "sets" by assigning the "don't care" value to them;
membership can be tested using "dict.has_key"; and items can be
deleted using "del". However, the other main operations on sets
(union, intersection, and difference) are not directly supported
by this representation, since their meaning is ambiguous for
dictionaries containing key/value pairs.
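
The dictionary idiom described above can be sketched roughly as
follows. This is an illustration only: the names "seen", "other",
"union" and "DONT_CARE" are hypothetical, and membership is tested
with the "in" operator because "dict.has_key" is not available in
Python 3.

```python
# Illustrative only: emulate a set with a dictionary whose values are ignored.
DONT_CARE = None             # hypothetical placeholder value

seen = {}
seen["apple"] = DONT_CARE    # add an item
seen["pear"] = DONT_CARE
seen["apple"] = DONT_CARE    # adding the same item again changes nothing

has_apple = "apple" in seen  # membership test (formerly seen.has_key("apple"))
del seen["pear"]             # delete an item

# Union has to be spelled out by hand, because merging two dictionaries
# must also decide what to do with the values.
other = {"plum": DONT_CARE}
union = dict(seen)
union.update(other)

print(sorted(union))         # ['apple', 'plum']
```

The hand-written union is the part that a real set type would make
unnecessary.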


Proposal

We propose adding a set type to Python. This type will be an
unordered collection of unique values, just as a dictionary is an
unordered collection of key/value pairs. Constant sets will be
represented using the usual mathematical notation, so that
"{1, 2, 3}" will be a set of three integers.

In order to avoid ambiguity, the empty set will be written "{,}",
rather than "{}" (which is already used to represent empty
dictionaries). We feel that this notation is as reasonable as the
use of "(3,)" to represent single-element tuples; a more radical
strategy is discussed in the "Alternatives" section.

Iteration and comprehension will be implemented in the obvious
ways, so that:

    for x in S:

will step through the elements of S in arbitrary order, while:

    {x**2 for x in S}

will produce a set containing the squares of all elements in S.

Membership will be tested using "in" and "not in".
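
As a rough sketch of these three operations, the snippet below uses
the set notation that Python later adopted, which coincides with the
notation proposed here. The name "S" follows the text; "visited"
and "squares" are hypothetical names chosen for the example.

```python
S = {1, 2, 3}

# Iteration visits each element exactly once, in no guaranteed order.
visited = []
for x in S:
    visited.append(x)

# A set comprehension builds a new set from the elements of S.
squares = {x**2 for x in S}

# Results are sorted before printing because sets are unordered.
print(sorted(visited))   # [1, 2, 3]
print(sorted(squares))   # [1, 4, 9]
print(2 in S)            # True
print(5 not in S)        # True
```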

The binary operators '|', '&', '-', and '^' will implement set
union, intersection, difference, and symmetric difference. Their
in-place equivalents will have the obvious semantics. (We feel
that it is more sensible to overload the bitwise operators '|' and
'&', rather than the arithmetic operators '+' and '*', because
there is no arithmetic equivalent of '^'.)
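
The four operators and their in-place forms might be exercised as
in the sketch below. The names "A", "B" and "C" are hypothetical,
and results are sorted before printing because sets are unordered.

```python
A = {1, 2, 3}
B = {3, 4}

print(sorted(A | B))  # union: [1, 2, 3, 4]
print(sorted(A & B))  # intersection: [3]
print(sorted(A - B))  # difference: [1, 2]
print(sorted(A ^ B))  # symmetric difference: [1, 2, 4]

# In-place forms update the left-hand operand.
C = {1, 2}
C |= {5}              # C now holds 1, 2 and 5
C &= {2, 5, 9}        # keep only the elements shared with {2, 5, 9}
print(sorted(C))      # [2, 5]
```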

The method "add" will add an element to a set. This is different
from set union, as the following example shows:

    >>> {1, 2, 3} | {4, 5, 6}
    {1, 2, 3, 4, 5, 6}

    >>> {1, 2, 3}.add({4, 5, 6})
    {1, 2, 3, {4, 5, 6}}
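
The same distinction can be sketched in runnable form. Two details
here are adaptations for the sketch rather than part of the notation
above: in Python as released, "add" modifies the set in place and
returns None, and a set stored inside another set has to be a
"frozenset" so that it is hashable. The names "S", "T", "merged" and
"nested" are hypothetical.

```python
S = {1, 2, 3}
T = {4, 5, 6}

# Union merges the elements of both sets into one flat set.
merged = S | T
print(len(merged))             # 6

# add() inserts its argument as a single element.  A frozenset is used
# because the inner set has to be hashable to be stored (an assumption
# of this sketch, not of the proposal).
nested = {1, 2, 3}
nested.add(frozenset(T))
print(len(nested))             # 4
print(frozenset(T) in nested)  # True
print(4 in nested)             # False: 4 is inside the inner set only
```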

Note that we expect that items can also be added to sets using
in-place union of temporaries, i.e. "S |= {x}" instead of
"S.add(x)".

Elements will be deleted from sets using a "remove" method, or
using "del":

    >>> S = {1, 2, 3}
    >>> S.remove(3)
    >>> S
    {1, 2}
    >>> del S[1]
    >>> S
    {2}

The "KeyError" exception will be raised if an attempt is made to
remove an element which is not in a set. This definition of "del"
is consistent with that used for dictionaries:

    >>> D = {1:2, 3:4}
    >>> del D[1]
    >>> D
    {3: 4}
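
A small sketch of the removal behaviour described above, including
the "KeyError" case, might look like this. Only "remove" is
exercised on the set; the "del S[x]" form belongs to this proposal
and is not assumed to exist. The name "missing_raised" is
hypothetical.

```python
S = {1, 2, 3}
S.remove(3)
print(sorted(S))       # [1, 2]

# Removing an element that is not present raises KeyError.
try:
    S.remove(42)
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True

# The dictionary counterpart, for comparison.
D = {1: 2, 3: 4}
del D[1]
print(D)               # {3: 4}
```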

A new method "dict.keyset" will return the keys of a dictionary as
a set. A corresponding method "dict.valueset" will return the
dictionary's values as a set.

A built-in converter "set()" will convert any sequence type to a
set; converters such as "list()" and "tuple()" will be extended to
handle sets as input.
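
The conversions can be sketched as below. The "dict.keyset" and
"dict.valueset" methods are proposals only, so the sketch stands in
for them with "set(D)" and "set(D.values())"; the names "letters",
"key_set" and "value_set" are hypothetical.

```python
# set() accepts any iterable; duplicates collapse to a single element.
letters = set(["a", "b", "a"])
print(sorted(letters))         # ['a', 'b']

# list() and tuple() accept a set; the resulting order is arbitrary,
# so the results are sorted here before printing.
print(sorted(list(letters)))   # ['a', 'b']
print(sorted(tuple(letters)))  # ['a', 'b']

# Stand-ins for the proposed dict.keyset and dict.valueset methods.
D = {"x": 1, "y": 1}
key_set = set(D)               # keys as a set
value_set = set(D.values())    # values as a set
print(sorted(key_set))         # ['x', 'y']
print(sorted(value_set))       # [1]
```

Note that two keys sharing one value yield a value set with a single
element.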


Open Issues

One major issue remains to be resolved: will sets be allowed to
contain mutable values, or will their values be required to be
immutable (as dictionary keys are)? The disadvantages of allowing
only immutable values are clear: if nothing else, it would
prevent users from creating sets of sets.

However, no efficient implementation of sets of mutable values has
yet been suggested. Hashing approaches will obviously fail (which
is why mutable values are not allowed to be dictionary keys).
Even simple-minded implementations, such as storing the set's
values in a list, can give incorrect results, as the following
example shows:

    >>> a = [1, 2]
    >>> b = [3, 4]
    >>> S = [a, b]
    >>> a[0:2] = [3, 4]
    >>> S
    [[3, 4], [3, 4]]
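
The failure can be reproduced with a minimal list-backed "set". The
helper "add_unique" is hypothetical and exists only for this sketch;
it enforces uniqueness at insertion time, which is exactly the check
that later mutation defeats.

```python
def add_unique(items, value):
    """Append value to the list-backed 'set' only if it is not already there."""
    if value not in items:
        items.append(value)


a = [1, 2]
b = [3, 4]
S = []
add_unique(S, a)
add_unique(S, b)
print(S)             # [[1, 2], [3, 4]]

# Mutating a after insertion makes the two elements equal,
# so S no longer holds unique values.
a[0:2] = [3, 4]
print(S)             # [[3, 4], [3, 4]]
print(S[0] == S[1])  # True

# The hashing route is closed off earlier: lists cannot be hashed at all.
try:
    hash(a)
    hashable = True
except TypeError:
    hashable = False
print(hashable)      # False
```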

One way to solve this problem would be to add observer/observable
functionality to every data structure in Python, so that
structures would know to update themselves when their contained
values mutated. This is clearly impractical given the current
code base, and the performance penalties (in both memory and
execution time) would probably be unacceptable anyway.


Alternatives

A more conservative alternative to this proposal would be to add a
new built-in class "Set", rather than adding new syntax for direct
expression of sets. On the positive side, this would not require
any changes to the Python language definition. On the negative
side, people would then not be able to write Python programs using
the same notation as they would use on a whiteboard. We feel that
the more Python supports standard pre-existing notation, the
greater the chances of it being adopted as a teaching language.
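
To give a feel for the class-based alternative, here is one possible
shape for such a class, layered on a dictionary. Everything in it is
hypothetical: the method names, the constructor signature and the
choice of operators are assumptions made for this sketch, not a
specification of the "Set" class mentioned above.

```python
class Set:
    """Hypothetical sketch of a set class layered on a dictionary."""

    def __init__(self, items=()):
        # Keys hold the elements; the values are never examined.
        self._items = dict.fromkeys(items)

    def add(self, item):
        self._items[item] = None

    def remove(self, item):
        del self._items[item]  # raises KeyError if item is absent

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __or__(self, other):
        # Union: every element of either operand.
        return Set(list(self) + list(other))

    def __and__(self, other):
        # Intersection: elements of self that are also in other.
        return Set(x for x in self if x in other)


primes = Set([2, 3, 5])
odds = Set([1, 3, 5])
print(sorted(primes | odds))  # [1, 2, 3, 5]
print(sorted(primes & odds))  # [3, 5]
```

Here "Set([2, 3, 5])" stands where the proposed syntax would allow
"{2, 3, 5}", which is the notational cost discussed above.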

A radical alternative to the (admittedly clumsy) notation "{,}" is
to re-define "{}" to be the empty collection, rather than the
empty dictionary. Operations which made this object non-empty
would silently convert it to either a dictionary or a set; it
would then retain that type for the rest of its existence. This
idea was rejected because of its potential impact on existing
Python programs. A similar proposal to modify "dict.keys" and
"dict.values" to return sets, rather than lists, was rejected for
the same reasons.


Copyright

This document has been placed in the Public Domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: