python-peps/pep-0218.txt

223 lines
8.7 KiB
Plaintext
Raw Normal View History

PEP: 218
Title: Adding a Built-In Set Object Type
Version: $Revision$
Author: gvwilson@ddj.com (Greg Wilson)
Status: Draft
2000-08-23 01:51:13 -04:00
Type: Standards Track
Python-Version: 2.2
Created: 31-Jul-2000
Post-History:
Introduction
This PEP proposes adding a Set module to the standard Python
library, and to then make sets a built-in Python type if that
module is widely used. After explaining why sets are desirable,
and why the common idiom of using dictionaries in their place is
inadequate, we describe how we intend built-in sets to work, and
then how the preliminary Set module will behave. The penultimate
section discusses the mutability (or otherwise) of sets and set
elements, and the solution which the Set module will implement.
The last section then looks at alternatives that were considered,
but discarded.
Rationale
2000-12-14 12:11:17 -05:00
Sets are a fundamental mathematical structure, and are very
commonly used in algorithm specifications. They are much less
frequently used in implementations, even when they are the "right"
structure. Programmers frequently use lists instead, even when
the ordering information in lists is irrelevant, and by-value
lookups are frequent. (Most medium-sized C programs contain a
depressing number of start-to-end searches through malloc'd
vectors to determine whether particular items are present or
not...)
Programmers are often told that they can implement sets as
dictionaries with "don't care" values. Items can be added to
these "sets" by assigning the "don't care" value to them;
membership can be tested using "dict.has_key"; and items can be
2000-12-14 12:11:17 -05:00
deleted using "del". However, the other main operations on sets
(union, intersection, and difference) are not directly supported
by this representation, since their meaning is ambiguous for
dictionaries containing key/value pairs.
Long-Term Proposal
The long-term goal of this PEP is to add a built-in set type to
Python. This type will be an unordered collection of unique
values, just as a dictionary is an unordered collection of
key/value pairs. Constant sets will be represented using the
usual mathematical notation, so that "{1, 2, 3}" will be a set of
three integers.
In order to avoid ambiguity, the empty set will be written "{-}",
rather than "{}" (which is already used to represent empty
dictionaries). We feel that this notation is as reasonable as the
use of "(3,)" to represent single-element tuples; a more radical
strategy is discussed in the "Alternatives" section, and more
readable than the earlier proposal "{,}".
Iteration and comprehension will be implemented in the obvious
ways, so that:
for x in S:
will step through the elements of S in arbitrary order, while:
{x**2 for x in S}
will produce a set containing the squares of all elements in S,
Membership will be tested using "in" and "not in", and basic set
operations will be implemented by a mixture of overloaded
operators:
| union
& intersection
^ symmetric difference
- asymmetric difference
and methods:
S.add(x) Add "x" to the set.
S.update(s) Add all elements of sequence "s" to the set.
S.remove(x) Remove "x" from the set. If "x" is not
present, this method raises a LookupError
exception.
S.discard(x) Remove "x" from the set if it is present, or
do nothing if it is not.
S.pop() Remove and return an arbitrary element,
raising a LookupError if the element is not
present.
S.clear() Remove all elements from this set.
and one new built-in conversion function:
set(x) Create a set containing the elements of the
collection "x".
Notes:
1. We propose using the bitwise operators "|&" for intersection
and union. While "+" for union would be intuitive, "*" for
intersection is not (very few of the people asked guessed what
it did correctly).
2. We considered using "+" to add elements to a set, rather than
"add". However, Guido van Rossum pointed out that "+" is
symmetric for other built-in types (although "*" is not). Use
of "add" will also avoid confusion between that operation and
set union.
3. Sets raise "LookupError" exceptions, rather than "KeyError" or
"ValueError", because set elements are neither keys nor values.
Open Issues for the Long-Term Proposal
Earlier drafts of PEP 218 had only a single set type, but the
sets.py implementation in Python 2.3 has two, Set and
ImmutableSet. The long-term proposal has a single built-in
conversion function, set(iterable); how should instances of a
built-in immutable set type be created? Possibilities include a
second immutable_set() built-in, or perhaps the set() function
could take an additional argument,
e.g. set(iterable, immutable=True)?
The PEP proposes {1,2,3} as the set notation and {-} for the empty
set. Would there be different syntax for an immutable and a
mutable set? Perhaps the built-in syntax would only be for
mutable sets, and an immutable set would be created from a mutable
set using the appropriate built-in function,
e.g. immutable_set({1,2,3}).
Short-Term Proposal
In order to determine whether there is enough demand for sets to
justify making them a built-in type, and to give users a chance to
try out the semantics we propose for sets, our short-term proposal
is to add a "Set" class to the standard Python library. This
class will have the operators and methods described above; it will
also have named methods corresponding to all of the operations: a
"union" method for "|", and a "union_update" method for "|=", and
so on.
This class will use a dictionary internally to contain set values.
To avoid having to duplicate values (e.g. for iteration through
the set), the class will rely on the iterators added in Python
2.2.
2000-12-14 12:11:17 -05:00
Tim Peters believes that the class's constructor should take a
single sequence as an argument, and populate the set with that
sequence's elements. His argument is that in most cases,
programmers will be creating sets from pre-existing sequences, so
that this case should be the common one. However, this would
require users to remember an extra set of parentheses when
initializing a set with known values:
>>> Set((1, 2, 3, 4)) # case 1
On the other hand, feedback from a small number of novice Python
users (all of whom were very experienced with other languages)
indicates that people will find a "parenthesis-free" syntax more
natural:
>>> Set(1, 2, 3, 4) # case 2
On the other, other hand, if Python does adopt a dictionary-like
notation for sets in the future, then case 2 will become
redundant. We have therefore adopted the first strategy, in which
the initializer takes a single iterable argument.
2000-12-14 12:11:17 -05:00
Mutability
2000-12-14 12:11:17 -05:00
The most difficult question to resolve in this proposal was
whether sets ought to be able to contain mutable elements. A
dictionary's keys must be immutable in order to support fast,
reliable lookup. While it would be easy to require set elements
to be immutable, this would preclude sets of sets (which are
widely used in graph algorithms and other applications).
2000-12-14 12:11:17 -05:00
There are two classes implemented in the "sets" module. Instances
of the Set class can be modified by the addition or removal of
elements, and the ImmutableSet class is "frozen", with an
unchangeable collection of elements. Therefore, an ImmutableSet
may be used as a dictionary key or as a set element, but cannot be
updated. Both types of set require that their elements are
immutable, hashable objects.
2000-12-14 12:11:17 -05:00
Alternatives
An alternative to the notation "{-}" for the empty set would be to
re-define "{}" to be the empty collection, rather than the empty
dictionary. Operations which made this object non-empty would
silently convert it to either a dictionary or a set; it would then
retain that type for the rest of its existence. This idea was
rejected because of its potential impact on existing Python
programs. A similar proposal to modify "dict.keys" and
"dict.values" to return sets, rather than lists, was rejected for
the same reasons.
Copyright
This document has been placed in the Public Domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: