211 lines
8.2 KiB
Plaintext
211 lines
8.2 KiB
Plaintext
PEP: 218
|
||
Title: Adding a Built-In Set Object Type
|
||
Version: $Revision$
|
||
Author: gvwilson@ddj.com (Greg Wilson)
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Python-Version: 2.2
|
||
Created: 31-Jul-2000
|
||
Post-History:
|
||
|
||
|
||
Cautionary note
|
||
|
||
Much of this PEP has been implemented in the new "sets" module,
|
||
added to Python 2.3. The sets module differs in a number of
|
||
aspects from this PEP; in particular its approach to mutability is
|
||
different. It is my hope that this PEP's author will update his
|
||
PEP to match the sets module more closely. [note added by Guido
|
||
van Rossum]
|
||
|
||
|
||
Introduction
|
||
|
||
This PEP proposes adding a Set module to the standard Python
|
||
library, and to then make sets a built-in Python type if that
|
||
module is widely used. After explaining why sets are desirable,
|
||
and why the common idiom of using dictionaries in their place is
|
||
inadequate, we describe how we intend built-in sets to work, and
|
||
then how the preliminary Set module will behave. The penultimate
|
||
section discusses the mutability (or otherwise) of sets and set
|
||
elements, and the solution which the Set module will implement.
|
||
The last section then looks at alternatives that were considered,
|
||
but discarded.
|
||
|
||
|
||
Rationale
|
||
|
||
Sets are a fundamental mathematical structure, and are very
|
||
commonly used in algorithm specifications. They are much less
|
||
frequently used in implementations, even when they are the "right"
|
||
structure. Programmers frequently use lists instead, even when
|
||
the ordering information in lists is irrelevant, and by-value
|
||
lookups are frequent. (Most medium-sized C programs contain a
|
||
depressing number of start-to-end searches through malloc'd
|
||
vectors to determine whether particular items are present or
|
||
not...)
|
||
|
||
Programmers are often told that they can implement sets as
|
||
dictionaries with "don't care" values. Items can be added to
|
||
these "sets" by assigning the "don't care" value to them;
|
||
membership can be tested using "dict.has_key"; and items can be
|
||
deleted using "del". However, the other main operations on sets
|
||
(union, intersection, and difference) are not directly supported
|
||
by this representation, since their meaning is ambiguous for
|
||
dictionaries containing key/value pairs.
|
||
|
||
|
||
Long-Term Proposal
|
||
|
||
The long-term goal of this PEP is to add a built-in set type to
|
||
Python. This type will be an unordered collection of unique
|
||
values, just as a dictionary is an unordered collection of
|
||
key/value pairs. Constant sets will be represented using the
|
||
usual mathematical notation, so that "{1, 2, 3}" will be a set of
|
||
three integers.
|
||
|
||
In order to avoid ambiguity, the empty set will be written "{-}",
|
||
rather than "{}" (which is already used to represent empty
|
||
dictionaries). We feel that this notation is as reasonable as the
|
||
use of "(3,)" to represent single-element tuples; a more radical
|
||
strategy is discussed in the "Alternatives" section, and more
|
||
readable than the earlier proposal "{,}".
|
||
|
||
Iteration and comprehension will be implemented in the obvious
|
||
ways, so that:
|
||
|
||
for x in S:
|
||
|
||
will step through the elements of S in arbitrary order, while:
|
||
|
||
{x**2 for x in S}
|
||
|
||
will produce a set containing the squares of all elements in S,
|
||
Membership will be tested using "in" and "not in", and basic set
|
||
operations will be implemented by a mixture of overloaded
|
||
operators:
|
||
|
||
| union
|
||
& intersection
|
||
^ symmetric difference
|
||
- asymmetric difference
|
||
|
||
and methods:
|
||
|
||
S.add(x) Add "x" to the set.
|
||
|
||
S.update(s) Add all elements of sequence "s" to the set.
|
||
|
||
S.remove(x) Remove "x" from the set. If "x" is not
|
||
present, this method raises a LookupError
|
||
exception.
|
||
|
||
S.discard(x) Remove "x" from the set if it is present, or
|
||
do nothing if it is not.
|
||
|
||
S.popitem() Remove and return an arbitrary element,
|
||
raising a LookupError if the element is not
|
||
present.
|
||
|
||
and one new built-in conversion function:
|
||
|
||
set(x) Create a set containing the elements of the
|
||
collection "x".
|
||
|
||
Notes:
|
||
|
||
1. We propose using the bitwise operators "|&" for intersection
|
||
and union. While "+" for union would be intuitive, "*" for
|
||
intersection is not (very few of the people asked guessed what
|
||
it did correctly).
|
||
|
||
2. We considered using "+" to add elements to a set, rather than
|
||
"add". However, Guido van Rossum pointed out that "+" is
|
||
symmetric for other built-in types (although "*" is not). Use
|
||
of "add" will also avoid confusion between that operation and
|
||
set union.
|
||
|
||
3. Sets raise "LookupError" exceptions, rather than "KeyError" or
|
||
"ValueError", because set elements are neither keys nor values.
|
||
|
||
Short-Term Proposal
|
||
|
||
In order to determine whether there is enough demand for sets to
|
||
justify making them a built-in type, and to give users a chance to
|
||
try out the semantics we propose for sets, our short-term proposal
|
||
is to add a "Set" class to the standard Python library. This
|
||
class will have the operators and methods described above; it will
|
||
also have named methods corresponding to all of the operations: a
|
||
"union" method for "|", and a "union_update" method for "|=", and
|
||
so on.
|
||
|
||
This class will use a dictionary internally to contain set values.
|
||
In order to avoid having to duplicate values (e.g. for iteration
|
||
through the set), the class will rely on the iterators which are
|
||
scheduled to appear in Python 2.2.
|
||
|
||
Tim Peters believes that the class's constructor should take a
|
||
single sequence as an argument, and populate the set with that
|
||
sequence's elements. His argument is that in most cases,
|
||
programmers will be created sets from pre-existing sequences, so
|
||
that common case should be usual. However, this would require
|
||
users to remember an extra set of parentheses when initializing a
|
||
set with known values:
|
||
|
||
>>> Set((1, 2, 3, 4)) # case 1
|
||
|
||
On the other hand, feedback from a small number of novice Python
|
||
users (all of whom were very experienced with other languages)
|
||
indicates that people will find a "parenthesis-free" syntax more
|
||
natural:
|
||
|
||
>>> Set(1, 2, 3, 4) # case 2
|
||
|
||
On the other, other hand, if Python does adopt a dictionary-like
|
||
notation for sets in the future, then case 2 will become
|
||
redundant. We have therefore adopted the first strategy, in which
|
||
the initializer takes a single sequence argument.
|
||
|
||
|
||
Mutability
|
||
|
||
The most difficult question to resolve in this proposal was
|
||
whether sets ought to be able to contain mutable elements. A
|
||
dictionary's keys must be immutable in order to support fast,
|
||
reliable lookup. While it would be easy to require set elements
|
||
to be immutable, this would preclude sets of sets (which are
|
||
widely used in graph algorithms and other applications).
|
||
|
||
At Tim Peters' suggestion, we will implement the following
|
||
compromise. A set may only contain immutable elements, but is
|
||
itself mutable *until* its hash code is calculated. As soon as
|
||
that happens, the set is "frozen", i.e. becomes immutable. Thus,
|
||
a set may be used as a dictionary key, or as a set element, but
|
||
cannot be updated after this is done. Peters reports that this
|
||
behavior rarely causes problems in practice.
|
||
|
||
|
||
Alternatives
|
||
|
||
An alternative to the notation "{-}" for the empty set would be to
|
||
re-define "{}" to be the empty collection, rather than the empty
|
||
dictionary. Operations which made this object non-empty would
|
||
silently convert it to either a dictionary or a set; it would then
|
||
retain that type for the rest of its existence. This idea was
|
||
rejected because of its potential impact on existing Python
|
||
programs. A similar proposal to modify "dict.keys" and
|
||
"dict.values" to return sets, rather than lists, was rejected for
|
||
the same reasons.
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the Public Domain.
|
||
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
End:
|