2000-07-31 16:20:31 -04:00
|
|
|
|
PEP: 218
|
|
|
|
|
Title: Adding a Built-In Set Object Type
|
|
|
|
|
Version: $Revision$
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Author: gvwilson@ddj.com (Greg Wilson)
|
2000-07-31 16:20:31 -04:00
|
|
|
|
Status: Draft
|
2000-08-23 01:51:13 -04:00
|
|
|
|
Type: Standards Track
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Python-Version: 2.2
|
2000-07-31 16:20:31 -04:00
|
|
|
|
Created: 31-Jul-2000
|
|
|
|
|
Post-History:
|
|
|
|
|
|
|
|
|
|
|
2002-08-20 10:00:44 -04:00
|
|
|
|
Cautionary note
|
|
|
|
|
|
|
|
|
|
Much of this PEP has been implemented in the new "sets" module,
|
|
|
|
|
added to Python 2.3. The sets module differs in a number of
|
|
|
|
|
aspects from this PEP; in particular its approach to mutability is
|
|
|
|
|
different. It is my hope that this PEP's author will update his
|
|
|
|
|
PEP to match the sets module more closely. [note added by Guido
|
|
|
|
|
van Rossum]
|
|
|
|
|
|
|
|
|
|
|
2000-11-26 23:10:06 -05:00
|
|
|
|
Introduction
|
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
This PEP proposes adding a Set module to the standard Python
|
|
|
|
|
library, and to then make sets a built-in Python type if that
|
|
|
|
|
module is widely used. After explaining why sets are desirable,
|
|
|
|
|
and why the common idiom of using dictionaries in their place is
|
|
|
|
|
inadequate, we describe how we intend built-in sets to work, and
|
|
|
|
|
then how the preliminary Set module will behave. The penultimate
|
|
|
|
|
section discusses the mutability (or otherwise) of sets and set
|
|
|
|
|
elements, and the solution which the Set module will implement.
|
|
|
|
|
The last section then looks at alternatives that were considered,
|
|
|
|
|
but discarded.
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
2000-07-31 16:20:31 -04:00
|
|
|
|
|
2000-12-14 12:11:17 -05:00
|
|
|
|
Sets are a fundamental mathematical structure, and are very
|
|
|
|
|
commonly used in algorithm specifications. They are much less
|
|
|
|
|
frequently used in implementations, even when they are the "right"
|
|
|
|
|
structure. Programmers frequently use lists instead, even when
|
|
|
|
|
the ordering information in lists is irrelevant, and by-value
|
|
|
|
|
lookups are frequent. (Most medium-sized C programs contain a
|
|
|
|
|
depressing number of start-to-end searches through malloc'd
|
|
|
|
|
vectors to determine whether particular items are present or
|
|
|
|
|
not...)
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
|
|
|
|
Programmers are often told that they can implement sets as
|
|
|
|
|
dictionaries with "don't care" values. Items can be added to
|
|
|
|
|
these "sets" by assigning the "don't care" value to them;
|
|
|
|
|
membership can be tested using "dict.has_key"; and items can be
|
2000-12-14 12:11:17 -05:00
|
|
|
|
deleted using "del". However, the other main operations on sets
|
|
|
|
|
(union, intersection, and difference) are not directly supported
|
|
|
|
|
by this representation, since their meaning is ambiguous for
|
|
|
|
|
dictionaries containing key/value pairs.
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Long-Term Proposal
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
The long-term goal of this PEP is to add a built-in set type to
|
|
|
|
|
Python. This type will be an unordered collection of unique
|
|
|
|
|
values, just as a dictionary is an unordered collection of
|
|
|
|
|
key/value pairs. Constant sets will be represented using the
|
|
|
|
|
usual mathematical notation, so that "{1, 2, 3}" will be a set of
|
|
|
|
|
three integers.
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
In order to avoid ambiguity, the empty set will be written "{-}",
|
2000-11-26 23:10:06 -05:00
|
|
|
|
rather than "{}" (which is already used to represent empty
|
|
|
|
|
dictionaries). We feel that this notation is as reasonable as the
|
|
|
|
|
use of "(3,)" to represent single-element tuples; a more radical
|
2001-05-07 15:51:10 -04:00
|
|
|
|
strategy is discussed in the "Alternatives" section, and more
|
|
|
|
|
readable than the earlier proposal "{,}".
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
|
|
|
|
Iteration and comprehension will be implemented in the obvious
|
|
|
|
|
ways, so that:
|
|
|
|
|
|
|
|
|
|
for x in S:
|
|
|
|
|
|
|
|
|
|
will step through the elements of S in arbitrary order, while:
|
|
|
|
|
|
|
|
|
|
{x**2 for x in S}
|
|
|
|
|
|
|
|
|
|
will produce a set containing the squares of all elements in S,
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Membership will be tested using "in" and "not in", and basic set
|
|
|
|
|
operations will be implemented by a mixture of overloaded
|
|
|
|
|
operators:
|
|
|
|
|
|
|
|
|
|
| union
|
|
|
|
|
& intersection
|
|
|
|
|
^ symmetric difference
|
|
|
|
|
- asymmetric difference
|
|
|
|
|
|
|
|
|
|
and methods:
|
|
|
|
|
|
|
|
|
|
S.add(x) Add "x" to the set.
|
|
|
|
|
|
|
|
|
|
S.update(s) Add all elements of sequence "s" to the set.
|
|
|
|
|
|
|
|
|
|
S.remove(x) Remove "x" from the set. If "x" is not
|
|
|
|
|
present, this method raises a LookupError
|
|
|
|
|
exception.
|
|
|
|
|
|
|
|
|
|
S.discard(x) Remove "x" from the set if it is present, or
|
|
|
|
|
do nothing if it is not.
|
|
|
|
|
|
|
|
|
|
S.popitem() Remove and return an arbitrary element,
|
|
|
|
|
raising a LookupError if the element is not
|
|
|
|
|
present.
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
and one new built-in conversion function:
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
set(x) Create a set containing the elements of the
|
|
|
|
|
collection "x".
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Notes:
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
1. We propose using the bitwise operators "|&" for intersection
|
|
|
|
|
and union. While "+" for union would be intuitive, "*" for
|
|
|
|
|
intersection is not (very few of the people asked guessed what
|
|
|
|
|
it did correctly).
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
2. We considered using "+" to add elements to a set, rather than
|
|
|
|
|
"add". However, Guido van Rossum pointed out that "+" is
|
|
|
|
|
symmetric for other built-in types (although "*" is not). Use
|
|
|
|
|
of "add" will also avoid confusion between that operation and
|
|
|
|
|
set union.
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
3. Sets raise "LookupError" exceptions, rather than "KeyError" or
|
|
|
|
|
"ValueError", because set elements are neither keys nor values.
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Short-Term Proposal
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
In order to determine whether there is enough demand for sets to
|
|
|
|
|
justify making them a built-in type, and to give users a chance to
|
|
|
|
|
try out the semantics we propose for sets, our short-term proposal
|
|
|
|
|
is to add a "Set" class to the standard Python library. This
|
|
|
|
|
class will have the operators and methods described above; it will
|
|
|
|
|
also have named methods corresponding to all of the operations: a
|
|
|
|
|
"union" method for "|", and a "union_update" method for "|=", and
|
|
|
|
|
so on.
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
This class will use a dictionary internally to contain set values.
|
|
|
|
|
In order to avoid having to duplicate values (e.g. for iteration
|
|
|
|
|
through the set), the class will rely on the iterators which are
|
|
|
|
|
scheduled to appear in Python 2.2.
|
2000-12-14 12:11:17 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Tim Peters believes that the class's constructor should take a
|
|
|
|
|
single sequence as an argument, and populate the set with that
|
|
|
|
|
sequence's elements. His argument is that in most cases,
|
|
|
|
|
programmers will be created sets from pre-existing sequences, so
|
|
|
|
|
that common case should be usual. However, this would require
|
|
|
|
|
users to remember an extra set of parentheses when initializing a
|
|
|
|
|
set with known values:
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
>>> Set((1, 2, 3, 4)) # case 1
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
On the other hand, feedback from a small number of novice Python
|
|
|
|
|
users (all of whom were very experienced with other languages)
|
|
|
|
|
indicates that people will find a "parenthesis-free" syntax more
|
|
|
|
|
natural:
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
>>> Set(1, 2, 3, 4) # case 2
|
2000-11-26 23:10:06 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
On the other, other hand, if Python does adopt a dictionary-like
|
|
|
|
|
notation for sets in the future, then case 2 will become
|
|
|
|
|
redundant. We have therefore adopted the first strategy, in which
|
|
|
|
|
the initializer takes a single sequence argument.
|
2000-12-14 12:11:17 -05:00
|
|
|
|
|
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
Mutability
|
2000-12-14 12:11:17 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
The most difficult question to resolve in this proposal was
|
|
|
|
|
whether sets ought to be able to contain mutable elements. A
|
|
|
|
|
dictionary's keys must be immutable in order to support fast,
|
|
|
|
|
reliable lookup. While it would be easy to require set elements
|
|
|
|
|
to be immutable, this would preclude sets of sets (which are
|
|
|
|
|
widely used in graph algorithms and other applications).
|
2000-12-14 12:11:17 -05:00
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
At Tim Peters' suggestion, we will implement the following
|
|
|
|
|
compromise. A set may only contain immutable elements, but is
|
|
|
|
|
itself mutable *until* its hash code is calculated. As soon as
|
|
|
|
|
that happens, the set is "frozen", i.e. becomes immutable. Thus,
|
|
|
|
|
a set may be used as a dictionary key, or as a set element, but
|
|
|
|
|
cannot be updated after this is done. Peters reports that this
|
|
|
|
|
behavior rarely causes problems in practice.
|
2000-12-14 12:11:17 -05:00
|
|
|
|
|
|
|
|
|
|
2000-11-26 23:10:06 -05:00
|
|
|
|
Alternatives
|
|
|
|
|
|
2001-05-07 15:51:10 -04:00
|
|
|
|
An alternative to the notation "{-}" for the empty set would be to
|
|
|
|
|
re-define "{}" to be the empty collection, rather than the empty
|
|
|
|
|
dictionary. Operations which made this object non-empty would
|
|
|
|
|
silently convert it to either a dictionary or a set; it would then
|
|
|
|
|
retain that type for the rest of its existence. This idea was
|
|
|
|
|
rejected because of its potential impact on existing Python
|
|
|
|
|
programs. A similar proposal to modify "dict.keys" and
|
2000-11-26 23:10:06 -05:00
|
|
|
|
"dict.values" to return sets, rather than lists, was rejected for
|
|
|
|
|
the same reasons.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document has been placed in the Public Domain.
|
2000-07-31 16:20:31 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
End:
|