PEP: 218
Title: Adding a Built-In Set Object Type
Version: $Revision$
Last-Modified: $Date$
Author: gvwilson@ddj.com (Greg Wilson), python@rcn.com (Raymond Hettinger)
Status: Final
Type: Standards Track
Created: 31-Jul-2000
Python-Version: 2.2
Post-History:

Introduction

This PEP proposes adding a Set module to the standard Python
library, and to then make sets a built-in Python type if that
module is widely used.  After explaining why sets are desirable,
and why the common idiom of using dictionaries in their place is
inadequate, we describe how we intend built-in sets to work, and
then how the preliminary Set module will behave.  The last
section discusses the mutability (or otherwise) of sets and set
elements, and the solution which the Set module will implement.

Rationale

Sets are a fundamental mathematical structure, and are very
commonly used in algorithm specifications.  They are much less
frequently used in implementations, even when they are the "right"
structure.  Programmers frequently use lists instead, even when
the ordering information in lists is irrelevant, and by-value
lookups are frequent.  (Most medium-sized C programs contain a
depressing number of start-to-end searches through malloc'd
vectors to determine whether particular items are present or
not...)

Programmers are often told that they can implement sets as
dictionaries with "don't care" values.  Items can be added to
these "sets" by assigning the "don't care" value to them;
membership can be tested using "dict.has_key"; and items can be
deleted using "del".  However, the other main operations on sets
(union, intersection, and difference) are not directly supported
by this representation, since their meaning is ambiguous for
dictionaries containing key/value pairs.

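The dictionary idiom described in the Rationale can be sketched as
follows.  This is only an illustration written for a current Python
interpreter: the names DONT_CARE, dict_add and dict_union are
hypothetical, and "in" is used for the membership test because
"dict.has_key" is not available in Python 3.

```python
# Emulating a set with a dictionary whose values are ignored.
DONT_CARE = None  # hypothetical placeholder; any value would do


def dict_add(d, item):
    """Add an item to the emulated set by assigning the placeholder."""
    d[item] = DONT_CARE


seen = {}
for word in ["spam", "eggs", "spam"]:
    dict_add(seen, word)

has_spam = "spam" in seen   # membership test
del seen["eggs"]            # deletion


def dict_union(a, b):
    """Union written out by hand.

    The dictionary type offers no obvious union because something has
    to decide which value wins when both operands hold the same key;
    here the left operand's value is kept.
    """
    result = dict(a)
    for key in b:
        result.setdefault(key, DONT_CARE)
    return result


merged = dict_union(seen, {"ham": DONT_CARE})
```

Every set operation beyond add, test and delete needs a helper of
this kind, which is the gap a dedicated set type is meant to close.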
Proposal

The long-term goal of this PEP is to add a built-in set type to
Python.  This type will be an unordered collection of unique
values, just as a dictionary is an unordered collection of
key/value pairs.

Iteration and comprehension will be implemented in the obvious
ways, so that:

    for x in S:

will step through the elements of S in arbitrary order, while:

    set(x**2 for x in S)

will produce a set containing the squares of all elements in S.
Membership will be tested using "in" and "not in", and basic set
operations will be implemented by a mixture of overloaded
operators:

    |               union
    &               intersection
    ^               symmetric difference
    -               asymmetric difference
    == !=           equality and inequality tests
    < <= >= >       subset and superset tests

and methods:

    S.add(x)        Add "x" to the set.

    S.update(s)     Add all elements of sequence "s" to the set.

    S.remove(x)     Remove "x" from the set.  If "x" is not
                    present, this method raises a LookupError
                    exception.

    S.discard(x)    Remove "x" from the set if it is present, or
                    do nothing if it is not.

    S.pop()         Remove and return an arbitrary element,
                    raising a LookupError if the set is empty.

    S.clear()       Remove all elements from this set.

    S.copy()        Make a new set containing the same elements.

    S.issuperset(s) Check whether the set is a superset of "s".

    S.issubset(s)   Check whether the set is a subset of "s".

and two new built-in conversion functions:

    set(x)          Create a set containing the elements of the
                    collection "x".

    frozenset(x)    Create an immutable set containing the elements
                    of the collection "x".

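The methods and conversion functions listed above can be exercised
with a short session such as the following sketch.  It assumes the
built-in set and frozenset types of a current Python interpreter,
where the error raised by remove() and pop() is KeyError, a subclass
of LookupError.  All variable names are illustrative.

```python
S = set([1, 2, 3])        # set(x): build a set from a collection
S.add(4)
S.update([4, 5, 6])       # elements already present are ignored
S.discard(42)             # absent element: silently does nothing

try:
    S.remove(42)          # absent element: raises
    remove_failed = False
except LookupError:       # KeyError is a subclass of LookupError
    remove_failed = True

T = S.copy()              # independent copy; clearing it leaves S alone
T.clear()

try:
    T.pop()               # nothing left to return
    pop_failed = False
except LookupError:
    pop_failed = True

item = S.pop()            # removes and returns an arbitrary element

F = frozenset(S)          # immutable snapshot of S
is_sub = F.issubset([1, 2, 3, 4, 5, 6])
is_super = S.issuperset(T)                   # T is empty here
frozen_has_no_add = not hasattr(F, "add")    # no mutating methods
```

Catching LookupError rather than KeyError keeps the sketch in line
with the wording of the method descriptions.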
Notes:

1. We propose using the bitwise operators "|" and "&" for union
   and intersection.  While "+" for union would be intuitive, "*"
   for intersection is not (very few of the people asked guessed
   correctly what it did).

2. We considered using "+" to add elements to a set, rather than
   "add".  However, Guido van Rossum pointed out that "+" is
   symmetric for other built-in types (although "*" is not).  Use
   of "add" will also avoid confusion between that operation and
   set union.

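As an illustration of the operator choices discussed in the notes,
the sketch below applies each proposed operator to two small sets.
It assumes the built-in set type of a current Python interpreter;
the names A and B are arbitrary.

```python
A = set([1, 2, 3, 4])
B = set([3, 4, 5])

union = A | B             # elements in either set
intersection = A & B      # elements in both sets
symmetric = A ^ B         # elements in exactly one of the sets
difference = A - B        # elements of A that are not in B

# Comprehension through a generator expression, as in the proposal.
squares = set(x**2 for x in A)

# Comparison operators test subset and superset relationships.
proper_subset = intersection < A
superset = union >= B

# "+" is deliberately not defined for sets; union is spelled "|".
try:
    A + B
    plus_supported = True
except TypeError:
    plus_supported = False
```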
Set Notation

The PEP originally proposed {1,2,3} as the set notation and {-} for
the empty set.  Experience with Python 2.3's sets.py showed that
the notation was not necessary.  Also, there was some risk of making
dictionaries less instantly recognizable.

It was also contemplated that the braced notation would support set
comprehensions; however, Python 2.4 provided generator expressions
which fully met that need and did so in a more general way.
(See PEP 289 for details on generator expressions.)

So, Guido ruled that there would not be a set syntax; however, the
issue could be revisited for Python 3000 (see PEP 3000).

History

To gain experience with sets, a pure Python module was introduced
in Python 2.3.  Based on that implementation, the set and frozenset
types were introduced in Python 2.4.  The improvements are:

* Better hash algorithm for frozensets
* More compact pickle format (storing only an element list
  instead of a dictionary of key:value pairs where the value
  is always True).
* Use a __reduce__ function so that deep copying is automatic.
* The BaseSet concept was eliminated.
* The union_update() method became just update().
* Auto-conversion between mutable and immutable sets was dropped.
* The _repr method was dropped (the need is met by the new
  sorted() built-in function).

Tim Peters believes that the class's constructor should take a
single sequence as an argument, and populate the set with that
sequence's elements.  His argument is that in most cases,
programmers will be creating sets from pre-existing sequences, so
that this case should be the common one.  However, this would
require users to remember an extra set of parentheses when
initializing a set with known values:

    >>> Set((1, 2, 3, 4))       # case 1

On the other hand, feedback from a small number of novice Python
users (all of whom were very experienced with other languages)
indicates that people will find a "parenthesis-free" syntax more
natural:

    >>> Set(1, 2, 3, 4)         # case 2

Ultimately, we adopted the first strategy in which the initializer
takes a single iterable argument.

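The consequence of adopting the first strategy can be checked with
the built-in set type.  The sketch below assumes a current Python
interpreter, where the built-in is spelled "set" rather than "Set";
the variable names are illustrative.

```python
# Case 1: a single iterable argument is accepted.
known = set((1, 2, 3, 4))
from_list = set([1, 2, 2, 3])     # duplicates collapse
letters = set("abca")             # any iterable works, including strings

# Case 2: several positional arguments are rejected.
try:
    set(1, 2, 3, 4)
    case2_accepted = True
except TypeError:
    case2_accepted = False
```

One effect of this choice is that a string passed to the constructor
is split into its characters, so a one-element set holding a string
has to be built from a one-element sequence such as ["abca"].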
Mutability

The most difficult question to resolve in this proposal was
whether sets ought to be able to contain mutable elements.  A
dictionary's keys must be immutable in order to support fast,
reliable lookup.  While it would be easy to require set elements
to be immutable, this would preclude sets of sets (which are
widely used in graph algorithms and other applications).

Earlier drafts of PEP 218 had only a single set type, but the
sets.py implementation in Python 2.3 has two, Set and
ImmutableSet.  For Python 2.4, the new built-in types were named
set and frozenset, which are slightly less cumbersome.

There are two classes implemented in the "sets" module.  Instances
of the Set class can be modified by the addition or removal of
elements, and the ImmutableSet class is "frozen", with an
unchangeable collection of elements.  Therefore, an ImmutableSet
may be used as a dictionary key or as a set element, but cannot be
updated.  Both types of set require that their elements are
immutable, hashable objects.  Parallel comments apply to the "set"
and "frozenset" built-in types.

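The division of labour between the two types can be seen in a short
sketch.  It uses the built-in "set" and "frozenset" of a current
Python interpreter rather than the Set and ImmutableSet classes of
sets.py, and the variable names are illustrative.

```python
inner = frozenset([1, 2])

# A frozenset is hashable, so it can be a set element or a dict key.
set_of_sets = set([inner, frozenset([3])])
lookup = {inner: "pair"}
found = frozenset([2, 1]) in set_of_sets    # equal contents, same hash

# A mutable set is not hashable and cannot be stored in another set.
try:
    set([set([1, 2])])
    mutable_element_ok = True
except TypeError:
    mutable_element_ok = False

# Both types reject unhashable elements such as lists.
try:
    frozenset([[1, 2]])
    list_element_ok = True
except TypeError:
    list_element_ok = False
```

Sets of sets are therefore still possible, provided the inner sets
are frozen before they are inserted.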
Copyright

This document has been placed in the Public Domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: