225 lines
7.7 KiB
Plaintext
225 lines
7.7 KiB
Plaintext
PEP: 218
|
|
Title: Adding a Built-In Set Object Type
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: gvwilson@ddj.com (Greg Wilson), python@rcn.com (Raymond Hettinger)
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 31-Jul-2000
|
|
Python-Version: 2.2
|
|
Post-History:
|
|
|
|
|
|
Introduction
|
|
============
|
|
|
|
This PEP proposes adding a Set module to the standard Python
|
|
library, and to then make sets a built-in Python type if that
|
|
module is widely used. After explaining why sets are desirable,
|
|
and why the common idiom of using dictionaries in their place is
|
|
inadequate, we describe how we intend built-in sets to work, and
|
|
then how the preliminary Set module will behave. The last
|
|
section discusses the mutability (or otherwise) of sets and set
|
|
elements, and the solution which the Set module will implement.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Sets are a fundamental mathematical structure, and are very
|
|
commonly used in algorithm specifications. They are much less
|
|
frequently used in implementations, even when they are the "right"
|
|
structure. Programmers frequently use lists instead, even when
|
|
the ordering information in lists is irrelevant, and by-value
|
|
lookups are frequent. (Most medium-sized C programs contain a
|
|
depressing number of start-to-end searches through malloc'd
|
|
vectors to determine whether particular items are present or
|
|
not...)
|
|
|
|
Programmers are often told that they can implement sets as
|
|
dictionaries with "don't care" values. Items can be added to
|
|
these "sets" by assigning the "don't care" value to them;
|
|
membership can be tested using "dict.has_key"; and items can be
|
|
deleted using "del". However, the other main operations on sets
|
|
(union, intersection, and difference) are not directly supported
|
|
by this representation, since their meaning is ambiguous for
|
|
dictionaries containing key/value pairs.
|
|
|
|
|
|
Proposal
|
|
========
|
|
|
|
The long-term goal of this PEP is to add a built-in set type to
|
|
Python. This type will be an unordered collection of unique
|
|
values, just as a dictionary is an unordered collection of
|
|
key/value pairs.
|
|
|
|
Iteration and comprehension will be implemented in the obvious
|
|
ways, so that::
|
|
|
|
for x in S:
|
|
|
|
will step through the elements of S in arbitrary order, while::
|
|
|
|
set(x**2 for x in S)
|
|
|
|
will produce a set containing the squares of all elements in S,
|
|
Membership will be tested using "in" and "not in", and basic set
|
|
operations will be implemented by a mixture of overloaded
|
|
operators:
|
|
|
|
========= =============================
|
|
\| union
|
|
& intersection
|
|
^ symmetric difference
|
|
\- asymmetric difference
|
|
== != equality and inequality tests
|
|
< <= >= > subset and superset tests
|
|
========= =============================
|
|
|
|
and methods:
|
|
|
|
================== ============================================
|
|
``S.add(x)`` Add "x" to the set.
|
|
|
|
``S.update(s)`` Add all elements of sequence "s" to the set.
|
|
|
|
``S.remove(x)`` Remove "x" from the set. If "x" is not
|
|
present, this method raises a ``LookupError``
|
|
exception.
|
|
|
|
``S.discard(x)`` Remove "x" from the set if it is present, or
|
|
do nothing if it is not.
|
|
|
|
``S.pop()`` Remove and return an arbitrary element,
|
|
raising a LookupError if the element is not
|
|
present.
|
|
|
|
``S.clear()`` Remove all elements from this set.
|
|
|
|
``S.copy()`` Make a new set.
|
|
|
|
``s.issuperset()`` Check for a superset relationship.
|
|
|
|
``s.issubset()`` Check for a subset relationship.
|
|
================== ============================================
|
|
|
|
and two new built-in conversion functions:
|
|
|
|
================ ===============================================
|
|
``set(x)`` Create a set containing the elements of the
|
|
collection "x".
|
|
|
|
``frozenset(x)`` Create an immutable set containing the elements
|
|
of the collection "x".
|
|
================ ===============================================
|
|
|
|
Notes:
|
|
|
|
1. We propose using the bitwise operators "\|\&" for intersection
|
|
and union. While "+" for union would be intuitive, "\*" for
|
|
intersection is not (very few of the people asked guessed what
|
|
it did correctly).
|
|
|
|
2. We considered using "+" to add elements to a set, rather than
|
|
"add". However, Guido van Rossum pointed out that "+" is
|
|
symmetric for other built-in types (although "\*" is not). Use
|
|
of "add" will also avoid confusion between that operation and
|
|
set union.
|
|
|
|
|
|
Set Notation
|
|
============
|
|
|
|
The PEP originally proposed {1,2,3} as the set notation and {-} for
|
|
the empty set. Experience with Python 2.3's sets.py showed that
|
|
the notation was not necessary. Also, there was some risk of making
|
|
dictionaries less instantly recognizable.
|
|
|
|
It was also contemplated that the braced notation would support set
|
|
comprehensions; however, Python 2.4 provided generator expressions
|
|
which fully met that need and did so it a more general way.
|
|
(See PEP 289 for details on generator expressions).
|
|
|
|
So, Guido ruled that there would not be a set syntax; however, the
|
|
issue could be revisited for Python 3000 (see PEP 3000).
|
|
|
|
|
|
History
|
|
=======
|
|
|
|
To gain experience with sets, a pure python module was introduced
|
|
in Python 2.3. Based on that implementation, the set and frozenset
|
|
types were introduced in Python 2.4. The improvements are:
|
|
|
|
* Better hash algorithm for frozensets
|
|
* More compact pickle format (storing only an element list
|
|
instead of a dictionary of key:value pairs where the value
|
|
is always True).
|
|
* Use a ``__reduce__`` function so that deep copying is automatic.
|
|
* The BaseSet concept was eliminated.
|
|
* The ``union_update()`` method became just ``update()``.
|
|
* Auto-conversion between mutable and immutable sets was dropped.
|
|
* The ``_repr`` method was dropped (the need is met by the new
|
|
``sorted()`` built-in function).
|
|
|
|
Tim Peters believes that the class's constructor should take a
|
|
single sequence as an argument, and populate the set with that
|
|
sequence's elements. His argument is that in most cases,
|
|
programmers will be creating sets from pre-existing sequences, so
|
|
that this case should be the common one. However, this would
|
|
require users to remember an extra set of parentheses when
|
|
initializing a set with known values::
|
|
|
|
>>> Set((1, 2, 3, 4)) # case 1
|
|
|
|
On the other hand, feedback from a small number of novice Python
|
|
users (all of whom were very experienced with other languages)
|
|
indicates that people will find a "parenthesis-free" syntax more
|
|
natural::
|
|
|
|
>>> Set(1, 2, 3, 4) # case 2
|
|
|
|
Ultimately, we adopted the first strategy in which the initializer
|
|
takes a single iterable argument.
|
|
|
|
|
|
Mutability
|
|
==========
|
|
|
|
The most difficult question to resolve in this proposal was
|
|
whether sets ought to be able to contain mutable elements. A
|
|
dictionary's keys must be immutable in order to support fast,
|
|
reliable lookup. While it would be easy to require set elements
|
|
to be immutable, this would preclude sets of sets (which are
|
|
widely used in graph algorithms and other applications).
|
|
|
|
Earlier drafts of PEP 218 had only a single set type, but the
|
|
sets.py implementation in Python 2.3 has two, Set and
|
|
ImmutableSet. For Python 2.4, the new built-in types were named
|
|
set and frozenset which are slightly less cumbersome.
|
|
|
|
There are two classes implemented in the "sets" module. Instances
|
|
of the Set class can be modified by the addition or removal of
|
|
elements, and the ImmutableSet class is "frozen", with an
|
|
unchangeable collection of elements. Therefore, an ImmutableSet
|
|
may be used as a dictionary key or as a set element, but cannot be
|
|
updated. Both types of set require that their elements are
|
|
immutable, hashable objects. Parallel comments apply to the "set"
|
|
and "frozenset" built-in types.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the Public Domain.
|
|
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
End:
|