python-peps/pep-0218.txt

127 lines
4.2 KiB
Plaintext
Raw Normal View History

PEP: 218
Title: Adding a Built-In Set Object Type
Version: $Revision$
Author: gvwilson@nevex.com (Greg Wilson)
Status: Draft
2000-08-23 01:51:13 -04:00
Type: Standards Track
Python-Version: 2.1
Created: 31-Jul-2000
Post-History:
Introduction
This PEP proposes adding sets as a built-in type in Python.
Rationale
Sets are a fundamental mathematical structure, and are commonly
used to specify algorithms. They are much less frequently used in
implementations, even when they are the "right" structure.
Programmers frequently use lists instead, even when the ordering
information in lists is irrelevant, and by-value lookups are
frequent. (Most medium-sized C programs contain a depressing
number of start-to-end searches through malloc'd vectors to
determine whether particular items are present or not...)
Programmers are often told that they can implement sets as
dictionaries with "don't care" values. Items can be added to
these "sets" by assigning the "don't care" value to them;
membership can be tested using "dict.has_key"; and items can be
deleted using "del". However, the three main binary operations
on sets --- union, intersection, and difference --- are not
directly supported by this representation, since their meaning is
ambiguous for dictionaries containing key/value pairs.
Proposal
We propose adding a new built-in type to Python to represent sets.
This type will be an unordered collection of unique values, just
as a dictionary is an unordered collection of key/value pairs.
Constant sets will be represented using the usual mathematical
notation, so that "{1, 2, 3}" will be a set of three integers.
In order to avoid ambiguity, the empty set will be written "{,}",
rather than "{}" (which is already used to represent empty
dictionaries). We feel that this notation is as reasonable as the
use of "(3,)" to represent single-element tuples; a more radical
alternative is discussed in the "Alternatives" section.
Iteration and comprehension will be implemented in the obvious
ways, so that:
for x in S:
will step through the elements of S in arbitrary order, while:
{x**2 for x in S}
will produce a set containing the squares of all elements in S,
Membership will be tested using "in" and "not in".
The binary operators '|', '&', '-', and "^" will implement set
union, intersection, difference, and symmetric difference. Their
in-place equivalents will have the obvious semantics.
The method "add" will add an element to a set. This is different
from set union, as the following example shows:
>>> {1, 2, 3} | {4, 5, 6}
{1, 2, 3, 4, 5, 6}
>>> {1, 2, 3}.add({4, 5, 6})
{1, 2, 3, {4, 5, 6}}
Note that we expect that items can also be added to sets using
in-place union of temporaries, i.e. "S |= {x}" instead of
"S.add(x)".
Elements will be deleted from sets using a "remove" method, or
using "del":
>>> S = {1, 2, 3}
>>> del S[1]
>>> S
{2, 3}
>>> S.remove(3)
{2}
The "KeyError" exception will be raised if an attempt is made to
remove an element which is not in a set.
A new method "dict.keyset" will return the keys of a dictionary as
a set. A corresponding method "dict.valueset" will return the
dictionary's values as a set.
A built-in converter "set()" will convert any sequence type to a
set; converters such as "list()" and "tuple()" will be extended to
handle sets as input.
Alternatives
A radical alternative to the (admittedly clumsy) notation "{,}" is
to re-define "{}" to be the empty collection, rather than the
empty dictionary. Operations which made this object non-empty
would silently convert it to either a dictionary or a set; it
would then retain that type for the rest of its existence. This
idea was rejected because of its potential impact on existing
Python programs. A similar proposal to modify "dict.keys" and
"dict.values" to return sets, rather than lists, was rejected for
the same reasons.
Copyright
This document has been placed in the Public Domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: