Updating PEP to reflect prototype implementation
This commit is contained in:
parent
9ba47180ba
commit
125ca2410c
212
pep-0218.txt
212
pep-0218.txt
|
@ -1,27 +1,30 @@
|
|||
PEP: 218
|
||||
Title: Adding a Built-In Set Object Type
|
||||
Version: $Revision$
|
||||
Author: gvwilson@nevex.com (Greg Wilson)
|
||||
Author: gvwilson@ddj.com (Greg Wilson)
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Python-Version: 2.1
|
||||
Python-Version: 2.2
|
||||
Created: 31-Jul-2000
|
||||
Post-History:
|
||||
|
||||
|
||||
Introduction
|
||||
|
||||
This PEP proposes adding sets as a built-in type in Python.
|
||||
This PEP proposes adding a Set module to the standard Python
|
||||
library, and to then make sets a built-in Python type if that
|
||||
module is widely used. After explaining why sets are desirable,
|
||||
and why the common idiom of using dictionaries in their place is
|
||||
inadequate, we describe how we intend built-in sets to work, and
|
||||
then how the preliminary Set module will behave. The penultimate
|
||||
section discusses the mutability (or otherwise) of sets and set
|
||||
elements, and the solution which the Set module will implement.
|
||||
The last section then looks at alternatives that were considered,
|
||||
but discarded.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
One of Python's greatest strengths as a teaching language is its
|
||||
clarity. Its syntax and object model are so clean, and so simple,
|
||||
that it can serve as "executable pseudocode". Anything that makes
|
||||
it even better suited for this role will help increase its use in
|
||||
school and college courses.
|
||||
|
||||
Sets are a fundamental mathematical structure, and are very
|
||||
commonly used in algorithm specifications. They are much less
|
||||
frequently used in implementations, even when they are the "right"
|
||||
|
@ -42,19 +45,21 @@ Rationale
|
|||
dictionaries containing key/value pairs.
|
||||
|
||||
|
||||
Proposal
|
||||
Long-Term Proposal
|
||||
|
||||
We propose adding a set type to Python. This type will be an
|
||||
unordered collection of unique values, just as a dictionary is an
|
||||
unordered collection of key/value pairs. Constant sets will be
|
||||
represented using the usual mathematical notation, so that
|
||||
"{1, 2, 3}" will be a set of three integers.
|
||||
The long-term goal of this PEP is to add a built-in set type to
|
||||
Python. This type will be an unordered collection of unique
|
||||
values, just as a dictionary is an unordered collection of
|
||||
key/value pairs. Constant sets will be represented using the
|
||||
usual mathematical notation, so that "{1, 2, 3}" will be a set of
|
||||
three integers.
|
||||
|
||||
In order to avoid ambiguity, the empty set will be written "{,}",
|
||||
In order to avoid ambiguity, the empty set will be written "{-}",
|
||||
rather than "{}" (which is already used to represent empty
|
||||
dictionaries). We feel that this notation is as reasonable as the
|
||||
use of "(3,)" to represent single-element tuples; a more radical
|
||||
strategy is discussed in the "Alternatives" section.
|
||||
strategy is discussed in the "Alternatives" section, and more
|
||||
readable than the earlier proposal "{,}".
|
||||
|
||||
Iteration and comprehension will be implemented in the obvious
|
||||
ways, so that:
|
||||
|
@ -66,106 +71,119 @@ Proposal
|
|||
{x**2 for x in S}
|
||||
|
||||
will produce a set containing the squares of all elements in S,
|
||||
Membership will be tested using "in" and "not in", and basic set
|
||||
operations will be implemented by a mixture of overloaded
|
||||
operators:
|
||||
|
||||
Membership will be tested using "in" and "not in".
|
||||
| union
|
||||
& intersection
|
||||
^ symmetric difference
|
||||
- asymmetric difference
|
||||
|
||||
The binary operators '|', '&', '-', and "^" will implement set
|
||||
union, intersection, difference, and symmetric difference. Their
|
||||
in-place equivalents will have the obvious semantics. (We feel
|
||||
that it is more sensible to overload the bitwise operators '|' and
|
||||
'&', rather than the arithmetic operators '+' and "*', because
|
||||
there is no arithmetic equivalent of '^'.)
|
||||
and methods:
|
||||
|
||||
The method "add" will add an element to a set. This is different
|
||||
from set union, as the following example shows:
|
||||
S.add(x) Add "x" to the set.
|
||||
|
||||
>>> {1, 2, 3} | {4, 5, 6}
|
||||
{1, 2, 3, 4, 5, 6}
|
||||
S.update(s) Add all elements of sequence "s" to the set.
|
||||
|
||||
>>> {1, 2, 3}.add({4, 5, 6})
|
||||
{1, 2, 3, {4, 5, 6}}
|
||||
S.remove(x) Remove "x" from the set. If "x" is not
|
||||
present, this method raises a LookupError
|
||||
exception.
|
||||
|
||||
Note that we expect that items can also be added to sets using
|
||||
in-place union of temporaries, i.e. "S |= {x}" instead of
|
||||
"S.add(x)".
|
||||
S.discard(x) Remove "x" from the set if it is present, or
|
||||
do nothing if it is not.
|
||||
|
||||
Elements will be deleted from sets using a "remove" method, or
|
||||
using "del":
|
||||
S.popitem() Remove and return an arbitrary element,
|
||||
raising a LookupError if the element is not
|
||||
present.
|
||||
|
||||
>>> S = {1, 2, 3}
|
||||
>>> S.remove(3)
|
||||
>>> S
|
||||
{1, 2}
|
||||
>>> del S[1]
|
||||
>>> S
|
||||
{2}
|
||||
and one new built-in conversion function:
|
||||
|
||||
The "KeyError" exception will be raised if an attempt is made to
|
||||
remove an element which is not in a set. This definition of "del"
|
||||
is consistent with that used for dictionaries:
|
||||
set(x) Create a set containing the elements of the
|
||||
collection "x".
|
||||
|
||||
>>> D = {1:2, 3:4}
|
||||
>>> del D[1]
|
||||
>>> D
|
||||
{3:4}
|
||||
Notes:
|
||||
|
||||
A new method "dict.keyset" will return the keys of a dictionary as
|
||||
a set. A corresponding method "dict.valueset" will return the
|
||||
dictionary's values as a set.
|
||||
1. We propose using the bitwise operators "|&" for intersection
|
||||
and union. While "+" for union would be intuitive, "*" for
|
||||
intersection is not (very few of the people asked guessed what
|
||||
it did correctly).
|
||||
|
||||
A built-in converter "set()" will convert any sequence type to a
|
||||
set; converters such as "list()" and "tuple()" will be extended to
|
||||
handle sets as input.
|
||||
2. We considered using "+" to add elements to a set, rather than
|
||||
"add". However, Guido van Rossum pointed out that "+" is
|
||||
symmetric for other built-in types (although "*" is not). Use
|
||||
of "add" will also avoid confusion between that operation and
|
||||
set union.
|
||||
|
||||
3. Sets raise "LookupError" exceptions, rather than "KeyError" or
|
||||
"ValueError", because set elements are neither keys nor values.
|
||||
|
||||
Short-Term Proposal
|
||||
|
||||
In order to determine whether there is enough demand for sets to
|
||||
justify making them a built-in type, and to give users a chance to
|
||||
try out the semantics we propose for sets, our short-term proposal
|
||||
is to add a "Set" class to the standard Python library. This
|
||||
class will have the operators and methods described above; it will
|
||||
also have named methods corresponding to all of the operations: a
|
||||
"union" method for "|", and a "union_update" method for "|=", and
|
||||
so on.
|
||||
|
||||
This class will use a dictionary internally to contain set values.
|
||||
In order to avoid having to duplicate values (e.g. for iteration
|
||||
through the set), the class will rely on the iterators which are
|
||||
scheduled to appear in Python 2.2.
|
||||
|
||||
Tim Peters believes that the class's constructor should take a
|
||||
single sequence as an argument, and populate the set with that
|
||||
sequence's elements. His argument is that in most cases,
|
||||
programmers will be created sets from pre-existing sequences, so
|
||||
that common case should be usual. However, this would require
|
||||
users to remember an extra set of parentheses when initializing a
|
||||
set with known values:
|
||||
|
||||
>>> Set((1, 2, 3, 4)) # case 1
|
||||
|
||||
On the other hand, feedback from a small number of novice Python
|
||||
users (all of whom were very experienced with other languages)
|
||||
indicates that people will find a "parenthesis-free" syntax more
|
||||
natural:
|
||||
|
||||
>>> Set(1, 2, 3, 4) # case 2
|
||||
|
||||
On the other, other hand, if Python does adopt a dictionary-like
|
||||
notation for sets in the future, then case 2 will become
|
||||
redundant. We have therefore adopted the first strategy, in which
|
||||
the initializer takes a single sequence argument.
|
||||
|
||||
|
||||
Open Issues
|
||||
Mutability
|
||||
|
||||
One major issue remains to be resolved: will sets be allowed to
|
||||
contain mutable values, or will their values be required to
|
||||
immutable (as dictionary keys are)? The disadvantages of allowing
|
||||
only immutable values are clear --- if nothing else, it would
|
||||
prevent users from creating sets of sets.
|
||||
The most difficult question to resolve in this proposal was
|
||||
whether sets ought to be able to contain mutable elements. A
|
||||
dictionary's keys must be immutable in order to support fast,
|
||||
reliable lookup. While it would be easy to require set elements
|
||||
to be immutable, this would preclude sets of sets (which are
|
||||
widely used in graph algorithms and other applications).
|
||||
|
||||
However, no efficient implementation of sets of mutable values has
|
||||
yet been suggested. Hashing approaches will obviously fail (which
|
||||
is why mutable values are not allowed to be dictionary keys).
|
||||
Even simple-minded implementations, such as storing the set's
|
||||
values in a list, can give incorrect results, as the following
|
||||
example shows:
|
||||
|
||||
>>> a = [1, 2]
|
||||
>>> b = [3, 4]
|
||||
>>> S = [a, b]
|
||||
>>> a[0:2] = [3, 4]
|
||||
>>> S
|
||||
[[3, 4], [3, 4]]
|
||||
|
||||
One way to solve this problem would be to add observer/observable
|
||||
functionality to every data structure in Python, so that
|
||||
structures would know to update themselves when their contained
|
||||
values mutated. This is clearly impractical given the current
|
||||
code base, and the performance penalties (in both memory and
|
||||
execution time) would probably be unacceptable anyway.
|
||||
At Tim Peters' suggestion, we will implement the following
|
||||
compromise. A set may only contain immutable elements, but is
|
||||
itself mutable *until* its hash code is calculated. As soon as
|
||||
that happens, the set is "frozen", i.e. becomes immutable. Thus,
|
||||
a set may be used as a dictionary key, or as a set element, but
|
||||
cannot be updated after this is done. Peters reports that this
|
||||
behavior rarely causes problems in practice.
|
||||
|
||||
|
||||
Alternatives
|
||||
|
||||
A more conservative alternative to this proposal would be to add a
|
||||
new built-in class "Set", rather than adding new syntax for direct
|
||||
expression of sets. On the positive side, this would not require
|
||||
any changes to the Python language definition. On the negative
|
||||
side, people would then not be able to write Python programs using
|
||||
the same notation as they would use on a whiteboard. We feel that
|
||||
the more Python supports standard pre-existing notation, the
|
||||
greater the chances of it being adopted as a teaching language.
|
||||
|
||||
A radical alternative to the (admittedly clumsy) notation "{,}" is
|
||||
to re-define "{}" to be the empty collection, rather than the
|
||||
empty dictionary. Operations which made this object non-empty
|
||||
would silently convert it to either a dictionary or a set; it
|
||||
would then retain that type for the rest of its existence. This
|
||||
idea was rejected because of its potential impact on existing
|
||||
Python programs. A similar proposal to modify "dict.keys" and
|
||||
An alternative to the notation "{-}" for the empty set would be to
|
||||
re-define "{}" to be the empty collection, rather than the empty
|
||||
dictionary. Operations which made this object non-empty would
|
||||
silently convert it to either a dictionary or a set; it would then
|
||||
retain that type for the rest of its existence. This idea was
|
||||
rejected because of its potential impact on existing Python
|
||||
programs. A similar proposal to modify "dict.keys" and
|
||||
"dict.values" to return sets, rather than lists, was rejected for
|
||||
the same reasons.
|
||||
|
||||
|
|
Loading…
Reference in New Issue