Update and finalize PEP 218 (builtin set types):

* List the additional methods and operators that are supported.
* List differences between sets.py and the built-in types.
* Mark the {-} syntax as rejected by Guido until Python 3000.
* Note that genexps make set comprehensions moot.
* Mark the pep as final and implemented.
Raymond Hettinger 2004-08-27 20:28:58 +00:00
parent a584e413a0
commit d324b2fcff
2 changed files with 60 additions and 72 deletions


@ -73,7 +73,6 @@ Index by Category
I 206 2.0 Batteries Included Zadka
S 209 Adding Multidimensional Arrays Barrett, Oliphant
S 215 String Interpolation Yee
S 218 Adding a Built-In Set Object Type Wilson
S 228 Reworking Python's Numeric Model Zadka, GvR
S 237 Unifying Long Integers and Integers Zadka, GvR
S 239 Adding a Rational Type to Python Craig, Zadka
@ -137,6 +136,7 @@ Index by Category
SF 208 Reworking the Coercion Model Schemenauer, Lemburg
SF 214 Extended Print Statement Warsaw
SF 217 Display Hook for Interactive Use Zadka
SF 218 Adding a Built-In Set Object Type Wilson, Hettinger
SF 221 Import As Wouters
SF 223 Change the Meaning of \x Escapes Peters
I 226 Python 2.1 Release Schedule Hylton
@ -247,7 +247,7 @@ Numerical Index
SD 215 String Interpolation Yee
IR 216 Docstring Format Zadka
SF 217 Display Hook for Interactive Use Zadka
S 218 Adding a Built-In Set Object Type Wilson
SF 218 Adding a Built-In Set Object Type Wilson, Hettinger
SD 219 Stackless Python McMillan
ID 220 Coroutines, Generators, Continuations McMillan
SF 221 Import As Wouters


@ -1,8 +1,8 @@
PEP: 218
Title: Adding a Built-In Set Object Type
Version: $Revision$
Author: gvwilson@ddj.com (Greg Wilson)
Status: Draft
Author: gvwilson at ddj.com (Greg Wilson), python at rcn.com (Raymond Hettinger)
Status: Final
Type: Standards Track
Python-Version: 2.2
Created: 31-Jul-2000
@ -16,11 +16,9 @@ Introduction
module is widely used. After explaining why sets are desirable,
and why the common idiom of using dictionaries in their place is
inadequate, we describe how we intend built-in sets to work, and
then how the preliminary Set module will behave. The penultimate
then how the preliminary Set module will behave. The last
section discusses the mutability (or otherwise) of sets and set
elements, and the solution which the Set module will implement.
The last section then looks at alternatives that were considered,
but discarded.
Rationale
@ -45,21 +43,12 @@ Rationale
dictionaries containing key/value pairs.
Long-Term Proposal
Proposal
The long-term goal of this PEP is to add a built-in set type to
Python. This type will be an unordered collection of unique
values, just as a dictionary is an unordered collection of
key/value pairs. Constant sets will be represented using the
usual mathematical notation, so that "{1, 2, 3}" will be a set of
three integers.
In order to avoid ambiguity, the empty set will be written "{-}",
rather than "{}" (which is already used to represent empty
dictionaries). We feel that this notation is as reasonable as the
use of "(3,)" to represent single-element tuples; a more radical
strategy is discussed in the "Alternatives" section, and more
readable than the earlier proposal "{,}".
key/value pairs.
Iteration and comprehension will be implemented in the obvious
ways, so that:
@ -68,7 +57,7 @@ Long-Term Proposal
will step through the elements of S in arbitrary order, while:
{x**2 for x in S}
set(x**2 for x in S)
will produce a set containing the squares of all elements in S,
Membership will be tested using "in" and "not in", and basic set
@ -79,6 +68,9 @@ Long-Term Proposal
& intersection
^ symmetric difference
- asymmetric difference
== != equality and inequality tests
< <= >= > subset and superset tests
and methods:
@ -99,11 +91,21 @@ Long-Term Proposal
S.clear() Remove all elements from this set.
and one new built-in conversion function:
S.copy() Make a new set.
S.issuperset() Check for a superset relationship.
S.issubset() Check for a subset relationship.
and two new built-in conversion functions:
set(x) Create a set containing the elements of the
collection "x".
frozenset(x) Create an immutable set containing the elements
of the collection "x".
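As a rough illustration of the operators, methods, and conversion functions listed above, the following sketch uses the built-in types as they shipped in Python 2.4. The names a, b, and frozen are illustrative only and do not appear in the PEP.

```python
# Illustrative values only; a, b, and frozen are hypothetical names.
a = set([1, 2, 3])
b = set([3, 4])

union = a | b                  # elements in either set
intersection = a & b           # elements in both sets
symmetric_difference = a ^ b   # elements in exactly one set
difference = a - b             # elements in a but not in b

# Comparison operators test subset and superset relationships.
is_proper_subset = set([1, 2]) < a

# The named methods accept any iterable, not only sets.
is_superset = a.issuperset([1, 2])
is_subset = set([3]).issubset(b)

# frozenset() builds an immutable set; equality compares elements only.
frozen = frozenset(a)
same_elements = (frozen == a)
```

Note that the operator forms require both operands to be sets, while the method forms are more permissive about their argument.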
Notes:
1. We propose using the bitwise operators "|&" for intersection
@ -117,44 +119,39 @@ Long-Term Proposal
of "add" will also avoid confusion between that operation and
set union.
3. Sets raise "LookupError" exceptions, rather than "KeyError" or
"ValueError", because set elements are neither keys nor values.
Set Notation
The PEP originally proposed {1,2,3} as the set notation and {-} for
the empty set. Experience with Python 2.3's sets.py showed that
the notation was not necessary. Also, there was some risk of making
dictionaries less instantly recognizable.
It was also contemplated that the braced notation would support set
comprehensions; however, Python 2.4 provided generator expressions
which fully met that need and did so in a more general way.
(See PEP 289 for details on generator expressions.)
So, Guido ruled that there would not be a set syntax; however, the
issue could be revisited for Python 3000 (see PEP 3000).
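A minimal sketch of how a generator expression passed to set() stands in for the set comprehension that was originally contemplated. The contents of S are made up for illustration.

```python
# S is a hypothetical input set; both 3 and -3 are included on purpose.
S = set([1, 2, 3, -3])

# The generator expression is consumed by set(), so no intermediate
# list is built and duplicate results collapse into one element.
squares = set(x**2 for x in S)
```

Here 3 and -3 both square to 9, so the result has three elements rather than four.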
Open Issues for the Long-Term Proposal
History
Earlier drafts of PEP 218 had only a single set type, but the
sets.py implementation in Python 2.3 has two, Set and
ImmutableSet. The long-term proposal has a single built-in
conversion function, set(iterable); how should instances of a
built-in immutable set type be created? Possibilities include a
second immutable_set() built-in, or perhaps the set() function
could take an additional argument,
e.g. set(iterable, immutable=True)?
To gain experience with sets, a pure Python module was introduced
in Python 2.3. Based on that implementation, the set and frozenset
types were introduced in Python 2.4. The improvements are:
The PEP proposes {1,2,3} as the set notation and {-} for the empty
set. Would there be different syntax for an immutable and a
mutable set? Perhaps the built-in syntax would only be for
mutable sets, and an immutable set would be created from a mutable
set using the appropriate built-in function,
e.g. immutable_set({1,2,3}).
Short-Term Proposal
In order to determine whether there is enough demand for sets to
justify making them a built-in type, and to give users a chance to
try out the semantics we propose for sets, our short-term proposal
is to add a "Set" class to the standard Python library. This
class will have the operators and methods described above; it will
also have named methods corresponding to all of the operations: a
"union" method for "|", and a "union_update" method for "|=", and
so on.
This class will use a dictionary internally to contain set values.
To avoid having to duplicate values (e.g. for iteration through
the set), the class will rely on the iterators added in Python
2.2.
* Better hash algorithm for frozensets
* More compact pickle format (storing only an element list
instead of a dictionary of key:value pairs where the value
is always True).
* Use a __reduce__ function so that deep copying is automatic.
* The BaseSet concept was eliminated.
* The union_update() method became just update().
* Auto-conversion between mutable and immutable sets was dropped.
* The _repr method was dropped (the need is met by the new
sorted() built-in function).
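Some of the changes listed above can be observed directly from Python code. The sketch below is illustrative only, and the variable names are assumptions rather than anything taken from the PEP or from sets.py.

```python
import copy
import pickle

# update() is the built-in type's name for what sets.py called union_update().
s = set([1, 2])
s.update([2, 3])

# Deep copying and pickling work without extra support code,
# including for sets that contain frozensets.
nested = set([frozenset([1, 2]), frozenset([3])])
clone = copy.deepcopy(nested)
restored = pickle.loads(pickle.dumps(nested))

# sorted() gives a reproducible ordering when one is needed for display.
ordered = sorted(s)
```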
Tim Peters believes that the class's constructor should take a
single sequence as an argument, and populate the set with that
@ -173,10 +170,8 @@ Short-Term Proposal
>>> Set(1, 2, 3, 4) # case 2
On the other, other hand, if Python does adopt a dictionary-like
notation for sets in the future, then case 2 will become
redundant. We have therefore adopted the first strategy, in which
the initializer takes a single iterable argument.
Ultimately, we adopted the first strategy in which the initializer
takes a single iterable argument.
Mutability
@ -188,26 +183,19 @@ Mutability
to be immutable, this would preclude sets of sets (which are
widely used in graph algorithms and other applications).
Earlier drafts of PEP 218 had only a single set type, but the
sets.py implementation in Python 2.3 has two, Set and
ImmutableSet. For Python 2.4, the new built-in types were named
set and frozenset, which are slightly less cumbersome.
There are two classes implemented in the "sets" module. Instances
of the Set class can be modified by the addition or removal of
elements, and the ImmutableSet class is "frozen", with an
unchangeable collection of elements. Therefore, an ImmutableSet
may be used as a dictionary key or as a set element, but cannot be
updated. Both types of set require that their elements are
immutable, hashable objects.
Alternatives
An alternative to the notation "{-}" for the empty set would be to
re-define "{}" to be the empty collection, rather than the empty
dictionary. Operations which made this object non-empty would
silently convert it to either a dictionary or a set; it would then
retain that type for the rest of its existence. This idea was
rejected because of its potential impact on existing Python
programs. A similar proposal to modify "dict.keys" and
"dict.values" to return sets, rather than lists, was rejected for
the same reasons.
immutable, hashable objects. Parallel comments apply to the "set"
and "frozenset" built-in types.
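The distinction between the mutable and frozen types can be sketched with the built-ins as follows. The names inner, components, and labels are hypothetical and chosen only for this example.

```python
# A frozenset is hashable, so it can be a set element or a dictionary key.
inner = frozenset([1, 2])
components = set([inner, frozenset([3])])
labels = {inner: "component A"}

# A mutable set is not hashable, so it cannot be placed inside another set.
try:
    set([set([1, 2])])
except TypeError:
    mutable_rejected = True
else:
    mutable_rejected = False

# A frozenset has no mutating methods such as add().
try:
    inner.add(3)
except AttributeError:
    frozen_is_immutable = True
else:
    frozen_is_immutable = False
```

This is why sets of sets are written as sets of frozensets.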
Copyright