Clarified lots of issues, added more open issues.

Guido van Rossum 2006-12-23 05:22:30 +00:00
parent 6006113241
commit 9cf994cde3
1 changed file with 152 additions and 51 deletions

@@ -15,9 +15,10 @@ Abstract
 
 This PEP proposes to change the .keys(), .values() and .items()
 methods of the built-in dict type to return a set-like or
-multiset-like object whose contents are derived of the underlying
-dictionary rather than a list which is a copy of the keys, etc.; and
-to remove the .iterkeys(), .itervalues() and .iteritems() methods.
+multiset-like (== bag-like) object whose contents are derived of the
+underlying dictionary rather than a list which is a copy of the keys,
+etc.; and to remove the .iterkeys(), .itervalues() and .iteritems()
+methods.
 
 The approach is inspired by that taken in the Java Collections
 Framework [1]_.
@@ -31,40 +32,43 @@ lightweight object than a list, and to get rid of .iterkeys(),
 .itervalues() and .iteritems(). The idea is that code that currently
 (in 2.x) reads::
 
-    for x in d.iterkeys(): ...
+    for k, v in d.iteritems(): ...
 
 should be rewritten as::
 
-    for x in d.keys(): ...
+    for k, v in d.items(): ...
 
-and code that currently reads::
+(and similar for .itervalues() and .iterkeys(), except the latter is
+redundant since we can write that loop as ``for k in d``.)
+
+Code that currently reads::
 
     a = d.keys() # assume we really want a list here
 
-should be rewritten as
+(etc.) should be rewritten as
 
     a = list(d.keys())
 
 There are (at least) two ways to accomplish this. The original plan
 was to simply let .keys(), .values() and .items() return an iterator,
-i.e. exactly what iterkeys(), itervalues() and iteritems() return
-in Python 2.x. However, the Java Collections Framework [1]_ suggests
+i.e. exactly what iterkeys(), itervalues() and iteritems() return in
+Python 2.x. However, the Java Collections Framework [1]_ suggests
 that a better solution is possible: the methods return objects with
-set behavior (for .keys() and .items()) or multiset behavior (for
-.values()) that do not contain copies of the keys, values or items,
-but rather reference the underlying dict and pull their values out of
-the dict as needed.
+set behavior (for .keys() and .items()) or multiset (== bag) behavior
+(for .values()) that do not contain copies of the keys, values or
+items, but rather reference the underlying dict and pull their values
+out of the dict as needed.
 
 The advantage of this approach is that one can still write code like
 this::
 
-    a = d.keys()
-    for x in a: ...
-    for x in a: ...
+    a = d.items()
+    for k, v in a: ...
+    for k, v in a: ...
 
-Effectively, iter(d.keys()) in Python 3.0 does what d.iterkeys() does
-in Python 2.x; but in most contexts we don't have to write the iter()
-call because it is implied by a for-loop.
+Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what
+d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don't
+have to write the iter() call because it is implied by a for-loop.
 
 The objects returned by the .keys() and .items() methods behave like
 sets with limited mutability; they allow removing elements, but not
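The rewrite pattern described in this hunk can be tried on a Python 3 interpreter. The snippet below is only an illustrative sketch: the variable names are made up for the example, and it assumes insertion-ordered dicts (Python 3.7 or later).

```python
d = {"a": 1, "b": 2}

# Python 2 spelling:  for k, v in d.iteritems(): ...
# Python 3 spelling:  iterate over the view returned by .items().
items = d.items()
first_pass = [k for k, v in items]
second_pass = [k for k, v in items]  # a view can be iterated more than once

# When a real list is wanted, ask for it explicitly.
key_list = list(d.keys())

# The view follows later changes to the dict; the list is a snapshot.
d["c"] = 3
```

After the last line, `items` reports three entries while `key_list` still holds the two keys that existed when it was built.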
@@ -83,9 +87,9 @@ dicts have the same keys by simply testing::
     if a.keys() == b.keys(): ...
 
 and similarly for values. (Two multisets are deemed equal if they
-have the same elements with the same cardinalities,
-e.g. the multiset {1, 2, 2} is equal to the multiset {2, 1, 2} but
-differs from the multiset {1, 2}.)
+have the same elements with the same cardinalities, e.g. the multiset
+{1, 2, 2} is equal to the multiset {2, 1, 2} but differs from the
+multiset {1, 2}.)
 
 These operations are thread-safe only to the extent that using them in
 a thread-unsafe way may cause an exception but will not cause
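The multiset equality rule quoted in this hunk can be sketched in a few lines of Python 3. The helper name `multiset_equal` is hypothetical; it follows the remove-from-a-list strategy that the d_values.__eq__ pseudo-code in this commit uses, so it relies only on `==` and also works for unhashable elements.

```python
def multiset_equal(xs, ys):
    """Return True if xs and ys hold the same elements with the same counts."""
    remaining = list(ys)
    for x in xs:
        try:
            # Cross off one matching element; list.remove() compares with ==.
            remaining.remove(x)
        except ValueError:
            return False  # x occurs more often in xs than in ys
    # Anything left over occurs more often in ys than in xs.
    return not remaining
```

This is quadratic in the worst case, which is the price of not requiring the elements to be hashable or orderable.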
@@ -108,11 +112,13 @@ simultaneously by another thread).
 Specification
 =============
 
-I'll try pseudo-code to specify the semantics::
+I'm using pseudo-code to specify the semantics::
 
     class dict:
 
-        # Omitting all other dict methods for brevity
+        # Omitting all other dict methods for brevity.
+        # The .iterkeys(), .itervalues() and .iteritems() methods
+        # will be removed.
 
         def keys(self):
             return d_keys(self)
@@ -151,9 +157,6 @@ I'll try pseudo-code to specify the semantics::
         def clear(self):
             self.__d.clear()
 
-        def copy(self):
-            return set(self)
-
         # The following operations should be implemented to be
         # compatible with sets; this can be done by exploiting
         # the above primitive operations:
@@ -163,6 +166,14 @@ I'll try pseudo-code to specify the semantics::
         # &=, -= (updating in place and returning self; but not |=, ^=)
         #
         # as well as their method counterparts (.union(), etc.).
+        #
+        # To specify the semantics, we can specify x == y as:
+        #
+        #   set(x) == set(y)   if both x and y are d_keys instances
+        #   set(x) == y        if x is a d_keys instance
+        #   x == set(y)        if y is a d_keys instance
+        #
+        # and so on for all other operations.
 
     class d_items:
 
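The equality rule added for d_keys in the hunk above (compare as if both sides were converted with set()) matches how keys views behave in Python 3 as later released. A short check, with example dicts invented for illustration:

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "x": 10}

# Two keys views compare like set(a) == set(b); values and order are ignored.
same_keys = a.keys() == b.keys()

# A keys view also compares against a plain set, in either operand order.
equals_set = a.keys() == {"x", "y"}
set_equals_view = {"x", "y"} == a.keys()

# Binary set operators on a view produce an ordinary set.
common = a.keys() & {"y", "z"}
```

Note that this only demonstrates the non-mutating operations; it does not exercise the in-place operators (&=, -=) listed in the pseudo-code.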
@@ -180,9 +191,13 @@ I'll try pseudo-code to specify the semantics::
                 yield key, self.__d[key]
 
         def remove(self, (key, value)):
+            if (key, value) not in self:
+                raise KeyError((key, value))
             del self.__d[key]
 
         def discard(self, item):
+            # Defined in terms of 'in' and .remove() so overriding
+            # those will update discard appropriately.
             if item in self:
                 self.remove(item)
 
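The remove()/discard() pair from this hunk can be made runnable on Python 3, which no longer accepts tuple parameters such as `(key, value)` in a signature. The class below is a hypothetical stand-in, not the proposed d_items type, and its `__contains__` is an assumption, since that method lies outside the lines shown in this diff.

```python
class ItemsSketch:
    """Illustrative items-like wrapper around a dict (hypothetical name)."""

    def __init__(self, d):
        self._d = d

    def __contains__(self, item):
        # Assumed membership rule: the key exists and maps to an equal value.
        key, value = item
        return key in self._d and self._d[key] == value

    def remove(self, item):
        # Raise if the exact (key, value) pair is absent, as in the hunk.
        if item not in self:
            raise KeyError(item)
        key, _ = item
        del self._d[key]

    def discard(self, item):
        # Defined via 'in' and .remove(), so overriding those changes discard.
        if item in self:
            self.remove(item)
```

With this sketch, discarding `("a", 99)` from a dict that maps `"a"` to `1` leaves the dict unchanged, whereas removing it raises KeyError.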
@@ -192,10 +207,55 @@ I'll try pseudo-code to specify the semantics::
         def clear(self):
             self.__d.clear()
 
-        def copy(self):
-            return set(self)
-
-        # As well as the same set operations as mentioned for d_keys above.
+        # As well as the set operations mentioned for d_keys above.
+        # However the specifications suggested there will not work if
+        # the values aren't hashable. Fortunately, the operations can
+        # still be implemented efficiently. For example, this is how
+        # intersection can be specified:
+
+        def __and__(self, other):
+            if isinstance(other, (set, frozenset, d_keys)):
+                result = set()
+                for item in other:
+                    if item in self:
+                        result.add(item)
+                return result
+            if not isinstance(other, d_items):
+                return NotImplemented
+            d = {}
+            if len(other) < len(self):
+                self, other = other, self
+            for item in self:
+                if item in other:
+                    key, value = item
+                    d[key] = value
+            return d.items()
+
+        # And here is equality:
+
+        def __eq__(self, other):
+            if isinstance(other, (set, frozenset, d_keys)):
+                if len(self) != len(other):
+                    return False
+                for item in other:
+                    if item not in self:
+                        return False
+                return True
+            if not isinstance(other, d_items):
+                return NotImplemented
+            if len(self) != len(other):
+                return False
+            for item in self:
+                if item not in other:
+                    return False
+            return True
+
+        def __ne__(self, other):
+            # XXX Perhaps object.__ne__() should be defined this way.
+            result = self.__eq__(other)
+            if result is not NotImplemented:
+                result = not result
+            return result
 
     class d_values:
 
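The point of the d_items.__and__ pseudo-code above is that intersecting items must not require hashable values. A minimal Python 3 sketch of the same idea follows; the function name `items_intersection` is hypothetical, and it returns a plain dict rather than an items view to keep the example short.

```python
def items_intersection(a, b):
    """Return a dict of the (key, value) pairs present in both dicts."""
    # Walk the smaller dict, as the pseudo-code does by swapping operands.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    # Only keys are hashed; values are compared with ==.
    return {k: v for k, v in small.items() if k in large and large[k] == v}
```

Because values are never placed in a set, list-valued dicts work too, for example `items_intersection({"x": [1]}, {"x": [1], "y": [2]})`.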
@@ -206,7 +266,9 @@ I'll try pseudo-code to specify the semantics::
             return len(self.__d)
 
         def __contains__(self, value):
-            # Slow! Do we even want to implement this?
+            # This is slow, and it's what "x in y" uses as a fallback
+            # if __contains__ is not defined; but I'd rather make it
+            # explicit that it is supported.
             for v in self:
                 if v == value:
                     return True
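The comment in this hunk refers to the fallback that Python applies to `x in y` when a class defines `__iter__` but not `__contains__`. The hypothetical class below shows that fallback in Python 3; it is a sketch for illustration, not the proposed d_values type.

```python
class ValuesSketch:
    """Values-like wrapper that defines only __iter__ (hypothetical name)."""

    def __init__(self, d):
        self._d = d

    def __iter__(self):
        for key in self._d:
            yield self._d[key]


vals = ValuesSketch({"a": 1, "b": 2})

# No __contains__ is defined, so 'in' iterates and compares with ==.
found = 2 in vals
missing = 5 in vals
```

Defining `__contains__` explicitly, as the commit does, yields the same results while documenting that membership tests are supported.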
@@ -216,32 +278,71 @@ I'll try pseudo-code to specify the semantics::
             for key in self.__d:
                 yield self.__d[key]
 
-        # Do we care about the following?
+        def __eq__(self, other):
+            if not isinstance(other, d_values):
+                return NotImplemented
+            if len(self) != len(other):
+                return False
+            # XXX Sometimes this could be optimized, but these are the
+            # semantics: we can't depend on the values to be hashable
+            # or comparable.
+            o = list(other)
+            for x in self:
+                try:
+                    o.remove(x)
+                except ValueError:
+                    return False
+            return True
 
-        def pop(self):
-            return self.__d.popitem()[1]
+        def __ne__(self, other):
+            result = self.__eq__(other)
+            if result is not NotImplemented:
+                result = not result
+            return result
 
-        def clear(self):
-            return self.__d.clear()
-
-        def copy(self):
-            # XXX What should this return?
-
-        # Should we bother implementing set-like operations on
-        # multisets? If so, how about mixed operations on sets and
-        # multisets? I'm not sure that these are worth the effort.
-
-I'm soliciting better names than d_keys, d_values and d_items; these
-classes will be public so that their implementations may be reused by
-the .keys(), .values() and .items() methods of other mappings. (Or
-should they?)
+Note that we don't implement .copy() -- the presence of a .copy()
+method suggests that the copy has the same type as the original, but
+that's not feasible without copying the underlying dict. If you want
+a copy of a specific type, like list or set, you can just pass one
+of the above to the list() or set() constructor.
 
 
 Open Issues
 ===========
 
-Should the d_keys, d_values and d_items classes be reusable? Should
-they be subclassable?
+I've left out the implementation of various set operations. These
+could still present surprises.
+
+Should d_values have mutating methods (pop(), clear())? Strawman: no.
+
+Should d_values implement set operations (as defined for multisets).
+Strawman: no.
+
+Should d_keys, d_values and d_items have a public instance variable or
+method through which one can retrieve the underlying dict? Strawman:
+yes (but what should it be called?).
+
+I'm soliciting better names than d_keys, d_values and d_items. These
+classes could be public so that their implementations could be reused
+by the .keys(), .values() and .items() methods of other mappings. Or
+should they?
+
+Should the d_keys, d_values and d_items classes be reusable?
+Strawman: yes.
+
+Should they be subclassable? Strawman: yes (but see below).
+
+A particularly nasty issue is whether operations that are specified in
+terms of other operations (e.g. .discard()) must really be implemented
+in terms of those other operations; this may appear irrelevant but it
+becomes relevant if these classes are ever subclassed. Historically,
+Python has a really poor track record of specifying the semantics of
+highly optimized built-in types clearly in such cases; my strawman is
+to continue that trend. Subclassing may still be useful to *add* new
+methods, for example.
+
+I'll leave the decisions (especially about naming) up to whoever
+submits a working implementation.
 
 
 References