From 9cf994cde3555c56ab5c889c8affedbcbf7b4ab8 Mon Sep 17 00:00:00 2001
From: Guido van Rossum
Date: Sat, 23 Dec 2006 05:22:30 +0000
Subject: [PATCH] Clarified lots of issues, added more open issues.

---
 pep-3106.txt | 203 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 152 insertions(+), 51 deletions(-)

diff --git a/pep-3106.txt b/pep-3106.txt
index 982625da6..c2410b841 100644
--- a/pep-3106.txt
+++ b/pep-3106.txt
@@ -15,9 +15,10 @@ Abstract
 
 This PEP proposes to change the .keys(), .values() and .items()
 methods of the built-in dict type to return a set-like or
-multiset-like object whose contents are derived of the underlying
-dictionary rather than a list which is a copy of the keys, etc.; and
-to remove the .iterkeys(), .itervalues() and .iteritems() methods.
+multiset-like (== bag-like) object whose contents are derived from the
+underlying dictionary rather than a list which is a copy of the keys,
+etc.; and to remove the .iterkeys(), .itervalues() and .iteritems()
+methods.
 
 The approach is inspired by that taken in the Java Collections
 Framework [1]_.
@@ -31,40 +32,43 @@ lightweight object than a list, and to get rid of .iterkeys(),
 .itervalues() and .iteritems(). The idea is that code that currently
 (in 2.x) reads::
 
-    for x in d.iterkeys(): ...
+    for k, v in d.iteritems(): ...
 
 should be rewritten as::
 
-    for x in d.keys(): ...
+    for k, v in d.items(): ...
 
-and code that currently reads::
+(and similar for .itervalues() and .iterkeys(), except the latter is
+redundant since we can write that loop as ``for k in d``.)
+
+Code that currently reads::
 
     a = d.keys()    # assume we really want a list here
 
-should be rewritten as
+(etc.) should be rewritten as::
 
     a = list(d.keys())
 
 There are (at least) two ways to accomplish this. The original plan
 was to simply let .keys(), .values() and .items() return an iterator,
-i.e. exactly what iterkeys(), itervalues() and iteritems() return
-in Python 2.x. However, the Java Collections Framework [1]_ suggests
+i.e. exactly what iterkeys(), itervalues() and iteritems() return in
+Python 2.x. However, the Java Collections Framework [1]_ suggests
 that a better solution is possible: the methods return objects with
-set behavior (for .keys() and .items()) or multiset behavior (for
-.values()) that do not contain copies of the keys, values or items,
-but rather reference the underlying dict and pull their values out of
-the dict as needed.
+set behavior (for .keys() and .items()) or multiset (== bag) behavior
+(for .values()) that do not contain copies of the keys, values or
+items, but rather reference the underlying dict and pull their values
+out of the dict as needed.
 
 The advantage of this approach is that one can still write code like
 this::
 
-    a = d.keys()
-    for x in a: ...
-    for x in a: ...
+    a = d.items()
+    for k, v in a: ...
+    for k, v in a: ...
 
-Effectively, iter(d.keys()) in Python 3.0 does what d.iterkeys() does
-in Python 2.x; but in most contexts we don't have to write the iter()
-call because it is implied by a for-loop.
+Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what
+d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don't
+have to write the iter() call because it is implied by a for-loop.
 
 The objects returned by the .keys() and .items() methods behave like
 sets with limited mutability; they allow removing elements, but not
@@ -83,9 +87,9 @@ dicts have the same keys by simply testing::
 
     if a.keys() == b.keys(): ...
 
 and similarly for values. 
(Two multisets are deemed equal if they -have the same elements with the same cardinalities, -e.g. the multiset {1, 2, 2} is equal to the multiset {2, 1, 2} but -differs from the multiset {1, 2}.) +have the same elements with the same cardinalities, e.g. the multiset +{1, 2, 2} is equal to the multiset {2, 1, 2} but differs from the +multiset {1, 2}.) These operations are thread-safe only to the extent that using them in a thread-unsafe way may cause an exception but will not cause @@ -108,11 +112,13 @@ simultaneously by another thread). Specification ============= -I'll try pseudo-code to specify the semantics:: +I'm using pseudo-code to specify the semantics:: class dict: - # Omitting all other dict methods for brevity + # Omitting all other dict methods for brevity. + # The .iterkeys(), .itervalues() and .iteritems() methods + # will be removed. def keys(self): return d_keys(self) @@ -151,18 +157,23 @@ I'll try pseudo-code to specify the semantics:: def clear(self): self.__d.clear() - def copy(self): - return set(self) - # The following operations should be implemented to be # compatible with sets; this can be done by exploiting - # the above primitive operations: + # the above primitive operations: # # <, <=, ==, !=, >=, > (returning a bool) # &, |, ^, - (returning a new, real set object) # &=, -= (updating in place and returning self; but not |=, ^=) # # as well as their method counterparts (.union(), etc.). + # + # To specify the semantics, we can specify x == y as: + # + # set(x) == set(y) if both x and y are d_keys instances + # set(x) == y if x is a d_keys instance + # x == set(y) if y is a d_keys instance + # + # and so on for all other operations. class d_items: @@ -180,9 +191,13 @@ I'll try pseudo-code to specify the semantics:: yield key, self.__d[key] def remove(self, (key, value)): + if (key, value) not in self: + raise KeyError((key, value)) del self.__d[key] def discard(self, item): + # Defined in terms of 'in' and .remove() so overriding + # those will update discard appropriately. if item in self: self.remove(item) @@ -192,10 +207,55 @@ I'll try pseudo-code to specify the semantics:: def clear(self): self.__d.clear() - def copy(self): - return set(self) + # As well as the set operations mentioned for d_keys above. + # However the specifications suggested there will not work if + # the values aren't hashable. Fortunately, the operations can + # still be implemented efficiently. For example, this is how + # intersection can be specified: - # As well as the same set operations as mentioned for d_keys above. + def __and__(self, other): + if isinstance(other, (set, frozenset, d_keys)): + result = set() + for item in other: + if item in self: + result.add(item) + return result + if not isinstance(other, d_items): + return NotImplemented + d = {} + if len(other) < len(self): + self, other = other, self + for item in self: + if item in other: + key, value = item + d[key] = value + return d.items() + + # And here is equality: + + def __eq__(self, other): + if isinstance(other, (set, frozenset, d_keys)): + if len(self) != len(other): + return False + for item in other: + if item not in self: + return False + return True + if not isinstance(other, d_items): + return NotImplemented + if len(self) != len(other): + return False + for item in self: + if item not in other: + return False + return True + + def __ne__(self, other): + # XXX Perhaps object.__ne__() should be defined this way. 
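+            # That is: delegate to __eq__() and negate its result,
+            # but pass NotImplemented through unchanged so that the
+            # reflected operation on the other operand still gets a
+            # chance to handle the comparison.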
+            result = self.__eq__(other)
+            if result is not NotImplemented:
+                result = not result
+            return result
 
 
     class d_values:
 
@@ -206,7 +266,9 @@ I'll try pseudo-code to specify the semantics::
             return len(self.__d)
 
         def __contains__(self, value):
-            # Slow! Do we even want to implement this?
+            # This is slow, and it's what "x in y" uses as a fallback
+            # if __contains__ is not defined; but I'd rather make it
+            # explicit that it is supported.
             for v in self:
                 if v == value:
                     return True
@@ -216,32 +278,71 @@ I'll try pseudo-code to specify the semantics::
             for key in self.__d:
                 yield self.__d[key]
 
-        # Do we care about the following?
+        def __eq__(self, other):
+            if not isinstance(other, d_values):
+                return NotImplemented
+            if len(self) != len(other):
+                return False
+            # XXX Sometimes this could be optimized, but these are the
+            # semantics: we can't depend on the values to be hashable
+            # or comparable.
+            o = list(other)
+            for x in self:
+                try:
+                    o.remove(x)
+                except ValueError:
+                    return False
+            return True
 
-        def pop(self):
-            return self.__d.popitem()[1]
+        def __ne__(self, other):
+            result = self.__eq__(other)
+            if result is not NotImplemented:
+                result = not result
+            return result
 
-        def clear(self):
-            return self.__d.clear()
-
-        def copy(self):
-            # XXX What should this return?
-
-        # Should we bother implementing set-like operations on
-        # multisets? If so, how about mixed operations on sets and
-        # multisets? I'm not sure that these are worth the effort.
-
-I'm soliciting better names than d_keys, d_values and d_items; these
-classes will be public so that their implementations may be reused by
-the .keys(), .values() and .items() methods of other mappings. (Or
-should they?)
+Note that we don't implement .copy() -- the presence of a .copy()
+method suggests that the copy has the same type as the original, but
+that's not feasible without copying the underlying dict. If you want
+a copy of a specific type, like list or set, you can just pass one
+of the above to the list() or set() constructor.
 
 
 Open Issues
 ===========
 
-Should the d_keys, d_values and d_items classes be reusable? Should
-they be subclassable?
+I've left out the implementation of various set operations. These
+could still present surprises.
+
+Should d_values have mutating methods (pop(), clear())? Strawman: no.
+
+Should d_values implement set operations (as defined for multisets)?
+Strawman: no.
+
+Should d_keys, d_values and d_items have a public instance variable or
+method through which one can retrieve the underlying dict? Strawman:
+yes (but what should it be called?).
+
+I'm soliciting better names than d_keys, d_values and d_items. These
+classes could be public so that their implementations could be reused
+by the .keys(), .values() and .items() methods of other mappings. Or
+should they?
+
+Should the d_keys, d_values and d_items classes be reusable?
+Strawman: yes.
+
+Should they be subclassable? Strawman: yes (but see below).
+
+A particularly nasty issue is whether operations that are specified in
+terms of other operations (e.g. .discard()) must really be implemented
+in terms of those other operations; this may appear irrelevant but it
+becomes relevant if these classes are ever subclassed. Historically,
+Python has a really poor track record of specifying the semantics of
+highly optimized built-in types clearly in such cases; my strawman is
+to continue that trend. Subclassing may still be useful to *add* new
+methods, for example.
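+
+For illustration, here is a rough sketch of the kind of subclass for
+which this question matters (the class name and the print() call are
+invented for this example and are not part of the proposal)::
+
+    class logging_items(d_items):
+
+        # Overrides the primitive .remove(). Whether .discard() and
+        # the set operations pick up this override depends on whether
+        # they are really implemented in terms of .remove(), which is
+        # exactly the question raised above.
+        def remove(self, item):
+            print("logging_items: removing", item)
+            d_items.remove(self, item)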
+ +I'll leave the decisions (especially about naming) up to whoever +submits a working implementation. References