Clarified lots of issues, added more open issues.
This commit is contained in:
parent
6006113241
commit
9cf994cde3
201
pep-3106.txt
201
pep-3106.txt
|
@ -15,9 +15,10 @@ Abstract
|
|||
|
||||
This PEP proposes to change the .keys(), .values() and .items()
|
||||
methods of the built-in dict type to return a set-like or
|
||||
multiset-like object whose contents are derived of the underlying
|
||||
dictionary rather than a list which is a copy of the keys, etc.; and
|
||||
to remove the .iterkeys(), .itervalues() and .iteritems() methods.
|
||||
multiset-like (== bag-like) object whose contents are derived of the
|
||||
underlying dictionary rather than a list which is a copy of the keys,
|
||||
etc.; and to remove the .iterkeys(), .itervalues() and .iteritems()
|
||||
methods.
|
||||
|
||||
The approach is inspired by that taken in the Java Collections
|
||||
Framework [1]_.
|
||||
|
@ -31,40 +32,43 @@ lightweight object than a list, and to get rid of .iterkeys(),
|
|||
.itervalues() and .iteritems(). The idea is that code that currently
|
||||
(in 2.x) reads::
|
||||
|
||||
for x in d.iterkeys(): ...
|
||||
for k, v in d.iteritems(): ...
|
||||
|
||||
should be rewritten as::
|
||||
|
||||
for x in d.keys(): ...
|
||||
for k, v in d.items(): ...
|
||||
|
||||
and code that currently reads::
|
||||
(and similar for .itervalues() and .iterkeys(), except the latter is
|
||||
redundant since we can write that loop as ``for k in d``.)
|
||||
|
||||
Code that currently reads::
|
||||
|
||||
a = d.keys() # assume we really want a list here
|
||||
|
||||
should be rewritten as
|
||||
(etc.) should be rewritten as
|
||||
|
||||
a = list(d.keys())
|
||||
|
||||
There are (at least) two ways to accomplish this. The original plan
|
||||
was to simply let .keys(), .values() and .items() return an iterator,
|
||||
i.e. exactly what iterkeys(), itervalues() and iteritems() return
|
||||
in Python 2.x. However, the Java Collections Framework [1]_ suggests
|
||||
i.e. exactly what iterkeys(), itervalues() and iteritems() return in
|
||||
Python 2.x. However, the Java Collections Framework [1]_ suggests
|
||||
that a better solution is possible: the methods return objects with
|
||||
set behavior (for .keys() and .items()) or multiset behavior (for
|
||||
.values()) that do not contain copies of the keys, values or items,
|
||||
but rather reference the underlying dict and pull their values out of
|
||||
the dict as needed.
|
||||
set behavior (for .keys() and .items()) or multiset (== bag) behavior
|
||||
(for .values()) that do not contain copies of the keys, values or
|
||||
items, but rather reference the underlying dict and pull their values
|
||||
out of the dict as needed.
|
||||
|
||||
The advantage of this approach is that one can still write code like
|
||||
this::
|
||||
|
||||
a = d.keys()
|
||||
for x in a: ...
|
||||
for x in a: ...
|
||||
a = d.items()
|
||||
for k, v in a: ...
|
||||
for k, v in a: ...
|
||||
|
||||
Effectively, iter(d.keys()) in Python 3.0 does what d.iterkeys() does
|
||||
in Python 2.x; but in most contexts we don't have to write the iter()
|
||||
call because it is implied by a for-loop.
|
||||
Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what
|
||||
d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don't
|
||||
have to write the iter() call because it is implied by a for-loop.
|
||||
|
||||
The objects returned by the .keys() and .items() methods behave like
|
||||
sets with limited mutability; they allow removing elements, but not
|
||||
|
@ -83,9 +87,9 @@ dicts have the same keys by simply testing::
|
|||
if a.keys() == b.keys(): ...
|
||||
|
||||
and similarly for values. (Two multisets are deemed equal if they
|
||||
have the same elements with the same cardinalities,
|
||||
e.g. the multiset {1, 2, 2} is equal to the multiset {2, 1, 2} but
|
||||
differs from the multiset {1, 2}.)
|
||||
have the same elements with the same cardinalities, e.g. the multiset
|
||||
{1, 2, 2} is equal to the multiset {2, 1, 2} but differs from the
|
||||
multiset {1, 2}.)
|
||||
|
||||
These operations are thread-safe only to the extent that using them in
|
||||
a thread-unsafe way may cause an exception but will not cause
|
||||
|
@ -108,11 +112,13 @@ simultaneously by another thread).
|
|||
Specification
|
||||
=============
|
||||
|
||||
I'll try pseudo-code to specify the semantics::
|
||||
I'm using pseudo-code to specify the semantics::
|
||||
|
||||
class dict:
|
||||
|
||||
# Omitting all other dict methods for brevity
|
||||
# Omitting all other dict methods for brevity.
|
||||
# The .iterkeys(), .itervalues() and .iteritems() methods
|
||||
# will be removed.
|
||||
|
||||
def keys(self):
|
||||
return d_keys(self)
|
||||
|
@ -151,9 +157,6 @@ I'll try pseudo-code to specify the semantics::
|
|||
def clear(self):
|
||||
self.__d.clear()
|
||||
|
||||
def copy(self):
|
||||
return set(self)
|
||||
|
||||
# The following operations should be implemented to be
|
||||
# compatible with sets; this can be done by exploiting
|
||||
# the above primitive operations:
|
||||
|
@ -163,6 +166,14 @@ I'll try pseudo-code to specify the semantics::
|
|||
# &=, -= (updating in place and returning self; but not |=, ^=)
|
||||
#
|
||||
# as well as their method counterparts (.union(), etc.).
|
||||
#
|
||||
# To specify the semantics, we can specify x == y as:
|
||||
#
|
||||
# set(x) == set(y) if both x and y are d_keys instances
|
||||
# set(x) == y if x is a d_keys instance
|
||||
# x == set(y) if y is a d_keys instance
|
||||
#
|
||||
# and so on for all other operations.
|
||||
|
||||
class d_items:
|
||||
|
||||
|
@ -180,9 +191,13 @@ I'll try pseudo-code to specify the semantics::
|
|||
yield key, self.__d[key]
|
||||
|
||||
def remove(self, (key, value)):
|
||||
if (key, value) not in self:
|
||||
raise KeyError((key, value))
|
||||
del self.__d[key]
|
||||
|
||||
def discard(self, item):
|
||||
# Defined in terms of 'in' and .remove() so overriding
|
||||
# those will update discard appropriately.
|
||||
if item in self:
|
||||
self.remove(item)
|
||||
|
||||
|
@ -192,10 +207,55 @@ I'll try pseudo-code to specify the semantics::
|
|||
def clear(self):
|
||||
self.__d.clear()
|
||||
|
||||
def copy(self):
|
||||
return set(self)
|
||||
# As well as the set operations mentioned for d_keys above.
|
||||
# However the specifications suggested there will not work if
|
||||
# the values aren't hashable. Fortunately, the operations can
|
||||
# still be implemented efficiently. For example, this is how
|
||||
# intersection can be specified:
|
||||
|
||||
# As well as the same set operations as mentioned for d_keys above.
|
||||
def __and__(self, other):
|
||||
if isinstance(other, (set, frozenset, d_keys)):
|
||||
result = set()
|
||||
for item in other:
|
||||
if item in self:
|
||||
result.add(item)
|
||||
return result
|
||||
if not isinstance(other, d_items):
|
||||
return NotImplemented
|
||||
d = {}
|
||||
if len(other) < len(self):
|
||||
self, other = other, self
|
||||
for item in self:
|
||||
if item in other:
|
||||
key, value = item
|
||||
d[key] = value
|
||||
return d.items()
|
||||
|
||||
# And here is equality:
|
||||
|
||||
def __eq__(self, other):
|
||||
if isinstance(other, (set, frozenset, d_keys)):
|
||||
if len(self) != len(other):
|
||||
return False
|
||||
for item in other:
|
||||
if item not in self:
|
||||
return False
|
||||
return True
|
||||
if not isinstance(other, d_items):
|
||||
return NotImplemented
|
||||
if len(self) != len(other):
|
||||
return False
|
||||
for item in self:
|
||||
if item not in other:
|
||||
return False
|
||||
return True
|
||||
|
||||
def __ne__(self, other):
|
||||
# XXX Perhaps object.__ne__() should be defined this way.
|
||||
result = self.__eq__(other)
|
||||
if result is not NotImplemented:
|
||||
result = not result
|
||||
return result
|
||||
|
||||
class d_values:
|
||||
|
||||
|
@ -206,7 +266,9 @@ I'll try pseudo-code to specify the semantics::
|
|||
return len(self.__d)
|
||||
|
||||
def __contains__(self, value):
|
||||
# Slow! Do we even want to implement this?
|
||||
# This is slow, and it's what "x in y" uses as a fallback
|
||||
# if __contains__ is not defined; but I'd rather make it
|
||||
# explicit that it is supported.
|
||||
for v in self:
|
||||
if v == value:
|
||||
return True
|
||||
|
@ -216,32 +278,71 @@ I'll try pseudo-code to specify the semantics::
|
|||
for key in self.__d:
|
||||
yield self.__d[key]
|
||||
|
||||
# Do we care about the following?
|
||||
def __eq__(self, other):
|
||||
if not isinstance(other, d_values):
|
||||
return NotImplemented
|
||||
if len(self) != len(other):
|
||||
return False
|
||||
# XXX Sometimes this could be optimized, but these are the
|
||||
# semantics: we can't depend on the values to be hashable
|
||||
# or comparable.
|
||||
o = list(other)
|
||||
for x in self:
|
||||
try:
|
||||
o.remove(x)
|
||||
except ValueError:
|
||||
return False
|
||||
return True
|
||||
|
||||
def pop(self):
|
||||
return self.__d.popitem()[1]
|
||||
def __ne__(self, other):
|
||||
result = self.__eq__(other)
|
||||
if result is not NotImplemented:
|
||||
result = not result
|
||||
return result
|
||||
|
||||
def clear(self):
|
||||
return self.__d.clear()
|
||||
|
||||
def copy(self):
|
||||
# XXX What should this return?
|
||||
|
||||
# Should we bother implementing set-like operations on
|
||||
# multisets? If so, how about mixed operations on sets and
|
||||
# multisets? I'm not sure that these are worth the effort.
|
||||
|
||||
I'm soliciting better names than d_keys, d_values and d_items; these
|
||||
classes will be public so that their implementations may be reused by
|
||||
the .keys(), .values() and .items() methods of other mappings. (Or
|
||||
should they?)
|
||||
Note that we don't implement .copy() -- the presence of a .copy()
|
||||
method suggests that the copy has the same type as the original, but
|
||||
that's not feasible without copying the underlying dict. If you want
|
||||
a copy of a specific type, like list or set, you can just pass one
|
||||
of the above to the list() or set() constructor.
|
||||
|
||||
|
||||
Open Issues
|
||||
===========
|
||||
|
||||
Should the d_keys, d_values and d_items classes be reusable? Should
|
||||
they be subclassable?
|
||||
I've left out the implementation of various set operations. These
|
||||
could still present surprises.
|
||||
|
||||
Should d_values have mutating methods (pop(), clear())? Strawman: no.
|
||||
|
||||
Should d_values implement set operations (as defined for multisets).
|
||||
Strawman: no.
|
||||
|
||||
Should d_keys, d_values and d_items have a public instance variable or
|
||||
method through which one can retrieve the underlying dict? Strawman:
|
||||
yes (but what should it be called?).
|
||||
|
||||
I'm soliciting better names than d_keys, d_values and d_items. These
|
||||
classes could be public so that their implementations could be reused
|
||||
by the .keys(), .values() and .items() methods of other mappings. Or
|
||||
should they?
|
||||
|
||||
Should the d_keys, d_values and d_items classes be reusable?
|
||||
Strawman: yes.
|
||||
|
||||
Should they be subclassable? Strawman: yes (but see below).
|
||||
|
||||
A particularly nasty issue is whether operations that are specified in
|
||||
terms of other operations (e.g. .discard()) must really be implemented
|
||||
in terms of those other operations; this may appear irrelevant but it
|
||||
becomes relevant if these classes are ever subclassed. Historically,
|
||||
Python has a really poor track record of specifying the semantics of
|
||||
highly optimized built-in types clearly in such cases; my strawman is
|
||||
to continue that trend. Subclassing may still be useful to *add* new
|
||||
methods, for example.
|
||||
|
||||
I'll leave the decisions (especially about naming) up to whoever
|
||||
submits a working implementation.
|
||||
|
||||
|
||||
References
|
||||
|
|
Loading…
Reference in New Issue