339 lines
11 KiB
ReStructuredText
339 lines
11 KiB
ReStructuredText
PEP: 3106
|
|
Title: Revamping dict.keys(), .values() and .items()
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Guido van Rossum
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 19-Dec-2006
|
|
Python-Version: 3.0
|
|
Post-History:
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes to change the .keys(), .values() and .items()
|
|
methods of the built-in dict type to return a set-like or unordered
|
|
container object whose contents are derived from the underlying
|
|
dictionary rather than a list which is a copy of the keys, etc.; and
|
|
to remove the .iterkeys(), .itervalues() and .iteritems() methods.
|
|
|
|
The approach is inspired by that taken in the Java Collections
|
|
Framework [1]_.
|
|
|
|
|
|
Introduction
|
|
============
|
|
|
|
It has long been the plan to change the .keys(), .values() and
|
|
.items() methods of the built-in dict type to return a more
|
|
lightweight object than a list, and to get rid of .iterkeys(),
|
|
.itervalues() and .iteritems(). The idea is that code that currently
|
|
(in 2.x) reads::
|
|
|
|
for k, v in d.iteritems(): ...
|
|
|
|
should be rewritten as::
|
|
|
|
for k, v in d.items(): ...
|
|
|
|
(and similar for .itervalues() and .iterkeys(), except the latter is
|
|
redundant since we can write that loop as ``for k in d``.)
|
|
|
|
Code that currently reads::
|
|
|
|
a = d.keys() # assume we really want a list here
|
|
|
|
(etc.) should be rewritten as
|
|
|
|
a = list(d.keys())
|
|
|
|
There are (at least) two ways to accomplish this. The original plan
|
|
was to simply let .keys(), .values() and .items() return an iterator,
|
|
i.e. exactly what iterkeys(), itervalues() and iteritems() return in
|
|
Python 2.x. However, the Java Collections Framework [1]_ suggests
|
|
that a better solution is possible: the methods return objects with
|
|
set behavior (for .keys() and .items()) or multiset (== bag) behavior
|
|
(for .values()) that do not contain copies of the keys, values or
|
|
items, but rather reference the underlying dict and pull their values
|
|
out of the dict as needed.
|
|
|
|
The advantage of this approach is that one can still write code like
|
|
this::
|
|
|
|
a = d.items()
|
|
for k, v in a: ...
|
|
# And later, again:
|
|
for k, v in a: ...
|
|
|
|
Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what
|
|
d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don't
|
|
have to write the iter() call because it is implied by a for-loop.
|
|
|
|
The objects returned by the .keys() and .items() methods behave like
|
|
sets. The object returned by the values() method behaves like a much
|
|
simpler unordered collection -- it cannot be a set because duplicate
|
|
values are possible.
|
|
|
|
Because of the set behavior, it will be possible to check whether two
|
|
dicts have the same keys by simply testing::
|
|
|
|
if a.keys() == b.keys(): ...
|
|
|
|
and similarly for .items().
|
|
|
|
These operations are thread-safe only to the extent that using them in
|
|
a thread-unsafe way may cause an exception but will not cause
|
|
corruption of the internal representation.
|
|
|
|
As in Python 2.x, mutating a dict while iterating over it using an
|
|
iterator has an undefined effect and will in most cases raise a
|
|
RuntimeError exception. (This is similar to the guarantees made by
|
|
the Java Collections Framework.)
|
|
|
|
The objects returned by .keys() and .items() are fully interoperable
|
|
with instances of the built-in set and frozenset types; for example::
|
|
|
|
set(d.keys()) == d.keys()
|
|
|
|
is guaranteed to be True (except when d is being modified
|
|
simultaneously by another thread).
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
I'm using pseudo-code to specify the semantics::
|
|
|
|
class dict:
|
|
|
|
# Omitting all other dict methods for brevity.
|
|
# The .iterkeys(), .itervalues() and .iteritems() methods
|
|
# will be removed.
|
|
|
|
def keys(self):
|
|
return d_keys(self)
|
|
|
|
def items(self):
|
|
return d_items(self)
|
|
|
|
def values(self):
|
|
return d_values(self)
|
|
|
|
class d_keys:
|
|
|
|
def __init__(self, d):
|
|
self.__d = d
|
|
|
|
def __len__(self):
|
|
return len(self.__d)
|
|
|
|
def __contains__(self, key):
|
|
return key in self.__d
|
|
|
|
def __iter__(self):
|
|
for key in self.__d:
|
|
yield key
|
|
|
|
# The following operations should be implemented to be
|
|
# compatible with sets; this can be done by exploiting
|
|
# the above primitive operations:
|
|
#
|
|
# <, <=, ==, !=, >=, > (returning a bool)
|
|
# &, |, ^, - (returning a new, real set object)
|
|
#
|
|
# as well as their method counterparts (.union(), etc.).
|
|
#
|
|
# To specify the semantics, we can specify x == y as:
|
|
#
|
|
# set(x) == set(y) if both x and y are d_keys instances
|
|
# set(x) == y if x is a d_keys instance
|
|
# x == set(y) if y is a d_keys instance
|
|
#
|
|
# and so on for all other operations.
|
|
|
|
class d_items:
|
|
|
|
def __init__(self, d):
|
|
self.__d = d
|
|
|
|
def __len__(self):
|
|
return len(self.__d)
|
|
|
|
def __contains__(self, (key, value)):
|
|
return key in self.__d and self.__d[key] == value
|
|
|
|
def __iter__(self):
|
|
for key in self.__d:
|
|
yield key, self.__d[key]
|
|
|
|
# As well as the set operations mentioned for d_keys above.
|
|
# However the specifications suggested there will not work if
|
|
# the values aren't hashable. Fortunately, the operations can
|
|
# still be implemented efficiently. For example, this is how
|
|
# intersection can be specified:
|
|
|
|
def __and__(self, other):
|
|
if isinstance(other, (set, frozenset, d_keys)):
|
|
result = set()
|
|
for item in other:
|
|
if item in self:
|
|
result.add(item)
|
|
return result
|
|
if not isinstance(other, d_items):
|
|
return NotImplemented
|
|
d = {}
|
|
if len(other) < len(self):
|
|
self, other = other, self
|
|
for item in self:
|
|
if item in other:
|
|
key, value = item
|
|
d[key] = value
|
|
return d.items()
|
|
|
|
# And here is equality:
|
|
|
|
def __eq__(self, other):
|
|
if isinstance(other, (set, frozenset, d_keys)):
|
|
if len(self) != len(other):
|
|
return False
|
|
for item in other:
|
|
if item not in self:
|
|
return False
|
|
return True
|
|
if not isinstance(other, d_items):
|
|
return NotImplemented
|
|
# XXX We could also just compare the underlying dicts...
|
|
if len(self) != len(other):
|
|
return False
|
|
for item in self:
|
|
if item not in other:
|
|
return False
|
|
return True
|
|
|
|
def __ne__(self, other):
|
|
# XXX Perhaps object.__ne__() should be defined this way.
|
|
result = self.__eq__(other)
|
|
if result is not NotImplemented:
|
|
result = not result
|
|
return result
|
|
|
|
class d_values:
|
|
|
|
def __init__(self, d):
|
|
self.__d = d
|
|
|
|
def __len__(self):
|
|
return len(self.__d)
|
|
|
|
def __contains__(self, value):
|
|
# This is slow, and it's what "x in y" uses as a fallback
|
|
# if __contains__ is not defined; but I'd rather make it
|
|
# explicit that it is supported.
|
|
for v in self:
|
|
if v == value:
|
|
return True
|
|
return False
|
|
|
|
def __iter__(self):
|
|
for key in self.__d:
|
|
yield self.__d[key]
|
|
|
|
def __eq__(self, other):
|
|
if not isinstance(other, d_values):
|
|
return NotImplemented
|
|
if len(self) != len(other):
|
|
return False
|
|
# XXX Sometimes this could be optimized, but these are the
|
|
# semantics: we can't depend on the values to be hashable
|
|
# or comparable.
|
|
olist = list(other)
|
|
for x in self:
|
|
try:
|
|
olist.remove(x)
|
|
except ValueError:
|
|
return False
|
|
assert olist == []
|
|
return True
|
|
|
|
def __ne__(self, other):
|
|
result = self.__eq__(other)
|
|
if result is not NotImplemented:
|
|
result = not result
|
|
return result
|
|
|
|
Notes:
|
|
|
|
The view objects are not directly mutable, but don't implement
|
|
__hash__(); their value can change if the underlying dict is mutated.
|
|
|
|
The only requirements on the underlying dict are that it implements
|
|
__getitem__(), __contains__(), __iter__(), and __len__().
|
|
|
|
We don't implement .copy() -- the presence of a .copy()
|
|
method suggests that the copy has the same type as the original, but
|
|
that's not feasible without copying the underlying dict. If you want
|
|
a copy of a specific type, like list or set, you can just pass one
|
|
of the above to the list() or set() constructor.
|
|
|
|
The specification implies that the order in which items
|
|
are returned by .keys(), .values() and .items() is the same (just as
|
|
it was in Python 2.x), because the order is all derived from the dict
|
|
iterator (which is presumably arbitrary but stable as long as a dict
|
|
isn't modified). This can be expressed by the following invariant::
|
|
|
|
list(d.items()) == list(zip(d.keys(), d.values()))
|
|
|
|
|
|
Open Issues
|
|
===========
|
|
|
|
Do we need more of a motivation? I would think that being able to do
|
|
set operations on keys and items without having to copy them should
|
|
speak for itself.
|
|
|
|
I've left out the implementation of various set operations. These
|
|
could still present small surprises.
|
|
|
|
It would be okay if multiple calls to d.keys() (etc.) returned the
|
|
same object, since the object's only state is the dict to which it
|
|
refers. Is this worth having extra slots in the dict object for?
|
|
Should that be a weak reference or should the d_keys (etc.) object
|
|
live forever once created? Strawman: probably not worth the extra
|
|
slots in every dict.
|
|
|
|
Should d_keys, d_values and d_items have a public instance variable or
|
|
method through which one can retrieve the underlying dict? Strawman:
|
|
yes (but what should it be called?).
|
|
|
|
I'm soliciting better names than d_keys, d_values and d_items. These
|
|
classes could be public so that their implementations could be reused
|
|
by the .keys(), .values() and .items() methods of other mappings. Or
|
|
should they?
|
|
|
|
Should the d_keys, d_values and d_items classes be reusable?
|
|
Strawman: yes.
|
|
|
|
Should they be subclassable? Strawman: yes (but see below).
|
|
|
|
A particularly nasty issue is whether operations that are specified in
|
|
terms of other operations (e.g. .discard()) must really be implemented
|
|
in terms of those other operations; this may appear irrelevant but it
|
|
becomes relevant if these classes are ever subclassed. Historically,
|
|
Python has a really poor track record of specifying the semantics of
|
|
highly optimized built-in types clearly in such cases; my strawman is
|
|
to continue that trend. Subclassing may still be useful to *add* new
|
|
methods, for example.
|
|
|
|
I'll leave the decisions (especially about naming) up to whoever
|
|
submits a working implementation.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] Java Collections Framework
|
|
http://java.sun.com/docs/books/tutorial/collections/index.html
|