PEP: 3106
Title: Revamping dict.keys(), .values() and .items()
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Dec-2006
Post-History:


Abstract
========

This PEP proposes to change the .keys(), .values() and .items()
methods of the built-in dict type to return a set-like or
multiset-like (== bag-like) object whose contents are derived from the
underlying dictionary rather than a list which is a copy of the keys,
etc.; and to remove the .iterkeys(), .itervalues() and .iteritems()
methods.

The approach is inspired by that taken in the Java Collections
Framework [1]_.

Introduction
============

It has long been the plan to change the .keys(), .values() and
.items() methods of the built-in dict type to return a more
lightweight object than a list, and to get rid of .iterkeys(),
.itervalues() and .iteritems().  The idea is that code that currently
(in 2.x) reads::

    for k, v in d.iteritems(): ...

should be rewritten as::

    for k, v in d.items(): ...

(and similarly for .itervalues() and .iterkeys(), except the latter is
redundant since we can write that loop as ``for k in d``.)

Code that currently reads::

    a = d.keys()    # assume we really want a list here

(etc.) should be rewritten as::

    a = list(d.keys())

There are (at least) two ways to accomplish this.  The original plan
was to simply let .keys(), .values() and .items() return an iterator,
i.e. exactly what iterkeys(), itervalues() and iteritems() return in
Python 2.x.  However, the Java Collections Framework [1]_ suggests
that a better solution is possible: the methods return objects with
set behavior (for .keys() and .items()) or multiset (== bag) behavior
(for .values()) that do not contain copies of the keys, values or
items, but rather reference the underlying dict and pull their values
out of the dict as needed.
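
A minimal sketch of this "live" behavior, assuming an interpreter in
which .keys() and .values() already return such objects (as Python 3
does); the dict contents are made up for illustration:

```python
# Illustrative data only; any dict would do.
d = {"a": 1, "b": 2}

keys = d.keys()      # refers to d, does not copy its keys
values = d.values()

d["c"] = 3           # mutate the dict after obtaining the objects

# The contents are pulled out of the dict on demand, so the new
# entry is visible through the objects obtained earlier.
assert "c" in keys
assert len(keys) == 3
assert sorted(values) == [1, 2, 3]
```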

The advantage of this approach is that one can still write code like
this::

    a = d.items()
    for k, v in a: ...
    for k, v in a: ...

Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what
d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don't
have to write the iter() call because it is implied by a for-loop.
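
The difference from the iterator-returning plan can be sketched as
follows, assuming an interpreter in which .items() returns the
proposed object (as Python 3 does); variable names are illustrative:

```python
d = {"x": 1, "y": 2}

# An iterator (what the original plan would have returned) is used
# up after a single pass.
it = iter(d.items())
first_pass = list(it)
second_pass = list(it)      # nothing left to yield

# The object returned by .items() can be iterated any number of
# times; each for-loop implicitly calls iter() on it again.
view = d.items()
view_first = list(view)
view_second = list(view)
```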

The objects returned by the .keys() and .items() methods behave like
sets with limited mutability; they allow removing elements, but not
adding them.  Removing an item from these sets removes it from the
underlying dict.  The object returned by the .values() method behaves
like a multiset (Java calls this a Collection).  It does not allow
removing elements, because a value might occur multiple times and the
implementation wouldn't know which key to remove from the underlying
dict.  (The Java Collections Framework has a way around this by
removing from an iterator, but I see no practical use case for that
functionality.)
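
The removal semantics can be tried out with a cut-down stand-in for
the proposed keys object.  The class name ``KeysSketch`` is
hypothetical and only the methods needed here are included; it is a
sketch of the proposal, not an existing built-in:

```python
class KeysSketch:
    """Hypothetical stand-in for the proposed keys object."""

    def __init__(self, d):
        self._d = d

    def __len__(self):
        return len(self._d)

    def __contains__(self, key):
        return key in self._d

    def __iter__(self):
        return iter(self._d)

    def remove(self, key):
        # Removing from the set-like object deletes the entry from
        # the underlying dict; a missing key raises KeyError.
        del self._d[key]

    def discard(self, key):
        if key in self:
            self.remove(key)


d = {"a": 1, "b": 2}
ks = KeysSketch(d)
ks.remove("a")        # d is now {"b": 2}
ks.discard("zzz")     # absent key: silently ignored
```

Note that the sketch deliberately has no add() method: a key cannot
be added without also supplying a value.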

Because of the set behavior, it will be possible to check whether two
dicts have the same keys by simply testing::

    if a.keys() == b.keys(): ...

and similarly for values.  (Two multisets are deemed equal if they
have the same elements with the same cardinalities, e.g. the multiset
{1, 2, 2} is equal to the multiset {2, 1, 2} but differs from the
multiset {1, 2}.)
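
The multiset comparison described here can be sketched as a free
function, so that the example does not rely on the values object
implementing ``==`` itself.  The helper name ``same_multiset`` is
hypothetical; it uses only ``==`` on the elements, so they need not be
hashable:

```python
def same_multiset(xs, ys):
    """Return True if xs and ys hold equal elements with equal counts.

    Hypothetical helper, not part of any library.
    """
    remaining = list(ys)
    for x in xs:
        try:
            remaining.remove(x)   # cancel one matching occurrence
        except ValueError:
            return False          # x has no partner left in ys
    return not remaining          # leftovers mean ys had extras


a = {"p": 1, "q": 2, "r": 2}
b = {"u": 2, "v": 1, "w": 2}
c = {"u": 1, "v": 2}

same_keys = a.keys() == dict.fromkeys("pqr").keys()
same_values_ab = same_multiset(a.values(), b.values())
same_values_ac = same_multiset(a.values(), c.values())
```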

These operations are thread-safe only to the extent that using them in
a thread-unsafe way may cause an exception but will not cause
corruption of the internal representation.

As in Python 2.x, mutating a dict while iterating over it using an
iterator has an undefined effect and will in most cases raise a
RuntimeError exception.  (This is similar to the guarantees made by
the Java Collections Framework.)
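
A small illustration, assuming CPython, where adding a key during
iteration changes the dict's size and is detected on the next step of
the loop; other mutations or implementations may behave differently,
which is why the effect is described as undefined:

```python
d = {"a": 1, "b": 2}

error = None
try:
    for key in d.keys():
        d[key + "_copy"] = d[key]   # adds a key while iterating
except RuntimeError as exc:
    error = exc

# Iterating over a snapshot of the keys avoids the problem.
for key in list(d.keys()):
    d[key.upper()] = d[key]
```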

The objects returned by .keys() and .items() are fully interoperable
with instances of the built-in set and frozenset types; for example::

    set(d.keys()) == d.keys()

is guaranteed to be True (except when d is being modified
simultaneously by another thread).
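
A few such mixed operations, sketched under the assumption that
.keys() returns the proposed set-like object (as in Python 3); the
names ``d`` and ``wanted`` are illustrative:

```python
d = {"a": 1, "b": 2, "c": 3}
wanted = {"b", "c", "z"}

same = set(d.keys()) == d.keys()

# Binary operators return a new, real set rather than another
# object tied to a dict.
common = d.keys() & wanted
only_in_d = d.keys() - wanted

# Comparisons with frozenset work in either direction.
is_subset = frozenset(["a"]) <= d.keys()
```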


Specification
=============

I'm using pseudo-code to specify the semantics::

    class dict:

        # Omitting all other dict methods for brevity.
        # The .iterkeys(), .itervalues() and .iteritems() methods
        # will be removed.

        def keys(self):
            return d_keys(self)

        def items(self):
            return d_items(self)

        def values(self):
            return d_values(self)

    class d_keys:

        def __init__(self, d):
            self.__d = d

        def __len__(self):
            return len(self.__d)

        def __contains__(self, key):
            return key in self.__d

        def __iter__(self):
            for key in self.__d:
                yield key

        def remove(self, key):
            del self.__d[key]

        def discard(self, key):
            if key in self:
                self.remove(key)

        def pop(self):
            return self.__d.popitem()[0]

        def clear(self):
            self.__d.clear()

        # The following operations should be implemented to be
        # compatible with sets; this can be done by exploiting
        # the above primitive operations:
        #
        #   <, <=, ==, !=, >=, > (returning a bool)
        #   &, |, ^, - (returning a new, real set object)
        #   &=, -= (updating in place and returning self; but not |=, ^=)
        #
        # as well as their method counterparts (.union(), etc.).
        #
        # To specify the semantics, we can specify x == y as:
        #
        #   set(x) == set(y)   if both x and y are d_keys instances
        #   set(x) == y        if x is a d_keys instance
        #   x == set(y)        if y is a d_keys instance
        #
        # and so on for all other operations.

    class d_items:

        def __init__(self, d):
            self.__d = d

        def __len__(self):
            return len(self.__d)

        def __contains__(self, item):
            key, value = item
            return key in self.__d and self.__d[key] == value

        def __iter__(self):
            for key in self.__d:
                yield key, self.__d[key]

        def remove(self, item):
            if item not in self:
                raise KeyError(item)
            key, value = item
            del self.__d[key]

        def discard(self, item):
            # Defined in terms of 'in' and .remove() so overriding
            # those will update discard appropriately.
            if item in self:
                self.remove(item)

        def pop(self):
            return self.__d.popitem()

        def clear(self):
            self.__d.clear()

        # As well as the set operations mentioned for d_keys above.
        # However the specifications suggested there will not work if
        # the values aren't hashable.  Fortunately, the operations can
        # still be implemented efficiently.  For example, this is how
        # intersection can be specified:

        def __and__(self, other):
            if isinstance(other, (set, frozenset, d_keys)):
                result = set()
                for item in other:
                    if item in self:
                        result.add(item)
                return result
            if not isinstance(other, d_items):
                return NotImplemented
            d = {}
            if len(other) < len(self):
                self, other = other, self
            for item in self:
                if item in other:
                    key, value = item
                    d[key] = value
            return d.items()

        # And here is equality:

        def __eq__(self, other):
            if isinstance(other, (set, frozenset, d_keys)):
                if len(self) != len(other):
                    return False
                for item in other:
                    if item not in self:
                        return False
                return True
            if not isinstance(other, d_items):
                return NotImplemented
            if len(self) != len(other):
                return False
            for item in self:
                if item not in other:
                    return False
            return True

        def __ne__(self, other):
            # XXX Perhaps object.__ne__() should be defined this way.
            result = self.__eq__(other)
            if result is not NotImplemented:
                result = not result
            return result

    class d_values:

        def __init__(self, d):
            self.__d = d

        def __len__(self):
            return len(self.__d)

        def __contains__(self, value):
            # This is slow, and it's what "x in y" uses as a fallback
            # if __contains__ is not defined; but I'd rather make it
            # explicit that it is supported.
            for v in self:
                if v == value:
                    return True
            return False

        def __iter__(self):
            for key in self.__d:
                yield self.__d[key]

        def __eq__(self, other):
            if not isinstance(other, d_values):
                return NotImplemented
            if len(self) != len(other):
                return False
            # XXX Sometimes this could be optimized, but these are the
            # semantics: we can't depend on the values to be hashable
            # or comparable.
            o = list(other)
            for x in self:
                try:
                    o.remove(x)
                except ValueError:
                    return False
            return True

        def __ne__(self, other):
            result = self.__eq__(other)
            if result is not NotImplemented:
                result = not result
            return result

Note that we don't implement .copy() -- the presence of a .copy()
method suggests that the copy has the same type as the original, but
that's not feasible without copying the underlying dict.  If you want
a copy of a specific type, like list or set, you can just pass one
of the above to the list() or set() constructor.
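
For instance (a sketch assuming .keys() and .items() return the
proposed objects, as in Python 3):

```python
d = {"a": 1, "b": 2}

key_list = list(d.keys())     # independent list copy of the keys
item_set = set(d.items())     # independent set copy; values must be hashable
live = d.keys()               # still tied to d

del d["a"]

# The copies keep the old contents; the live object follows the dict.
assert sorted(key_list) == ["a", "b"]
assert item_set == {("a", 1), ("b", 2)}
assert list(live) == ["b"]
```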


Open Issues
===========

I've left out the implementation of various set operations.  These
could still present surprises.

Should d_values have mutating methods (pop(), clear())?  Strawman: no.

Should d_values implement set operations (as defined for multisets)?
Strawman: no.

Should d_keys, d_values and d_items have a public instance variable or
method through which one can retrieve the underlying dict?  Strawman:
yes (but what should it be called?).

I'm soliciting better names than d_keys, d_values and d_items.  These
classes could be public so that their implementations could be reused
by the .keys(), .values() and .items() methods of other mappings.  Or
should they?

Should the d_keys, d_values and d_items classes be reusable?
Strawman: yes.

Should they be subclassable?  Strawman: yes (but see below).

A particularly nasty issue is whether operations that are specified in
terms of other operations (e.g. .discard()) must really be implemented
in terms of those other operations; this may appear irrelevant but it
becomes relevant if these classes are ever subclassed.  Historically,
Python has a really poor track record of specifying the semantics of
highly optimized built-in types clearly in such cases; my strawman is
to continue that trend.  Subclassing may still be useful to *add* new
methods, for example.
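
To make the concern concrete, here is a sketch with two hypothetical
classes: ``KeysSketch`` stands in for d_keys (reduced to what the
example needs), and ``LoggingKeys`` is an imagined subclass that
overrides remove():

```python
class KeysSketch:
    """Hypothetical stand-in for d_keys."""

    def __init__(self, d):
        self._d = d

    def __contains__(self, key):
        return key in self._d

    def remove(self, key):
        del self._d[key]

    def discard(self, key):
        # Routed through 'in' and remove(), as in the pseudo-code.
        if key in self:
            self.remove(key)


class LoggingKeys(KeysSketch):
    """Hypothetical subclass that records every removal."""

    def __init__(self, d):
        super().__init__(d)
        self.removed = []

    def remove(self, key):
        self.removed.append(key)
        super().remove(key)


d = {"a": 1, "b": 2}
ks = LoggingKeys(d)
ks.discard("a")     # goes through the overridden remove()
ks.discard("nope")  # absent key: remove() is never called
```

Here ``ks.removed`` ends up as ``["a"]``.  An optimized implementation
whose discard() deleted the key directly, without calling remove(),
would leave the list empty; whether subclasses may rely on either
behavior is the unspecified part.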

I'll leave the decisions (especially about naming) up to whoever
submits a working implementation.


References
==========

.. [1] Java Collections Framework
   http://java.sun.com/docs/books/tutorial/collections/index.html