305 lines
9.8 KiB
Plaintext
305 lines
9.8 KiB
Plaintext
PEP: 372
|
|
Title: Adding an ordered dictionary to collections
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Armin Ronacher <armin.ronacher@active-4.com>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 15-Jun-2008
|
|
Python-Version: 2.6, 3.0
|
|
Post-History:
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes an ordered dictionary as a new data structure for
|
|
the ``collections`` module, called "odict" in this PEP for short. The
|
|
proposed API incorporates the experiences gained from working with
|
|
similar implementations that exist in various real-world applications
|
|
and other programming languages.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
In current Python versions, the widely used built-in dict type does
|
|
not specify an order for the key/value pairs stored. This makes it
|
|
hard to use dictionaries as data storage for some specific use cases.
|
|
|
|
Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
|
|
certain order on iteration. In those languages, and existing Python
|
|
ordered-dict implementations, the ordering of items is defined by the
|
|
time of insertion of the key. New keys are appended at the end, but
|
|
keys that are overwritten are not moved to the end.
|
|
|
|
The following example shows the behavior for simple assignments:
|
|
|
|
>>> d = odict()
|
|
>>> d['parrot'] = 'dead'
|
|
>>> d['penguin'] = 'exploded'
|
|
>>> d.items()
|
|
[('parrot', 'dead'), ('penguin', 'exploded')]
|
|
|
|
That the ordering is preserved makes an odict useful for a couple of
|
|
situations:
|
|
|
|
- XML/HTML processing libraries currently drop the ordering of
|
|
attributes, use a list instead of a dict which makes filtering
|
|
cumbersome, or implement their own ordered dictionary. This affects
|
|
ElementTree, html5lib, Genshi and many more libraries.
|
|
|
|
- There are many ordered dict implementations in various libraries
|
|
and applications, most of them subtly incompatible with each other.
|
|
Furthermore, subclassing dict is a non-trivial task and many
|
|
implementations don't override all the methods properly which can
|
|
lead to unexpected results.
|
|
|
|
Additionally, many ordered dicts are implemented in an inefficient
|
|
way, making many operations more complex then they have to be.
|
|
|
|
- PEP 3115 allows metaclasses to change the mapping object used for
|
|
the class body. An ordered dict could be used to create ordered
|
|
member declarations similar to C structs. This could be useful, for
|
|
example, for future ``ctypes`` releases as well as ORMs that define
|
|
database tables as classes, like the one the Django framework ships.
|
|
Django currently uses an ugly hack to restore the ordering of
|
|
members in database models.
|
|
|
|
- The RawConfigParser class accepts a ``dict_type`` argument that
|
|
allows an application to set the type of dictionary used internally.
|
|
The motivation for this addition was expressly to allow users to
|
|
provide an ordered dictionary. [1]_
|
|
|
|
- Code ported from other programming languages such as PHP often
|
|
depends on a ordered dict. Having an implementation of an
|
|
ordering-preserving dictionary in the standard library could ease
|
|
the transition and improve the compatibility of different libraries.
|
|
|
|
|
|
Ordered Dict API
|
|
================
|
|
|
|
The ordered dict API would be mostly compatible with dict and existing
|
|
ordered dicts. (Note: this PEP refers to the Python 2.x dictionary
|
|
API; the transfer to the 3.x API is trivial.)
|
|
|
|
The constructor and ``update()`` both accept iterables of tuples as
|
|
well as mappings like a dict does. The ordering however is preserved
|
|
for the first case:
|
|
|
|
>>> d = odict([('a', 'b'), ('c', 'd')])
|
|
>>> d.update({'foo': 'bar'})
|
|
>>> d
|
|
collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])
|
|
|
|
If ordered dicts are updated from regular dicts, the ordering of new
|
|
keys is of course undefined again unless ``sort()`` is called.
|
|
|
|
All iteration methods as well as ``keys()``, ``values()`` and
|
|
``items()`` return the values ordered by the time the key-value pair
|
|
was inserted:
|
|
|
|
>>> d['spam'] = 'eggs'
|
|
>>> d.keys()
|
|
['a', 'c', 'foo', 'spam']
|
|
>>> d.values()
|
|
['b', 'd', 'bar', 'eggs']
|
|
>>> d.items()
|
|
[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]
|
|
|
|
New methods not available on dict:
|
|
|
|
``odict.byindex(index)``
|
|
Returns the key/value pair for an index, that is, the "position" of a key in
|
|
the ordered dict. 0 is the first key/value pair, -1 the last.
|
|
|
|
>>> d.byindex(2)
|
|
('foo', 'bar')
|
|
|
|
If there is no key for index an `IndexError` is raised. Slices are not
|
|
supported.
|
|
|
|
``odict.index(key)``
|
|
Returns the index of a key. If the key does not exist, a `ValueError` is
|
|
raised.
|
|
|
|
``odict.sort(cmp=None, key=None, reverse=False)``
|
|
Sorts the odict in place by cmp or key. This works exactly like
|
|
``list.sort()``, but the comparison functions are passed a key/value tuple,
|
|
not only the value.
|
|
|
|
>>> d = odict([(42, 1), (1, 4), (23, 7)]) d.sort() d
|
|
collections.odict([(1, 4), (23, 7), (42, 1)])
|
|
|
|
``odict.reverse()``
|
|
Reverses the odict in place.
|
|
|
|
``odict.__reversed__()``
|
|
Supports reverse iteration by key.
|
|
|
|
``odict.__eq__()`` / ``odict.__ne__()``
|
|
Compares the odict to another object. If it's compared to another
|
|
odict the ordering of items is taken into account, otherwise only
|
|
the keys and values.
|
|
|
|
``odict.__cmp__()``
|
|
Ordered dicts are sorted by their items. ``cmp(od1, od2)`` is
|
|
equivalent to ``cmp(od1.items(), od2.items())`` if both ``od1``
|
|
and ``od2`` are ordered dicts. Otherwise the regular dict comparison
|
|
kicks in.
|
|
|
|
|
|
Python 3 Version
|
|
================
|
|
|
|
The Python 3 version of the ``odict`` returns dictionary views rather
|
|
than lists for ``odict.keys()``, ``odict.values()`` and
|
|
``odict.items()``. The keys view is equivalent to a regular keys view
|
|
but supports the following extra or changed operations:
|
|
|
|
``odict_keys.__getitem__(index)``
|
|
|
|
Returns the key for an index. This is equivalent to
|
|
``odict.byindex(index)``.
|
|
|
|
``odict_keys.index(key)``
|
|
|
|
Returns the index for a key. This exists for compatibility with
|
|
the ``Sequence`` abstract base class and is equivalent to
|
|
``odict.index(key)``.
|
|
|
|
``odict_keys.__iter__()``
|
|
|
|
Has the same semantics as ``odict.__iter__()``.
|
|
|
|
``odict_keys.__reversed__()``
|
|
|
|
Has the same semantics as ``odict.__reversed__()``.
|
|
|
|
``odict_keys.__cmp__()`` / ``odict_keys.__eq__()`` /
|
|
``odict_keys.__ne__()``
|
|
|
|
Same semantics as the equivalent ``odict`` operation. E.g.: when
|
|
compared to another odict keys view the ordering is taken into
|
|
account.
|
|
|
|
|
|
Questions and Answers
|
|
=====================
|
|
|
|
What happens if an existing key is reassigned?
|
|
|
|
The key is not moved but assigned a new value in place. This is
|
|
consistent with existing implementations and allows subclasses to
|
|
change the behavior easily::
|
|
|
|
class moving_odict(collections.odict):
|
|
def __setitem__(self, key, value):
|
|
self.pop(key, None)
|
|
collections.odict.__setitem__(self, key, value)
|
|
|
|
What happens if keys appear multiple times in the list passed to the
|
|
constructor?
|
|
|
|
The same as for regular dicts: The latter item overrides the
|
|
former. This has the side-effect that the position of the first
|
|
key is used because the key is actually overwritten:
|
|
|
|
>>> odict([('a', 1), ('b', 2), ('a', 3)])
|
|
collections.odict([('a', 3), ('b', 2)])
|
|
|
|
This behavior is consistent with existing implementations in
|
|
Python, the PHP array and the hashmap in Ruby 1.9.
|
|
|
|
Why is there no ``odict.insert()``?
|
|
|
|
There are few situations where you really want to insert a key at
|
|
an specified index. To avoid API complication, the proposed
|
|
solution for this situation is creating a list of items,
|
|
manipulating that and converting it back into an odict:
|
|
|
|
>>> d = odict([('a', 42), ('b', 23), ('c', 19)])
|
|
>>> l = d.items()
|
|
>>> l.insert(1, ('x', 0))
|
|
>>> odict(l)
|
|
collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)])
|
|
|
|
Is the ordered dict a dict subclass?
|
|
|
|
Yes. Like ``defaultdict``, ``odict`` subclasses ``dict``.
|
|
|
|
Does ``odict.pop()`` support list-like popping of items?
|
|
|
|
No. Neither ``odict.__getitem__()`` nor ``odict.pop()`` support
|
|
retrieving or deleting items by index. Slices are not supported
|
|
either. This would introduce ambiguities if integers or slice
|
|
objects are used as dict keys.
|
|
|
|
As a matter of fact, ``odict`` does not implement the ``Sequence``
|
|
interface.
|
|
|
|
|
|
Example Implementation
|
|
======================
|
|
|
|
A poorly performing example implementation of the odict written in
|
|
Python is available:
|
|
|
|
`odict.py <http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py>`_
|
|
|
|
The version for ``collections`` should be implemented in C and use a
|
|
linked list internally.
|
|
|
|
Other implementations of ordered dicts in various Python projects or
|
|
standalone libraries, that inspired the API proposed here, are:
|
|
|
|
- `odict in Babel`_
|
|
- `OrderedDict in Django`_
|
|
- `The odict module`_
|
|
- `ordereddict`_ (a C implementation of the odict module)
|
|
- `StableDict`_
|
|
- `Armin Rigo's OrderedDict`_
|
|
|
|
|
|
.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
|
|
.. _OrderedDict in Django:
|
|
http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53
|
|
.. _The odict module: http://www.voidspace.org.uk/python/odict.html
|
|
.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
|
|
.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2
|
|
.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py
|
|
|
|
|
|
Future Directions
|
|
=================
|
|
|
|
With the availability of an ordered dict in the standard library,
|
|
other libraries may take advantage of that. For example, ElementTree
|
|
could return odicts in the future that retain the attribute ordering
|
|
of the source file.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] http://bugs.python.org/issue1371075
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|
|
|