329 lines
12 KiB
Plaintext
329 lines
12 KiB
Plaintext
PEP: 372
|
||
Title: Adding an ordered dictionary to collections
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Armin Ronacher <armin.ronacher@active-4.com>
|
||
Raymond Hettinger <python@rcn.com>
|
||
Status: Final
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 15-Jun-2008
|
||
Python-Version: 2.7, 3.1
|
||
Post-History:
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes an ordered dictionary as a new data structure for
|
||
the ``collections`` module, called "OrderedDict" in this PEP. The
|
||
proposed API incorporates the experiences gained from working with
|
||
similar implementations that exist in various real-world applications
|
||
and other programming languages.
|
||
|
||
|
||
Patch
|
||
=====
|
||
|
||
A working Py3.1 patch including tests and documentation is at:
|
||
|
||
`OrderedDict patch <http://bugs.python.org/issue5397>`_
|
||
|
||
The check-in was in revisions: 70101 and 70102
|
||
|
||
Rationale
|
||
=========
|
||
|
||
In current Python versions, the widely used built-in dict type does
|
||
not specify an order for the key/value pairs stored. This makes it
|
||
hard to use dictionaries as data storage for some specific use cases.
|
||
|
||
Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
|
||
certain order on iteration. In those languages, and existing Python
|
||
ordered-dict implementations, the ordering of items is defined by the
|
||
time of insertion of the key. New keys are appended at the end, but
|
||
keys that are overwritten are not moved to the end.
|
||
|
||
The following example shows the behavior for simple assignments:
|
||
|
||
>>> d = OrderedDict()
|
||
>>> d['parrot'] = 'dead'
|
||
>>> d['penguin'] = 'exploded'
|
||
>>> d.items()
|
||
[('parrot', 'dead'), ('penguin', 'exploded')]
|
||
|
||
That the ordering is preserved makes an OrderedDict useful for a couple of
|
||
situations:
|
||
|
||
- XML/HTML processing libraries currently drop the ordering of
|
||
attributes, use a list instead of a dict which makes filtering
|
||
cumbersome, or implement their own ordered dictionary. This affects
|
||
ElementTree, html5lib, Genshi and many more libraries.
|
||
|
||
- There are many ordered dict implementations in various libraries
|
||
and applications, most of them subtly incompatible with each other.
|
||
Furthermore, subclassing dict is a non-trivial task and many
|
||
implementations don't override all the methods properly which can
|
||
lead to unexpected results.
|
||
|
||
Additionally, many ordered dicts are implemented in an inefficient
|
||
way, making many operations more complex then they have to be.
|
||
|
||
- PEP 3115 allows metaclasses to change the mapping object used for
|
||
the class body. An ordered dict could be used to create ordered
|
||
member declarations similar to C structs. This could be useful, for
|
||
example, for future ``ctypes`` releases as well as ORMs that define
|
||
database tables as classes, like the one the Django framework ships.
|
||
Django currently uses an ugly hack to restore the ordering of
|
||
members in database models.
|
||
|
||
- The RawConfigParser class accepts a ``dict_type`` argument that
|
||
allows an application to set the type of dictionary used internally.
|
||
The motivation for this addition was expressly to allow users to
|
||
provide an ordered dictionary. [1]_
|
||
|
||
- Code ported from other programming languages such as PHP often
|
||
depends on an ordered dict. Having an implementation of an
|
||
ordering-preserving dictionary in the standard library could ease
|
||
the transition and improve the compatibility of different libraries.
|
||
|
||
|
||
Ordered Dict API
|
||
================
|
||
|
||
The ordered dict API would be mostly compatible with dict and existing
|
||
ordered dicts. Note: this PEP refers to the 2.7 and 3.0 dictionary
|
||
API as described in collections.Mapping abstract base class.
|
||
|
||
The constructor and ``update()`` both accept iterables of tuples as
|
||
well as mappings like a dict does. Unlike a regular dictionary,
|
||
the insertion order is preserved.
|
||
|
||
>>> d = OrderedDict([('a', 'b'), ('c', 'd')])
|
||
>>> d.update({'foo': 'bar'})
|
||
>>> d
|
||
collections.OrderedDict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])
|
||
|
||
If ordered dicts are updated from regular dicts, the ordering of new
|
||
keys is of course undefined.
|
||
|
||
All iteration methods as well as ``keys()``, ``values()`` and
|
||
``items()`` return the values ordered by the time the key was
|
||
first inserted:
|
||
|
||
>>> d['spam'] = 'eggs'
|
||
>>> d.keys()
|
||
['a', 'c', 'foo', 'spam']
|
||
>>> d.values()
|
||
['b', 'd', 'bar', 'eggs']
|
||
>>> d.items()
|
||
[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]
|
||
|
||
New methods not available on dict:
|
||
|
||
``OrderedDict.__reversed__()``
|
||
Supports reverse iteration by key.
|
||
|
||
|
||
Questions and Answers
|
||
=====================
|
||
|
||
What happens if an existing key is reassigned?
|
||
|
||
The key is not moved but assigned a new value in place. This is
|
||
consistent with existing implementations.
|
||
|
||
What happens if keys appear multiple times in the list passed to the
|
||
constructor?
|
||
|
||
The same as for regular dicts -- the latter item overrides the
|
||
former. This has the side-effect that the position of the first
|
||
key is used because only the value is actually overwritten::
|
||
|
||
>>> OrderedDict([('a', 1), ('b', 2), ('a', 3)])
|
||
collections.OrderedDict([('a', 3), ('b', 2)])
|
||
|
||
This behavior is consistent with existing implementations in
|
||
Python, the PHP array and the hashmap in Ruby 1.9.
|
||
|
||
Is the ordered dict a dict subclass? Why?
|
||
|
||
Yes. Like ``defaultdict``, an ordered dictionary `` subclasses ``dict``.
|
||
Being a dict subclass make some of the methods faster (like
|
||
``__getitem__`` and ``__len__``). More importantly, being a dict
|
||
subclass lets ordered dictionaries be usable with tools like json that
|
||
insist on having dict inputs by testing isinstance(d, dict).
|
||
|
||
Do any limitations arise from subclassing dict?
|
||
|
||
Yes. Since the API for dicts is different in Py2.x and Py3.x, the
|
||
OrderedDict API must also be different. So, the Py2.7 version will need
|
||
to override iterkeys, itervalues, and iteritems.
|
||
|
||
Does ``OrderedDict.popitem()`` return a particular key/value pair?
|
||
|
||
Yes. It pops-off the most recently inserted new key and its
|
||
corresponding value. This corresponds to the usual LIFO behavior
|
||
exhibited by traditional push/pop pairs. It is semantically
|
||
equivalent to ``k=list(od)[-1]; v=od[k]; del od[k]; return (k,v)``.
|
||
The actual implementation is more efficient and pops directly
|
||
from a sorted list of keys.
|
||
|
||
Does OrderedDict support indexing, slicing, and whatnot?
|
||
|
||
As a matter of fact, ``OrderedDict`` does not implement the ``Sequence``
|
||
interface. Rather, it is a ``MutableMapping`` that remembers
|
||
the order of key insertion. The only sequence-like addition is
|
||
support for ``reversed``.
|
||
|
||
A further advantage of not allowing indexing is that it leaves open
|
||
the possibility of a fast C implementation using linked lists.
|
||
|
||
Does OrderedDict support alternate sort orders such as alphabetical?
|
||
|
||
No. Those wanting different sort orders really need to be using another
|
||
technique. The OrderedDict is all about recording insertion order. If any
|
||
other order is of interest, then another structure (like an in-memory
|
||
dbm) is likely a better fit.
|
||
|
||
How well does OrderedDict work with the json module, PyYAML, and ConfigParser?
|
||
|
||
For json, the good news is that json's encoder respects OrderedDict's iteration order::
|
||
|
||
>>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
|
||
>>> json.dumps(OrderedDict(items))
|
||
'{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
|
||
|
||
In Py2.6, the object_hook for json decoders passes-in an already built
|
||
dictionary so order is lost before the object hook sees it. This
|
||
problem is being fixed for Python 2.7/3.1 by adding a new hook that
|
||
preserves order (see http://bugs.python.org/issue5381 ).
|
||
With the new hook, order can be preserved::
|
||
|
||
>>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
|
||
>>> json.loads(jtext, object_pairs_hook=OrderedDict)
|
||
OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})
|
||
|
||
For PyYAML, a full round-trip is problem free::
|
||
|
||
>>> ytext = yaml.dump(OrderedDict(items))
|
||
>>> print ytext
|
||
!!python/object/apply:collections.OrderedDict
|
||
- - [one, 1]
|
||
- [two, 2]
|
||
- [three, 3]
|
||
- [four, 4]
|
||
- [five, 5]
|
||
|
||
>>> yaml.load(ytext)
|
||
OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})
|
||
|
||
For the ConfigParser module, round-tripping is also problem free. Custom
|
||
dicts were added in Py2.6 specifically to support ordered dictionaries::
|
||
|
||
>>> config = ConfigParser(dict_type=OrderedDict)
|
||
>>> config.read('myconfig.ini')
|
||
>>> config.remove_option('Log', 'error')
|
||
>>> config.write(open('myconfig.ini', 'w'))
|
||
|
||
How does OrderedDict handle equality testing?
|
||
|
||
Comparing two ordered dictionaries implies that the test will be
|
||
order-sensitive so that list ``(od1.items())==list(od2.items())``.
|
||
|
||
When ordered dicts are compared with other Mappings, their order
|
||
insensitive comparison is used. This allows ordered dictionaries
|
||
to be substituted anywhere regular dictionaries are used.
|
||
|
||
How __repr__ format will maintain order during a repr/eval round-trip?
|
||
|
||
OrderedDict([('a', 1), ('b', 2)])
|
||
|
||
What are the trade-offs of the possible underlying data structures?
|
||
|
||
* Keeping a sorted list of keys is fast for all operations except
|
||
__delitem__() which becomes an O(n) exercise. This data structure leads
|
||
to very simple code and little wasted space.
|
||
|
||
* Keeping a separate dictionary to record insertion sequence numbers makes
|
||
the code a little bit more complex. All of the basic operations are O(1)
|
||
but the constant factor is increased for __setitem__() and __delitem__()
|
||
meaning that every use case will have to pay for this speedup (since all
|
||
buildup go through __setitem__). Also, the first traveral incurs a
|
||
one-time ``O(n log n)`` sorting cost. The storage costs are double that
|
||
for the sorted-list-of-keys approach.
|
||
|
||
* A version written in C could use a linked list. The code would be more
|
||
complex than the other two approaches but it would conserve space and
|
||
would keep the same big-oh performance as regular dictionaries. It is
|
||
the fastest and most space efficient.
|
||
|
||
|
||
Reference Implementation
|
||
========================
|
||
|
||
An implementation with tests and documentation is at:
|
||
|
||
`OrderedDict patch <http://bugs.python.org/issue5397>`_
|
||
|
||
The proposed version has several merits:
|
||
|
||
* Strict compliance with the MutableMapping API and no new methods
|
||
so that the learning curve is near zero. It is simply a dictionary
|
||
that remembers insertion order.
|
||
|
||
* Generally good performance. The big-oh times are the same as regular
|
||
dictionaries except that key deletion is O(n).
|
||
|
||
Other implementations of ordered dicts in various Python projects or
|
||
standalone libraries, that inspired the API proposed here, are:
|
||
|
||
- `odict in Python`_
|
||
- `odict in Babel`_
|
||
- `OrderedDict in Django`_
|
||
- `The odict module`_
|
||
- `ordereddict`_ (a C implementation of the odict module)
|
||
- `StableDict`_
|
||
- `Armin Rigo's OrderedDict`_
|
||
|
||
.. _odict in Python: http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py
|
||
.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
|
||
.. _OrderedDict in Django:
|
||
http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53
|
||
.. _The odict module: http://www.voidspace.org.uk/python/odict.html
|
||
.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
|
||
.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2
|
||
.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py
|
||
|
||
|
||
Future Directions
|
||
=================
|
||
|
||
With the availability of an ordered dict in the standard library,
|
||
other libraries may take advantage of that. For example, ElementTree
|
||
could return odicts in the future that retain the attribute ordering
|
||
of the source file.
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] http://bugs.python.org/issue1371075
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|