PEP: 372
Title: Adding an ordered dictionary to collections
Author: Armin Ronacher <armin.ronacher@active-4.com>,
        Raymond Hettinger <python@rcn.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Jun-2008
Python-Version: 2.7, 3.1
Post-History:


Abstract
========

This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "OrderedDict" in this PEP. The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.


Patch
=====

A working Py3.1 patch including tests and documentation is at:

    `OrderedDict patch <https://github.com/python/cpython/issues/49647>`_

The check-in was in revisions: 70101 and 70102

Rationale
=========

In current Python versions, the widely used built-in dict type does
not specify an order for the key/value pairs stored. This makes it
hard to use dictionaries as data storage for some specific use cases.

Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
certain order on iteration. In those languages, and existing Python
ordered-dict implementations, the ordering of items is defined by the
time of insertion of the key. New keys are appended at the end, but
keys that are overwritten are not moved to the end.

The following example shows the behavior for simple assignments:

>>> d = OrderedDict()
>>> d['parrot'] = 'dead'
>>> d['penguin'] = 'exploded'
>>> d.items()
[('parrot', 'dead'), ('penguin', 'exploded')]

That the ordering is preserved makes an OrderedDict useful in several
situations:

- XML/HTML processing libraries currently drop the ordering of
  attributes, use a list instead of a dict which makes filtering
  cumbersome, or implement their own ordered dictionary. This affects
  ElementTree, html5lib, Genshi and many more libraries.

- There are many ordered dict implementations in various libraries
  and applications, most of them subtly incompatible with each other.
  Furthermore, subclassing dict is a non-trivial task and many
  implementations don't override all the methods properly which can
  lead to unexpected results.

  Additionally, many ordered dicts are implemented in an inefficient
  way, making many operations more complex than they have to be.

- :pep:`3115` allows metaclasses to change the mapping object used for
  the class body. An ordered dict could be used to create ordered
  member declarations similar to C structs. This could be useful, for
  example, for future ``ctypes`` releases as well as ORMs that define
  database tables as classes, like the one the Django framework ships.
  Django currently uses an ugly hack to restore the ordering of
  members in database models.

- The RawConfigParser class accepts a ``dict_type`` argument that
  allows an application to set the type of dictionary used internally.
  The motivation for this addition was expressly to allow users to
  provide an ordered dictionary. [1]_

- Code ported from other programming languages such as PHP often
  depends on an ordered dict. Having an implementation of an
  ordering-preserving dictionary in the standard library could ease
  the transition and improve the compatibility of different libraries.

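The PEP 3115 use case in the list above can be sketched as follows. This
is only an illustration for Python 3: the names ``OrderedMeta``,
``Point`` and ``_field_order`` are hypothetical and not part of this
proposal. The metaclass returns an ``OrderedDict`` from ``__prepare__``
so that the class body records members in declaration order.

```python
from collections import OrderedDict


class OrderedMeta(type):
    """Hypothetical metaclass that records the order of member declarations."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # PEP 3115 hook: the returned mapping is used as the class-body namespace.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep the user-declared members (skipping dunder entries such as
        # __module__), in the order they appeared in the class body.
        cls._field_order = [k for k in namespace if not k.startswith('__')]
        return cls


class Point(metaclass=OrderedMeta):
    z = 0
    x = 1
    y = 2


print(Point._field_order)
```

An ORM or a struct-like layer could walk ``_field_order`` to lay out
columns or fields in the order the author wrote them.

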
Ordered Dict API
================

The ordered dict API would be mostly compatible with dict and existing
ordered dicts. Note: this PEP refers to the 2.7 and 3.0 dictionary
API as described in the ``collections.Mapping`` abstract base class.

The constructor and ``update()`` both accept iterables of tuples as
well as mappings like a dict does. Unlike a regular dictionary,
the insertion order is preserved.

>>> d = OrderedDict([('a', 'b'), ('c', 'd')])
>>> d.update({'foo': 'bar'})
>>> d
collections.OrderedDict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])

If ordered dicts are updated from regular dicts, the ordering of new
keys is of course undefined.

All iteration methods as well as ``keys()``, ``values()`` and
``items()`` return their results ordered by the time the key was
first inserted:

>>> d['spam'] = 'eggs'
>>> d.keys()
['a', 'c', 'foo', 'spam']
>>> d.values()
['b', 'd', 'bar', 'eggs']
>>> d.items()
[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]

New methods not available on dict:

``OrderedDict.__reversed__()``
    Supports reverse iteration by key.

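As a small illustration of ``__reversed__()``, the following sketch
rebuilds the dictionary used in the examples above and iterates it in
both directions. It assumes the ``collections.OrderedDict`` that was
eventually shipped in the standard library.

```python
from collections import OrderedDict

d = OrderedDict([('a', 'b'), ('c', 'd')])
d.update({'foo': 'bar'})
d['spam'] = 'eggs'

# Plain iteration yields keys in insertion order.
forward = list(d)
# reversed() relies on OrderedDict.__reversed__() and yields keys newest first.
backward = list(reversed(d))

print(forward)
print(backward)
```

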
Questions and Answers
=====================

What happens if an existing key is reassigned?

    The key is not moved but assigned a new value in place. This is
    consistent with existing implementations.

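A short sketch of the reassignment rule, assuming the standard library
``collections.OrderedDict``. It also shows the contrasting case: a key
that is deleted and then inserted again counts as a new key and goes to
the end.

```python
from collections import OrderedDict

od = OrderedDict([('parrot', 'dead'), ('penguin', 'exploded')])

# Reassigning an existing key updates the value in place.
od['parrot'] = 'resting'
after_reassign = list(od.items())

# Deleting and re-inserting makes it a new key, appended at the end.
del od['parrot']
od['parrot'] = 'pining'
after_reinsert = list(od.items())

print(after_reassign)
print(after_reinsert)
```
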
What happens if keys appear multiple times in the list passed to the
constructor?

    The same as for regular dicts -- the latter item overrides the
    former. This has the side-effect that the position of the first
    key is used because only the value is actually overwritten::

        >>> OrderedDict([('a', 1), ('b', 2), ('a', 3)])
        collections.OrderedDict([('a', 3), ('b', 2)])

    This behavior is consistent with existing implementations in
    Python, the PHP array and the hashmap in Ruby 1.9.

Is the ordered dict a dict subclass? Why?

    Yes. Like ``defaultdict``, an ordered dictionary subclasses ``dict``.
    Being a dict subclass makes some of the methods faster (like
    ``__getitem__`` and ``__len__``). More importantly, being a dict
    subclass lets ordered dictionaries be usable with tools like json that
    insist on having dict inputs by testing ``isinstance(d, dict)``.

Do any limitations arise from subclassing dict?

    Yes. Since the API for dicts is different in Py2.x and Py3.x, the
    OrderedDict API must also be different. So, the Py2.7 version will need
    to override ``iterkeys``, ``itervalues``, and ``iteritems``.

Does ``OrderedDict.popitem()`` return a particular key/value pair?

    Yes. It pops off the most recently inserted new key and its
    corresponding value. This corresponds to the usual LIFO behavior
    exhibited by traditional push/pop pairs. It is semantically
    equivalent to ``k=list(od)[-1]; v=od[k]; del od[k]; return (k,v)``.
    The actual implementation is more efficient and pops directly
    from a sorted list of keys.

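The semantic definition above can be spelled out as a function and
checked against the real method. ``popitem_reference`` is a hypothetical
helper written only for this comparison; it is deliberately the slow
O(n) formulation from the text, not the actual implementation.

```python
from collections import OrderedDict


def popitem_reference(od):
    """Slow reference version of popitem(), following the PEP's wording."""
    k = list(od)[-1]   # most recently inserted new key
    v = od[k]
    del od[k]
    return (k, v)


a = OrderedDict([('one', 1), ('two', 2), ('three', 3)])
b = OrderedDict(a)  # independent copy with the same order

popped_builtin = a.popitem()
popped_reference = popitem_reference(b)

print(popped_builtin, popped_reference)
```
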
Does OrderedDict support indexing, slicing, and whatnot?

    As a matter of fact, ``OrderedDict`` does not implement the ``Sequence``
    interface. Rather, it is a ``MutableMapping`` that remembers
    the order of key insertion. The only sequence-like addition is
    support for ``reversed``.

    A further advantage of not allowing indexing is that it leaves open
    the possibility of a fast C implementation using linked lists.

Does OrderedDict support alternate sort orders such as alphabetical?

    No. Those wanting different sort orders really need to be using another
    technique. The OrderedDict is all about recording insertion order. If any
    other order is of interest, then another structure (like an in-memory
    dbm) is likely a better fit.

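To make the point about sort orders concrete, here is a hedged sketch
with made-up data. Inserting items in sorted order gives a sorted
snapshot, but the ``OrderedDict`` does not maintain that order: a later
insertion simply lands at the end.

```python
from collections import OrderedDict

# Hypothetical sample data.
scores = {'pear': 3, 'apple': 7, 'fig': 5}

# Insertion happens in alphabetical order, so iteration is alphabetical too.
by_name = OrderedDict(sorted(scores.items()))
snapshot = list(by_name)

# A new key is appended at the end, regardless of where it would sort.
by_name['banana'] = 1
after_insert = list(by_name)

print(snapshot)
print(after_insert)
```
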
How well does OrderedDict work with the json module, PyYAML, and ConfigParser?

    For json, the good news is that json's encoder respects
    OrderedDict's iteration order::

        >>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
        >>> json.dumps(OrderedDict(items))
        '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

    In Py2.6, the object_hook for json decoders passes in an already built
    dictionary so order is lost before the object hook sees it. This
    problem is being fixed for Python 2.7/3.1 by adding a new hook that
    preserves order (see https://github.com/python/cpython/issues/49631 ).
    With the new hook, order can be preserved::

        >>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
        >>> json.loads(jtext, object_pairs_hook=OrderedDict)
        OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

    For PyYAML, a full round-trip is problem free::

        >>> ytext = yaml.dump(OrderedDict(items))
        >>> print ytext
        !!python/object/apply:collections.OrderedDict
        - - [one, 1]
          - [two, 2]
          - [three, 3]
          - [four, 4]
          - [five, 5]

        >>> yaml.load(ytext)
        OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

    For the ConfigParser module, round-tripping is also problem free. Custom
    dicts were added in Py2.6 specifically to support ordered dictionaries::

        >>> config = ConfigParser(dict_type=OrderedDict)
        >>> config.read('myconfig.ini')
        >>> config.remove_option('Log', 'error')
        >>> config.write(open('myconfig.ini', 'w'))

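For readers on Python 3, the ConfigParser round-trip might look like the
sketch below. It assumes the renamed ``configparser`` module and uses an
in-memory string in place of ``myconfig.ini``; the section and option
names are made up for the example.

```python
import configparser
import io
from collections import OrderedDict

# Hypothetical INI content, used here instead of a file on disk.
INI_TEXT = """\
[Log]
level = debug
error = stderr
file = app.log
"""

config = configparser.ConfigParser(dict_type=OrderedDict)
config.read_string(INI_TEXT)
config.remove_option('Log', 'error')

# Write the configuration back out; remaining options keep their order.
out = io.StringIO()
config.write(out)
print(out.getvalue())
```
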
How does OrderedDict handle equality testing?

    Comparing two ordered dictionaries implies that the test will be
    order-sensitive so that ``list(od1.items())==list(od2.items())``.

    When ordered dicts are compared with other Mappings, the regular
    order-insensitive comparison is used. This allows ordered
    dictionaries to be substituted anywhere regular dictionaries are used.

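Both comparison rules can be seen in a few lines, assuming the standard
library ``collections.OrderedDict``:

```python
from collections import OrderedDict

od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])

# Two ordered dicts: same items, different order, so they are not equal.
ordered_vs_ordered = (od1 == od2)

# Ordered dict against a regular dict: order is ignored.
ordered_vs_plain = (od1 == {'b': 2, 'a': 1}) and (od2 == {'a': 1, 'b': 2})

print(ordered_vs_ordered, ordered_vs_plain)
```
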
What ``__repr__`` format will maintain order during a repr/eval round-trip?

    OrderedDict([('a', 1), ('b', 2)])

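A quick way to check the round-trip property is sketched below. The
exact ``repr`` text varies between Python versions, so the sketch does
not depend on it; it only checks that evaluating the ``repr`` gives back
an equal ordered dict with the same key order.

```python
from collections import OrderedDict

od = OrderedDict([('b', 2), ('a', 1)])

# eval() needs the name OrderedDict in scope, which the import provides.
clone = eval(repr(od))

print(type(clone).__name__, list(clone.items()))
```
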
What are the trade-offs of the possible underlying data structures?

    * Keeping a sorted list of keys is fast for all operations except
      ``__delitem__()`` which becomes an O(n) exercise. This data structure
      leads to very simple code and little wasted space.

    * Keeping a separate dictionary to record insertion sequence numbers makes
      the code a little bit more complex. All of the basic operations are O(1)
      but the constant factor is increased for ``__setitem__()`` and
      ``__delitem__()`` meaning that every use case will have to pay for this
      speedup (since all buildup goes through ``__setitem__``). Also, the
      first traversal incurs a one-time ``O(n log n)`` sorting cost. The
      storage costs are double that for the sorted-list-of-keys approach.

    * A version written in C could use a linked list. The code would be more
      complex than the other two approaches but it would conserve space and
      would keep the same big-oh performance as regular dictionaries. It is
      the fastest and most space efficient.

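The first option above, a dict plus a list of keys, can be sketched in a
few lines. This is an illustrative toy under stated assumptions, not the
reference implementation: the class name ``ListBackedOrderedDict`` is
hypothetical, and methods such as ``pop()``, ``popitem()``, ``clear()``
and ``setdefault()`` are left out for brevity.

```python
class ListBackedOrderedDict(dict):
    """Toy ordered dict: a dict subclass plus a list of keys in insertion order."""

    def __init__(self, other=(), **kwargs):
        super().__init__()
        self._keys = []
        self.update(other, **kwargs)

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)   # new key: amortized O(1) append
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)     # raises KeyError if the key is missing
        self._keys.remove(key)       # linear scan: this is the O(n) cost

    def __iter__(self):
        return iter(self._keys)

    def __reversed__(self):
        return reversed(self._keys)

    def update(self, other=(), **kwargs):
        # dict.update() bypasses __setitem__, so route every pair through it.
        pairs = other.items() if hasattr(other, 'items') else other
        for key, value in pairs:
            self[key] = value
        for key, value in kwargs.items():
            self[key] = value

    def items(self):
        return [(key, self[key]) for key in self._keys]


d = ListBackedOrderedDict([('a', 1), ('b', 2), ('c', 3)])
d['a'] = 10      # overwrite: position unchanged
del d['b']       # O(n) removal from the key list
d['b'] = 20      # re-inserted as a new key at the end
print(d.items())
```

Lookups and ``len()`` are inherited from ``dict`` unchanged, which is why
only deletion pays the linear cost in this layout.

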
Reference Implementation
========================

An implementation with tests and documentation is at:

    `OrderedDict patch <https://github.com/python/cpython/issues/49647>`_

The proposed version has several merits:

* Strict compliance with the MutableMapping API and no new methods
  so that the learning curve is near zero. It is simply a dictionary
  that remembers insertion order.

* Generally good performance. The big-oh times are the same as regular
  dictionaries except that key deletion is O(n).

Other implementations of ordered dicts in various Python projects or
standalone libraries, that inspired the API proposed here, are:

- `odict in Python`_
- `odict in Babel`_
- `OrderedDict in Django`_
- `The odict module`_
- `ordereddict`_ (a C implementation of the odict module)
- `StableDict`_
- `Armin Rigo's OrderedDict`_

.. _odict in Python: http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py
.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
.. _OrderedDict in Django:
   http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53
.. _The odict module: http://www.voidspace.org.uk/python/odict.html
.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2
.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py


Future Directions
=================

With the availability of an ordered dict in the standard library,
other libraries may take advantage of that. For example, ElementTree
could return odicts in the future that retain the attribute ordering
of the source file.


References
==========

.. [1] https://github.com/python/cpython/issues/42649


Copyright
=========

This document has been placed in the public domain.