PEP: 372
Title: Adding an ordered dictionary to collections
Version: $Revision$
Last-Modified: $Date$
Author: Armin Ronacher <armin.ronacher@active-4.com>
        Raymond Hettinger <python@rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Jun-2008
Python-Version: 2.7, 3.1
Post-History:


Abstract
========

This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short. The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.


Rationale
=========

In current Python versions, the widely used built-in dict type does
not specify an order for the key/value pairs stored. This makes it
hard to use dictionaries as data storage for some specific use cases.

Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
certain order on iteration. In those languages, and existing Python
ordered-dict implementations, the ordering of items is defined by the
time of insertion of the key. New keys are appended at the end, but
keys that are overwritten are not moved to the end.

The following example shows the behavior for simple assignments:

    >>> d = odict()
    >>> d['parrot'] = 'dead'
    >>> d['penguin'] = 'exploded'
    >>> d.items()
    [('parrot', 'dead'), ('penguin', 'exploded')]

That the ordering is preserved makes an odict useful in several
situations:

- XML/HTML processing libraries currently drop the ordering of
  attributes, use a list instead of a dict which makes filtering
  cumbersome, or implement their own ordered dictionary. This affects
  ElementTree, html5lib, Genshi and many more libraries.

- There are many ordered dict implementations in various libraries
  and applications, most of them subtly incompatible with each other.
  Furthermore, subclassing dict is a non-trivial task and many
  implementations don't override all the methods properly, which can
  lead to unexpected results.

  Additionally, many ordered dicts are implemented in an inefficient
  way, making many operations more complex than they have to be.

- PEP 3115 allows metaclasses to change the mapping object used for
  the class body. An ordered dict could be used to create ordered
  member declarations similar to C structs. This could be useful, for
  example, for future ``ctypes`` releases as well as ORMs that define
  database tables as classes, like the one the Django framework ships.
  Django currently uses an ugly hack to restore the ordering of
  members in database models.

- The RawConfigParser class accepts a ``dict_type`` argument that
  allows an application to set the type of dictionary used internally.
  The motivation for this addition was expressly to allow users to
  provide an ordered dictionary. [1]_

- Code ported from other programming languages such as PHP often
  depends on an ordered dict. Having an implementation of an
  ordering-preserving dictionary in the standard library could ease
  the transition and improve the compatibility of different libraries.


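As an illustration of the PEP 3115 use case in the list above, the
following is a minimal sketch for Python 3, not part of the proposal
itself. It assumes ``collections.OrderedDict`` as a stand-in for the
odict described here; ``OrderedMeta``, ``Point`` and ``_field_order``
are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class OrderedMeta(type):
    """Hypothetical metaclass that records the order of class-body names."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # PEP 3115 hook: the mapping returned here is used as the
        # namespace while the class body executes.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwargs)
        # Keep only the user-declared members, in declaration order.
        cls._field_order = [k for k in namespace if not k.startswith('__')]
        return cls


class Point(metaclass=OrderedMeta):
    # Declared like the members of a C struct.
    x = 0
    y = 0
    z = 0


print(Point._field_order)   # ['x', 'y', 'z']
```

An ORM or a ``ctypes``-like library could use such a recorded order to
lay out columns or struct fields in the order they were written.
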
Ordered Dict API
================

The ordered dict API would be mostly compatible with dict and existing
ordered dicts. Note: this PEP refers to the 2.7 and 3.0 dictionary
API as described in the ``collections.Mapping`` abstract base class.

The constructor and ``update()`` both accept iterables of tuples as
well as mappings, like a dict does. Unlike a regular dictionary,
the insertion order is preserved.

    >>> d = odict([('a', 'b'), ('c', 'd')])
    >>> d.update({'foo': 'bar'})
    >>> d
    collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])

If ordered dicts are updated from regular dicts, the ordering of new
keys is of course undefined.

All iteration methods as well as ``keys()``, ``values()`` and
``items()`` return their results ordered by the time the key was
first inserted:

    >>> d['spam'] = 'eggs'
    >>> d.keys()
    ['a', 'c', 'foo', 'spam']
    >>> d.values()
    ['b', 'd', 'bar', 'eggs']
    >>> d.items()
    [('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]

New methods not available on dict:

    ``odict.__reversed__()``
        Supports reverse iteration by key.


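Reverse iteration as described for ``__reversed__()`` can be sketched
as follows. The example assumes ``collections.OrderedDict`` as a
stand-in for the proposed odict and reuses the sample data from the
section above.

```python
from collections import OrderedDict

d = OrderedDict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])

# reversed() walks the keys from the most recently inserted to the oldest.
newest_first = list(reversed(d))
print(newest_first)          # ['foo', 'c', 'a']

# Reverse iteration over items can be built on top of the reversed keys.
newest_items = [(key, d[key]) for key in reversed(d)]
print(newest_items)          # [('foo', 'bar'), ('c', 'd'), ('a', 'b')]
```
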
Questions and Answers
=====================

What happens if an existing key is reassigned?

    The key is not moved but assigned a new value in place. This is
    consistent with existing implementations and allows subclasses to
    change the behavior easily::

        class moving_odict(collections.odict):
            def __setitem__(self, key, value):
                self.pop(key, None)
                collections.odict.__setitem__(self, key, value)

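The difference between in-place reassignment and the moving subclass
from the answer above can be tried out with the sketch below. It
assumes ``collections.OrderedDict`` as a stand-in for
``collections.odict``; ``MovingOrderedDict`` is a hypothetical name
for the same subclass written against that class.

```python
from collections import OrderedDict


class MovingOrderedDict(OrderedDict):
    """Same idea as moving_odict: reassignment moves the key to the end."""

    def __setitem__(self, key, value):
        # Drop the key first (if present) so that it is re-appended.
        self.pop(key, None)
        OrderedDict.__setitem__(self, key, value)


plain = OrderedDict()
moving = MovingOrderedDict()
for d in (plain, moving):
    d['parrot'] = 'dead'
    d['penguin'] = 'exploded'
    d['parrot'] = 'resting'      # reassign an existing key

print(list(plain))    # ['parrot', 'penguin']  (key kept its position)
print(list(moving))   # ['penguin', 'parrot']  (key moved to the end)
```
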
What happens if keys appear multiple times in the list passed to the
constructor?

    The same as for regular dicts: The latter item overrides the
    former. This has the side-effect that the position of the first
    key is used because only the value is actually overwritten:

    >>> odict([('a', 1), ('b', 2), ('a', 3)])
    collections.odict([('a', 3), ('b', 2)])

    This behavior is consistent with existing implementations in
    Python, the PHP array and the hashmap in Ruby 1.9.

Is the ordered dict a dict subclass? Why?

    Yes. Like ``defaultdict``, ``odict`` subclasses ``dict``.
    Being a dict subclass confers speed upon methods that aren't
    overridden, like ``__getitem__`` and ``__len__``. Also, being a dict
    gives the most utility with tools that were expecting regular dicts
    (like the json module).

Do any limitations arise from subclassing dict?

    Yes. Since the API for dicts is different in Py2.x and Py3.x, the
    odict API must also be different (i.e. Py2.6 needs to override
    iterkeys, itervalues, and iteritems).

Does ``odict.popitem()`` return a particular key/value pair?

    Yes. It pops off the most recently inserted new key and its
    corresponding value. This corresponds to the usual LIFO behavior
    exhibited by traditional push/pop pairs. It is semantically
    equivalent to ``k=list(od)[-1]; v=od[k]; del od[k]; return (k,v)``.
    The actual implementation is more efficient and pops directly
    off of a sorted list of keys.

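The LIFO behavior of ``popitem()`` and its stated equivalence can be
checked with a small sketch. It assumes ``collections.OrderedDict``
as a stand-in for the proposed odict; ``popitem_reference`` is a
hypothetical helper that merely spells out the one-liner from the
answer above.

```python
from collections import OrderedDict


def popitem_reference(od):
    """Spelled-out form of the semantic equivalent given in the text."""
    k = list(od)[-1]     # most recently inserted key
    v = od[k]
    del od[k]
    return (k, v)


a = OrderedDict([('one', 1), ('two', 2), ('three', 3)])
b = OrderedDict(a)       # independent copy with the same order

popped_builtin = a.popitem()
popped_reference = popitem_reference(b)

print(popped_builtin)    # ('three', 3)
print(popped_reference)  # ('three', 3)
print(list(a))           # ['one', 'two']
```
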
Does odict support indexing, slicing, and whatnot?

    As a matter of fact, ``odict`` does not implement the ``Sequence``
    interface. Rather, it is a ``MutableMapping`` that remembers
    the order of key insertion. The only sequence-like addition is
    automatic support for ``reversed``.

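Because there is no ``Sequence`` interface, positional access has to
go through iteration. The sketch below shows one way to do that with
``itertools.islice``, assuming ``collections.OrderedDict`` as a
stand-in for the proposed odict; ``nth_key`` and ``key_slice`` are
hypothetical helpers, not part of the proposed API.

```python
from collections import OrderedDict
from itertools import islice

d = OrderedDict([('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')])


def nth_key(od, n):
    """Return the n-th key in insertion order (O(n), hypothetical helper)."""
    # Raises StopIteration if n is past the end of the mapping.
    return next(islice(od, n, None))


def key_slice(od, start, stop):
    """Return the keys at positions start..stop-1 (hypothetical helper)."""
    return list(islice(od, start, stop))


print(nth_key(d, 2))         # foo
print(key_slice(d, 1, 3))    # ['c', 'foo']
```

Both helpers walk the mapping from the front, so they cost O(n) rather
than the O(1) a real sequence would offer.
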
Does odict support alternate sort orders such as alphabetical?

    No. Those wanting different sort orders really need to be using another
    technique. The odict is all about recording insertion order. If any
    other order is of interest, then another structure (like an in-memory
    dbm) is likely a better fit. It would be a mistake to try to be all
    things to all users.

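When a one-time snapshot in some other order is enough, one possible
technique is to sort the items first and let the ordered dict remember
the resulting insertion order. The sketch below assumes
``collections.OrderedDict`` as a stand-in for the proposed odict. It
also shows why this is not a sorted container: later insertions still
go to the end.

```python
from collections import OrderedDict

d = {'pear': 1, 'apple': 4, 'orange': 2, 'banana': 3}

# Sort once, then record that order as the insertion order.
by_key = OrderedDict(sorted(d.items()))
by_value = OrderedDict(sorted(d.items(), key=lambda kv: kv[1]))

print(list(by_key))     # ['apple', 'banana', 'orange', 'pear']
print(list(by_value))   # ['pear', 'orange', 'banana', 'apple']

# The sort order is not maintained: a new key is appended at the end.
by_key['cherry'] = 5
print(list(by_key))     # ['apple', 'banana', 'orange', 'pear', 'cherry']
```
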
How well does odict work with the json module, PyYAML, and ConfigParser?

    For json, the good news is that json's encoder respects odict's
    iteration order:

    >>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
    >>> json.dumps(OrderedDict(items))
    '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

    In Py2.6, the object_hook for json decoders passes in an already
    built dictionary, so order is lost before the object hook sees it.
    This problem is being fixed for Python 2.7/3.1 by adding a new hook
    that preserves order (see http://bugs.python.org/issue5381 ).
    With the new hook, order can be preserved:

    >>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
    >>> json.loads(jtext, object_pairs_hook=OrderedDict)
    OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

    For PyYAML, a full round-trip is problem free:

    >>> ytext = yaml.dump(OrderedDict(items))
    >>> print ytext
    !!python/object/apply:collections.OrderedDict
    - - [one, 1]
      - [two, 2]
      - [three, 3]
      - [four, 4]
      - [five, 5]

    >>> yaml.load(ytext)
    OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

    For the ConfigParser module, round-tripping is problem free. Custom
    dicts were added in Py2.6 specifically to support ordered dictionaries:

    >>> config = ConfigParser(dict_type=OrderedDict)
    >>> config.read('myconfig.ini')
    >>> config.remove_option('Log', 'error')
    >>> config.write(open('myconfig.ini', 'w'))

How does odict handle equality testing?

    Since an odict is a dict, one might expect equality tests not to
    care about order. For an odict-to-dict comparison this is a
    necessity, and it is probably not wise to silently switch comparison
    modes based on the input types. Also, some third-party tools that
    expect dict inputs may expect the comparison not to care about
    order. Accordingly, we decided to punt and let the usual dict
    equality testing run without reference to internal ordering. This
    should be documented clearly since different people will have
    different expectations. If a use case does arise, it's not hard to
    explicitly craft an order-based comparison:
    ``list(od1.items())==list(od2.items())``.

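The two kinds of comparison discussed above can be contrasted in a
short sketch. It assumes ``collections.OrderedDict`` as a stand-in for
the proposed odict. Since that class's own ``==`` between two ordered
dicts may not match the behavior described in this draft, the sketch
calls ``dict.__eq__`` directly to show "the usual dict equality
testing".

```python
from collections import OrderedDict

od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])   # same contents, different order
plain = {'b': 2, 'a': 1}

# Usual dict equality: only keys and values matter, not their order.
same_contents = dict.__eq__(od1, od2)
same_as_plain = (od1 == plain)

# Explicit order-based comparison, as suggested in the text.
same_order = list(od1.items()) == list(od2.items())

print(same_contents, same_as_plain, same_order)   # True True False
```
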
What are the trade-offs of the possible underlying data structures?

    * Keeping a sorted list of keys is very fast for all operations except
      ``__delitem__()``, which becomes an O(n) exercise. This structure
      leads to very simple code and little wasted space.

    * Keeping a separate dictionary to record insertion sequence numbers
      makes the code a little bit more complex. All of the basic
      operations are O(1) but the constant factor is increased for
      ``__setitem__()`` and ``__delitem__()``, meaning that every use case
      will have to pay for this speedup (since all buildup goes through
      ``__setitem__()``). Also, the first traversal incurs a one-time
      ``O(n log n)`` sorting cost. The storage costs are double that
      for the sorted-list-of-keys approach.

    * A version written in C could use a linked list. The code would be
      more complex than the other two approaches but it would conserve
      space and would keep the same big-oh performance as regular
      dictionaries. It is the fastest and most space efficient.

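The first option in the list above, a dict plus a list of keys in
insertion order, can be sketched in a few lines. This is an
illustrative, deliberately incomplete sketch for Python 3 and not the
reference implementation: ``ListBackedODict`` is a hypothetical name,
and methods such as ``update()``, ``pop()``, ``clear()``, ``keys()``
and ``values()`` are left out.

```python
class ListBackedODict(dict):
    """Hypothetical minimal odict: a dict plus a list of keys in order."""

    def __init__(self, items=()):
        super().__init__()
        self._keys = []
        for key, value in items:
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)        # O(1): new keys go at the end
        super().__setitem__(key, value)   # existing keys keep their position

    def __delitem__(self, key):
        super().__delitem__(key)          # raises KeyError if key is missing
        self._keys.remove(key)            # O(n): linear scan of the key list

    def __iter__(self):
        return iter(self._keys)

    def __reversed__(self):
        return reversed(self._keys)

    def popitem(self):
        if not self._keys:
            raise KeyError('dictionary is empty')
        key = self._keys.pop()            # O(1): pop from the end of the list
        return key, super().pop(key)

    def items(self):
        return [(key, self[key]) for key in self._keys]


d = ListBackedODict([('a', 1), ('b', 2), ('c', 3)])
d['a'] = 10                # reassignment keeps the original position
del d['b']                 # the O(n) operation
print(d.items())           # [('a', 10), ('c', 3)]
print(d.popitem())         # ('c', 3)
```

Only deletion has to search the key list, which is where the O(n) cost
mentioned above comes from; everything else is a dict operation plus
an append or a pop at the end of the list.
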
Reference Implementation
========================

A proposed version is at:

`OrderedDict recipe <http://code.activestate.com/recipes/576669/>`_

The proposed version has several merits:

* Strict compliance with the MutableMapping API and no new methods
  so that the learning curve is near zero. It is simply a dictionary
  that remembers insertion order.

* Generally good performance. The big-oh times are the same as regular
  dictionaries except that key deletion is O(n).

* The code runs without modification on Py2.6, Py2.7, Py3.0, and Py3.1.

Other implementations of ordered dicts in various Python projects or
standalone libraries, that inspired the API proposed here, are:

- `odict in Python`_
- `odict in Babel`_
- `OrderedDict in Django`_
- `The odict module`_
- `ordereddict`_ (a C implementation of the odict module)
- `StableDict`_
- `Armin Rigo's OrderedDict`_

.. _odict in Python: http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py
.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
.. _OrderedDict in Django:
   http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53
.. _The odict module: http://www.voidspace.org.uk/python/odict.html
.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2
.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py


Future Directions
=================

With the availability of an ordered dict in the standard library,
other libraries may take advantage of that. For example, ElementTree
could return odicts in the future that retain the attribute ordering
of the source file.


References
==========

.. [1] http://bugs.python.org/issue1371075


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: