From 60cae8daed0ee552a99623fec996664d972f2ff3 Mon Sep 17 00:00:00 2001 From: Georg Brandl Date: Mon, 16 Jun 2008 06:08:08 +0000 Subject: [PATCH] Add PEP 372, Adding an ordered directory to collections. --- pep-0000.txt | 3 + pep-0372.txt | 234 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 237 insertions(+) create mode 100644 pep-0372.txt diff --git a/pep-0000.txt b/pep-0000.txt index e9bf6d5c5..bea2bec5d 100644 --- a/pep-0000.txt +++ b/pep-0000.txt @@ -98,6 +98,7 @@ Index by Category S 364 Transitioning to the Py3K Standard Library Warsaw S 368 Standard image protocol and class Mastrodomenico S 369 Post import hooks Heimes + S 372 Adding an ordered directory to collections Ronacher S 3135 New Super Spealman, Delaney Finished PEPs (done, implemented in Subversion) @@ -476,6 +477,7 @@ Numerical Index S 369 Post import hooks Heimes SA 370 Per user site-packages directory Heimes SA 371 Addition of the multiprocessing package Noller, Oudkerk + S 372 Adding an ordered directory to collections Ronacher SR 666 Reject Foolish Indentation Creighton SR 754 IEEE 754 Floating Point Special Values Warnes P 3000 Python 3000 GvR @@ -629,6 +631,7 @@ Owners Reis, Christian R. kiko@async.com.br Riehl, Jonathan jriehl@spaceship.com Roberge, André andre.roberge@gmail.com + Ronacher, Armin armin.ronacher@active-4.com van Rossum, Guido (GvR) guido@python.org van Rossum, Just (JvR) just@letterror.com Sajip, Vinay vinay_sajip@red-dove.com diff --git a/pep-0372.txt b/pep-0372.txt new file mode 100644 index 000000000..7d1ac7fac --- /dev/null +++ b/pep-0372.txt @@ -0,0 +1,234 @@ +PEP: 372 +Title: Adding an ordered directory to collections +Version: $Revision$ +Last-Modified: $Date$ +Author: Armin Ronacher +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 15-Jun-2008 +Python-Version: 2.6, 3.0 +Post-History: + + +Abstract +======== + +This PEP proposes an ordered dictionary as a new data structure for +the ``collections`` module, called "odict" in this PEP for short. The +proposed API incorporates the experiences gained from working with +similar implementations that exist in various real-world applications +and other programming languages. + + +Rationale +========= + +In current Python versions, the widely used built-in dict type does +not specify an order for the key/value pairs stored. This makes it +hard to use dictionaries as data storage for some specific use cases. + +Some dynamic programming languages like PHP and Ruby 1.9 guarantee a +certain order on iteration. In those languages, and existing Python +ordered-dict implementations, the ordering of items is defined by the +time of insertion of the key. New keys are appended at the end, keys +that are overwritten and not moved. + +The following example shows the behavior for simple assignments: + +>>> d = odict() +>>> d['parrot'] = 'dead' +>>> d['penguin'] = 'exploded' +>>> d.items() +[('parrot', 'dead'), ('penguin', 'exploded')] + +That the ordering is preserved makes an odict useful for a couple of +situations: + +- XML/HTML processing libraries currently drop the ordering of + attributes, use a list instead of a dict which makes filtering + cumbersome, or implement their own ordered dictionary. This affects + ElementTree, html5lib, Genshi and many more libraries. + +- There are many ordererd dict implementations in various libraries + and applications, most of them subtly incompatible with each other. + Furthermore, subclassing dict is a non-trivial task and many + implementations don't override all the methods properly which can + lead to unexpected results. + + Additionally, many ordered dicts are implemented in an inefficient + way, making many operations more complex then they have to be. + +- PEP 3115 allows metaclasses to change the mapping object used for + the class body. An ordered dict could be used to create ordered + member declarations similar to C structs. This could be useful, for + example, for future ``ctypes`` releases as well as ORMs that define + database tables as classes, like the one the Django framework ships. + Django currently uses an ugly hack to restore the ordering of + members in database models. + +- Code ported from other programming languages such as PHP often + depends on a ordered dict. Having an implementation of an + ordering-preserving dictionary in the standard library could ease + the transition and improve the compatibility of different libraries. + + +Ordered Dict API +================ + +The ordered dict API would be mostly compatible with dict and existing +ordered dicts. (Note: this PEP refers to the Python 2.x dictionary +API; the transfer to the 3.x API is trivial.) + +The constructor and ``update()`` both accept iterables of tuples as +well as mappings like a dict does. The ordering however is preserved +for the first case: + +>>> d = odict([('a', 'b'), ('c', 'd')]) +>>> d.update({'foo': 'bar'}) +>>> d +collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')]) + +If ordered dicts are updated from regular dicts, the ordering of new +keys is of course undefined again unless ``sort()`` is called. + +All iteration methods as well as ``keys()``, ``values()`` and +``items()`` return the values ordered by the the time the key-value +pair was inserted: + +>>> d['spam'] = 'eggs' +>>> d.keys() +['a', 'c', 'foo', 'spam'] +>>> d.values() +['b', 'd', 'bar', 'eggs'] +>>> d.items() +[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')] + +New methods not available on dict: + + ``odict.byindex(index)`` + + Index-based lookup is supported by ``byindex()`` which returns + the key/value pair for an index, that is, the "position" of a + key in the ordered dict. 0 is the first key/value pair, -1 + the last. + + >>> d.byindex(2) + ('foo', 'bar') + + ``odict.sort(cmp=None, key=None, reverse=False)`` + + Sorts the odict in place by cmp or key. This works exactly + like ``list.sort()``, but the comparison functions are passed + a key/value tuple, not only the value. + + >>> d = odict([(42, 1), (1, 4), (23, 7)]) + >>> d.sort() + >>> d + collections.odict([(1, 4), (23, 7), (42, 1)]) + + ``odict.reverse()`` + + Reverses the odict in place. + + ``odict.__reverse__()`` + + Supports reverse iteration by key. + + +Questions and Answers +===================== + +What happens if an existing key is reassigned? + + The key is not moved but assigned a new value in place. This is + consistent with existing implementations and allows subclasses to + change the behavior easily:: + + class movingcollections.odict): + def __setitem__(self, key, value): + self.pop(key, None) + odict.__setitem__(self, key, value) + +What happens if keys appear multiple times in the list passed to the +constructor? + + The same as for regular dicts: The latter item overrides the + former. This has the side-effect that the position of the first + key is used because the key is actually overwritten: + + >>> odict([('a', 1), ('b', 2), ('a', 3)]) + collections.odict([('a', 3), ('b', 2)]) + + This behavior is consistent with existing implementations in + Python, the PHP array and the hashmap in Ruby 1.9. + +Why is there no ``odict.insert()``? + + There are few situations where you really want to insert a key at + an specified index. To avoid API complication, the proposed + solution for this situation is creating a list of items, + manipulating that and converting it back into an odict: + + >>> d = odict([('a', 42), ('b', 23), ('c', 19)]) + >>> l = d.items() + >>> l.insert(1, ('x', 0)) + >>> odict(l) + collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)]) + + +Example Implementation +====================== + +A poorly performing example implementation of the odict written in +Python is available: + + `odict.py `_ + +The version for ``collections`` should be implemented in C and use a +linked list internally. + +Other implementations of ordered dicts in various Python projects or +standalone libraries, that inspired the API proposed here, are: + +- `odict in Babel`_ +- `OrderedDict in Django`_ +- `The odict module`_ +- `ordereddict`_ (a C implementation of the odict module) +- `StableDict`_ +- `Armin Rigo's OrderedDict`_ + + +.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178 +.. _OrderedDict in Django: + http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53 +.. _The odict module: http://www.voidspace.org.uk/python/odict.html +.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/ +.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2 +.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py + + +Future Directions +================= + +With the availability of an ordered dict in the standard library, +other libraries may take advantage of that. For example, ElementTree +could return odicts in the future that retain the attribute ordering +of the source file. + + +Copyright +========= + +This document has been placed in the public domain. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: +