Add PEP 372, Adding an ordered directory to collections.

This commit is contained in:
Georg Brandl 2008-06-16 06:08:08 +00:00
parent 86de0a6dbe
commit 60cae8daed
2 changed files with 237 additions and 0 deletions

View File

@ -98,6 +98,7 @@ Index by Category
S 364 Transitioning to the Py3K Standard Library Warsaw
S 368 Standard image protocol and class Mastrodomenico
S 369 Post import hooks Heimes
S 372 Adding an ordered directory to collections Ronacher
S 3135 New Super Spealman, Delaney
Finished PEPs (done, implemented in Subversion)
@ -476,6 +477,7 @@ Numerical Index
S 369 Post import hooks Heimes
SA 370 Per user site-packages directory Heimes
SA 371 Addition of the multiprocessing package Noller, Oudkerk
S 372 Adding an ordered directory to collections Ronacher
SR 666 Reject Foolish Indentation Creighton
SR 754 IEEE 754 Floating Point Special Values Warnes
P 3000 Python 3000 GvR
@ -629,6 +631,7 @@ Owners
Reis, Christian R. kiko@async.com.br
Riehl, Jonathan jriehl@spaceship.com
Roberge, André andre.roberge@gmail.com
Ronacher, Armin armin.ronacher@active-4.com
van Rossum, Guido (GvR) guido@python.org
van Rossum, Just (JvR) just@letterror.com
Sajip, Vinay vinay_sajip@red-dove.com

234
pep-0372.txt Normal file
View File

@ -0,0 +1,234 @@
PEP: 372
Title: Adding an ordered directory to collections
Version: $Revision$
Last-Modified: $Date$
Author: Armin Ronacher <armin.ronacher@active-4.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Jun-2008
Python-Version: 2.6, 3.0
Post-History:
Abstract
========
This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short. The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.
Rationale
=========
In current Python versions, the widely used built-in dict type does
not specify an order for the key/value pairs stored. This makes it
hard to use dictionaries as data storage for some specific use cases.
Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
certain order on iteration. In those languages, and existing Python
ordered-dict implementations, the ordering of items is defined by the
time of insertion of the key. New keys are appended at the end, keys
that are overwritten and not moved.
The following example shows the behavior for simple assignments:
>>> d = odict()
>>> d['parrot'] = 'dead'
>>> d['penguin'] = 'exploded'
>>> d.items()
[('parrot', 'dead'), ('penguin', 'exploded')]
That the ordering is preserved makes an odict useful for a couple of
situations:
- XML/HTML processing libraries currently drop the ordering of
attributes, use a list instead of a dict which makes filtering
cumbersome, or implement their own ordered dictionary. This affects
ElementTree, html5lib, Genshi and many more libraries.
- There are many ordererd dict implementations in various libraries
and applications, most of them subtly incompatible with each other.
Furthermore, subclassing dict is a non-trivial task and many
implementations don't override all the methods properly which can
lead to unexpected results.
Additionally, many ordered dicts are implemented in an inefficient
way, making many operations more complex then they have to be.
- PEP 3115 allows metaclasses to change the mapping object used for
the class body. An ordered dict could be used to create ordered
member declarations similar to C structs. This could be useful, for
example, for future ``ctypes`` releases as well as ORMs that define
database tables as classes, like the one the Django framework ships.
Django currently uses an ugly hack to restore the ordering of
members in database models.
- Code ported from other programming languages such as PHP often
depends on a ordered dict. Having an implementation of an
ordering-preserving dictionary in the standard library could ease
the transition and improve the compatibility of different libraries.
Ordered Dict API
================
The ordered dict API would be mostly compatible with dict and existing
ordered dicts. (Note: this PEP refers to the Python 2.x dictionary
API; the transfer to the 3.x API is trivial.)
The constructor and ``update()`` both accept iterables of tuples as
well as mappings like a dict does. The ordering however is preserved
for the first case:
>>> d = odict([('a', 'b'), ('c', 'd')])
>>> d.update({'foo': 'bar'})
>>> d
collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])
If ordered dicts are updated from regular dicts, the ordering of new
keys is of course undefined again unless ``sort()`` is called.
All iteration methods as well as ``keys()``, ``values()`` and
``items()`` return the values ordered by the the time the key-value
pair was inserted:
>>> d['spam'] = 'eggs'
>>> d.keys()
['a', 'c', 'foo', 'spam']
>>> d.values()
['b', 'd', 'bar', 'eggs']
>>> d.items()
[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]
New methods not available on dict:
``odict.byindex(index)``
Index-based lookup is supported by ``byindex()`` which returns
the key/value pair for an index, that is, the "position" of a
key in the ordered dict. 0 is the first key/value pair, -1
the last.
>>> d.byindex(2)
('foo', 'bar')
``odict.sort(cmp=None, key=None, reverse=False)``
Sorts the odict in place by cmp or key. This works exactly
like ``list.sort()``, but the comparison functions are passed
a key/value tuple, not only the value.
>>> d = odict([(42, 1), (1, 4), (23, 7)])
>>> d.sort()
>>> d
collections.odict([(1, 4), (23, 7), (42, 1)])
``odict.reverse()``
Reverses the odict in place.
``odict.__reverse__()``
Supports reverse iteration by key.
Questions and Answers
=====================
What happens if an existing key is reassigned?
The key is not moved but assigned a new value in place. This is
consistent with existing implementations and allows subclasses to
change the behavior easily::
class movingcollections.odict):
def __setitem__(self, key, value):
self.pop(key, None)
odict.__setitem__(self, key, value)
What happens if keys appear multiple times in the list passed to the
constructor?
The same as for regular dicts: The latter item overrides the
former. This has the side-effect that the position of the first
key is used because the key is actually overwritten:
>>> odict([('a', 1), ('b', 2), ('a', 3)])
collections.odict([('a', 3), ('b', 2)])
This behavior is consistent with existing implementations in
Python, the PHP array and the hashmap in Ruby 1.9.
Why is there no ``odict.insert()``?
There are few situations where you really want to insert a key at
an specified index. To avoid API complication, the proposed
solution for this situation is creating a list of items,
manipulating that and converting it back into an odict:
>>> d = odict([('a', 42), ('b', 23), ('c', 19)])
>>> l = d.items()
>>> l.insert(1, ('x', 0))
>>> odict(l)
collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)])
Example Implementation
======================
A poorly performing example implementation of the odict written in
Python is available:
`odict.py <http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py>`_
The version for ``collections`` should be implemented in C and use a
linked list internally.
Other implementations of ordered dicts in various Python projects or
standalone libraries, that inspired the API proposed here, are:
- `odict in Babel`_
- `OrderedDict in Django`_
- `The odict module`_
- `ordereddict`_ (a C implementation of the odict module)
- `StableDict`_
- `Armin Rigo's OrderedDict`_
.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
.. _OrderedDict in Django:
http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53
.. _The odict module: http://www.voidspace.org.uk/python/odict.html
.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2
.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py
Future Directions
=================
With the availability of an ordered dict in the standard library,
other libraries may take advantage of that. For example, ElementTree
could return odicts in the future that retain the attribute ordering
of the source file.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: