207 lines
7.1 KiB
Plaintext
207 lines
7.1 KiB
Plaintext
PEP: 265
|
||
Title: Sorting Dictionaries by Value
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: g2@iowegian.com (Grant Griffin)
|
||
Status: Rejected
|
||
Type: Standards Track
|
||
Created: 8-Aug-2001
|
||
Python-Version: 2.2
|
||
Post-History:
|
||
|
||
|
||
Abstract
|
||
|
||
This PEP suggests a "sort by value" operation for dictionaries.
|
||
The primary benefit would be in terms of "batteries included"
|
||
support for a common Python idiom which, in its current form, is
|
||
both difficult for beginners to understand and cumbersome for all
|
||
to implement.
|
||
|
||
BDFL Pronouncement
|
||
|
||
This PEP is rejected because the need for it has been largely
|
||
fulfilled by Py2.4's sorted() builtin function:
|
||
|
||
>>> sorted(d.iteritems(), key=itemgetter(1), reverse=True)
|
||
[('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]
|
||
|
||
or for just the keys:
|
||
|
||
sorted(d, key=d.__getitem__, reverse=True)
|
||
['b', 'd', 'c', 'a', 'e']
|
||
|
||
Also, Python 2.5's heapq.nlargest() function addresses the common use
|
||
case of finding only a few of the highest valued items:
|
||
|
||
>>> nlargest(2, d.iteritems(), itemgetter(1))
|
||
[('b', 23), ('d', 17)]
|
||
|
||
|
||
Motivation
|
||
|
||
A common use of dictionaries is to count occurrences by setting
|
||
the value of d[key] to 1 on its first occurrence, then increment
|
||
the value on each subsequent occurrence. This can be done several
|
||
different ways, but the get() method is the most succinct:
|
||
|
||
d[key] = d.get(key, 0) + 1
|
||
|
||
Once all occurrences have been counted, a common use of the
|
||
resulting dictionary is to print the occurrences in
|
||
occurrence-sorted order, often with the largest value first.
|
||
|
||
This leads to a need to sort a dictionary's items by value. The
|
||
canonical method of doing so in Python is to first use d.items()
|
||
to get a list of the dictionary's items, then invert the ordering
|
||
of each item's tuple from (key, value) into (value, key), then
|
||
sort the list; since Python sorts the list based on the first item
|
||
of the tuple, the list of (inverted) items is therefore sorted by
|
||
value. If desired, the list can then be reversed, and the tuples
|
||
can be re-inverted back to (key, value). (However, in my
|
||
experience, the inverted tuple ordering is fine for most purposes,
|
||
e.g. printing out the list.)
|
||
|
||
For example, given an occurrence count of:
|
||
|
||
>>> d = {'a':2, 'b':23, 'c':5, 'd':17, 'e':1}
|
||
|
||
we might do:
|
||
|
||
>>> items = [(v, k) for k, v in d.items()]
|
||
>>> items.sort()
|
||
>>> items.reverse() # so largest is first
|
||
>>> items = [(k, v) for v, k in items]
|
||
|
||
resulting in:
|
||
|
||
>>> items
|
||
[('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]
|
||
|
||
which shows the list in by-value order, largest first. (In this
|
||
case, 'b' was found to have the most occurrences.)
|
||
|
||
This works fine, but is "hard to use" in two aspects. First,
|
||
although this idiom is known to veteran Pythoneers, it is not at
|
||
all obvious to newbies -- either in terms of its algorithm
|
||
(inverting the ordering of item tuples) or its implementation
|
||
(using list comprehensions -- which are an advanced Python
|
||
feature.) Second, it requires having to repeatedly type a lot of
|
||
"grunge", resulting in both tedium and mistakes.
|
||
|
||
We therefore would rather Python provide a method of sorting
|
||
dictionaries by value which would be both easy for newbies to
|
||
understand (or, better yet, not to _have to_ understand) and
|
||
easier for all to use.
|
||
|
||
|
||
Rationale
|
||
|
||
As Tim Peters has pointed out, this sort of thing brings on the
|
||
problem of trying to be all things to all people. Therefore, we
|
||
will limit its scope to try to hit "the sweet spot". Unusual
|
||
cases (e.g. sorting via a custom comparison function) can, of
|
||
course, be handled "manually" using present methods.
|
||
|
||
Here are some simple possibilities:
|
||
|
||
The items() method of dictionaries can be augmented with new
|
||
parameters having default values that provide for full
|
||
backwards-compatibility:
|
||
|
||
(1) items(sort_by_values=0, reversed=0)
|
||
|
||
or maybe just:
|
||
|
||
(2) items(sort_by_values=0)
|
||
|
||
since reversing a list is easy enough.
|
||
|
||
Alternatively, items() could simply let us control the (key, value)
|
||
order:
|
||
|
||
(3) items(values_first=0)
|
||
|
||
Again, this is fully backwards-compatible. It does less work than
|
||
the others, but it at least eases the most complicated/tricky part
|
||
of the sort-by-value problem: inverting the order of item tuples.
|
||
Using this is very simple:
|
||
|
||
items = d.items(1)
|
||
items.sort()
|
||
items.reverse() # (if desired)
|
||
|
||
The primary drawback of the preceding three approaches is the
|
||
additional overhead for the parameter-less "items()" case, due to
|
||
having to process default parameters. (However, if one assumes
|
||
that items() gets used primarily for creating sort-by-value lists,
|
||
this is not really a drawback in practice.)
|
||
|
||
Alternatively, we might add a new dictionary method which somehow
|
||
embodies "sorting". This approach offers two advantages. First,
|
||
it avoids adding overhead to the items() method. Second, it is
|
||
perhaps more accessible to newbies: when they go looking for a
|
||
method for sorting dictionaries, they hopefully run into this one,
|
||
and they will not have to understand the finer points of tuple
|
||
inversion and list sorting to achieve sort-by-value.
|
||
|
||
To allow the four basic possibilities of sorting by key/value and in
|
||
forward/reverse order, we could add this method:
|
||
|
||
(4) sorted_items(by_value=0, reversed=0)
|
||
|
||
I believe the most common case would actually be "by_value=1,
|
||
reversed=1", but the defaults values given here might lead to
|
||
fewer surprises by users: sorted_items() would be the same as
|
||
items() followed by sort().
|
||
|
||
Finally (as a last resort), we could use:
|
||
|
||
(5) items_sorted_by_value(reversed=0)
|
||
|
||
|
||
Implementation
|
||
|
||
The proposed dictionary methods would necessarily be implemented
|
||
in C. Presumably, the implementation would be fairly simple since
|
||
it involves just adding a few calls to Python's existing
|
||
machinery.
|
||
|
||
|
||
Concerns
|
||
|
||
Aside from the run-time overhead already addressed in
|
||
possibilities 1 through 3, concerns with this proposal probably
|
||
will fall into the categories of "feature bloat" and/or "code
|
||
bloat". However, I believe that several of the suggestions made
|
||
here will result in quite minimal bloat, resulting in a good
|
||
tradeoff between bloat and "value added".
|
||
|
||
Tim Peters has noted that implementing this in C might not be
|
||
significantly faster than implementing it in Python today.
|
||
However, the major benefits intended here are "accessibility" and
|
||
"ease of use", not "speed". Therefore, as long as it is not
|
||
noticeably slower (in the case of plain items(), speed need not be
|
||
a consideration.
|
||
|
||
|
||
References
|
||
|
||
A related thread called "counting occurrences" appeared on
|
||
comp.lang.python in August, 2001. This included examples of
|
||
approaches to systematizing the sort-by-value problem by
|
||
implementing it as reusable Python functions and classes.
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
End:
|
||
|