PEP: 265
Title: Sorting Dictionaries by Value
Version: $Revision$
Author: g2@iowegian.com (Grant Griffin)
Status: Draft
Type: Standards Track
Created: 8-Aug-2001
Python-Version: 2.2
Post-History:


Abstract

This PEP suggests a "sort by value" operation for dictionaries.
The primary benefit would be in terms of "batteries included"
support for a common Python idiom which, in its current form, is
both difficult for beginners to understand and cumbersome for all
to implement.


Motivation

A common use of dictionaries is to count occurrences by setting
the value of d[key] to 1 on its first occurrence, then incrementing
the value on each subsequent occurrence. This can be done several
different ways, but the get() method is the most succinct:

    d[key] = d.get(key, 0) + 1

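As a minimal sketch of this counting idiom, assuming a hypothetical
list of words as the input (the names words and counts are
illustrative only and do not come from this proposal):

```python
# Hypothetical input: any sequence of hashable items would do.
words = ['b', 'a', 'b', 'c', 'b', 'a']

counts = {}
for word in words:
    # get() returns 0 the first time a word is seen, so the first
    # occurrence needs no special case.
    counts[word] = counts.get(word, 0) + 1
```

After the loop, counts maps 'b' to 3, 'a' to 2 and 'c' to 1.
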
Once all occurrences have been counted, a common use of the
resulting dictionary is to print the occurrences in
occurrence-sorted order, often with the largest value first.

This leads to a need to sort a dictionary's items by value. The
canonical method of doing so in Python is to first use d.items()
to get a list of the dictionary's items, then invert the ordering
of each item's tuple from (key, value) into (value, key), then
sort the list; since Python sorts the list based on the first item
of the tuple, the list of (inverted) items is therefore sorted by
value. If desired, the list can then be reversed, and the tuples
can be re-inverted back to (key, value). (However, in my
experience, the inverted tuple ordering is fine for most purposes,
e.g. printing out the list.)

For example, given an occurrence count of:

    >>> d = {'a':2, 'b':23, 'c':5, 'd':17, 'e':1}

we might do:

    >>> items = [(v, k) for k, v in d.items()]
    >>> items.sort()
    >>> items.reverse()             # so largest is first
    >>> items = [(k, v) for v, k in items]

resulting in:

    >>> items
    [('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]

which shows the list in by-value order, largest first. (In this
case, 'b' was found to have the most occurrences.)

This works fine, but is "hard to use" in two respects. First,
although this idiom is known to veteran Pythoneers, it is not at
all obvious to newbies -- either in terms of its algorithm
(inverting the ordering of item tuples) or its implementation
(using list comprehensions, which are an advanced Python
feature). Second, it requires repeatedly typing a lot of
"grunge", resulting in both tedium and mistakes.

We therefore would rather Python provide a method of sorting
dictionaries by value which would be both easy for newbies to
understand (or, better yet, not to _have to_ understand) and
easier for all to use.


Rationale

As Tim Peters has pointed out, this sort of thing brings on the
problem of trying to be all things to all people. Therefore, we
will limit its scope to try to hit "the sweet spot". Unusual
cases (e.g. sorting via a custom comparison function) can, of
course, be handled "manually" using present methods.

Here are some simple possibilities:

The items() method of dictionaries can be augmented with new
parameters having default values that provide for full
backwards-compatibility:

    (1) items(sort_by_values=0, reversed=0)

or maybe just:

    (2) items(sort_by_values=0)

since reversing a list is easy enough.

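To make the backwards-compatibility point concrete, possibility (1)
can be roughly emulated today with a dict subclass. This is only a
sketch under stated assumptions: the class name ValueSortableDict is
hypothetical, the result is assumed to keep (key, value) tuple order,
"reversed" is assumed to simply reverse the resulting list, and a
list is returned (as items() does in Python 2.2). None of these
details is fixed by this proposal.

```python
class ValueSortableDict(dict):
    """Hypothetical dict subclass emulating possibility (1)."""

    def items(self, sort_by_values=0, reversed=0):
        # Copy into a list so it can be sorted and reversed in place.
        items = list(dict.items(self))
        if sort_by_values:
            # Invert to (value, key), sort, then restore (key, value).
            inverted = [(v, k) for k, v in items]
            inverted.sort()
            items = [(k, v) for v, k in inverted]
        if reversed:
            # Assumption: reversed just reverses the list built above.
            items.reverse()
        return items


d = ValueSortableDict({'a': 2, 'b': 23, 'c': 5, 'd': 17, 'e': 1})
plain = d.items()    # existing call sites need no new arguments
largest_first = d.items(sort_by_values=1, reversed=1)
```

Here largest_first is [('b', 23), ('d', 17), ('c', 5), ('a', 2),
('e', 1)], while the parameter-less call still returns every item in
whatever order the dictionary yields them.
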
Alternatively, items() could simply let us control the (key, value)
order:

    (3) items(values_first=0)

Again, this is fully backwards-compatible. It does less work than
the others, but it at least eases the most complicated/tricky part
of the sort-by-value problem: inverting the order of item tuples.
Using this is very simple:

    items = d.items(1)
    items.sort()
    items.reverse()         # (if desired)

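Since d.items(1) does not exist in any released Python, the effect of
possibility (3) can only be approximated with a helper for now. The
following is a sketch; items_values_first() is a hypothetical name
introduced purely for illustration:

```python
def items_values_first(d, values_first=0):
    """Hypothetical emulation of the proposed d.items(values_first)."""
    if values_first:
        # Swap each (key, value) pair into (value, key).
        return [(v, k) for k, v in d.items()]
    return list(d.items())


d = {'a': 2, 'b': 23, 'c': 5, 'd': 17, 'e': 1}
items = items_values_first(d, 1)
items.sort()
items.reverse()     # largest value first
```

The result is [(23, 'b'), (17, 'd'), (5, 'c'), (2, 'a'), (1, 'e')].
The tuples stay in (value, key) order, which is usually acceptable
for printing.
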
The primary drawback of the preceding three approaches is the
additional overhead for the parameter-less "items()" case, due to
having to process default parameters. (However, if one assumes
that items() gets used primarily for creating sort-by-value lists,
this is not really a drawback in practice.)

Alternatively, we might add a new dictionary method which somehow
embodies "sorting". This approach offers two advantages. First,
it avoids adding overhead to the items() method. Second, it is
perhaps more accessible to newbies: when they go looking for a
method for sorting dictionaries, they hopefully run into this one,
and they will not have to understand the finer points of tuple
inversion and list sorting to achieve sort-by-value.

To allow the four basic possibilities of sorting by key/value and in
forward/reverse order, we could add this method:

    (4) sorted_items(by_value=0, reversed=0)

I believe the most common case would actually be "by_value=1,
reversed=1", but the default values given here might lead to
fewer surprises for users: sorted_items() would be the same as
items() followed by sort().

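The four combinations offered by method (4) can be sketched as a
plain function. This is an illustration rather than the proposed
implementation: it is a free function taking the dictionary as its
first argument instead of a dict method, and it assumes the result is
always returned as (key, value) tuples.

```python
def sorted_items(d, by_value=0, reversed=0):
    """Hypothetical pure-Python emulation of proposed method (4)."""
    if by_value:
        # Sort on (value, key), then restore (key, value) order.
        pairs = [(v, k) for k, v in d.items()]
        pairs.sort()
        items = [(k, v) for v, k in pairs]
    else:
        # Default case: the same as items() followed by sort().
        items = list(d.items())
        items.sort()
    if reversed:
        items.reverse()
    return items


d = {'a': 2, 'b': 23, 'c': 5, 'd': 17, 'e': 1}
by_key = sorted_items(d)
by_value_desc = sorted_items(d, by_value=1, reversed=1)
```

With these inputs, by_key is [('a', 2), ('b', 23), ('c', 5),
('d', 17), ('e', 1)] and by_value_desc is [('b', 23), ('d', 17),
('c', 5), ('a', 2), ('e', 1)]. In this sketch, items with equal
values are ordered by key, because the sort compares whole
(value, key) tuples.
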
Finally (as a last resort), we could use:

    (5) items_sorted_by_value(reversed=0)


Implementation

The proposed dictionary methods would necessarily be implemented
in C. Presumably, the implementation would be fairly simple since
it involves just adding a few calls to Python's existing
machinery.


Concerns

Aside from the run-time overhead already addressed in
possibilities 1 through 3, concerns with this proposal probably
will fall into the categories of "feature bloat" and/or "code
bloat". However, I believe that several of the suggestions made
here will result in quite minimal bloat, resulting in a good
tradeoff between bloat and "value added".

Tim Peters has noted that implementing this in C might not be
significantly faster than implementing it in Python today.
However, the major benefits intended here are "accessibility" and
"ease of use", not "speed". Therefore, as long as it is not
noticeably slower (in the case of plain items()), speed need not be
a consideration.


References

A related thread called "counting occurrences" appeared on
comp.lang.python in August, 2001. This included examples of
approaches to systematizing the sort-by-value problem by
implementing it as reusable Python functions and classes.


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: