PEP 265, Sorting Dictionaries by Value, Grant Griffin
Editorial pass, formatting, spell checking by Barry.
This commit is contained in:
parent
3d503a4ea4
commit
cb2305d6da
|
@ -0,0 +1,186 @@
|
|||
PEP: 265
|
||||
Title: Sorting Dictionaries by Value
|
||||
Version: $Revision$
|
||||
Author: g2@iowegian.com (Grant Griffin)
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Created: 8-Aug-2001
|
||||
Python-Version: 2.2
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This PEP suggests a "sort by value" operation for dictionaries.
|
||||
The primary benefit would be in terms of "batteries included"
|
||||
support for a common Python idiom which, in its current form, is
|
||||
both difficult for beginners to understand and cumbersome for all
|
||||
to implement.
|
||||
|
||||
|
||||
Motivation
|
||||
|
||||
A common use of dictionaries is to count occurrences by setting
|
||||
the value of d[key] to 1 on its first occurrence, then increment
|
||||
the value on each subsequent occurrence. This can be done several
|
||||
different ways, but the get() method is the most succinct:
|
||||
|
||||
d[key] = d.get(key, 0) + 1
|
||||
|
||||
Once all occurrences have been counted, a common use of the
|
||||
resulting dictionary is to print the occurrences in
|
||||
occurrence-sorted order, often with the largest value first.
|
||||
|
||||
This leads to a need to sort a dictionary's items by value. The
|
||||
canonical method of doing so in Python is to first use d.items()
|
||||
to get a list of the dictionary's items, then invert the ordering
|
||||
of each item's tuple from (key, value) into (value, key), then
|
||||
sort the list; since Python sorts the list based on the first item
|
||||
of the tuple, the list of (inverted) items is therefore sorted by
|
||||
value. If desired, the list can then be reversed, and the tuples
|
||||
can be re-inverted back to (key, value). (However, in my
|
||||
experience, the inverted tuple ordering is fine for most purposes,
|
||||
e.g. printing out the list.)
|
||||
|
||||
For example, given an occurrence count of:
|
||||
|
||||
>>> d = {'a':2, 'b':23, 'c':5, 'd':17, 'e':1}
|
||||
|
||||
we might do:
|
||||
|
||||
>>> items = [(v, k) for k, v in d.items()]
|
||||
>>> items.sort()
|
||||
>>> items.reverse() # so largest is first
|
||||
>>> items = [(k, v) for v, k in items]
|
||||
|
||||
resulting in:
|
||||
|
||||
>>> items
|
||||
[('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]
|
||||
|
||||
which shows the list in by-value order, largest first. (In this
|
||||
case, 'b' was found to have the most occurrences.)
|
||||
|
||||
This works fine, but is "hard to use" in two aspects. First,
|
||||
although this idiom is known to veteran Pythoneers, it is not at
|
||||
all obvious to newbies -- either in terms of its algorithm
|
||||
(inverting the ordering of item tuples) or its implementation
|
||||
(using list comprehensions -- which are an advanced Python
|
||||
feature.) Second, it requires having to repeatedly type a lot of
|
||||
"grunge", resulting in both tedium and mistakes.
|
||||
|
||||
We therefore would rather Python provide a method of sorting
|
||||
dictionaries by value which would be both easy for newbies to
|
||||
understand (or, better yet, not to _have to_ understand) and
|
||||
easier for all to use.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
As Tim Peters has pointed out, this sort of thing brings on the
|
||||
problem of trying to be all things to all people. Therefore, we
|
||||
will limit its scope to try to hit "the sweet spot". Unusual
|
||||
cases (e.g. sorting via a custom comparison function) can, of
|
||||
course, be handled "manually" using present methods.
|
||||
|
||||
Here are some simple possibilities:
|
||||
|
||||
The items() method of dictionaries can be augmented with new
|
||||
parameters having default values that provide for full
|
||||
backwards-compatibility:
|
||||
|
||||
(1) items(sort_by_values=0, reversed=0)
|
||||
|
||||
or maybe just:
|
||||
|
||||
(2) items(sort_by_values=0)
|
||||
|
||||
since reversing a list is easy enough.
|
||||
|
||||
Alternatively, items() could simply let us control the (key, value)
|
||||
order:
|
||||
|
||||
(3) items(values_first=0)
|
||||
|
||||
Again, this is fully backwards-compatible. It does less work than
|
||||
the others, but it at least eases the most complicated/tricky part
|
||||
of the sort-by-value problem: inverting the order of item tuples.
|
||||
Using this is very simple:
|
||||
|
||||
items = d.items(1)
|
||||
items.sort()
|
||||
items.reverse() # (if desired)
|
||||
|
||||
The primary drawback of the preceding three approaches is the
|
||||
additional overhead for the parameter-less "items()" case, due to
|
||||
having to process default parameters. (However, if one assumes
|
||||
that items() gets used primarily for creating sort-by-value lists,
|
||||
this is not really a drawback in practice.)
|
||||
|
||||
Alternatively, we might add a new dictionary method which somehow
|
||||
embodies "sorting". This approach offers two advantages. First,
|
||||
it avoids adding overhead to the items() method. Second, it is
|
||||
perhaps more accessible to newbies: when they go looking for a
|
||||
method for sorting dictionaries, they hopefully run into this one,
|
||||
and they will not have to understand the finer points of tuple
|
||||
inversion and list sorting to achieve sort-by-value.
|
||||
|
||||
To allow the four basic possibilities of sorting by key/value and in
|
||||
forward/reverse order, we could add this method:
|
||||
|
||||
(4) sorted_items(by_value=0, reversed=0)
|
||||
|
||||
I believe the most common case would actually be "by_value=1,
|
||||
reversed=1", but the defaults values given here might lead to
|
||||
fewer surprises by users: sorted_items() would be the same as
|
||||
items() followed by sort().
|
||||
|
||||
Finally (as a last resort), we could use:
|
||||
|
||||
(5) items_sorted_by_value(reversed=0)
|
||||
|
||||
|
||||
Implementation
|
||||
|
||||
The proposed dictionary methods would necessarily be implemented
|
||||
in C. Presumably, the implementation would be fairly simple since
|
||||
it involves just adding a few calls to Python's existing
|
||||
machinery.
|
||||
|
||||
|
||||
Concerns
|
||||
|
||||
Aside from the run-time overhead already addressed in
|
||||
possibilities 1 through 3, concerns with this proposal probably
|
||||
will fall into the categories of "feature bloat" and/or "code
|
||||
bloat". However, I believe that several of the suggestions made
|
||||
here will result in quite minimal bloat, resulting in a good
|
||||
tradeoff between bloat and "value added".
|
||||
|
||||
Tim Peters has noted that implementing this in C might not be
|
||||
significantly faster than implementing it in Python today.
|
||||
However, the major benefits intended here are "accessibility" and
|
||||
"ease of use", not "speed". Therefore, as long as it is not
|
||||
noticeably slower (in the case of plain items(), speed need not be
|
||||
a consideration.
|
||||
|
||||
|
||||
References
|
||||
|
||||
A related thread called "counting occurrences" appeared on
|
||||
comp.lang.python in August, 2001. This included examples of
|
||||
approaches to systematizing the sort-by-value problem by
|
||||
implementing it as reusable Python functions and classes.
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
End:
|
||||
|
Loading…
Reference in New Issue