PEP: 265
Title: Sorting Dictionaries by Value
Version: $Revision$
Last-Modified: $Date$
Author: g2@iowegian.com (Grant Griffin)
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2001
Python-Version: 2.2
Post-History:

Abstract
========

This PEP suggests a "sort by value" operation for dictionaries.
The primary benefit would be in terms of "batteries included"
support for a common Python idiom which, in its current form, is
both difficult for beginners to understand and cumbersome for all
to implement.

BDFL Pronouncement
==================

This PEP is rejected because the need for it has been largely
fulfilled by Python 2.4's ``sorted()`` builtin function::

    >>> from operator import itemgetter
    >>> sorted(d.iteritems(), key=itemgetter(1), reverse=True)
    [('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]

or for just the keys::

    >>> sorted(d, key=d.__getitem__, reverse=True)
    ['b', 'd', 'c', 'a', 'e']

Also, Python 2.5's ``heapq.nlargest()`` function addresses the common use
case of finding only a few of the highest-valued items::

    >>> from heapq import nlargest
    >>> nlargest(2, d.iteritems(), itemgetter(1))
    [('b', 23), ('d', 17)]
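
The ``iteritems()`` calls above are Python 2 only, since that method was
removed in Python 3. A minimal sketch of the same three operations for
Python 3, assuming the example dictionary ``d`` from the Motivation section
and substituting ``d.items()`` for ``d.iteritems()``:

```python
from heapq import nlargest
from operator import itemgetter

# Example occurrence counts, copied from the Motivation section.
d = {'a': 2, 'b': 23, 'c': 5, 'd': 17, 'e': 1}

# (key, value) pairs, largest value first.
by_value = sorted(d.items(), key=itemgetter(1), reverse=True)

# Keys only, ordered by their values, largest first.
keys_by_value = sorted(d, key=d.__getitem__, reverse=True)

# Only the two highest-valued items, without sorting the whole dictionary.
top_two = nlargest(2, d.items(), key=itemgetter(1))
```

The variable names ``by_value``, ``keys_by_value`` and ``top_two`` are
illustrative only.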

Motivation
==========

A common use of dictionaries is to count occurrences by setting
the value of ``d[key]`` to 1 on its first occurrence, then incrementing
the value on each subsequent occurrence. This can be done several
different ways, but the ``get()`` method is the most succinct::

    d[key] = d.get(key, 0) + 1
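
In practice that line sits inside a loop over whatever is being counted.
A minimal sketch, assuming a hypothetical input string ``text`` whose
characters are the things to count:

```python
# Hypothetical input; any iterable of hashable items would work the same way.
text = "abracadabra"

d = {}
for key in text:
    # get() returns 0 the first time a key is seen, so the first
    # occurrence needs no special case.
    d[key] = d.get(key, 0) + 1
```

After the loop, ``d`` maps each distinct character to the number of times it
appeared.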

Once all occurrences have been counted, a common use of the
resulting dictionary is to print the occurrences in
occurrence-sorted order, often with the largest value first.

This leads to a need to sort a dictionary's items by value. The
canonical method of doing so in Python is to first use ``d.items()``
to get a list of the dictionary's items, then invert the ordering
of each item's tuple from (key, value) into (value, key), then
sort the list; since Python compares tuples starting with their first
item, the list of (inverted) items is therefore sorted by
value. If desired, the list can then be reversed, and the tuples
can be re-inverted back to (key, value). (However, in my
experience, the inverted tuple ordering is fine for most purposes,
e.g. printing out the list.)

For example, given an occurrence count of::

    >>> d = {'a':2, 'b':23, 'c':5, 'd':17, 'e':1}

we might do::

    >>> items = [(v, k) for k, v in d.items()]
    >>> items.sort()
    >>> items.reverse()         # so largest is first
    >>> items = [(k, v) for v, k in items]

resulting in::

    >>> items
    [('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]

which shows the list in by-value order, largest first. (In this
case, ``b`` was found to have the most occurrences.)

This works fine, but is "hard to use" in two respects. First,
although this idiom is known to veteran Pythoneers, it is not at
all obvious to newbies -- either in terms of its algorithm
(inverting the ordering of item tuples) or its implementation
(using list comprehensions, which are an advanced Python
feature). Second, it requires repeatedly typing a lot of
"grunge", resulting in both tedium and mistakes.

We therefore would rather Python provide a method of sorting
dictionaries by value which would be both easy for newbies to
understand (or, better yet, not to *have to* understand) and
easier for all to use.

Rationale
=========

As Tim Peters has pointed out, this sort of thing brings on the
problem of trying to be all things to all people. Therefore, we
will limit its scope to try to hit "the sweet spot". Unusual
cases (e.g. sorting via a custom comparison function) can, of
course, be handled "manually" using present methods.

Here are some simple possibilities:

The ``items()`` method of dictionaries can be augmented with new
parameters having default values that provide for full
backwards-compatibility::

    (1)  items(sort_by_values=0, reversed=0)

or maybe just::

    (2)  items(sort_by_values=0)

since reversing a list is easy enough.

Alternatively, ``items()`` could simply let us control the (key, value)
order::

    (3)  items(values_first=0)

Again, this is fully backwards-compatible. It does less work than
the others, but it at least eases the most complicated/tricky part
of the sort-by-value problem: inverting the order of item tuples.
Using this is very simple::

    items = d.items(1)
    items.sort()
    items.reverse()         # (if desired)
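
Since ``dict.items()`` never gained a ``values_first`` parameter, the usage
above can only be approximated. A minimal sketch in current Python 3,
assuming a hypothetical helper ``items_values_first()`` (not part of any
library) that stands in for ``d.items(1)``:

```python
def items_values_first(d):
    """Hypothetical stand-in for the proposed ``d.items(values_first=1)``.

    Returns a list of (value, key) tuples instead of (key, value).
    """
    return [(value, key) for key, value in d.items()]


d = {'a': 2, 'b': 23, 'c': 5, 'd': 17, 'e': 1}

items = items_values_first(d)
items.sort()        # tuples compare by value first, then by key
items.reverse()     # (if desired) so the largest value comes first
```

The result keeps the inverted (value, key) ordering, which the Motivation
section notes is fine for most purposes such as printing.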

The primary drawback of the preceding three approaches is the
additional overhead for the parameter-less ``items()`` case, due to
having to process default parameters. (However, if one assumes
that ``items()`` gets used primarily for creating sort-by-value lists,
this is not really a drawback in practice.)

Alternatively, we might add a new dictionary method which somehow
embodies "sorting". This approach offers two advantages. First,
it avoids adding overhead to the ``items()`` method. Second, it is
perhaps more accessible to newbies: when they go looking for a
method for sorting dictionaries, they will hopefully run into this one,
and they will not have to understand the finer points of tuple
inversion and list sorting to achieve sort-by-value.

To allow the four basic possibilities of sorting by key/value and in
forward/reverse order, we could add this method::

    (4)  sorted_items(by_value=0, reversed=0)

I believe the most common case would actually be ``by_value=1,
reversed=1``, but the default values given here might lead to
fewer surprises for users: ``sorted_items()`` would be the same as
``items()`` followed by ``sort()``.
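
The semantics of possibility (4) can be sketched as a plain function. This
is an illustration only, written for current Python 3: ``sorted_items()``
here is a hypothetical free function rather than the proposed dictionary
method, and breaking ties by key when values are equal is an assumption
carried over from the tuple-inversion idiom, not something the proposal
specifies.

```python
from operator import itemgetter


def sorted_items(d, by_value=0, reversed=0):
    """Hypothetical sketch of the proposed ``d.sorted_items()`` method.

    Returns a list of (key, value) tuples sorted by key (the default)
    or by value, optionally in descending order.  The parameter name
    ``reversed`` shadows the builtin; it is kept to match the proposal.
    """
    if by_value:
        # Compare on (value, key) so that equal values fall back to
        # key order, mirroring the inverted-tuple idiom.
        sort_key = itemgetter(1, 0)
    else:
        sort_key = itemgetter(0)
    return sorted(d.items(), key=sort_key, reverse=bool(reversed))


d = {'a': 2, 'b': 23, 'c': 5, 'd': 17, 'e': 1}

plain = sorted_items(d)                              # like items() then sort()
by_count = sorted_items(d, by_value=1, reversed=1)   # the "most common case"
```

With the defaults the function behaves like ``items()`` followed by
``sort()``, which is the low-surprise behaviour the paragraph above argues
for.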

Finally (as a last resort), we could use::

    (5)  items_sorted_by_value(reversed=0)

Implementation
==============

The proposed dictionary methods would necessarily be implemented
in C. Presumably, the implementation would be fairly simple since
it involves just adding a few calls to Python's existing
machinery.

Concerns
========

Aside from the run-time overhead already addressed in
possibilities 1 through 3, concerns with this proposal probably
will fall into the categories of "feature bloat" and/or "code
bloat". However, I believe that several of the suggestions made
here would add only minimal bloat, giving a good
tradeoff between bloat and "value added".

Tim Peters has noted that implementing this in C might not be
significantly faster than implementing it in Python today.
However, the major benefits intended here are "accessibility" and
"ease of use", not "speed". Therefore, as long as it is not
noticeably slower (in the case of plain ``items()``), speed need not be
a consideration.

References
==========

A related thread called "counting occurrences" appeared on
comp.lang.python in August, 2001. This included examples of
approaches to systematizing the sort-by-value problem by
implementing it as reusable Python functions and classes.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   End: