diff --git a/pep-0265.txt b/pep-0265.txt new file mode 100644 index 000000000..d3cd8f950 --- /dev/null +++ b/pep-0265.txt @@ -0,0 +1,186 @@ +PEP: 265 +Title: Sorting Dictionaries by Value +Version: $Revision$ +Author: g2@iowegian.com (Grant Griffin) +Status: Draft +Type: Standards Track +Created: 8-Aug-2001 +Python-Version: 2.2 +Post-History: + + +Abstract + + This PEP suggests a "sort by value" operation for dictionaries. + The primary benefit would be in terms of "batteries included" + support for a common Python idiom which, in its current form, is + both difficult for beginners to understand and cumbersome for all + to implement. + + +Motivation + + A common use of dictionaries is to count occurrences by setting + the value of d[key] to 1 on its first occurrence, then increment + the value on each subsequent occurrence. This can be done several + different ways, but the get() method is the most succinct: + + d[key] = d.get(key, 0) + 1 + + Once all occurrences have been counted, a common use of the + resulting dictionary is to print the occurrences in + occurrence-sorted order, often with the largest value first. + + This leads to a need to sort a dictionary's items by value. The + canonical method of doing so in Python is to first use d.items() + to get a list of the dictionary's items, then invert the ordering + of each item's tuple from (key, value) into (value, key), then + sort the list; since Python sorts the list based on the first item + of the tuple, the list of (inverted) items is therefore sorted by + value. If desired, the list can then be reversed, and the tuples + can be re-inverted back to (key, value). (However, in my + experience, the inverted tuple ordering is fine for most purposes, + e.g. printing out the list.) + + For example, given an occurrence count of: + + >>> d = {'a':2, 'b':23, 'c':5, 'd':17, 'e':1} + + we might do: + + >>> items = [(v, k) for k, v in d.items()] + >>> items.sort() + >>> items.reverse() # so largest is first + >>> items = [(k, v) for v, k in items] + + resulting in: + + >>> items + [('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)] + + which shows the list in by-value order, largest first. (In this + case, 'b' was found to have the most occurrences.) + + This works fine, but is "hard to use" in two aspects. First, + although this idiom is known to veteran Pythoneers, it is not at + all obvious to newbies -- either in terms of its algorithm + (inverting the ordering of item tuples) or its implementation + (using list comprehensions -- which are an advanced Python + feature.) Second, it requires having to repeatedly type a lot of + "grunge", resulting in both tedium and mistakes. + + We therefore would rather Python provide a method of sorting + dictionaries by value which would be both easy for newbies to + understand (or, better yet, not to _have to_ understand) and + easier for all to use. + + +Rationale + + As Tim Peters has pointed out, this sort of thing brings on the + problem of trying to be all things to all people. Therefore, we + will limit its scope to try to hit "the sweet spot". Unusual + cases (e.g. sorting via a custom comparison function) can, of + course, be handled "manually" using present methods. + + Here are some simple possibilities: + + The items() method of dictionaries can be augmented with new + parameters having default values that provide for full + backwards-compatibility: + + (1) items(sort_by_values=0, reversed=0) + + or maybe just: + + (2) items(sort_by_values=0) + + since reversing a list is easy enough. + + Alternatively, items() could simply let us control the (key, value) + order: + + (3) items(values_first=0) + + Again, this is fully backwards-compatible. It does less work than + the others, but it at least eases the most complicated/tricky part + of the sort-by-value problem: inverting the order of item tuples. + Using this is very simple: + + items = d.items(1) + items.sort() + items.reverse() # (if desired) + + The primary drawback of the preceding three approaches is the + additional overhead for the parameter-less "items()" case, due to + having to process default parameters. (However, if one assumes + that items() gets used primarily for creating sort-by-value lists, + this is not really a drawback in practice.) + + Alternatively, we might add a new dictionary method which somehow + embodies "sorting". This approach offers two advantages. First, + it avoids adding overhead to the items() method. Second, it is + perhaps more accessible to newbies: when they go looking for a + method for sorting dictionaries, they hopefully run into this one, + and they will not have to understand the finer points of tuple + inversion and list sorting to achieve sort-by-value. + + To allow the four basic possibilities of sorting by key/value and in + forward/reverse order, we could add this method: + + (4) sorted_items(by_value=0, reversed=0) + + I believe the most common case would actually be "by_value=1, + reversed=1", but the defaults values given here might lead to + fewer surprises by users: sorted_items() would be the same as + items() followed by sort(). + + Finally (as a last resort), we could use: + + (5) items_sorted_by_value(reversed=0) + + +Implementation + + The proposed dictionary methods would necessarily be implemented + in C. Presumably, the implementation would be fairly simple since + it involves just adding a few calls to Python's existing + machinery. + + +Concerns + + Aside from the run-time overhead already addressed in + possibilities 1 through 3, concerns with this proposal probably + will fall into the categories of "feature bloat" and/or "code + bloat". However, I believe that several of the suggestions made + here will result in quite minimal bloat, resulting in a good + tradeoff between bloat and "value added". + + Tim Peters has noted that implementing this in C might not be + significantly faster than implementing it in Python today. + However, the major benefits intended here are "accessibility" and + "ease of use", not "speed". Therefore, as long as it is not + noticeably slower (in the case of plain items(), speed need not be + a consideration. + + +References + + A related thread called "counting occurrences" appeared on + comp.lang.python in August, 2001. This included examples of + approaches to systematizing the sort-by-value problem by + implementing it as reusable Python functions and classes. + + +Copyright + + This document has been placed in the public domain. + + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +End: +