Adding a key-transforming dictionary to collections

This commit is contained in:
Antoine Pitrou 2013-09-13 20:38:36 +02:00
parent 0be7861dee
commit 6bd3a9dd6a
1 changed files with 195 additions and 0 deletions

195
pep-0455.txt Normal file
View File

@ -0,0 +1,195 @@
PEP: 455
Title: Adding a key-transforming dictionary to collections
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Sep-2013
Python-Version: 3.4
Post-History:
Abstract
========
This PEP proposes a new data structure for the ``collections`` module,
called "TransformDict" in this PEP. This structure is a mutable mapping
which transforms the key using a given function when doing a lookup, but
retains the original key when reading.
Rationale
=========
Numerous specialized versions of this pattern exist. The most common
is a case-insensitive case-preserving dict, i.e. a dict-like container
which matches keys in a case-insensitive fashion but retains the original
casing. It is a very common need in network programming, as many
protocols feature some arrays of "key / value" properties in their
messages, where the keys are textual strings whose casing isn't relevant.
Another common request is an identity dict, where keys are matched
according to their respective id()s instead of normal matching.
Both are instances of a more general pattern, where a given transformation
function is applied to keys when looking them up: that function being
``str.lower`` in the former example and the built-in ``id`` function in
the latter.
(it can be said that the pattern *projects* keys from the user-visible
set onto the internal lookup set, hence this PEP's title)
Semantics
=========
TransformDict is a ``MutableMapping`` implementation: it faithfully
implements the well-known API of mutable mappings, as ``dict`` itself
and other dict-like classes in the standard library. Therefore, this PEP
won't rehash the semantics of most TransformDict methods.
The transformation function needn't be bijective, it can be strictly
surjective as in the case-insensitive example::
>>> d = TransformDict(str.lower)
>>> d['SomeKey'] = 5
>>> d['somekey']
5
>>> d['SOMEKEY']
5
TransformDict retains the first key used when creating an entry::
>>> d = TransformDict(str.lower)
>>> d['SomeKey'] = 1
>>> d['somekey'] = 2
>>> list(d.items())
[('SomeKey', 2)]
The original keys needn't be hashable, as long as the transformation
function returns a hashable one::
>>> d = TransformDict(id)
>>> l = [None]
>>> d[l] = 5
>>> l in d
True
Constructor
-----------
As shown in the example aboves, creating a TransformDict requires passing
the key transformation function as the first argument (much like creating
a ``defaultdict`` requires passing the factory function as first argument).
The constructor also takes other optional arguments which can be used
to initialize the TransformDict with certain key-value pairs. Those
optional arguments are the same as in the ``dict`` and ``defaultdict``
constructors::
>>> d = TransformDict(str.lower, [('Foo': 1)], Bar=2)
>>> sorted(d.items())
[('Bar', 2), ('Foo', 1)]
Alternative proposals and questions
===================================
Retaining the last original key
-------------------------------
Most python-dev respondents found retaining the first user-supplied key
more intuitive than retaining the last. Also, it matches the dict
object's own behaviour when using different but equal keys::
>>> d = {}
>>> d[1] = 'hello'
>>> d[1.0] = 'world'
>>> d
{1: 'world'}
Furthermore, explicitly retaining the last key in a first-key-retaining
scheme is still possible using the following approach::
d.pop(key, None)
d[key] = value
while the converse (retaining the first key in a last-key-retaining
scheme) doesn't look possible without rewriting part of the container's
code.
Using an encoder / decoder pair
-------------------------------
Using a function pair isn't necessary, since the original key is retained
by the container. Moreover, an encoder / decoder pair would require the
transformation to be bijective, which prevents important use cases
like case-insensitive matching.
Providing a transformation function for values
----------------------------------------------
Dictionary values are not used for lookup, their semantics are totally
irrelevant to the container's operation. Therefore, there is no point in
having both an "original" and a "transformed" value: the transformed
value wouldn't be used for anything.
Providing a specialized container, not generic
----------------------------------------------
It was asked why we would provide the generic TransformDict construct
rather than a specialized case-insensitive dict variant. The answer
is that it's nearly as cheap (code-wise and performance-wise) to provide
the generic construct, and it can fill more use cases.
Implementation
==============
A patch for the collections module is tracked on the bug tracker at
http://bugs.python.org/issue18986.
Existing work
=============
Case-insensitive dicts are a popular request:
* http://twistedmatrix.com/documents/current/api/twisted.python.util.InsensitiveDict.html
* https://mail.python.org/pipermail/python-list/2013-May/647243.html
* https://mail.python.org/pipermail/python-list/2005-April/296208.html
* https://mail.python.org/pipermail/python-list/2004-June/241748.html
* http://bugs.python.org/msg197376
* http://stackoverflow.com/a/2082169
* http://stackoverflow.com/a/3296782
* http://code.activestate.com/recipes/66315-case-insensitive-dictionary/
* https://gist.github.com/babakness/3901174
* http://www.wikier.org/blog/key-insensitive-dictionary-in-python
* http://en.sharejs.com/python/14534
* http://www.voidspace.org.uk/python/archive.shtml#caseless
Identity dicts have been requested too:
* https://mail.python.org/pipermail/python-ideas/2010-May/007235.html
* http://www.gossamer-threads.com/lists/python/python/209527
Python's own pickle module uses identity lookups for object
memoization: http://hg.python.org/cpython/file/0e70bf1f32a3/Lib/pickle.py#l234
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: