Adding a key-transforming dictionary to collections
This commit is contained in:
parent
0be7861dee
commit
6bd3a9dd6a
|
@ -0,0 +1,195 @@
|
|||
PEP: 455
|
||||
Title: Adding a key-transforming dictionary to collections
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Antoine Pitrou <solipsis@pitrou.net>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 13-Sep-2013
|
||||
Python-Version: 3.4
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes a new data structure for the ``collections`` module,
|
||||
called "TransformDict" in this PEP. This structure is a mutable mapping
|
||||
which transforms the key using a given function when doing a lookup, but
|
||||
retains the original key when reading.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Numerous specialized versions of this pattern exist. The most common
|
||||
is a case-insensitive case-preserving dict, i.e. a dict-like container
|
||||
which matches keys in a case-insensitive fashion but retains the original
|
||||
casing. It is a very common need in network programming, as many
|
||||
protocols feature some arrays of "key / value" properties in their
|
||||
messages, where the keys are textual strings whose casing isn't relevant.
|
||||
|
||||
Another common request is an identity dict, where keys are matched
|
||||
according to their respective id()s instead of normal matching.
|
||||
|
||||
Both are instances of a more general pattern, where a given transformation
|
||||
function is applied to keys when looking them up: that function being
|
||||
``str.lower`` in the former example and the built-in ``id`` function in
|
||||
the latter.
|
||||
|
||||
(it can be said that the pattern *projects* keys from the user-visible
|
||||
set onto the internal lookup set, hence this PEP's title)
|
||||
|
||||
|
||||
Semantics
|
||||
=========
|
||||
|
||||
TransformDict is a ``MutableMapping`` implementation: it faithfully
|
||||
implements the well-known API of mutable mappings, as ``dict`` itself
|
||||
and other dict-like classes in the standard library. Therefore, this PEP
|
||||
won't rehash the semantics of most TransformDict methods.
|
||||
|
||||
The transformation function needn't be bijective, it can be strictly
|
||||
surjective as in the case-insensitive example::
|
||||
|
||||
>>> d = TransformDict(str.lower)
|
||||
>>> d['SomeKey'] = 5
|
||||
>>> d['somekey']
|
||||
5
|
||||
>>> d['SOMEKEY']
|
||||
5
|
||||
|
||||
TransformDict retains the first key used when creating an entry::
|
||||
|
||||
>>> d = TransformDict(str.lower)
|
||||
>>> d['SomeKey'] = 1
|
||||
>>> d['somekey'] = 2
|
||||
>>> list(d.items())
|
||||
[('SomeKey', 2)]
|
||||
|
||||
The original keys needn't be hashable, as long as the transformation
|
||||
function returns a hashable one::
|
||||
|
||||
>>> d = TransformDict(id)
|
||||
>>> l = [None]
|
||||
>>> d[l] = 5
|
||||
>>> l in d
|
||||
True
|
||||
|
||||
Constructor
|
||||
-----------
|
||||
|
||||
As shown in the example aboves, creating a TransformDict requires passing
|
||||
the key transformation function as the first argument (much like creating
|
||||
a ``defaultdict`` requires passing the factory function as first argument).
|
||||
|
||||
The constructor also takes other optional arguments which can be used
|
||||
to initialize the TransformDict with certain key-value pairs. Those
|
||||
optional arguments are the same as in the ``dict`` and ``defaultdict``
|
||||
constructors::
|
||||
|
||||
>>> d = TransformDict(str.lower, [('Foo': 1)], Bar=2)
|
||||
>>> sorted(d.items())
|
||||
[('Bar', 2), ('Foo', 1)]
|
||||
|
||||
|
||||
Alternative proposals and questions
|
||||
===================================
|
||||
|
||||
Retaining the last original key
|
||||
-------------------------------
|
||||
|
||||
Most python-dev respondents found retaining the first user-supplied key
|
||||
more intuitive than retaining the last. Also, it matches the dict
|
||||
object's own behaviour when using different but equal keys::
|
||||
|
||||
>>> d = {}
|
||||
>>> d[1] = 'hello'
|
||||
>>> d[1.0] = 'world'
|
||||
>>> d
|
||||
{1: 'world'}
|
||||
|
||||
Furthermore, explicitly retaining the last key in a first-key-retaining
|
||||
scheme is still possible using the following approach::
|
||||
|
||||
d.pop(key, None)
|
||||
d[key] = value
|
||||
|
||||
while the converse (retaining the first key in a last-key-retaining
|
||||
scheme) doesn't look possible without rewriting part of the container's
|
||||
code.
|
||||
|
||||
Using an encoder / decoder pair
|
||||
-------------------------------
|
||||
|
||||
Using a function pair isn't necessary, since the original key is retained
|
||||
by the container. Moreover, an encoder / decoder pair would require the
|
||||
transformation to be bijective, which prevents important use cases
|
||||
like case-insensitive matching.
|
||||
|
||||
Providing a transformation function for values
|
||||
----------------------------------------------
|
||||
|
||||
Dictionary values are not used for lookup, their semantics are totally
|
||||
irrelevant to the container's operation. Therefore, there is no point in
|
||||
having both an "original" and a "transformed" value: the transformed
|
||||
value wouldn't be used for anything.
|
||||
|
||||
Providing a specialized container, not generic
|
||||
----------------------------------------------
|
||||
|
||||
It was asked why we would provide the generic TransformDict construct
|
||||
rather than a specialized case-insensitive dict variant. The answer
|
||||
is that it's nearly as cheap (code-wise and performance-wise) to provide
|
||||
the generic construct, and it can fill more use cases.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
A patch for the collections module is tracked on the bug tracker at
|
||||
http://bugs.python.org/issue18986.
|
||||
|
||||
|
||||
Existing work
|
||||
=============
|
||||
|
||||
Case-insensitive dicts are a popular request:
|
||||
|
||||
* http://twistedmatrix.com/documents/current/api/twisted.python.util.InsensitiveDict.html
|
||||
* https://mail.python.org/pipermail/python-list/2013-May/647243.html
|
||||
* https://mail.python.org/pipermail/python-list/2005-April/296208.html
|
||||
* https://mail.python.org/pipermail/python-list/2004-June/241748.html
|
||||
* http://bugs.python.org/msg197376
|
||||
* http://stackoverflow.com/a/2082169
|
||||
* http://stackoverflow.com/a/3296782
|
||||
* http://code.activestate.com/recipes/66315-case-insensitive-dictionary/
|
||||
* https://gist.github.com/babakness/3901174
|
||||
* http://www.wikier.org/blog/key-insensitive-dictionary-in-python
|
||||
* http://en.sharejs.com/python/14534
|
||||
* http://www.voidspace.org.uk/python/archive.shtml#caseless
|
||||
|
||||
Identity dicts have been requested too:
|
||||
|
||||
* https://mail.python.org/pipermail/python-ideas/2010-May/007235.html
|
||||
* http://www.gossamer-threads.com/lists/python/python/209527
|
||||
|
||||
Python's own pickle module uses identity lookups for object
|
||||
memoization: http://hg.python.org/cpython/file/0e70bf1f32a3/Lib/pickle.py#l234
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
Loading…
Reference in New Issue