274 lines
8.4 KiB
Plaintext
274 lines
8.4 KiB
Plaintext
PEP: 455
|
|
Title: Adding a key-transforming dictionary to collections
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Antoine Pitrou <solipsis@pitrou.net>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 13-Sep-2013
|
|
Python-Version: 3.4
|
|
Post-History:
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes a new data structure for the ``collections`` module,
|
|
called "TransformDict" in this PEP. This structure is a mutable mapping
|
|
which transforms the key using a given function when doing a lookup, but
|
|
retains the original key when reading.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Numerous specialized versions of this pattern exist. The most common
|
|
is a case-insensitive case-preserving dict, i.e. a dict-like container
|
|
which matches keys in a case-insensitive fashion but retains the original
|
|
casing. It is a very common need in network programming, as many
|
|
protocols feature some arrays of "key / value" properties in their
|
|
messages, where the keys are textual strings whose case is specified to
|
|
be ignored on receipt but by either specification or custom is to be
|
|
preserved or non-trivially canonicalized when retransmitted.
|
|
|
|
Another common request is an identity dict, where keys are matched
|
|
according to their respective id()s instead of normal matching.
|
|
|
|
Both are instances of a more general pattern, where a given transformation
|
|
function is applied to keys when looking them up: that function being
|
|
``str.lower`` or ``str.casefold`` in the former example and the built-in
|
|
``id`` function in the latter.
|
|
|
|
(it could be said that the pattern *projects* keys from the user-visible
|
|
set onto the internal lookup set)
|
|
|
|
|
|
Semantics
|
|
=========
|
|
|
|
TransformDict is a ``MutableMapping`` implementation: it faithfully
|
|
implements the well-known API of mutable mappings, as ``dict`` itself
|
|
and other dict-like classes in the standard library. Therefore, this PEP
|
|
won't rehash the semantics of most TransformDict methods.
|
|
|
|
The transformation function needn't be bijective, it can be strictly
|
|
surjective as in the case-insensitive example (in other words, different
|
|
keys can lookup the same value)::
|
|
|
|
>>> d = TransformDict(str.casefold)
|
|
>>> d['SomeKey'] = 5
|
|
>>> d['somekey']
|
|
5
|
|
>>> d['SOMEKEY']
|
|
5
|
|
|
|
TransformDict retains the first key used when creating an entry::
|
|
|
|
>>> d = TransformDict(str.casefold)
|
|
>>> d['SomeKey'] = 1
|
|
>>> d['somekey'] = 2
|
|
>>> list(d.items())
|
|
[('SomeKey', 2)]
|
|
|
|
The original keys needn't be hashable, as long as the transformation
|
|
function returns a hashable one::
|
|
|
|
>>> d = TransformDict(id)
|
|
>>> l = [None]
|
|
>>> d[l] = 5
|
|
>>> l in d
|
|
True
|
|
|
|
Constructor
|
|
-----------
|
|
|
|
As shown in the example aboves, creating a TransformDict requires passing
|
|
the key transformation function as the first argument (much like creating
|
|
a ``defaultdict`` requires passing the factory function as first argument).
|
|
|
|
The constructor also takes other optional arguments which can be used
|
|
to initialize the TransformDict with certain key-value pairs. Those
|
|
optional arguments are the same as in the ``dict`` and ``defaultdict``
|
|
constructors::
|
|
|
|
>>> d = TransformDict(str.casefold, [('Foo', 1)], Bar=2)
|
|
>>> sorted(d.items())
|
|
[('Bar', 2), ('Foo', 1)]
|
|
|
|
Getting the original key
|
|
------------------------
|
|
|
|
TransformDict also features a lookup method returning the stored key
|
|
together with the corresponding value::
|
|
|
|
>>> d = TransformDict(str.casefold, {'Foo': 1})
|
|
>>> d.getitem('FOO')
|
|
('Foo', 1)
|
|
>>> d.getitem('bar')
|
|
Traceback (most recent call last):
|
|
File "<stdin>", line 1, in <module>
|
|
KeyError: 'bar'
|
|
|
|
The method name ``getitem()`` mirrors the standard ``popitem()`` method
|
|
on mutable mappings.
|
|
|
|
Getting the transformation function
|
|
-----------------------------------
|
|
|
|
TransformDict has a simple read-only property ``transform_func`` which
|
|
gives back the transformation function.
|
|
|
|
|
|
Alternative proposals and questions
|
|
===================================
|
|
|
|
Retaining the last original key
|
|
-------------------------------
|
|
|
|
Most python-dev respondents found retaining the first user-supplied key
|
|
more intuitive than retaining the last. Also, it matches the dict
|
|
object's own behaviour when using different but equal keys::
|
|
|
|
>>> d = {}
|
|
>>> d[1] = 'hello'
|
|
>>> d[1.0] = 'world'
|
|
>>> d
|
|
{1: 'world'}
|
|
|
|
Furthermore, explicitly retaining the last key in a first-key-retaining
|
|
scheme is still possible using the following approach::
|
|
|
|
d.pop(key, None)
|
|
d[key] = value
|
|
|
|
while the converse (retaining the first key in a last-key-retaining
|
|
scheme) doesn't look possible without rewriting part of the container's
|
|
code.
|
|
|
|
Using an encoder / decoder pair
|
|
-------------------------------
|
|
|
|
Using a function pair isn't necessary, since the original key is retained
|
|
by the container. Moreover, an encoder / decoder pair would require the
|
|
transformation to be bijective, which prevents important use cases
|
|
like case-insensitive matching.
|
|
|
|
Providing a transformation function for values
|
|
----------------------------------------------
|
|
|
|
Dictionary values are not used for lookup, their semantics are totally
|
|
irrelevant to the container's operation. Therefore, there is no point in
|
|
having both an "original" and a "transformed" value: the transformed
|
|
value wouldn't be used for anything.
|
|
|
|
Providing a specialized container, not generic
|
|
----------------------------------------------
|
|
|
|
It was asked why we would provide the generic TransformDict construct
|
|
rather than a specialized case-insensitive dict variant. The answer
|
|
is that it's nearly as cheap (code-wise and performance-wise) to provide
|
|
the generic construct, and it can fill more use cases.
|
|
|
|
Even case-insensitive dicts can actually elicit different transformation
|
|
functions: ``str.lower``, ``str.casefold`` or in some cases ``bytes.lower``
|
|
when working with text encoded in a ASCII-compatible encoding.
|
|
|
|
Other constructor patterns
|
|
--------------------------
|
|
|
|
Two other constructor patterns were proposed by Serhiy Storchaka:
|
|
|
|
* A type factory scheme::
|
|
|
|
d = TransformDict(str.casefold)(Foo=1)
|
|
|
|
* A subclassing scheme::
|
|
|
|
class CaseInsensitiveDict(TransformDict):
|
|
__transform__ = str.casefold
|
|
|
|
d = CaseInsensitiveDict(Foo=1)
|
|
|
|
While both approaches can be defended, they don't follow established
|
|
practices in the standard library, and therefore were rejected.
|
|
|
|
|
|
Implementation
|
|
==============
|
|
|
|
A patch for the collections module is tracked on the bug tracker at
|
|
http://bugs.python.org/issue18986.
|
|
|
|
|
|
Existing work
|
|
=============
|
|
|
|
Case-insensitive dicts are a popular request:
|
|
|
|
* http://twistedmatrix.com/documents/current/api/twisted.python.util.InsensitiveDict.html
|
|
* https://mail.python.org/pipermail/python-list/2013-May/647243.html
|
|
* https://mail.python.org/pipermail/python-list/2005-April/296208.html
|
|
* https://mail.python.org/pipermail/python-list/2004-June/241748.html
|
|
* http://bugs.python.org/msg197376
|
|
* http://stackoverflow.com/a/2082169
|
|
* http://stackoverflow.com/a/3296782
|
|
* http://code.activestate.com/recipes/66315-case-insensitive-dictionary/
|
|
* https://gist.github.com/babakness/3901174
|
|
* http://www.wikier.org/blog/key-insensitive-dictionary-in-python
|
|
* http://en.sharejs.com/python/14534
|
|
* http://www.voidspace.org.uk/python/archive.shtml#caseless
|
|
|
|
Identity dicts have been requested too:
|
|
|
|
* https://mail.python.org/pipermail/python-ideas/2010-May/007235.html
|
|
* http://www.gossamer-threads.com/lists/python/python/209527
|
|
|
|
Several modules in the standard library use identity lookups for object
|
|
memoization, for example ``pickle``, ``json``, ``copy``, ``cProfile``,
|
|
``doctest`` and ``_threading_local``.
|
|
|
|
Other languages
|
|
---------------
|
|
|
|
C# / .Net
|
|
^^^^^^^^^
|
|
|
|
.Net has a generic ``Dictionary`` class where you can specify a custom
|
|
``IEqualityComparer``: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
|
|
|
|
Using it is the recommended way to write case-insensitive dictionaries:
|
|
http://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary
|
|
|
|
Java
|
|
^^^^
|
|
|
|
Java has a specialized ``CaseInsensitiveMap``:
|
|
http://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/CaseInsensitiveMap.html
|
|
|
|
It also has a separate ``IdentityHashMap``:
|
|
http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html
|
|
|
|
C++
|
|
^^^
|
|
|
|
The C++ Standard Template Library features an ``unordered_map``
|
|
with customizable hash and equality functions:
|
|
http://www.cplusplus.com/reference/unordered_map/unordered_map/
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|