283 lines
8.7 KiB
ReStructuredText
283 lines
8.7 KiB
ReStructuredText
PEP: 455
|
||
Title: Adding a key-transforming dictionary to collections
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Antoine Pitrou <solipsis@pitrou.net>
|
||
BDFL-Delegate: Raymond Hettinger
|
||
Status: Rejected
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 13-Sep-2013
|
||
Python-Version: 3.5
|
||
Post-History:
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes a new data structure for the ``collections`` module,
|
||
called "TransformDict" in this PEP. This structure is a mutable mapping
|
||
which transforms the key using a given function when doing a lookup, but
|
||
retains the original key when reading.
|
||
|
||
Rejection
|
||
---------
|
||
|
||
See the rationale at
|
||
https://mail.python.org/pipermail/python-dev/2015-May/140003.html
|
||
and for an earlier partial review, see
|
||
https://mail.python.org/pipermail/python-dev/2013-October/129937.html .
|
||
|
||
Rationale
|
||
=========
|
||
|
||
Numerous specialized versions of this pattern exist. The most common
|
||
is a case-insensitive case-preserving dict, i.e. a dict-like container
|
||
which matches keys in a case-insensitive fashion but retains the original
|
||
casing. It is a very common need in network programming, as many
|
||
protocols feature some arrays of "key / value" properties in their
|
||
messages, where the keys are textual strings whose case is specified to
|
||
be ignored on receipt but by either specification or custom is to be
|
||
preserved or non-trivially canonicalized when retransmitted.
|
||
|
||
Another common request is an identity dict, where keys are matched
|
||
according to their respective id()s instead of normal matching.
|
||
|
||
Both are instances of a more general pattern, where a given transformation
|
||
function is applied to keys when looking them up: that function being
|
||
``str.lower`` or ``str.casefold`` in the former example and the built-in
|
||
``id`` function in the latter.
|
||
|
||
(It could be said that the pattern *projects* keys from the user-visible
|
||
set onto the internal lookup set.)
|
||
|
||
|
||
Semantics
|
||
=========
|
||
|
||
TransformDict is a ``MutableMapping`` implementation: it faithfully
|
||
implements the well-known API of mutable mappings, like ``dict`` itself
|
||
and other dict-like classes in the standard library. Therefore, this PEP
|
||
won't rehash the semantics of most TransformDict methods.
|
||
|
||
The transformation function needn't be bijective, it can be strictly
|
||
surjective as in the case-insensitive example (in other words, different
|
||
keys can lookup the same value)::
|
||
|
||
>>> d = TransformDict(str.casefold)
|
||
>>> d['SomeKey'] = 5
|
||
>>> d['somekey']
|
||
5
|
||
>>> d['SOMEKEY']
|
||
5
|
||
|
||
TransformDict retains the first key used when creating an entry::
|
||
|
||
>>> d = TransformDict(str.casefold)
|
||
>>> d['SomeKey'] = 1
|
||
>>> d['somekey'] = 2
|
||
>>> list(d.items())
|
||
[('SomeKey', 2)]
|
||
|
||
The original keys needn't be hashable, as long as the transformation
|
||
function returns a hashable one::
|
||
|
||
>>> d = TransformDict(id)
|
||
>>> l = [None]
|
||
>>> d[l] = 5
|
||
>>> l in d
|
||
True
|
||
|
||
Constructor
|
||
-----------
|
||
|
||
As shown in the examples above, creating a TransformDict requires passing
|
||
the key transformation function as the first argument (much like creating
|
||
a ``defaultdict`` requires passing the factory function as first argument).
|
||
|
||
The constructor also takes other optional arguments which can be used
|
||
to initialize the TransformDict with certain key-value pairs. Those
|
||
optional arguments are the same as in the ``dict`` and ``defaultdict``
|
||
constructors::
|
||
|
||
>>> d = TransformDict(str.casefold, [('Foo', 1)], Bar=2)
|
||
>>> sorted(d.items())
|
||
[('Bar', 2), ('Foo', 1)]
|
||
|
||
Getting the original key
|
||
------------------------
|
||
|
||
TransformDict also features a lookup method returning the stored key
|
||
together with the corresponding value::
|
||
|
||
>>> d = TransformDict(str.casefold, {'Foo': 1})
|
||
>>> d.getitem('FOO')
|
||
('Foo', 1)
|
||
>>> d.getitem('bar')
|
||
Traceback (most recent call last):
|
||
File "<stdin>", line 1, in <module>
|
||
KeyError: 'bar'
|
||
|
||
The method name ``getitem()`` follows the standard ``popitem()`` method
|
||
on mutable mappings.
|
||
|
||
Getting the transformation function
|
||
-----------------------------------
|
||
|
||
TransformDict has a simple read-only property ``transform_func`` which
|
||
gives back the transformation function.
|
||
|
||
|
||
Alternative proposals and questions
|
||
===================================
|
||
|
||
Retaining the last original key
|
||
-------------------------------
|
||
|
||
Most python-dev respondents found retaining the first user-supplied key
|
||
more intuitive than retaining the last. Also, it matches the dict
|
||
object's own behaviour when using different but equal keys::
|
||
|
||
>>> d = {}
|
||
>>> d[1] = 'hello'
|
||
>>> d[1.0] = 'world'
|
||
>>> d
|
||
{1: 'world'}
|
||
|
||
Furthermore, explicitly retaining the last key in a first-key-retaining
|
||
scheme is still possible using the following approach::
|
||
|
||
d.pop(key, None)
|
||
d[key] = value
|
||
|
||
while the converse (retaining the first key in a last-key-retaining
|
||
scheme) doesn't look possible without rewriting part of the container's
|
||
code.
|
||
|
||
Using an encoder / decoder pair
|
||
-------------------------------
|
||
|
||
Using a function pair isn't necessary, since the original key is retained
|
||
by the container. Moreover, an encoder / decoder pair would require the
|
||
transformation to be bijective, which prevents important use cases
|
||
like case-insensitive matching.
|
||
|
||
Providing a transformation function for values
|
||
----------------------------------------------
|
||
|
||
Dictionary values are not used for lookup, their semantics are totally
|
||
irrelevant to the container's operation. Therefore, there is no point in
|
||
having both an "original" and a "transformed" value: the transformed
|
||
value wouldn't be used for anything.
|
||
|
||
Providing a specialized container, not generic
|
||
----------------------------------------------
|
||
|
||
It was asked why we would provide the generic TransformDict construct
|
||
rather than a specialized case-insensitive dict variant. The answer
|
||
is that it's nearly as cheap (code-wise and performance-wise) to provide
|
||
the generic construct, and it can fill more use cases.
|
||
|
||
Even case-insensitive dicts can actually elicit different transformation
|
||
functions: ``str.lower``, ``str.casefold`` or in some cases ``bytes.lower``
|
||
when working with text encoded in an ASCII-compatible encoding.
|
||
|
||
Other constructor patterns
|
||
--------------------------
|
||
|
||
Two other constructor patterns were proposed by Serhiy Storchaka:
|
||
|
||
* A type factory scheme::
|
||
|
||
d = TransformDict(str.casefold)(Foo=1)
|
||
|
||
* A subclassing scheme::
|
||
|
||
class CaseInsensitiveDict(TransformDict):
|
||
__transform__ = str.casefold
|
||
|
||
d = CaseInsensitiveDict(Foo=1)
|
||
|
||
While both approaches can be defended, they don't follow established
|
||
practices in the standard library, and therefore were rejected.
|
||
|
||
|
||
Implementation
|
||
==============
|
||
|
||
A patch for the collections module is tracked on the bug tracker at
|
||
http://bugs.python.org/issue18986.
|
||
|
||
|
||
Existing work
|
||
=============
|
||
|
||
Case-insensitive dicts are a popular request:
|
||
|
||
* http://twistedmatrix.com/documents/current/api/twisted.python.util.InsensitiveDict.html
|
||
* https://mail.python.org/pipermail/python-list/2013-May/647243.html
|
||
* https://mail.python.org/pipermail/python-list/2005-April/296208.html
|
||
* https://mail.python.org/pipermail/python-list/2004-June/241748.html
|
||
* http://bugs.python.org/msg197376
|
||
* http://stackoverflow.com/a/2082169
|
||
* http://stackoverflow.com/a/3296782
|
||
* http://code.activestate.com/recipes/66315-case-insensitive-dictionary/
|
||
* https://gist.github.com/babakness/3901174
|
||
* http://www.wikier.org/blog/key-insensitive-dictionary-in-python
|
||
* http://en.sharejs.com/python/14534
|
||
* http://www.voidspace.org.uk/python/archive.shtml#caseless
|
||
|
||
Identity dicts have been requested too:
|
||
|
||
* https://mail.python.org/pipermail/python-ideas/2010-May/007235.html
|
||
* http://www.gossamer-threads.com/lists/python/python/209527
|
||
|
||
Several modules in the standard library use identity lookups for object
|
||
memoization, for example ``pickle``, ``json``, ``copy``, ``cProfile``,
|
||
``doctest`` and ``_threading_local``.
|
||
|
||
Other languages
|
||
---------------
|
||
|
||
C# / .Net
|
||
^^^^^^^^^
|
||
|
||
.Net has a generic ``Dictionary`` class where you can specify a custom
|
||
``IEqualityComparer``: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
|
||
|
||
Using it is the recommended way to write case-insensitive dictionaries:
|
||
http://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary
|
||
|
||
Java
|
||
^^^^
|
||
|
||
Java has a specialized ``CaseInsensitiveMap``:
|
||
http://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/CaseInsensitiveMap.html
|
||
|
||
It also has a separate ``IdentityHashMap``:
|
||
http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html
|
||
|
||
C++
|
||
^^^
|
||
|
||
The C++ Standard Template Library features an ``unordered_map``
|
||
with customizable hash and equality functions:
|
||
http://www.cplusplus.com/reference/unordered_map/unordered_map/
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|