163 lines
5.8 KiB
Plaintext
163 lines
5.8 KiB
Plaintext
|
PEP: 412
|
||
|
Title: Key-Sharing Dictionary
|
||
|
Version: $Revision$
|
||
|
Last-Modified: $Date$
|
||
|
Author: Mark Shannon <mark@hotpy.org>
|
||
|
Status: Draft
|
||
|
Type: Standards Track
|
||
|
Content-Type: text/x-rst
|
||
|
Created: 08-Feb-2012
|
||
|
Python-Version: 3.3 or 3.4
|
||
|
Post-History: 08-Feb-2012
|
||
|
|
||
|
|
||
|
Abstract
|
||
|
========
|
||
|
|
||
|
This PEP proposes a change in the implementation of the builtin dictionary
|
||
|
type ``dict``. The new implementation allows dictionaries which are used as
|
||
|
attribute dictionaries (the ``__dict__`` attribute of an object) to share
|
||
|
keys with other attribute dictionaries of instances of the same class.
|
||
|
|
||
|
Motivation
|
||
|
==========
|
||
|
|
||
|
The current dictionary implementation uses more memory than is necessary
|
||
|
when used as a container for object attributes as the keys are
|
||
|
replicated for each instance rather than being shared across many instances
|
||
|
of the same class.
|
||
|
Despite this, the current dictionary implementation is finely tuned and
|
||
|
performs very well as a general-purpose mapping object.
|
||
|
|
||
|
By separating the keys (and hashes) from the values it is possible to share
|
||
|
the keys between multiple dictionaries and improve memory use.
|
||
|
By ensuring that keys are separated from the values only when beneficial,
|
||
|
it is possible to retain the high-performance of the current dictionary
|
||
|
implementation when used as a general-purpose mapping object.
|
||
|
|
||
|
Behaviour
|
||
|
=========
|
||
|
|
||
|
The new dictionary behaves in the same way as the old implementation.
|
||
|
It fully conforms to the Python API, the C API and the ABI.
|
||
|
|
||
|
Performance
|
||
|
===========
|
||
|
|
||
|
Memory Usage
|
||
|
------------
|
||
|
|
||
|
Reduction in memory use is directly related to the number of dictionaries
|
||
|
with shared keys in existence at any time. These dictionaries are typically
|
||
|
half the size of the current dictionary implementation.
|
||
|
|
||
|
Benchmarking shows that memory use is reduced by 10% to 20% for
|
||
|
object-oriented programs with no significant change in memory use
|
||
|
for other programs.
|
||
|
|
||
|
Speed
|
||
|
-----
|
||
|
|
||
|
The performance of the new implementation is dominated by memory locality
|
||
|
effects. When keys are not shared (for example in module dictionaries
|
||
|
and dictionary explicitly created by dict() or {} ) then performance is
|
||
|
unchanged (within a percent or two) from the current implementation.
|
||
|
|
||
|
For the shared keys case, the new implementation tends to separate keys
|
||
|
from values, but reduces total memory usage. This will improve performance
|
||
|
in many cases as the effects of reduced memory usage outweigh the loss of
|
||
|
locality, but some programs may show a small slow down.
|
||
|
|
||
|
Benchmarking shows no significant change of speed for most benchmarks.
|
||
|
Object-oriented benchmarks show small speed ups when they create large
|
||
|
numbers of objects of the same class (the gcbench benchmark shows a 10%
|
||
|
speed up; this is likely to be an upper limit).
|
||
|
|
||
|
Implementation
|
||
|
==============
|
||
|
|
||
|
Both the old and new dictionaries consist of a fixed-sized dict struct and
|
||
|
a re-sizeable table.
|
||
|
In the new dictionary the table can be further split into a keys table and
|
||
|
values array.
|
||
|
The keys table holds the keys and hashes and (for non-split tables) the
|
||
|
values as well. It differs only from the original implementation in that it
|
||
|
contains a number of fields that were previously in the dict struct.
|
||
|
If a table is split the values in the keys table are ignored, instead the
|
||
|
values are held in a separate array.
|
||
|
|
||
|
Split-Table dictionaries
|
||
|
------------------------
|
||
|
|
||
|
When dictionaries are created to fill the __dict__ slot of an object, they are
|
||
|
created in split form. The keys table is cached in the type, potentially
|
||
|
allowing all attribute dictionaries of instances of one class to share keys.
|
||
|
In the event of the keys of these dictionaries starting to diverge,
|
||
|
individual dictionaries will lazily convert to the combined-table form.
|
||
|
This ensures good memory use in the common case, and correctness in all cases.
|
||
|
|
||
|
When resizing a split dictionary it is converted to a combined table.
|
||
|
If resizing is as a result of storing an instance attribute, and there is
|
||
|
only instance of a class, then the dictionary will be re-split immediately.
|
||
|
Since most OO code will set attributes in the __init__ method, all attributes
|
||
|
will be set before a second instance is created and no more resizing will be
|
||
|
necessary as all further instance dictionaries will have the correct size.
|
||
|
For more complex use patterns, it is impossible to know what is the best
|
||
|
approach, so the implementation allows extra insertions up to the point
|
||
|
of a resize when it reverts to the combined table (non-shared keys).
|
||
|
|
||
|
A deletion from a split dictionary does not change the keys table, it simply
|
||
|
removes the value from the values array.
|
||
|
|
||
|
Combined-Table dictionaries
|
||
|
---------------------------
|
||
|
|
||
|
Explicit dictionaries (dict() or {}), module dictionaries and most other
|
||
|
dictionaries are created as combined-table dictionaries.
|
||
|
A combined-table dictionary never becomes a split-table dictionary.
|
||
|
Combined tables are laid out in much the same way as the tables in the old
|
||
|
dictionary, resulting in very similar performance.
|
||
|
|
||
|
Implementation
|
||
|
==============
|
||
|
|
||
|
The new dictionary implementation is available at [1]_.
|
||
|
|
||
|
Pros and Cons
|
||
|
=============
|
||
|
|
||
|
Pros
|
||
|
----
|
||
|
|
||
|
Significant memory savings for object-oriented applications.
|
||
|
Small improvement to speed for programs which create lots of similar objects.
|
||
|
|
||
|
Cons
|
||
|
----
|
||
|
|
||
|
Change to data structures:
|
||
|
Third party modules which meddle with the internals of the dictionary
|
||
|
implementation will break.
|
||
|
Changes to repr() output and iteration order:
|
||
|
For most cases, this will be unchanged.
|
||
|
However for some split-table dictionaries the iteration order will
|
||
|
change.
|
||
|
|
||
|
Neither of these cons should be a problem.
|
||
|
Modules which meddle with the internals of the dictionary
|
||
|
implementation are already broken and should be fixed to use the API.
|
||
|
The iteration order of dictionaries was never defined and has always been
|
||
|
arbitrary; it is different for Jython and PyPy.
|
||
|
|
||
|
References
|
||
|
==========
|
||
|
|
||
|
.. [1] Reference Implementation:
|
||
|
https://bitbucket.org/markshannon/cpython_new_dict
|
||
|
|
||
|
Copyright
|
||
|
=========
|
||
|
|
||
|
This document has been placed in the public domain.
|
||
|
|