199 lines
6.9 KiB
Plaintext
199 lines
6.9 KiB
Plaintext
PEP: 412
|
||
Title: Key-Sharing Dictionary
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Mark Shannon <mark@hotpy.org>
|
||
Status: Final
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 08-Feb-2012
|
||
Python-Version: 3.3
|
||
Post-History: 08-Feb-2012
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes a change in the implementation of the builtin
|
||
dictionary type ``dict``. The new implementation allows dictionaries
|
||
which are used as attribute dictionaries (the ``__dict__`` attribute
|
||
of an object) to share keys with other attribute dictionaries of
|
||
instances of the same class.
|
||
|
||
Motivation
|
||
==========
|
||
|
||
The current dictionary implementation uses more memory than is
|
||
necessary when used as a container for object attributes as the keys
|
||
are replicated for each instance rather than being shared across many
|
||
instances of the same class. Despite this, the current dictionary
|
||
implementation is finely tuned and performs very well as a
|
||
general-purpose mapping object.
|
||
|
||
By separating the keys (and hashes) from the values it is possible to
|
||
share the keys between multiple dictionaries and improve memory use.
|
||
By ensuring that keys are separated from the values only when
|
||
beneficial, it is possible to retain the high-performance of the
|
||
current dictionary implementation when used as a general-purpose
|
||
mapping object.
|
||
|
||
Behaviour
|
||
=========
|
||
|
||
The new dictionary behaves in the same way as the old implementation.
|
||
It fully conforms to the Python API, the C API and the ABI.
|
||
|
||
Performance
|
||
===========
|
||
|
||
Memory Usage
|
||
------------
|
||
|
||
Reduction in memory use is directly related to the number of
|
||
dictionaries with shared keys in existence at any time. These
|
||
dictionaries are typically half the size of the current dictionary
|
||
implementation.
|
||
|
||
Benchmarking shows that memory use is reduced by 10% to 20% for
|
||
object-oriented programs with no significant change in memory use for
|
||
other programs.
|
||
|
||
Speed
|
||
-----
|
||
|
||
The performance of the new implementation is dominated by memory
|
||
locality effects. When keys are not shared (for example in module
|
||
dictionaries and dictionary explicitly created by ``dict()`` or
|
||
``{}``) then performance is unchanged (within a percent or two) from
|
||
the current implementation.
|
||
|
||
For the shared keys case, the new implementation tends to separate
|
||
keys from values, but reduces total memory usage. This will improve
|
||
performance in many cases as the effects of reduced memory usage
|
||
outweigh the loss of locality, but some programs may show a small slow
|
||
down.
|
||
|
||
Benchmarking shows no significant change of speed for most benchmarks.
|
||
Object-oriented benchmarks show small speed ups when they create large
|
||
numbers of objects of the same class (the gcbench benchmark shows a
|
||
10% speed up; this is likely to be an upper limit).
|
||
|
||
Implementation
|
||
==============
|
||
|
||
Both the old and new dictionaries consist of a fixed-sized dict struct
|
||
and a re-sizeable table. In the new dictionary the table can be
|
||
further split into a keys table and values array. The keys table
|
||
holds the keys and hashes and (for non-split tables) the values as
|
||
well. It differs only from the original implementation in that it
|
||
contains a number of fields that were previously in the dict struct.
|
||
If a table is split the values in the keys table are ignored, instead
|
||
the values are held in a separate array.
|
||
|
||
Split-Table dictionaries
|
||
------------------------
|
||
|
||
When dictionaries are created to fill the __dict__ slot of an object,
|
||
they are created in split form. The keys table is cached in the type,
|
||
potentially allowing all attribute dictionaries of instances of one
|
||
class to share keys. In the event of the keys of these dictionaries
|
||
starting to diverge, individual dictionaries will lazily convert to
|
||
the combined-table form. This ensures good memory use in the common
|
||
case, and correctness in all cases.
|
||
|
||
When resizing a split dictionary it is converted to a combined table.
|
||
If resizing is as a result of storing an instance attribute, and there
|
||
is only instance of a class, then the dictionary will be re-split
|
||
immediately. Since most OO code will set attributes in the __init__
|
||
method, all attributes will be set before a second instance is created
|
||
and no more resizing will be necessary as all further instance
|
||
dictionaries will have the correct size. For more complex use
|
||
patterns, it is impossible to know what is the best approach, so the
|
||
implementation allows extra insertions up to the point of a resize
|
||
when it reverts to the combined table (non-shared keys).
|
||
|
||
A deletion from a split dictionary does not change the keys table, it
|
||
simply removes the value from the values array.
|
||
|
||
Combined-Table dictionaries
|
||
---------------------------
|
||
|
||
Explicit dictionaries (``dict()`` or ``{}``), module dictionaries and
|
||
most other dictionaries are created as combined-table dictionaries. A
|
||
combined-table dictionary never becomes a split-table dictionary.
|
||
Combined tables are laid out in much the same way as the tables in the
|
||
old dictionary, resulting in very similar performance.
|
||
|
||
Implementation
|
||
==============
|
||
|
||
The new dictionary implementation is available at [1]_.
|
||
|
||
Pros and Cons
|
||
=============
|
||
|
||
Pros
|
||
----
|
||
|
||
Significant memory savings for object-oriented applications. Small
|
||
improvement to speed for programs which create lots of similar
|
||
objects.
|
||
|
||
Cons
|
||
----
|
||
|
||
Change to data structures: Third party modules which meddle with the
|
||
internals of the dictionary implementation will break.
|
||
|
||
Changes to repr() output and iteration order: For most cases, this
|
||
will be unchanged. However, for some split-table dictionaries the
|
||
iteration order will change.
|
||
|
||
Neither of these cons should be a problem. Modules which meddle with
|
||
the internals of the dictionary implementation are already broken and
|
||
should be fixed to use the API. The iteration order of dictionaries
|
||
was never defined and has always been arbitrary; it is different for
|
||
Jython and PyPy.
|
||
|
||
Alternative Implementation
|
||
--------------------------
|
||
|
||
An alternative implementation for split tables, which could save even
|
||
more memory, is to store an index in the value field of the keys table
|
||
(instead of ignoring the value field). This index would explicitly
|
||
state where in the value array to look. The value array would then
|
||
only require 1 field for each usable slot in the key table, rather
|
||
than each slot in the key table.
|
||
|
||
This "indexed" version would reduce the size of value array by about
|
||
one third. The keys table would need an extra "values_size" field,
|
||
increasing the size of combined dicts by one word. The extra
|
||
indirection adds more complexity to the code, potentially reducing
|
||
performance a little.
|
||
|
||
The "indexed" version will not be included in this implementation, but
|
||
should be considered deferred rather than rejected, pending further
|
||
experimentation.
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] Reference Implementation:
|
||
https://bitbucket.org/markshannon/cpython_new_dict
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|