PEP 412: add missing variables footer and reformat.

This commit is contained in:
Georg Brandl 2012-03-31 21:40:52 +02:00
parent 7638f18993
commit 4408906f8a
1 changed files with 101 additions and 84 deletions

View File

@ -14,26 +14,28 @@ Post-History: 08-Feb-2012
Abstract
========
This PEP proposes a change in the implementation of the builtin dictionary
type ``dict``. The new implementation allows dictionaries which are used as
attribute dictionaries (the ``__dict__`` attribute of an object) to share
keys with other attribute dictionaries of instances of the same class.
This PEP proposes a change in the implementation of the builtin
dictionary type ``dict``. The new implementation allows dictionaries
which are used as attribute dictionaries (the ``__dict__`` attribute
of an object) to share keys with other attribute dictionaries of
instances of the same class.
Motivation
==========
The current dictionary implementation uses more memory than is necessary
when used as a container for object attributes as the keys are
replicated for each instance rather than being shared across many instances
of the same class.
Despite this, the current dictionary implementation is finely tuned and
performs very well as a general-purpose mapping object.
The current dictionary implementation uses more memory than is
necessary when used as a container for object attributes as the keys
are replicated for each instance rather than being shared across many
instances of the same class. Despite this, the current dictionary
implementation is finely tuned and performs very well as a
general-purpose mapping object.
By separating the keys (and hashes) from the values it is possible to share
the keys between multiple dictionaries and improve memory use.
By ensuring that keys are separated from the values only when beneficial,
it is possible to retain the high-performance of the current dictionary
implementation when used as a general-purpose mapping object.
By separating the keys (and hashes) from the values it is possible to
share the keys between multiple dictionaries and improve memory use.
By ensuring that keys are separated from the values only when
beneficial, it is possible to retain the high-performance of the
current dictionary implementation when used as a general-purpose
mapping object.
Behaviour
=========
@ -47,76 +49,80 @@ Performance
Memory Usage
------------
Reduction in memory use is directly related to the number of dictionaries
with shared keys in existence at any time. These dictionaries are typically
half the size of the current dictionary implementation.
Reduction in memory use is directly related to the number of
dictionaries with shared keys in existence at any time. These
dictionaries are typically half the size of the current dictionary
implementation.
Benchmarking shows that memory use is reduced by 10% to 20% for
object-oriented programs with no significant change in memory use
for other programs.
object-oriented programs with no significant change in memory use for
other programs.
Speed
-----
The performance of the new implementation is dominated by memory locality
effects. When keys are not shared (for example in module dictionaries
and dictionary explicitly created by dict() or {} ) then performance is
unchanged (within a percent or two) from the current implementation.
The performance of the new implementation is dominated by memory
locality effects. When keys are not shared (for example in module
dictionaries and dictionary explicitly created by ``dict()`` or
``{}``) then performance is unchanged (within a percent or two) from
the current implementation.
For the shared keys case, the new implementation tends to separate keys
from values, but reduces total memory usage. This will improve performance
in many cases as the effects of reduced memory usage outweigh the loss of
locality, but some programs may show a small slow down.
For the shared keys case, the new implementation tends to separate
keys from values, but reduces total memory usage. This will improve
performance in many cases as the effects of reduced memory usage
outweigh the loss of locality, but some programs may show a small slow
down.
Benchmarking shows no significant change of speed for most benchmarks.
Object-oriented benchmarks show small speed ups when they create large
numbers of objects of the same class (the gcbench benchmark shows a 10%
speed up; this is likely to be an upper limit).
numbers of objects of the same class (the gcbench benchmark shows a
10% speed up; this is likely to be an upper limit).
Implementation
==============
Both the old and new dictionaries consist of a fixed-sized dict struct and
a re-sizeable table.
In the new dictionary the table can be further split into a keys table and
values array.
The keys table holds the keys and hashes and (for non-split tables) the
values as well. It differs only from the original implementation in that it
Both the old and new dictionaries consist of a fixed-sized dict struct
and a re-sizeable table. In the new dictionary the table can be
further split into a keys table and values array. The keys table
holds the keys and hashes and (for non-split tables) the values as
well. It differs only from the original implementation in that it
contains a number of fields that were previously in the dict struct.
If a table is split the values in the keys table are ignored, instead the
values are held in a separate array.
If a table is split the values in the keys table are ignored, instead
the values are held in a separate array.
Split-Table dictionaries
------------------------
When dictionaries are created to fill the __dict__ slot of an object, they are
created in split form. The keys table is cached in the type, potentially
allowing all attribute dictionaries of instances of one class to share keys.
In the event of the keys of these dictionaries starting to diverge,
individual dictionaries will lazily convert to the combined-table form.
This ensures good memory use in the common case, and correctness in all cases.
When dictionaries are created to fill the __dict__ slot of an object,
they are created in split form. The keys table is cached in the type,
potentially allowing all attribute dictionaries of instances of one
class to share keys. In the event of the keys of these dictionaries
starting to diverge, individual dictionaries will lazily convert to
the combined-table form. This ensures good memory use in the common
case, and correctness in all cases.
When resizing a split dictionary it is converted to a combined table.
If resizing is as a result of storing an instance attribute, and there is
only instance of a class, then the dictionary will be re-split immediately.
Since most OO code will set attributes in the __init__ method, all attributes
will be set before a second instance is created and no more resizing will be
necessary as all further instance dictionaries will have the correct size.
For more complex use patterns, it is impossible to know what is the best
approach, so the implementation allows extra insertions up to the point
of a resize when it reverts to the combined table (non-shared keys).
If resizing is as a result of storing an instance attribute, and there
is only instance of a class, then the dictionary will be re-split
immediately. Since most OO code will set attributes in the __init__
method, all attributes will be set before a second instance is created
and no more resizing will be necessary as all further instance
dictionaries will have the correct size. For more complex use
patterns, it is impossible to know what is the best approach, so the
implementation allows extra insertions up to the point of a resize
when it reverts to the combined table (non-shared keys).
A deletion from a split dictionary does not change the keys table, it simply
removes the value from the values array.
A deletion from a split dictionary does not change the keys table, it
simply removes the value from the values array.
Combined-Table dictionaries
---------------------------
Explicit dictionaries (dict() or {}), module dictionaries and most other
dictionaries are created as combined-table dictionaries.
A combined-table dictionary never becomes a split-table dictionary.
Combined tables are laid out in much the same way as the tables in the old
dictionary, resulting in very similar performance.
Explicit dictionaries (``dict()`` or ``{}``), module dictionaries and
most other dictionaries are created as combined-table dictionaries. A
combined-table dictionary never becomes a split-table dictionary.
Combined tables are laid out in much the same way as the tables in the
old dictionary, resulting in very similar performance.
Implementation
==============
@ -129,44 +135,45 @@ Pros and Cons
Pros
----
Significant memory savings for object-oriented applications.
Small improvement to speed for programs which create lots of similar objects.
Significant memory savings for object-oriented applications. Small
improvement to speed for programs which create lots of similar
objects.
Cons
----
Change to data structures:
Third party modules which meddle with the internals of the dictionary
implementation will break.
Changes to repr() output and iteration order:
For most cases, this will be unchanged.
However for some split-table dictionaries the iteration order will
change.
Change to data structures: Third party modules which meddle with the
internals of the dictionary implementation will break.
Neither of these cons should be a problem.
Modules which meddle with the internals of the dictionary
implementation are already broken and should be fixed to use the API.
The iteration order of dictionaries was never defined and has always been
arbitrary; it is different for Jython and PyPy.
Changes to repr() output and iteration order: For most cases, this
will be unchanged. However for some split-table dictionaries the
iteration order will change.
Neither of these cons should be a problem. Modules which meddle with
the internals of the dictionary implementation are already broken and
should be fixed to use the API. The iteration order of dictionaries
was never defined and has always been arbitrary; it is different for
Jython and PyPy.
Alternative Implementation
--------------------------
An alternative implementation for split tables, which could save even more
memory, is to store an index in the value field of the keys table (instead
of ignoring the value field). This index would explicitly state where in the
value array to look. The value array would then only require 1 field for each
usable slot in the key table, rather than each slot in the key table.
An alternative implementation for split tables, which could save even
more memory, is to store an index in the value field of the keys table
(instead of ignoring the value field). This index would explicitly
state where in the value array to look. The value array would then
only require 1 field for each usable slot in the key table, rather
than each slot in the key table.
This "indexed" version would reduce the size of value array by about
one third. The keys table would need an extra "values_size" field, increasing
the size of combined dicts by one word.
The extra indirection adds more complexity to the code, potentially reducing
one third. The keys table would need an extra "values_size" field,
increasing the size of combined dicts by one word. The extra
indirection adds more complexity to the code, potentially reducing
performance a little.
The "indexed" version will not be included in this implementation,
but should be considered deferred rather than rejected,
pending further experimentation.
The "indexed" version will not be included in this implementation, but
should be considered deferred rather than rejected, pending further
experimentation.
References
==========
@ -179,3 +186,13 @@ Copyright
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: