PEP 412: add missing variables footer and reformat.
parent 7638f18993
commit 4408906f8a
pep-0412.txt | 185
@@ -14,26 +14,28 @@ Post-History: 08-Feb-2012
 Abstract
 ========
 
-This PEP proposes a change in the implementation of the builtin dictionary
-type ``dict``. The new implementation allows dictionaries which are used as
-attribute dictionaries (the ``__dict__`` attribute of an object) to share
-keys with other attribute dictionaries of instances of the same class.
+This PEP proposes a change in the implementation of the builtin
+dictionary type ``dict``. The new implementation allows dictionaries
+which are used as attribute dictionaries (the ``__dict__`` attribute
+of an object) to share keys with other attribute dictionaries of
+instances of the same class.
 
 Motivation
 ==========
 
-The current dictionary implementation uses more memory than is necessary
-when used as a container for object attributes as the keys are
-replicated for each instance rather than being shared across many instances
-of the same class.
-Despite this, the current dictionary implementation is finely tuned and
-performs very well as a general-purpose mapping object.
+The current dictionary implementation uses more memory than is
+necessary when used as a container for object attributes as the keys
+are replicated for each instance rather than being shared across many
+instances of the same class. Despite this, the current dictionary
+implementation is finely tuned and performs very well as a
+general-purpose mapping object.
 
-By separating the keys (and hashes) from the values it is possible to share
-the keys between multiple dictionaries and improve memory use.
-By ensuring that keys are separated from the values only when beneficial,
-it is possible to retain the high-performance of the current dictionary
-implementation when used as a general-purpose mapping object.
+By separating the keys (and hashes) from the values it is possible to
+share the keys between multiple dictionaries and improve memory use.
+By ensuring that keys are separated from the values only when
+beneficial, it is possible to retain the high-performance of the
+current dictionary implementation when used as a general-purpose
+mapping object.
 
 Behaviour
 =========
@@ -47,76 +49,80 @@ Performance
 Memory Usage
 ------------
 
-Reduction in memory use is directly related to the number of dictionaries
-with shared keys in existence at any time. These dictionaries are typically
-half the size of the current dictionary implementation.
+Reduction in memory use is directly related to the number of
+dictionaries with shared keys in existence at any time. These
+dictionaries are typically half the size of the current dictionary
+implementation.
 
 Benchmarking shows that memory use is reduced by 10% to 20% for
-object-oriented programs with no significant change in memory use
-for other programs.
+object-oriented programs with no significant change in memory use for
+other programs.
 
 Speed
 -----
 
-The performance of the new implementation is dominated by memory locality
-effects. When keys are not shared (for example in module dictionaries
-and dictionary explicitly created by dict() or {} ) then performance is
-unchanged (within a percent or two) from the current implementation.
+The performance of the new implementation is dominated by memory
+locality effects. When keys are not shared (for example in module
+dictionaries and dictionary explicitly created by ``dict()`` or
+``{}``) then performance is unchanged (within a percent or two) from
+the current implementation.
 
-For the shared keys case, the new implementation tends to separate keys
-from values, but reduces total memory usage. This will improve performance
-in many cases as the effects of reduced memory usage outweigh the loss of
-locality, but some programs may show a small slow down.
+For the shared keys case, the new implementation tends to separate
+keys from values, but reduces total memory usage. This will improve
+performance in many cases as the effects of reduced memory usage
+outweigh the loss of locality, but some programs may show a small slow
+down.
 
 Benchmarking shows no significant change of speed for most benchmarks.
 Object-oriented benchmarks show small speed ups when they create large
-numbers of objects of the same class (the gcbench benchmark shows a 10%
-speed up; this is likely to be an upper limit).
+numbers of objects of the same class (the gcbench benchmark shows a
+10% speed up; this is likely to be an upper limit).
 
 Implementation
 ==============
 
-Both the old and new dictionaries consist of a fixed-sized dict struct and
-a re-sizeable table.
-In the new dictionary the table can be further split into a keys table and
-values array.
-The keys table holds the keys and hashes and (for non-split tables) the
-values as well. It differs only from the original implementation in that it
+Both the old and new dictionaries consist of a fixed-sized dict struct
+and a re-sizeable table. In the new dictionary the table can be
+further split into a keys table and values array. The keys table
+holds the keys and hashes and (for non-split tables) the values as
+well. It differs only from the original implementation in that it
 contains a number of fields that were previously in the dict struct.
-If a table is split the values in the keys table are ignored, instead the
-values are held in a separate array.
+If a table is split the values in the keys table are ignored, instead
+the values are held in a separate array.
 
 Split-Table dictionaries
 ------------------------
 
-When dictionaries are created to fill the __dict__ slot of an object, they are
-created in split form. The keys table is cached in the type, potentially
-allowing all attribute dictionaries of instances of one class to share keys.
-In the event of the keys of these dictionaries starting to diverge,
-individual dictionaries will lazily convert to the combined-table form.
-This ensures good memory use in the common case, and correctness in all cases.
+When dictionaries are created to fill the __dict__ slot of an object,
+they are created in split form. The keys table is cached in the type,
+potentially allowing all attribute dictionaries of instances of one
+class to share keys. In the event of the keys of these dictionaries
+starting to diverge, individual dictionaries will lazily convert to
+the combined-table form. This ensures good memory use in the common
+case, and correctness in all cases.
 
 When resizing a split dictionary it is converted to a combined table.
-If resizing is as a result of storing an instance attribute, and there is
-only instance of a class, then the dictionary will be re-split immediately.
-Since most OO code will set attributes in the __init__ method, all attributes
-will be set before a second instance is created and no more resizing will be
-necessary as all further instance dictionaries will have the correct size.
-For more complex use patterns, it is impossible to know what is the best
-approach, so the implementation allows extra insertions up to the point
-of a resize when it reverts to the combined table (non-shared keys).
+If resizing is as a result of storing an instance attribute, and there
+is only instance of a class, then the dictionary will be re-split
+immediately. Since most OO code will set attributes in the __init__
+method, all attributes will be set before a second instance is created
+and no more resizing will be necessary as all further instance
+dictionaries will have the correct size. For more complex use
+patterns, it is impossible to know what is the best approach, so the
+implementation allows extra insertions up to the point of a resize
+when it reverts to the combined table (non-shared keys).
 
-A deletion from a split dictionary does not change the keys table, it simply
-removes the value from the values array.
+A deletion from a split dictionary does not change the keys table, it
+simply removes the value from the values array.
 
 Combined-Table dictionaries
 ---------------------------
 
-Explicit dictionaries (dict() or {}), module dictionaries and most other
-dictionaries are created as combined-table dictionaries.
-A combined-table dictionary never becomes a split-table dictionary.
-Combined tables are laid out in much the same way as the tables in the old
-dictionary, resulting in very similar performance.
+Explicit dictionaries (``dict()`` or ``{}``), module dictionaries and
+most other dictionaries are created as combined-table dictionaries. A
+combined-table dictionary never becomes a split-table dictionary.
+Combined tables are laid out in much the same way as the tables in the
+old dictionary, resulting in very similar performance.
 
 Implementation
 ==============
@@ -129,44 +135,45 @@ Pros and Cons
 Pros
 ----
 
-Significant memory savings for object-oriented applications.
-Small improvement to speed for programs which create lots of similar objects.
+Significant memory savings for object-oriented applications. Small
+improvement to speed for programs which create lots of similar
+objects.
 
 Cons
 ----
 
-Change to data structures:
-Third party modules which meddle with the internals of the dictionary
-implementation will break.
-Changes to repr() output and iteration order:
-For most cases, this will be unchanged.
-However for some split-table dictionaries the iteration order will
-change.
+Change to data structures: Third party modules which meddle with the
+internals of the dictionary implementation will break.
 
-Neither of these cons should be a problem.
-Modules which meddle with the internals of the dictionary
-implementation are already broken and should be fixed to use the API.
-The iteration order of dictionaries was never defined and has always been
-arbitrary; it is different for Jython and PyPy.
+Changes to repr() output and iteration order: For most cases, this
+will be unchanged. However for some split-table dictionaries the
+iteration order will change.
+
+Neither of these cons should be a problem. Modules which meddle with
+the internals of the dictionary implementation are already broken and
+should be fixed to use the API. The iteration order of dictionaries
+was never defined and has always been arbitrary; it is different for
+Jython and PyPy.
 
 Alternative Implementation
 --------------------------
 
-An alternative implementation for split tables, which could save even more
-memory, is to store an index in the value field of the keys table (instead
-of ignoring the value field). This index would explicitly state where in the
-value array to look. The value array would then only require 1 field for each
-usable slot in the key table, rather than each slot in the key table.
+An alternative implementation for split tables, which could save even
+more memory, is to store an index in the value field of the keys table
+(instead of ignoring the value field). This index would explicitly
+state where in the value array to look. The value array would then
+only require 1 field for each usable slot in the key table, rather
+than each slot in the key table.
 
 This "indexed" version would reduce the size of value array by about
-one third. The keys table would need an extra "values_size" field, increasing
-the size of combined dicts by one word.
-The extra indirection adds more complexity to the code, potentially reducing
+one third. The keys table would need an extra "values_size" field,
+increasing the size of combined dicts by one word. The extra
+indirection adds more complexity to the code, potentially reducing
 performance a little.
 
-The "indexed" version will not be included in this implementation,
-but should be considered deferred rather than rejected,
-pending further experimentation.
+The "indexed" version will not be included in this implementation, but
+should be considered deferred rather than rejected, pending further
+experimentation.
 
 References
 ==========
@@ -179,3 +186,13 @@ Copyright
 
 This document has been placed in the public domain.
 
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:
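The PEP text in this diff describes several split-table behaviours:

* a keys table cached on the type and shared by instance dictionaries;
* per-instance values arrays;
* lazy conversion to the combined form once keys diverge;
* deletions that leave the keys table untouched.

The model below is a minimal Python sketch of those behaviours, not CPython's C implementation. `SharedKeys`, `InstanceDict`, `slot_for` and the `capacity` limit (a stand-in for "this insertion would need a resize") are hypothetical. Hashing and probing are delegated to a built-in `dict`. The PEP's rule of re-splitting immediately when a class has only one instance is deliberately omitted.

```python
from typing import Any, Dict, Hashable, Iterator, List, Optional, Tuple

_MISSING = object()  # sentinel for an empty slot in a values array


class SharedKeys:
    """Hypothetical keys table cached on a class and shared by its instances."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity              # stands in for "needs a resize"
        self.keys: List[Hashable] = []        # shared keys, in insertion order
        self.index: Dict[Hashable, int] = {}  # key -> slot in every values array

    def slot_for(self, key: Hashable) -> Optional[int]:
        """Return key's slot, adding the key if there is room; None if full."""
        if key not in self.index:
            if len(self.keys) >= self.capacity:
                return None
            self.index[key] = len(self.keys)
            self.keys.append(key)
        return self.index[key]


class InstanceDict:
    """Hypothetical __dict__: split form until its keys diverge, then combined."""

    def __init__(self, shared: SharedKeys) -> None:
        self.shared: Optional[SharedKeys] = shared
        self.values: List[Any] = []                  # split form: values only
        self.combined: Optional[Dict[Hashable, Any]] = None

    @property
    def is_split(self) -> bool:
        return self.combined is None

    def items(self) -> Iterator[Tuple[Hashable, Any]]:
        if self.combined is not None:
            return iter(list(self.combined.items()))
        return iter([(k, self.values[i]) for i, k in enumerate(self.shared.keys)
                     if i < len(self.values) and self.values[i] is not _MISSING])

    def _live_slot(self, key: Hashable) -> int:
        slot = self.shared.index.get(key)
        if slot is None or slot >= len(self.values) or self.values[slot] is _MISSING:
            raise KeyError(key)
        return slot

    def __getitem__(self, key: Hashable) -> Any:
        if self.combined is not None:
            return self.combined[key]
        return self.values[self._live_slot(key)]

    def __setitem__(self, key: Hashable, value: Any) -> None:
        if self.combined is None:
            slot = self.shared.slot_for(key)
            if slot is not None:
                # Grow this instance's values array only as far as needed.
                self.values.extend([_MISSING] * (slot + 1 - len(self.values)))
                self.values[slot] = value
                return
            # Keys diverged beyond the shared table: convert this dict lazily.
            self.combined = dict(self.items())
            self.shared, self.values = None, []
        self.combined[key] = value  # a combined dict never becomes split again

    def __delitem__(self, key: Hashable) -> None:
        if self.combined is not None:
            del self.combined[key]
        else:
            # The shared keys table is left alone; only the value is cleared.
            self.values[self._live_slot(key)] = _MISSING


shared = SharedKeys(capacity=2)
a, b = InstanceDict(shared), InstanceDict(shared)
a["x"], a["y"] = 1, 2
b["x"], b["y"] = 10, 20
b["z"] = 30                            # no room left in the shared keys table
print(a.is_split, b.is_split)          # True False
del a["x"]
print(shared.keys, dict(a.items()))    # ['x', 'y'] {'y': 2}
```

Only `b` converts when its keys diverge, so `a` keeps sharing the keys table. This mirrors the PEP's claim that the split form gives good memory use in the common case and stays correct when instances differ.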
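The deferred "indexed" alternative in the PEP text stores, in each keys-table entry, an index into a compact values array. That array then needs one field per usable slot rather than one per table slot.

The sketch below illustrates the idea under stated assumptions:

* About two thirds of the table slots are usable, as an approximation of the `USABLE_FRACTION` rule in CPython's `Objects/dictobject.c`.
* Probing is linear, a simplification of CPython's perturbed probing.
* `IndexedSharedKeys`, `add`, `lookup` and `new_values` are hypothetical names.

```python
from typing import Any, Hashable, List, Optional, Tuple

# (hash, key, index into the compact values array); None marks an empty slot
Entry = Optional[Tuple[int, Hashable, int]]


class IndexedSharedKeys:
    """Hypothetical keys table for the deferred "indexed" split layout."""

    def __init__(self, size: int = 8) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("table size must be a power of two")
        self.slots: List[Entry] = [None] * size
        # Assumption: about two thirds of the slots are usable.
        self.usable = (size * 2) // 3
        self.used = 0

    def _probe(self, key: Hashable) -> int:
        """Find the slot holding key, or the empty slot where it would go."""
        mask = len(self.slots) - 1
        i = hash(key) & mask
        while self.slots[i] is not None and self.slots[i][1] != key:
            i = (i + 1) & mask  # linear probing, not CPython's perturbation
        return i

    def add(self, key: Hashable) -> int:
        """Return key's index in the values array, assigning one if new."""
        i = self._probe(key)
        entry = self.slots[i]
        if entry is None:
            if self.used >= self.usable:
                raise OverflowError("keys table full; a real dict would resize")
            entry = (hash(key), key, self.used)
            self.slots[i] = entry
            self.used += 1
        return entry[2]

    def lookup(self, key: Hashable) -> Optional[int]:
        """Return key's index in the values array, or None if absent."""
        entry = self.slots[self._probe(key)]
        return None if entry is None else entry[2]


def new_values(keys: IndexedSharedKeys) -> List[Any]:
    # One field per *usable* slot instead of one per table slot.
    return [None] * keys.usable


keys = IndexedSharedKeys(8)
a, b = new_values(keys), new_values(keys)
a[keys.add("x")] = 1
a[keys.add("y")] = 2
b[keys.add("x")] = 10
print(len(keys.slots), len(a))                    # 8 5
print(a[keys.lookup("y")], b[keys.lookup("x")])   # 2 10
```

With an 8-slot table, each values array shrinks from 8 fields to 5. That is in line with the PEP's "about one third" estimate. The cost is the extra index read on every lookup, which is the indirection the PEP cites as a possible small slowdown.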