diff --git a/pep-0412.txt b/pep-0412.txt index e49d1cff9..a514b474a 100644 --- a/pep-0412.txt +++ b/pep-0412.txt @@ -14,26 +14,28 @@ Post-History: 08-Feb-2012 Abstract ======== -This PEP proposes a change in the implementation of the builtin dictionary -type ``dict``. The new implementation allows dictionaries which are used as -attribute dictionaries (the ``__dict__`` attribute of an object) to share -keys with other attribute dictionaries of instances of the same class. +This PEP proposes a change in the implementation of the builtin +dictionary type ``dict``. The new implementation allows dictionaries +which are used as attribute dictionaries (the ``__dict__`` attribute +of an object) to share keys with other attribute dictionaries of +instances of the same class. Motivation ========== -The current dictionary implementation uses more memory than is necessary -when used as a container for object attributes as the keys are -replicated for each instance rather than being shared across many instances -of the same class. -Despite this, the current dictionary implementation is finely tuned and -performs very well as a general-purpose mapping object. +The current dictionary implementation uses more memory than is +necessary when used as a container for object attributes as the keys +are replicated for each instance rather than being shared across many +instances of the same class. Despite this, the current dictionary +implementation is finely tuned and performs very well as a +general-purpose mapping object. -By separating the keys (and hashes) from the values it is possible to share -the keys between multiple dictionaries and improve memory use. -By ensuring that keys are separated from the values only when beneficial, -it is possible to retain the high-performance of the current dictionary -implementation when used as a general-purpose mapping object. +By separating the keys (and hashes) from the values it is possible to +share the keys between multiple dictionaries and improve memory use. +By ensuring that keys are separated from the values only when +beneficial, it is possible to retain the high-performance of the +current dictionary implementation when used as a general-purpose +mapping object. Behaviour ========= @@ -47,76 +49,80 @@ Performance Memory Usage ------------ -Reduction in memory use is directly related to the number of dictionaries -with shared keys in existence at any time. These dictionaries are typically -half the size of the current dictionary implementation. +Reduction in memory use is directly related to the number of +dictionaries with shared keys in existence at any time. These +dictionaries are typically half the size of the current dictionary +implementation. Benchmarking shows that memory use is reduced by 10% to 20% for -object-oriented programs with no significant change in memory use -for other programs. +object-oriented programs with no significant change in memory use for +other programs. Speed ----- -The performance of the new implementation is dominated by memory locality -effects. When keys are not shared (for example in module dictionaries -and dictionary explicitly created by dict() or {} ) then performance is -unchanged (within a percent or two) from the current implementation. +The performance of the new implementation is dominated by memory +locality effects. When keys are not shared (for example in module +dictionaries and dictionary explicitly created by ``dict()`` or +``{}``) then performance is unchanged (within a percent or two) from +the current implementation. -For the shared keys case, the new implementation tends to separate keys -from values, but reduces total memory usage. This will improve performance -in many cases as the effects of reduced memory usage outweigh the loss of -locality, but some programs may show a small slow down. +For the shared keys case, the new implementation tends to separate +keys from values, but reduces total memory usage. This will improve +performance in many cases as the effects of reduced memory usage +outweigh the loss of locality, but some programs may show a small slow +down. Benchmarking shows no significant change of speed for most benchmarks. Object-oriented benchmarks show small speed ups when they create large -numbers of objects of the same class (the gcbench benchmark shows a 10% -speed up; this is likely to be an upper limit). +numbers of objects of the same class (the gcbench benchmark shows a +10% speed up; this is likely to be an upper limit). Implementation ============== -Both the old and new dictionaries consist of a fixed-sized dict struct and -a re-sizeable table. -In the new dictionary the table can be further split into a keys table and -values array. -The keys table holds the keys and hashes and (for non-split tables) the -values as well. It differs only from the original implementation in that it +Both the old and new dictionaries consist of a fixed-sized dict struct +and a re-sizeable table. In the new dictionary the table can be +further split into a keys table and values array. The keys table +holds the keys and hashes and (for non-split tables) the values as +well. It differs only from the original implementation in that it contains a number of fields that were previously in the dict struct. -If a table is split the values in the keys table are ignored, instead the -values are held in a separate array. +If a table is split the values in the keys table are ignored, instead +the values are held in a separate array. Split-Table dictionaries ------------------------ -When dictionaries are created to fill the __dict__ slot of an object, they are -created in split form. The keys table is cached in the type, potentially -allowing all attribute dictionaries of instances of one class to share keys. -In the event of the keys of these dictionaries starting to diverge, -individual dictionaries will lazily convert to the combined-table form. -This ensures good memory use in the common case, and correctness in all cases. +When dictionaries are created to fill the __dict__ slot of an object, +they are created in split form. The keys table is cached in the type, +potentially allowing all attribute dictionaries of instances of one +class to share keys. In the event of the keys of these dictionaries +starting to diverge, individual dictionaries will lazily convert to +the combined-table form. This ensures good memory use in the common +case, and correctness in all cases. When resizing a split dictionary it is converted to a combined table. -If resizing is as a result of storing an instance attribute, and there is -only instance of a class, then the dictionary will be re-split immediately. -Since most OO code will set attributes in the __init__ method, all attributes -will be set before a second instance is created and no more resizing will be -necessary as all further instance dictionaries will have the correct size. -For more complex use patterns, it is impossible to know what is the best -approach, so the implementation allows extra insertions up to the point -of a resize when it reverts to the combined table (non-shared keys). +If resizing is as a result of storing an instance attribute, and there +is only instance of a class, then the dictionary will be re-split +immediately. Since most OO code will set attributes in the __init__ +method, all attributes will be set before a second instance is created +and no more resizing will be necessary as all further instance +dictionaries will have the correct size. For more complex use +patterns, it is impossible to know what is the best approach, so the +implementation allows extra insertions up to the point of a resize +when it reverts to the combined table (non-shared keys). -A deletion from a split dictionary does not change the keys table, it simply -removes the value from the values array. +A deletion from a split dictionary does not change the keys table, it +simply removes the value from the values array. Combined-Table dictionaries --------------------------- -Explicit dictionaries (dict() or {}), module dictionaries and most other -dictionaries are created as combined-table dictionaries. -A combined-table dictionary never becomes a split-table dictionary. -Combined tables are laid out in much the same way as the tables in the old -dictionary, resulting in very similar performance. +Explicit dictionaries (``dict()`` or ``{}``), module dictionaries and +most other dictionaries are created as combined-table dictionaries. A +combined-table dictionary never becomes a split-table dictionary. +Combined tables are laid out in much the same way as the tables in the +old dictionary, resulting in very similar performance. Implementation ============== @@ -129,44 +135,45 @@ Pros and Cons Pros ---- -Significant memory savings for object-oriented applications. -Small improvement to speed for programs which create lots of similar objects. +Significant memory savings for object-oriented applications. Small +improvement to speed for programs which create lots of similar +objects. Cons ---- -Change to data structures: -Third party modules which meddle with the internals of the dictionary -implementation will break. -Changes to repr() output and iteration order: -For most cases, this will be unchanged. -However for some split-table dictionaries the iteration order will -change. +Change to data structures: Third party modules which meddle with the +internals of the dictionary implementation will break. -Neither of these cons should be a problem. -Modules which meddle with the internals of the dictionary -implementation are already broken and should be fixed to use the API. -The iteration order of dictionaries was never defined and has always been -arbitrary; it is different for Jython and PyPy. +Changes to repr() output and iteration order: For most cases, this +will be unchanged. However for some split-table dictionaries the +iteration order will change. + +Neither of these cons should be a problem. Modules which meddle with +the internals of the dictionary implementation are already broken and +should be fixed to use the API. The iteration order of dictionaries +was never defined and has always been arbitrary; it is different for +Jython and PyPy. Alternative Implementation -------------------------- -An alternative implementation for split tables, which could save even more -memory, is to store an index in the value field of the keys table (instead -of ignoring the value field). This index would explicitly state where in the -value array to look. The value array would then only require 1 field for each -usable slot in the key table, rather than each slot in the key table. +An alternative implementation for split tables, which could save even +more memory, is to store an index in the value field of the keys table +(instead of ignoring the value field). This index would explicitly +state where in the value array to look. The value array would then +only require 1 field for each usable slot in the key table, rather +than each slot in the key table. This "indexed" version would reduce the size of value array by about -one third. The keys table would need an extra "values_size" field, increasing -the size of combined dicts by one word. -The extra indirection adds more complexity to the code, potentially reducing +one third. The keys table would need an extra "values_size" field, +increasing the size of combined dicts by one word. The extra +indirection adds more complexity to the code, potentially reducing performance a little. -The "indexed" version will not be included in this implementation, -but should be considered deferred rather than rejected, -pending further experimentation. +The "indexed" version will not be included in this implementation, but +should be considered deferred rather than rejected, pending further +experimentation. References ========== @@ -179,3 +186,13 @@ Copyright This document has been placed in the public domain. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: