PEP 509: minor edits

This commit is contained in:
Victor Stinner 2016-01-11 17:18:06 +01:00
parent 974dc6a0bc
commit 86af87adc3
1 changed files with 63 additions and 55 deletions

View File

@ -57,9 +57,9 @@ optimizers.
Guard example Guard example
============= =============
Pseudo-code of an fast guard to check if a dictionary key was modified Pseudo-code of an fast guard to check if a dictionary entry was modified
(created, updated or deleted) using an hypothetical (created, updated or deleted) using an hypothetical
``get_dict_version(dict)`` function:: ``dict_get_version(dict)`` function::
UNSET = object() UNSET = object()
@ -68,22 +68,26 @@ Pseudo-code of an fast guard to check if a dictionary key was modified
self.dict = dict self.dict = dict
self.key = key self.key = key
self.value = dict.get(key, UNSET) self.value = dict.get(key, UNSET)
self.version = get_dict_version(dict) self.version = dict_get_version(dict)
def check(self): def check(self):
"""Return True if the dictionary value did not changed.""" """Return True if the dictionary entry did not changed."""
version = get_dict_version(self.dict)
# read the version field of the dict structure
version = dict_get_version(self.dict)
if version == self.version: if version == self.version:
# Fast-path: avoid the dictionary lookup # Fast-path: dictionary lookup avoided
return True return True
# lookup in the dictionary
value = self.dict.get(self.key, UNSET) value = self.dict.get(self.key, UNSET)
if value == self.value: if value is self.value:
# another key was modified: # another key was modified:
# cache the new dictionary version # cache the new dictionary version
self.version = version self.version = version
return True return True
# the key was modified
return False return False
@ -139,9 +143,9 @@ Retrospective
Changes Changes
======= =======
Add a ``PY_INT64_T ma_version`` field to the ``PyDictObject`` structure: Add a ``ma_version`` field to the ``PyDictObject`` structure with the C
64-bit unsigned integer. New empty dictionaries are initilized to type ``PY_INT64_T``, 64-bit unsigned integer. New empty dictionaries are
version ``0``. The version is incremented at each change: initilized to version ``0``. The version is incremented at each change:
* ``clear()`` if the dict was non-empty * ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists * ``pop(key)`` if the key exists
@ -153,39 +157,40 @@ version ``0``. The version is incremented at each change:
* ``update(...)`` if new values are different than existing values (the * ``update(...)`` if new values are different than existing values (the
version can be incremented multiple times) version can be incremented multiple times)
Example using an hypothetical ``get_dict_version(dict)`` function:: Example using an hypothetical ``dict_get_version(dict)`` function::
>>> d = {} >>> d = {}
>>> get_dict_version(d) >>> dict_get_version(d)
0 0
>>> d['key'] = 'value' >>> d['key'] = 'value'
>>> get_dict_version(d) >>> dict_get_version(d)
1 1
>>> d['key'] = 'new value' >>> d['key'] = 'new value'
>>> get_dict_version(d) >>> dict_get_version(d)
2 2
>>> del d['key'] >>> del d['key']
>>> get_dict_version(d) >>> dict_get_version(d)
3 3
If a dictionary is created with items, the version is also incremented If a dictionary is created with items, the version is also incremented
at each dictionary insertion. Example:: at each dictionary insertion. Example::
>>> d=dict(x=7, y=33) >>> d=dict(x=7, y=33)
>>> get_dict_version(d) >>> dict_get_version(d)
2 2
The version is not incremented is an existing key is modified to the The version is not incremented if an existing key is set to the same
same value, but only the identifier of the value is tested, not the value. For efficiency, values are compared by their identity:
content of the value. Example:: ``new_value is old_value``, not by their content:
``new_value == old_value``. Example::
>>> d={} >>> d={}
>>> value = object() >>> value = object()
>>> d['key'] = value >>> d['key'] = value
>>> get_dict_version(d) >>> dict_get_version(d)
2 2
>>> d['key'] = value >>> d['key'] = value
>>> get_dict_version(d) >>> dict_get_version(d)
2 2
.. note:: .. note::
@ -207,10 +212,10 @@ any overhead on dictionary operations.
When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictioanry lookup, whereas a guard check only takes 3.8 ns. Moreover, a dictioanry lookup, whereas a guard check only takes 3.8 ns. Moreover,
a guard can watch multiple keys. For example, for an optimization using a guard can watch for multiple keys. For example, for an optimization
10 global variables in a function, the check costs 148 ns for 10 dict using 10 global variables in a function, 10 dictionary lookups costs 148
lookups, whereas the guard still only cost 3.8 ns when the version does ns, whereas the guard still only costs 3.8 ns when the version does not
not change (39x as fast). change (39x as fast).
Integer overflow Integer overflow
@ -230,7 +235,7 @@ to the old version modulo ``2 ** 64``.
If a dictionary is modified each nanosecond, an overflow takes longer If a dictionary is modified each nanosecond, an overflow takes longer
than 584 years. Using a 32-bit version, the overflow occurs only after 4 than 584 years. Using a 32-bit version, the overflow occurs only after 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. systems. A dictionary lookup at the C level takes 14.8 ns.
A risk of a bug every 584 years is acceptable. A risk of a bug every 584 years is acceptable.
@ -249,7 +254,7 @@ the ``dict`` API).
There are multiple issues: There are multiple issues:
* To be consistent and avoid bad surprises, the version must be added to * To be consistent and avoid bad surprises, the version must be added to
all mapping type. Implementing a new mapping type would require extra all mapping types. Implementing a new mapping type would require extra
work for no benefit, since the version is only required on the work for no benefit, since the version is only required on the
``dict`` type in practice. ``dict`` type in practice.
* All Python implementations must implement this new property, it gives * All Python implementations must implement this new property, it gives
@ -260,11 +265,11 @@ There are multiple issues:
``dict.__version__ == guard_version`` must be used instead to reduce ``dict.__version__ == guard_version`` must be used instead to reduce
the risk of bug on integer overflow (even if the integer overflow is the risk of bug on integer overflow (even if the integer overflow is
unlikely in practice). unlikely in practice).
* Exposing the dictioanry version can lead the * Exposing the dictionary version at Python level can lead the
false assumption on performances. Checking ``dict.__version__`` at false assumption on performances. Checking ``dict.__version__`` at
the Python level is not faster than a dictionary lookup. The lookup the Python level is not faster than a dictionary lookup. A dictionary
has a cost of 48.7 ns and checking a guard has a cost of 47.5 ns, the lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
difference is only 1.2 ns (3%):: ns, the difference is only 1.2 ns (3%)::
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33' $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
@ -286,53 +291,56 @@ Add a version to each dict entry
A single version per dictionary requires to keep a strong reference to A single version per dictionary requires to keep a strong reference to
the value which can keep the value alive longer than expected. If we add the value which can keep the value alive longer than expected. If we add
also a version per dictionary entry, the guard can rely on the entry also a version per dictionary entry, the guard can only store the entry
version and so avoid the strong reference to the value (only strong version to avoid the strong reference to the value (only strong
references to a dictionary and key are needed). references to the dictionary and to the key are needed).
Changes: add a ``getversion(key)`` method to dictionary which returns Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
``None`` if the key doesn't exist. When a key is created or modified, the field has the C type ``PY_INT64_T``. When a key is created or
the entry version is set to the dictionary version which is incremented modified, the entry version is set to the dictionary version which is
at each change (create, modify, delete). incremented at any change (create, modify, delete).
Pseudo-code of an fast guard to check if a dict key was modified using Pseudo-code of an fast guard to check if a dictionary key was modified
``getversion()``:: using hypothetical ``dict_get_version(dict)``
``dict_get_entry_version(dict)`` functions::
UNSET = object() UNSET = object()
class Guard: class GuardDictKey:
def __init__(self, dict, key): def __init__(self, dict, key):
self.dict = dict self.dict = dict
self.key = key self.key = key
self.dict_version = get_dict_version(dict) self.dict_version = dict_get_version(dict)
self.entry_version = dict.getversion(key) self.entry_version = dict_get_entry_version(dict, key)
def check(self): def check(self):
"""Return True if the dictionary value did not changed.""" """Return True if the dictionary entry did not changed."""
dict_version = get_dict_version(self.dict)
# read the version field of the dict structure
dict_version = dict_get_version(self.dict)
if dict_version == self.version: if dict_version == self.version:
# Fast-path: avoid the dictionary lookup # Fast-path: dictionary lookup avoided
return True return True
# lookup in the dictionary, but get the entry version, # lookup in the dictionary
#not the value entry_version = get_dict_key_version(dict, key)
entry_version = self.dict.getversion(self.key)
if entry_version == self.entry_version: if entry_version == self.entry_version:
# another key was modified: # another key was modified:
# cache the new dictionary version # cache the new dictionary version
self.dict_version = dict_version self.dict_version = dict_version
return True return True
# the key was modified
return False return False
This main drawback of this option is the impact on the memory footprint. The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or unused yet). For on the number of buckets (dictionary entries, used or unused yet). For
example, it increases the size of each dictionary entry by 8 bytes on example, it increases the size of each dictionary entry by 8 bytes on
64-bit system if we use ``size_t``. 64-bit system.
In Python, the memory footprint matters and the trend is more to reduce In Python, the memory footprint matters and the trend is to reduce it.
it. Examples: Examples:
* `PEP 393 -- Flexible String Representation * `PEP 393 -- Flexible String Representation
<https://www.python.org/dev/peps/pep-0393/>`_ <https://www.python.org/dev/peps/pep-0393/>`_
@ -351,7 +359,7 @@ Leave the ``dict`` type unchanged to not add any overhead (memory
footprint) when guards are not needed. footprint) when guards are not needed.
Technical issue: a lot of C code in the wild, including CPython core, Technical issue: a lot of C code in the wild, including CPython core,
expect the exact ``dict`` type. Issues: expecting the exact ``dict`` type. Issues:
* ``exec()`` requires a ``dict`` for globals and locals. A lot of code * ``exec()`` requires a ``dict`` for globals and locals. A lot of code
use ``globals={}``. It is not possible to cast the ``dict`` to a use ``globals={}``. It is not possible to cast the ``dict`` to a
@ -371,7 +379,7 @@ Other issues:
* The garbage collector has a special code to "untrack" ``dict`` * The garbage collector has a special code to "untrack" ``dict``
instances. If a ``dict`` subtype is used for namespaces, the garbage instances. If a ``dict`` subtype is used for namespaces, the garbage
collector may be unable to break some reference cycles. collector can be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken * Some functions have a fast-path for ``dict`` which would not be taken
for ``dict`` subtypes, and so it would make Python a little bit for ``dict`` subtypes, and so it would make Python a little bit
slower. slower.