PEP 509: minor edits
parent 974dc6a0bc
commit 86af87adc3
118 pep-0509.txt
@@ -57,9 +57,9 @@ optimizers.
 Guard example
 =============
 
-Pseudo-code of an fast guard to check if a dictionary key was modified
+Pseudo-code of an fast guard to check if a dictionary entry was modified
 (created, updated or deleted) using an hypothetical
-``get_dict_version(dict)`` function::
+``dict_get_version(dict)`` function::
 
     UNSET = object()
 
@@ -68,22 +68,26 @@ Pseudo-code of an fast guard to check if a dictionary key was modified
             self.dict = dict
             self.key = key
             self.value = dict.get(key, UNSET)
-            self.version = get_dict_version(dict)
+            self.version = dict_get_version(dict)
 
         def check(self):
-            """Return True if the dictionary value did not changed."""
-            version = get_dict_version(self.dict)
+            """Return True if the dictionary entry did not changed."""
+
+            # read the version field of the dict structure
+            version = dict_get_version(self.dict)
             if version == self.version:
-                # Fast-path: avoid the dictionary lookup
+                # Fast-path: dictionary lookup avoided
                 return True
 
+            # lookup in the dictionary
             value = self.dict.get(self.key, UNSET)
-            if value == self.value:
+            if value is self.value:
                 # another key was modified:
                 # cache the new dictionary version
                 self.version = version
                 return True
 
+            # the key was modified
             return False
 
 
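The guard above relies on a hypothetical ``dict_get_version()``. Below is a minimal runnable sketch of the same guard. It assumes a pure-Python ``VersionedDict`` stand-in, which is hypothetical and not part of the PEP. The stand-in only emulates the proposed ``ma_version`` field for item assignment and deletion, and it compares values by identity, as the new text does.

```python
UNSET = object()


class VersionedDict(dict):
    """Hypothetical stand-in for a dict carrying a version counter.

    Only item assignment and deletion bump the version here; the proposed
    C-level field would also track clear(), pop(), update() and so on.
    """

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.version = 0
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Storing the very same object again is not a change.
        if key in self and dict.__getitem__(self, key) is value:
            return
        dict.__setitem__(self, key, value)
        self.version += 1

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self.version += 1


def dict_get_version(d):
    # Hypothetical accessor named after the pseudo-code.
    return d.version


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.value = d.get(key, UNSET)
        self.version = dict_get_version(d)

    def check(self):
        """Return True if the watched entry did not change."""
        version = dict_get_version(self.dict)
        if version == self.version:
            # Fast path: the dictionary lookup is skipped.
            return True
        value = self.dict.get(self.key, UNSET)
        if value is self.value:
            # Another key was modified: cache the new dictionary version.
            self.version = version
            return True
        # The watched key was modified.
        return False


ns = VersionedDict(len=len)
guard = GuardDictKey(ns, "len")
ns["other"] = 1          # unrelated change: slow path, still valid
assert guard.check()
ns["len"] = abs          # watched key replaced: guard fails
assert not guard.check()
```

This sketch keeps a strong reference to the watched value in ``self.value``. That is the cost the "Add a version to each dict entry" alternative tries to avoid.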
@@ -139,9 +143,9 @@ Retrospective
 Changes
 =======
 
-Add a ``PY_INT64_T ma_version`` field to the ``PyDictObject`` structure:
-64-bit unsigned integer. New empty dictionaries are initilized to
-version ``0``. The version is incremented at each change:
+Add a ``ma_version`` field to the ``PyDictObject`` structure with the C
+type ``PY_INT64_T``, 64-bit unsigned integer. New empty dictionaries are
+initilized to version ``0``. The version is incremented at each change:
 
 * ``clear()`` if the dict was non-empty
 * ``pop(key)`` if the key exists
@@ -153,39 +157,40 @@ version ``0``. The version is incremented at each change:
 * ``update(...)`` if new values are different than existing values (the
   version can be incremented multiple times)
 
-Example using an hypothetical ``get_dict_version(dict)`` function::
+Example using an hypothetical ``dict_get_version(dict)`` function::
 
     >>> d = {}
-    >>> get_dict_version(d)
+    >>> dict_get_version(d)
     0
     >>> d['key'] = 'value'
-    >>> get_dict_version(d)
+    >>> dict_get_version(d)
     1
     >>> d['key'] = 'new value'
-    >>> get_dict_version(d)
+    >>> dict_get_version(d)
     2
     >>> del d['key']
-    >>> get_dict_version(d)
+    >>> dict_get_version(d)
     3
 
 If a dictionary is created with items, the version is also incremented
 at each dictionary insertion. Example::
 
     >>> d=dict(x=7, y=33)
-    >>> get_dict_version(d)
+    >>> dict_get_version(d)
     2
 
-The version is not incremented is an existing key is modified to the
-same value, but only the identifier of the value is tested, not the
-content of the value. Example::
+The version is not incremented if an existing key is set to the same
+value. For efficiency, values are compared by their identity:
+``new_value is old_value``, not by their content:
+``new_value == old_value``. Example::
 
     >>> d={}
     >>> value = object()
     >>> d['key'] = value
-    >>> get_dict_version(d)
+    >>> dict_get_version(d)
     2
     >>> d['key'] = value
-    >>> get_dict_version(d)
+    >>> dict_get_version(d)
     2
 
 .. note::
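The increment rules listed in these hunks can be exercised with a pure-Python emulation. The rules are: bump on every change, no bump for ``clear()`` on an empty dict or ``pop()`` of a missing key, and identity rather than equality when a key is set again. ``VersionedDict`` and ``dict_get_version()`` are hypothetical names for this sketch. It covers only the methods shown, and its ``update()`` bumps once per changed key.

```python
class VersionedDict(dict):
    """Hypothetical emulation of the proposed ma_version counter."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.version = 0
        # Creating a dict with items counts one change per insertion.
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Identity, not equality: re-storing the same object is no change.
        if key in self and dict.__getitem__(self, key) is value:
            return
        dict.__setitem__(self, key, value)
        self.version += 1

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self.version += 1

    def pop(self, key, *default):
        # Only a pop that actually removes a key is a change.
        if key in self:
            self.version += 1
        return dict.pop(self, key, *default)

    def clear(self):
        # Clearing an already empty dict is not a change.
        if self:
            self.version += 1
        dict.clear(self)

    def update(self, *args, **kwargs):
        # The version can be incremented several times, once per changed key.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


def dict_get_version(d):
    return d.version


d = VersionedDict()
d["key"] = "value"
d["key"] = "new value"
del d["key"]
assert dict_get_version(d) == 3
assert dict_get_version(VersionedDict(x=7, y=33)) == 2

value = []
d["key"] = value
d["key"] = value          # same object: no increment
assert dict_get_version(d) == 4
d["key"] = []             # equal but a different object: increment
assert dict_get_version(d) == 5
```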
@@ -207,10 +212,10 @@ any overhead on dictionary operations.
 
 When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
 a dictioanry lookup, whereas a guard check only takes 3.8 ns. Moreover,
-a guard can watch multiple keys. For example, for an optimization using
-10 global variables in a function, the check costs 148 ns for 10 dict
-lookups, whereas the guard still only cost 3.8 ns when the version does
-not change (39x as fast).
+a guard can watch for multiple keys. For example, for an optimization
+using 10 global variables in a function, 10 dictionary lookups costs 148
+ns, whereas the guard still only costs 3.8 ns when the version does not
+change (39x as fast).
 
 
 Integer overflow
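The "39x" figure follows from the timings quoted in this hunk: ten ``PyDict_GetItem()`` lookups at 14.8 ns each, against a single guard check at 3.8 ns. The sketch below only checks that arithmetic. The timings are the PEP's measurements, taken as given and not re-measured here.

```python
# Timings quoted in the text (taken as given, not re-measured).
LOOKUP_NS = 14.8   # one PyDict_GetItem() call at the C level
GUARD_NS = 3.8     # one guard check when the dict version is unchanged
N_GLOBALS = 10     # global variables watched by a single guard

lookups_ns = N_GLOBALS * LOOKUP_NS   # ten separate dictionary lookups
speedup = lookups_ns / GUARD_NS      # one guard check replaces all ten

print(f"{lookups_ns:.0f} ns vs {GUARD_NS} ns: {speedup:.1f}x")
# prints: 148 ns vs 3.8 ns: 38.9x
```

The ratio is about 38.9, which the text rounds to 39x.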
@@ -230,7 +235,7 @@ to the old version modulo ``2 ** 64``.
 If a dictionary is modified each nanosecond, an overflow takes longer
 than 584 years. Using a 32-bit version, the overflow occurs only after 4
 seconds. That's why a 64-bit unsigned type is also used on 32-bit
-systems.
+systems. A dictionary lookup at the C level takes 14.8 ns.
 
 A risk of a bug every 584 years is acceptable.
 
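The 584-year and 4-second figures come from dividing the counter range by one modification per nanosecond. A quick check, assuming a Julian year of 365.25 days (the year length is an assumption of this sketch):

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumption)


def overflow_seconds(bits, changes_per_second=NS_PER_SECOND):
    """Seconds until an unsigned `bits`-wide counter wraps around,
    at one modification per nanosecond by default."""
    return 2**bits / changes_per_second


years_64 = overflow_seconds(64) / SECONDS_PER_YEAR
seconds_32 = overflow_seconds(32)
print(f"64-bit: {years_64:.1f} years, 32-bit: {seconds_32:.2f} seconds")
# prints: 64-bit: 584.5 years, 32-bit: 4.29 seconds
```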
@@ -249,7 +254,7 @@ the ``dict`` API).
 There are multiple issues:
 
 * To be consistent and avoid bad surprises, the version must be added to
-  all mapping type. Implementing a new mapping type would require extra
+  all mapping types. Implementing a new mapping type would require extra
   work for no benefit, since the version is only required on the
   ``dict`` type in practice.
 * All Python implementations must implement this new property, it gives
@@ -260,11 +265,11 @@ There are multiple issues:
   ``dict.__version__ == guard_version`` must be used instead to reduce
   the risk of bug on integer overflow (even if the integer overflow is
   unlikely in practice).
-* Exposing the dictioanry version can lead the
+* Exposing the dictionary version at Python level can lead the
   false assumption on performances. Checking ``dict.__version__`` at
-  the Python level is not faster than a dictionary lookup. The lookup
-  has a cost of 48.7 ns and checking a guard has a cost of 47.5 ns, the
-  difference is only 1.2 ns (3%)::
+  the Python level is not faster than a dictionary lookup. A dictionary
+  lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
+  ns, the difference is only 1.2 ns (3%)::
 
 
       $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
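Guards must compare versions with ``==`` because an unsigned C counter wraps around modulo 2**64. The sketch below illustrates this with a hypothetical wrapping 64-bit counter. At the wrap point, an ordering test such as ``current > guard_version`` misses a modification, while an equality test still catches it.

```python
BITS = 64
MASK = (1 << BITS) - 1  # hypothetical unsigned 64-bit version counter


def bump(version):
    """Increment a version, wrapping around like an unsigned C integer."""
    return (version + 1) & MASK


guard_version = MASK            # version cached by a guard just before the wrap
current = bump(guard_version)   # one modification later the counter reads 0

# An ordering test misses the change at the wrap point...
assert not current > guard_version
# ...while an equality test still detects it.
assert current != guard_version
```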
@@ -286,53 +291,56 @@ Add a version to each dict entry
 
 A single version per dictionary requires to keep a strong reference to
 the value which can keep the value alive longer than expected. If we add
-also a version per dictionary entry, the guard can rely on the entry
-version and so avoid the strong reference to the value (only strong
-references to a dictionary and key are needed).
+also a version per dictionary entry, the guard can only store the entry
+version to avoid the strong reference to the value (only strong
+references to the dictionary and to the key are needed).
 
-Changes: add a ``getversion(key)`` method to dictionary which returns
-``None`` if the key doesn't exist. When a key is created or modified,
-the entry version is set to the dictionary version which is incremented
-at each change (create, modify, delete).
+Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
+the field has the C type ``PY_INT64_T``. When a key is created or
+modified, the entry version is set to the dictionary version which is
+incremented at any change (create, modify, delete).
 
-Pseudo-code of an fast guard to check if a dict key was modified using
-``getversion()``::
+Pseudo-code of an fast guard to check if a dictionary key was modified
+using hypothetical ``dict_get_version(dict)``
+``dict_get_entry_version(dict)`` functions::
 
     UNSET = object()
 
-    class Guard:
+    class GuardDictKey:
         def __init__(self, dict, key):
             self.dict = dict
             self.key = key
-            self.dict_version = get_dict_version(dict)
-            self.entry_version = dict.getversion(key)
+            self.dict_version = dict_get_version(dict)
+            self.entry_version = dict_get_entry_version(dict, key)
 
         def check(self):
-            """Return True if the dictionary value did not changed."""
-            dict_version = get_dict_version(self.dict)
+            """Return True if the dictionary entry did not changed."""
+
+            # read the version field of the dict structure
+            dict_version = dict_get_version(self.dict)
             if dict_version == self.version:
-                # Fast-path: avoid the dictionary lookup
+                # Fast-path: dictionary lookup avoided
                 return True
 
-            # lookup in the dictionary, but get the entry version,
-            #not the value
-            entry_version = self.dict.getversion(self.key)
+            # lookup in the dictionary
+            entry_version = get_dict_key_version(dict, key)
             if entry_version == self.entry_version:
                 # another key was modified:
                 # cache the new dictionary version
                 self.dict_version = dict_version
                 return True
 
+            # the key was modified
             return False
 
-This main drawback of this option is the impact on the memory footprint.
+The main drawback of this option is the impact on the memory footprint.
 It increases the size of each dictionary entry, so the overhead depends
 on the number of buckets (dictionary entries, used or unused yet). For
 example, it increases the size of each dictionary entry by 8 bytes on
-64-bit system if we use ``size_t``.
+64-bit system.
 
-In Python, the memory footprint matters and the trend is more to reduce
-it. Examples:
+In Python, the memory footprint matters and the trend is to reduce it.
+Examples:
 
 * `PEP 393 -- Flexible String Representation
   <https://www.python.org/dev/peps/pep-0393/>`_
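Below is a runnable sketch of the entry-version guard. It assumes a hypothetical pure-Python ``EntryVersionedDict`` that stores a version next to each value, standing in for the proposed ``me_version`` field. In this sketch, ``dict_get_entry_version(d, key)`` takes the key as a second argument and returns ``None`` for a missing key. The guard consistently caches the dictionary version in ``self.dict_version`` and holds no reference to the watched value.

```python
class EntryVersionedDict:
    """Hypothetical mapping emulating ma_version plus a per-entry me_version."""

    def __init__(self):
        self.version = 0
        self._entries = {}  # key -> (value, entry_version)

    def __getitem__(self, key):
        return self._entries[key][0]

    def __setitem__(self, key, value):
        old = self._entries.get(key)
        if old is not None and old[0] is value:
            return  # same object stored again: not a change
        self.version += 1
        # A created or modified entry takes the new dictionary version.
        self._entries[key] = (value, self.version)

    def __delitem__(self, key):
        del self._entries[key]
        self.version += 1


def dict_get_version(d):
    return d.version


def dict_get_entry_version(d, key):
    entry = d._entries.get(key)
    return None if entry is None else entry[1]


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.dict_version = dict_get_version(d)
        self.entry_version = dict_get_entry_version(d, key)

    def check(self):
        """Return True if the watched entry did not change."""
        dict_version = dict_get_version(self.dict)
        if dict_version == self.dict_version:
            return True  # fast path: no lookup at all
        entry_version = dict_get_entry_version(self.dict, self.key)
        if entry_version == self.entry_version:
            # Another key was modified: cache the new dictionary version.
            self.dict_version = dict_version
            return True
        # The watched key was created, modified or deleted.
        return False


ns = EntryVersionedDict()
ns["x"] = object()
guard = GuardDictKey(ns, "x")
ns["y"] = 1
assert guard.check()          # only another key changed
ns["x"] = object()
assert not guard.check()      # the watched entry got a new version
```

Deleting and re-creating the watched key also fails the guard, because the new entry receives a fresh dictionary version.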
@@ -351,7 +359,7 @@ Leave the ``dict`` type unchanged to not add any overhead (memory
 footprint) when guards are not needed.
 
 Technical issue: a lot of C code in the wild, including CPython core,
-expect the exact ``dict`` type. Issues:
+expecting the exact ``dict`` type. Issues:
 
 * ``exec()`` requires a ``dict`` for globals and locals. A lot of code
   use ``globals={}``. It is not possible to cast the ``dict`` to a
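The ``exec()`` constraint is easy to observe for ``globals``. CPython rejects a mapping that is not a ``dict`` instance, but it accepts a ``dict`` subclass. That is why a subtype would be the only drop-in option, with the drawbacks listed below. A minimal check, using the standard ``collections.UserDict`` as the non-dict mapping (``VersionedNamespace`` is a hypothetical name):

```python
from collections import UserDict


class VersionedNamespace(dict):
    """Hypothetical dict subtype that a guard-aware namespace might use."""


def exec_accepts_globals(namespace):
    """Return True if exec() accepts `namespace` as its globals mapping."""
    try:
        exec("x = 1", namespace)
    except TypeError:
        return False
    return True


print(exec_accepts_globals({}))                    # True
print(exec_accepts_globals(VersionedNamespace()))  # True: dict subclass
print(exec_accepts_globals(UserDict()))            # False: not a dict
```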
@@ -371,7 +379,7 @@ Other issues:
 
 * The garbage collector has a special code to "untrack" ``dict``
   instances. If a ``dict`` subtype is used for namespaces, the garbage
-  collector may be unable to break some reference cycles.
+  collector can be unable to break some reference cycles.
 * Some functions have a fast-path for ``dict`` which would not be taken
   for ``dict`` subtypes, and so it would make Python a little bit
   slower.