PEP 509: dict version is now global

* Rename ma_version to ma_version_tag
* Add links to fatoptimizer and fat projects
This commit is contained in:
Victor Stinner 2016-04-14 17:13:07 +02:00
parent 0a5fc3d1c9
commit 9bd8ffbfa4
1 changed files with 81 additions and 49 deletions

View File

@ -13,8 +13,9 @@ Python-Version: 3.6
Abstract Abstract
======== ========
Add a new private version to builtin ``dict`` type, incremented at each Add a new private version to the builtin ``dict`` type, incremented at
change, to implement fast guards on namespaces. each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.
Rationale Rationale
@ -38,7 +39,9 @@ PEP proposes to add a version to dictionaries to implement fast guards
on namespaces. on namespaces.
Dictionary lookups can be skipped if the version does not change which Dictionary lookups can be skipped if the version does not change which
is the common case for most namespaces. The performance of a guard does is the common case for most namespaces. Since the version is globally
unique, the version is also enough to check if the namespace dictionary
was not replaced with a new dictionary. The performance of a guard does
not depend on the number of watched dictionary entries, complexity of not depend on the number of watched dictionary entries, complexity of
O(1), if the dictionary version does not change. O(1), if the dictionary version does not change.
@ -71,9 +74,10 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
self.version = dict_get_version(dict) self.version = dict_get_version(dict)
def check(self): def check(self):
"""Return True if the dictionary entry did not changed.""" """Return True if the dictionary entry did not changed
and the dictionary was not replaced."""
# read the version field of the dict structure # read the version of the dict structure
version = dict_get_version(self.dict) version = dict_get_version(self.dict)
if version == self.version: if version == self.version:
# Fast-path: dictionary lookup avoided # Fast-path: dictionary lookup avoided
@ -94,6 +98,22 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
Usage of the dict version Usage of the dict version
========================= =========================
Speedup method calls 1.2x
-------------------------
Yury Selivanov wrote a patch to optimize method calls. The patch depends
on the `implement per-opcode cache in ceval
<https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.
The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it
in a different namespace: using ``exec()`` with the *globals* parameter
for example. In this case, the globals dictionary was changed and the
cache must be invalidated.
Specialized functions using guards Specialized functions using guards
---------------------------------- ----------------------------------
@ -102,8 +122,9 @@ The `PEP 510 -- Specialized functions with guards
specialized functions with guards. It allows to implement static specialized functions with guards. It allows to implement static
optimizers for Python without breaking the Python semantics. optimizers for Python without breaking the Python semantics.
Example of a static Python optimizer: the astoptimizer of the `FAT Example of a static Python optimizer: the `fatoptimizer
Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project <http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
<http://faster-cpython.readthedocs.org/fat_python.html>`_ project
implements many optimizations which require guards on namespaces. implements many optimizations which require guards on namespaces.
Examples: Examples:
@ -128,7 +149,7 @@ Core runtime).
Unladen Swallow Unladen Swallow
--------------- ---------------
Even if dictionary version was not explicitly mentioned, optimization Even if dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan: globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of "Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan globals and builtins." Source: `Unladen Swallow ProjectPlan
@ -143,9 +164,12 @@ Retrospective
Changes Changes
======= =======
Add a ``ma_version`` field to the ``PyDictObject`` structure with the C Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
type ``PY_INT64_T``, 64-bit unsigned integer. New empty dictionaries are the C type ``PY_INT64_T``, 64-bit unsigned integer. Add also a global
initilized to version ``0``. The version is incremented at each change: dictionary version. Each time a dictionary is created, the global
version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the
dictionary version at each dictionary change:
* ``clear()`` if the dict was non-empty * ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists * ``pop(key)`` if the key exists
@ -158,30 +182,27 @@ initilized to version ``0``. The version is incremented at each change:
values are compared by identity, not by their content; the version can values are compared by identity, not by their content; the version can
be incremented multiple times be incremented multiple times
.. note:: The ``PyDictObject`` structure is not part of the stable ABI.
The ``PyDictObject`` structure is not part of the stable ABI.
The field is called ``ma_version_tag`` rather than ``ma_version`` to
suggest to compare it using ``version_tag == old_version_tag`` rather
than ``version <= old_version`` which makes the integer overflow much
likely.
Example using an hypothetical ``dict_get_version(dict)`` function:: Example using an hypothetical ``dict_get_version(dict)`` function::
>>> d = {} >>> d = {}
>>> dict_get_version(d) >>> dict_get_version(d)
0 100
>>> d['key'] = 'value' >>> d['key'] = 'value'
>>> dict_get_version(d) >>> dict_get_version(d)
1 101
>>> d['key'] = 'new value' >>> d['key'] = 'new value'
>>> dict_get_version(d) >>> dict_get_version(d)
2 102
>>> del d['key'] >>> del d['key']
>>> dict_get_version(d) >>> dict_get_version(d)
3 103
If a dictionary is created with items, the version is also incremented
at each dictionary insertion. Example::
>>> d = dict(x=7, y=33)
>>> dict_get_version(d)
2
The version is not incremented if an existing key is set to the same The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity: value. For efficiency, values are compared by their identity:
@ -192,10 +213,10 @@ value. For efficiency, values are compared by their identity:
>>> value = object() >>> value = object()
>>> d['key'] = value >>> d['key'] = value
>>> dict_get_version(d) >>> dict_get_version(d)
1 40
>>> d['key'] = value >>> d['key'] = value
>>> dict_get_version(d) >>> dict_get_version(d)
1 40
.. note:: .. note::
CPython uses some singleton like integers in the range [-5; 257], CPython uses some singleton like integers in the range [-5; 257],
@ -204,10 +225,10 @@ value. For efficiency, values are compared by their identity:
singleton, the version is not modified. singleton, the version is not modified.
Implementation Implementation and Performance
============== ==============================
The `issue #26058: PEP 509: Add ma_version to PyDictObject The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
<https://bugs.python.org/issue26058>`_ contains a patch implementing <https://bugs.python.org/issue26058>`_ contains a patch implementing
this PEP. this PEP.
@ -221,23 +242,26 @@ using 10 global variables in a function, 10 dictionary lookups costs 148
ns, whereas the guard still only costs 3.8 ns when the version does not ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast). change (39x as fast).
The `fat module
<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
such guards: ``fat.GuardDict`` is based on the dictionary version.
Integer overflow Integer overflow
================ ================
The implementation uses the C unsigned integer type ``PY_UINT64_T`` to The implementation uses the C type ``PY_UINT64_T`` to store the version:
store the version, a 64 bits unsigned integer. The C code uses a 64 bits unsigned integer. The C code uses ``version++``. On integer
``version++``. On integer overflow, the version is wrapped to ``0`` (and overflow, the version is wrapped to ``0`` (and then continue to be
then continue to be incremented) according to the C standard. incremented) according to the C standard.
After an integer overflow, a guard can succeed whereas the watched After an integer overflow, a guard can succeed whereas the watched
dictionary key was modified. The bug occurs if the dictionary is dictionary key was modified. The bug only occurs at a guard check if
modified at least ``2 ** 64`` times between two checks of the guard and there are exaclty ``2 ** 64`` dictionary creations or modifications
if the new version (theoretical value with no integer overflow) is equal since the previous guard check.
to the old version modulo ``2 ** 64``.
If a dictionary is modified each nanosecond, an overflow takes longer If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
than 584 years. Using a 32-bit version, the overflow occurs only after 4 takes longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns. systems. A dictionary lookup at the C level takes 14.8 ns.
@ -264,11 +288,6 @@ There are multiple issues:
* All Python implementations must implement this new property, it gives * All Python implementations must implement this new property, it gives
more work to other implementations, whereas they may not use the more work to other implementations, whereas they may not use the
dictionary version at all. dictionary version at all.
* The ``__version__`` can be wrapped on integer overflow. It is error
prone: using ``dict.__version__ <= guard_version`` is wrong,
``dict.__version__ == guard_version`` must be used instead to reduce
the risk of bug on integer overflow (even if the integer overflow is
unlikely in practice).
* Exposing the dictionary version at Python level can lead the * Exposing the dictionary version at Python level can lead the
false assumption on performances. Checking ``dict.__version__`` at false assumption on performances. Checking ``dict.__version__`` at
the Python level is not faster than a dictionary lookup. A dictionary the Python level is not faster than a dictionary lookup. A dictionary
@ -281,7 +300,13 @@ There are multiple issues:
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100' $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
10000000 loops, best of 3: 0.0475 usec per loop 10000000 loops, best of 3: 0.0475 usec per loop
Bikeshedding on the property name: * The ``__version__`` can be wrapped on integer overflow. It is error
prone: using ``dict.__version__ <= guard_version`` is wrong,
``dict.__version__ == guard_version`` must be used instead to reduce
the risk of bug on integer overflow (even if the integer overflow is
unlikely in practice).
Mandatory bikeshedding on the property name:
* ``__cache_token__``: name proposed by Nick Coghlan, name coming from * ``__cache_token__``: name proposed by Nick Coghlan, name coming from
`abc.get_cache_token() `abc.get_cache_token()
@ -318,9 +343,10 @@ using hypothetical ``dict_get_version(dict)``
self.entry_version = dict_get_entry_version(dict, key) self.entry_version = dict_get_entry_version(dict, key)
def check(self): def check(self):
"""Return True if the dictionary entry did not changed.""" """Return True if the dictionary entry did not changed
and the dictionary was not replaced."""
# read the version field of the dict structure # read the version of the dict structure
dict_version = dict_get_version(self.dict) dict_version = dict_get_version(self.dict)
if dict_version == self.version: if dict_version == self.version:
# Fast-path: dictionary lookup avoided # Fast-path: dictionary lookup avoided
@ -486,8 +512,14 @@ first appeared in their parent objects.
Discussion Discussion
========== ==========
Thread on the python-ideas mailing list: `RFC: PEP: Add dict.__version__ Thread on the mailing lists:
<https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_.
* python-dev: `PEP 509: Add a private version to dict
<https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
(january 2016)
* python-ideas: `RFC: PEP: Add dict.__version__
<https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
(january 2016)
Copyright Copyright