PEP 509: dict version is now global
* Rename ma_version to ma_version_tag
* Add links to fatoptimizer and fat projects
parent 0a5fc3d1c9
commit 9bd8ffbfa4
128
pep-0509.txt
@@ -13,8 +13,9 @@ Python-Version: 3.6
 Abstract
 ========
 
-Add a new private version to builtin ``dict`` type, incremented at each
-change, to implement fast guards on namespaces.
+Add a new private version to the builtin ``dict`` type, incremented at
+each dictionary creation and at each dictionary change, to implement
+fast guards on namespaces.
 
 
 Rationale
@@ -38,7 +39,9 @@ PEP proposes to add a version to dictionaries to implement fast guards
 on namespaces.
 
 Dictionary lookups can be skipped if the version does not change which
-is the common case for most namespaces. The performance of a guard does
+is the common case for most namespaces. Since the version is globally
+unique, the version is also enough to check if the namespace dictionary
+was not replaced with a new dictionary. The performance of a guard does
 not depend on the number of watched dictionary entries, complexity of
 O(1), if the dictionary version does not change.
 
@@ -71,9 +74,10 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
             self.version = dict_get_version(dict)
 
         def check(self):
-            """Return True if the dictionary entry did not changed."""
+            """Return True if the dictionary entry did not changed
+            and the dictionary was not replaced."""
 
-            # read the version field of the dict structure
+            # read the version of the dict structure
             version = dict_get_version(self.dict)
             if version == self.version:
                 # Fast-path: dictionary lookup avoided
@@ -94,6 +98,22 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
 Usage of the dict version
 =========================
 
+Speedup method calls 1.2x
+-------------------------
+
+Yury Selivanov wrote a patch to optimize method calls. The patch depends
+on the `implement per-opcode cache in ceval
+<https://bugs.python.org/issue26219>`_ patch which requires dictionary
+versions to invalidate the cache if the globals dictionary or the
+builtins dictionary has been modified.
+
+The cache also requires that the dictionary version is globally unique.
+It is possible to define a function in a namespace and call it
+in a different namespace: using ``exec()`` with the *globals* parameter
+for example. In this case, the globals dictionary was changed and the
+cache must be invalidated.
+
+
 Specialized functions using guards
 ----------------------------------
 
@@ -102,8 +122,9 @@ The `PEP 510 -- Specialized functions with guards
 specialized functions with guards. It allows to implement static
 optimizers for Python without breaking the Python semantics.
 
-Example of a static Python optimizer: the astoptimizer of the `FAT
-Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
+Example of a static Python optimizer: the `fatoptimizer
+<http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
+<http://faster-cpython.readthedocs.org/fat_python.html>`_ project
 implements many optimizations which require guards on namespaces.
 Examples:
 
@@ -128,7 +149,7 @@ Core runtime).
 Unladen Swallow
 ---------------
 
-Even if dictionary version was not explicitly mentioned, optimization
+Even if dictionary version was not explicitly mentioned, optimizing
 globals and builtins lookup was part of the Unladen Swallow plan:
 "Implement one of the several proposed schemes for speeding lookups of
 globals and builtins." Source: `Unladen Swallow ProjectPlan
@@ -143,9 +164,12 @@ Retrospective
 Changes
 =======
 
-Add a ``ma_version`` field to the ``PyDictObject`` structure with the C
-type ``PY_INT64_T``, 64-bit unsigned integer. New empty dictionaries are
-initilized to version ``0``. The version is incremented at each change:
+Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
+the C type ``PY_INT64_T``, 64-bit unsigned integer. Add also a global
+dictionary version. Each time a dictionary is created, the global
+version is incremented and the dictionary version is initialized to the
+global version. The global version is also incremented and copied to the
+dictionary version at each dictionary change:
 
 * ``clear()`` if the dict was non-empty
 * ``pop(key)`` if the key exists
@@ -158,30 +182,27 @@ initilized to version ``0``. The version is incremented at each change:
   values are compared by identity, not by their content; the version can
   be incremented multiple times
 
-.. note::
-   The ``PyDictObject`` structure is not part of the stable ABI.
+The ``PyDictObject`` structure is not part of the stable ABI.
+
+The field is called ``ma_version_tag`` rather than ``ma_version`` to
+suggest to compare it using ``version_tag == old_version_tag`` rather
+than ``version <= old_version`` which makes the integer overflow much
+likely.
 
 Example using an hypothetical ``dict_get_version(dict)`` function::
 
     >>> d = {}
     >>> dict_get_version(d)
-    0
+    100
     >>> d['key'] = 'value'
     >>> dict_get_version(d)
-    1
+    101
     >>> d['key'] = 'new value'
     >>> dict_get_version(d)
-    2
+    102
     >>> del d['key']
     >>> dict_get_version(d)
-    3
+    103
 
-If a dictionary is created with items, the version is also incremented
-at each dictionary insertion. Example::
-
-    >>> d = dict(x=7, y=33)
-    >>> dict_get_version(d)
-    2
-
 The version is not incremented if an existing key is set to the same
 value. For efficiency, values are compared by their identity:
@@ -192,10 +213,10 @@ value. For efficiency, values are compared by their identity:
     >>> value = object()
     >>> d['key'] = value
     >>> dict_get_version(d)
-    1
+    40
     >>> d['key'] = value
     >>> dict_get_version(d)
-    1
+    40
 
 .. note::
    CPython uses some singleton like integers in the range [-5; 257],
@@ -204,10 +225,10 @@ value. For efficiency, values are compared by their identity:
    singleton, the version is not modified.
 
 
-Implementation
-==============
+Implementation and Performance
+==============================
 
-The `issue #26058: PEP 509: Add ma_version to PyDictObject
+The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
 <https://bugs.python.org/issue26058>`_ contains a patch implementing
 this PEP.
 
@@ -221,23 +242,26 @@ using 10 global variables in a function, 10 dictionary lookups costs 148
 ns, whereas the guard still only costs 3.8 ns when the version does not
 change (39x as fast).
 
+The `fat module
+<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
+such guards: ``fat.GuardDict`` is based on the dictionary version.
+
 
 Integer overflow
 ================
 
-The implementation uses the C unsigned integer type ``PY_UINT64_T`` to
-store the version, a 64 bits unsigned integer. The C code uses
-``version++``. On integer overflow, the version is wrapped to ``0`` (and
-then continue to be incremented) according to the C standard.
+The implementation uses the C type ``PY_UINT64_T`` to store the version:
+a 64 bits unsigned integer. The C code uses ``version++``. On integer
+overflow, the version is wrapped to ``0`` (and then continue to be
+incremented) according to the C standard.
 
 After an integer overflow, a guard can succeed whereas the watched
-dictionary key was modified. The bug occurs if the dictionary is
-modified at least ``2 ** 64`` times between two checks of the guard and
-if the new version (theoretical value with no integer overflow) is equal
-to the old version modulo ``2 ** 64``.
+dictionary key was modified. The bug only occurs at a guard check if
+there are exaclty ``2 ** 64`` dictionary creations or modifications
+since the previous guard check.
 
-If a dictionary is modified each nanosecond, an overflow takes longer
-than 584 years. Using a 32-bit version, the overflow occurs only after 4
+If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
+takes longer than 584 years. Using a 32-bit version, it only takes 4
 seconds. That's why a 64-bit unsigned type is also used on 32-bit
 systems. A dictionary lookup at the C level takes 14.8 ns.
 
@@ -264,11 +288,6 @@ There are multiple issues:
 * All Python implementations must implement this new property, it gives
   more work to other implementations, whereas they may not use the
   dictionary version at all.
-* The ``__version__`` can be wrapped on integer overflow. It is error
-  prone: using ``dict.__version__ <= guard_version`` is wrong,
-  ``dict.__version__ == guard_version`` must be used instead to reduce
-  the risk of bug on integer overflow (even if the integer overflow is
-  unlikely in practice).
 * Exposing the dictionary version at Python level can lead the
   false assumption on performances. Checking ``dict.__version__`` at
   the Python level is not faster than a dictionary lookup. A dictionary
@@ -281,7 +300,13 @@ There are multiple issues:
       $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
       10000000 loops, best of 3: 0.0475 usec per loop
 
-Bikeshedding on the property name:
+* The ``__version__`` can be wrapped on integer overflow. It is error
+  prone: using ``dict.__version__ <= guard_version`` is wrong,
+  ``dict.__version__ == guard_version`` must be used instead to reduce
+  the risk of bug on integer overflow (even if the integer overflow is
+  unlikely in practice).
+
+Mandatory bikeshedding on the property name:
 
 * ``__cache_token__``: name proposed by Nick Coghlan, name coming from
   `abc.get_cache_token()
@@ -318,9 +343,10 @@ using hypothetical ``dict_get_version(dict)``
             self.entry_version = dict_get_entry_version(dict, key)
 
         def check(self):
-            """Return True if the dictionary entry did not changed."""
+            """Return True if the dictionary entry did not changed
+            and the dictionary was not replaced."""
 
-            # read the version field of the dict structure
+            # read the version of the dict structure
             dict_version = dict_get_version(self.dict)
             if dict_version == self.version:
                 # Fast-path: dictionary lookup avoided
@@ -486,8 +512,14 @@ first appeared in their parent objects.
 Discussion
 ==========
 
-Thread on the python-ideas mailing list: `RFC: PEP: Add dict.__version__
-<https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_.
+Thread on the mailing lists:
+
+* python-dev: `PEP 509: Add a private version to dict
+  <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
+  (january 2016)
+* python-ideas: `RFC: PEP: Add dict.__version__
+  <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
+  (january 2016)
 
 
 Copyright
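Illustration (not part of the commit): the change above makes the dictionary version global, so a guard can detect both a modified entry and a replaced namespace dictionary with a single integer comparison. The pure-Python sketch below only mimics that behaviour under stated assumptions. ``VersionedDict``, ``version_tag`` and ``_next_version()`` are hypothetical names invented for this example, and ``check()`` takes the current namespace as an argument, unlike the PEP pseudo-code. The real patch (issue #26058) stores ``ma_version_tag`` in the C ``PyDictObject`` structure and covers every mutating method, which this toy class does not.

```python
# Hypothetical stand-in for the global dictionary version counter.
_global_version = 0


def _next_version():
    """Increment the global version and return the new value."""
    global _global_version
    _global_version += 1
    return _global_version


class VersionedDict(dict):
    """Toy dict that carries a globally unique version tag (assumption:
    only creation, item assignment and item deletion are tracked)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # One new version per creation, even when created with items.
        self.version_tag = _next_version()

    def __setitem__(self, key, value):
        # Values are compared by identity; same object means no change.
        if key in self and self[key] is value:
            return
        super().__setitem__(key, value)
        self.version_tag = _next_version()

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version_tag = _next_version()


UNSET = object()


class GuardDictKey:
    """Guard on one key of a namespace, modelled on the PEP pseudo-code."""

    def __init__(self, namespace, key):
        self.dict = namespace
        self.key = key
        self.value = namespace.get(key, UNSET)
        self.version = namespace.version_tag

    def check(self, namespace):
        """Return True if the entry did not change and the namespace
        dictionary was not replaced."""
        if namespace.version_tag == self.version:
            # Fast path: same tag implies same dict and no modification.
            return True
        if namespace is not self.dict:
            # The namespace was replaced by another dictionary.
            return False
        if namespace.get(self.key, UNSET) is self.value:
            # Another key changed; remember the new version.
            self.version = namespace.version_tag
            return True
        return False
```

Because every creation takes a fresh value from the global counter, two distinct dictionaries never share a tag in this sketch (ignoring overflow), which is why the fast path uses ``==`` and never an ordering comparison.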