* __setitem__() now always increases the version: remove the micro-optimization
  dict[key] is new_value
* be more explict on version++: explain that the operation must be atomic, and
  that dict methods are already atomic thanks to the GIL
* "Guard against changing dict during iteration": don't guess if the new dict
  version can be used or not. Let's discuss that later.
* rephrase/complete some sections
* add links to new threads on python-dev
This commit is contained in:
Victor Stinner 2016-04-19 10:43:02 +02:00
parent 24ca07b6c9
commit 37ca004dd3
1 changed files with 103 additions and 96 deletions

View File

@ -22,11 +22,11 @@ Rationale
========= =========
In Python, the builtin ``dict`` type is used by many instructions. For In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the example, the ``LOAD_GLOBAL`` instruction looks up a variable in the
global namespace, or in the builtins namespace (two dict lookups). global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of namespaces, instance namespaces, etc. The local namespace (function
a function) is usually optimized to an array, but it can be a dict too. namespace) is usually optimized to an array, but it can be a dict too.
Python is hard to optimize because almost everything is mutable: builtin Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be functions, function code, global variables, local variables, ... can be
@ -35,26 +35,29 @@ semantics requires to detect when "something changes": we will call
these checks "guards". these checks "guards".
The speedup of optimizations depends on the speed of guard checks. This The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement fast guards PEP proposes to add a private version to dictionaries to implement fast
on namespaces. guards on namespaces.
Dictionary lookups can be skipped if the version does not change which Dictionary lookups can be skipped if the version does not change which
is the common case for most namespaces. Since the version is globally is the common case for most namespaces. The version is globally unique,
unique, the version is also enough to check if the namespace dictionary so checking the version is also enough to check if the namespace
was not replaced with a new dictionary. The performance of a guard does dictionary was not replaced with a new dictionary.
not depend on the number of watched dictionary entries, complexity of
O(1), if the dictionary version does not change. When the dictionary version does not change, the performance of a guard
does not depend on the number of watched dictionary entries: the
complexity is O(1).
Example of optimization: copy the value of a global variable to function Example of optimization: copy the value of a global variable to function
constants. This optimization requires a guard on the global variable to constants. This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must check if it was modified. If the global variable is not modified, the
be loaded at runtime when the function is called, instead of using the function uses the cached copy. If the global variable is modified, the
constant. function uses a regular lookup, and maybe also deoptimize the function
(to remove the overhead of the guard check for next function calls).
See the `PEP 510 -- Specialized functions with guards See the `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
guards to specialize functions and for the rationale on Python static guards to specialize functions and for a more general rationale on
optimizers. Python static optimizers.
Guard example Guard example
@ -77,7 +80,7 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
"""Return True if the dictionary entry did not change """Return True if the dictionary entry did not change
and the dictionary was not replaced.""" and the dictionary was not replaced."""
# read the version of the dict structure # read the version of the dictionary
version = dict_get_version(self.dict) version = dict_get_version(self.dict)
if version == self.version: if version == self.version:
# Fast-path: dictionary lookup avoided # Fast-path: dictionary lookup avoided
@ -98,21 +101,21 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
Usage of the dict version Usage of the dict version
========================= =========================
Speedup method calls 1.2x Speedup method calls
------------------------- --------------------
Yury Selivanov wrote a `patch to optimize method calls Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the <https://bugs.python.org/issue26110>`_. The patch depends on the
`implement per-opcode cache in ceval `"implement per-opcode cache in ceval"
<https://bugs.python.org/issue26219>`_ patch which requires dictionary <https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified. builtins dictionary has been modified.
The cache also requires that the dictionary version is globally unique. The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it It is possible to define a function in a namespace and call it in a
in a different namespace: using ``exec()`` with the *globals* parameter different namespace, using ``exec()`` with the *globals* parameter for
for example. In this case, the globals dictionary was changed and the example. In this case, the globals dictionary was replaced and the cache
cache must be invalidated. must also be invalidated.
Specialized functions using guards Specialized functions using guards
@ -123,28 +126,37 @@ The `PEP 510 -- Specialized functions with guards
specialized functions with guards. It allows to implement static specialized functions with guards. It allows to implement static
optimizers for Python without breaking the Python semantics. optimizers for Python without breaking the Python semantics.
Example of a static Python optimizer: the `fatoptimizer The `fatoptimizer <http://fatoptimizer.readthedocs.org/>`_ of the `FAT
<http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
<http://faster-cpython.readthedocs.org/fat_python.html>`_ project is an example of a static Python optimizer. It implements many
implements many optimizations which require guards on namespaces. optimizations which require guards on namespaces:
Examples:
* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
``builtins.__dict__['len']`` and ``globals()['len']`` are required ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``, * Loop unrolling: to unroll the loop ``for i in range(...): ...``,
guards on ``builtins.__dict__['range']`` and ``globals()['range']`` guards on ``builtins.__dict__['range']`` and ``globals()['range']``
are required are required
* etc.
Pyjion Pyjion
------ ------
According of Brett Cannon, one of the two main developers of Pyjion, According of Brett Cannon, one of the two main developers of Pyjion,
Pyjion can also benefit from dictionary version to implement Pyjion can benefit from dictionary version to implement optimizations.
optimizations.
Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET `Pyjion <https://github.com/Microsoft/Pyjion>`_ is a JIT compiler for
Core runtime). Python based upon CoreCLR (Microsoft .NET Core runtime).
Cython
------
Cython can benefit from dictionary version to implement optimizations.
`Cython <http://cython.org/>`_ is an optimising static compiler for both
the Python programming language and the extended Cython programming
language.
Unladen Swallow Unladen Swallow
@ -153,8 +165,8 @@ Unladen Swallow
Even if dictionary version was not explicitly mentioned, optimizing Even if dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan: globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of "Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan globals and builtins." (source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_. <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_).
Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow implemented with LLVM. The project stopped in 2011: `Unladen Swallow
@ -172,23 +184,16 @@ version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the global version. The global version is also incremented and copied to the
dictionary version at each dictionary change: dictionary version at each dictionary change:
* ``clear()`` if the dict was non-empty * ``clear()`` if the dict is non-empty
* ``pop(key)`` if the key exists * ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty * ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the `key` does not exist * ``setdefault(key, value)`` if the key does not exist
* ``__detitem__(key)`` if the key exists * ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the `key` doesn't exist or if the value * ``__setitem__(key, value)`` always increases the version
is not ``dict[key]`` * ``update(...)`` if called with arguments
* ``update(...)`` if new values are different than existing values:
values are compared by identity, not by their content; the version can
be incremented multiple times
The ``PyDictObject`` structure is not part of the stable ABI. The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make them atomic.
The field is called ``ma_version_tag`` rather than ``ma_version`` to
suggest to compare it using ``version_tag == old_version_tag`` rather
than ``version <= old_version`` which makes the integer overflow much
likely.
Example using an hypothetical ``dict_get_version(dict)`` function:: Example using an hypothetical ``dict_get_version(dict)`` function::
@ -205,25 +210,23 @@ Example using an hypothetical ``dict_get_version(dict)`` function::
>>> dict_get_version(d) >>> dict_get_version(d)
103 103
The version is not incremented if an existing key is set to the same ``dict.__setitem__(key, value)`` and ``dict.update(...)`` always
value. For efficiency, values are compared by their identity: increases the version, even if the new value is identical or is equal to
``new_value is old_value``, not by their content: the current value (even if ``(dict[key] is value) or (dict[key] ==
``new_value == old_value``. Example:: value)``).
>>> d = {} The field is called ``ma_version_tag``, rather than ``ma_version``, to
>>> value = object() suggest to compare it using ``version_tag == old_version_tag``, rather
>>> d['key'] = value than ``version <= old_version`` which is wrong most of the time after an
>>> dict_get_version(d) integer overflow.
40
>>> d['key'] = value
>>> dict_get_version(d)
40
.. note::
CPython uses some singleton like integers in the range [-5; 257], Backwards Compatibility
empty tuple, empty strings, Unicode strings of a single character in =======================
the range [U+0000; U+00FF], etc. When a key is set twice to the same
singleton, the version is not modified. Since the ``PyDictObject`` structure is not part of the stable ABI and
the new dictionary version not exposed at the Python scope, changes are
backward compatible.
Implementation and Performance Implementation and Performance
@ -234,7 +237,10 @@ The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
this PEP. this PEP.
On pybench and timeit microbenchmarks, the patch does not seem to add On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations. any overhead on dictionary operations. For example, the following timeit
micro-benchmarks takes 318 nanoseconds before and after the change::
python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'
When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
@ -286,19 +292,19 @@ There are multiple issues:
all mapping types. Implementing a new mapping type would require extra all mapping types. Implementing a new mapping type would require extra
work for no benefit, since the version is only required on the work for no benefit, since the version is only required on the
``dict`` type in practice. ``dict`` type in practice.
* All Python implementations must implement this new property, it gives * All Python implementations would have to implement this new property,
more work to other implementations, whereas they may not use the it gives more work to other implementations, whereas they may not use
dictionary version at all. the dictionary version at all.
* Exposing the dictionary version at Python level can lead the * Exposing the dictionary version at the Python level can lead the
false assumption on performances. Checking ``dict.__version__`` at false assumption on performances. Checking ``dict.__version__`` at
the Python level is not faster than a dictionary lookup. A dictionary the Python level is not faster than a dictionary lookup. A dictionary
lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5 lookup in Python has a cost of 48.7 ns and checking the version has a
ns, the difference is only 1.2 ns (3%):: cost of 47.5 ns, the difference is only 1.2 ns (3%)::
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33' $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
10000000 loops, best of 3: 0.0487 usec per loop 10000000 loops, best of 3: 0.0487 usec per loop
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100' $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
10000000 loops, best of 3: 0.0475 usec per loop 10000000 loops, best of 3: 0.0475 usec per loop
* The ``__version__`` can be wrapped on integer overflow. It is error * The ``__version__`` can be wrapped on integer overflow. It is error
@ -313,6 +319,7 @@ Mandatory bikeshedding on the property name:
`abc.get_cache_token() `abc.get_cache_token()
<https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_. <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__`` * ``__version__``
* ``__version_tag__``
* ``__timestamp__`` * ``__timestamp__``
@ -322,16 +329,16 @@ Add a version to each dict entry
A single version per dictionary requires to keep a strong reference to A single version per dictionary requires to keep a strong reference to
the value which can keep the value alive longer than expected. If we add the value which can keep the value alive longer than expected. If we add
also a version per dictionary entry, the guard can only store the entry also a version per dictionary entry, the guard can only store the entry
version to avoid the strong reference to the value (only strong version (a simple integer) to avoid the strong reference to the value:
references to the dictionary and to the key are needed). only strong references to the dictionary and to the key are needed.
Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure, Changes: add a ``me_version_tag`` field to the ``PyDictKeyEntry``
the field has the C type ``PY_UINT64_T``. When a key is created or structure, the field has the C type ``PY_UINT64_T``. When a key is
modified, the entry version is set to the dictionary version which is created or modified, the entry version is set to the dictionary version
incremented at any change (create, modify, delete). which is incremented at any change (create, modify, delete).
Pseudo-code of an fast guard to check if a dictionary key was modified Pseudo-code of an fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)`` using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict)`` functions:: ``dict_get_entry_version(dict)`` functions::
UNSET = object() UNSET = object()
@ -347,18 +354,19 @@ using hypothetical ``dict_get_version(dict)``
"""Return True if the dictionary entry did not change """Return True if the dictionary entry did not change
and the dictionary was not replaced.""" and the dictionary was not replaced."""
# read the version of the dict structure # read the version of the dictionary
dict_version = dict_get_version(self.dict) dict_version = dict_get_version(self.dict)
if dict_version == self.version: if dict_version == self.version:
# Fast-path: dictionary lookup avoided # Fast-path: dictionary lookup avoided
return True return True
# lookup in the dictionary # lookup in the dictionary to read the entry version
entry_version = get_dict_key_version(dict, key) entry_version = get_dict_key_version(dict, key)
if entry_version == self.entry_version: if entry_version == self.entry_version:
# another key was modified: # another key was modified:
# cache the new dictionary version # cache the new dictionary version
self.dict_version = dict_version self.dict_version = dict_version
self.entry_version = entry_version
return True return True
# the key was modified # the key was modified
@ -366,7 +374,7 @@ using hypothetical ``dict_get_version(dict)``
The main drawback of this option is the impact on the memory footprint. The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or unused yet). For on the number of buckets (dictionary entries, used or not used). For
example, it increases the size of each dictionary entry by 8 bytes on example, it increases the size of each dictionary entry by 8 bytes on
64-bit system. 64-bit system.
@ -386,8 +394,8 @@ Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace, use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``. instance namespace, etc.) instead of ``dict``.
Leave the ``dict`` type unchanged to not add any overhead (memory Leave the ``dict`` type unchanged to not add any overhead (CPU, memory
footprint) when guards are not needed. footprint) when guards are not used.
Technical issue: a lot of C code in the wild, including CPython core, Technical issue: a lot of C code in the wild, including CPython core,
expecting the exact ``dict`` type. Issues: expecting the exact ``dict`` type. Issues:
@ -396,7 +404,7 @@ expecting the exact ``dict`` type. Issues:
use ``globals={}``. It is not possible to cast the ``dict`` to a use ``globals={}``. It is not possible to cast the ``dict`` to a
``dict`` subtype because the caller expects the ``globals`` parameter ``dict`` subtype because the caller expects the ``globals`` parameter
to be modified (``dict`` is mutable). to be modified (``dict`` is mutable).
* Functions call directly ``PyDict_xxx()`` functions, instead of calling * C functions call directly ``PyDict_xxx()`` functions, instead of calling
``PyObject_xxx()`` if the object is a ``dict`` subtype ``PyObject_xxx()`` if the object is a ``dict`` subtype
* ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some * ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some
functions require the exact ``dict`` type. functions require the exact ``dict`` type.
@ -427,7 +435,7 @@ was merged into Python 2.6. The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure). types (the ``PyTypeObject`` structure).
The type version tag is not available at the Python level. The type version tag is not exposed at the Python level.
The version tag has the C type ``unsigned int``. The cache is a global The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to hash table of 4096 entries, shared by all types. The cache is global to
@ -478,7 +486,8 @@ globals+builtins lookup optimization
the field has the C type ``size_t``. the field has the C type ``size_t``.
Thread on python-dev: `About dictionary lookup caching Thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_. <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_
(December 2006).
Guard against changing dict during iteration Guard against changing dict during iteration
@ -488,13 +497,7 @@ In 2013, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``. This field is incremented type), the field has the C type ``size_t``. This field is incremented
when the dictionary is modified, and so is very similar to the proposed when the dictionary is modified.
dictionary version.
Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation. The dictionary version changes when values
are replaced, whereas modifying dictionary values while iterating on
dictionary keys is legit in Python.
PySizer PySizer
@ -515,6 +518,10 @@ Discussion
Thread on the mailing lists: Thread on the mailing lists:
* python-dev: `Updated PEP 509
<https://mail.python.org/pipermail/python-dev/2016-April/144250.html>`_
* python-dev: `RFC: PEP 509: Add a private version to dict
<https://mail.python.org/pipermail/python-dev/2016-April/144137.html>`_
* python-dev: `PEP 509: Add a private version to dict * python-dev: `PEP 509: Add a private version to dict
<https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_ <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
(january 2016) (january 2016)