* __setitem__() now always increases the version: remove the micro-optimization
  dict[key] is new_value
* be more explict on version++: explain that the operation must be atomic, and
  that dict methods are already atomic thanks to the GIL
* "Guard against changing dict during iteration": don't guess if the new dict
  version can be used or not. Let's discuss that later.
* rephrase/complete some sections
* add links to new threads on python-dev
This commit is contained in:
Victor Stinner 2016-04-19 10:43:02 +02:00
parent 24ca07b6c9
commit 37ca004dd3
1 changed files with 103 additions and 96 deletions

View File

@ -22,11 +22,11 @@ Rationale
=========
In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
example, the ``LOAD_GLOBAL`` instruction looks up a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of
a function) is usually optimized to an array, but it can be a dict too.
namespaces, instance namespaces, etc. The local namespace (function
namespace) is usually optimized to an array, but it can be a dict too.
Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
@ -35,26 +35,29 @@ semantics requires to detect when "something changes": we will call
these checks "guards".
The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement fast guards
on namespaces.
PEP proposes to add a private version to dictionaries to implement fast
guards on namespaces.
Dictionary lookups can be skipped if the version does not change which
is the common case for most namespaces. Since the version is globally
unique, the version is also enough to check if the namespace dictionary
was not replaced with a new dictionary. The performance of a guard does
not depend on the number of watched dictionary entries, complexity of
O(1), if the dictionary version does not change.
is the common case for most namespaces. The version is globally unique,
so checking the version is also enough to check if the namespace
dictionary was not replaced with a new dictionary.
When the dictionary version does not change, the performance of a guard
does not depend on the number of watched dictionary entries: the
complexity is O(1).
Example of optimization: copy the value of a global variable to function
constants. This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must
be loaded at runtime when the function is called, instead of using the
constant.
check if it was modified. If the global variable is not modified, the
function uses the cached copy. If the global variable is modified, the
function uses a regular lookup, and maybe also deoptimize the function
(to remove the overhead of the guard check for next function calls).
See the `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
guards to specialize functions and for the rationale on Python static
optimizers.
guards to specialize functions and for a more general rationale on
Python static optimizers.
Guard example
@ -77,7 +80,7 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
"""Return True if the dictionary entry did not change
and the dictionary was not replaced."""
# read the version of the dict structure
# read the version of the dictionary
version = dict_get_version(self.dict)
if version == self.version:
# Fast-path: dictionary lookup avoided
@ -98,21 +101,21 @@ Pseudo-code of an fast guard to check if a dictionary entry was modified
Usage of the dict version
=========================
Speedup method calls 1.2x
-------------------------
Speedup method calls
--------------------
Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the
`implement per-opcode cache in ceval
`"implement per-opcode cache in ceval"
<https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.
The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it
in a different namespace: using ``exec()`` with the *globals* parameter
for example. In this case, the globals dictionary was changed and the
cache must be invalidated.
It is possible to define a function in a namespace and call it in a
different namespace, using ``exec()`` with the *globals* parameter for
example. In this case, the globals dictionary was replaced and the cache
must also be invalidated.
Specialized functions using guards
@ -123,28 +126,37 @@ The `PEP 510 -- Specialized functions with guards
specialized functions with guards. It allows to implement static
optimizers for Python without breaking the Python semantics.
Example of a static Python optimizer: the `fatoptimizer
<http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
<http://faster-cpython.readthedocs.org/fat_python.html>`_ project
implements many optimizations which require guards on namespaces.
Examples:
The `fatoptimizer <http://fatoptimizer.readthedocs.org/>`_ of the `FAT
Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
is an example of a static Python optimizer. It implements many
optimizations which require guards on namespaces:
* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
guards on ``builtins.__dict__['range']`` and ``globals()['range']``
are required
* etc.
Pyjion
------
According of Brett Cannon, one of the two main developers of Pyjion,
Pyjion can also benefit from dictionary version to implement
optimizations.
Pyjion can benefit from dictionary version to implement optimizations.
Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
Core runtime).
`Pyjion <https://github.com/Microsoft/Pyjion>`_ is a JIT compiler for
Python based upon CoreCLR (Microsoft .NET Core runtime).
Cython
------
Cython can benefit from dictionary version to implement optimizations.
`Cython <http://cython.org/>`_ is an optimising static compiler for both
the Python programming language and the extended Cython programming
language.
Unladen Swallow
@ -153,8 +165,8 @@ Unladen Swallow
Even if dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
globals and builtins." (source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_).
Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow
@ -172,23 +184,16 @@ version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the
dictionary version at each dictionary change:
* ``clear()`` if the dict was non-empty
* ``clear()`` if the dict is non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the `key` does not exist
* ``__detitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
is not ``dict[key]``
* ``update(...)`` if new values are different than existing values:
values are compared by identity, not by their content; the version can
be incremented multiple times
* ``setdefault(key, value)`` if the key does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` always increases the version
* ``update(...)`` if called with arguments
The ``PyDictObject`` structure is not part of the stable ABI.
The field is called ``ma_version_tag`` rather than ``ma_version`` to
suggest to compare it using ``version_tag == old_version_tag`` rather
than ``version <= old_version`` which makes the integer overflow much
likely.
The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make them atomic.
Example using an hypothetical ``dict_get_version(dict)`` function::
@ -205,25 +210,23 @@ Example using an hypothetical ``dict_get_version(dict)`` function::
>>> dict_get_version(d)
103
The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity:
``new_value is old_value``, not by their content:
``new_value == old_value``. Example::
``dict.__setitem__(key, value)`` and ``dict.update(...)`` always
increases the version, even if the new value is identical or is equal to
the current value (even if ``(dict[key] is value) or (dict[key] ==
value)``).
>>> d = {}
>>> value = object()
>>> d['key'] = value
>>> dict_get_version(d)
40
>>> d['key'] = value
>>> dict_get_version(d)
40
The field is called ``ma_version_tag``, rather than ``ma_version``, to
suggest to compare it using ``version_tag == old_version_tag``, rather
than ``version <= old_version`` which is wrong most of the time after an
integer overflow.
.. note::
CPython uses some singleton like integers in the range [-5; 257],
empty tuple, empty strings, Unicode strings of a single character in
the range [U+0000; U+00FF], etc. When a key is set twice to the same
singleton, the version is not modified.
Backwards Compatibility
=======================
Since the ``PyDictObject`` structure is not part of the stable ABI and
the new dictionary version not exposed at the Python scope, changes are
backward compatible.
Implementation and Performance
@ -234,7 +237,10 @@ The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
this PEP.
On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations.
any overhead on dictionary operations. For example, the following timeit
micro-benchmarks takes 318 nanoseconds before and after the change::
python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'
When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
@ -286,19 +292,19 @@ There are multiple issues:
all mapping types. Implementing a new mapping type would require extra
work for no benefit, since the version is only required on the
``dict`` type in practice.
* All Python implementations must implement this new property, it gives
more work to other implementations, whereas they may not use the
dictionary version at all.
* Exposing the dictionary version at Python level can lead the
* All Python implementations would have to implement this new property,
it gives more work to other implementations, whereas they may not use
the dictionary version at all.
* Exposing the dictionary version at the Python level can lead the
false assumption on performances. Checking ``dict.__version__`` at
the Python level is not faster than a dictionary lookup. A dictionary
lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
ns, the difference is only 1.2 ns (3%)::
lookup in Python has a cost of 48.7 ns and checking the version has a
cost of 47.5 ns, the difference is only 1.2 ns (3%)::
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
$ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
10000000 loops, best of 3: 0.0487 usec per loop
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
$ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
10000000 loops, best of 3: 0.0475 usec per loop
* The ``__version__`` can be wrapped on integer overflow. It is error
@ -313,6 +319,7 @@ Mandatory bikeshedding on the property name:
`abc.get_cache_token()
<https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__version_tag__``
* ``__timestamp__``
@ -322,16 +329,16 @@ Add a version to each dict entry
A single version per dictionary requires to keep a strong reference to
the value which can keep the value alive longer than expected. If we add
also a version per dictionary entry, the guard can only store the entry
version to avoid the strong reference to the value (only strong
references to the dictionary and to the key are needed).
version (a simple integer) to avoid the strong reference to the value:
only strong references to the dictionary and to the key are needed.
Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
the field has the C type ``PY_UINT64_T``. When a key is created or
modified, the entry version is set to the dictionary version which is
incremented at any change (create, modify, delete).
Changes: add a ``me_version_tag`` field to the ``PyDictKeyEntry``
structure, the field has the C type ``PY_UINT64_T``. When a key is
created or modified, the entry version is set to the dictionary version
which is incremented at any change (create, modify, delete).
Pseudo-code of an fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)``
using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict)`` functions::
UNSET = object()
@ -347,18 +354,19 @@ using hypothetical ``dict_get_version(dict)``
"""Return True if the dictionary entry did not change
and the dictionary was not replaced."""
# read the version of the dict structure
# read the version of the dictionary
dict_version = dict_get_version(self.dict)
if dict_version == self.version:
# Fast-path: dictionary lookup avoided
return True
# lookup in the dictionary
# lookup in the dictionary to read the entry version
entry_version = get_dict_key_version(dict, key)
if entry_version == self.entry_version:
# another key was modified:
# cache the new dictionary version
self.dict_version = dict_version
self.entry_version = entry_version
return True
# the key was modified
@ -366,7 +374,7 @@ using hypothetical ``dict_get_version(dict)``
The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or unused yet). For
on the number of buckets (dictionary entries, used or not used). For
example, it increases the size of each dictionary entry by 8 bytes on
64-bit system.
@ -386,8 +394,8 @@ Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.
Leave the ``dict`` type unchanged to not add any overhead (memory
footprint) when guards are not needed.
Leave the ``dict`` type unchanged to not add any overhead (CPU, memory
footprint) when guards are not used.
Technical issue: a lot of C code in the wild, including CPython core,
expecting the exact ``dict`` type. Issues:
@ -396,7 +404,7 @@ expecting the exact ``dict`` type. Issues:
use ``globals={}``. It is not possible to cast the ``dict`` to a
``dict`` subtype because the caller expects the ``globals`` parameter
to be modified (``dict`` is mutable).
* Functions call directly ``PyDict_xxx()`` functions, instead of calling
* C functions call directly ``PyDict_xxx()`` functions, instead of calling
``PyObject_xxx()`` if the object is a ``dict`` subtype
* ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some
functions require the exact ``dict`` type.
@ -427,7 +435,7 @@ was merged into Python 2.6. The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).
The type version tag is not available at the Python level.
The type version tag is not exposed at the Python level.
The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to
@ -478,7 +486,8 @@ globals+builtins lookup optimization
the field has the C type ``size_t``.
Thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_
(December 2006).
Guard against changing dict during iteration
@ -488,13 +497,7 @@ In 2013, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``. This field is incremented
when the dictionary is modified, and so is very similar to the proposed
dictionary version.
Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation. The dictionary version changes when values
are replaced, whereas modifying dictionary values while iterating on
dictionary keys is legit in Python.
when the dictionary is modified.
PySizer
@ -515,6 +518,10 @@ Discussion
Thread on the mailing lists:
* python-dev: `Updated PEP 509
<https://mail.python.org/pipermail/python-dev/2016-April/144250.html>`_
* python-dev: `RFC: PEP 509: Add a private version to dict
<https://mail.python.org/pipermail/python-dev/2016-April/144137.html>`_
* python-dev: `PEP 509: Add a private version to dict
<https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
(january 2016)