PEP: 509
Title: Add a private version to dict
Author: Victor Stinner <vstinner@python.org>
Status: Superseded
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Jan-2016
Python-Version: 3.6
Post-History: `08-Jan-2016 <https://mail.python.org/archives/list/python-ideas@python.org/thread/FPB7MLWMRSCYQVOUYN2SUV4NR47TPPG3/>`__,
              `11-Jan-2016 <https://mail.python.org/archives/list/python-dev@python.org/thread/AY42HER5DFKA7DJV25AL7YX5DQPF6RMV/>`__,
              `14-Apr-2016 <https://mail.python.org/archives/list/python-dev@python.org/thread/UXEQIDSTVLAHJVDQJMJPCU2QZGYMBV2H/>`__,
              `19-Apr-2016 <https://mail.python.org/archives/list/python-dev@python.org/thread/2S562SVRK5S2QKP3SFONG357Z72I6KLE/>`__
Superseded-By: 699
Resolution: https://mail.python.org/archives/list/python-dev@python.org/message/QFVJV6YQOUSWIYY4FBORY647YCBSCIMQ/


Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.


Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction looks up a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (function
namespace) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations respecting the Python
semantics requires detecting when "something changes": we will call
these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a private version to dictionaries to implement fast
guards on namespaces.

Dictionary lookups can be skipped if the version does not change, which
is the common case for most namespaces. The version is globally unique,
so checking the version is also enough to verify that the namespace
dictionary was not replaced with a new dictionary.

When the dictionary version does not change, the performance of a guard
does not depend on the number of watched dictionary entries: the
complexity is O(1).

Example of optimization: copy the value of a global variable to function
constants. This optimization requires a guard on the global variable to
check if it was modified after it was copied. If the global variable is
not modified, the function uses the cached copy. If the global variable
is modified, the function uses a regular lookup, and maybe also
deoptimizes the function (to remove the overhead of the guard check for
the next function calls).

See the :pep:`510 -- Specialized functions with guards <510>` for
concrete usage of guards to specialize functions and for a more general
rationale on Python static optimizers.


Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False

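The guard pseudo-code above cannot be run as written, because
``dict_get_version()`` is hypothetical. The following minimal sketch
emulates the version tag in pure Python to show the three outcomes of
``check()``. The ``VersionedDict`` subclass and its ``version``
attribute are assumptions made for illustration only: the PEP stores the
tag in the C structure, and this emulation only tracks ``__setitem__``
and ``__delitem__`` (not ``update()``, ``pop()``, etc.).

```python
import itertools

# Hypothetical stand-in for the global dictionary version counter.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """dict subclass emulating a version tag (illustration only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)


def dict_get_version(d):
    """Emulated accessor: reads the attribute set by VersionedDict."""
    return d.version


UNSET = object()


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.value = d.get(key, UNSET)
        self.version = dict_get_version(d)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True  # fast path: no lookup
        if self.dict.get(self.key, UNSET) is self.value:
            self.version = version  # another key changed
            return True
        return False  # the watched key changed


namespace = VersionedDict(len=len)
guard = GuardDictKey(namespace, "len")

results = [guard.check()]          # nothing changed: fast path
namespace["other"] = 1
results.append(guard.check())      # unrelated key changed: slow path
namespace["len"] = lambda obj: 0
results.append(guard.check())      # watched key replaced
print(results)                     # [True, True, False]
```

After the second check the guard has cached the new version, so the
fast path is taken again until the next modification.

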
Usage of the dict version
=========================

Speedup method calls
--------------------

Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the
`"implement per-opcode cache in ceval"
<https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it in a
different namespace, using ``exec()`` with the *globals* parameter for
example. In this case, the globals dictionary was replaced and the cache
must also be invalidated.


Specialized functions using guards
----------------------------------

:pep:`510` proposes an API to support specialized functions with guards.
It makes it possible to implement static optimizers for Python without
breaking the Python semantics.

The `fatoptimizer <http://fatoptimizer.readthedocs.org/>`_ of the `FAT
Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
is an example of a static Python optimizer. It implements many
optimizations which require guards on namespaces:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
  are required
* etc.


Pyjion
------

According to Brett Cannon, one of the two main developers of Pyjion,
Pyjion can benefit from the dictionary version to implement
optimizations.

`Pyjion <https://github.com/Microsoft/Pyjion>`_ is a JIT compiler for
Python based upon CoreCLR (Microsoft .NET Core runtime).


Cython
------

Cython can benefit from the dictionary version to implement
optimizations.

`Cython <http://cython.org/>`_ is an optimising static compiler for both
the Python programming language and the extended Cython programming
language.


Unladen Swallow
---------------

Even if the dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." (source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_).

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow
Retrospective
<http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.


Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global
dictionary version.

Each time a dictionary is created, the global version is incremented and
the dictionary version is initialized to the global version.

Each time the dictionary content is modified, the global version must be
incremented and copied to the dictionary version. Dictionary methods
which can modify its content:

* ``clear()``
* ``pop(key)``
* ``popitem()``
* ``setdefault(key, value)``
* ``__delitem__(key)``
* ``__setitem__(key, value)``
* ``update(...)``

Whether to increase the version when a dictionary method does not change
the dictionary content is left to the Python implementation. A Python
implementation can decide not to increase the version, to avoid
dictionary lookups in guards. Examples of cases where dictionary methods
don't modify the content:

* ``clear()`` if the dict is already empty
* ``pop(key)`` if the key does not exist
* ``popitem()`` if the dict is empty
* ``setdefault(key, value)`` if the key already exists
* ``__delitem__(key)`` if the key does not exist
* ``__setitem__(key, value)`` if the new value is identical to the
  current value
* ``update()`` if called without argument or if new values are identical
  to current values

Setting a key to a new value that is equal to the old value (but is not
the same object) is also considered an operation modifying the
dictionary content.

Two different empty dictionaries must have different versions, so that a
dictionary can be identified just by its version. This makes it possible
to verify in a guard that a namespace was not replaced without storing a
strong reference to the dictionary. Using a borrowed reference does not
work: if the old dictionary is destroyed, it is possible that a new
dictionary is allocated at the same memory address. By the way,
dictionaries don't support weak references.

The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make changes atomic.

Example using a hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The field is called ``ma_version_tag``, rather than ``ma_version``, to
suggest comparing it using ``version_tag == old_version_tag``, rather
than ``version <= old_version`` which becomes wrong after an integer
overflow.

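The difference between the two comparisons can be illustrated with a
short sketch. It emulates a 64-bit unsigned counter in Python by masking
(an assumption of this example, since Python integers do not overflow)
and looks at a tag recorded just before the counter wraps around.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned counter


def bump(version):
    """Increment with C-style unsigned wraparound."""
    return (version + 1) & MASK64


# Tag recorded by a guard while the counter is at its maximum value.
old_version_tag = MASK64
# One modification later, the counter wraps to 0.
version_tag = bump(old_version_tag)

# Equality correctly reports that the dictionary changed.
unchanged_by_equality = (version_tag == old_version_tag)

# An ordering test wrongly concludes that nothing changed, because the
# wrapped value 0 does not compare as newer than the recorded tag.
unchanged_by_ordering = (version_tag <= old_version_tag)

print(version_tag, unchanged_by_equality, unchanged_by_ordering)
# 0 False True
```

Only the equality test treats the wrapped tag as a different version.

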
Backwards Compatibility
=======================

Since the ``PyDictObject`` structure is not part of the stable ABI and
the new dictionary version is not exposed at the Python level, the
changes are backward compatible.


Implementation and Performance
==============================

The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
<https://bugs.python.org/issue26058>`_ contains a patch implementing
this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations. For example, the following timeit
micro-benchmark takes 318 nanoseconds before and after the change::

    python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
a guard can watch for multiple keys. For example, for an optimization
using 10 global variables in a function, 10 dictionary lookups cost 148
ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast).

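The figures in the previous paragraph can be checked with a few lines of
arithmetic. The constant names below are chosen for this sketch; the
timings are the ones quoted above.

```python
LOOKUP_NS = 14.8   # one PyDict_GetItem() call, as quoted above
GUARD_NS = 3.8     # one guard check when the version is unchanged
WATCHED_KEYS = 10  # global variables used by the optimized function

lookups_ns = WATCHED_KEYS * LOOKUP_NS
speedup = lookups_ns / GUARD_NS
print(f"{lookups_ns:.0f} ns vs {GUARD_NS} ns: {speedup:.0f}x")
# 148 ns vs 3.8 ns: 39x
```
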
The `fat module
<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
such guards: ``fat.GuardDict`` is based on the dictionary version.


Integer overflow
================

The implementation uses the C type ``PY_UINT64_T`` to store the version:
a 64-bit unsigned integer. The C code uses ``version++``. On integer
overflow, the version is wrapped to ``0`` (and then continues to be
incremented) according to the C standard.

After an integer overflow, a guard can succeed even though the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exactly ``2 ** 64`` dictionary creations or modifications
since the previous guard check.

If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
take longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.

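The durations above follow from simple arithmetic, reproduced here as a
sketch. The length of a year (365.25 days) is an assumption of this
example.

```python
NS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

# One modification per nanosecond until the counter wraps around.
years_64 = 2 ** 64 / NS_PER_SECOND / SECONDS_PER_YEAR
seconds_32 = 2 ** 32 / NS_PER_SECOND
print(f"64-bit: {years_64:.1f} years, 32-bit: {seconds_32:.2f} seconds")
# 64-bit: 584.5 years, 32-bit: 4.29 seconds
```

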
Alternatives
============

Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------

The first version of the PEP proposed to expose the dictionary version
as a read-only ``__version__`` property at Python level, and also to add
the property to ``collections.UserDict`` (since this type must mimic
the ``dict`` API).

There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added to
  all mapping types. Implementing a new mapping type would require extra
  work for no benefit, since the version is only required on the
  ``dict`` type in practice.
* All Python implementations would have to implement this new property.
  It gives more work to other implementations, whereas they may not use
  the dictionary version at all.
* Exposing the dictionary version at the Python level can lead to false
  assumptions about performance. Checking ``dict.__version__`` at
  the Python level is not faster than a dictionary lookup. A dictionary
  lookup in Python has a cost of 48.7 ns and checking the version has a
  cost of 47.5 ns, the difference is only 1.2 ns (3%)::

      $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
      10000000 loops, best of 3: 0.0487 usec per loop
      $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
      10000000 loops, best of 3: 0.0475 usec per loop

* The ``__version__`` can be wrapped on integer overflow. It is error
  prone: using ``dict.__version__ <= guard_version`` is wrong,
  ``dict.__version__ == guard_version`` must be used instead to reduce
  the risk of a bug on integer overflow (even if the integer overflow is
  unlikely in practice).

Mandatory bikeshedding on the property name:

* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
  `abc.get_cache_token()
  <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__version_tag__``
* ``__timestamp__``


Add a version to each dict entry
--------------------------------

A single version per dictionary requires keeping a strong reference to
the value, which can keep the value alive longer than expected. If we
also add a version per dictionary entry, the guard can store only the
entry version (a simple integer) to avoid the strong reference to the
value: only strong references to the dictionary and to the key are
needed.

Changes: add a ``me_version_tag`` field to the ``PyDictKeyEntry``
structure, the field has the C type ``PY_UINT64_T``. When a key is
created or modified, the entry version is set to the dictionary version
which is incremented at any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict, key)`` functions::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary to read the entry version
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                self.entry_version = entry_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or not used). For
example, it increases the size of each dictionary entry by 8 bytes on
a 64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

* :pep:`393` -- Flexible String Representation
* :pep:`412` -- Key-Sharing Dictionary

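The per-entry overhead can be estimated with ``ctypes``. This is a rough
sketch: the two structures below are illustrative models of an entry
holding a hash, a key pointer and a value pointer, with and without the
proposed ``me_version_tag`` field. The class names are hypothetical, and
the result depends on the platform's pointer size and alignment rules.

```python
import ctypes


class KeyEntry(ctypes.Structure):
    """Model of a dictionary entry: hash, key pointer, value pointer."""
    _fields_ = [
        ("me_hash", ctypes.c_ssize_t),
        ("me_key", ctypes.c_void_p),
        ("me_value", ctypes.c_void_p),
    ]


class VersionedKeyEntry(ctypes.Structure):
    """Same model with an additional 64-bit version field."""
    _fields_ = KeyEntry._fields_ + [("me_version_tag", ctypes.c_uint64)]


base = ctypes.sizeof(KeyEntry)
versioned = ctypes.sizeof(VersionedKeyEntry)
print(f"{base} -> {versioned} bytes per entry (+{versioned - base})")
```

On a typical 64-bit build this is expected to report a growth from 24 to
32 bytes per entry, paid for every bucket whether it is used or not.

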
Add a new dict subtype
----------------------

Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.

Leave the ``dict`` type unchanged to not add any overhead (CPU, memory
footprint) when guards are not used.

Technical issue: a lot of C code in the wild, including the CPython
core, expects the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
  uses ``globals={}``. It is not possible to cast the ``dict`` to a
  ``dict`` subtype because the caller expects the ``globals`` parameter
  to be modified (``dict`` is mutable).
* C functions directly call ``PyDict_xxx()`` functions, instead of
  calling ``PyObject_xxx()`` if the object is a ``dict`` subtype
* ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some
  functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for
  namespaces

The ``exec()`` issue is a blocker.

Other issues:

* The garbage collector has special code to "untrack" ``dict``
  instances. If a ``dict`` subtype is used for namespaces, the garbage
  collector can be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
  for ``dict`` subtypes, and so it would make Python a little bit
  slower.


Prior Art
=========

Method cache and type version tag
---------------------------------

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
was merged into Python 2.6. The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).

The type version tag is not exposed at the Python level.

The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to
"make it fast, have a deterministic and low memory footprint, and be
easy to invalidate". Each cache entry has a version tag. A global
version tag is used to create the next version tag; it also has the C
type ``unsigned int``.

By default, a type has its "valid version tag" flag cleared to indicate
that the version tag is invalid. When the first method of the type is
cached, the version tag and the "valid version tag" flag are set. When a
type is modified, the "valid version tag" flag of the type and its
subclasses is cleared. Later, when a cache entry of these types is used,
the entry is removed because its version tag is outdated.

On integer overflow, the whole cache is cleared and the global version
tag is reset to ``0``.

See `Method cache (issue #1685986)
<https://bugs.python.org/issue1685986>`_ and `Armin's method cache
optimization updated for Python 2.6 (issue #1700288)
<https://bugs.python.org/issue1700288>`_.


Globals / builtins cache
------------------------

In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
#10401) <http://bugs.python.org/issue10401>`_ which adds a private
``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``Py_ssize_t``.

The patch adds a "global and builtin cache" to functions and frames, and
changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
cache.

The change on the ``PyDictObject`` structure is very similar to this
PEP.


Cached globals+builtins lookup
------------------------------

In 2006, Andrea Griffini proposed a patch implementing a `Cached
globals+builtins lookup optimization
<https://bugs.python.org/issue1616125>`_. The patch adds a private
``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``size_t``.

Thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_
(December 2006).


Guard against changing dict during iteration
--------------------------------------------

In 2013, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``. This field is incremented
when the dictionary is modified.


PySizer
-------

`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
Google Summer of Code 2005 project by Nick Smallbone.

This project has a patch for CPython 2.4 which adds ``key_time`` and
``value_time`` fields to dictionary entries. It uses a global
process-wide counter for dictionaries, incremented each time that a
dictionary is modified. The times are used to decide when child objects
first appeared in their parent objects.


Discussion
==========

Threads on the mailing lists:

* python-dev: `Updated PEP 509
  <https://mail.python.org/pipermail/python-dev/2016-April/144250.html>`_
* python-dev: `RFC: PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-April/144137.html>`_
* python-dev: `PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
  (January 2016)
* python-ideas: `RFC: PEP: Add dict.__version__
  <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
  (January 2016)


Acceptance
==========

The PEP was `accepted on 2016-09-07 by Guido van Rossum
<https://mail.python.org/pipermail/python-dev/2016-September/146298.html>`_.
The PEP implementation has since been committed to the repository.


Copyright
=========

This document has been placed in the public domain.