From 534db9e409d047dc41b21575ad1b827ed0d0ffe3 Mon Sep 17 00:00:00 2001
From: Victor Stinner
Date: Sat, 9 Jan 2016 23:28:43 +0100
Subject: [PATCH] Add PEP 509 (dict.__version__) and 510 (func.specialize)

---
 pep-0509.txt | 387 +++++++++++++++++++++++++++++++++++++++++++++++++++
 pep-0510.txt | 213 ++++++++++++++++++++++++++++
 2 files changed, 600 insertions(+)
 create mode 100644 pep-0509.txt
 create mode 100644 pep-0510.txt

diff --git a/pep-0509.txt b/pep-0509.txt
new file mode 100644
index 000000000..25c9d7095
--- /dev/null
+++ b/pep-0509.txt
@@ -0,0 +1,387 @@
+PEP: 509
+Title: Add dict.__version__
+Version: $Revision$
+Last-Modified: $Date$
+Author: Victor Stinner
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 4-January-2016
+Python-Version: 3.6
+
+
+Abstract
+========
+
+Add a new read-only ``__version__`` property to ``dict`` and
+``collections.UserDict`` types, incremented at each change.
+
+
+Rationale
+=========
+
+In Python, the builtin ``dict`` type is used by many instructions. For
+example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
+global namespace, or in the builtins namespace (two dict lookups).
+Python uses ``dict`` for the builtins namespace, globals namespace, type
+namespaces, instance namespaces, etc. The local namespace (namespace of
+a function) is usually optimized to an array, but it can be a dict too.
+
+Python is hard to optimize because almost everything is mutable: builtin
+functions, function code, global variables, local variables, ... can be
+modified at runtime. Implementing optimizations respecting the Python
+semantics requires detecting when "something changes": we will call these
+checks "guards".
+
+The speedup of optimizations depends on the speed of guard checks. This
+PEP proposes to add a version to dictionaries to implement efficient
+guards on namespaces.
+
+Example of optimization: replace loading a global variable with a
+constant.
+This optimization requires a guard on the global variable to
+check if it was modified. If the variable is modified, the variable must
+be loaded at runtime, instead of using the constant.
+
+
+Guard example
+=============
+
+Pseudo-code of an efficient guard to check if a dictionary key was
+modified (created, updated or deleted)::
+
+    UNSET = object()
+
+    class Guard:
+        def __init__(self, dict, key):
+            self.dict = dict
+            self.key = key
+            self.value = dict.get(key, UNSET)
+            self.version = dict.__version__
+
+        def check(self):
+            """Return True if the dictionary value did not change."""
+            version = self.dict.__version__
+            if version == self.version:
+                # Fast-path: avoid the dictionary lookup
+                return True
+
+            value = self.dict.get(self.key, UNSET)
+            if value is self.value:
+                # another key was modified:
+                # cache the new dictionary version
+                self.version = version
+                return True
+
+            return False
+
+
+Changes
+=======
+
+Add a read-only ``__version__`` property to builtin ``dict`` type and to
+the ``collections.UserDict`` type. New empty dictionaries are initialized
+to version ``0``. The version is incremented at each change:
+
+* ``clear()`` if the dict was non-empty
+* ``pop(key)`` if the key exists
+* ``popitem()`` if the dict is non-empty
+* ``setdefault(key, value)`` if the ``key`` does not exist
+* ``__delitem__(key)`` if the key exists
+* ``__setitem__(key, value)`` if the ``key`` doesn't exist or if the value
+  is different
+* ``update(...)`` if new values are different from existing values (the
+  version can be incremented multiple times)
+
+Example::
+
+    >>> d = {}
+    >>> d.__version__
+    0
+    >>> d['key'] = 'value'
+    >>> d.__version__
+    1
+    >>> d['key'] = 'new value'
+    >>> d.__version__
+    2
+    >>> del d['key']
+    >>> d.__version__
+    3
+
+If a dictionary is created with items, the version is also incremented
+at each dictionary insertion.
+Example::
+
+    >>> d=dict(x=7, y=33)
+    >>> d.__version__
+    2
+
+The version is not incremented if an existing key is modified to the
+same value, but only the identity of the value is tested, not the
+content of the value. Example::
+
+    >>> d={}
+    >>> value = object()
+    >>> d['key'] = value
+    >>> d.__version__
+    1
+    >>> d['key'] = value
+    >>> d.__version__
+    1
+
+.. note::
+   CPython uses some singletons, like integers in the range [-5; 256],
+   the empty tuple, empty strings, Unicode strings of a single character
+   in the range [U+0000; U+00FF], etc. When a key is set twice to the same
+   singleton, the version is not modified.
+
+The PEP is designed to implement guards on namespaces; only the ``dict``
+type can be used for namespaces in practice. ``collections.UserDict``
+is modified because it must mimic ``dict``. ``collections.Mapping`` is
+unchanged.
+
+
+Integer overflow
+================
+
+The implementation uses the C unsigned integer type ``size_t`` to store
+the version. On 32-bit systems, the maximum version is ``2**32-1``
+(more than ``4.2 * 10**9``, 4 billion). On 64-bit systems, the maximum
+version is ``2**64-1`` (more than ``1.8 * 10**19``).
+
+The C code uses ``version++``. On integer overflow, the version is
+wrapped to ``0`` (and then continues to be incremented).
+
+The check ``dict.__version__ == old_version`` can be true after an
+integer overflow, so a guard can return true even if the value changed,
+which is wrong. The bug occurs if the dict is modified exactly ``2**64``
+times (on a 64-bit system), or a multiple of that, between two checks of
+the guard.
+
+Using a more complex type (ex: ``PyLongObject``) to avoid the overflow
+would slow down operations on the ``dict`` type. Even if there is a
+theoretical risk of missing a value change, the risk is considered too low
+compared to the slowdown of using a more complex type.
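The wraparound described in the Integer overflow section can be modelled in
a few lines of Python. This is only a sketch under stated assumptions: the
``bump()`` helper and the 8-bit width are hypothetical and chosen so that
the loop is cheap; the PEP itself uses a ``size_t`` incremented with
``version++`` in C.

```python
# Hypothetical model of a version counter stored in an unsigned integer
# of `bits` bits (the PEP uses size_t: 32 or 64 bits in practice).
def bump(version, bits):
    """Increment the counter, wrapping to 0 on overflow like version++ in C."""
    return (version + 1) % (2 ** bits)

BITS = 8    # tiny width, assumption made only to keep the demonstration cheap
saved = 5   # version remembered by a guard at its last check
version = saved
for _ in range(2 ** BITS):
    version = bump(version, BITS)

# After exactly 2**BITS changes the counter is back at the saved value,
# so a guard comparing versions would wrongly take its fast path.
collided = (version == saved)
print(collided)
```

With a real 64-bit counter the same collision needs ``2**64`` modifications
between two guard checks, which is why the PEP considers the risk acceptable.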
+
+
+Usage of dict.__version__
+=========================
+
+Detect dictionary mutation during iteration
+-------------------------------------------
+
+Currently, iterating on a dictionary only detects when the dictionary
+size changes, but not when keys or values are modified. Using the
+dictionary version, it would be possible to detect when keys and values
+are modified.
+
+See the `issue #19332: Guard against changing dict during iteration
+`_.
+
+
+astoptimizer of FAT Python
+--------------------------
+
+The astoptimizer of the FAT Python project implements many optimizations
+which require guards on namespaces. Examples:
+
+* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
+  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
+* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
+  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
+  are required
+
+The `FAT Python
+`_ project is a
+static optimizer for Python 3.6.
+
+
+Pyjion
+------
+
+According to Brett Cannon, one of the two main developers of Pyjion, Pyjion can
+also benefit from a dictionary version to implement optimizations.
+
+Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET Core
+runtime).
+
+
+Unladen Swallow
+---------------
+
+Even if a dictionary version was not explicitly mentioned, optimizing globals
+and builtins lookups was part of the Unladen Swallow plan: "Implement one of the
+several proposed schemes for speeding lookups of globals and builtins."
+Source: `Unladen Swallow ProjectPlan
+`_.
+
+Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler implemented
+with LLVM. The project stopped in 2011: `Unladen Swallow Retrospective
+`_.
+
+
+Implementation
+==============
+
+See the `issue #26058: Add dict.__version__ read-only property
+`_.
+
+On pybench and timeit microbenchmarks, the patch does not seem to add
+any overhead on dictionary operations.
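The patch referenced above is written in C. To experiment with the
semantics without it, the counter can be approximated in pure Python. The
sketch below is an assumption-laden model, not the proposed implementation:
``VersionedDict`` and its ``version`` attribute are hypothetical names, and
only ``__setitem__``, ``__delitem__`` and ``update()`` are covered
(``clear()``, ``pop()``, ``popitem()`` and ``setdefault()`` are left out for
brevity).

```python
UNSET = object()

class VersionedDict(dict):
    """Hypothetical pure-Python model of the proposed version counter."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.version = 0
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Bump only if the key is new or bound to a different object
        # (identity test, as described in the Changes section).
        if super().get(key, UNSET) is not value:
            self.version += 1
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError before any bump
        self.version += 1

    def update(self, *args, **kwargs):
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


class Guard:
    """Same logic as the PEP's guard, reading the hypothetical attribute."""

    def __init__(self, mapping, key):
        self.mapping = mapping
        self.key = key
        self.value = mapping.get(key, UNSET)
        self.version = mapping.version

    def check(self):
        version = self.mapping.version
        if version == self.version:
            return True  # fast path: nothing changed at all
        if self.mapping.get(self.key, UNSET) is self.value:
            self.version = version  # another key changed: cache the version
            return True
        return False


ns = VersionedDict(x=7, y=33)
guard = Guard(ns, "x")
ns["y"] = 34
ok_other_key = guard.check()  # only "y" changed
ns["x"] = 8
ok_same_key = guard.check()   # the watched key changed
print(ns.version, ok_other_key, ok_same_key)
```

A Python-level subclass like this is much slower than a C field and runs
into the subtype problems discussed under Alternatives; it is only meant
for trying out guard logic.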
+
+
+Alternatives
+============
+
+Add a version to each dict entry
+--------------------------------
+
+A single version per dictionary requires keeping a strong reference to
+the value, which can keep the value alive longer than expected. If we
+also add a version per dictionary entry, the guard can rely on the entry
+version and so avoid the strong reference to the value (only strong
+references to a dictionary and key are needed).
+
+Changes: add a ``getversion(key)`` method to dictionary which returns
+``None`` if the key doesn't exist. When a key is created or modified,
+the entry version is set to the dictionary version which is incremented
+at each change (create, modify, delete).
+
+Pseudo-code of an efficient guard to check if a dict key was modified
+using ``getversion()``::
+
+    UNSET = object()
+
+    class Guard:
+        def __init__(self, dict, key):
+            self.dict = dict
+            self.key = key
+            self.dict_version = dict.__version__
+            self.entry_version = dict.getversion(key)
+
+        def check(self):
+            """Return True if the dictionary value did not change."""
+            dict_version = self.dict.__version__
+            if dict_version == self.dict_version:
+                # Fast-path: avoid the dictionary lookup
+                return True
+
+            # lookup in the dictionary, but get the entry version,
+            # not the value
+            entry_version = self.dict.getversion(self.key)
+            if entry_version == self.entry_version:
+                # another key was modified:
+                # cache the new dictionary version
+                self.dict_version = dict_version
+                return True
+
+            return False
+
+The main drawback of this option is the impact on the memory footprint.
+It increases the size of each dictionary entry, so the overhead depends
+on the number of buckets (dictionary entries, used or unused yet). For
+example, it increases the size of each dictionary entry by 8 bytes on
+a 64-bit system if we use ``size_t``.
+
+In Python, the memory footprint matters and the trend is more to reduce
+it.
+Examples:
+
+* `PEP 393 -- Flexible String Representation
+  `_
+* `PEP 412 -- Key-Sharing Dictionary
+  `_
+
+
+Add a new dict subtype
+----------------------
+
+Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
+use the ``verdict`` for namespaces (module namespace, type namespace,
+instance namespace, etc.) instead of ``dict``.
+
+Leave the ``dict`` type unchanged to not add any overhead (memory
+footprint) when guards are not needed.
+
+Technical issue: a lot of C code in the wild, including CPython core,
+expects the exact ``dict`` type. Issues:
+
+* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
+  uses ``globals={}``. It is not possible to cast the ``dict`` to a
+  ``dict`` subtype because the caller expects the ``globals`` parameter
+  to be modified (``dict`` is mutable).
+* Functions call ``PyDict_xxx()`` functions directly, instead of calling
+  ``PyObject_xxx()`` if the object is a ``dict`` subtype
+* The ``PyDict_CheckExact()`` check fails on a ``dict`` subtype, whereas
+  some functions require the exact ``dict`` type.
+* ``Python/ceval.c`` does not completely support dict subtypes for
+  namespaces
+
+
+The ``exec()`` issue is a blocker issue.
+
+Other issues:
+
+* The garbage collector has special code to "untrack" ``dict``
+  instances. If a ``dict`` subtype is used for namespaces, the garbage
+  collector may be unable to break some reference cycles.
+* Some functions have a fast-path for ``dict`` which would not be taken
+  for ``dict`` subtypes, and so it would make Python a little bit
+  slower.
+
+
+Prior Art
+=========
+
+Guard against changing dict during iteration
+--------------------------------------------
+
+In 2013, Serhiy Storchaka proposed a patch for the `issue #19332: Guard
+against changing dict during iteration
+`_ (mentioned above) which adds a
+``size_t ma_count`` field to the ``PyDictObject`` structure.
+This field
+is incremented when the dictionary is modified, and so is very similar
+to the proposed ``dict.__version__``.
+
+
+Cached globals+builtins lookup
+------------------------------
+
+In 2006, Andrea Griffini proposed a patch implementing a `Cached
+globals+builtins lookup optimization `_.
+The patch adds a private ``timestamp`` field to dict.
+
+See the thread on python-dev: `About dictionary lookup caching
+`_.
+
+
+Globals / builtins cache
+------------------------
+
+In 2010, Antoine Pitrou proposed a `Globals / builtins cache
+`_ which adds a private
+``ma_version`` field to the ``dict`` type. The patch adds a "global and
+builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and
+``STORE_GLOBAL`` instructions to use the cache.
+
+
+PySizer
+-------
+
+`PySizer `_: a memory profiler for Python,
+Google Summer of Code 2005 project by Nick Smallbone.
+
+This project has a patch for CPython 2.4 which adds ``key_time`` and
+``value_time`` fields to dictionary entries. It uses a global
+process-wide counter for dictionaries, incremented each time that a
+dictionary is modified. The times are used to decide when child objects
+first appeared in their parent objects.
+
+
+Discussion
+==========
+
+Thread on the python-ideas mailing list: `RFC: PEP: Add dict.__version__
+`_.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
diff --git a/pep-0510.txt b/pep-0510.txt
new file mode 100644
index 000000000..bcfa7ace6
--- /dev/null
+++ b/pep-0510.txt
@@ -0,0 +1,213 @@
+PEP: 510
+Title: Specialized functions with guards
+Version: $Revision$
+Last-Modified: $Date$
+Author: Victor Stinner
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 4-January-2016
+Python-Version: 3.6
+
+
+Abstract
+========
+
+Add an API to add specialized functions with guards to functions, to
+support static optimizers respecting the Python semantics.
+
+
+Rationale
+=========
+
+Python is hard to optimize because almost everything is mutable: builtin
+functions, function code, global variables, local variables, ... can be
+modified at runtime. Implementing optimizations respecting the Python
+semantics requires detecting when "something changes": we will call these
+checks "guards".
+
+This PEP proposes to add a ``specialize()`` method to functions to add
+specialized functions with guards. When the function is called, the
+specialized function is used if nothing changed, otherwise the
+original bytecode is used.
+
+Writing an optimizer is out of the scope of this PEP.
+
+
+Example
+=======
+
+Using bytecode
+--------------
+
+Replace ``chr(65)`` with ``"A"``::
+
+    import myoptimizer
+
+    def func():
+        return chr(65)
+
+    def fast_func():
+        return "A"
+
+    func.specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
+    del fast_func
+
+    print("func(): %s" % func())
+    print("#specialized: %s" % len(func.get_specialized()))
+    print()
+
+    import builtins
+    builtins.chr = lambda obj: "mock"
+
+    print("func(): %s" % func())
+    print("#specialized: %s" % len(func.get_specialized()))
+
+Output::
+
+    func(): A
+    #specialized: 1
+
+    func(): mock
+    #specialized: 0
+
+The hypothetical ``myoptimizer.GuardBuiltins("chr")`` is a guard on the
+builtin ``chr()`` function and the ``chr`` name in the global namespace.
+The guard fails if the builtin function is replaced or if a ``chr`` name
+is defined in the global namespace.
+
+The first call directly returns the string ``"A"``. The second call
+removes the specialized function because the builtin ``chr()`` function
+was replaced, and executes the original bytecode.
+
+On a microbenchmark, calling the specialized function takes 88 ns,
+whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.
+
+
+Using builtin function
+----------------------
+
+Replace a slow Python function calling ``chr(obj)`` with a direct call
+to the builtin ``chr()`` function::
+
+    import myoptimizer
+
+    def func(arg):
+        return chr(arg)
+
+    func.specialize(chr, [myoptimizer.GuardBuiltins("chr")])
+
+    print("func(65): %s" % func(65))
+    print("#specialized: %s" % len(func.get_specialized()))
+    print()
+
+    import builtins
+    builtins.chr = lambda obj: "mock"
+
+    print("func(65): %s" % func(65))
+    print("#specialized: %s" % len(func.get_specialized()))
+
+Output::
+
+    func(65): A
+    #specialized: 1
+
+    func(65): mock
+    #specialized: 0
+
+The first call calls the builtin ``chr()`` function directly (without
+creating a Python frame). The second call removes the specialized
+function because the builtin ``chr()`` function was replaced, and
+executes the original bytecode.
+
+On a microbenchmark, calling the specialized function takes 95 ns,
+whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast.
+Calling ``chr(65)`` directly takes 76 ns.
+
+
+Python Function Call
+====================
+
+Pseudo-code to call a Python function having specialized functions with
+guards::
+
+    def call_func(func, *args, **kwargs):
+        # by default, call the regular bytecode
+        code = func.__code__.co_code
+        specialized = func.get_specialized()
+        nspecialized = len(specialized)
+
+        index = 0
+        while index < nspecialized:
+            guard = specialized[index].guard
+            # pass arguments, some guards need them
+            check = guard(args, kwargs)
+            if check == 1:
+                # guard succeeded: we can use the specialized function
+                code = specialized[index].code
+                break
+            elif check == -1:
+                # guard will always fail: remove the specialized function
+                del specialized[index]
+                # the list is now one item shorter
+                nspecialized -= 1
+            elif check == 0:
+                # guard failed temporarily
+                index += 1
+
+        # code can be a code object or any callable object
+        return execute_code(code, args, kwargs)
+
+
+Changes
+=======
+
+* Add two new methods to functions:
+
+  - ``specialize(code, guards: list)``: add a specialized
+    function with guards. ``code`` is a code object (ex:
+    ``func2.__code__``) or any callable object (ex: ``len``).
+    The specialization can be ignored if a guard already fails.
+  - ``get_specialized()``: get the list of specialized functions with
+    guards
+
+* Base ``Guard`` type which can be used as parent type to implement
+  guards. It requires implementing a ``check()`` function, with an
+  optional ``first_check()`` function. API:
+
+  * ``int check(PyObject *guard, PyObject **stack)``: return 1 on
+    success, 0 if the guard failed temporarily, -1 if the guard will
+    always fail
+  * ``int first_check(PyObject *guard, PyObject *func)``: return 0 on
+    success, -1 if the guard will always fail
+
+Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
+of 3 runs):
+
+* Original Python: 79 ns
+* Patched Python: 79 ns
+
+According to this microbenchmark, the changes have no overhead on calling
+a Python function without specialization.
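The dispatch logic and the two proposed methods can be tried out with a
pure-Python stand-in. This is a minimal sketch under stated assumptions:
``Specializable`` and ``GuardBuiltin`` are hypothetical names, the guard
exposes a Python-level ``check(args, kwargs)`` instead of the C signature
listed above, and it watches only the builtins namespace (not the module
globals). Real functions have no ``specialize()`` method without the patch.

```python
import builtins

class GuardBuiltin:
    """Hypothetical guard: holds while builtins.<name> is unchanged."""

    def __init__(self, name):
        self.name = name
        self.expected = getattr(builtins, name)

    def check(self, args, kwargs):
        # 1: guard holds; -1: the builtin was replaced, guard always fails
        return 1 if getattr(builtins, self.name) is self.expected else -1


class Specializable:
    """Stand-in for a function offering specialize()/get_specialized()."""

    def __init__(self, func):
        self.func = func
        self._specialized = []  # list of (callable, guards) pairs

    def specialize(self, fast, guards):
        self._specialized.append((fast, list(guards)))

    def get_specialized(self):
        return self._specialized

    def __call__(self, *args, **kwargs):
        index = 0
        while index < len(self._specialized):
            fast, guards = self._specialized[index]
            results = [guard.check(args, kwargs) for guard in guards]
            if any(result == -1 for result in results):
                del self._specialized[index]   # will always fail: drop it
            elif all(result == 1 for result in results):
                return fast(*args, **kwargs)   # all guards hold
            else:
                index += 1                     # failed temporarily
        return self.func(*args, **kwargs)      # fall back to the original


@Specializable
def func():
    return chr(65)

func.specialize(lambda: "A", [GuardBuiltin("chr")])
first = func()
n_before = len(func.get_specialized())

original_chr = builtins.chr
builtins.chr = lambda obj: "mock"
try:
    second = func()
finally:
    builtins.chr = original_chr  # restore the real builtin
n_after = len(func.get_specialized())
print(first, n_before, second, n_after)
```

The wrapper adds a Python-level call on every invocation, so it shows the
behaviour only; the speedups quoted in the examples depend on doing this
dispatch in C.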
+
+
+Behaviour
+=========
+
+When a function code is replaced (``func.__code__ = new_code``), all
+specialized functions are removed.
+
+When a function is serialized (by ``marshal`` or ``pickle`` for
+example), specialized functions and guards are ignored (not serialized).
+
+
+Discussion
+==========
+
+Thread on the python-ideas mailing list: `RFC: PEP: Specialized
+functions with guards
+`_.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.