From af7f732146453d0960e28852fb0bcc364ac682e6 Mon Sep 17 00:00:00 2001
From: Yury Selivanov
Date: Tue, 12 Dec 2017 12:11:31 -0500
Subject: [PATCH] Add PEP 567 -- Context Variables (#499)

---
 pep-0567.rst | 447 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 447 insertions(+)
 create mode 100644 pep-0567.rst

diff --git a/pep-0567.rst b/pep-0567.rst
new file mode 100644
index 000000000..36fabfa99
--- /dev/null
+++ b/pep-0567.rst
@@ -0,0 +1,447 @@
+PEP: 567
+Title: Context Variables
+Version: $Revision$
+Last-Modified: $Date$
+Author: Yury Selivanov
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 12-Dec-2017
+Python-Version: 3.7
+Post-History: 12-Dec-2017
+
+
+Abstract
+========
+
+This PEP proposes the new ``contextvars`` module and a set of new
+CPython C APIs to support context variables. This concept is
+similar to thread-local variables but, unlike TLS, it allows
+correctly keeping track of values per asynchronous task, e.g.
+``asyncio.Task``.
+
+This proposal builds directly upon concepts originally introduced
+in :pep:`550`. The key difference is that this PEP is only concerned
+with solving the case for asynchronous tasks, and not generators.
+There are no proposed modifications to any built-in types or to the
+interpreter.
+
+
+Rationale
+=========
+
+Thread-local variables are insufficient for asynchronous tasks which
+execute concurrently in the same OS thread. Any context manager that
+needs to save and restore a context value and uses
+``threading.local()`` will have its context values bleed to other
+code unexpectedly when used in async/await code.
+
+A few examples where having a working context local storage for
+asynchronous code is desired:
+
+* Context managers like decimal contexts and ``numpy.errstate``.
+
+* Request-related data, such as security tokens and request
+  data in web applications, language context for ``gettext``, etc.
+
+* Profiling, tracing, and logging in large code bases.
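The bleed described in the Rationale can be reproduced in a few lines. The following is a minimal sketch, assuming Python 3.7 or later where the ``contextvars`` module proposed here is available in the standard library; the names ``handler``, ``request_id`` and ``seen`` are hypothetical and chosen only for illustration.

```python
import asyncio
import contextvars
import threading

# Two ways of storing "the current request": thread-local vs. context variable.
tls = threading.local()
cvar = contextvars.ContextVar("request_id")


async def handler(request_id, seen):
    # Store the same value in both places.
    tls.request_id = request_id
    cvar.set(request_id)
    # Yield to the event loop so the other task gets to run in the same thread.
    await asyncio.sleep(0)
    # Record what each storage reports after resuming.
    seen.append((request_id, tls.request_id, cvar.get()))


async def main():
    seen = []
    await asyncio.gather(handler("A", seen), handler("B", seen))
    return seen


seen = asyncio.run(main())
for expected, from_tls, from_cvar in seen:
    print(expected, from_tls, from_cvar)
```

Both tasks share one OS thread, so the thread-local slot holds whatever the most recently running task wrote, while each task sees its own value of the context variable.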
+
+
+Introduction
+============
+
+The PEP proposes a new mechanism for managing context variables.
+The key classes involved in this mechanism are ``contextvars.Context``
+and ``contextvars.ContextVar``. The PEP also proposes some policies
+for using the mechanism around asynchronous tasks.
+
+The proposed mechanism for accessing context variables uses the
+``ContextVar`` class. A module (such as ``decimal``) that wishes to
+store a context variable should:
+
+* declare a module-global variable holding a ``ContextVar`` to
+  serve as a "key";
+
+* access the current value via the ``get()`` method on the
+  key variable;
+
+* modify the current value via the ``set()`` method on the
+  key variable.
+
+The notion of "current value" deserves special consideration:
+different asynchronous tasks that exist and execute concurrently
+may have different values. This idea is well known from thread-local
+storage, but in this case the value is not necessarily local
+to a thread. Instead, there is the notion of the
+"current ``Context``", which is stored in thread-local storage and
+is accessed via the ``contextvars.get_context()`` function.
+Manipulation of the current ``Context`` is the responsibility of the
+task framework, e.g. asyncio.
+
+A ``Context`` is conceptually a mapping, implemented using an
+immutable dictionary. The ``ContextVar.get()`` method does a
+lookup in the current ``Context`` with ``self`` as a key. If the key
+is missing, it returns the default value specified in the
+constructor, or raises a ``LookupError`` if there is none.
+
+The ``ContextVar.set(value)`` method clones the current ``Context``,
+assigns the ``value`` to it with ``self`` as a key, and sets the
+new ``Context`` as the current one. Because ``Context`` uses an
+immutable dictionary, cloning it is O(1).
+
+
+Specification
+=============
+
+A new standard library module ``contextvars`` is added with the
+following APIs:
+
+1. The ``get_context() -> Context`` function, used to get the current
+   ``Context`` object for the current OS thread.
+
+2. The ``ContextVar`` class, used to declare and access context variables.
+
+3. The ``Context`` class, which encapsulates context state. Every
+   OS thread stores a reference to its current ``Context`` instance.
+   It is not possible to control that reference manually.
+   Instead, the ``Context.run(callable, *args)`` method is used to run
+   Python code in another context.
+
+
+contextvars.ContextVar
+----------------------
+
+The ``ContextVar`` class has the following constructor signature:
+``ContextVar(name, *, default=no_default)``. The ``name`` parameter
+is used only for introspection and debugging purposes. The ``default``
+parameter is optional. Example::
+
+    # Declare a context variable 'var' with the default value 42.
+    var = ContextVar('var', default=42)
+
+``ContextVar.get()`` returns the value of the context variable from the
+current ``Context``::
+
+    # Get the value of `var`.
+    var.get()
+
+``ContextVar.set(value) -> Token`` is used to set a new value for
+the context variable in the current ``Context``::
+
+    # Set the variable 'var' to 1 in the current context.
+    var.set(1)
+
+``contextvars.Token`` is an opaque object that should be used to
+restore the ``ContextVar`` to its previous value, or remove it from
+the context if it was not set before. The ``ContextVar.reset(token)``
+method is used for that::
+
+    old = var.set(1)
+    try:
+        ...
+    finally:
+        var.reset(old)
+
+The ``Token`` API exists to make the current proposal forward
+compatible with :pep:`550`, in case there is demand to support
+context variables in generators and asynchronous generators in the
+future.
+
+The ``ContextVar`` design allows for a fast implementation of
+``ContextVar.get()``, which is particularly important for modules
+like ``decimal`` and ``numpy``.
+
+
+contextvars.Context
+-------------------
+
+``Context`` objects are mappings of ``ContextVar`` objects to values.
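The ``ContextVar`` calls described above, and the view of a ``Context`` as a mapping keyed by ``ContextVar`` objects, can be tried out together. This is a minimal sketch, assuming the ``contextvars`` module as released in Python 3.7 and later, where the snapshot function is named ``copy_context()`` rather than the ``get_context()`` used in this draft; the variable names are illustrative only.

```python
import contextvars

# A variable with a default, as in the constructor example above.
var = contextvars.ContextVar("var", default=42)
before = var.get()            # falls back to the default

# set() returns a Token that reset() uses to undo the change.
token = var.set(1)
try:
    inside = var.get()
finally:
    var.reset(token)
after = var.get()             # the default again, since 'var' was unset before

# A second variable that stays set, to show the mapping view.
other = contextvars.ContextVar("other")
other.set("spam")

# Assumption: copy_context() is the released name for taking a snapshot.
snapshot = contextvars.copy_context()
print(snapshot[other], var in snapshot)
```

Because ``var`` had no value before ``set(1)``, resetting it removes it from the context altogether, which is why it does not appear as a key in the snapshot.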
+
+To get the current ``Context`` for the current OS thread, use the
+``contextvars.get_context()`` function::
+
+    ctx = contextvars.get_context()
+
+To run Python code in some ``Context``, use the ``Context.run()``
+method::
+
+    ctx.run(function)
+
+Any changes to any context variables that ``function`` causes will
+be contained in the ``ctx`` context::
+
+    var = ContextVar('var')
+    var.set('spam')
+
+    def function():
+        assert var.get() == 'spam'
+
+        var.set('ham')
+        assert var.get() == 'ham'
+
+    ctx = get_context()
+    ctx.run(function)
+
+    assert var.get() == 'spam'
+
+Any changes to the context will be contained and persisted in the
+``Context`` object on which ``run()`` is called.
+
+``Context`` objects implement the ``collections.abc.Mapping`` ABC.
+This can be used to introspect context objects::
+
+    ctx = contextvars.get_context()
+
+    # Print all context variables and their values in 'ctx':
+    print(ctx.items())
+
+    # Print the value of 'some_variable' in context 'ctx':
+    print(ctx[some_variable])
+
+
+asyncio
+-------
+
+``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``,
+and ``Loop.call_at()`` to schedule the asynchronous execution of a
+function. ``asyncio.Task`` uses ``call_soon()`` to run the
+wrapped coroutine.
+
+We modify ``Loop.call_{at,later,soon}`` to accept the new
+optional *context* keyword-only argument, which defaults to
+the current context::
+
+    def call_soon(self, callback, *args, context=None):
+        if context is None:
+            context = contextvars.get_context()
+
+        # ... some time later
+        context.run(callback, *args)
+
+Tasks in asyncio need to maintain their own isolated context.
+``asyncio.Task`` is modified as follows::
+
+    class Task:
+        def __init__(self, coro):
+            ...
+            # Get the current context snapshot.
+            self._context = contextvars.get_context()
+            self._loop.call_soon(self._step, context=self._context)
+
+        def _step(self, exc=None):
+            ...
+            # Every advance of the wrapped coroutine is done in
+            # the task's context.
+            self._loop.call_soon(self._step, context=self._context)
+            ...
+
+
+CPython C API
+-------------
+
+TBD
+
+
+Implementation
+==============
+
+This section explains high-level implementation details in
+pseudo-code. Some optimizations are omitted to keep this section
+short and clear.
+
+The internal immutable dictionary for ``Context`` is implemented
+using Hash Array Mapped Tries (HAMT). They allow for an O(log N) ``set``
+operation and an O(1) ``get_context()`` function. For the purposes
+of this section, we implement an immutable dictionary using
+``dict.copy()``::
+
+    class _ContextData:
+
+        def __init__(self):
+            self.__mapping = dict()
+
+        def get(self, key):
+            return self.__mapping[key]
+
+        def set(self, key, value):
+            copy = _ContextData()
+            copy.__mapping = self.__mapping.copy()
+            copy.__mapping[key] = value
+            return copy
+
+        def delete(self, key):
+            copy = _ContextData()
+            copy.__mapping = self.__mapping.copy()
+            del copy.__mapping[key]
+            return copy
+
+Every OS thread has a reference to the current ``_ContextData``.
+``PyThreadState`` is updated with a new ``context_data`` field that
+points to a ``_ContextData`` object::
+
+    PyThreadState:
+        context_data : _ContextData
+
+``contextvars.get_context()`` is implemented as follows::
+
+    def get_context():
+        ts : PyThreadState = PyThreadState_Get()
+
+        if ts.context_data is None:
+            ts.context_data = _ContextData()
+
+        ctx = Context()
+        ctx.__data = ts.context_data
+        return ctx
+
+``contextvars.Context`` is a wrapper around ``_ContextData``::
+
+    class Context(collections.abc.Mapping):
+
+        def __init__(self):
+            self.__data = _ContextData()
+
+        def run(self, callable, *args):
+            ts : PyThreadState = PyThreadState_Get()
+            saved_data : _ContextData = ts.context_data
+
+            try:
+                ts.context_data = self.__data
+                callable(*args)
+            finally:
+                self.__data = ts.context_data
+                ts.context_data = saved_data
+
+        # Mapping API methods are implemented by delegating
+        # `get()` and other Mapping calls to `self.__data`.
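The pseudo-code above is not directly executable, since it reaches into ``PyThreadState`` and uses name-mangled attributes across classes. The following is a runnable toy model of the same idea, offered as a sketch only: ``threading.local`` stands in for ``PyThreadState``, the helper names ``_state``, ``_current``, ``set_value`` and ``get_value`` are hypothetical, and ``run()`` returns the callable's result as a convenience that the pseudo-code does not specify.

```python
import threading

# Assumption: a threading.local object plays the role of PyThreadState.
_state = threading.local()


class ContextData:
    """Immutable mapping emulated by copying a dict on every write."""

    def __init__(self, mapping=None):
        self._mapping = dict(mapping or {})

    def get(self, key):
        return self._mapping[key]

    def set(self, key, value):
        new = dict(self._mapping)   # O(N) copy; a HAMT would avoid this
        new[key] = value
        return ContextData(new)


def _current():
    # Lazily create the per-thread data, like get_context() does above.
    if not hasattr(_state, "data"):
        _state.data = ContextData()
    return _state.data


class Context:
    def __init__(self, data=None):
        self._data = data if data is not None else ContextData()

    def run(self, func, *args):
        saved = _current()
        try:
            _state.data = self._data
            return func(*args)
        finally:
            # Persist changes in this Context, then restore the caller's data.
            self._data = _state.data
            _state.data = saved


def get_context():
    return Context(_current())


def set_value(key, value):
    _state.data = _current().set(key, value)


def get_value(key):
    return _current().get(key)


set_value("var", "spam")
ctx = get_context()


def function():
    set_value("var", "ham")
    return get_value("var")


result = ctx.run(function)
outer = get_value("var")        # unchanged outside of ctx.run()
inner = ctx._data.get("var")    # the change persisted inside ctx
print(result, outer, inner)
```

Because every write replaces the per-thread reference with a new object instead of mutating the old one, swapping the reference in ``run()`` is enough to isolate the callable's changes from the caller.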
+
+``contextvars.ContextVar`` interacts with
+``PyThreadState.context_data`` directly::
+
+    class ContextVar:
+
+        def __init__(self, name, *, default=NO_DEFAULT):
+            self.__name = name
+            self.__default = default
+
+        @property
+        def name(self):
+            return self.__name
+
+        def get(self, default=NO_DEFAULT):
+            ts : PyThreadState = PyThreadState_Get()
+            data : _ContextData = ts.context_data
+
+            try:
+                return data.get(self)
+            except KeyError:
+                pass
+
+            if default is not NO_DEFAULT:
+                return default
+
+            if self.__default is not NO_DEFAULT:
+                return self.__default
+
+            raise LookupError
+
+        def set(self, value):
+            ts : PyThreadState = PyThreadState_Get()
+            data : _ContextData = ts.context_data
+
+            try:
+                old_value = data.get(self)
+            except KeyError:
+                old_value = NO_VALUE
+
+            ts.context_data = data.set(self, value)
+            return Token(self, old_value)
+
+        def reset(self, token):
+            if token.__used:
+                return
+            ts : PyThreadState = PyThreadState_Get()
+            if token.__old_value is NO_VALUE:
+                ts.context_data = ts.context_data.delete(token.__var)
+            else:
+                ts.context_data = ts.context_data.set(token.__var,
+                                                      token.__old_value)
+
+            token.__used = True
+
+
+    class Token:
+
+        def __init__(self, var, old_value):
+            self.__var = var
+            self.__old_value = old_value
+            self.__used = False
+
+
+Backwards Compatibility
+=======================
+
+This proposal preserves 100% backwards compatibility.
+
+Libraries that use ``threading.local()`` to store context-related
+values currently work correctly only for synchronous code. Switching
+them to use the proposed API will keep their behavior for synchronous
+code unmodified, but will automatically enable support for
+asynchronous code.
+
+
+Appendix: HAMT Performance Analysis
+===================================
+
+.. figure:: pep-0550-hamt_vs_dict-v2.png
+   :align: center
+   :width: 100%
+
+   Figure 1. Benchmark code can be found here: [1]_.
+
+The above chart demonstrates that:
+
+* HAMT displays near O(1) performance for all benchmarked
+  dictionary sizes.
+
+* ``dict.copy()`` becomes very slow around 100 items.
+
+.. figure:: pep-0550-lookup_hamt.png
+   :align: center
+   :width: 100%
+
+   Figure 2. Benchmark code can be found here: [2]_.
+
+Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
+immutable mapping. HAMT lookup time is 30-40% slower than Python dict
+lookups on average, which is a very good result, considering that the
+latter is very well optimized.
+
+The reference implementation of HAMT for CPython can be found here:
+[3]_.
+
+
+References
+==========
+
+.. [1] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
+
+.. [2] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
+
+.. [3] https://github.com/1st1/cpython/tree/hamt
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: