PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov, Elvis Pranskevichus
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017


Abstract
========

This PEP adds a new generic mechanism of ensuring consistent access
to non-local state in the context of out-of-order execution, such
as in Python generators and coroutines.

Thread-local storage, such as ``threading.local()``, is inadequate for
programs that execute concurrently in the same OS thread.  This PEP
proposes a solution to this problem.


Rationale
=========

Prior to the advent of asynchronous programming in Python, programs
used OS threads to achieve concurrency.  The need for thread-specific
state was solved by ``threading.local()`` and its C-API equivalent,
``PyThreadState_GetDict()``.

A few examples of where thread-local storage (TLS) is commonly
relied upon:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``.

* Request-related data, such as security tokens and request
  data in web applications, language context for ``gettext`` etc.

* Profiling, tracing, and logging in large code bases.

Unfortunately, TLS does not work well for programs which execute
concurrently in a single thread.  A Python generator is the simplest
example of a concurrent program.
Consider the following::

    import decimal
    from decimal import Decimal

    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield Decimal(x) / Decimal(y)
            yield Decimal(x) / Decimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))

The expected value of ``items`` is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]

Rather surprisingly, the actual result is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.111111'), Decimal('0.222222'))]

This is because the decimal context is stored as a thread-local, so
concurrent iteration of the ``fractions()`` generator corrupts the
state.  A similar problem exists with coroutines.

Applications also often need to associate certain data with a given
thread of execution.  For example, a web application server commonly
needs access to the current HTTP request object.

The inadequacy of TLS in asynchronous code has led to the
proliferation of ad-hoc solutions, which are limited in scope and
do not support all required use cases.

The current status quo is that any library (including the standard
library) that relies on TLS is likely to be broken when used in
asynchronous code or with generators (see [3]_ as an example issue).

Some languages that support coroutines or generators recommend
passing the context manually as an argument to every function; see
[1]_ for an example.  This approach, however, has limited use for
Python, where there is a large ecosystem that was built to work with
a TLS-like context.  Furthermore, libraries like ``decimal`` or
``numpy`` rely on context implicitly in overloaded operator
implementations.

The .NET runtime, which has support for async/await, has a generic
solution for this problem, called ``ExecutionContext`` (see [2]_).

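To illustrate the remark that a similar problem exists with coroutines,
here is a minimal sketch that is not part of the PEP.  The ``handle()``
coroutine and the ``request_id`` attribute are hypothetical names chosen
for illustration; the point is only that two coroutines interleaved in
one OS thread share the same ``threading.local()`` storage.

```python
import asyncio
import threading

# Hypothetical per-request state kept in thread-local storage.
_tls = threading.local()


async def handle(request_id, seen):
    _tls.request_id = request_id
    # Yield to the event loop so that the other coroutine gets to run.
    await asyncio.sleep(0)
    # Record what this coroutine observes after being resumed.
    seen[request_id] = _tls.request_id


async def main():
    seen = {}
    await asyncio.gather(handle('A', seen), handle('B', seen))
    return seen


seen = asyncio.run(main())
```

Both coroutines run in the same thread, so the value stored by the
second one overwrites the value stored by the first, and request ``'A'``
observes the identifier of request ``'B'`` once it resumes.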

Goals
=====

The goal of this PEP is to provide a more reliable
``threading.local()`` alternative, which:

* provides the mechanism and the API to fix non-local state issues
  with coroutines and generators;

* has no or negligible performance impact on the existing code or
  the code that will be using the new mechanism, including
  libraries like ``decimal`` and ``numpy``.


High-Level Specification
========================

The full specification of this PEP is broken down into three parts:

* High-Level Specification (this section): the description of the
  overall solution.  We show how it applies to generators and
  coroutines in user code, without delving into implementation
  details.

* Detailed Specification: the complete description of new concepts,
  APIs, and related changes to the standard library.

* Implementation Details: the description and analysis of data
  structures and algorithms used to implement this PEP, as well as
  the necessary changes to CPython.

For the purpose of this section, we define *execution context* as an
opaque container of non-local state that allows consistent access to
its contents in the concurrent execution environment.

A *context variable* is an object representing a value in the
execution context.  A new context variable is created by calling
the ``new_context_var()`` function.  A context variable object has
two methods:

* ``lookup()``: returns the value of the variable in the current
  execution context;

* ``set()``: sets the value of the variable in the current
  execution context.

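As a rough mental model of the two methods listed above, the following
sketch emulates context variables in pure Python.  It is an assumption
made for illustration, not the proposed implementation: the execution
context is modelled as a per-thread list of plain dictionaries, and the
helper ``_ec_stack()`` is a hypothetical name.

```python
import threading

# Assumption: one stack of logical contexts (plain dicts) per OS thread.
_thread_state = threading.local()


def _ec_stack():
    """Return the current thread's stack of logical contexts."""
    if not hasattr(_thread_state, "stack"):
        _thread_state.stack = [{}]
    return _thread_state.stack


class ContextVar:
    def lookup(self):
        # Search from the topmost logical context downwards.
        for lc in reversed(_ec_stack()):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        # Writes always go to the topmost logical context.
        _ec_stack()[-1][self] = value


def new_context_var():
    return ContextVar()


var = new_context_var()
var.set("main")

# A new thread starts with an empty execution context in this model.
seen_in_thread = []
worker = threading.Thread(
    target=lambda: seen_in_thread.append(var.lookup()))
worker.start()
worker.join()
```

In this model the main thread still reads ``"main"`` after the worker
finishes, while the worker thread finds no value and gets ``None``.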

Regular Single-threaded Code
----------------------------

In regular, single-threaded code that doesn't involve generators or
coroutines, context variables behave like globals::

    var = new_context_var()

    def sub():
        assert var.lookup() == 'main'
        var.set('sub')

    def main():
        var.set('main')
        sub()
        assert var.lookup() == 'sub'


Multithreaded Code
------------------

In multithreaded code, context variables behave like thread locals::

    var = new_context_var()

    def sub():
        assert var.lookup() is None  # The execution context is empty
                                     # for each new thread.
        var.set('sub')

    def main():
        var.set('main')

        thread = threading.Thread(target=sub)
        thread.start()
        thread.join()

        assert var.lookup() == 'main'


Generators
----------

In generators, changes to context variables are local and are not
visible to the caller, but are visible to the code called by the
generator.  Once set in the generator, the context variable is
guaranteed not to change between iterations::

    var = new_context_var()

    def gen():
        var.set('gen')
        assert var.lookup() == 'gen'
        yield 1
        assert var.lookup() == 'gen'
        yield 2

    def main():
        var.set('main')

        g = gen()
        next(g)
        assert var.lookup() == 'main'

        var.set('main modified')
        next(g)
        assert var.lookup() == 'main modified'

Changes to the caller's context variables are visible to the generator
(unless they were also modified inside the generator)::

    var = new_context_var()

    def gen():
        assert var.lookup() == 'var'
        yield 1
        assert var.lookup() == 'var modified'
        yield 2

    def main():
        g = gen()

        var.set('var')
        next(g)

        var.set('var modified')
        next(g)

Now, let's revisit the decimal precision example from the `Rationale`_
section, and see how the execution context can improve the situation::

    import decimal

    decimal_prec = new_context_var()  # create a new context variable

    # Pre-PEP 550 Decimal relies on TLS for its context.
    # This subclass switches the decimal context storage
    # to the execution context for illustration purposes.
    #
    class MyDecimal(decimal.Decimal):
        def __init__(self, value="0"):
            prec = decimal_prec.lookup()
            if prec is None:
                raise ValueError('could not find decimal precision')
            context = decimal.Context(prec=prec)
            super().__init__(value, context=context)

    def fractions(precision, x, y):
        # Normally, this would be set by a context manager,
        # but for simplicity we do this directly.
        decimal_prec.set(precision)

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))

The value of ``items`` is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]

which matches the expected result.


Coroutines and Asynchronous Tasks
---------------------------------

In coroutines, like in generators, context variable changes are local
and are not visible to the caller::

    import asyncio

    var = new_context_var()

    async def sub():
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'

    async def main():
        var.set('main')
        await sub()
        assert var.lookup() == 'main'

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

To establish the full semantics of execution context in coroutines,
we must also consider *tasks*.  A task is the abstraction used by
*asyncio*, and other similar libraries, to manage the concurrent
execution of coroutines.  In the example above, a task is created
implicitly by the ``run_until_complete()`` function.
``asyncio.wait_for()`` is another example of implicit task creation::

    async def sub():
        await asyncio.sleep(1)
        assert var.lookup() == 'main'

    async def main():
        var.set('main')

        # waiting for sub() directly
        await sub()

        # waiting for sub() with a timeout
        await asyncio.wait_for(sub(), timeout=2)

        var.set('main changed')

Intuitively, we expect the assertion in ``sub()`` to hold true in both
invocations, even though the ``wait_for()`` implementation actually
spawns a task, which runs ``sub()`` concurrently with ``main()``.

Thus, tasks **must** capture a snapshot of the current execution
context at the moment of their creation and use it to execute the
wrapped coroutine whenever that happens.  If this is not done, then
innocuous looking changes like wrapping a coroutine in a ``wait_for()``
call would cause surprising breakage.  This leads to the following::

    import asyncio

    var = new_context_var()

    async def sub():
        # Sleeping will make sub() run after
        # `var` is modified in main().
        await asyncio.sleep(1)

        assert var.lookup() == 'main'

    async def main():
        var.set('main')
        loop.create_task(sub())  # schedules asynchronous execution
                                 # of sub().
        assert var.lookup() == 'main'
        var.set('main changed')

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

In the above code we show how ``sub()``, running in a separate task,
sees the value of ``var`` as it was when ``loop.create_task(sub())``
was called.

Like tasks, the intuitive behaviour of callbacks scheduled with either
``Loop.call_soon()``, ``Loop.call_later()``, or
``Future.add_done_callback()`` is to also capture a snapshot of the
current execution context at the point of scheduling, and use it to
run the callback::

    current_request = new_context_var()

    def log_error(e):
        logging.error('error when handling request %r',
                      current_request.lookup())

    async def render_response():
        ...

    async def handle_get_request(request):
        current_request.set(request)

        try:
            return await render_response()
        except Exception as e:
            get_event_loop().call_soon(log_error, e)
            return '500 - Internal Server Error'


Detailed Specification
======================

Conceptually, an *execution context* (EC) is a stack of logical
contexts.  There is one EC per Python thread.

A *logical context* (LC) is a mapping of context variables to their
values in that particular LC.

A *context variable* is an object representing a value in the
execution context.  A new context variable object is created by calling
the ``sys.new_context_var(name: str)`` function.
The value of the ``name`` argument is not used by the EC machinery,
but may be used for debugging and introspection.

The context variable object has the following methods and attributes:

* ``name``: the value passed to ``new_context_var()``.

* ``lookup()``: traverses the execution context top-to-bottom,
  until the variable value is found.  Returns ``None``, if the variable
  is not present in the execution context;

* ``set()``: sets the value of the variable in the topmost logical
  context.


Generators
----------

When created, each generator object has an empty logical context object
stored in its ``__logical_context__`` attribute.  This logical context
is pushed onto the execution context at the beginning of each generator
iteration and popped at the end::

    var1 = sys.new_context_var('var1')
    var2 = sys.new_context_var('var2')

    def gen():
        var1.set('var1-gen')
        var2.set('var2-gen')

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]
        n = nested_gen()  # nested_gen_LC is created
        next(n)
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]

        var1.set('var1-gen-mod')
        var2.set('var2-gen-mod')
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'})
        # ]
        next(n)

        yield  # gen() needs a yield to be a generator function

    def nested_gen():
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC()
        # ]
        assert var1.lookup() == 'var1-gen'
        assert var2.lookup() == 'var2-gen'

        var1.set('var1-nested-gen')
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]

        yield

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]
        assert var1.lookup() == 'var1-nested-gen'
        assert var2.lookup() == 'var2-gen-mod'

        yield

    # EC = [outer_LC()]

    g = gen()  # gen_LC is created for the generator object `g`
    list(g)

    # EC = [outer_LC()]

The snippet above shows the state of the execution context stack
throughout the generator lifespan.


contextlib.contextmanager
-------------------------

Earlier, we used the following example::

    import decimal

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_prec')

    # ...

    def fractions(precision, x, y):
        decimal_prec.set(precision)

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

Let's extend it by adding a context manager::

    @contextlib.contextmanager
    def precision_context(prec):
        old_prec = decimal_prec.lookup()

        try:
            decimal_prec.set(prec)
            yield
        finally:
            decimal_prec.set(old_prec)

Unfortunately, this would not work straight away, as the modification
to the ``decimal_prec`` variable is confined to the
``precision_context()`` generator, and therefore will not be visible
inside the ``with`` block::

    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC becomes [{}, {}, {decimal_prec: precision}] in the
            # *precision_context()* generator,
            # but here the EC is still [{}, {}]

            # raises ValueError('could not find decimal precision')!
            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)

The way to fix this is to set the generator's ``__logical_context__``
attribute to ``None``.  This will cause the generator to avoid
modifying the execution context stack.

We modify the ``contextlib.contextmanager()`` decorator to
set ``genobj.__logical_context__`` to ``None`` to produce
well-behaved context managers::

    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC = [{}, {decimal_prec: precision}]

            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)

        # EC becomes [{}, {decimal_prec: None}]


asyncio
-------

``asyncio`` uses ``Loop.call_soon``, ``Loop.call_later``,
and ``Loop.call_at`` to schedule the asynchronous execution of a
function.  ``asyncio.Task`` uses ``call_soon()`` to further the
execution of the wrapped coroutine.

We modify ``Loop.call_{at,later,soon}`` to accept the new
optional *execution_context* keyword argument, which defaults to
a copy of the current execution context::

    def call_soon(self, callback, *args, execution_context=None):
        if execution_context is None:
            execution_context = sys.get_execution_context()

        # ... some time later

        sys.run_with_execution_context(
            execution_context, callback, *args)

The ``sys.get_execution_context()`` function returns a shallow copy
of the current execution context.  By shallow copy here we mean such
a new execution context that:

* lookups in the copy provide the same results as in the original
  execution context, and

* any changes in the original execution context do not affect the
  copy, and

* any changes to the copy do not affect the original execution
  context.

Either of the following satisfies the copy requirements:

* a new stack with shallow copies of logical contexts;

* a new stack with one squashed logical context.

The ``sys.run_with_execution_context(ec, func, *args, **kwargs)``
function runs ``func(*args, **kwargs)`` with *ec* as the execution
context.  The function performs the following steps:

1. Set *ec* as the current execution context stack in the current
   thread.

2. Push an empty logical context onto the stack.

3. Run ``func(*args, **kwargs)``.

4. Pop the logical context from the stack.

5. Restore the original execution context stack.

6. Return or raise the ``func()`` result.

These steps ensure that *ec* cannot be modified by *func*,
which makes ``run_with_execution_context()`` idempotent.

``asyncio.Task`` is modified as follows::

    class Task:
        def __init__(self, coro):
            ...
            # Get the current execution context snapshot.
            self._exec_context = sys.get_execution_context()

            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)

        def _step(self, exc=None):
            ...
            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)
            ...

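The copy requirements and the six steps above can be sketched in pure
Python.  This is an illustrative model under stated assumptions, not the
proposed ``sys`` API: the execution context is a module-level list of
dictionaries, the copy is implemented as "a new stack with shallow
copies of logical contexts", and the ``callback()`` function is a
hypothetical example.

```python
_current_ec = [{}]  # current execution context: list of dict-based LCs


class ContextVar:
    def lookup(self):
        for lc in reversed(_current_ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        _current_ec[-1][self] = value


def get_execution_context():
    # A new stack with shallow copies of the logical contexts.
    return [dict(lc) for lc in _current_ec]


def run_with_execution_context(ec, func, *args, **kwargs):
    global _current_ec
    saved = _current_ec
    # Steps 1-2: install *ec* with a fresh, empty LC on top.  Building a
    # new list keeps *ec* itself untouched.
    _current_ec = list(ec) + [{}]
    try:
        # Steps 3 and 6: run func and return (or raise) its result.
        return func(*args, **kwargs)
    finally:
        # Steps 4-5: drop the temporary stack and restore the original.
        _current_ec = saved


var = ContextVar()
var.set("main")
snapshot = get_execution_context()
var.set("main changed")


def callback():
    seen = var.lookup()
    var.set("callback")  # lands in the temporary top LC only
    return seen


first = run_with_execution_context(snapshot, callback)
second = run_with_execution_context(snapshot, callback)
```

Running the callback twice against the same snapshot yields ``"main"``
both times, because the value set inside the callback is discarded with
the temporary logical context, and the caller's own value is unaffected.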

Generators Transformed into Iterators
-------------------------------------

Any Python generator can be represented as an equivalent iterator.
Compilers like Cython rely on this axiom.  With respect to the
execution context, such an iterator should behave the same way as the
generator it represents.

This means that there needs to be a Python API to create new logical
contexts and run code with a given logical context.

The ``sys.new_logical_context()`` function creates a new empty
logical context.

The ``sys.run_with_logical_context(lc, func, *args, **kwargs)``
function can be used to run functions in the specified logical context.
The *lc* can be modified as a result of the call.

The ``sys.run_with_logical_context()`` function performs the following
steps:

1. Push *lc* onto the current execution context stack.

2. Run ``func(*args, **kwargs)``.

3. Pop *lc* from the execution context stack.

4. Return or raise the ``func()`` result.

By using ``new_logical_context()`` and ``run_with_logical_context()``,
we can replicate the generator behaviour like this::

    class Generator:

        def __init__(self):
            self.logical_context = sys.new_logical_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...

Let's see how this pattern can be applied to a real generator::

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_precision')

    def gen_series(n, precision):
        decimal_prec.set(precision)

        for i in range(1, n):
            yield MyDecimal(i) / MyDecimal(3)

    # gen_series is equivalent to the following iterator:

    class Series:

        def __init__(self, n, precision):
            # Create a new empty logical context on creation,
            # like the generators do.
            self.logical_context = sys.new_logical_context()

            # run_with_logical_context() pushes
            # self.logical_context onto the execution context stack,
            # runs self._init, and pops self.logical_context
            # from the stack.
            sys.run_with_logical_context(
                self.logical_context, self._init, n, precision)

        def _init(self, n, precision):
            self.i = 1
            self.n = n
            self.precision = precision
            decimal_prec.set(precision)

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Mirror range(1, n) in gen_series().
            if self.i >= self.n:
                raise StopIteration
            decimal_prec.set(self.precision)
            result = MyDecimal(self.i) / MyDecimal(3)
            self.i += 1
            return result

For regular iterators such an approach to logical context management is
normally not necessary, and it is recommended to set and restore
context variables directly in ``__next__``::

    class Series:

        def __next__(self):
            old_prec = decimal_prec.lookup()

            try:
                decimal_prec.set(self.precision)
                ...
            finally:
                decimal_prec.set(old_prec)


Asynchronous Generators
-----------------------

The execution context semantics in asynchronous generators does not
differ from that of regular generators and coroutines.


Implementation
==============

Execution context is implemented as an immutable linked list of
logical contexts, where each logical context is an immutable weak key
mapping.  A pointer to the currently active execution context is stored
in the OS thread state::

                      +-----------------+
                      |                 |     ec
                      |  PyThreadState  +-------------+
                      |                 |             |
                      +-----------------+             |
                                                      |
    ec_node             ec_node             ec_node   v
    +------+------+     +------+------+     +------+------+
    | NULL |  lc  |<----| prev |  lc  |<----| prev |  lc  |
    +------+--+---+     +------+--+---+     +------+--+---+
              |                   |                   |
    LC        v         LC        v         LC        v
    +-------------+     +-------------+     +-------------+
    | var1: obj1  |     |    EMPTY    |     | var1: obj4  |
    | var2: obj2  |     +-------------+     +-------------+
    | var3: obj3  |
    +-------------+

The choice of the immutable list of immutable mappings as a fundamental
data structure is motivated by the need to efficiently implement
``sys.get_execution_context()``, which is to be frequently used by
asynchronous tasks and callbacks.

When the EC is immutable, ``get_execution_context()`` can simply copy
the current execution context *by reference*::

    def get_execution_context():
        return PyThreadState_Get().ec

Let's review all possible context modification scenarios:

* The ``ContextVar.set()`` method is called::

      def ContextVar_set(self, val):
          # See a more complete set() definition
          # in the `Context Variables` section.

          tstate = PyThreadState_Get()
          top_ec_node = tstate.ec
          top_lc = top_ec_node.lc
          new_top_lc = top_lc.set(self, val)
          tstate.ec = ec_node(
              prev=top_ec_node.prev,
              lc=new_top_lc)

* The ``sys.run_with_logical_context()`` is called, in which case
  the passed logical context object is appended to the execution
  context::

      def run_with_logical_context(lc, func, *args, **kwargs):
          tstate = PyThreadState_Get()

          old_top_ec_node = tstate.ec
          new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)

          try:
              tstate.ec = new_top_ec_node
              return func(*args, **kwargs)
          finally:
              tstate.ec = old_top_ec_node

* The ``sys.run_with_execution_context()`` is called, in which case
  the current execution context is set to the passed execution context
  with a new empty logical context appended to it::

      def run_with_execution_context(ec, func, *args, **kwargs):
          tstate = PyThreadState_Get()

          old_top_ec_node = tstate.ec
          new_lc = sys.new_logical_context()
          new_top_ec_node = ec_node(prev=ec, lc=new_lc)

          try:
              tstate.ec = new_top_ec_node
              return func(*args, **kwargs)
          finally:
              tstate.ec = old_top_ec_node

* One of ``genobj.send()``, ``genobj.throw()``, or ``genobj.close()``
  is called on a ``genobj`` generator, in which case the logical
  context recorded in ``genobj`` is pushed onto the stack::

      PyGen_New(PyGenObject *gen):
          gen.__logical_context__ = sys.new_logical_context()

      gen_send(PyGenObject *gen, ...):
          tstate = PyThreadState_Get()

          if gen.__logical_context__ is not None:
              old_top_ec_node = tstate.ec
              new_top_ec_node = ec_node(
                  prev=old_top_ec_node,
                  lc=gen.__logical_context__)

              try:
                  tstate.ec = new_top_ec_node
                  return _gen_send_impl(gen, ...)
              finally:
                  gen.__logical_context__ = tstate.ec.lc
                  tstate.ec = old_top_ec_node
          else:
              return _gen_send_impl(gen, ...)

* Coroutines and asynchronous generators share the implementation
  with generators, and the above changes apply to them as well.

In certain scenarios the EC may need to be squashed to limit the
size of the chain.  For example, consider the following corner case::

    async def repeat(coro, delay):
        await coro()
        await asyncio.sleep(delay)
        loop.create_task(repeat(coro, delay))

    async def ping():
        print('ping')

    loop = asyncio.get_event_loop()
    loop.create_task(repeat(ping, 1))
    loop.run_forever()

In the above code, the EC chain will grow as long as ``repeat()`` is
called.  Each new task will call ``sys.run_with_execution_context()``,
which will append a new logical context to the chain.  To prevent
unbounded growth, ``sys.get_execution_context()`` checks if the chain
is longer than a predetermined maximum, and if it is, squashes the
chain into a single LC::

    def get_execution_context():
        tstate = PyThreadState_Get()

        if tstate.ec_len > EC_LEN_MAX:
            squashed_lc = sys.new_logical_context()

            # Walk the chain from the topmost node downwards.  A
            # separate name avoids shadowing the ec_node() constructor.
            node = tstate.ec
            while node:
                # The LC.merge() method does not replace
                # existing keys.
                squashed_lc = squashed_lc.merge(node.lc)
                node = node.prev

            return ec_node(prev=NULL, lc=squashed_lc)
        else:
            return tstate.ec


Logical Context
---------------

Logical context is an immutable weak key mapping which has the
following properties with respect to garbage collection:

* ``ContextVar`` objects are strongly-referenced only from the
  application code, not from any of the Execution Context machinery
  or values they point to.  This means that there are no reference
  cycles that could extend their lifespan longer than necessary, or
  prevent their collection by the GC.

* Values put in the Execution Context are guaranteed to be kept
  alive while there is a ``ContextVar`` key referencing them in
  the thread.

* If a ``ContextVar`` is garbage collected, all of its values will
  be removed from all contexts, allowing them to be GCed if needed.

* If a thread has ended its execution, its thread state will be
  cleaned up along with its ``ExecutionContext``, cleaning
  up all values bound to all context variables in the thread.

As discussed earlier, we need ``sys.get_execution_context()`` to be
consistently fast regardless of the size of the execution context, so
logical context is necessarily an immutable mapping.

Choosing ``dict`` for the underlying implementation is suboptimal,
because ``LC.set()`` will cause ``dict.copy()``, which is an O(N)
operation, where *N* is the number of items in the LC.

``get_execution_context()``, when squashing the EC, is an O(M)
operation, where *M* is the total number of context variable values
in the EC.

So, instead of ``dict``, we choose Hash Array Mapped Trie (HAMT)
as the underlying implementation of logical contexts.  (Scala and
Clojure use HAMT to implement high performance immutable collections
[5]_, [6]_.)

With HAMT ``.set()`` becomes an O(log N) operation, and
``get_execution_context()`` squashing is more efficient on average due
to structural sharing in HAMT.

See `Appendix: HAMT Performance Analysis`_ for a more elaborate
analysis of HAMT performance compared to ``dict``.

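To make the trade-off concrete, here is a small sketch of an immutable
execution context built from a linked list of read-only mappings.  It
deliberately uses the ``dict``-copying approach that the text above
calls suboptimal, so it is a stand-in and not the HAMT-based design.
The names ``ECNode``, ``lc_set()``, ``ec_set()`` and ``ec_lookup()``
are hypothetical, and plain strings stand in for ``ContextVar`` keys.

```python
from types import MappingProxyType
from typing import NamedTuple


class ECNode(NamedTuple):
    prev: object  # previous ECNode, or None at the bottom of the chain
    lc: object    # read-only mapping of keys to values


EMPTY_LC = MappingProxyType({})


def lc_set(lc, key, value):
    # Copy-on-write: O(N) in the number of items.  This is the cost
    # that a HAMT would reduce to O(log N).
    new_items = dict(lc)
    new_items[key] = value
    return MappingProxyType(new_items)


def ec_set(ec, key, value):
    # Replace the top node with one that holds the updated LC.
    top_lc = ec.lc if ec is not None else EMPTY_LC
    prev = ec.prev if ec is not None else None
    return ECNode(prev=prev, lc=lc_set(top_lc, key, value))


def ec_lookup(ec, key):
    node = ec
    while node is not None:
        if key in node.lc:
            return node.lc[key]
        node = node.prev
    return None


ec = ec_set(None, "var1", "obj1")
snapshot = ec  # "copying" the EC is just taking a reference
ec = ec_set(ec, "var1", "obj4")
```

Because no node or mapping is ever mutated, the snapshot keeps returning
``"obj1"`` for ``"var1"`` after the current context has moved on to
``"obj4"``, which is the property that lets a snapshot be taken by
reference.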

Context Variables
-----------------

The ``ContextVar.lookup()`` and ``ContextVar.set()`` methods are
implemented as follows (in pseudo-code)::

    class ContextVar:

        def lookup(self):
            tstate = PyThreadState_Get()

            ec_node = tstate.ec
            while ec_node:
                if self in ec_node.lc:
                    return ec_node.lc[self]
                ec_node = ec_node.prev

            return None

        def set(self, value):
            tstate = PyThreadState_Get()
            top_ec_node = tstate.ec

            if top_ec_node is not None:
                top_lc = top_ec_node.lc
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=top_ec_node.prev,
                    lc=new_top_lc)
            else:
                top_lc = sys.new_logical_context()
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=NULL,
                    lc=new_top_lc)

For efficient access in performance-sensitive code paths, such as in
``numpy`` and ``decimal``, we add a cache to ``ContextVar.lookup()``,
making it an O(1) operation when the cache is hit.  The cache key is
composed from the following:

* The new ``uint64_t PyThreadState->unique_id``, which is a globally
  unique thread state identifier.  It is computed from the new
  ``uint64_t PyInterpreterState->ts_counter``, which is incremented
  whenever a new thread state is created.

* The ``uint64_t ContextVar->version`` counter, which is incremented
  whenever the context variable value is changed in any logical
  context in any thread.

The cache is then implemented as follows::

    class ContextVar:

        def set(self, value):
            ...  # implementation
            self.version += 1

        def lookup(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                    self.last_version == self.version):
                return self.last_value

            value = self._lookup_uncached()

            self.last_value = value  # borrowed ref
            self.last_tstate_id = tstate.unique_id
            self.last_version = self.version

            return value

Note that ``last_value`` is a borrowed reference.  The assumption
is that if the version checks are fine, the object will be alive.
This allows the values of context variables to be properly garbage
collected.

This generic caching approach is similar to what the current C
implementation of ``decimal`` does to cache the current decimal
context, and has similar performance characteristics.


Performance Considerations
==========================

Tests of the reference implementation based on the prior
revisions of this PEP have shown 1-2% slowdown on generator
microbenchmarks and no noticeable difference in macrobenchmarks.

The performance of non-generator and non-async code is not
affected by this PEP.


Summary of the New APIs
=======================

Python
------

The following new Python APIs are introduced by this PEP:

1. The ``sys.new_context_var(name: str='...')`` function to create
   ``ContextVar`` objects.

2. The ``ContextVar`` object, which has:

   * the read-only ``.name`` attribute,
   * the ``.lookup()`` method which returns the value of the variable
     in the current execution context;
   * the ``.set()`` method which sets the value of the variable in
     the current execution context.

3. The ``sys.get_execution_context()`` function, which returns a
   copy of the current execution context.

4. The ``sys.new_execution_context()`` function, which returns a new
   empty execution context.

5. The ``sys.new_logical_context()`` function, which returns a new
   empty logical context.

6. The ``sys.run_with_execution_context(ec: ExecutionContext,
   func, *args, **kwargs)`` function, which runs *func* with the
   provided execution context.

7. The ``sys.run_with_logical_context(lc:LogicalContext,
   func, *args, **kwargs)`` function, which runs *func* with the
   provided logical context on top of the current execution context.


C API
-----

1. ``PyContextVar * PyContext_NewVar(char *desc)``: create a
   ``PyContextVar`` object.

2. ``PyObject * PyContext_LookupVar(PyContextVar *)``: return
   the value of the variable in the current execution context.

3. ``int PyContext_SetVar(PyContextVar *, PyObject *)``: set
   the value of the variable in the current execution context.

4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty
   ``PyLogicalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new
   empty ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: return the
   current execution context.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
   passed EC object as the current one for the active thread state.

8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *,
   PyLogicalContext *)``: allows implementing the
   ``sys.run_with_logical_context`` Python API.


Design Considerations
=====================

Should ``PyThreadState_GetDict()`` use the execution context?
-------------------------------------------------------------

No.  ``PyThreadState_GetDict`` is based on TLS, and changing its
semantics will break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem, which
extends the context manager protocol with two new methods:
``__suspend__()`` and ``__resume__()``.  Similarly, the asynchronous
context manager protocol is also extended with ``__asuspend__()`` and
``__aresume__()``.

This allows implementing context managers that manage non-local state,
which behave correctly in generators and coroutines.

For example, consider the following context manager, which uses
execution state::

    class Context:

        def __init__(self):
            self.var = new_context_var('var')

        def __enter__(self):
            self.old_x = self.var.lookup()
            self.var.set('something')

        def __exit__(self, *err):
            self.var.set(self.old_x)

An equivalent implementation with PEP 521::

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

The downside of this approach is the addition of significant new
complexity to the context manager protocol and the interpreter
implementation.
This approach is also likely to negatively impact the performance of
generators and coroutines.

Additionally, the solution in :pep:`521` is limited to context
managers, and does not provide any mechanism to propagate state in
asynchronous tasks and callbacks.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

No.  Proper generator behaviour with respect to the execution context
requires changes to the interpreter.


Should we update sys.displayhook and other APIs to use EC?
----------------------------------------------------------

APIs like redirecting stdout by overwriting ``sys.stdout``, or
specifying new exception display hooks by overwriting the
``sys.displayhook`` function affect the whole Python process
**by design**.  Their users assume that the effect of changing
them will be visible across OS threads.  Therefore we cannot
just make these APIs use the new Execution Context.

That said, we think it is possible to design new APIs that will
be context aware, but that is outside of the scope of this PEP.


Greenlets
---------

Greenlet is an alternative implementation of cooperative
scheduling for Python.  Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

Conceptually, the behaviour of greenlets is very similar to that of
generators, which means that similar changes around greenlet entry
and exit can be done to add support for execution context.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Appendix: HAMT Performance Analysis
===================================

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

The above chart demonstrates that:

* HAMT displays near O(1) performance for all benchmarked
  dictionary sizes.
* ``dict.copy()`` becomes very slow around 100 items.

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
immutable mapping.  HAMT lookup time is 30-40% slower than Python dict
lookups on average, which is a very good result, considering that the
latter is very well optimized.

There is research [8]_ showing that further improvements to the
performance of HAMT are possible.

The reference implementation of HAMT for CPython can be found here:
[7]_.


Acknowledgments
===============

Thanks to Victor Petrovykh for countless discussions around the topic
and PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ``ContextVar`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
for coming up with the idea of having a stack of contexts in the
thread state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.


Version History
===============

1. Initial revision, posted on 11-Aug-2017 [20]_.

2. V2 posted on 15-Aug-2017 [21]_.

   The fundamental limitation that caused a complete redesign of the
   first version was that it was not possible to implement an iterator
   that would interact with the EC in the same way as generators
   (see [19]_).

   Version 2 was a complete rewrite, introducing new terminology
   (Local Context, Execution Context, Context Item) and new APIs.

3. V3 posted on 18-Aug-2017 [22]_.

   Updates:

   * Local Context was renamed to Logical Context.  The term "local"
     was ambiguous and conflicted with local name scopes.

   * Context Item was renamed to Context Key; see the thread with Nick
     Coghlan, Stefan Krah, and Yury Selivanov [23]_ for details.

   * Context Item get cache design was adjusted, per Nathaniel Smith's
     idea in [25]_.
   * Coroutines are created without a Logical Context; ceval loop
     no longer needs to special case the ``await`` expression
     (proposed by Nick Coghlan in [24]_.)

4. V4 posted on 25-Aug-2017: the current version.

   * The specification section has been completely rewritten.

   * Context Key renamed to Context Var.

   * Removed the distinction between generators and coroutines with
     respect to logical context isolation.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html

.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst

.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst

.. [22] https://github.com/python/peps/blob/287ed87bb475a7da657f950b353c71c1248f67e7/pep-0550.rst

.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html

.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html

.. [25] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: