PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury@magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017


Abstract
========

This PEP proposes a new mechanism to manage execution state--the
logical environment in which a function, a thread, a generator,
or a coroutine executes.

A few examples of where having a reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request
  data in web applications, implementing i18n;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use a Thread-local Storage
(TLS), implemented in the standard library as ``threading.local()``.
Unfortunately, TLS does not work for the purpose of state isolation
for generators or asynchronous code, because such code executes
concurrently in a single thread.


Rationale
=========

Traditionally, a Thread-local Storage (TLS) is used for storing the
state.  However, the major flaw of using the TLS is that it works only
for multi-threaded code.  It is not possible to reliably contain the
state within a generator or a coroutine.  For example, consider
the following generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

The decimal context uses a TLS to store its state, and because TLS is
not aware of generators, the state can leak.  If a user iterates over
several ``calculate()`` generators with different precisions in
lockstep, using the ``zip()`` built-in, the above code will not work
correctly.
For example::

    g1 = calculate(precision=100)
    g2 = calculate(precision=50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision (!!!),
    #   second value from g2 calculated with 50 precision.
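
The leak can be reproduced with the standard ``decimal`` module.  The
following is a minimal sketch rather than part of the proposal: the
two divisions and the ``digits()`` helper are illustrative stand-ins
for ``calculate_something()``, and the outer ``localcontext()`` block
is there only to restore the thread's original context afterwards.

```python
import decimal


def calculate(precision):
    with decimal.localcontext() as ctx:
        ctx.prec = precision
        # Each division is rounded to whatever precision is active
        # in the thread at the moment it is evaluated.
        yield decimal.Decimal(1) / decimal.Decimal(3)
        yield decimal.Decimal(1) / decimal.Decimal(3)


def digits(value):
    # Number of significant digits actually stored in the result.
    return len(value.as_tuple().digits)


# The outer block only restores the original thread context at the end.
with decimal.localcontext():
    g1 = calculate(precision=100)
    g2 = calculate(precision=50)
    items = list(zip(g1, g2))
    g1.close()
    g2.close()

precisions = [(digits(a), digits(b)) for a, b in items]
print(precisions)
# [(100, 50), (50, 50)]
```

The second value from ``g1`` carries only 50 digits, because ``g2``
replaced the thread-wide decimal context while ``g1`` was suspended.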

An even scarier example would be using decimals to represent money
in an async/await application: decimal calculations can suddenly
lose precision in the middle of processing a request.  Currently,
bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the
current request object, or security context, or, simply, the request
URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.

These examples are just a few out of many, where a reliable way to
store context data is absolutely needed.

The inability to use TLS for asynchronous code has led to a
proliferation of ad-hoc solutions, which are limited in scope and
do not support all required use cases.

The current status quo is that any library, including the standard
library, that uses a TLS will likely not work as expected in
asynchronous code or with generators (see [3]_ for an example issue).

Some languages that have coroutines or generators recommend
manually passing a ``context`` object to every function; see [1]_
for a description of the pattern in Go.  This approach, however, has limited
use for Python, where we have a huge ecosystem that was built to work
with a TLS-like context.  Moreover, passing the context explicitly
does not work at all for libraries like ``decimal`` or ``numpy``,
which use operator overloading.

The .NET runtime, which has support for async/await, has a generic
solution for this problem, called ``ExecutionContext`` (see [2]_).
On the surface, working with it is very similar to working with a TLS,
but the former explicitly supports asynchronous code.


Goals
=====

The goal of this PEP is to provide a more reliable alternative to
``threading.local()``.  It should be explicitly designed to work with
the Python execution model, equally supporting threads, generators, and
coroutines.

An acceptable solution for Python should meet the following
requirements:

* Transparent support for code executing in threads, coroutines,
  and generators with an easy-to-use API.

* Negligible impact on the performance of the existing code or the
  code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should only
be used when there is no acceptable way of passing the state
explicitly.


Specification
=============

Execution Context is a mechanism of storing and accessing data specific
to a logical thread of execution.  We consider OS threads,
generators, and chains of coroutines (such as ``asyncio.Task``)
to be variants of a logical thread.

In this specification, we will use the following terminology:

* **Local Context**, or LC, is a key/value mapping that stores the
  context of a logical thread.

* **Execution Context**, or EC, is an OS-thread-specific dynamic
  stack of Local Contexts.

* **Context Item**, or CI, is an object used to set and get values
  from the Execution Context.

Please note that throughout the specification we use simple
pseudo-code to illustrate how the EC machinery works.  The actual
algorithms and data structures that we will use to implement the PEP
are discussed in the `Implementation Strategy`_ section.


Context Item Object
-------------------

The ``sys.new_context_item(description)`` function creates a
new ``ContextItem`` object.  The ``description`` parameter is a
``str``, explaining the nature of the context key for introspection
and debugging purposes.

``ContextItem`` objects have the following methods and attributes:

* ``.description``: read-only description;

* ``.set(o)`` method: set the value to ``o`` for the context item
  in the execution context.

* ``.get()`` method: return the current EC value for the context item.
  Context items are initialized with ``None`` when created, so
  this method call never fails.

Below is an example of how context items can be used::

    my_context = sys.new_context_item(description='mylib.context')
    my_context.set('spam')

    # Later, to access the value of my_context:
    print(my_context.get())


Thread State and Multi-threaded code
------------------------------------

Execution Context is implemented on top of Thread-local Storage.
For every thread there is a separate stack of Local Contexts --
mappings of ``ContextItem`` objects to their values in the LC.
New threads always start with an empty EC.

For CPython::

    PyThreadState:
        execution_context: ExecutionContext([
            LocalContext({ci1: val1, ci2: val2, ...}),
            ...
        ])

The ``ContextItem.get()`` and ``.set()`` methods are defined as
follows (in pseudo-code)::

    class ContextItem:

        def get(self):
            tstate = PyThreadState_Get()

            for local_context in reversed(tstate.execution_context):
                if self in local_context:
                    return local_context[self]

            return None

        def set(self, value):
            tstate = PyThreadState_Get()

            if not tstate.execution_context:
                tstate.execution_context = [LocalContext()]

            tstate.execution_context[-1][self] = value

With the semantics defined so far, the Execution Context can already
be used as an alternative to ``threading.local()``::

    def print_foo():
        print(ci.get() or 'nothing')

    ci = sys.new_context_item(description='test')
    ci.set('foo')

    # Will print "foo":
    print_foo()

    # Will print "nothing":
    threading.Thread(target=print_foo).start()
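
The pseudo-code above can be turned into a small runnable model.  This
is only a sketch under stated assumptions: ``threading.local()`` stands
in for ``PyThreadState``, plain ``dict`` objects stand in for Local
Contexts, and the ``record()`` helper and ``seen`` dictionary are
hypothetical names used to capture what each thread observes.

```python
import threading

_tls = threading.local()  # assumption: stands in for PyThreadState


def _stack():
    # Every thread lazily gets its own, initially empty, EC stack.
    if not hasattr(_tls, 'execution_context'):
        _tls.execution_context = []
    return _tls.execution_context


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        # Search from the topmost Local Context down to the oldest one.
        for local_context in reversed(_stack()):
            if self in local_context:
                return local_context[self]
        return None

    def set(self, value):
        stack = _stack()
        if not stack:
            stack.append({})
        stack[-1][self] = value


ci = ContextItem('test')
ci.set('foo')

seen = {}


def record(name):
    seen[name] = ci.get() or 'nothing'


record('main thread')
worker = threading.Thread(target=record, args=('new thread',))
worker.start()
worker.join()

print(seen)
# {'main thread': 'foo', 'new thread': 'nothing'}
```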


Manual Context Management
-------------------------

Execution Context is generally managed by the Python interpreter,
but sometimes it is desirable for the user to take control
of it.  A few examples of when this is needed:

* running a computation in ``concurrent.futures.ThreadPoolExecutor``
  with the current EC;

* reimplementing generators with iterators (more on that later);

* managing contexts in asynchronous frameworks (for example,
  implementing proper EC support in ``asyncio.Task`` and
  ``loop.call_soon()``).

For these purposes we add a set of new APIs (they will be used in
later sections of this specification):

* ``sys.new_local_context()``: create an empty ``LocalContext``
  object.

* ``sys.new_execution_context()``: create an empty
  ``ExecutionContext`` object.

* Both ``LocalContext`` and ``ExecutionContext`` objects are opaque
  to Python code, and there are no APIs to modify them.

* ``sys.get_execution_context()`` function.  The function returns a
  copy of the current EC: an ``ExecutionContext`` instance.

  The runtime complexity of the actual implementation of this function
  can be O(1), but for the purposes of this section it is equivalent
  to::

    def get_execution_context():
        tstate = PyThreadState_Get()
        return copy(tstate.execution_context)

* ``sys.run_with_execution_context(ec: ExecutionContext, func, *args,
  **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution
  context::

    def run_with_execution_context(ec, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_ec = tstate.execution_context

        tstate.execution_context = ExecutionContext(
            ec.local_contexts + [LocalContext()]
        )

        try:
            return func(*args, **kwargs)
        finally:
            tstate.execution_context = old_ec

  Any changes that ``func`` makes to the Local Context are discarded.
  This allows reusing one ``ExecutionContext`` object for multiple
  invocations of different functions, without them being able to
  affect each other's environment::

      ci = sys.new_context_item('example')
      ci.set('spam')

      def func():
          print(ci.get())
          ci.set('ham')

      ec = sys.get_execution_context()

      sys.run_with_execution_context(ec, func)
      sys.run_with_execution_context(ec, func)

      # Will print:
      #   spam
      #   spam

* ``sys.run_with_local_context(lc: LocalContext, func, *args,
  **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution
  context using the specified local context.

  Any changes that ``func`` makes to the local context will be
  persisted in ``lc``.  This behaviour is different from the
  ``run_with_execution_context()`` function, which always creates
  a new throw-away local context.

  In pseudo-code::

    def run_with_local_context(lc, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_ec = tstate.execution_context

        tstate.execution_context = ExecutionContext(
            old_ec.local_contexts + [lc]
        )

        try:
            return func(*args, **kwargs)
        finally:
            tstate.execution_context = old_ec

  Using the previous example::

      ci = sys.new_context_item('example')
      ci.set('spam')

      def func():
          print(ci.get())
          ci.set('ham')

      ec = sys.get_execution_context()
      lc = sys.new_local_context()

      sys.run_with_local_context(lc, func)
      sys.run_with_local_context(lc, func)

      # Will print:
      #   spam
      #   ham
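
The difference between the two ``run_with_*`` functions can be checked
with a single-threaded model.  This is a sketch, not the proposed
implementation: the module-level ``_current`` list stands in for
``tstate.execution_context``, plain dicts stand in for Local Contexts,
and ``_run()`` and ``seen`` are hypothetical helpers.  The snapshot
copies every dict, which mirrors the ``copy()`` in the pseudo-code.

```python
_current = []  # assumption: the EC of the only (simulated) thread


class ContextItem:
    def get(self):
        for local_context in reversed(_current):
            if self in local_context:
                return local_context[self]
        return None

    def set(self, value):
        if not _current:
            _current.append({})
        _current[-1][self] = value


def get_execution_context():
    # Copy each Local Context so later changes cannot alter the snapshot.
    return [dict(local_context) for local_context in _current]


def _run(stack, func, *args, **kwargs):
    # Install `stack` as the current EC for the duration of the call.
    global _current
    old_ec, _current = _current, stack
    try:
        return func(*args, **kwargs)
    finally:
        _current = old_ec


def run_with_execution_context(ec, func, *args, **kwargs):
    # A fresh throw-away Local Context absorbs all changes made by func.
    return _run(ec + [{}], func, *args, **kwargs)


def run_with_local_context(lc, func, *args, **kwargs):
    # lc goes on top of the *current* EC, so changes are kept in lc.
    return _run(_current + [lc], func, *args, **kwargs)


ci = ContextItem()
ci.set('spam')
seen = []


def func():
    seen.append(ci.get())
    ci.set('ham')


ec = get_execution_context()
run_with_execution_context(ec, func)
run_with_execution_context(ec, func)

lc = {}
run_with_local_context(lc, func)
run_with_local_context(lc, func)

print(seen)
# ['spam', 'spam', 'spam', 'ham']
```

In both cases the caller's own view is untouched: ``ci.get()`` still
returns ``'spam'`` after all four calls.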

As an example, let's make a subclass of
``concurrent.futures.ThreadPoolExecutor`` that preserves the execution
context for scheduled functions::

    class Executor(concurrent.futures.ThreadPoolExecutor):

        def submit(self, fn, *args, **kwargs):
            context = sys.get_execution_context()

            fn = functools.partial(
                sys.run_with_execution_context, context,
                fn, *args, **kwargs)

            return super().submit(fn)


EC Semantics for Coroutines
---------------------------

Python :pep:`492` coroutines are used to implement cooperative
multitasking.  For a Python end-user they are similar to threads,
especially when it comes to sharing resources or modifying
the global state.

An event loop is needed to schedule coroutines.  Coroutines that
are explicitly scheduled by the user are usually called Tasks.
When a coroutine is scheduled, it can schedule other coroutines using
an ``await`` expression.  In async/await world, awaiting a coroutine
is equivalent to a regular function call in synchronous code.  Thus,
Tasks are similar to threads.

By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the
execution context within one Task should be visible to all coroutines
scheduled within it.  Any execution context modifications, however,
must not be visible to other Tasks executing within the same OS
thread.


Coroutine Object Modifications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To achieve this, a small set of modifications to the coroutine object
is needed:

* New ``cr_local_context`` attribute.  This attribute is readable
  and writable for Python code.

* When a coroutine object is instantiated, its ``cr_local_context``
  is initialized with an empty Local Context.

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
  follows (in pseudo-C)::

    if coro.cr_local_context is not None:
        tstate = PyThreadState_Get()

        tstate.execution_context.push(coro.cr_local_context)

        try:
            # Perform the actual `Coroutine.send()` or
            # `Coroutine.throw()` call.
            return coro.send(...)
        finally:
            coro.cr_local_context = tstate.execution_context.pop()
    else:
        # Perform the actual `Coroutine.send()` or
        # `Coroutine.throw()` call.
        return coro.send(...)

* When the Python interpreter sees an ``await`` instruction, it inspects
  the ``cr_local_context`` attribute of the coroutine that is about
  to be awaited.  For ``await coro``:

  * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
    that ``coro`` was created with, the interpreter will set
    ``coro.cr_local_context`` to ``None``.

  * If ``coro.cr_local_context`` was modified by Python code, the
    interpreter will leave it as is.

  This makes any changes to the execution context made by nested
  coroutine calls within a Task visible throughout the Task::

      ci = sys.new_context_item('example')

      async def nested():
          ci.set('nested')

      async def main():
          ci.set('main')
          print('before:', ci.get())
          await nested()
          print('after:', ci.get())

      # Will print:
      #   before: main
      #   after: nested

  Essentially, coroutines work with Execution Context items similarly
  to threads, and an ``await`` expression acts like a function call.

  This mechanism also works for ``yield from`` in generators decorated
  with ``@types.coroutine`` or ``@asyncio.coroutine``, which are
  called "generator-based coroutines" according to :pep:`492`,
  and should be fully compatible with native async/await coroutines.


Tasks
^^^^^

In asynchronous frameworks like asyncio, coroutines are run by
an event loop, and need to be explicitly scheduled (in asyncio,
coroutines are run by ``asyncio.Task``).

With the currently defined semantics, the interpreter makes
coroutines linked by an ``await`` expression share the same
Local Context.

The interpreter, however, is not aware of the Task concept, and
cannot ensure that new Tasks started in coroutines use the
correct EC::

    current_request = sys.new_context_item(description='request')

    async def child():
        print('current request:', repr(current_request.get()))

    async def handle_request(request):
        current_request.set(request)
        event_loop.create_task(child())

    run(handle_request(request))

    # Will print:
    #   current request: None

To enable correct Execution Context propagation into Tasks, the
asynchronous framework needs to assist the interpreter:

* When ``create_task`` is called, it should capture the current
  execution context with ``sys.get_execution_context()`` and save it
  on the Task object.

* When the Task object runs its coroutine object, it should execute
  ``.send()`` and ``.throw()`` methods within the captured
  execution context, using the ``sys.run_with_execution_context()``
  function.

With help from the asynchronous framework, the above snippet will
run correctly, and the ``child()`` coroutine will be able to access
the current request object through the ``current_request``
Context Item.


Event Loop Callbacks
^^^^^^^^^^^^^^^^^^^^

Similarly to Tasks, functions like asyncio's ``loop.call_soon()``
should capture the current execution context with
``sys.get_execution_context()`` and execute callbacks
within it with ``sys.run_with_execution_context()``.

This way the following code will work::

    current_request = sys.new_context_item(description='request')

    def log():
        request = current_request.get()
        print(request)

    async def request_handler(request):
        current_request.set(request)
        asyncio.get_event_loop().call_soon(log)


Generators
----------

Generators in Python, while similar to Coroutines, are used in a
fundamentally different way.  They are producers of data, and
they use the ``yield`` expression to suspend/resume their execution.

A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that the ``coro`` will be
executed fully, while the latter is producing ``value`` and
suspending the generator until it gets iterated again.

Generators, similarly to coroutines, have a ``gi_local_context``
attribute, which is set to an empty Local Context when created.

Contrary to coroutines though, the ``yield from o`` expression in
generators (that are not generator-based coroutines) is semantically
equivalent to ``for v in o: yield v``, therefore the interpreter does
not attempt to control their ``gi_local_context``.


EC Semantics for Generators
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Every generator object has its own Local Context that stores
only its own local modifications of the context.  When a generator
is being iterated, its local context will be put in the EC stack
of the current thread.  This means that the generator will be able
to access items from the surrounding context::

    local_item = sys.new_context_item("local")
    global_item = sys.new_context_item("global")

    def generator():
        local_item.set('inside gen:')
        while True:
            print(local_item.get(), global_item.get())
            yield

    g = generator()

    local_item.set('hello')
    global_item.set('spam')
    next(g)

    local_item.set('world')
    global_item.set('ham')
    next(g)

    # Will print:
    #   inside gen: spam
    #   inside gen: ham

Any changes to the EC in nested generators are invisible to the outer
generator::

    local = sys.new_context_item("local")

    def inner_gen():
        local.set('spam')
        yield

    def outer_gen():
        local.set('ham')
        yield from inner_gen()
        print(local.get())

    list(outer_gen())

    # Will print:
    #   ham


Running generators without LC
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Similarly to coroutines, generators with ``gi_local_context``
set to ``None`` simply use the outer Local Context.

The ``@contextlib.contextmanager`` decorator uses this mechanism to
allow its generator to affect the EC::

    item = sys.new_context_item('test')

    @contextmanager
    def context(x):
        old = item.get()
        item.set(x)
        try:
            yield
        finally:
            item.set(old)

    with context('spam'):

        with context('ham'):
            print(1, item.get())

        print(2, item.get())

    # Will print:
    #   1 ham
    #   2 spam


Implementing Generators with Iterators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Execution Context API makes it possible to fully replicate the EC
behaviour imposed on generators with a regular Python iterator class::

    class Gen:

        def __init__(self):
            self.local_context = sys.new_local_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_local_context(
                self.local_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...


Asynchronous Generators
-----------------------

Asynchronous Generators (AG) interact with the Execution Context
similarly to regular generators.

They have an ``ag_local_context`` attribute, which, similarly to
regular generators, can be set to ``None`` to make them use the outer
Local Context.  This is used by the new
``contextlib.asynccontextmanager`` decorator.

The EC support of ``await`` expression is implemented using the same
approach as in coroutines, see the `Coroutine Object Modifications`_
section.


Greenlets
---------

Greenlet is an alternative implementation of cooperative
scheduling for Python.  Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

In a nutshell, the design of greenlet is very similar to the design
of generators.  The main difference is that for generators, the stack
is managed by the Python interpreter.  Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState``
fields and pushes/pops the C-stack.  Thus the ``greenlet`` package
can be easily updated to use the new low-level `C API`_ to enable
full support of EC.


New APIs
========

Python
------

The Python APIs were designed to completely hide the internal
implementation details, but at the same time provide enough control
over EC and LC to re-implement all of Python's built-in objects
in pure Python.

1. ``sys.new_context_item(description='...')``: create a
   ``ContextItem`` object used to access/set values in EC.

2. ``ContextItem``:

   * ``.description``: read-only attribute.
   * ``.get()``: return the current value for the item.
   * ``.set(o)``: set the current value in the EC for the item.

3. ``sys.get_execution_context()``: return the current
   ``ExecutionContext``.

4. ``sys.new_execution_context()``: create a new empty
   ``ExecutionContext``.

5. ``sys.new_local_context()``: create a new empty ``LocalContext``.

6. ``sys.run_with_execution_context(ec: ExecutionContext,
   func, *args, **kwargs)``.

7. ``sys.run_with_local_context(lc: LocalContext,
   func, *args, **kwargs)``.


C API
-----

1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a
   ``PyContextItem`` object.

2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the
   current value for the context item.

3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set
   the current value for the context item.

4. ``PyLocalContext * PyLocalContext_New()``: create a new empty
   ``PyLocalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty
   ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: get the
   EC for the active thread state.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
   passed EC object as the current for the active thread state.

8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *,
   PyLocalContext *)``: allows implementing the
   ``sys.run_with_local_context`` Python API.


Implementation Strategy
=======================

LocalContext is a Weak Key Mapping
----------------------------------

Using a weak key mapping for the ``LocalContext`` implementation
enables the following properties with regard to garbage
collection:

* ``ContextItem`` objects are strongly-referenced only from the
  application code, not from any of the Execution Context
  machinery or values they point to.  This means that there
  are no reference cycles that could extend their lifespan
  longer than necessary, or prevent their garbage collection.

* Values put in the Execution Context are guaranteed to be kept
  alive while there is a ``ContextItem`` key referencing them in
  the thread.

* If a ``ContextItem`` is garbage collected, all of its values will
  be removed from all contexts, allowing them to be GCed if needed.

* If a thread has ended its execution, its thread state will be
  cleaned up along with its ``ExecutionContext``, cleaning
  up all values bound to all Context Items in the thread.
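
The garbage collection behaviour of a weak key mapping can be observed
with ``weakref.WeakKeyDictionary`` from the standard library.  This is
only an illustration of the property: the PEP does not state that this
class would be used, and ``Payload`` is a hypothetical value type
chosen because it supports weak references.

```python
import gc
import weakref


class ContextItem:
    def __init__(self, description):
        self.description = description


class Payload:
    """Hypothetical context value that supports weak references."""


local_context = weakref.WeakKeyDictionary()

item = ContextItem('example')
value = Payload()
value_ref = weakref.ref(value)

local_context[item] = value
del value

# The mapping keeps the value alive for as long as the key exists.
alive_while_key_exists = value_ref() is not None

# Dropping the last strong reference to the key removes the entry,
# which in turn releases the value.
del item
gc.collect()

entries_left = len(local_context)
value_collected = value_ref() is None

print(alive_while_key_exists, entries_left, value_collected)
# True 0 True
```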


ContextItem.get() Cache
-----------------------

We can add three new fields to ``PyThreadState`` and
``PyInterpreterState`` structs:

* ``uint64_t PyThreadState->unique_id``: a globally unique
  thread state identifier (we can add a counter to
  ``PyInterpreterState`` and increment it when a new thread state is
  created.)

* ``uint64_t PyInterpreterState->context_item_deallocs``: every time
  a ``ContextItem`` is GCed, all Execution Contexts in all threads
  will lose track of it.  ``context_item_deallocs`` will simply
  count all ``ContextItem`` deallocations.

* ``uint64_t PyThreadState->execution_context_ver``: every time
  a new item is set, or an existing item is updated, or the stack
  of execution contexts is changed in the thread, we increment this
  counter.

The above three fields allow implementing a fast cache path in
``ContextItem.get()``, in pseudo-code::

    class ContextItem:

        def get(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                self.last_ver == tstate.execution_context_ver and
                self.last_deallocs ==
                    tstate.interp.context_item_deallocs):
                return self.last_value

            value = None
            for mapping in reversed(tstate.execution_context):
                if self in mapping:
                    value = mapping[self]
                    break

            self.last_value = value  # borrowed ref
            self.last_tstate_id = tstate.unique_id
            self.last_ver = tstate.execution_context_ver
            self.last_deallocs = tstate.interp.context_item_deallocs

            return value

Note that ``last_value`` is a borrowed reference.  The assumption
is that if all counter checks pass, the object is still alive.
This allows the CI values to be properly GCed.

This is similar to the trick that the decimal C implementation uses
for caching the current decimal context, and will have the same
performance characteristics, but available to all
Execution Context users.
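
The effect of the version counter can be seen in a reduced model.
This sketch makes several simplifying assumptions: the thread state is
passed explicitly instead of being fetched with
``PyThreadState_Get()``, the ``context_item_deallocs`` check is left
out, and ``slow_lookups`` is a hypothetical counter added only to make
cache hits observable.

```python
class ThreadState:
    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.execution_context = [{}]
        self.execution_context_ver = 0


class ContextItem:
    def __init__(self):
        self.last_tstate_id = None
        self.last_ver = None
        self.last_value = None
        self.slow_lookups = 0   # instrumentation for this sketch only

    def get(self, tstate):
        # Fast path: same thread and nothing changed since last lookup.
        if (self.last_tstate_id == tstate.unique_id and
                self.last_ver == tstate.execution_context_ver):
            return self.last_value

        # Slow path: walk the stack of Local Contexts.
        self.slow_lookups += 1
        value = None
        for mapping in reversed(tstate.execution_context):
            if self in mapping:
                value = mapping[self]
                break

        self.last_value = value
        self.last_tstate_id = tstate.unique_id
        self.last_ver = tstate.execution_context_ver
        return value

    def set(self, tstate, value):
        tstate.execution_context[-1][self] = value
        # Bumping the version invalidates every cached lookup.
        tstate.execution_context_ver += 1


tstate = ThreadState(unique_id=1)
ci = ContextItem()

ci.set(tstate, 'spam')
first = [ci.get(tstate) for _ in range(3)]   # one miss, two hits
ci.set(tstate, 'ham')
second = ci.get(tstate)                      # miss after the update

print(first, second, ci.slow_lookups)
# ['spam', 'spam', 'spam'] ham 2
```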


Approach #1: Use a dict for LocalContext
----------------------------------------

The straightforward way of implementing the proposed EC
mechanisms is to create a ``WeakKeyDict`` on top of Python
``dict`` type.

To implement the ``ExecutionContext`` type we can use Python
``list`` (or a custom stack implementation with some
pre-allocation optimizations).

This approach will have the following runtime complexity:

* O(M) for ``ContextItem.get()``, where ``M`` is the number of
  Local Contexts in the stack.

  It is important to note that ``ContextItem.get()`` will implement
  a cache making the operation O(1) for packages like ``decimal``
  and ``numpy``.

* O(1) for ``ContextItem.set()``.

* O(N) for ``sys.get_execution_context()``, where ``N`` is the
  total number of items in the current **execution** context.


Approach #2: Use HAMT for LocalContext
--------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for the ``set()``, ``get()``, and ``merge()`` operations,
which is essentially O(1) for relatively small mappings
(read about HAMT performance in CPython in the
`Appendix: HAMT Performance`_ section.)
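
A quick calculation shows why a base-32 logarithm behaves almost like
a constant.  The helper below is a hypothetical estimate that assumes
an idealised, perfectly balanced trie with 32 slots per node; real
HAMT implementations will differ in detail.

```python
def trie_levels(n_items, branching=32):
    # Smallest number of levels whose total capacity covers n_items.
    levels, capacity = 1, branching
    while capacity < n_items:
        capacity *= branching
        levels += 1
    return levels


levels = {n: trie_levels(n) for n in (32, 1_024, 1_000_000, 1_000_000_000)}
print(levels)
# {32: 1, 1024: 2, 1000000: 4, 1000000000: 6}
```

Even a mapping with a billion items would need only about six levels
under this assumption, and contexts with a few dozen items need one
or two.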

In this approach we use the same design of the ``ExecutionContext``
as in Approach #1, but we will use an HAMT-backed weak-key Local
Context implementation.  With that we will have the following runtime
complexity:

* O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``,
  where ``M`` is the number of Local Contexts in the stack,
  and ``N`` is the number of items in the EC.  The operation will
  essentially be O(M), because execution contexts are normally not
  expected to have more than a few dozen items.

  (``ContextItem.get()`` will have the same caching mechanism as in
  Approach #1.)

* O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the
  number of items in the current **local** context.  This will
  essentially be an O(1) operation most of the time.

* O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where
  ``N`` is the total number of items in the current **execution**
  context.

Essentially, using HAMT for Local Contexts instead of Python dicts
brings the complexity of ``sys.get_execution_context()`` down
from O(N) to O(log\ :sub:`32`\ N) because of the more efficient
merge algorithm.


Approach #3: Use HAMT and Immutable Linked List
-----------------------------------------------

We can make an alternative ``ExecutionContext`` design by using
a linked list.  Each ``LocalContext`` in the ``ExecutionContext``
object will be wrapped in a linked-list node.

``LocalContext`` objects will use the HAMT-backed weak-key
implementation described in Approach #2.

Every modification to the current ``LocalContext`` will produce a
new version of it, which will be wrapped in a **new linked list
node**.  Essentially, this means that ``ExecutionContext`` is an
immutable forest of ``LocalContext`` objects, and can be safely
copied by reference in ``sys.get_execution_context()`` (eliminating
the expensive "merge" operation).

With this approach, ``sys.get_execution_context()`` will be an
**O(1) operation**.
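
The copy-by-reference idea can be sketched as follows.  The names
``Node``, ``set_item()`` and ``get_item()`` are hypothetical, string
keys replace ``ContextItem`` objects for brevity, and a copied ``dict``
stands in for the HAMT, so ``set_item()`` here costs O(N) rather than
O(log N).  Only the snapshot behaviour is the point of the example.

```python
class Node:
    """Linked-list node wrapping one Local Context (never mutated)."""

    __slots__ = ('mapping', 'parent')

    def __init__(self, mapping, parent=None):
        self.mapping = mapping
        self.parent = parent


def set_item(head, key, value):
    # Build a new mapping and a new node; the old node stays valid.
    new_mapping = {**head.mapping, key: value}
    return Node(new_mapping, head.parent)


def get_item(head, key):
    node = head
    while node is not None:
        if key in node.mapping:
            return node.mapping[key]
        node = node.parent
    return None


head = Node({})
head = set_item(head, 'request', '/a')

# Taking a snapshot is just keeping a reference: O(1), no copying.
snapshot = head

head = set_item(head, 'request', '/b')

print(get_item(snapshot, 'request'), get_item(head, 'request'))
# /a /b
```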


Summary
-------

We believe that approach #3 enables an efficient and complete
Execution Context implementation, with excellent runtime performance.

`ContextItem.get() Cache`_ enables fast retrieval of context items
for performance critical libraries like decimal and numpy.

Fast ``sys.get_execution_context()`` enables efficient management
of execution contexts in asynchronous libraries like asyncio.


Design Considerations
=====================

Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS.  Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem:
enhance Context Manager Protocol with two new methods: ``__suspend__``
and ``__resume__``.  To make it compatible with async/await,
the Asynchronous Context Manager Protocol will also need to be
extended with ``__asuspend__`` and ``__aresume__``.

This allows implementing context managers like decimal context and
``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

Besides complicating the protocol, the implementation will likely
negatively impact performance of coroutines, generators, and any code
that uses context managers, and will notably complicate the
interpreter implementation.

:pep:`521` also does not provide any mechanism to propagate state
in a local context, like storing a request object in an HTTP request
handler to have better logging.  Nor does it solve the leaking state
problem for greenlet/gevent.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.
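
As a rough illustration of the coroutine case, the sketch below drives
coroutines from a tiny round-robin trampoline and swaps a per-task
dict in and out around every step.  All names (``run_all``,
``switch``, ``get_item``, ``set_item``) are hypothetical and are not
part of this proposal; the plain dict merely stands in for an
execution context::

    import types

    # Hypothetical stand-in for an execution context: a plain dict
    # that the trampoline swaps in and out.  Not the API of this PEP.
    _current = {}


    def get_item(key, default=None):
        return _current.get(key, default)


    def set_item(key, value):
        _current[key] = value


    @types.coroutine
    def switch():
        # Suspend the coroutine and return control to the trampoline.
        yield


    def run_all(*coros):
        """Drive coroutines round-robin, each with a private context."""
        global _current
        tasks = [(coro, {}) for coro in coros]
        results = {}
        while tasks:
            coro, ctx = tasks.pop(0)
            saved, _current = _current, ctx   # enter the task context
            try:
                coro.send(None)
            except StopIteration as exc:
                results[coro] = exc.value     # the coroutine finished
            else:
                tasks.append((coro, ctx))     # reschedule the task
            finally:
                _current = saved              # leave the task context
        return [results[coro] for coro in coros]


    async def worker(name):
        set_item('name', name)
        await switch()                        # other workers run here
        return get_item('name')


    print(run_all(worker('a'), worker('b')))  # ['a', 'b']

Each worker reads back its own value after being resumed, because the
trampoline controls every suspension point.  A plain generator
iterated with ``next()`` offers no such hook.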


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Appendix: HAMT Performance
==========================

First, while investigating how to implement an immutable mapping in
CPython, we were able to improve the efficiency of ``dict.copy()``
by up to 5 times [4]_.  All benchmarks in this section were run
against the optimized dict.

To assess if HAMT can be used for Execution Context, we implemented
it in CPython [7]_.

.. figure:: pep-0550-hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

Figure 1 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes.  For dictionaries with fewer than 100
items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 shows a comparison of lookup costs between a Python dict
and a HAMT immutable mapping.  HAMT lookup time is 30-40% worse
than Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.

Note that, according to [8]_, the HAMT design can be further improved.

There is a limitation of the Python ``dict`` design which makes HAMT
the preferred choice for an immutable mapping implementation:
dicts need to be resized periodically, and resizing is expensive.
The ``dict.copy()`` optimization we were able to do (see [4]_) only
works for dicts that have had no items deleted.  Dicts that have had
items deleted need to be resized during ``copy()``, which makes the
copy much slower.

Because adding and deleting items from LocalContext is a very common
operation, we would not always be able to use the optimized
``dict.copy()`` for LocalContext, and would frequently fall back to
the slower version.
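
To make the cost concrete, here is a minimal sketch of an immutable
mapping built on ``dict.copy()``.  The ``CopyingMapping`` class and
its method names are hypothetical and are not the LocalContext
implementation; the sketch only shows why every update pays for a
full copy::

    class CopyingMapping:
        """Immutable mapping sketch: each update copies the dict."""

        def __init__(self, data=None):
            self._data = {} if data is None else data

        def get(self, key, default=None):
            return self._data.get(key, default)

        def set(self, key, value):
            new = self._data.copy()   # copies every existing item
            new[key] = value
            return CopyingMapping(new)

        def delete(self, key):
            new = self._data.copy()   # copies every existing item
            del new[key]              # the new dict has a deleted item
            return CopyingMapping(new)


    m1 = CopyingMapping().set('a', 1)
    m2 = m1.set('b', 2)               # m1 is left untouched
    m3 = m2.delete('a')               # m2 is left untouched

Here ``m3`` wraps a dict that has had an item deleted, so per the
limitation described above, later copies of it could not use the
optimized ``dict.copy()`` path.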


Acknowledgments
===============

I thank Elvis Pranskevichus and Victor Petrovykh for countless
discussions around the topic and for proofreading and editing the PEP.

Thanks to Nathaniel Smith for proposing the ``ContextItem`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
coming up with the idea of having a stack of contexts in the thread
state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.


Version History
===============

1. Posted on 11-Aug-2017, view it here: [20]_.

   The fundamental limitation that caused a complete redesign of the
   PEP was that it was not possible to implement an iterator that
   would interact with the EC in the same way as generators
   (see [19]_).

2. Posted on 15-Aug-2017: the current version.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html

.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: