PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury@magic.io>,
        Elvis Pranskevichus <elvis@magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017


Abstract
========

This PEP adds a new generic mechanism for ensuring consistent access
to non-local state in the context of out-of-order execution, such
as in Python generators and coroutines.

Thread-local storage, such as ``threading.local()``, is inadequate for
programs that execute concurrently in the same OS thread.  This PEP
proposes a solution to this problem.


Rationale
=========

Prior to the advent of asynchronous programming in Python, programs
used OS threads to achieve concurrency.  The need for thread-specific
state was solved by ``threading.local()`` and its C-API equivalent,
``PyThreadState_GetDict()``.

A few examples of where thread-local storage (TLS) is commonly
relied upon:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``.

* Request-related data, such as security tokens and request
  data in web applications, language context for ``gettext``, etc.

* Profiling, tracing, and logging in large code bases.

Unfortunately, TLS does not work well for programs which execute
concurrently in a single thread.  A Python generator is the simplest
example of a concurrent program.  Consider the following::

    import decimal
    from decimal import Decimal

    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield Decimal(x) / Decimal(y)
            yield Decimal(x) / Decimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))

The expected value of ``items`` is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]

Rather surprisingly, the actual result is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.111111'), Decimal('0.222222'))]

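This is because the decimal context is stored as a thread-local, so
concurrent iteration of the ``fractions()`` generators corrupts it:
each ``localcontext()`` block installs its context for the whole
thread, so ``g1`` resumes under the precision set by ``g2``.  A
similar problem exists with coroutines.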

Applications also often need to associate certain data with a given
thread of execution.  For example, a web application server commonly
needs access to the current HTTP request object.

The inadequacy of TLS in asynchronous code has led to the
proliferation of ad-hoc solutions, which are limited in scope and
do not support all required use cases.

As things stand, any library (including the standard library) that
relies on TLS is likely to break when used in asynchronous code or
with generators (see [3]_ for an example issue).

Some languages that support coroutines or generators recommend
passing the context manually as an argument to every function (see
[1]_ for an example).  This approach, however, has limited use for Python,
where there is a large ecosystem that was built to work with a TLS-like
context.  Furthermore, libraries like ``decimal`` or ``numpy`` rely
on context implicitly in overloaded operator implementations.

The .NET runtime, which has support for async/await, has a generic
solution for this problem, called ``ExecutionContext`` (see [2]_).


Goals
=====

The goal of this PEP is to provide a more reliable
``threading.local()`` alternative, which:

* provides the mechanism and the API to fix non-local state issues
  with coroutines and generators;

* has no or negligible performance impact on the existing code or
  the code that will be using the new mechanism, including
  libraries like ``decimal`` and ``numpy``.


High-Level Specification
========================

The full specification of this PEP is broken down into three parts:

* High-Level Specification (this section): the description of the
  overall solution.  We show how it applies to generators and
  coroutines in user code, without delving into implementation details.

* Detailed Specification: the complete description of new concepts,
  APIs, and related changes to the standard library.

* Implementation Details: the description and analysis of data
  structures and algorithms used to implement this PEP, as well as the
  necessary changes to CPython.

For the purpose of this section, we define *execution context* as an
opaque container of non-local state that allows consistent access to
its contents in the concurrent execution environment.

A *context variable* is an object representing a value in the
execution context.  A new context variable is created by calling
the ``new_context_var()`` function.  A context variable object has
two methods:

* ``lookup()``: returns the value of the variable in the current
  execution context;

* ``set()``: sets the value of the variable in the current
  execution context.


Regular Single-threaded Code
----------------------------

In regular, single-threaded code that doesn't involve generators or
coroutines, context variables behave like globals::

    var = new_context_var()

    def sub():
        assert var.lookup() == 'main'
        var.set('sub')

    def main():
        var.set('main')
        sub()
        assert var.lookup() == 'sub'


Multithreaded Code
------------------

In multithreaded code, context variables behave like thread locals::

    var = new_context_var()

    def sub():
        assert var.lookup() is None  # The execution context is empty
                                     # for each new thread.
        var.set('sub')

    def main():
        var.set('main')

        thread = threading.Thread(target=sub)
        thread.start()
        thread.join()

        assert var.lookup() == 'main'
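
A minimal sketch of these two behaviours follows, under loudly stated assumptions: it is a hypothetical toy model, not the proposed implementation.  ``ToyContextVar`` is an illustrative name, and each thread's execution context is reduced to a single flat dict held in a ``threading.local()``, so generators and coroutines are not modelled.

```python
import threading

# Hypothetical toy model: every OS thread gets its own flat mapping.
# Generators and coroutines are deliberately not modelled here.
_state = threading.local()


def _current_mapping():
    # A new thread starts with an empty execution context.
    if not hasattr(_state, "mapping"):
        _state.mapping = {}
    return _state.mapping


class ToyContextVar:
    def lookup(self):
        # None when the variable was never set in this thread.
        return _current_mapping().get(self)

    def set(self, value):
        _current_mapping()[self] = value


var = ToyContextVar()
seen_in_thread = []


def sub():
    seen_in_thread.append(var.lookup())  # empty EC in the new thread
    var.set('sub')


def main():
    var.set('main')
    thread = threading.Thread(target=sub)
    thread.start()
    thread.join()
    return var.lookup()  # the thread's set() is not visible here


result = main()
print(result, seen_in_thread)  # main [None]
```

Because the toy keeps one flat mapping per thread, it reproduces only the global-like and thread-local-like cases; the generator rules need a stack of logical contexts.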


Generators
----------

In generators, changes to context variables are local and are not
visible to the caller, but are visible to the code called by the
generator.  Once set in the generator, the context variable is
guaranteed not to change between iterations::

    var = new_context_var()

    def gen():
        var.set('gen')
        assert var.lookup() == 'gen'
        yield 1

        assert var.lookup() == 'gen'
        yield 2

    def main():
        var.set('main')

        g = gen()
        next(g)
        assert var.lookup() == 'main'

        var.set('main modified')
        next(g)
        assert var.lookup() == 'main modified'

Changes to the caller's context variables are visible to the generator
(unless they were also modified inside the generator)::

    var = new_context_var()

    def gen():
        assert var.lookup() == 'var'
        yield 1

        assert var.lookup() == 'var modified'
        yield 2

    def main():
        g = gen()

        var.set('var')
        next(g)

        var.set('var modified')
        next(g)
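
The generator rules above can be sketched with a toy model, assuming a module-level list of dicts as the execution context.  The hypothetical ``IsolatedGenerator`` wrapper pushes a per-generator logical context around every ``next()`` call, which approximates what this PEP proposes to do inside the interpreter; none of these names are proposed APIs.

```python
# Hypothetical toy model: the execution context is a list of logical
# contexts (plain dicts); the last element is the top of the stack.
_ec = [{}]


class ToyContextVar:
    def lookup(self):
        # Search from the topmost logical context down.
        for lc in reversed(_ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        # Writes always go to the topmost logical context.
        _ec[-1][self] = value


class IsolatedGenerator:
    """Run each step of a generator with its own logical context
    pushed onto the stack, popping it again afterwards."""

    def __init__(self, gen):
        self._gen = gen
        self.logical_context = {}

    def __iter__(self):
        return self

    def __next__(self):
        _ec.append(self.logical_context)
        try:
            return next(self._gen)
        finally:
            _ec.pop()


var = ToyContextVar()
seen_by_gen = []


def gen():
    seen_by_gen.append(var.lookup())  # the caller's value is visible
    var.set('gen')
    yield 1
    seen_by_gen.append(var.lookup())  # unaffected by the caller's change
    yield 2


var.set('main')
g = IsolatedGenerator(gen())
next(g)
seen_by_caller = var.lookup()         # 'main': gen's set() stayed local
var.set('main modified')
next(g)
```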

Now, let's revisit the decimal precision example from the `Rationale`_
section, and see how the execution context can improve the situation::

    import decimal

    decimal_prec = new_context_var()  # create a new context variable

    # Pre-PEP 550 Decimal relies on TLS for its context.
    # This subclass switches the decimal context storage
    # to the execution context for illustration purposes.
    #
    class MyDecimal(decimal.Decimal):
        def __init__(self, value="0"):
            prec = decimal_prec.lookup()
            if prec is None:
                raise ValueError('could not find decimal precision')
            context = decimal.Context(prec=prec)
            super().__init__(value, context=context)

    def fractions(precision, x, y):
        # Normally, this would be set by a context manager,
        # but for simplicity we do this directly.
        decimal_prec.set(precision)

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))

The value of ``items`` is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]

which matches the expected result.


Coroutines and Asynchronous Tasks
---------------------------------

In coroutines, like in generators, context variable changes are local
and are not visible to the caller::

    import asyncio

    var = new_context_var()

    async def sub():
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'

    async def main():
        var.set('main')
        await sub()
        assert var.lookup() == 'main'

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

To establish the full semantics of execution context in coroutines,
we must also consider *tasks*.  A task is the abstraction used by
*asyncio*, and other similar libraries, to manage the concurrent
execution of coroutines.  In the example above, a task is created
implicitly by the ``run_until_complete()`` function.
``asyncio.wait_for()`` is another example of implicit task creation::

    async def sub():
        await asyncio.sleep(1)
        assert var.lookup() == 'main'

    async def main():
        var.set('main')

        # waiting for sub() directly
        await sub()

        # waiting for sub() with a timeout
        await asyncio.wait_for(sub(), timeout=2)

        var.set('main changed')

Intuitively, we expect the assertion in ``sub()`` to hold true in both
invocations, even though the ``wait_for()`` implementation actually
spawns a task, which runs ``sub()`` concurrently with ``main()``.

Thus, tasks **must** capture a snapshot of the current execution
context at the moment of their creation and use it to execute the
wrapped coroutine whenever that happens.  If this is not done, then
innocuous-looking changes like wrapping a coroutine in a ``wait_for()``
call would cause surprising breakage.  This leads to the following::

    import asyncio

    var = new_context_var()

    async def sub():
        # Sleeping will make sub() run after
        # `var` is modified in main().
        await asyncio.sleep(1)

        assert var.lookup() == 'main'

    async def main():
        var.set('main')
        loop.create_task(sub())  # schedules asynchronous execution
                                 # of sub().
        assert var.lookup() == 'main'
        var.set('main changed')

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

In the above code we show how ``sub()``, running in a separate task,
sees the value of ``var`` as it was when ``loop.create_task(sub())``
was called.

Like tasks, callbacks scheduled with ``Loop.call_soon()``,
``Loop.call_later()``, or ``Future.add_done_callback()`` should
intuitively also capture a snapshot of the current execution context
at the point of scheduling, and use it to run the callback::

    import asyncio
    import logging

    current_request = new_context_var()

    def log_error(e):
        logging.error('error when handling request %r',
                      current_request.lookup())

    async def render_response():
        ...

    async def handle_get_request(request):
        current_request.set(request)

        try:
            return await render_response()
        except Exception as e:
            asyncio.get_event_loop().call_soon(log_error, e)
            return '500 - Internal Server Error'
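
The snapshot-on-schedule behaviour can be sketched with a toy event loop.  Everything here is hypothetical: ``ToyLoop`` is not ``asyncio``, the execution context is a module-level list of dicts, and ``get_execution_context()`` copies each logical context so that later changes by the scheduling code cannot reach the callback.

```python
import collections

# Hypothetical toy model: the execution context is a list of dicts
# (bottom first) and the "event loop" is a FIFO of callbacks.
_ec = [{}]


class ToyContextVar:
    def lookup(self):
        for lc in reversed(_ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        _ec[-1][self] = value


def get_execution_context():
    # Copy every logical context, so later set() calls made by the
    # scheduling code cannot leak into the snapshot.
    return [dict(lc) for lc in _ec]


def run_with_execution_context(ec, func, *args):
    global _ec
    saved = _ec
    _ec = ec + [{}]   # fresh top LC: func's set() calls land here
    try:
        return func(*args)
    finally:
        _ec = saved


class ToyLoop:
    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, callback, *args, execution_context=None):
        if execution_context is None:
            execution_context = get_execution_context()
        self._ready.append((execution_context, callback, args))

    def run(self):
        while self._ready:
            ec, callback, args = self._ready.popleft()
            run_with_execution_context(ec, callback, *args)


current_request = ToyContextVar()
logged = []


def log_error(e):
    logged.append((current_request.lookup(), str(e)))


loop = ToyLoop()
current_request.set('GET /index')
loop.call_soon(log_error, ValueError('boom'))
current_request.set('GET /other')   # changed after scheduling
loop.run()
print(logged)  # [('GET /index', 'boom')]
```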


Detailed Specification
======================

Conceptually, an *execution context* (EC) is a stack of logical
contexts.  There is one EC per Python thread.

A *logical context* (LC) is a mapping of context variables to their
values in that particular LC.

A *context variable* is an object representing a value in the
execution context.  A new context variable object is created by calling
the ``sys.new_context_var(name: str)`` function.  The value of the
``name`` argument is not used by the EC machinery, but may be used for
debugging and introspection.

The context variable object has the following methods and attributes:

* ``name``: the value passed to ``new_context_var()``.

* ``lookup()``: traverses the execution context top-to-bottom,
  until the variable value is found.  Returns ``None``, if the variable
  is not present in the execution context;

* ``set()``: sets the value of the variable in the topmost logical
  context.
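
To make the stack explicit, the following sketch passes the execution context around as a list of dicts, bottom first.  ``ToyContextVar`` is a hypothetical stand-in that keeps ``name`` for debugging only, matching the description above; the explicit *ec* argument is an assumption of the sketch, not part of the proposed ``lookup()``/``set()`` signatures.

```python
class ToyContextVar:
    def __init__(self, name):
        self.name = name  # informational only; never used for lookups

    def __repr__(self):
        return f'<ToyContextVar {self.name!r}>'

    def lookup(self, ec):
        # Traverse the execution context top-to-bottom.
        for lc in reversed(ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, ec, value):
        # Only the topmost logical context is ever modified.
        ec[-1][self] = value


a = ToyContextVar('a')
b = ToyContextVar('b')

ec = [{a: 'outer-a', b: 'outer-b'}, {}]  # outer LC, then an empty top LC
a.set(ec, 'inner-a')                     # shadows, does not overwrite
print(a.lookup(ec), b.lookup(ec))        # inner-a outer-b
```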


Generators
----------

When created, each generator object has an empty logical context object
stored in its ``__logical_context__`` attribute.  This logical context
is pushed onto the execution context at the beginning of each generator
iteration and popped at the end::

    var1 = sys.new_context_var('var1')
    var2 = sys.new_context_var('var2')

    def gen():
        var1.set('var1-gen')
        var2.set('var2-gen')

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]
        n = nested_gen()  # nested_gen_LC is created
        next(n)
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]

        var1.set('var1-gen-mod')
        var2.set('var2-gen-mod')
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'})
        # ]
        next(n)

    def nested_gen():
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC()
        # ]
        assert var1.lookup() == 'var1-gen'
        assert var2.lookup() == 'var2-gen'

        var1.set('var1-nested-gen')
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]
        yield

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]
        assert var1.lookup() == 'var1-nested-gen'
        assert var2.lookup() == 'var2-gen-mod'

        yield

    # EC = [outer_LC()]

    g = gen()  # gen_LC is created for the generator object `g`
    list(g)

    # EC = [outer_LC()]

The snippet above shows the state of the execution context stack
throughout the generator lifespan.


contextlib.contextmanager
-------------------------

Earlier, we used the following example::

    import decimal

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_prec')

    # ...

    def fractions(precision, x, y):
        decimal_prec.set(precision)

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

Let's extend it by adding a context manager::

    import contextlib

    @contextlib.contextmanager
    def precision_context(prec):
        old_prec = decimal_prec.lookup()

        try:
            decimal_prec.set(prec)
            yield
        finally:
            decimal_prec.set(old_prec)

Unfortunately, this would not work straight away, as the modification
to the ``decimal_prec`` variable is confined to the
``precision_context()`` generator, and therefore will not be visible
inside the ``with`` block::

    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC becomes [{}, {}, {decimal_prec: precision}] in the
            # *precision_context()* generator,
            # but here the EC is still [{}, {}]

            # raises ValueError('could not find decimal precision')!
            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)

The way to fix this is to set the generator's ``__logical_context__``
attribute to ``None``.  Such a generator does not push its own logical
context onto the execution context stack, so its changes are made in
the logical context of the code that advances it.

We modify the ``contextlib.contextmanager()`` decorator to
set ``genobj.__logical_context__`` to ``None`` to produce
well-behaved context managers::

    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC = [{}, {decimal_prec: precision}]

            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)

        # EC becomes [{}, {decimal_prec: None}]
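
The effect of setting ``__logical_context__`` to ``None`` can be sketched in a toy model.  The names are hypothetical: ``IsolatedGenerator`` stands in for the interpreter's per-generator logical context (``isolated=False`` mirrors ``__logical_context__ = None``), and ``ToyGeneratorCM`` is a stripped-down stand-in for the object ``contextlib.contextmanager`` builds, which ignores exceptions raised in the ``with`` body.

```python
_ec = [{}]   # hypothetical toy execution context: list of dicts


class ToyContextVar:
    def lookup(self):
        for lc in reversed(_ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        _ec[-1][self] = value


class IsolatedGenerator:
    def __init__(self, gen, isolated=True):
        self._gen = gen
        # None mirrors `genobj.__logical_context__ = None`.
        self.logical_context = {} if isolated else None

    def __next__(self):
        if self.logical_context is None:
            return next(self._gen)   # run directly in the caller's LC
        _ec.append(self.logical_context)
        try:
            return next(self._gen)
        finally:
            _ec.pop()


class ToyGeneratorCM:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __enter__(self):
        return next(self._wrapped)

    def __exit__(self, *exc_info):
        try:
            next(self._wrapped)      # run the code after `yield`
        except StopIteration:
            pass
        return False


decimal_prec = ToyContextVar()


def precision_context(prec):
    old_prec = decimal_prec.lookup()
    try:
        decimal_prec.set(prec)
        yield
    finally:
        decimal_prec.set(old_prec)


seen = []
with ToyGeneratorCM(IsolatedGenerator(precision_context(6))):
    seen.append(decimal_prec.lookup())   # None: change stayed in gen's LC
with ToyGeneratorCM(IsolatedGenerator(precision_context(6), isolated=False)):
    seen.append(decimal_prec.lookup())   # 6: visible inside the block
seen.append(decimal_prec.lookup())       # None: restored on exit
print(seen)  # [None, 6, None]
```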


asyncio
-------

``asyncio`` uses ``Loop.call_soon``, ``Loop.call_later``,
and ``Loop.call_at`` to schedule the asynchronous execution of a
function.  ``asyncio.Task`` uses ``call_soon()`` to advance the
execution of the wrapped coroutine.

We modify ``Loop.call_{at,later,soon}`` to accept the new
optional *execution_context* keyword argument, which defaults to
a copy of the current execution context::

    def call_soon(self, callback, *args, execution_context=None):
        if execution_context is None:
            execution_context = sys.get_execution_context()

        # ... some time later

        sys.run_with_execution_context(
            execution_context, callback, *args)

The ``sys.get_execution_context()`` function returns a shallow copy
of the current execution context.  Here, a *shallow copy* means a new
execution context such that:

* lookups in the copy provide the same results as in the original
  execution context, and
* any changes in the original execution context do not affect the
  copy, and
* any changes to the copy do not affect the original execution
  context.

Either of the following satisfies the copy requirements:

* a new stack with shallow copies of logical contexts;
* a new stack with one squashed logical context.
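
The three copy requirements and the two acceptable strategies can be checked with a small sketch.  It assumes a toy representation in which an execution context is a list of dicts (bottom first) and plain strings stand in for context variables; the helper names are illustrative, not proposed APIs.

```python
def lookup(ec, var):
    # Top-to-bottom traversal, as in ContextVar.lookup().
    for lc in reversed(ec):
        if var in lc:
            return lc[var]
    return None


def copy_per_lc(ec):
    # Strategy 1: a new stack holding shallow copies of each LC.
    return [dict(lc) for lc in ec]


def copy_squashed(ec):
    # Strategy 2: a single LC; walking bottom-to-top lets upper
    # logical contexts override lower ones.
    squashed = {}
    for lc in ec:
        squashed.update(lc)
    return [squashed]


def satisfies_copy_requirements(copy_fn):
    original = [{'a': 1, 'b': 2}, {'a': 10}]
    copy = copy_fn(original)
    # 1. Lookups in the copy match the original.
    same = all(lookup(copy, k) == lookup(original, k)
               for k in ('a', 'b', 'missing'))
    # 2. Changes to the original do not affect the copy.
    original[-1]['b'] = 99
    unaffected_copy = lookup(copy, 'b') == 2
    # 3. Changes to the copy do not affect the original.
    copy[-1]['a'] = 0
    unaffected_original = lookup(original, 'a') == 10
    return same and unaffected_copy and unaffected_original


print(satisfies_copy_requirements(copy_per_lc))     # True
print(satisfies_copy_requirements(copy_squashed))   # True
print(satisfies_copy_requirements(list))            # False: LCs are shared
```

The last line shows why copying only the stack is not enough: a new list that still shares the logical context objects violates the second requirement.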

The ``sys.run_with_execution_context(ec, func, *args, **kwargs)``
function runs ``func(*args, **kwargs)`` with *ec* as the execution
context.  The function performs the following steps:

1. Set *ec* as the current execution context stack in the current
   thread.
2. Push an empty logical context onto the stack.
3. Run ``func(*args, **kwargs)``.
4. Pop the logical context from the stack.
5. Restore the original execution context stack.
6. Return or raise the ``func()`` result.

These steps ensure that *ec* cannot be modified by *func*, which
makes ``run_with_execution_context()`` idempotent: repeated calls with
the same *ec* always start from the same state.
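
The six steps can be sketched in a toy model, assuming the execution context is a module-level list of dicts and treating *ec* as read-only.  ``ToyContextVar`` and the module-level ``_ec`` are hypothetical; the sketch shows that *func* writes only to the temporary top logical context, so two calls with the same *ec* observe the same state.

```python
_ec = [{}]   # hypothetical toy execution context: list of dicts


class ToyContextVar:
    def lookup(self):
        for lc in reversed(_ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        _ec[-1][self] = value


def run_with_execution_context(ec, func, *args, **kwargs):
    global _ec
    saved = _ec                 # remember the caller's stack
    _ec = list(ec) + [{}]       # steps 1-2: install ec, push an empty LC
    try:
        return func(*args, **kwargs)   # step 3; step 6 returns or raises
    finally:
        _ec = saved             # steps 4-5: drop the LC, restore the stack


var = ToyContextVar()
snapshot = [{var: 'snapshot'}]


def func():
    before = var.lookup()
    var.set('changed by func')   # lands in the temporary top LC
    return before


var.set('caller')
first = run_with_execution_context(snapshot, func)
second = run_with_execution_context(snapshot, func)
print(first, second, var.lookup())  # snapshot snapshot caller
```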

``asyncio.Task`` is modified as follows::

    class Task:
        def __init__(self, coro):
            ...
            # Get the current execution context snapshot.
            self._exec_context = sys.get_execution_context()

            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)

        def _step(self, exc=None):
            ...
            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)
            ...


Generators Transformed into Iterators
-------------------------------------

Any Python generator can be represented as an equivalent iterator.
Compilers like Cython rely on this equivalence.  With respect to the
execution context, such an iterator should behave the same way as the
generator it represents.

This means that there needs to be a Python API to create new logical
contexts and run code with a given logical context.

The ``sys.new_logical_context()`` function creates a new empty
logical context.

The ``sys.run_with_logical_context(lc, func, *args, **kwargs)``
function can be used to run functions in the specified logical context.
The *lc* can be modified as a result of the call.

The ``sys.run_with_logical_context()`` function performs the following
steps:

1. Push *lc* onto the current execution context stack.
2. Run ``func(*args, **kwargs)``.
3. Pop *lc* from the execution context stack.
4. Return or raise the ``func()`` result.
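
The four steps translate directly into a short sketch.  It assumes a toy model in which the execution context is a module-level list of dicts and logical contexts are mutable dicts (the Implementation section below uses immutable mappings instead); ``ToyContextVar`` and ``bump()`` are hypothetical.  Because *lc* is mutated in place, it keeps its state between calls, which is what lets an iterator behave like a generator.

```python
_ec = [{}]   # hypothetical toy execution context: list of dicts


class ToyContextVar:
    def lookup(self):
        for lc in reversed(_ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        _ec[-1][self] = value


def new_logical_context():
    return {}


def run_with_logical_context(lc, func, *args, **kwargs):
    _ec.append(lc)                      # 1. push lc
    try:
        return func(*args, **kwargs)    # 2. run; 4. return or raise
    finally:
        _ec.pop()                       # 3. pop lc


counter = ToyContextVar()
lc = new_logical_context()


def bump():
    counter.set((counter.lookup() or 0) + 1)
    return counter.lookup()


results = [run_with_logical_context(lc, bump) for _ in range(3)]
print(results)            # [1, 2, 3]: lc keeps its state between calls
print(counter.lookup())   # None: nothing leaked into the outer context
```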

By using ``new_logical_context()`` and ``run_with_logical_context()``,
we can replicate the generator behaviour like this::

    class Generator:

        def __init__(self):
            self.logical_context = sys.new_logical_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...

Let's see how this pattern can be applied to a real generator::

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_precision')

    def gen_series(n, precision):
        decimal_prec.set(precision)

        for i in range(1, n):
            yield MyDecimal(i) / MyDecimal(3)

    # gen_series is equivalent to the following iterator:

    class Series:

        def __init__(self, n, precision):
            # Create a new empty logical context on creation,
            # like the generators do.
            self.logical_context = sys.new_logical_context()

            # run_with_logical_context() pushes self.logical_context
            # onto the execution context stack, runs self._init, and
            # pops self.logical_context from the stack.
            sys.run_with_logical_context(
                self.logical_context, self._init, n, precision)

        def _init(self, n, precision):
            self.i = 1
            self.n = n
            decimal_prec.set(precision)

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Mirror range(1, n) in gen_series().
            if self.i >= self.n:
                raise StopIteration
            # decimal_prec was set by _init() in self.logical_context,
            # so MyDecimal sees it without setting it again.
            result = MyDecimal(self.i) / MyDecimal(3)
            self.i += 1
            return result

For regular iterators, such an approach to logical context management
is normally unnecessary; it is recommended to set and restore context
variables directly in ``__next__``::

    class Series:

        def __next__(self):
            old_prec = decimal_prec.lookup()

            try:
                decimal_prec.set(self.precision)
                ...
            finally:
                decimal_prec.set(old_prec)


Asynchronous Generators
-----------------------

The execution context semantics of asynchronous generators do not
differ from those of regular generators and coroutines.


Implementation
==============

Execution context is implemented as an immutable linked list of
logical contexts, where each logical context is an immutable weak key
mapping.  A pointer to the currently active execution context is stored
in the OS thread state::

                      +-----------------+
                      |                 |     ec
                      |  PyThreadState  +-------------+
                      |                 |             |
                      +-----------------+             |
                                                      |
    ec_node             ec_node             ec_node   v
    +------+------+     +------+------+     +------+------+
    | NULL |  lc  |<----| prev |  lc  |<----| prev |  lc  |
    +------+--+---+     +------+--+---+     +------+--+---+
              |                   |                   |
    LC        v         LC        v         LC        v
    +-------------+     +-------------+     +-------------+
    | var1: obj1  |     |    EMPTY    |     | var1: obj4  |
    | var2: obj2  |     +-------------+     +-------------+
    | var3: obj3  |
    +-------------+

The choice of the immutable list of immutable mappings as a fundamental
data structure is motivated by the need to efficiently implement
``sys.get_execution_context()``, which is to be frequently used by
asynchronous tasks and callbacks.  When the EC is immutable,
``get_execution_context()`` can simply copy the current execution
context *by reference*::

    def get_execution_context():
        return PyThreadState_Get().ec
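
A pure-Python sketch of this structure follows, assuming ``collections.namedtuple`` nodes and ``types.MappingProxyType`` read-only views as stand-ins for the C-level structures; ``ToyThreadState``, ``lc_set()`` and ``lookup_in()`` are hypothetical names.  Because nothing is ever mutated in place, ``get_execution_context()`` can return the current node, and a later ``set()`` leaves the snapshot untouched.

```python
import collections
import types

# Hypothetical toy model of the structures in the diagram above.
ECNode = collections.namedtuple('ECNode', ['prev', 'lc'])
EMPTY_LC = types.MappingProxyType({})


def lc_set(lc, key, value):
    # Copy-on-write: return a new read-only mapping, leaving `lc` intact.
    # (O(N) with dict; the PEP uses a HAMT to make this O(log N).)
    new = dict(lc)
    new[key] = value
    return types.MappingProxyType(new)


class ToyThreadState:
    def __init__(self):
        self.ec = ECNode(prev=None, lc=EMPTY_LC)


tstate = ToyThreadState()


def get_execution_context():
    # O(1): immutable nodes and mappings can be shared by reference.
    return tstate.ec


def lookup_in(ec, var):
    node = ec
    while node is not None:
        if var in node.lc:
            return node.lc[var]
        node = node.prev
    return None


class ToyContextVar:
    def lookup(self):
        return lookup_in(tstate.ec, self)

    def set(self, value):
        top = tstate.ec
        # Replace the top node; the old node is left untouched.
        tstate.ec = ECNode(prev=top.prev, lc=lc_set(top.lc, self, value))


var = ToyContextVar()
var.set('before')
snapshot = get_execution_context()
var.set('after')
print(lookup_in(snapshot, var), var.lookup())  # before after
```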

Let's review all possible context modification scenarios:

* The ``ContextVar.set()`` method is called::

    def ContextVar_set(self, val):
        # See a more complete set() definition
        # in the `Context Variables` section.

        tstate = PyThreadState_Get()
        top_ec_node = tstate.ec
        top_lc = top_ec_node.lc
        new_top_lc = top_lc.set(self, val)
        tstate.ec = ec_node(
            prev=top_ec_node.prev,
            lc=new_top_lc)

* ``sys.run_with_logical_context()`` is called, in which case
  the passed logical context object is pushed onto the
  execution context::

    def run_with_logical_context(lc, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_top_ec_node = tstate.ec
        new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)

        try:
            tstate.ec = new_top_ec_node
            return func(*args, **kwargs)
        finally:
            tstate.ec = old_top_ec_node

* ``sys.run_with_execution_context()`` is called, in which case
  the current execution context is set to the passed execution context
  with a new empty logical context appended to it::

    def run_with_execution_context(ec, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_top_ec_node = tstate.ec
        new_lc = sys.new_logical_context()
        new_top_ec_node = ec_node(prev=ec, lc=new_lc)

        try:
            tstate.ec = new_top_ec_node
            return func(*args, **kwargs)
        finally:
            tstate.ec = old_top_ec_node

* One of ``genobj.send()``, ``genobj.throw()``, or ``genobj.close()``
  is called on a generator ``genobj``, in which case the logical
  context recorded in ``genobj`` is pushed onto the stack::

    PyGen_New(PyGenObject *gen):
        gen.__logical_context__ = sys.new_logical_context()

    gen_send(PyGenObject *gen, ...):
        tstate = PyThreadState_Get()

        if gen.__logical_context__ is not None:
            old_top_ec_node = tstate.ec
            new_top_ec_node = ec_node(
                prev=old_top_ec_node,
                lc=gen.__logical_context__)

            try:
                tstate.ec = new_top_ec_node
                return _gen_send_impl(gen, ...)
            finally:
                gen.__logical_context__ = tstate.ec.lc
                tstate.ec = old_top_ec_node
        else:
            return _gen_send_impl(gen, ...)

* Coroutines and asynchronous generators share the implementation
  with generators, and the above changes apply to them as well.

In certain scenarios the EC may need to be squashed to limit the
size of the chain.  For example, consider the following corner case::

    async def repeat(coro, delay):
        await coro()
        await asyncio.sleep(delay)
        loop.create_task(repeat(coro, delay))

    async def ping():
        print('ping')

    loop = asyncio.get_event_loop()
    loop.create_task(repeat(ping, 1))
    loop.run_forever()

In the above code, the EC chain grows with every ``repeat()`` call.
Each new task calls ``sys.run_with_execution_context()``, which
appends a new logical context to the chain.  To prevent unbounded
growth, ``sys.get_execution_context()`` checks if the chain is longer
than a predetermined maximum, and if it is, squashes the chain into a
single LC::

    def get_execution_context():
        tstate = PyThreadState_Get()

        if tstate.ec_len > EC_LEN_MAX:
            squashed_lc = sys.new_logical_context()

            node = tstate.ec
            while node:
                # The LC.merge() method does not replace existing keys.
                squashed_lc = squashed_lc.merge(node.lc)
                node = node.prev

            return ec_node(prev=NULL, lc=squashed_lc)
        else:
            return tstate.ec
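
The squashing step can be sketched as follows, under stated assumptions: nodes are ``collections.namedtuple`` instances, logical contexts are dicts that are never mutated, strings stand in for context variables, and ``EC_LEN_MAX = 4`` is an arbitrary illustrative threshold.  Walking top-to-bottom while keeping existing keys means the value from the uppermost logical context wins.

```python
import collections

# Hypothetical toy model; LCs are plain dicts that are never mutated.
ECNode = collections.namedtuple('ECNode', ['prev', 'lc'])
EC_LEN_MAX = 4   # illustrative threshold, not a value from the PEP


def ec_len(node):
    # The PEP keeps this count in the thread state; walking the chain
    # here just keeps the sketch short.
    length = 0
    while node is not None:
        length += 1
        node = node.prev
    return length


def merge_keep_existing(lc, other):
    # Like LC.merge() above: keys already in `lc` are not replaced.
    merged = dict(other)
    merged.update(lc)
    return merged


def get_execution_context(top):
    if ec_len(top) <= EC_LEN_MAX:
        return top
    squashed_lc = {}
    node = top
    while node is not None:        # top-to-bottom, so upper LCs win
        squashed_lc = merge_keep_existing(squashed_lc, node.lc)
        node = node.prev
    return ECNode(prev=None, lc=squashed_lc)


top = None
for depth in range(6):            # six chained logical contexts
    lc = {'depth': depth}
    if depth == 0:
        lc['origin'] = 'bottom'
    top = ECNode(prev=top, lc=lc)

snapshot = get_execution_context(top)
print(snapshot)  # ECNode(prev=None, lc={'depth': 5, 'origin': 'bottom'})
```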


Logical Context
---------------

Logical context is an immutable weak key mapping which has the
following properties with respect to garbage collection:

* ``ContextVar`` objects are strongly-referenced only from the
  application code, not from any of the Execution Context machinery
  or values they point to.  This means that there are no reference
  cycles that could extend their lifespan longer than necessary, or
  prevent their collection by the GC.

* Values put in the Execution Context are guaranteed to be kept
  alive while there is a ``ContextVar`` key referencing them in
  the thread.

* If a ``ContextVar`` is garbage collected, all of its values will
  be removed from all contexts, allowing them to be GCed if needed.

* If a thread has ended its execution, its thread state will be
  cleaned up along with its ``ExecutionContext``, cleaning
  up all values bound to all context variables in the thread.
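
The key garbage-collection property can be demonstrated with ``weakref.WeakKeyDictionary``, a mutable stand-in chosen only for this sketch; the PEP's logical context is an immutable weak-key mapping.  ``ToyContextVar`` and ``Payload`` are hypothetical names.

```python
import gc
import weakref


class ToyContextVar:
    """Hypothetical stand-in: any weak-referenceable object works."""


class Payload:
    """Hypothetical value stored in the logical context."""


lc = weakref.WeakKeyDictionary()   # mutable stand-in for a weak-key LC
var = ToyContextVar()
payload = Payload()
payload_ref = weakref.ref(payload)

lc[var] = payload
del payload                        # the LC now holds the only strong ref
alive_while_var_exists = payload_ref() is not None

del var                            # the application drops the variable
gc.collect()                       # immediate in CPython; harmless anyway
print(alive_while_var_exists, len(lc), payload_ref())  # True 0 None
```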

As discussed earlier, we need ``sys.get_execution_context()`` to be
consistently fast regardless of the size of the execution context, so
logical context is necessarily an immutable mapping.

Choosing ``dict`` for the underlying implementation is suboptimal,
because ``LC.set()`` will cause ``dict.copy()``, which is an O(N)
operation, where *N* is the number of items in the LC.

``get_execution_context()``, when squashing the EC, is an O(M)
operation, where *M* is the total number of context variable values
in the EC.

So, instead of ``dict``, we choose Hash Array Mapped Trie (HAMT)
as the underlying implementation of logical contexts.  (Scala and
Clojure use HAMT to implement high performance immutable collections
[5]_, [6]_.)

With HAMT, ``.set()`` becomes an O(log N) operation, and
``get_execution_context()`` squashing is more efficient on average due
to structural sharing in HAMT.

See `Appendix: HAMT Performance Analysis`_ for a more elaborate
analysis of HAMT performance compared to ``dict``.


Context Variables
-----------------

The ``ContextVar.lookup()`` and ``ContextVar.set()`` methods are
implemented as follows (in pseudo-code)::

    class ContextVar:

        def lookup(self):
            tstate = PyThreadState_Get()

            ec_node = tstate.ec
            while ec_node:
                if self in ec_node.lc:
                    return ec_node.lc[self]
                ec_node = ec_node.prev

            return None

        def set(self, value):
            tstate = PyThreadState_Get()
            top_ec_node = tstate.ec

            if top_ec_node is not None:
                top_lc = top_ec_node.lc
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=top_ec_node.prev,
                    lc=new_top_lc)
            else:
                top_lc = sys.new_logical_context()
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=NULL,
                    lc=new_top_lc)

For efficient access in performance-sensitive code paths, such as in
``numpy`` and ``decimal``, we add a cache to ``ContextVar.lookup()``,
making it an O(1) operation when the cache is hit.  The cache key is
composed of the following:

* The new ``uint64_t PyThreadState->unique_id``, which is a globally
  unique thread state identifier.  It is computed from the new
  ``uint64_t PyInterpreterState->ts_counter``, which is incremented
  whenever a new thread state is created.

* The ``uint64_t ContextVar->version`` counter, which is incremented
  whenever the context variable value is changed in any logical context
  in any thread.

The cache is then implemented as follows::

    class ContextVar:

        def set(self, value):
            ...  # implementation
            self.version += 1


        def lookup(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                    self.last_version == self.version):
                return self.last_value

            value = self._lookup_uncached()

            self.last_value = value  # borrowed ref
            self.last_tstate_id = tstate.unique_id
            self.last_version = self.version

            return value

Note that ``last_value`` is a borrowed reference.  The assumption
is that if the version checks pass, the object is still alive.
This allows the values of context variables to be properly garbage
collected.

This generic caching approach is similar to what the current C
implementation of ``decimal`` does to cache the current decimal
context, and has similar performance characteristics.
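
The caching scheme can be sketched in Python under simplifying assumptions: the toy is single-threaded and generator-free, so ``set()`` is the only operation that changes what ``lookup()`` returns; ``threading.get_ident()`` stands in for ``PyThreadState->unique_id`` (unlike ``unique_id``, it may be reused after a thread exits); and ``last_value`` is an ordinary strong reference, since Python code cannot hold a borrowed one.  ``uncached_lookups`` is hypothetical instrumentation for the example.

```python
import threading

# Hypothetical single-threaded toy: the EC is a list of dicts and set()
# is the only operation that changes what lookup() returns.
_ec = [{}]


class ToyContextVar:
    def __init__(self):
        self.version = 0
        self.last_tstate_id = None
        self.last_version = None
        self.last_value = None      # a strong ref here; the PEP borrows it
        self.uncached_lookups = 0   # instrumentation for the example

    def _lookup_uncached(self):
        self.uncached_lookups += 1
        for lc in reversed(_ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        _ec[-1][self] = value
        self.version += 1           # invalidates cached values everywhere

    def lookup(self):
        # threading.get_ident() stands in for PyThreadState->unique_id.
        tstate_id = threading.get_ident()
        if (self.last_tstate_id == tstate_id and
                self.last_version == self.version):
            return self.last_value  # cache hit: O(1)
        value = self._lookup_uncached()
        self.last_value = value
        self.last_tstate_id = tstate_id
        self.last_version = self.version
        return value


var = ToyContextVar()
var.set(28)
values = [var.lookup() for _ in range(1000)]
misses_after_reads = var.uncached_lookups   # 1: only the first read missed
var.set(6)
new_value = var.lookup()
print(misses_after_reads, new_value, var.uncached_lookups)  # 1 6 2
```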


Performance Considerations
==========================

Tests of the reference implementation based on the prior
revisions of this PEP have shown a 1-2% slowdown on generator
microbenchmarks and no noticeable difference in macrobenchmarks.

The performance of non-generator and non-async code is not
affected by this PEP.


Summary of the New APIs
=======================

Python
------

The following new Python APIs are introduced by this PEP:

1. The ``sys.new_context_var(name: str='...')`` function to create
   ``ContextVar`` objects.

2. The ``ContextVar`` object, which has:

   * the read-only ``.name`` attribute;
   * the ``.lookup()`` method, which returns the value of the variable
     in the current execution context;
   * the ``.set()`` method, which sets the value of the variable in
     the current execution context.

3. The ``sys.get_execution_context()`` function, which returns a
   copy of the current execution context.

4. The ``sys.new_execution_context()`` function, which returns a new
   empty execution context.

5. The ``sys.new_logical_context()`` function, which returns a new
   empty logical context.

6. The ``sys.run_with_execution_context(ec: ExecutionContext,
   func, *args, **kwargs)`` function, which runs *func* with the
   provided execution context.

7. The ``sys.run_with_logical_context(lc: LogicalContext,
   func, *args, **kwargs)`` function, which runs *func* with the
   provided logical context on top of the current execution context.
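
To make the semantics of these APIs concrete, here is a toy,
pure-Python model of them.  It is not the proposed implementation: the
function names mirror the ``sys`` functions above but live in a plain
module, an execution context is modelled as an immutable tuple of
logical contexts (topmost last), each logical context copies a ``dict``
on write where the PEP uses a HAMT, and ``lookup()`` returning ``None``
for an unset variable is an assumption of this sketch::

    import threading

    _tstate = threading.local()  # stand-in for the thread state's EC slot


    class LogicalContext:
        """Immutable mapping; set() returns a new LC instead of mutating."""

        def __init__(self, data=None):
            self._data = {} if data is None else data  # takes ownership

        def __contains__(self, var):
            return var in self._data

        def get(self, var):
            return self._data.get(var)

        def set(self, var, value):
            data = dict(self._data)  # a dict copy stands in for the HAMT
            data[var] = value
            return LogicalContext(data)


    class ContextVar:
        def __init__(self, name):
            self._name = name

        @property
        def name(self):  # read-only
            return self._name

        def lookup(self):
            # Search from the topmost LC down; None if unset (an assumption).
            for lc in reversed(get_execution_context()):
                if self in lc:
                    return lc.get(self)
            return None

        def set(self, value):
            ec = get_execution_context()
            if ec:  # replace the top LC with an updated copy
                _tstate.ec = ec[:-1] + (ec[-1].set(self, value),)
            else:   # empty EC: start one with a fresh LC
                _tstate.ec = (LogicalContext().set(self, value),)


    def new_context_var(name="..."):
        return ContextVar(name)


    def new_logical_context():
        return LogicalContext()


    def new_execution_context():
        return ()  # an EC is a tuple of LCs, topmost last


    def get_execution_context():
        # Everything is immutable, so the current EC itself is a safe copy.
        return getattr(_tstate, "ec", ())


    def run_with_execution_context(ec, func, *args, **kwargs):
        saved = get_execution_context()
        _tstate.ec = ec
        try:
            return func(*args, **kwargs)
        finally:
            _tstate.ec = saved  # changes made by func do not leak out


    def run_with_logical_context(lc, func, *args, **kwargs):
        # Push lc on top of the current EC for the duration of the call.
        return run_with_execution_context(
            get_execution_context() + (lc,), func, *args, **kwargs)

Because logical contexts are never mutated in place, saving and
restoring an execution context in this model is just a matter of
keeping a reference to it.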


C API
-----

1. ``PyContextVar * PyContext_NewVar(char *desc)``: create a
   ``PyContextVar`` object.

2. ``PyObject * PyContext_LookupVar(PyContextVar *)``: return
   the value of the variable in the current execution context.

3. ``int PyContext_SetVar(PyContextVar *, PyObject *)``: set
   the value of the variable in the current execution context.

4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty
   ``PyLogicalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty
   ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: return the
   current execution context.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
   passed EC object as the current for the active thread state.

8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *,
   PyLogicalContext *)``: can be used to implement the
   ``sys.run_with_logical_context`` Python API.


Design Considerations
=====================

Should ``PyThreadState_GetDict()`` use the execution context?
-------------------------------------------------------------

No. ``PyThreadState_GetDict`` is based on TLS, and changing its
semantics will break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem, which
extends the context manager protocol with two new methods:
``__suspend__()`` and ``__resume__()``.  Similarly, the asynchronous
context manager protocol is also extended with ``__asuspend__()`` and
``__aresume__()``.

This allows implementing context managers that manage non-local state
and behave correctly in generators and coroutines.

For example, consider the following context manager, which uses
the execution context::

    import sys

    class Context:

        def __init__(self):
            self.var = sys.new_context_var('var')

        def __enter__(self):
            self.old_x = self.var.lookup()
            self.var.set('something')

        def __exit__(self, *err):
            self.var.set(self.old_x)

An equivalent implementation with PEP 521::

    import threading

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

The downside of this approach is the addition of significant new
complexity to the context manager protocol and the interpreter
implementation.  This approach is also likely to negatively impact
the performance of generators and coroutines.

Additionally, the solution in :pep:`521` is limited to context managers,
and does not provide any mechanism to propagate state in asynchronous
tasks and callbacks.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

No.  Proper generator behaviour with respect to the execution context
requires changes to the interpreter.


Should we update sys.displayhook and other APIs to use EC?
----------------------------------------------------------

APIs like redirecting stdout by overwriting ``sys.stdout``, or
specifying new display hooks by overwriting the ``sys.displayhook``
function, affect the whole Python process **by design**.  Their users
assume that the effect of changing them will be visible across OS
threads.  Therefore we cannot just make these APIs use the new
Execution Context.

That said, we think it is possible to design new APIs that will
be context aware, but that is outside of the scope of this PEP.
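
A short demonstration of this process-wide behaviour, using only the
standard library: ``contextlib.redirect_stdout`` works by replacing
``sys.stdout``, so output printed by a different OS thread while the
``with`` block is active also lands in the buffer::

    import contextlib
    import io
    import threading

    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # print() looks up sys.stdout at call time, so the worker thread
        # writes into buf even though it never entered the with block.
        worker = threading.Thread(target=print, args=("from another thread",))
        worker.start()
        worker.join()

    assert buf.getvalue() == "from another thread\n"

Code that relies on this today would silently change behaviour if
``sys.stdout`` became context-local.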


Greenlets
---------

Greenlet is an alternative implementation of cooperative
scheduling for Python.  Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

Conceptually, the behaviour of greenlets is very similar to that of
generators, which means that similar changes around greenlet entry
and exit can be made to add support for execution context.
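
The entry/exit change can be sketched in Python, with generators
standing in for greenlets since the greenlet package is not part of the
standard library.  ``Task``, ``set_var`` and ``get_var`` are
hypothetical names, and a plain ``dict`` replaced on every write stands
in for the execution context: on entry the task's saved context is
installed, and on exit whatever the task left behind is saved and the
caller's context is restored::

    import threading

    _state = threading.local()  # holds the "current" context for this OS thread


    def get_var(name):
        return getattr(_state, "ec", {}).get(name)


    def set_var(name, value):
        # Copy on write, so a saved context is never mutated behind its owner's back.
        _state.ec = {**getattr(_state, "ec", {}), name: value}


    class Task:
        """Generator-backed stand-in for a greenlet with its own context."""

        def __init__(self, gen):
            self.gen = gen
            self.ec = {}  # this task's private context

        def switch(self):
            saved = getattr(_state, "ec", {})
            _state.ec = self.ec          # entry: install the task's context
            try:
                return next(self.gen)
            finally:
                self.ec = _state.ec      # exit: keep what the task changed
                _state.ec = saved        # restore the caller's context


    def worker(value):
        set_var("x", value)
        yield get_var("x")
        yield get_var("x")


    a, b = Task(worker(1)), Task(worker(2))
    print([a.switch(), b.switch(), a.switch(), b.switch()])  # prints [1, 2, 1, 2]

Each task keeps seeing its own value of ``x`` across interleaved
switches, and the switching code itself never observes either value.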


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Appendix: HAMT Performance Analysis
===================================

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

The above chart demonstrates that:

* HAMT displays near O(1) performance for all benchmarked
  dictionary sizes.

* ``dict.copy()`` becomes very slow around 100 items.
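
To see where the ``dict.copy()`` cost comes from, here is a sketch of
the naive way to get an immutable mapping without a HAMT: copy the
whole ``dict`` on every update.  ``CopyOnWriteMap`` and
``cost_of_set`` are hypothetical names for illustration.  A HAMT
avoids the full copy by sharing unchanged nodes between versions and
copying only the nodes on the path to the modified key::

    import timeit


    class CopyOnWriteMap:
        """Naive immutable mapping: set() copies every entry, so it is O(n)."""

        __slots__ = ("_d",)

        def __init__(self, d=None):
            self._d = {} if d is None else d  # takes ownership of d

        def set(self, key, value):
            d = self._d.copy()  # the O(n) step that dominates for large maps
            d[key] = value
            return CopyOnWriteMap(d)

        def get(self, key, default=None):
            return self._d.get(key, default)

        def __len__(self):
            return len(self._d)


    def cost_of_set(n, number=1000):
        """Seconds spent on `number` set() calls against an n-item map."""
        m = CopyOnWriteMap({i: i for i in range(n)})
        return timeit.timeit(lambda: m.set(-1, -1), number=number)


    for n in (10, 100, 1000):
        print(n, cost_of_set(n))

The absolute timings depend on the machine; the point is that the cost
of each ``set()`` grows with the size of the mapping.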

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
immutable mapping.  HAMT lookup time is 30-40% slower than Python dict
lookups on average, which is a very good result, considering that the
latter is very well optimized.

There is research [8]_ showing that there are further possible
improvements to the performance of HAMT.

The reference implementation of HAMT for CPython can be found here:
[7]_.


Acknowledgments
===============

Thanks to Victor Petrovykh for countless discussions around the topic
and PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ``ContextVar`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
for coming up with the idea of having a stack of contexts in the thread
state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.


Version History
===============

1. Initial revision, posted on 11-Aug-2017 [20]_.

2. V2 posted on 15-Aug-2017 [21]_.

   The fundamental limitation that caused a complete redesign of the
   first version was that it was not possible to implement an iterator
   that would interact with the EC in the same way as generators
   (see [19]_.)

   Version 2 was a complete rewrite, introducing new terminology
   (Local Context, Execution Context, Context Item) and new APIs.

3. V3 posted on 18-Aug-2017 [22]_.

   Updates:

   * Local Context was renamed to Logical Context.  The term "local"
     was ambiguous and conflicted with local name scopes.

   * Context Item was renamed to Context Key, see the thread with Nick
     Coghlan, Stefan Krah, and Yury Selivanov [23]_ for details.

   * Context Item get cache design was adjusted, per Nathaniel Smith's
     idea in [25]_.

   * Coroutines are created without a Logical Context; ceval loop
     no longer needs to special case the ``await`` expression
     (proposed by Nick Coghlan in [24]_.)

4. V4 posted on 25-Aug-2017: the current version.

   * The specification section has been completely rewritten.

   * Context Key renamed to Context Var.

   * Removed the distinction between generators and coroutines with
     respect to logical context isolation.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html

.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst

.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst

.. [22] https://github.com/python/peps/blob/287ed87bb475a7da657f950b353c71c1248f67e7/pep-0550.rst

.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html

.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html

.. [25] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: