PEP: 550
|
|
Title: Execution Context
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Yury Selivanov <yury@magic.io>,
|
|
Elvis Pranskevichus <elvis@magic.io>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 11-Aug-2017
|
|
Python-Version: 3.7
|
|
Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP adds a new generic mechanism for ensuring consistent access
|
|
to non-local state in the context of out-of-order execution, such
|
|
as in Python generators and coroutines.
|
|
|
|
Thread-local storage, such as ``threading.local()``, is inadequate for
|
|
programs that execute concurrently in the same OS thread. This PEP
|
|
proposes a solution to this problem.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Prior to the advent of asynchronous programming in Python, programs
|
|
used OS threads to achieve concurrency. The need for thread-specific
|
|
state was solved by ``threading.local()`` and its C-API equivalent,
|
|
``PyThreadState_GetDict()``.
|
|
|
|
A few examples of where thread-local storage (TLS) is commonly
|
|
relied upon:
|
|
|
|
* Context managers like decimal contexts, ``numpy.errstate``,
|
|
and ``warnings.catch_warnings``.
|
|
|
|
* Request-related data, such as security tokens and request
|
|
data in web applications, language context for ``gettext`` etc.
|
|
|
|
* Profiling, tracing, and logging in large code bases.
|
|
|
|
Unfortunately, TLS does not work well for programs which execute
|
|
concurrently in a single thread. A Python generator is the simplest
|
|
example of a concurrent program. Consider the following::
|
|
|
|
    import decimal
    from decimal import Decimal

    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield Decimal(x) / Decimal(y)
            yield Decimal(x) / Decimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))
|
|
|
|
The expected value of ``items`` is::
|
|
|
|
    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]
|
|
|
|
Rather surprisingly, the actual result is::
|
|
|
|
    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.111111'), Decimal('0.222222'))]
|
|
|
|
This is because the decimal context is stored in thread-local storage,
so interleaved iteration of the two ``fractions()`` generators corrupts
the state: by the time ``g1`` resumes, ``g2`` has already replaced the
context that ``g1`` installed. A similar problem exists with coroutines.
|
|
|
|
Applications also often need to associate certain data with a given
|
|
thread of execution. For example, a web application server commonly
|
|
needs access to the current HTTP request object.
|
|
|
|
The inadequacy of TLS in asynchronous code has led to the
|
|
proliferation of ad-hoc solutions, which are limited in scope and
|
|
do not support all required use cases.
|
|
|
|
Currently, any library (including the standard library) that relies
on TLS is likely to break when used in asynchronous code or with
generators (see [3]_ for an example issue).
|
|
|
|
Some languages that support coroutines or generators recommend
passing the context manually as an argument to every function; see [1]_
for an example. This approach, however, has limited use for Python,
|
|
where there is a large ecosystem that was built to work with a TLS-like
|
|
context. Furthermore, libraries like ``decimal`` or ``numpy`` rely
|
|
on context implicitly in overloaded operator implementations.
|
|
|
|
The .NET runtime, which has support for async/await, has a generic
|
|
solution for this problem, called ``ExecutionContext`` (see [2]_).
|
|
|
|
|
|
Goals
|
|
=====
|
|
|
|
The goal of this PEP is to provide a more reliable
|
|
``threading.local()`` alternative, which:
|
|
|
|
* provides the mechanism and the API to fix non-local state issues
|
|
with coroutines and generators;
|
|
|
|
* has no or negligible performance impact on the existing code or
|
|
the code that will be using the new mechanism, including
|
|
libraries like ``decimal`` and ``numpy``.
|
|
|
|
|
|
High-Level Specification
|
|
========================
|
|
|
|
The full specification of this PEP is broken down into three parts:
|
|
|
|
* High-Level Specification (this section): the description of the
|
|
overall solution. We show how it applies to generators and
|
|
coroutines in user code, without delving into implementation details.
|
|
|
|
* Detailed Specification: the complete description of new concepts,
|
|
APIs, and related changes to the standard library.
|
|
|
|
* Implementation Details: the description and analysis of data
|
|
structures and algorithms used to implement this PEP, as well as the
|
|
necessary changes to CPython.
|
|
|
|
For the purpose of this section, we define *execution context* as an
|
|
opaque container of non-local state that allows consistent access to
|
|
its contents in the concurrent execution environment.
|
|
|
|
A *context variable* is an object representing a value in the
|
|
execution context. A new context variable is created by calling
|
|
the ``new_context_var()`` function. A context variable object has
|
|
two methods:
|
|
|
|
* ``lookup()``: returns the value of the variable in the current
|
|
execution context;
|
|
|
|
* ``set()``: sets the value of the variable in the current
|
|
execution context.
|
|
|
|
|
|
Regular Single-threaded Code
|
|
----------------------------
|
|
|
|
In regular, single-threaded code that doesn't involve generators or
|
|
coroutines, context variables behave like globals::
|
|
|
|
    var = new_context_var()

    def sub():
        assert var.lookup() == 'main'
        var.set('sub')

    def main():
        var.set('main')
        sub()
        assert var.lookup() == 'sub'
|
|
|
|
|
|
Multithreaded Code
|
|
------------------
|
|
|
|
In multithreaded code, context variables behave like thread locals::
|
|
|
|
    import threading

    var = new_context_var()

    def sub():
        assert var.lookup() is None  # The execution context is empty
                                     # for each new thread.
        var.set('sub')

    def main():
        var.set('main')

        thread = threading.Thread(target=sub)
        thread.start()
        thread.join()

        assert var.lookup() == 'main'
|
|
|
|
|
|
Generators
|
|
----------
|
|
|
|
In generators, changes to context variables are local and are not
|
|
visible to the caller, but are visible to the code called by the
|
|
generator. Once set in the generator, the context variable is
|
|
guaranteed not to change between iterations::
|
|
|
|
    var = new_context_var()

    def gen():
        var.set('gen')
        assert var.lookup() == 'gen'
        yield 1

        assert var.lookup() == 'gen'
        yield 2

    def main():
        var.set('main')

        g = gen()
        next(g)
        assert var.lookup() == 'main'

        var.set('main modified')
        next(g)
        assert var.lookup() == 'main modified'
|
|
|
|
Changes to the caller's context variables are visible to the generator
|
|
(unless they were also modified inside the generator)::
|
|
|
|
    var = new_context_var()

    def gen():
        assert var.lookup() == 'var'
        yield 1

        assert var.lookup() == 'var modified'
        yield 2

    def main():
        g = gen()

        var.set('var')
        next(g)

        var.set('var modified')
        next(g)
|
|
|
|
Now, let's revisit the decimal precision example from the `Rationale`_
|
|
section, and see how the execution context can improve the situation::
|
|
|
|
    import decimal

    decimal_prec = new_context_var()  # create a new context variable

    # Pre-PEP 550 Decimal relies on TLS for its context.
    # This subclass switches the decimal context storage
    # to the execution context for illustration purposes.
    #
    class MyDecimal(decimal.Decimal):
        def __init__(self, value="0"):
            prec = decimal_prec.lookup()
            if prec is None:
                raise ValueError('could not find decimal precision')
            context = decimal.Context(prec=prec)
            super().__init__(value, context=context)

    def fractions(precision, x, y):
        # Normally, this would be set by a context manager,
        # but for simplicity we do this directly.
        decimal_prec.set(precision)

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))
|
|
|
|
The value of ``items`` is::
|
|
|
|
    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]
|
|
|
|
which matches the expected result.
|
|
|
|
|
|
Coroutines and Asynchronous Tasks
|
|
---------------------------------
|
|
|
|
In coroutines, like in generators, context variable changes are local
|
|
and are not visible to the caller::
|
|
|
|
    import asyncio

    var = new_context_var()

    async def sub():
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'

    async def main():
        var.set('main')
        await sub()
        assert var.lookup() == 'main'

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
|
|
|
|
To establish the full semantics of execution context in coroutines,
|
|
we must also consider *tasks*. A task is the abstraction used by
|
|
*asyncio*, and other similar libraries, to manage the concurrent
|
|
execution of coroutines. In the example above, a task is created
|
|
implicitly by the ``run_until_complete()`` function.
|
|
``asyncio.wait_for()`` is another example of implicit task creation::
|
|
|
|
    async def sub():
        await asyncio.sleep(1)
        assert var.lookup() == 'main'

    async def main():
        var.set('main')

        # waiting for sub() directly
        await sub()

        # waiting for sub() with a timeout
        await asyncio.wait_for(sub(), timeout=2)

        var.set('main changed')
|
|
|
|
Intuitively, we expect the assertion in ``sub()`` to hold true in both
|
|
invocations, even though the ``wait_for()`` implementation actually
|
|
spawns a task, which runs ``sub()`` concurrently with ``main()``.
|
|
|
|
Thus, tasks **must** capture a snapshot of the current execution
|
|
context at the moment of their creation and use it to execute the
|
|
wrapped coroutine whenever that happens. If this is not done, then
|
|
innocuous-looking changes like wrapping a coroutine in a ``wait_for()``
|
|
call would cause surprising breakage. This leads to the following::
|
|
|
|
    import asyncio

    var = new_context_var()

    async def sub():
        # Sleeping will make sub() run after
        # `var` is modified in main().
        await asyncio.sleep(1)

        assert var.lookup() == 'main'

    async def main():
        var.set('main')
        loop.create_task(sub())  # schedules asynchronous execution
                                 # of sub().
        assert var.lookup() == 'main'
        var.set('main changed')

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
|
|
|
|
In the above code we show how ``sub()``, running in a separate task,
|
|
sees the value of ``var`` as it was when ``loop.create_task(sub())``
|
|
was called.
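
The snapshot behaviour can be tried out with the ``contextvars`` module
that ships with Python 3.7 and later. Note that this is a different API
from the one proposed in this document (for instance, it spells the read
method ``get()`` rather than ``lookup()``), so the sketch below is only a
runnable illustration of a task capturing the context at creation time.
The names ``seen_in_sub`` and ``seen_in_main`` are introduced here purely
for the demonstration.

```python
import asyncio
import contextvars

var = contextvars.ContextVar('var', default=None)
seen_in_sub = []
seen_in_main = []


async def sub():
    # Yield to the event loop so that main() gets to modify `var` first.
    await asyncio.sleep(0)
    seen_in_sub.append(var.get())


async def main():
    var.set('main')
    # The task copies the current context when it is created.
    task = asyncio.create_task(sub())
    var.set('main changed')
    await task
    seen_in_main.append(var.get())


asyncio.run(main())
```

Here ``sub()`` still observes the value that was current when the task was
created, while ``main()`` keeps its own later modification.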
|
|
|
|
Like tasks, the intuitive behaviour of callbacks scheduled with either
|
|
``Loop.call_soon()``, ``Loop.call_later()``, or
|
|
``Future.add_done_callback()`` is to also capture a snapshot of the
|
|
current execution context at the point of scheduling, and use it to
|
|
run the callback::
|
|
|
|
    import logging
    from asyncio import get_event_loop

    current_request = new_context_var()

    def log_error(e):
        logging.error('error when handling request %r',
                      current_request.lookup())

    async def render_response():
        ...

    async def handle_get_request(request):
        current_request.set(request)

        try:
            return await render_response()
        except Exception as e:
            get_event_loop().call_soon(log_error, e)
            return '500 - Internal Server Error'
|
|
|
|
|
|
Detailed Specification
|
|
======================
|
|
|
|
Conceptually, an *execution context* (EC) is a stack of logical
|
|
contexts. There is one EC per Python thread.
|
|
|
|
A *logical context* (LC) is a mapping of context variables to their
|
|
values in that particular LC.
|
|
|
|
A *context variable* is an object representing a value in the
|
|
execution context. A new context variable object is created by calling
|
|
the ``sys.new_context_var(name: str)`` function. The value of the
|
|
``name`` argument is not used by the EC machinery, but may be used for
|
|
debugging and introspection.
|
|
|
|
The context variable object has the following methods and attributes:
|
|
|
|
* ``name``: the value passed to ``new_context_var()``.
|
|
|
|
* ``lookup()``: traverses the execution context top-to-bottom,
|
|
  until the variable value is found. Returns ``None`` if the variable
|
|
is not present in the execution context;
|
|
|
|
* ``set()``: sets the value of the variable in the topmost logical
|
|
context.
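
The lookup and set rules above can be modelled in a few lines of plain
Python. The sketch below is only an illustration: ``ToyContextVar`` and the
module-level ``EC`` list are hypothetical stand-ins for the proposed
``sys.new_context_var()`` machinery, and the execution context is modelled
as a mutable list of dicts rather than the immutable structures described
in the implementation section.

```python
# Toy model: the execution context (EC) is a list of logical contexts
# (LCs), each LC being a dict. The last element is the topmost LC.
EC = [{}]


class ToyContextVar:
    """Hypothetical stand-in for the proposed context variable object."""

    def __init__(self, name):
        self.name = name  # used only for debugging and introspection

    def lookup(self):
        # Traverse the EC top-to-bottom until a value is found.
        for lc in reversed(EC):
            if self in lc:
                return lc[self]
        return None  # the variable is not present in the EC

    def set(self, value):
        # Always write to the topmost logical context.
        EC[-1][self] = value


var = ToyContextVar('var')
missing_before_set = var.lookup()  # nothing has been set yet

var.set('outer')
EC.append({})                      # push a new LC, as a generator would
inherited = var.lookup()           # found in the LC below the top

var.set('inner')                   # shadows the outer value
shadowed = var.lookup()

EC.pop()                           # pop the LC again
restored = var.lookup()
```

Popping the topmost logical context discards the shadowing value, which is
what makes changes made inside a generator invisible to its caller.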
|
|
|
|
|
|
Generators
|
|
----------
|
|
|
|
When created, each generator object has an empty logical context object
|
|
stored in its ``__logical_context__`` attribute. This logical context
|
|
is pushed onto the execution context at the beginning of each generator
|
|
iteration and popped at the end::
|
|
|
|
    var1 = sys.new_context_var('var1')
    var2 = sys.new_context_var('var2')

    def gen():
        var1.set('var1-gen')
        var2.set('var2-gen')

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]
        n = nested_gen()  # nested_gen_LC is created
        next(n)
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]

        var1.set('var1-gen-mod')
        var2.set('var2-gen-mod')
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'})
        # ]
        next(n)

        yield  # makes gen() a generator function

    def nested_gen():
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC()
        # ]
        assert var1.lookup() == 'var1-gen'
        assert var2.lookup() == 'var2-gen'

        var1.set('var1-nested-gen')
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]
        yield

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]
        assert var1.lookup() == 'var1-nested-gen'
        assert var2.lookup() == 'var2-gen-mod'

        yield

    # EC = [outer_LC()]

    g = gen()  # gen_LC is created for the generator object `g`
    list(g)

    # EC = [outer_LC()]
|
|
|
|
The snippet above shows the state of the execution context stack
|
|
throughout the generator lifespan.
|
|
|
|
|
|
contextlib.contextmanager
|
|
-------------------------
|
|
|
|
Earlier, we've used the following example::
|
|
|
|
    import decimal

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_prec')

    # ...

    def fractions(precision, x, y):
        decimal_prec.set(precision)

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)
|
|
|
|
Let's extend it by adding a context manager::
|
|
|
|
    @contextlib.contextmanager
    def precision_context(prec):
        old_prec = decimal_prec.lookup()

        try:
            decimal_prec.set(prec)
            yield
        finally:
            decimal_prec.set(old_prec)
|
|
|
|
Unfortunately, this would not work straight away, as the modification
|
|
to the ``decimal_prec`` variable is contained to the
|
|
``precision_context()`` generator, and therefore will not be visible
|
|
inside the ``with`` block::
|
|
|
|
    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC becomes [{}, {}, {decimal_prec: precision}] in the
            # *precision_context()* generator,
            # but here the EC is still [{}, {}]

            # raises ValueError('could not find decimal precision')!
            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)
|
|
|
|
The way to fix this is to set the generator's ``__logical_context__``
|
|
attribute to ``None``. This will cause the generator to avoid
|
|
modifying the execution context stack.
|
|
|
|
We modify the ``contextlib.contextmanager()`` decorator to
|
|
set ``genobj.__logical_context__`` to ``None`` to produce
|
|
well-behaved context managers::
|
|
|
|
    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC = [{}, {decimal_prec: precision}]

            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)

        # EC becomes [{}, {decimal_prec: None}]
|
|
|
|
|
|
asyncio
|
|
-------
|
|
|
|
``asyncio`` uses ``Loop.call_soon``, ``Loop.call_later``,
|
|
and ``Loop.call_at`` to schedule the asynchronous execution of a
|
|
function. ``asyncio.Task`` uses ``call_soon()`` to further the
|
|
execution of the wrapped coroutine.
|
|
|
|
We modify ``Loop.call_{at,later,soon}`` to accept the new
|
|
optional *execution_context* keyword argument, which defaults to
|
|
the copy of the current execution context::
|
|
|
|
    def call_soon(self, callback, *args, execution_context=None):
        if execution_context is None:
            execution_context = sys.get_execution_context()

        # ... some time later

        sys.run_with_execution_context(
            execution_context, callback, *args)
|
|
|
|
The ``sys.get_execution_context()`` function returns a shallow copy
|
|
of the current execution context. By shallow copy here we mean such
|
|
a new execution context that:
|
|
|
|
* lookups in the copy provide the same results as in the original
|
|
execution context, and
|
|
* any changes in the original execution context do not affect the
|
|
copy, and
|
|
* any changes to the copy do not affect the original execution
|
|
context.
|
|
|
|
Either of the following satisfies the copy requirements:
|
|
|
|
* a new stack with shallow copies of logical contexts;
|
|
* a new stack with one squashed logical context.
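
As a rough check that both strategies meet the three requirements, the
sketch below models an execution context as a list of dicts (bottom first,
top last). The helper names ``lookup``, ``copy_per_lc`` and
``copy_squashed`` are hypothetical and exist only for this illustration.

```python
def lookup(ec, key):
    # Traverse the stack top-to-bottom until the key is found.
    for lc in reversed(ec):
        if key in lc:
            return lc[key]
    return None


def copy_per_lc(ec):
    # A new stack with shallow copies of the logical contexts.
    return [dict(lc) for lc in ec]


def copy_squashed(ec):
    # A new stack with one squashed logical context. Merging bottom to
    # top lets values from upper LCs win, matching lookup() order.
    squashed = {}
    for lc in ec:
        squashed.update(lc)
    return [squashed]


results = {}
for make_copy in (copy_per_lc, copy_squashed):
    original = [{'a': 1, 'b': 2}, {}, {'a': 3}]
    copy = make_copy(original)

    # Requirement 1: lookups agree between the copy and the original.
    same_lookups = all(
        lookup(copy, key) == lookup(original, key) for key in ('a', 'b', 'c'))

    # Requirement 2: changes to the original do not affect the copy.
    original[-1]['a'] = 'changed in original'
    # Requirement 3: changes to the copy do not affect the original.
    copy[-1]['b'] = 'changed in copy'

    results[make_copy.__name__] = (
        same_lookups, lookup(copy, 'a'), lookup(original, 'b'))
```

Both strategies produce the same observable behaviour; the squashed form
trades a merge at copy time for shorter lookups afterwards.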
|
|
|
|
The ``sys.run_with_execution_context(ec, func, *args, **kwargs)``
|
|
function runs ``func(*args, **kwargs)`` with *ec* as the execution
|
|
context. The function performs the following steps:
|
|
|
|
1. Set *ec* as the current execution context stack in the current
|
|
thread.
|
|
2. Push an empty logical context onto the stack.
|
|
3. Run ``func(*args, **kwargs)``.
|
|
4. Pop the logical context from the stack.
|
|
5. Restore the original execution context stack.
|
|
6. Return or raise the ``func()`` result.
|
|
|
|
These steps ensure that *ec* cannot be modified by *func*,
|
|
which makes ``run_with_execution_context()`` idempotent.
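
The six steps can be sketched with a toy model in which the execution
context is a list of dicts. Everything below is an assumption made for
illustration: the module-level ``_current_ec`` stands in for the per-thread
state, and ``lookup``/``set_var`` replace the context variable methods.

```python
_current_ec = [{}]  # toy model: a stack of dict-based logical contexts


def lookup(key):
    for lc in reversed(_current_ec):
        if key in lc:
            return lc[key]
    return None


def set_var(key, value):
    _current_ec[-1][key] = value


def get_execution_context():
    # Shallow copy: a new stack holding copies of each logical context.
    return [dict(lc) for lc in _current_ec]


def run_with_execution_context(ec, func, *args, **kwargs):
    global _current_ec
    saved = _current_ec
    # Steps 1-2: make ec current, with a fresh empty LC on top. Building
    # a new list leaves the caller's ec object itself untouched.
    _current_ec = list(ec) + [{}]
    try:
        # Steps 3 and 6: run func and return or raise its result.
        return func(*args, **kwargs)
    finally:
        # Steps 4-5: drop the temporary stack and restore the original.
        _current_ec = saved


def callback():
    seen = lookup('request')
    set_var('request', 'overwritten')  # lands in the throwaway top LC
    return seen


set_var('request', 'r1')
snapshot = get_execution_context()
set_var('request', 'r2')               # a later change in the caller

first = run_with_execution_context(snapshot, callback)
second = run_with_execution_context(snapshot, callback)
current = lookup('request')
```

Because the callback's write goes into the temporary top logical context,
running it a second time with the same snapshot observes the same value.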
|
|
|
|
``asyncio.Task`` is modified as follows::
|
|
|
|
    class Task:
        def __init__(self, coro):
            ...
            # Get the current execution context snapshot.
            self._exec_context = sys.get_execution_context()

            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)

        def _step(self, exc=None):
            ...
            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)
            ...
|
|
|
|
|
|
Generators Transformed into Iterators
|
|
-------------------------------------
|
|
|
|
Any Python generator can be represented as an equivalent iterator.
|
|
Compilers like Cython rely on this axiom. With respect to the
|
|
execution context, such an iterator should behave the same way as the
|
|
generator it represents.
|
|
|
|
This means that there needs to be a Python API to create new logical
|
|
contexts and run code with a given logical context.
|
|
|
|
The ``sys.new_logical_context()`` function creates a new empty
|
|
logical context.
|
|
|
|
The ``sys.run_with_logical_context(lc, func, *args, **kwargs)``
|
|
function can be used to run functions in the specified logical context.
|
|
The *lc* can be modified as a result of the call.
|
|
|
|
The ``sys.run_with_logical_context()`` function performs the following
|
|
steps:
|
|
|
|
1. Push *lc* onto the current execution context stack.
|
|
2. Run ``func(*args, **kwargs)``.
|
|
3. Pop *lc* from the execution context stack.
|
|
4. Return or raise the ``func()`` result.
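
A runnable approximation of these four steps, again using a list of dicts
as a toy execution context, might look as follows. ``EC``, ``lookup``,
``set_var`` and the ``Countdown`` iterator are hypothetical names chosen
for this sketch; a plain dict stands in for ``sys.new_logical_context()``.

```python
EC = [{}]  # toy execution context: a stack of dict-based logical contexts


def lookup(key):
    for lc in reversed(EC):
        if key in lc:
            return lc[key]
    return None


def set_var(key, value):
    EC[-1][key] = value


def run_with_logical_context(lc, func, *args, **kwargs):
    EC.append(lc)                     # step 1: push lc
    try:
        return func(*args, **kwargs)  # steps 2 and 4: run, return or raise
    finally:
        EC.pop()                      # step 3: pop lc


class Countdown:
    """Iterator that keeps its own 'label' value between iterations."""

    def __init__(self, label, n):
        self.lc = {}  # stands in for sys.new_logical_context()
        self.n = n
        # The write lands in self.lc, so lc is modified by the call.
        run_with_logical_context(self.lc, set_var, 'label', label)

    def __iter__(self):
        return self

    def __next__(self):
        return run_with_logical_context(self.lc, self._next_impl)

    def _next_impl(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return (lookup('label'), self.n)


set_var('label', 'outer')
pairs = list(zip(Countdown('a', 2), Countdown('b', 2)))
outer_label = lookup('label')
```

Each iterator sees its own ``label`` even though the two are advanced in
an interleaved fashion, and the caller's value is left untouched.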
|
|
|
|
By using ``new_logical_context()`` and ``run_with_logical_context()``,
|
|
we can replicate the generator behaviour like this::
|
|
|
|
    class Generator:

        def __init__(self):
            self.logical_context = sys.new_logical_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...
|
|
|
|
Let's see how this pattern can be applied to a real generator::
|
|
|
|
    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_precision')

    def gen_series(n, precision):
        decimal_prec.set(precision)

        for i in range(1, n):
            yield MyDecimal(i) / MyDecimal(3)

    # gen_series is equivalent to the following iterator:

    class Series:

        def __init__(self, n, precision):
            # Create a new empty logical context on creation,
            # like the generators do.
            self.logical_context = sys.new_logical_context()

            # run_with_logical_context() pushes
            # self.logical_context onto the execution context stack,
            # runs self._init, and pops self.logical_context
            # from the stack.
            sys.run_with_logical_context(
                self.logical_context, self._init, n, precision)

        def _init(self, n, precision):
            self.i = 1
            self.n = n
            # Stored in self.logical_context, so it stays in effect
            # for every later __next__() call.
            decimal_prec.set(precision)

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            if self.i >= self.n:
                raise StopIteration
            result = MyDecimal(self.i) / MyDecimal(3)
            self.i += 1
            return result
|
|
|
|
For regular iterators such an approach to logical context management is
|
|
normally not necessary, and it is recommended to set and restore
|
|
context variables directly in ``__next__``::
|
|
|
|
    class Series:

        def __next__(self):
            old_prec = decimal_prec.lookup()

            try:
                decimal_prec.set(self.precision)
                ...
            finally:
                decimal_prec.set(old_prec)
|
|
|
|
|
|
Asynchronous Generators
|
|
-----------------------
|
|
|
|
The execution context semantics in asynchronous generators does not
|
|
differ from that of regular generators and coroutines.
|
|
|
|
|
|
Implementation
|
|
==============
|
|
|
|
Execution context is implemented as an immutable linked list of
|
|
logical contexts, where each logical context is an immutable weak key
|
|
mapping. A pointer to the currently active execution context is stored
|
|
in the OS thread state::
|
|
|
|
                      +-----------------+
                      |                 |     ec
                      |  PyThreadState  +-------------+
                      |                 |             |
                      +-----------------+             |
                                                      |
    ec_node             ec_node             ec_node   v
    +------+------+     +------+------+     +------+------+
    | NULL |  lc  |<----| prev |  lc  |<----| prev |  lc  |
    +------+--+---+     +------+--+---+     +------+--+---+
              |                   |                   |
    LC        v         LC        v         LC        v
    +-------------+     +-------------+     +-------------+
    | var1: obj1  |     |    EMPTY    |     | var1: obj4  |
    | var2: obj2  |     +-------------+     +-------------+
    | var3: obj3  |
    +-------------+
|
|
|
|
The choice of the immutable list of immutable mappings as a fundamental
|
|
data structure is motivated by the need to efficiently implement
|
|
``sys.get_execution_context()``, which is to be frequently used by
|
|
asynchronous tasks and callbacks. When the EC is immutable,
|
|
``get_execution_context()`` can simply copy the current execution
|
|
context *by reference*::
|
|
|
|
    def get_execution_context():
        return PyThreadState_Get().ec
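
To see why immutability makes a by-reference snapshot safe, consider the
following sketch. It is a simplification under stated assumptions: a
read-only ``MappingProxyType`` over a copied ``dict`` stands in for the
immutable mapping (so each write here costs O(N)), and ``ToyThreadState``,
``lc_set``, ``set_var`` and ``lookup`` are hypothetical helpers.

```python
from collections import namedtuple
from types import MappingProxyType

# One link of the chain: a pointer to the previous node plus an LC.
ec_node = namedtuple('ec_node', ['prev', 'lc'])


def lc_set(lc, key, value):
    # Copy-on-write: build a new read-only mapping, leave the old one as is.
    new = dict(lc)
    new[key] = value
    return MappingProxyType(new)


class ToyThreadState:
    def __init__(self):
        self.ec = ec_node(prev=None, lc=MappingProxyType({}))


tstate = ToyThreadState()


def get_execution_context():
    return tstate.ec  # "copy" by reference: no data is duplicated


def set_var(key, value):
    # Replace the top node instead of mutating it.
    top = tstate.ec
    tstate.ec = ec_node(prev=top.prev, lc=lc_set(top.lc, key, value))


def lookup(ec, key):
    node = ec
    while node is not None:
        if key in node.lc:
            return node.lc[key]
        node = node.prev
    return None


set_var('user', 'alice')
snapshot = get_execution_context()
set_var('user', 'bob')  # does not disturb the snapshot

try:
    snapshot.lc['user'] = 'mallory'
    mutation_rejected = False
except TypeError:
    mutation_rejected = True
```

The snapshot keeps pointing at the old node, so later writes in the thread
are invisible to it without any copying at snapshot time.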
|
|
|
|
Let's review all possible context modification scenarios:
|
|
|
|
* The ``ContextVar.set()`` method is called::
|
|
|
|
      def ContextVar_set(self, val):
          # See a more complete set() definition
          # in the `Context Variables` section.

          tstate = PyThreadState_Get()
          top_ec_node = tstate.ec
          top_lc = top_ec_node.lc
          new_top_lc = top_lc.set(self, val)
          tstate.ec = ec_node(
              prev=top_ec_node.prev,
              lc=new_top_lc)
|
|
|
|
* The ``sys.run_with_logical_context()`` is called, in which case
|
|
the passed logical context object is appended to the
|
|
execution context::
|
|
|
|
      def run_with_logical_context(lc, func, *args, **kwargs):
          tstate = PyThreadState_Get()

          old_top_ec_node = tstate.ec
          new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)

          try:
              tstate.ec = new_top_ec_node
              return func(*args, **kwargs)
          finally:
              tstate.ec = old_top_ec_node
|
|
|
|
* The ``sys.run_with_execution_context()`` is called, in which case
|
|
the current execution context is set to the passed execution context
|
|
with a new empty logical context appended to it::
|
|
|
|
      def run_with_execution_context(ec, func, *args, **kwargs):
          tstate = PyThreadState_Get()

          old_top_ec_node = tstate.ec
          new_lc = sys.new_logical_context()
          new_top_ec_node = ec_node(prev=ec, lc=new_lc)

          try:
              tstate.ec = new_top_ec_node
              return func(*args, **kwargs)
          finally:
              tstate.ec = old_top_ec_node
|
|
|
|
* Any of ``genobj.send()``, ``genobj.throw()``, or ``genobj.close()``
  is called on a ``genobj`` generator, in which case the logical
  context recorded in ``genobj`` is pushed onto the stack::
|
|
|
|
      PyGen_New(PyGenObject *gen):
          gen.__logical_context__ = sys.new_logical_context()

      gen_send(PyGenObject *gen, ...):
          tstate = PyThreadState_Get()

          if gen.__logical_context__ is not None:
              old_top_ec_node = tstate.ec
              new_top_ec_node = ec_node(
                  prev=old_top_ec_node,
                  lc=gen.__logical_context__)

              try:
                  tstate.ec = new_top_ec_node
                  return _gen_send_impl(gen, ...)
              finally:
                  gen.__logical_context__ = tstate.ec.lc
                  tstate.ec = old_top_ec_node
          else:
              return _gen_send_impl(gen, ...)
|
|
|
|
* Coroutines and asynchronous generators share the implementation
|
|
with generators, and the above changes apply to them as well.
|
|
|
|
In certain scenarios the EC may need to be squashed to limit the
|
|
size of the chain. For example, consider the following corner case::
|
|
|
|
    async def repeat(coro, delay):
        await coro()
        await asyncio.sleep(delay)
        loop.create_task(repeat(coro, delay))

    async def ping():
        print('ping')

    loop = asyncio.get_event_loop()
    loop.create_task(repeat(ping, 1))
    loop.run_forever()
|
|
|
|
In the above code, the EC chain will grow as long as ``repeat()`` is
|
|
called. Each new task will call ``sys.run_with_execution_context()``,
|
|
which will append a new logical context to the chain. To prevent
|
|
unbounded growth, ``sys.get_execution_context()`` checks if the chain
|
|
is longer than a predetermined maximum, and if it is, squashes the
|
|
chain into a single LC::
|
|
|
|
    def get_execution_context():
        tstate = PyThreadState_Get()

        if tstate.ec_len > EC_LEN_MAX:
            squashed_lc = sys.new_logical_context()

            # Walk from the top of the chain down; `node` is a local
            # cursor, so the ec_node() constructor is not shadowed.
            node = tstate.ec
            while node:
                # The LC.merge() method does not replace existing keys.
                squashed_lc = squashed_lc.merge(node.lc)
                node = node.prev

            return ec_node(prev=NULL, lc=squashed_lc)
        else:
            return tstate.ec
|
|
|
|
|
|
Logical Context
|
|
---------------
|
|
|
|
Logical context is an immutable weak key mapping which has the
|
|
following properties with respect to garbage collection:
|
|
|
|
* ``ContextVar`` objects are strongly-referenced only from the
|
|
application code, not from any of the Execution Context machinery
|
|
or values they point to. This means that there are no reference
|
|
cycles that could extend their lifespan longer than necessary, or
|
|
prevent their collection by the GC.
|
|
|
|
* Values put in the Execution Context are guaranteed to be kept
|
|
alive while there is a ``ContextVar`` key referencing them in
|
|
the thread.
|
|
|
|
* If a ``ContextVar`` is garbage collected, all of its values will
|
|
be removed from all contexts, allowing them to be GCed if needed.
|
|
|
|
* If a thread has ended its execution, its thread state will be
|
|
cleaned up along with its ``ExecutionContext``, cleaning
|
|
up all values bound to all context variables in the thread.
|
|
|
|
As discussed earlier, we need ``sys.get_execution_context()`` to be
|
|
consistently fast regardless of the size of the execution context, so
|
|
logical context is necessarily an immutable mapping.
|
|
|
|
Choosing ``dict`` for the underlying implementation is suboptimal,
|
|
because ``LC.set()`` will cause ``dict.copy()``, which is an O(N)
|
|
operation, where *N* is the number of items in the LC.
|
|
|
|
``get_execution_context()``, when squashing the EC, is an O(M)
|
|
operation, where *M* is the total number of context variable values
|
|
in the EC.
|
|
|
|
So, instead of ``dict``, we choose Hash Array Mapped Trie (HAMT)
|
|
as the underlying implementation of logical contexts. (Scala and
|
|
Clojure use HAMT to implement high performance immutable collections
|
|
[5]_, [6]_.)
|
|
|
|
With HAMT ``.set()`` becomes an O(log N) operation, and
|
|
``get_execution_context()`` squashing is more efficient on average due
|
|
to structural sharing in HAMT.
|
|
|
|
See `Appendix: HAMT Performance Analysis`_ for a more elaborate
|
|
analysis of HAMT performance compared to ``dict``.
|
|
|
|
|
|
Context Variables
|
|
-----------------
|
|
|
|
The ``ContextVar.lookup()`` and ``ContextVar.set()`` methods are
|
|
implemented as follows (in pseudo-code)::
|
|
|
|
    class ContextVar:

        def lookup(self):
            tstate = PyThreadState_Get()

            ec_node = tstate.ec
            while ec_node:
                if self in ec_node.lc:
                    return ec_node.lc[self]
                ec_node = ec_node.prev

            return None

        def set(self, value):
            tstate = PyThreadState_Get()
            top_ec_node = tstate.ec

            if top_ec_node is not None:
                top_lc = top_ec_node.lc
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=top_ec_node.prev,
                    lc=new_top_lc)
            else:
                top_lc = sys.new_logical_context()
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=NULL,
                    lc=new_top_lc)
|
|
|
|
For efficient access in performance-sensitive code paths, such as in
|
|
``numpy`` and ``decimal``, we add a cache to ``ContextVar.lookup()``,
|
|
making it an O(1) operation when the cache is hit. The cache key is
|
|
composed from the following:
|
|
|
|
* The new ``uint64_t PyThreadState->unique_id``, which is a globally
|
|
unique thread state identifier. It is computed from the new
|
|
``uint64_t PyInterpreterState->ts_counter``, which is incremented
|
|
whenever a new thread state is created.
|
|
|
|
* The ``uint64_t ContextVar->version`` counter, which is incremented
|
|
whenever the context variable value is changed in any logical context
|
|
in any thread.
|
|
|
|
The cache is then implemented as follows::
|
|
|
|
    class ContextVar:

        def set(self, value):
            ...  # implementation
            self.version += 1

        def lookup(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                    self.last_version == self.version):
                return self.last_value

            value = self._lookup_uncached()

            self.last_value = value  # borrowed ref
            self.last_tstate_id = tstate.unique_id
            self.last_version = self.version

            return value
|
|
|
|
Note that ``last_value`` is a borrowed reference. The assumption
|
|
is that if the version checks are fine, the object will be alive.
|
|
This allows the values of context variables to be properly garbage
|
|
collected.
|
|
|
|
This generic caching approach is similar to what the current C
|
|
implementation of ``decimal`` does to cache the current decimal
|
|
context, and has similar performance characteristics.
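
The cache-key logic can be exercised with a small pure-Python model. The
sketch below makes several simplifying assumptions: ``ToyThreadState`` and
``CachedVar`` are hypothetical classes, each thread state holds a single
flat dict instead of a chain of logical contexts, and the thread state is
passed in explicitly rather than obtained from the interpreter. The
``slow_lookups`` counter exists only to make cache hits observable.

```python
import itertools

_ts_counter = itertools.count(1)  # stands in for the interpreter's counter


class ToyThreadState:
    def __init__(self):
        self.unique_id = next(_ts_counter)
        self.values = {}  # flattened stand-in for the execution context


class CachedVar:
    def __init__(self, name):
        self.name = name
        self.version = 0
        self.last_tstate_id = None
        self.last_version = None
        self.last_value = None
        self.slow_lookups = 0

    def set(self, tstate, value):
        tstate.values[self] = value
        self.version += 1  # invalidates the cache for every thread

    def _lookup_uncached(self, tstate):
        self.slow_lookups += 1
        return tstate.values.get(self)

    def lookup(self, tstate):
        # Cache hit: same thread state and no set() since the last lookup.
        if (self.last_tstate_id == tstate.unique_id and
                self.last_version == self.version):
            return self.last_value

        value = self._lookup_uncached(tstate)
        self.last_value = value
        self.last_tstate_id = tstate.unique_id
        self.last_version = self.version
        return value


t1, t2 = ToyThreadState(), ToyThreadState()
prec = CachedVar('prec')

prec.set(t1, 28)
a = prec.lookup(t1)  # miss: first lookup after set()
b = prec.lookup(t1)  # hit: same thread state, same version
c = prec.lookup(t2)  # miss: different thread state, no value there
prec.set(t2, 6)      # bumps the version
d = prec.lookup(t1)  # miss: thread state and version both changed
```

Only the second lookup is served from the cache; the other three fall back
to the uncached path.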
|
|
|
|
|
|
Performance Considerations
==========================

Tests of the reference implementation based on the prior
revisions of this PEP have shown a 1-2% slowdown on generator
microbenchmarks and no noticeable difference in macrobenchmarks.

The performance of non-generator and non-async code is not
affected by this PEP.


Summary of the New APIs
=======================

Python
------

The following new Python APIs are introduced by this PEP:

1. The ``sys.new_context_var(name: str='...')`` function to create
   ``ContextVar`` objects.

2. The ``ContextVar`` object, which has:

   * the read-only ``.name`` attribute;
   * the ``.lookup()`` method which returns the value of the variable
     in the current execution context;
   * the ``.set()`` method which sets the value of the variable in
     the current execution context.

3. The ``sys.get_execution_context()`` function, which returns a
   copy of the current execution context.

4. The ``sys.new_execution_context()`` function, which returns a new
   empty execution context.

5. The ``sys.new_logical_context()`` function, which returns a new
   empty logical context.

6. The ``sys.run_with_execution_context(ec: ExecutionContext,
   func, *args, **kwargs)`` function, which runs *func* with the
   provided execution context.

7. The ``sys.run_with_logical_context(lc: LogicalContext,
   func, *args, **kwargs)`` function, which runs *func* with the
   provided logical context on top of the current execution context.


C API
-----

1. ``PyContextVar * PyContext_NewVar(char *desc)``: create a
   ``PyContextVar`` object.

2. ``PyObject * PyContext_LookupVar(PyContextVar *)``: return
   the value of the variable in the current execution context.

3. ``int PyContext_SetVar(PyContextVar *, PyObject *)``: set
   the value of the variable in the current execution context.

4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty
   ``PyLogicalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty
   ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: return the
   current execution context.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
   passed EC object as the current one for the active thread state.

8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *,
   PyLogicalContext *)``: allows implementing the
   ``sys.run_with_logical_context`` Python API.


Design Considerations
=====================

Should ``PyThreadState_GetDict()`` use the execution context?
-------------------------------------------------------------

No. ``PyThreadState_GetDict`` is based on TLS, and changing its
semantics will break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem, which
extends the context manager protocol with two new methods:
``__suspend__()`` and ``__resume__()``. Similarly, the asynchronous
context manager protocol is also extended with ``__asuspend__()`` and
``__aresume__()``.

This allows implementing context managers that manage non-local state,
which behave correctly in generators and coroutines.

For example, consider the following context manager, which uses
execution state::

    class Context:

        def __init__(self):
            self.var = sys.new_context_var('var')

        def __enter__(self):
            self.old_x = self.var.lookup()
            self.var.set('something')

        def __exit__(self, *err):
            self.var.set(self.old_x)

An equivalent implementation with PEP 521::

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

The downside of this approach is the addition of significant new
complexity to the context manager protocol and the interpreter
implementation. This approach is also likely to negatively impact
the performance of generators and coroutines.

Additionally, the solution in :pep:`521` is limited to context managers,
and does not provide any mechanism to propagate state in asynchronous
tasks and callbacks.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

No. Proper generator behaviour with respect to the execution context
requires changes to the interpreter.


Should we update sys.displayhook and other APIs to use EC?
----------------------------------------------------------

APIs such as redirecting stdout by overwriting ``sys.stdout``, or
installing a new display hook by overwriting the ``sys.displayhook``
function, affect the whole Python process **by design**. Their users
assume that the effect of changing them will be visible across OS
threads. Therefore, we cannot simply make these APIs use the new
Execution Context.

That said, we think it is possible to design new APIs that will
be context aware, but that is outside of the scope of this PEP.


Greenlets
---------

Greenlet is an alternative implementation of cooperative
scheduling for Python. Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

Conceptually, the behaviour of greenlets is very similar to that of
generators, which means that similar changes around greenlet entry
and exit can be done to add support for execution context.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Appendix: HAMT Performance Analysis
===================================

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1. Benchmark code can be found here: [9]_.

The above chart demonstrates that:

* HAMT displays near O(1) performance for all benchmarked
  dictionary sizes.

* ``dict.copy()`` becomes very slow around 100 items.

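The growth in ``dict.copy()`` cost can be observed informally with the
standard ``timeit`` module. The sketch below is not the benchmark
referenced in [9]_; ``time_dict_copy`` is a hypothetical helper, the
sizes and repetition count are arbitrary, and absolute numbers will
vary by machine and Python version.

```python
import timeit


def time_dict_copy(sizes=(10, 100, 1000), number=2000):
    """Return {size: average seconds per dict.copy() call}."""
    results = {}
    for size in sizes:
        data = {str(i): i for i in range(size)}
        # timeit accepts a callable; data.copy is called `number` times.
        total = timeit.timeit(data.copy, number=number)
        results[size] = total / number
    return results


if __name__ == "__main__":
    for size, seconds in time_dict_copy().items():
        print(f"{size:>5} items: {seconds * 1e9:8.0f} ns per copy")
```

Because every copy duplicates all entries, the per-call time is
expected to grow with the number of items, which is the cost an
immutable mapping with structural sharing is meant to avoid.
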
.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2. Benchmark code can be found here: [10]_.

Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
immutable mapping. HAMT lookup time is 30-40% slower than Python dict
lookups on average, which is a very good result, considering that the
latter is very well optimized.

There is research [8]_ showing that there are further possible
improvements to the performance of HAMT.

The reference implementation of HAMT for CPython can be found here:
[7]_.


Acknowledgments
===============

Thanks to Victor Petrovykh for countless discussions around the topic
and PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ``ContextVar`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
for coming up with the idea of having a stack of contexts in the
thread state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.


Version History
===============

1. Initial revision, posted on 11-Aug-2017 [20]_.

2. V2 posted on 15-Aug-2017 [21]_.

   The fundamental limitation that caused a complete redesign of the
   first version was that it was not possible to implement an iterator
   that would interact with the EC in the same way as generators
   (see [19]_.)

   Version 2 was a complete rewrite, introducing new terminology
   (Local Context, Execution Context, Context Item) and new APIs.

3. V3 posted on 18-Aug-2017 [22]_.

   Updates:

   * Local Context was renamed to Logical Context. The term "local"
     was ambiguous and conflicted with local name scopes.

   * Context Item was renamed to Context Key, see the thread with Nick
     Coghlan, Stefan Krah, and Yury Selivanov [23]_ for details.

   * Context Item get cache design was adjusted, per Nathaniel Smith's
     idea in [25]_.

   * Coroutines are created without a Logical Context; ceval loop
     no longer needs to special case the ``await`` expression
     (proposed by Nick Coghlan in [24]_.)

4. V4 posted on 25-Aug-2017: the current version.

   * The specification section has been completely rewritten.

   * Context Key renamed to Context Var.

   * Removed the distinction between generators and coroutines with
     respect to logical context isolation.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html

.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst

.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst

.. [22] https://github.com/python/peps/blob/287ed87bb475a7da657f950b353c71c1248f67e7/pep-0550.rst

.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html

.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html

.. [25] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: