1176 lines
36 KiB
ReStructuredText
1176 lines
36 KiB
ReStructuredText
PEP: 550
|
|
Title: Execution Context
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Yury Selivanov <yury@magic.io>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 11-Aug-2017
|
|
Python-Version: 3.7
|
|
Post-History: 11-Aug-2017, 15-Aug-2017
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes a new mechanism to manage execution state--the
|
|
logical environment in which a function, a thread, a generator,
|
|
or a coroutine executes in.
|
|
|
|
A few examples of where having a reliable state storage is required:
|
|
|
|
* Context managers like decimal contexts, ``numpy.errstate``,
|
|
and ``warnings.catch_warnings``;
|
|
|
|
* Storing request-related data such as security tokens and request
|
|
data in web applications, implementing i18n;
|
|
|
|
* Profiling, tracing, and logging in complex and large code bases.
|
|
|
|
The usual solution for storing state is to use a Thread-local Storage
|
|
(TLS), implemented in the standard library as ``threading.local()``.
|
|
Unfortunately, TLS does not work for the purpose of state isolation
|
|
for generators or asynchronous code, because such code executes
|
|
concurrently in a single thread.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Traditionally, a Thread-local Storage (TLS) is used for storing the
|
|
state. However, the major flaw of using the TLS is that it works only
|
|
for multi-threaded code. It is not possible to reliably contain the
|
|
state within a generator or a coroutine. For example, consider
|
|
the following generator::
|
|
|
|
def calculate(precision, ...):
|
|
with decimal.localcontext() as ctx:
|
|
# Set the precision for decimal calculations
|
|
# inside this block
|
|
ctx.prec = precision
|
|
|
|
yield calculate_something()
|
|
yield calculate_something_else()
|
|
|
|
Decimal context is using a TLS to store the state, and because TLS is
|
|
not aware of generators, the state can leak. If a user iterates over
|
|
the ``calculate()`` generator with different precisions one by one
|
|
using a ``zip()`` built-in, the above code will not work correctly.
|
|
For example::
|
|
|
|
g1 = calculate(precision=100)
|
|
g2 = calculate(precision=50)
|
|
|
|
items = list(zip(g1, g2))
|
|
|
|
# items[0] will be a tuple of:
|
|
# first value from g1 calculated with 100 precision,
|
|
# first value from g2 calculated with 50 precision.
|
|
#
|
|
# items[1] will be a tuple of:
|
|
# second value from g1 calculated with 50 precision (!!!),
|
|
# second value from g2 calculated with 50 precision.
|
|
|
|
An even scarier example would be using decimals to represent money
|
|
in an async/await application: decimal calculations can suddenly
|
|
lose precision in the middle of processing a request. Currently,
|
|
bugs like this are extremely hard to find and fix.
|
|
|
|
Another common need for web applications is to have access to the
|
|
current request object, or security context, or, simply, the request
|
|
URL for logging or submitting performance tracing data::
|
|
|
|
async def handle_http_request(request):
|
|
context.current_http_request = request
|
|
|
|
await ...
|
|
# Invoke your framework code, render templates,
|
|
# make DB queries, etc, and use the global
|
|
# 'current_http_request' in that code.
|
|
|
|
# This isn't currently possible to do reliably
|
|
# in asyncio out of the box.
|
|
|
|
These examples are just a few out of many, where a reliable way to
|
|
store context data is absolutely needed.
|
|
|
|
The inability to use TLS for asynchronous code has lead to
|
|
proliferation of ad-hoc solutions, which are limited in scope and
|
|
do not support all required use cases.
|
|
|
|
Current status quo is that any library, including the standard
|
|
library, that uses a TLS, will likely not work as expected in
|
|
asynchronous code or with generators (see [3]_ as an example issue.)
|
|
|
|
Some languages that have coroutines or generators recommend to
|
|
manually pass a ``context`` object to every function, see [1]_
|
|
describing the pattern for Go. This approach, however, has limited
|
|
use for Python, where we have a huge ecosystem that was built to work
|
|
with a TLS-like context. Moreover, passing the context explicitly
|
|
does not work at all for libraries like ``decimal`` or ``numpy``,
|
|
which use operator overloading.
|
|
|
|
.NET runtime, which has support for async/await, has a generic
|
|
solution of this problem, called ``ExecutionContext`` (see [2]_).
|
|
On the surface, working with it is very similar to working with a TLS,
|
|
but the former explicitly supports asynchronous code.
|
|
|
|
|
|
Goals
|
|
=====
|
|
|
|
The goal of this PEP is to provide a more reliable alternative to
|
|
``threading.local()``. It should be explicitly designed to work with
|
|
Python execution model, equally supporting threads, generators, and
|
|
coroutines.
|
|
|
|
An acceptable solution for Python should meet the following
|
|
requirements:
|
|
|
|
* Transparent support for code executing in threads, coroutines,
|
|
and generators with an easy to use API.
|
|
|
|
* Negligible impact on the performance of the existing code or the
|
|
code that will be using the new mechanism.
|
|
|
|
* Fast C API for packages like ``decimal`` and ``numpy``.
|
|
|
|
Explicit is still better than implicit, hence the new APIs should only
|
|
be used when there is no acceptable way of passing the state
|
|
explicitly.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Execution Context is a mechanism of storing and accessing data specific
|
|
to a logical thread of execution. We consider OS threads,
|
|
generators, and chains of coroutines (such as ``asyncio.Task``)
|
|
to be variants of a logical thread.
|
|
|
|
In this specification, we will use the following terminology:
|
|
|
|
* **Local Context**, or LC, is a key/value mapping that stores the
|
|
context of a logical thread.
|
|
|
|
* **Execution Context**, or EC, is an OS-thread-specific dynamic
|
|
stack of Local Contexts.
|
|
|
|
* **Context Item**, or CI, is an object used to set and get values
|
|
from the Execution Context.
|
|
|
|
Please note that throughout the specification we use simple
|
|
pseudo-code to illustrate how the EC machinery works. The actual
|
|
algorithms and data structures that we will use to implement the PEP
|
|
are discussed in the `Implementation Strategy`_ section.
|
|
|
|
|
|
Context Item Object
|
|
-------------------
|
|
|
|
The ``sys.new_context_item(description)`` function creates a
|
|
new ``ContextItem`` object. The ``description`` parameter is a
|
|
``str``, explaining the nature of the context key for introspection
|
|
and debugging purposes.
|
|
|
|
``ContextItem`` objects have the following methods and attributes:
|
|
|
|
* ``.description``: read-only description;
|
|
|
|
* ``.set(o)`` method: set the value to ``o`` for the context item
|
|
in the execution context.
|
|
|
|
* ``.get()`` method: return the current EC value for the context item.
|
|
Context items are initialized with ``None`` when created, so
|
|
this method call never fails.
|
|
|
|
The below is an example of how context items can be used::
|
|
|
|
my_context = sys.new_context_item(description='mylib.context')
|
|
my_context.set('spam')
|
|
|
|
# Later, to access the value of my_context:
|
|
print(my_context.get())
|
|
|
|
|
|
Thread State and Multi-threaded code
|
|
------------------------------------
|
|
|
|
Execution Context is implemented on top of Thread-local Storage.
|
|
For every thread there is a separate stack of Local Contexts --
|
|
mappings of ``ContextItem`` objects to their values in the LC.
|
|
New threads always start with an empty EC.
|
|
|
|
For CPython::
|
|
|
|
PyThreadState:
|
|
execution_context: ExecutionContext([
|
|
LocalContext({ci1: val1, ci2: val2, ...}),
|
|
...
|
|
])
|
|
|
|
The ``ContextItem.get()`` and ``.set()`` methods are defined as
|
|
follows (in pseudo-code)::
|
|
|
|
class ContextItem:
|
|
|
|
def get(self):
|
|
tstate = PyThreadState_Get()
|
|
|
|
for local_context in reversed(tstate.execution_context):
|
|
if self in local_context:
|
|
return local_context[self]
|
|
|
|
return None
|
|
|
|
def set(self, value):
|
|
tstate = PyThreadState_Get()
|
|
|
|
if not tstate.execution_context:
|
|
tstate.execution_context = [LocalContext()]
|
|
|
|
tstate.execution_context[-1][self] = value
|
|
|
|
With the semantics defined so far, the Execution Context can already
|
|
be used as an alternative to ``threading.local()``::
|
|
|
|
def print_foo():
|
|
print(ci.get() or 'nothing')
|
|
|
|
ci = sys.new_context_item(description='test')
|
|
ci.set('foo')
|
|
|
|
# Will print "foo":
|
|
print_foo()
|
|
|
|
# Will print "nothing":
|
|
threading.Thread(target=print_foo).start()
|
|
|
|
|
|
Manual Context Management
|
|
-------------------------
|
|
|
|
Execution Context is generally managed by the Python interpreter,
|
|
but sometimes it is desirable for the user to take the control
|
|
over it. A few examples when this is needed:
|
|
|
|
* running a computation in ``concurrent.futures.ThreadPoolExecutor``
|
|
with the current EC;
|
|
|
|
* reimplementing generators with iterators (more on that later);
|
|
|
|
* managing contexts in asynchronous frameworks (implement proper
|
|
EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)
|
|
|
|
For these purposes we add a set of new APIs (they will be used in
|
|
later sections of this specification):
|
|
|
|
* ``sys.new_local_context()``: create an empty ``LocalContext``
|
|
object.
|
|
|
|
* ``sys.new_execution_context()``: create an empty
|
|
``ExecutionContext`` object.
|
|
|
|
* Both ``LocalContext`` and ``ExecutionContext`` objects are opaque
|
|
to Python code, and there are no APIs to modify them.
|
|
|
|
* ``sys.get_execution_context()`` function. The function returns a
|
|
copy of the current EC: an ``ExecutionContext`` instance.
|
|
|
|
The runtime complexity of the actual implementation of this function
|
|
can be O(1), but for the purposes of this section it is equivalent
|
|
to::
|
|
|
|
def get_execution_context():
|
|
tstate = PyThreadState_Get()
|
|
return copy(tstate.execution_context)
|
|
|
|
* ``sys.run_with_execution_context(ec: ExecutionContext, func, *args,
|
|
**kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution
|
|
context::
|
|
|
|
def run_with_execution_context(ec, func, *args, **kwargs):
|
|
tstate = PyThreadState_Get()
|
|
|
|
old_ec = tstate.execution_context
|
|
|
|
tstate.execution_context = ExecutionContext(
|
|
ec.local_contexts + [LocalContext()]
|
|
)
|
|
|
|
try:
|
|
return func(*args, **kwargs)
|
|
finally:
|
|
tstate.execution_context = old_ec
|
|
|
|
Any changes to Local Context by ``func`` will be ignored.
|
|
This allows to reuse one ``ExecutionContext`` object for multiple
|
|
invocations of different functions, without them being able to
|
|
affect each other's environment::
|
|
|
|
ci = sys.new_context_item('example')
|
|
ci.set('spam')
|
|
|
|
def func():
|
|
print(ci.get())
|
|
ci.set('ham')
|
|
|
|
ec = sys.get_execution_context()
|
|
|
|
sys.run_with_execution_context(ec, func)
|
|
sys.run_with_execution_context(ec, func)
|
|
|
|
# Will print:
|
|
# spam
|
|
# spam
|
|
|
|
* ``sys.run_with_local_context(lc: LocalContext, func, *args,
|
|
**kwargs)`` runs ``func(*args, **kwargs)`` in the current execution
|
|
context using the specified local context.
|
|
|
|
Any changes that ``func`` does to the local context will be
|
|
persisted in ``lc``. This behaviour is different from the
|
|
``run_with_execution_context()`` function, which always creates
|
|
a new throw-away local context.
|
|
|
|
In pseudo-code::
|
|
|
|
def run_with_local_context(lc, func, *args, **kwargs):
|
|
tstate = PyThreadState_Get()
|
|
|
|
old_ec = tstate.execution_context
|
|
|
|
tstate.execution_context = ExecutionContext(
|
|
old_ec.local_contexts + [lc]
|
|
)
|
|
|
|
try:
|
|
return func(*args, **kwargs)
|
|
finally:
|
|
tstate.execution_context = old_ec
|
|
|
|
Using the previous example::
|
|
|
|
ci = sys.new_context_item('example')
|
|
ci.set('spam')
|
|
|
|
def func():
|
|
print(ci.get())
|
|
ci.set('ham')
|
|
|
|
ec = sys.get_execution_context()
|
|
lc = sys.new_local_context()
|
|
|
|
sys.run_with_local_context(lc, func)
|
|
sys.run_with_local_context(lc, func)
|
|
|
|
# Will print:
|
|
# spam
|
|
# ham
|
|
|
|
As an example, let's make a subclass of
|
|
``concurrent.futures.ThreadPoolExecutor`` that preserves the execution
|
|
context for scheduled functions::
|
|
|
|
class Executor(concurrent.futures.ThreadPoolExecutor):
|
|
|
|
def submit(self, fn, *args, **kwargs):
|
|
context = sys.get_execution_context()
|
|
|
|
fn = functools.partial(
|
|
sys.run_with_execution_context, context,
|
|
fn, *args, **kwargs)
|
|
|
|
return super().submit(fn)
|
|
|
|
|
|
EC Semantics for Coroutines
|
|
---------------------------
|
|
|
|
Python :pep:`492` coroutines are used to implement cooperative
|
|
multitasking. For a Python end-user they are similar to threads,
|
|
especially when it comes to sharing resources or modifying
|
|
the global state.
|
|
|
|
An event loop is needed to schedule coroutines. Coroutines that
|
|
are explicitly scheduled by the user are usually called Tasks.
|
|
When a coroutine is scheduled, it can schedule other coroutines using
|
|
an ``await`` expression. In async/await world, awaiting a coroutine
|
|
is equivalent to a regular function call in synchronous code. Thus,
|
|
Tasks are similar to threads.
|
|
|
|
By drawing a parallel between regular multithreaded code and
|
|
async/await, it becomes apparent that any modification of the
|
|
execution context within one Task should be visible to all coroutines
|
|
scheduled within it. Any execution context modifications, however,
|
|
must not be visible to other Tasks executing within the same OS
|
|
thread.
|
|
|
|
|
|
Coroutine Object Modifications
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
To achieve this, a small set of modifications to the coroutine object
|
|
is needed:
|
|
|
|
* New ``cr_local_context`` attribute. This attribute is readable
|
|
and writable for Python code.
|
|
|
|
* When a coroutine object is instantiated, its ``cr_local_context``
|
|
is initialized with an empty Local Context.
|
|
|
|
* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
|
|
follows (in pseudo-C)::
|
|
|
|
if coro.cr_local_context is not None:
|
|
tstate = PyThreadState_Get()
|
|
|
|
tstate.execution_context.push(coro.cr_local_context)
|
|
|
|
try:
|
|
# Perform the actual `Coroutine.send()` or
|
|
# `Coroutine.throw()` call.
|
|
return coro.send(...)
|
|
finally:
|
|
coro.cr_local_context = tstate.execution_context.pop()
|
|
else:
|
|
# Perform the actual `Coroutine.send()` or
|
|
# `Coroutine.throw()` call.
|
|
return coro.send(...)
|
|
|
|
* When Python interpreter sees an ``await`` instruction, it inspects
|
|
the ``cr_local_context`` attribute of the coroutine that is about
|
|
to be awaited. For ``await coro``:
|
|
|
|
* If ``coro.cr_local_context`` is an empty ``LocalContext`` object
|
|
that ``coro`` was created with, the interpreter will set
|
|
``coro.cr_local_context`` to ``None``.
|
|
|
|
* If ``coro.cr_local_context`` was modified by Python code, the
|
|
interpreter will leave it as is.
|
|
|
|
This makes any changes to execution context made by nested coroutine
|
|
calls within a Task to be visible throughout the Task::
|
|
|
|
ci = sys.new_context_item('example')
|
|
|
|
async def nested():
|
|
ci.set('nested')
|
|
|
|
async def main():
|
|
ci.set('main')
|
|
print('before:', ci.get())
|
|
await nested()
|
|
print('after:', ci.get())
|
|
|
|
# Will print:
|
|
# before: main
|
|
# after: nested
|
|
|
|
Essentially, coroutines work with Execution Context items similarly
|
|
to threads, and ``await`` expression acts like a function call.
|
|
|
|
This mechanism also works for ``yield from`` in generators decorated
|
|
with ``@types.coroutine`` or ``@asyncio.coroutine``, which are
|
|
called "generator-based coroutines" according to :pep:`492`,
|
|
and should be fully compatible with native async/await coroutines.
|
|
|
|
|
|
Tasks
|
|
^^^^^
|
|
|
|
In asynchronous frameworks like asyncio, coroutines are run by
|
|
an event loop, and need to be explicitly scheduled (in asyncio
|
|
coroutines are run by ``asyncio.Task``.)
|
|
|
|
With the currently defined semantics, the interpreter makes
|
|
coroutines linked by an ``await`` expression share the same
|
|
Local Context.
|
|
|
|
The interpreter, however, is not aware of the Task concept, and
|
|
cannot help with ensuring that new Tasks started in coroutines,
|
|
use the correct EC::
|
|
|
|
current_request = sys.new_context_item(description='request')
|
|
|
|
async def child():
|
|
print('current request:', repr(current_request.get()))
|
|
|
|
async def handle_request(request):
|
|
current_request.set(request)
|
|
event_loop.create_task(child)
|
|
|
|
run(top_coro())
|
|
|
|
# Will print:
|
|
# current_request: None
|
|
|
|
To enable correct Execution Context propagation into Tasks, the
|
|
asynchronous framework needs to assist the interpreter:
|
|
|
|
* When ``create_task`` is called, it should capture the current
|
|
execution context with ``sys.get_execution_context()`` and save it
|
|
on the Task object.
|
|
|
|
* When the Task object runs its coroutine object, it should execute
|
|
``.send()`` and ``.throw()`` methods within the captured
|
|
execution context, using the ``sys.run_with_execution_context()``
|
|
function.
|
|
|
|
With help from the asynchronous framework, the above snippet will
|
|
run correctly, and the ``child()`` coroutine will be able to access
|
|
the current request object through the ``current_request``
|
|
Context Item.
|
|
|
|
|
|
Event Loop Callbacks
|
|
^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Similarly to Tasks, functions like asyncio's ``loop.call_soon()``
|
|
should capture the current execution context with
|
|
``sys.get_execution_context()`` and execute callbacks
|
|
within it with ``sys.run_with_execution_context()``.
|
|
|
|
This way the following code will work::
|
|
|
|
current_request = sys.new_context_item(description='request')
|
|
|
|
def log():
|
|
request = current_request.get()
|
|
print(request)
|
|
|
|
async def request_handler(request):
|
|
current_request.set(request)
|
|
get_event_loop.call_soon(log)
|
|
|
|
|
|
Generators
|
|
----------
|
|
|
|
Generators in Python, while similar to Coroutines, are used in a
|
|
fundamentally different way. They are producers of data, and
|
|
they use ``yield`` expression to suspend/resume their execution.
|
|
|
|
A crucial difference between ``await coro`` and ``yield value`` is
|
|
that the former expression guarantees that the ``coro`` will be
|
|
executed fully, while the latter is producing ``value`` and
|
|
suspending the generator until it gets iterated again.
|
|
|
|
Generators, similarly to coroutines, have a ``gi_local_context``
|
|
attribute, which is set to an empty Local Context when created.
|
|
|
|
Contrary to coroutines though, ``yield from o`` expression in
|
|
generators (that are not generator-based coroutines) is semantically
|
|
equivalent to ``for v in o: yield v``, therefore the interpreter does
|
|
not attempt to control their ``gi_local_context``.
|
|
|
|
|
|
EC Semantics for Generators
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Every generator object has its own Local Context that stores
|
|
only its own local modifications of the context. When a generator
|
|
is being iterated, its local context will be put in the EC stack
|
|
of the current thread. This means that the generator will be able
|
|
to see access items from the surrounding context::
|
|
|
|
local = sys.new_context_item("local")
|
|
global = sys.new_context_item("global")
|
|
|
|
def generator():
|
|
local.set('inside gen:')
|
|
while True:
|
|
print(local.get(), global.get())
|
|
yield
|
|
|
|
g = gen()
|
|
|
|
local.set('hello')
|
|
global.set('spam')
|
|
next(g)
|
|
|
|
local.set('world')
|
|
global.set('ham')
|
|
next(g)
|
|
|
|
# Will print:
|
|
# inside gen: spam
|
|
# inside gen: ham
|
|
|
|
Any changes to the EC in nested generators are invisible to the outer
|
|
generator::
|
|
|
|
local = sys.new_context_item("local")
|
|
|
|
def inner_gen():
|
|
local.set('spam')
|
|
yield
|
|
|
|
def outer_gen():
|
|
local.set('ham')
|
|
yield from gen()
|
|
print(local.get())
|
|
|
|
list(outer_gen())
|
|
|
|
# Will print:
|
|
# ham
|
|
|
|
|
|
Running generators without LC
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Similarly to coroutines, generators with ``gi_local_context``
|
|
set to ``None`` simply use the outer Local Context.
|
|
|
|
The ``@contextlib.contextmanager`` decorator uses this mechanism to
|
|
allow its generator to affect the EC::
|
|
|
|
item = sys.new_context_item('test')
|
|
|
|
@contextmanager
|
|
def context(x):
|
|
old = item.get()
|
|
item.set('x')
|
|
try:
|
|
yield
|
|
finally:
|
|
item.set(old)
|
|
|
|
with context('spam'):
|
|
|
|
with context('ham'):
|
|
print(1, item.get())
|
|
|
|
print(2, item.get())
|
|
|
|
# Will print:
|
|
# 1 ham
|
|
# 2 spam
|
|
|
|
|
|
Implementing Generators with Iterators
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The Execution Context API allows to fully replicate EC behaviour
|
|
imposed on generators with a regular Python iterator class::
|
|
|
|
class Gen:
|
|
|
|
def __init__(self):
|
|
self.local_context = sys.new_local_context()
|
|
|
|
def __iter__(self):
|
|
return self
|
|
|
|
def __next__(self):
|
|
return sys.run_with_local_context(
|
|
self.local_context, self._next_impl)
|
|
|
|
def _next_impl(self):
|
|
# Actual __next__ implementation.
|
|
...
|
|
|
|
|
|
Asynchronous Generators
|
|
-----------------------
|
|
|
|
Asynchronous Generators (AG) interact with the Execution Context
|
|
similarly to regular generators.
|
|
|
|
They have an ``ag_local_context`` attribute, which, similarly to
|
|
regular generators, can be set to ``None`` to make them use the outer
|
|
Local Context. This is used by the new
|
|
``contextlib.asynccontextmanager`` decorator.
|
|
|
|
The EC support of ``await`` expression is implemented using the same
|
|
approach as in coroutines, see the `Coroutine Object Modifications`_
|
|
section.
|
|
|
|
|
|
Greenlets
|
|
---------
|
|
|
|
Greenlet is an alternative implementation of cooperative
|
|
scheduling for Python. Although greenlet package is not part of
|
|
CPython, popular frameworks like gevent rely on it, and it is
|
|
important that greenlet can be modified to support execution
|
|
contexts.
|
|
|
|
In a nutshell, greenlet design is very similar to design of
|
|
generators. The main difference is that for generators, the stack
|
|
is managed by the Python interpreter. Greenlet works outside of the
|
|
Python interpreter, and manually saves some ``PyThreadState``
|
|
fields and pushes/pops the C-stack. Thus the ``greenlet`` package
|
|
can be easily updated to use the new low-level `C API`_ to enable
|
|
full support of EC.
|
|
|
|
|
|
New APIs
|
|
========
|
|
|
|
Python
|
|
------
|
|
|
|
Python APIs were designed to completely hide the internal
|
|
implementation details, but at the same time provide enough control
|
|
over EC and LC to re-implement all of Python built-in objects
|
|
in pure Python.
|
|
|
|
1. ``sys.new_context_item(description='...')``: create a
|
|
``ContextItem`` object used to access/set values in EC.
|
|
|
|
2. ``ContextItem``:
|
|
|
|
* ``.description``: read-only attribute.
|
|
* ``.get()``: return the current value for the item.
|
|
* ``.set(o)``: set the current value in the EC for the item.
|
|
|
|
3. ``sys.get_execution_context()``: return the current
|
|
``ExecutionContext``.
|
|
|
|
4. ``sys.new_execution_context()``: create a new empty
|
|
``ExecutionContext``.
|
|
|
|
5. ``sys.new_local_context()``: create a new empty ``LocalContext``.
|
|
|
|
6. ``sys.run_with_execution_context(ec: ExecutionContext,
|
|
func, *args, **kwargs)``.
|
|
|
|
7. ``sys.run_with_local_context(lc:LocalContext,
|
|
func, *args, **kwargs)``.
|
|
|
|
|
|
C API
|
|
-----
|
|
|
|
1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a
|
|
``PyContextItem`` object.
|
|
|
|
2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the
|
|
current value for the context item.
|
|
|
|
3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set
|
|
the current value for the context item.
|
|
|
|
4. ``PyLocalContext * PyLocalContext_New()``: create a new empty
|
|
``PyLocalContext``.
|
|
|
|
5. ``PyLocalContext * PyExecutionContext_New()``: create a new empty
|
|
``PyExecutionContext``.
|
|
|
|
6. ``PyExecutionContext * PyExecutionContext_Get()``: get the
|
|
EC for the active thread state.
|
|
|
|
7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
|
|
passed EC object as the current for the active thread state.
|
|
|
|
8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *,
|
|
PyLocalContext *)``: allows to implement
|
|
``sys.run_with_local_context`` Python API.
|
|
|
|
|
|
Implementation Strategy
|
|
=======================
|
|
|
|
LocalContext is a Weak Key Mapping
|
|
----------------------------------
|
|
|
|
Using a weak key mapping for ``LocalContext`` implementation
|
|
enables the following properties with regards to garbage
|
|
collection:
|
|
|
|
* ``ContextItem`` objects are strongly-referenced only from the
|
|
application code, not from any of the Execution Context
|
|
machinery or values they point to. This means that there
|
|
are no reference cycles that could extend their lifespan
|
|
longer than necessary, or prevent their garbage collection.
|
|
|
|
* Values put in the Execution Context are guaranteed to be kept
|
|
alive while there is a ``ContextItem`` key referencing them in
|
|
the thread.
|
|
|
|
* If a ``ContextItem`` is garbage collected, all of its values will
|
|
be removed from all contexts, allowing them to be GCed if needed.
|
|
|
|
* If a thread has ended its execution, its thread state will be
|
|
cleaned up along with its ``ExecutionContext``, cleaning
|
|
up all values bound to all Context Items in the thread.
|
|
|
|
|
|
ContextItem.get() Cache
|
|
-----------------------
|
|
|
|
We can add three new fields to ``PyThreadState`` and
|
|
``PyInterpreterState`` structs:
|
|
|
|
* ``uint64_t PyThreadState->unique_id``: a globally unique
|
|
thread state identifier (we can add a counter to
|
|
``PyInterpreterState`` and increment it when a new thread state is
|
|
created.)
|
|
|
|
* ``uint64_t PyInterpreterState->context_item_deallocs``: every time
|
|
a ``ContextItem`` is GCed, all Execution Contexts in all threads
|
|
will lose track of it. ``context_item_deallocs`` will simply
|
|
count all ``ContextItem`` deallocations.
|
|
|
|
* ``uint64_t PyThreadState->execution_context_ver``: every time
|
|
a new item is set, or an existing item is updated, or the stack
|
|
of execution contexts is changed in the thread, we increment this
|
|
counter.
|
|
|
|
The above two fields allow implementing a fast cache path in
|
|
``ContextItem.get()``, in pseudo-code::
|
|
|
|
class ContextItem:
|
|
|
|
def get(self):
|
|
tstate = PyThreadState_Get()
|
|
|
|
if (self.last_tstate_id == tstate.unique_id and
|
|
self.last_ver == tstate.execution_context_ver
|
|
self.last_deallocs ==
|
|
tstate.iterp.context_item_deallocs):
|
|
return self.last_value
|
|
|
|
value = None
|
|
for mapping in reversed(tstate.execution_context):
|
|
if self in mapping:
|
|
value = mapping[self]
|
|
break
|
|
|
|
self.last_value = value # borrowed ref
|
|
self.last_tstate_id = tstate.unique_id
|
|
self.last_ver = tstate.execution_context_ver
|
|
self.last_deallocs = tstate.interp.context_item_deallocs
|
|
|
|
return value
|
|
|
|
Note that ``last_value`` is a borrowed reference. The assumption
|
|
is that if all counters tests are OK, the object will be alive.
|
|
This allows the CI values to be properly GCed.
|
|
|
|
This is similar to the trick that decimal C implementation uses
|
|
for caching the current decimal context, and will have the same
|
|
performance characteristics, but available to all
|
|
Execution Context users.
|
|
|
|
|
|
Approach #1: Use a dict for LocalContext
|
|
----------------------------------------
|
|
|
|
The straightforward way of implementing the proposed EC
|
|
mechanisms is to create a ``WeakKeyDict`` on top of Python
|
|
``dict`` type.
|
|
|
|
To implement the ``ExecutionContext`` type we can use Python
|
|
``list`` (or a custom stack implementation with some
|
|
pre-allocation optimizations).
|
|
|
|
This approach will have the following runtime complexity:
|
|
|
|
* O(M) for ``ContextItem.get()``, where ``M`` is the number of
|
|
Local Contexts in the stack.
|
|
|
|
It is important to note that ``ContextItem.get()`` will implement
|
|
a cache making the operation O(1) for packages like ``decimal``
|
|
and ``numpy``.
|
|
|
|
* O(1) for ``ContextItem.set()``.
|
|
|
|
* O(N) for ``sys.get_execution_context()``, where ``N`` is the
|
|
total number of items in the current **execution** context.
|
|
|
|
|
|
Approach #2: Use HAMT for LocalContext
|
|
--------------------------------------
|
|
|
|
Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
|
|
to implement high performance immutable collections [5]_, [6]_.
|
|
|
|
Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
|
|
performance for both ``set()``, ``get()``, and ``merge()`` operations,
|
|
which is essentially O(1) for relatively small mappings
|
|
(read about HAMT performance in CPython in the
|
|
`Appendix: HAMT Performance`_ section.)
|
|
|
|
In this approach we use the same design of the ``ExecutionContext``
|
|
as in Approach #1, but we will use HAMT backed weak key Local Context
|
|
implementation. With that we will have the following runtime
|
|
complexity:
|
|
|
|
* O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``,
|
|
where ``M`` is the number of Local Contexts in the stack,
|
|
and ``N`` is the number of items in the EC. The operation will
|
|
essentially be O(M), because execution contexts are normally not
|
|
expected to have more than a few dozen of items.
|
|
|
|
(``ContextItem.get()`` will have the same caching mechanism as in
|
|
Approach #1.)
|
|
|
|
* O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the
|
|
number of items in the current **local** context. This will
|
|
essentially be an O(1) operation most of the time.
|
|
|
|
* O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where
|
|
``N`` is the total number of items in the current **execution**
|
|
context.
|
|
|
|
Essentially, using HAMT for Local Contexts instead of Python dicts,
|
|
allows to bring down the complexity of ``sys.get_execution_context()``
|
|
from O(N) to O(log\ :sub:`32`\ N) because of the more efficient
|
|
merge algorithm.
|
|
|
|
|
|
Approach #3: Use HAMT and Immutable Linked List
|
|
-----------------------------------------------
|
|
|
|
We can make an alternative ``ExecutionContext`` design by using
|
|
a linked list. Each ``LocalContext`` in the ``ExecutionContext``
|
|
object will be wrapped in a linked-list node.
|
|
|
|
``LocalContext`` objects will use an HAMT backed weak key
|
|
implementation described in the Approach #2.
|
|
|
|
Every modification to the current ``LocalContext`` will produce a
|
|
new version of it, which will be wrapped in a **new linked list
|
|
node**. Essentially this means, that ``ExecutionContext`` is an
|
|
immutable forest of ``LocalContext`` objects, and can be safely
|
|
copied by reference in ``sys.get_execution_context()`` (eliminating
|
|
the expensive "merge" operation.)
|
|
|
|
With this approach, ``sys.get_execution_context()`` will be an
|
|
**O(1) operation**.
|
|
|
|
|
|
Summary
|
|
-------
|
|
|
|
We believe that approach #3 enables an efficient and complete
|
|
Execution Context implementation, with excellent runtime performance.
|
|
|
|
`ContextItem.get() Cache`_ enables fast retrieval of context items
|
|
for performance critical libraries like decimal and numpy.
|
|
|
|
Fast ``sys.get_execution_context()`` enables efficient management
|
|
of execution contexts in asynchronous libraries like asyncio.
|
|
|
|
|
|
Design Considerations
|
|
=====================
|
|
|
|
Can we fix ``PyThreadState_GetDict()``?
|
|
---------------------------------------
|
|
|
|
``PyThreadState_GetDict`` is a TLS, and some of its existing users
|
|
might depend on it being just a TLS. Changing its behaviour to follow
|
|
the Execution Context semantics would break backwards compatibility.
|
|
|
|
|
|
PEP 521
|
|
-------
|
|
|
|
:pep:`521` proposes an alternative solution to the problem:
|
|
enhance Context Manager Protocol with two new methods: ``__suspend__``
|
|
and ``__resume__``. To make it compatible with async/await,
|
|
the Asynchronous Context Manager Protocol will also need to be
|
|
extended with ``__asuspend__`` and ``__aresume__``.
|
|
|
|
This allows to implement context managers like decimal context and
|
|
``numpy.errstate`` for generators and coroutines.
|
|
|
|
The following code::
|
|
|
|
class Context:
|
|
|
|
def __enter__(self):
|
|
self.old_x = get_execution_context_item('x')
|
|
set_execution_context_item('x', 'something')
|
|
|
|
def __exit__(self, *err):
|
|
set_execution_context_item('x', self.old_x)
|
|
|
|
would become this::
|
|
|
|
local = threading.local()
|
|
|
|
class Context:
|
|
|
|
def __enter__(self):
|
|
self.old_x = getattr(local, 'x', None)
|
|
local.x = 'something'
|
|
|
|
def __suspend__(self):
|
|
local.x = self.old_x
|
|
|
|
def __resume__(self):
|
|
local.x = 'something'
|
|
|
|
def __exit__(self, *err):
|
|
local.x = self.old_x
|
|
|
|
Besides complicating the protocol, the implementation will likely
|
|
negatively impact performance of coroutines, generators, and any code
|
|
that uses context managers, and will notably complicate the
|
|
interpreter implementation.
|
|
|
|
:pep:`521` also does not provide any mechanism to propagate state
|
|
in a local context, like storing a request object in an HTTP request
|
|
handler to have better logging. Nor does it solve the leaking state
|
|
problem for greenlet/gevent.
|
|
|
|
|
|
Can Execution Context be implemented outside of CPython?
|
|
--------------------------------------------------------
|
|
|
|
Because async/await code needs an event loop to run it, an EC-like
|
|
solution can be implemented in a limited way for coroutines.
|
|
|
|
Generators, on the other hand, do not have an event loop or
|
|
trampoline, making it impossible to intercept their ``yield`` points
|
|
outside of the Python interpreter.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
This proposal preserves 100% backwards compatibility.
|
|
|
|
|
|
Appendix: HAMT Performance
|
|
==========================
|
|
|
|
First, while investigating possibilities of how to implement an
|
|
immutable mapping in CPython, we were able to improve the efficiency
|
|
of ``dict.copy()`` up to 5 times: [4]_. All benchmarks in this
|
|
section were run against the optimized dict.
|
|
|
|
To assess if HAMT can be used for Execution Context, we implemented
|
|
it in CPython [7]_.
|
|
|
|
.. figure:: pep-0550-hamt_vs_dict.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 1. Benchmark code can be found here: [9]_.
|
|
|
|
Figure 1 shows that HAMT indeed displays O(1) performance for all
|
|
benchmarked dictionary sizes. For dictionaries with less than 100
|
|
items, HAMT is a bit slower than Python dict/shallow copy.
|
|
|
|
.. figure:: pep-0550-lookup_hamt.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 2. Benchmark code can be found here: [10]_.
|
|
|
|
Figure 2 shows comparison of lookup costs between Python dict
|
|
and an HAMT immutable mapping. HAMT lookup time is 30-40% worse
|
|
than Python dict lookups on average, which is a very good result,
|
|
considering how well Python dicts are optimized.
|
|
|
|
Note, that according to [8]_, HAMT design can be further improved.
|
|
|
|
There is a limitation of Python ``dict`` design which makes HAMT
|
|
the preferred choice for immutable mapping implementation:
|
|
dicts need to be resized periodically, and resize is expensive.
|
|
The ``dict.copy()`` optimization we were able to do (see [4]_) will
|
|
only work for dicts that had no deleted items. Dicts that had
|
|
deleted items need to be resized during ``copy()``, which makes it
|
|
much slower.
|
|
|
|
Because adding and deleting items from LocalContext is a very common
|
|
operation, we would not be able to always use the optimized
|
|
``dict.copy()`` for LocalContext, frequently resorting to use the
|
|
slower version of it.
|
|
|
|
|
|
Acknowledgments
|
|
===============
|
|
|
|
I thank Elvis Pranskevichus and Victor Petrovykh for countless
|
|
discussions around the topic and PEP proof reading and edits.
|
|
|
|
Thanks to Nathaniel Smith for proposing the ``ContextItem`` design
|
|
[17]_ [18]_, for pushing the PEP towards a more complete design, and
|
|
coming up with the idea of having a stack of contexts in the thread
|
|
state.
|
|
|
|
Thanks to Nick Coghlan for numerous suggestions and ideas on the
|
|
mailing list, and for coming up with a case that cause the complete
|
|
rewrite of the initial PEP version [19]_.
|
|
|
|
|
|
Version History
|
|
===============
|
|
|
|
1. Posted on 11-Aug-2017, view it here: [20]_.
|
|
|
|
The fundamental limitation that caused a complete redesign of the
|
|
PEP was that it was not possible to implement an iterator that
|
|
would interact with the EC in the same way as generators
|
|
(see [19]_.)
|
|
|
|
2. Posted on 15-Aug-2017: the current version.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] https://blog.golang.org/context
|
|
|
|
.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx
|
|
|
|
.. [3] https://github.com/numpy/numpy/issues/9444
|
|
|
|
.. [4] http://bugs.python.org/issue31179
|
|
|
|
.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
|
|
|
|
.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
|
|
|
|
.. [7] https://github.com/1st1/cpython/tree/hamt
|
|
|
|
.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
|
|
|
|
.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
|
|
|
|
.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
|
|
|
|
.. [11] https://github.com/1st1/cpython/tree/pep550
|
|
|
|
.. [12] https://www.python.org/dev/peps/pep-0492/#async-await
|
|
|
|
.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py
|
|
|
|
.. [14] https://github.com/MagicStack/pgbench
|
|
|
|
.. [15] https://github.com/python/performance
|
|
|
|
.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
|
|
|
|
.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html
|
|
|
|
.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html
|
|
|
|
.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html
|
|
|
|
.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|