1040 lines
34 KiB
ReStructuredText
1040 lines
34 KiB
ReStructuredText
PEP: 550
|
|
Title: Execution Context
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Yury Selivanov <yury@magic.io>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 11-Aug-2017
|
|
Python-Version: 3.7
|
|
Post-History: 11-Aug-2017
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes a new mechanism to manage execution state--the
|
|
logical environment in which a function, a thread, a generator,
|
|
or a coroutine executes in.
|
|
|
|
A few examples of where having a reliable state storage is required:
|
|
|
|
* Context managers like decimal contexts, ``numpy.errstate``,
|
|
and ``warnings.catch_warnings``;
|
|
|
|
* Storing request-related data such as security tokens and request
|
|
data in web applications;
|
|
|
|
* Profiling, tracing, and logging in complex and large code bases.
|
|
|
|
The usual solution for storing state is to use a Thread-local Storage
|
|
(TLS), implemented in the standard library as ``threading.local()``.
|
|
Unfortunately, TLS does not work for isolating state of generators or
|
|
asynchronous code because such code shares a single thread.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Traditionally a Thread-local Storage (TLS) is used for storing the
|
|
state. However, the major flaw of using the TLS is that it works only
|
|
for multi-threaded code. It is not possible to reliably contain the
|
|
state within a generator or a coroutine. For example, consider
|
|
the following generator::
|
|
|
|
def calculate(precision, ...):
|
|
with decimal.localcontext() as ctx:
|
|
# Set the precision for decimal calculations
|
|
# inside this block
|
|
ctx.prec = precision
|
|
|
|
yield calculate_something()
|
|
yield calculate_something_else()
|
|
|
|
Decimal context is using a TLS to store the state, and because TLS is
|
|
not aware of generators, the state can leak. The above code will
|
|
not work correctly, if a user iterates over the ``calculate()``
|
|
generator with different precisions one by one using a ``zip()``
|
|
built-in, for example::
|
|
|
|
g1 = calculate(precision=100)
|
|
g2 = calculate(precision=50)
|
|
|
|
items = list(zip(g1, g2))
|
|
|
|
# items[0] will be a tuple of:
|
|
# first value from g1 calculated with 100 precision,
|
|
# first value from g2 calculated with 50 precision.
|
|
#
|
|
# items[1] will be a tuple of:
|
|
# second value from g1 calculated with 50 precision (!!!),
|
|
# second value from g2 calculated with 50 precision.
|
|
|
|
An even scarier example would be using decimals to represent money
|
|
in an async/await application: decimal calculations can suddenly
|
|
lose precision in the middle of processing a request. Currently,
|
|
bugs like this are extremely hard to find and fix.
|
|
|
|
Another common need for web applications is to have access to the
|
|
current request object, or security context, or, simply, the request
|
|
URL for logging or submitting performance tracing data::
|
|
|
|
async def handle_http_request(request):
|
|
context.current_http_request = request
|
|
|
|
await ...
|
|
# Invoke your framework code, render templates,
|
|
# make DB queries, etc, and use the global
|
|
# 'current_http_request' in that code.
|
|
|
|
# This isn't currently possible to do reliably
|
|
# in asyncio out of the box.
|
|
|
|
These examples are just a few out of many, where a reliable way to
|
|
store context data is absolutely needed.
|
|
|
|
The inability to use TLS for asynchronous code has lead to
|
|
proliferation of ad-hoc solutions, limited to be supported only by
|
|
code that was explicitly enabled to work with them.
|
|
|
|
Current status quo is that any library, including the standard
|
|
library, that uses a TLS, will likely not work as expected in
|
|
asynchronous code or with generators (see [3]_ as an example issue.)
|
|
|
|
Some languages that have coroutines or generators recommend to
|
|
manually pass a ``context`` object to every function, see [1]_
|
|
describing the pattern for Go. This approach, however, has limited
|
|
use for Python, where we have a huge ecosystem that was built to work
|
|
with a TLS-like context. Moreover, passing the context explicitly
|
|
does not work at all for libraries like ``decimal`` or ``numpy``,
|
|
which use operator overloading.
|
|
|
|
.NET runtime, which has support for async/await, has a generic
|
|
solution of this problem, called ``ExecutionContext`` (see [2]_).
|
|
On the surface, working with it is very similar to working with a TLS,
|
|
but the former explicitly supports asynchronous code.
|
|
|
|
|
|
Goals
|
|
=====
|
|
|
|
The goal of this PEP is to provide a more reliable alternative to
|
|
``threading.local()``. It should be explicitly designed to work with
|
|
Python execution model, equally supporting threads, generators, and
|
|
coroutines.
|
|
|
|
An acceptable solution for Python should meet the following
|
|
requirements:
|
|
|
|
* Transparent support for code executing in threads, coroutines,
|
|
and generators with an easy to use API.
|
|
|
|
* Negligible impact on the performance of the existing code or the
|
|
code that will be using the new mechanism.
|
|
|
|
* Fast C API for packages like ``decimal`` and ``numpy``.
|
|
|
|
Explicit is still better than implicit, hence the new APIs should only
|
|
be used when there is no option to pass the state explicitly.
|
|
|
|
With this PEP implemented, it should be possible to update a context
|
|
manager like the below::
|
|
|
|
_local = threading.local()
|
|
|
|
@contextmanager
|
|
def context(x):
|
|
old_x = getattr(_local, 'x', None)
|
|
_local.x = x
|
|
try:
|
|
yield
|
|
finally:
|
|
_local.x = old_x
|
|
|
|
to a more robust version that can be reliably used in generators
|
|
and async/await code, with a simple transformation::
|
|
|
|
@contextmanager
|
|
def context(x):
|
|
old_x = get_execution_context_item('x')
|
|
set_execution_context_item('x', x)
|
|
try:
|
|
yield
|
|
finally:
|
|
set_execution_context_item('x', old_x)
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
This proposal introduces a new concept called Execution Context (EC),
|
|
along with a set of Python APIs and C APIs to interact with it.
|
|
|
|
EC is implemented using an immutable mapping. Every modification
|
|
of the mapping produces a new copy of it. To illustrate what it
|
|
means let's compare it to how we work with tuples in Python::
|
|
|
|
a0 = ()
|
|
a1 = a0 + (1,)
|
|
a2 = a1 + (2,)
|
|
|
|
# a0 is an empty tuple
|
|
# a1 is (1,)
|
|
# a2 is (1, 2)
|
|
|
|
Manipulating an EC object would be similar::
|
|
|
|
a0 = EC()
|
|
a1 = a0.set('foo', 'bar')
|
|
a2 = a1.set('spam', 'ham')
|
|
|
|
# a0 is an empty mapping
|
|
# a1 is {'foo': 'bar'}
|
|
# a2 is {'foo': 'bar', 'spam': 'ham'}
|
|
|
|
In CPython, every thread that can execute Python code has a
|
|
corresponding ``PyThreadState`` object. It encapsulates important
|
|
runtime information like a pointer to the current frame, and is
|
|
being used by the ceval loop extensively. We add a new field to
|
|
``PyThreadState``, called ``exec_context``, which points to the
|
|
current EC object.
|
|
|
|
We also introduce a set of APIs to work with Execution Context.
|
|
In this section we will only cover two functions that are needed to
|
|
explain how Execution Context works. See the full list of new APIs
|
|
in the `New APIs`_ section.
|
|
|
|
* ``sys.get_execution_context_item(key, default=None)``: lookup
|
|
``key`` in the EC of the executing thread. If not found,
|
|
return ``default``.
|
|
|
|
* ``sys.set_execution_context_item(key, value)``: get the
|
|
current EC of the executing thread. Add a ``key``/``value``
|
|
item to it, which will produce a new EC object. Set the
|
|
new object as the current one for the executing thread.
|
|
In pseudo-code::
|
|
|
|
tstate = PyThreadState_GET()
|
|
ec = tstate.exec_context
|
|
ec2 = ec.set(key, value)
|
|
tstate.exec_context = ec2
|
|
|
|
Note, that some important implementation details and optimizations
|
|
are omitted here, and will be covered in later sections of this PEP.
|
|
|
|
Now let's see how Execution Contexts work with regular multi-threaded
|
|
code, generators, and coroutines.
|
|
|
|
|
|
Regular & Multithreaded Code
|
|
----------------------------
|
|
|
|
For regular Python code, EC behaves just like a thread-local. Any
|
|
modification of the EC object produces a new one, which is immediately
|
|
set as the current one for the thread state.
|
|
|
|
.. figure:: pep-0550-functions.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 1. Execution Context flow in a thread.
|
|
|
|
As Figure 1 illustrates, if a function calls
|
|
``set_execution_context_item()``, the modification of the execution
|
|
context will be visible to all subsequent calls and to the caller::
|
|
|
|
def set_foo():
|
|
set_execution_context_item('foo', 'spam')
|
|
|
|
set_execution_context_item('foo', 'bar')
|
|
print(get_execution_context_item('foo'))
|
|
|
|
set_foo()
|
|
print(get_execution_context_item('foo'))
|
|
|
|
# will print:
|
|
# bar
|
|
# spam
|
|
|
|
|
|
Coroutines
|
|
----------
|
|
|
|
Python :pep:`492` coroutines are used to implement cooperative
|
|
multitasking. For a Python end-user they are similar to threads,
|
|
especially when it comes to sharing resources or modifying
|
|
the global state.
|
|
|
|
An event loop is needed to schedule coroutines. Coroutines that
|
|
are explicitly scheduled by the user are usually called Tasks.
|
|
When a coroutine is scheduled, it can schedule other coroutines using
|
|
an ``await`` expression. In async/await world, awaiting a coroutine
|
|
can be viewed as a different calling convention: Tasks are similar to
|
|
threads, and awaiting on coroutines within a Task is similar to
|
|
calling functions within a thread.
|
|
|
|
By drawing a parallel between regular multithreaded code and
|
|
async/await, it becomes apparent that any modification of the
|
|
execution context within one Task should be visible to all coroutines
|
|
scheduled within it. Any execution context modifications, however,
|
|
must not be visible to other Tasks executing within the same thread.
|
|
|
|
To achieve this, a small set of modifications to the coroutine object
|
|
is needed:
|
|
|
|
* When a coroutine object is instantiated, it saves a reference to
|
|
the current execution context object to its ``cr_execution_context``
|
|
attribute.
|
|
|
|
* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
|
|
follows (in pseudo-C)::
|
|
|
|
if coro->cr_isolated_execution_context:
|
|
# Save a reference to the current execution context
|
|
old_context = tstate->execution_context
|
|
|
|
# Set our saved execution context as the current
|
|
# for the current thread.
|
|
tstate->execution_context = coro->cr_execution_context
|
|
|
|
try:
|
|
# Perform the actual `Coroutine.send()` or
|
|
# `Coroutine.throw()` call.
|
|
return coro->send(...)
|
|
finally:
|
|
# Save a reference to the updated execution_context.
|
|
# We will need it later, when `.send()` or `.throw()`
|
|
# are called again.
|
|
coro->cr_execution_context = tstate->execution_context
|
|
|
|
# Restore thread's execution context to what it was before
|
|
# invoking this coroutine.
|
|
tstate->execution_context = old_context
|
|
else:
|
|
# Perform the actual `Coroutine.send()` or
|
|
# `Coroutine.throw()` call.
|
|
return coro->send(...)
|
|
|
|
* ``cr_isolated_execution_context`` is a new attribute on coroutine
|
|
objects. Set to ``True`` by default, it makes any execution context
|
|
modifications performed by coroutine to stay visible only to that
|
|
coroutine.
|
|
|
|
When Python interpreter sees an ``await`` instruction, it flips
|
|
``cr_isolated_execution_context`` to ``False`` for the coroutine
|
|
that is about to be awaited. This makes any changes to execution
|
|
context made by nested coroutine calls within a Task to be visible
|
|
throughout the Task.
|
|
|
|
Because the top-level coroutine (Task) cannot be scheduled with
|
|
``await`` (in asyncio you need to call ``loop.create_task()`` or
|
|
``asyncio.ensure_future()`` to schedule a Task), all execution
|
|
context modifications are guaranteed to stay within the Task.
|
|
|
|
* We always work with ``tstate->exec_context``. We use
|
|
``coro->cr_execution_context`` only to store coroutine's execution
|
|
context when it is not executing.
|
|
|
|
Figure 2 below illustrates how execution context mutations work with
|
|
coroutines.
|
|
|
|
.. figure:: pep-0550-coroutines.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 2. Execution Context flow in coroutines.
|
|
|
|
In the above diagram:
|
|
|
|
* When "coro1" is created, it saves a reference to the current
|
|
execution context "2".
|
|
|
|
* If it makes any change to the context, it will have its own
|
|
execution context branch "2.1".
|
|
|
|
* When it awaits on "coro2", any subsequent changes it does to
|
|
the execution context are visible to "coro1", but not outside
|
|
of it.
|
|
|
|
In code::
|
|
|
|
async def inner_foo():
|
|
print('inner_foo:', get_execution_context_item('key'))
|
|
set_execution_context_item('key', 2)
|
|
|
|
async def foo():
|
|
print('foo:', get_execution_context_item('key'))
|
|
|
|
set_execution_context_item('key', 1)
|
|
await inner_foo()
|
|
|
|
print('foo:', get_execution_context_item('key'))
|
|
|
|
|
|
set_execution_context_item('key', 'spam')
|
|
print('main:', get_execution_context_item('key'))
|
|
|
|
asyncio.get_event_loop().run_until_complete(foo())
|
|
|
|
print('main:', get_execution_context_item('key'))
|
|
|
|
which will output::
|
|
|
|
main: spam
|
|
foo: spam
|
|
inner_foo: 1
|
|
foo: 2
|
|
main: spam
|
|
|
|
Generator-based coroutines (generators decorated with
|
|
``types.coroutine`` or ``asyncio.coroutine``) behave exactly as
|
|
native coroutines with regards to execution context management:
|
|
their ``yield from`` expression is semantically equivalent to
|
|
``await``.
|
|
|
|
|
|
Generators
|
|
----------
|
|
|
|
Generators in Python, while similar to Coroutines, are used in a
|
|
fundamentally different way. They are producers of data, and
|
|
they use ``yield`` expression to suspend/resume their execution.
|
|
|
|
A crucial difference between ``await coro`` and ``yield value`` is
|
|
that the former expression guarantees that the ``coro`` will be
|
|
executed to the end, while the latter is producing ``value`` and
|
|
suspending the generator until it gets iterated again.
|
|
|
|
Generators share 99% of their implementation with coroutines, and
|
|
thus have similar new attributes ``gi_execution_context`` and
|
|
``gi_isolated_execution_context``. Similar to coroutines, generators
|
|
save a reference to the current execution context when they are
|
|
instantiated. The have the same implementation of ``.send()`` and
|
|
``.throw()`` methods.
|
|
|
|
The only difference is that ``gi_isolated_execution_context``
|
|
is always set to ``True``, and is never modified by the interpreter.
|
|
``yield from o`` expression in regular generators that are not
|
|
decorated with ``types.coroutine``, is semantically equivalent to
|
|
``for v in o: yield v``.
|
|
|
|
.. figure:: pep-0550-generators.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 3. Execution Context flow in a generator.
|
|
|
|
In the above diagram:
|
|
|
|
* When "gen1" is created, it saves a reference to the current
|
|
execution context "2".
|
|
|
|
* If it makes any change to the context, it will have its own
|
|
execution context branch "2.1".
|
|
|
|
* When "gen2" is created, it saves a reference to the current
|
|
execution context for it -- "2.1".
|
|
|
|
* Any subsequent execution context updated in "gen2" will only
|
|
be visible to "gen2".
|
|
|
|
* Likewise, any context changes that "gen1" will do after it
|
|
created "gen2" will not be visible to "gen2".
|
|
|
|
In code::
|
|
|
|
def inner_foo():
|
|
for i in range(3):
|
|
print('inner_foo:', get_execution_context_item('key'))
|
|
set_execution_context_item('key', i)
|
|
yield i
|
|
|
|
|
|
def foo():
|
|
set_execution_context_item('key', 'spam')
|
|
print('foo:', get_execution_context_item('key'))
|
|
|
|
inner = inner_foo()
|
|
|
|
while True:
|
|
val = next(inner, None)
|
|
if val is None:
|
|
break
|
|
yield val
|
|
print('foo:', get_execution_context_item('key'))
|
|
|
|
set_execution_context_item('key', 'spam')
|
|
print('main:', get_execution_context_item('key'))
|
|
|
|
list(foo())
|
|
|
|
print('main:', get_execution_context_item('key'))
|
|
|
|
which will output::
|
|
|
|
main: ham
|
|
foo: spam
|
|
inner_foo: spam
|
|
foo: spam
|
|
inner_foo: 0
|
|
foo: spam
|
|
inner_foo: 1
|
|
foo: spam
|
|
main: ham
|
|
|
|
As we see, any modification of the execution context in a generator
|
|
is visible only to the generator itself.
|
|
|
|
There is one use-case where it is desired for generators to affect
|
|
the surrounding execution context: ``contextlib.contextmanager``
|
|
decorator. To make the following work::
|
|
|
|
@contextmanager
|
|
def context(x):
|
|
old_x = get_execution_context_item('x')
|
|
set_execution_context_item('x', x)
|
|
try:
|
|
yield
|
|
finally:
|
|
set_execution_context_item('x', old_x)
|
|
|
|
we modified ``contextmanager`` to flip
|
|
``gi_isolated_execution_context`` flag to ``False`` on its generator.
|
|
|
|
|
|
Greenlets
|
|
---------
|
|
|
|
Greenlet is an alternative implementation of cooperative
|
|
scheduling for Python. Although greenlet package is not part of
|
|
CPython, popular frameworks like gevent rely on it, and it is
|
|
important that greenlet can be modified to support execution
|
|
contexts.
|
|
|
|
In a nutshell, greenlet design is very similar to design of
|
|
generators. The main difference is that for generators, the stack
|
|
is managed by the Python interpreter. Greenlet works outside of the
|
|
Python interpreter, and manually saves some ``PyThreadState``
|
|
fields and pushes/pops the C-stack. Since Execution Context is
|
|
implemented on top of ``PyThreadState``, it's easy to add
|
|
transparent support of it to greenlet.
|
|
|
|
|
|
New APIs
|
|
========
|
|
|
|
Even though this PEP adds a number of new APIs, please keep in mind,
|
|
that most Python users will likely ever use only two of them:
|
|
``sys.get_execution_context_item()`` and
|
|
``sys.set_execution_context_item()``.
|
|
|
|
|
|
Python
|
|
------
|
|
|
|
1. ``sys.get_execution_context_item(key, default=None)``: lookup
|
|
``key`` for the current Execution Context. If not found,
|
|
return ``default``.
|
|
|
|
2. ``sys.set_execution_context_item(key, value)``: set
|
|
``key``/``value`` item for the current Execution Context.
|
|
If ``value`` is ``None``, the item will be removed
|
|
(read more about it in
|
|
`Why setting a key to None removes the item?`_.)
|
|
|
|
3. ``sys.get_execution_context()``: return the current Execution
|
|
Context object: ``sys.ExecutionContext``.
|
|
|
|
4. ``sys.set_execution_context(ec)``: set the passed
|
|
``sys.ExecutionContext`` instance as a current one for the current
|
|
thread.
|
|
|
|
5. ``sys.ExecutionContext`` object.
|
|
|
|
Implementation detail: ``sys.ExecutionContext`` wraps a low-level
|
|
``PyExecContextData`` object. ``sys.ExecutionContext`` has a
|
|
mutable mapping API, abstracting away the real immutable
|
|
``PyExecContextData``.
|
|
|
|
* ``ExecutionContext()``: create a new, empty, execution context.
|
|
|
|
* ``ec.run(func, *args)`` method: run ``func(*args)`` in the
|
|
``ec`` execution context. Any changes to the Execution Context
|
|
performed by ``func`` will not be visible outside of the
|
|
``run()`` call, nor will affect the ``ec`` itself. In other
|
|
words, it's safe to do the following::
|
|
|
|
ec.run(func1)
|
|
ec.run(func2)
|
|
|
|
both ``func1`` and ``func2`` will be executed in the same
|
|
Execution Context.
|
|
|
|
* ``ec[key]``: lookup ``key`` in ``ec`` context.
|
|
|
|
* ``ec[key] = value``: assign ``key``/``value`` item to the ``ec``.
|
|
|
|
* ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and
|
|
``ec.copy()`` are similar to that of ``dict`` object.
|
|
|
|
|
|
C API
|
|
-----
|
|
|
|
C API is different from the Python one because it operates directly
|
|
on the low-level immutable ``PyExecContextData`` object.
|
|
|
|
1. New ``PyThreadState->exec_context`` field, pointing to a
|
|
``PyExecContextData`` object.
|
|
|
|
2. ``PyThreadState_SetExecContextItem`` and
|
|
``PyThreadState_GetExecContextItem`` similar to
|
|
``sys.set_execution_context_item()`` and
|
|
``sys.get_execution_context_item()``.
|
|
|
|
3. ``PyThreadState_GetExecContext``: similar to
|
|
``sys.get_execution_context()``. Always returns an
|
|
``PyExecContextData`` object. If ``PyThreadState->exec_context``
|
|
is ``NULL`` an new and empty one will be created and assigned
|
|
to ``PyThreadState->exec_context``.
|
|
|
|
4. ``PyThreadState_SetExecContext``: similar to
|
|
``sys.set_execution_context()``.
|
|
|
|
5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
|
|
object.
|
|
|
|
6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.
|
|
|
|
The exact layout of ``PyExecContextData`` is private, which allows
|
|
us to switch it to a different implementation later. More on that
|
|
in the `Implementation Details`_ section.
|
|
|
|
|
|
Modifications in Standard Library
|
|
=================================
|
|
|
|
* ``contextlib.contextmanager`` was updated to flip the new
|
|
``gi_isolated_execution_context`` attribute on the generator.
|
|
|
|
* ``asyncio.events.Handle`` object now captures the current
|
|
execution context when it is created, and uses the saved
|
|
execution context to run the callback (with
|
|
``ExecutionContext.run()`` method.) This makes
|
|
``loop.call_soon()`` to run callbacks in the execution context
|
|
they were scheduled.
|
|
|
|
No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
|
|
necessary.
|
|
|
|
Some standard library modules like ``warnings`` and ``decimal``
|
|
can be updated to use new execution contexts. This will be considered
|
|
in separate issues if this PEP is accepted.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
This proposal preserves 100% backwards compatibility.
|
|
|
|
|
|
Performance
|
|
===========
|
|
|
|
Implementation Details
|
|
----------------------
|
|
|
|
The new ``PyExecContextData`` object is wrapping a ``dict`` object.
|
|
Any modification requires creating a shallow copy of the dict.
|
|
|
|
While working on the reference implementation of this PEP, we were
|
|
able to optimize ``dict.copy()`` operation **5.5x**, see [4]_ for
|
|
details.
|
|
|
|
.. figure:: pep-0550-dict_copy.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 4.
|
|
|
|
Figure 4 shows that the performance of immutable dict implemented
|
|
with shallow copying is expectedly O(n) for the ``set()`` operation.
|
|
However, this is tolerable until dict has more than 100 items
|
|
(1 ``set()`` takes about a microsecond.)
|
|
|
|
Judging by the number of modules that need EC in Standard Library
|
|
it is likely that real world Python applications will use
|
|
significantly less than 100 execution context variables.
|
|
|
|
The important point is that the cost of accessing a key in
|
|
Execution Context is always O(1).
|
|
|
|
If the ``set()`` operation performance is a major concern, we discuss
|
|
alternative approaches that have O(1) or close ``set()`` performance
|
|
in `Alternative Immutable Dict Implementation`_, `Faster C API`_, and
|
|
`Copy-on-write Execution Context`_ sections.
|
|
|
|
|
|
Generators and Coroutines
|
|
-------------------------
|
|
|
|
Using a microbenchmark for generators and coroutines from :pep:`492`
|
|
([12]_), it was possible to observe 0.5 to 1% performance degradation.
|
|
|
|
asyncio "echo server" microbechmarks from the uvloop project [13]_
|
|
showed 1-1.5% performance degradation for asyncio code.
|
|
|
|
asyncpg benchmarks [14]_, that execute more code and are closer to a
|
|
real-world application, did not exhibit any noticeable performance
|
|
change.
|
|
|
|
|
|
Overall Performance Impact
|
|
--------------------------
|
|
|
|
The total number of changed lines in the ceval loop is 2 -- in the
|
|
``YIELD_FROM`` opcode implementation. Only performance of generators
|
|
and coroutines can be affected by the proposal.
|
|
|
|
This was confirmed by running Python Performance Benchmark Suite
|
|
[15]_, which demonstrated that there is no difference between
|
|
3.7 master branch and this PEP reference implementation branch
|
|
(full benchmark results can be found here [16]_.)
|
|
|
|
|
|
Design Considerations
|
|
=====================
|
|
|
|
Alternative Immutable Dict Implementation
|
|
-----------------------------------------
|
|
|
|
Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
|
|
to implement high performance immutable collections [5]_, [6]_.
|
|
|
|
Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
|
|
performance for both ``set()`` and ``get()`` operations, which will
|
|
be essentially O(1) for relatively small mappings in EC.
|
|
|
|
To assess if HAMT can be used for Execution Context, we implemented
|
|
it in CPython [7]_.
|
|
|
|
.. figure:: pep-0550-hamt_vs_dict.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 5. Benchmark code can be found here: [9]_.
|
|
|
|
Figure 5 shows that HAMT indeed displays O(1) performance for all
|
|
benchmarked dictionary sizes. For dictionaries with less than 100
|
|
items, HAMT is a bit slower than Python dict/shallow copy.
|
|
|
|
.. figure:: pep-0550-lookup_hamt.png
|
|
:align: center
|
|
:width: 100%
|
|
|
|
Figure 6. Benchmark code can be found here: [10]_.
|
|
|
|
Figure 6 shows comparison of lookup costs between Python dict
|
|
and an HAMT immutable mapping. HAMT lookup time is 30-40% worse
|
|
than Python dict lookups on average, which is a very good result,
|
|
considering how well Python dicts are optimized.
|
|
|
|
Note, that according to [8]_, HAMT design can be further improved.
|
|
|
|
The bottom line is that the current approach with implementing
|
|
an immutable mapping with shallow-copying dict will likely perform
|
|
adequately in real-life applications. The HAMT solution is more
|
|
future proof, however.
|
|
|
|
The proposed API is designed in such a way that the underlying
|
|
implementation of the mapping can be changed completely without
|
|
affecting the Execution Context `Specification`_, which allows
|
|
us to switch to HAMT at some point if necessary.
|
|
|
|
|
|
Copy-on-write Execution Context
|
|
-------------------------------
|
|
|
|
The implementation of Execution Context in .NET is different from
|
|
this PEP. .NET uses copy-on-write mechanism and a regular mutable
|
|
mapping.
|
|
|
|
One way to implement this in CPython would be to have two new
|
|
fields in ``PyThreadState``:
|
|
|
|
* ``exec_context`` pointing to the current Execution Context mapping;
|
|
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.
|
|
|
|
The idea is that whenever we are modifying the EC, the copy-on-write
|
|
flag is checked, and if it is set to ``1``, the EC is copied.
|
|
|
|
Modifications to Coroutine and Generator ``.send()`` and ``.throw()``
|
|
methods described in the `Coroutines`_ section will be almost the
|
|
same, except that in addition to the ``gi_execution_context`` they
|
|
will have a ``gi_exec_context_copy_on_write`` flag. When a coroutine
|
|
or a generator starts, the flag will be set to ``1``. This will
|
|
ensure that any modification of the EC performed within a coroutine
|
|
or a generator will be isolated.
|
|
|
|
This approach has one advantage:
|
|
|
|
* For Execution Context that contains a large number of items,
|
|
copy-on-write is a more efficient solution than the shallow-copy
|
|
dict approach.
|
|
|
|
However, we believe that copy-on-write disadvantages are more
|
|
important to consider:
|
|
|
|
* Copy-on-write behaviour for generators and coroutines makes
|
|
EC semantics less predictable.
|
|
|
|
With immutable EC approach, generators and coroutines always
|
|
execute in the EC that was current at the moment of their
|
|
creation. Any modifications to the outer EC while a generator
|
|
or a coroutine is executing are not visible to them::
|
|
|
|
def generator():
|
|
yield 1
|
|
print(get_execution_context_item('key'))
|
|
yield 2
|
|
|
|
set_execution_context_item('key', 'spam')
|
|
gen = iter(generator())
|
|
next(gen)
|
|
set_execution_context_item('key', 'ham')
|
|
next(gen)
|
|
|
|
The above script will always print 'spam' with immutable EC.
|
|
|
|
With a copy-on-write approach, the above script will print 'ham'.
|
|
Now, consider that ``generator()`` was refactored to call some
|
|
library function, that uses Execution Context::
|
|
|
|
def generator():
|
|
yield 1
|
|
some_function_that_uses_decimal_context()
|
|
print(get_execution_context_item('key'))
|
|
yield 2
|
|
|
|
Now, the script will print 'spam', because
|
|
``some_function_that_uses_decimal_context`` forced the EC to copy,
|
|
and ``set_execution_context_item('key', 'ham')`` line did not
|
|
affect the ``generator()`` code after all.
|
|
|
|
* Similarly to the previous point, ``sys.ExecutionContext.run()``
|
|
method will also become less predictable, as
|
|
``sys.get_execution_context()`` would still return a reference to
|
|
the current mutable EC.
|
|
|
|
We can't modify ``sys.get_execution_context()`` to return a shallow
|
|
copy of the current EC, because this would seriously harm
|
|
performance of ``asyncio.call_soon()`` and similar places, where
|
|
it is important to propagate the Execution Context.
|
|
|
|
* Even though copy-on-write requires to shallow copy the execution
|
|
context object less frequently, copying will still take place
|
|
in coroutines and generators. In which case, HAMT approach will
|
|
perform better for medium to large sized execution contexts.
|
|
|
|
All in all, we believe that the copy-on-write approach introduces
|
|
very subtle corner cases that could lead to bugs that are
|
|
exceptionally hard to discover and fix.
|
|
|
|
The immutable EC solution in comparison is always predictable and
|
|
easy to reason about. Therefore we believe that any slight
|
|
performance gain that the copy-on-write solution might offer is not
|
|
worth it.
|
|
|
|
|
|
Faster C API
|
|
------------
|
|
|
|
Packages like numpy and standard library modules like decimal need
|
|
to frequently query the global state for some local context
|
|
configuration. It is important that the APIs that they use is as
|
|
fast as possible.
|
|
|
|
The proposed ``PyThreadState_SetExecContextItem`` and
|
|
``PyThreadState_GetExecContextItem`` functions need to get the
|
|
current thread state with ``PyThreadState_GET()`` (fast) and then
|
|
perform a hash lookup (relatively slow). We can eliminate the hash
|
|
lookup by adding three additional C API functions:
|
|
|
|
* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
|
|
a function similar to the existing ``_PyEval_RequestCodeExtraIndex``
|
|
introduced :pep:`523`. The idea is to request a unique index
|
|
that can later be used to lookup context items.
|
|
|
|
The ``key_name`` can later be used by ``sys.ExecutionContext`` to
|
|
introspect items added with this API.
|
|
|
|
* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
|
|
and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
|
|
to request an item by its index, avoiding the cost of hash lookup.
|
|
|
|
|
|
Why setting a key to None removes the item?
|
|
-------------------------------------------
|
|
|
|
Consider a context manager::
|
|
|
|
@contextmanager
|
|
def context(x):
|
|
old_x = get_execution_context_item('x')
|
|
set_execution_context_item('x', x)
|
|
try:
|
|
yield
|
|
finally:
|
|
set_execution_context_item('x', old_x)
|
|
|
|
With ``set_execution_context_item(key, None)`` call removing the
|
|
``key``, the user doesn't need to write additional code to remove
|
|
the ``key`` if it wasn't in the execution context already.
|
|
|
|
An alternative design with ``del_execution_context_item()`` method
|
|
would look like the following::
|
|
|
|
@contextmanager
|
|
def context(x):
|
|
not_there = object()
|
|
old_x = get_execution_context_item('x', not_there)
|
|
set_execution_context_item('x', x)
|
|
try:
|
|
yield
|
|
finally:
|
|
if old_x is not_there:
|
|
del_execution_context_item('x')
|
|
else:
|
|
set_execution_context_item('x', old_x)
|
|
|
|
|
|
Can we fix ``PyThreadState_GetDict()``?
|
|
---------------------------------------
|
|
|
|
``PyThreadState_GetDict`` is a TLS, and some of its existing users
|
|
might depend on it being just a TLS. Changing its behaviour to follow
|
|
the Execution Context semantics would break backwards compatibility.
|
|
|
|
|
|
PEP 521
|
|
-------
|
|
|
|
:pep:`521` proposes an alternative solution to the problem:
|
|
enhance Context Manager Protocol with two new methods: ``__suspend__``
|
|
and ``__resume__``. To make it compatible with async/await,
|
|
the Asynchronous Context Manager Protocol will also need to be
|
|
extended with ``__asuspend__`` and ``__aresume__``.
|
|
|
|
This allows to implement context managers like decimal context and
|
|
``numpy.errstate`` for generators and coroutines.
|
|
|
|
The following code::
|
|
|
|
class Context:
|
|
|
|
def __enter__(self):
|
|
self.old_x = get_execution_context_item('x')
|
|
set_execution_context_item('x', 'something')
|
|
|
|
def __exit__(self, *err):
|
|
set_execution_context_item('x', self.old_x)
|
|
|
|
would become this::
|
|
|
|
local = threading.local()
|
|
|
|
class Context:
|
|
|
|
def __enter__(self):
|
|
self.old_x = getattr(local, 'x', None)
|
|
local.x = 'something'
|
|
|
|
def __suspend__(self):
|
|
local.x = self.old_x
|
|
|
|
def __resume__(self):
|
|
local.x = 'something'
|
|
|
|
def __exit__(self, *err):
|
|
local.x = self.old_x
|
|
|
|
Besides complicating the protocol, the implementation will likely
|
|
negatively impact performance of coroutines, generators, and any code
|
|
that uses context managers, and will notably complicate the
|
|
interpreter implementation.
|
|
|
|
:pep:`521` also does not provide any mechanism to propagate state
|
|
in a local context, like storing a request object in an HTTP request
|
|
handler to have better logging. Nor does it solve the leaking state
|
|
problem for greenlet/gevent.
|
|
|
|
|
|
Can Execution Context be implemented outside of CPython?
|
|
--------------------------------------------------------
|
|
|
|
Because async/await code needs an event loop to run it, an EC-like
|
|
solution can be implemented in a limited way for coroutines.
|
|
|
|
Generators, on the other hand, do not have an event loop or
|
|
trampoline, making it impossible to intercept their ``yield`` points
|
|
outside of the Python interpreter.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
The reference implementation can be found here: [11]_.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] https://blog.golang.org/context
|
|
|
|
.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx
|
|
|
|
.. [3] https://github.com/numpy/numpy/issues/9444
|
|
|
|
.. [4] http://bugs.python.org/issue31179
|
|
|
|
.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
|
|
|
|
.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
|
|
|
|
.. [7] https://github.com/1st1/cpython/tree/hamt
|
|
|
|
.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
|
|
|
|
.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
|
|
|
|
.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
|
|
|
|
.. [11] https://github.com/1st1/cpython/tree/pep550
|
|
|
|
.. [12] https://www.python.org/dev/peps/pep-0492/#async-await
|
|
|
|
.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py
|
|
|
|
.. [14] https://github.com/MagicStack/pgbench
|
|
|
|
.. [15] https://github.com/python/performance
|
|
|
|
.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|