PEP: 521
Title: Managing global context via 'with' blocks in generators and coroutines
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs@pobox.com>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2015
Python-Version: 3.6
Post-History: 29-Apr-2015



Abstract
========

While we generally try to avoid global state when possible, there
nonetheless exist a number of situations where it is agreed to be the
best approach.  In Python, the standard way of handling such cases is
to store the global state in global or thread-local storage, and then
use ``with`` blocks to limit modifications of this global state to a
single dynamic scope.  Examples where this pattern is used include the
standard library's ``warnings.catch_warnings`` and
``decimal.localcontext``, NumPy's ``numpy.errstate`` (which exposes
the error-handling settings provided by the IEEE 754 floating point
standard), and the handling of logging context or HTTP request context
in many server application frameworks.

However, there is currently no ergonomic way to manage such local
changes to global state when writing a generator or coroutine.  For
example, this code::

    def f():
        with warnings.catch_warnings():
            for x in g():
                yield x

may or may not successfully catch warnings raised by ``g()``, and may
or may not inadvertently swallow warnings triggered elsewhere in the
code.  The context manager, which was intended to apply only to ``f``
and its callees, ends up having a dynamic scope that encompasses
arbitrary and unpredictable parts of its call\ **ers**.  This problem
becomes particularly acute when writing asynchronous code, where
essentially all functions become coroutines.

Here, we propose to solve this problem by notifying context managers
whenever execution is suspended or resumed within their scope,
allowing them to restrict their effects appropriately.



Specification
=============

Two new, optional, methods are added to the context manager protocol:
``__suspend__`` and ``__resume__``.  If present, these methods will be
called whenever a frame's execution is suspended or resumed from
within the context of the ``with`` block.

More formally, consider the following code::

    with EXPR as VAR:
        PARTIAL-BLOCK-1
        f((yield foo))
        PARTIAL-BLOCK-2

Currently this is equivalent to the following code copied from PEP 343::

    mgr = (EXPR)
    exit = type(mgr).__exit__  # Not calling it yet
    value = type(mgr).__enter__(mgr)
    exc = True
    try:
        try:
            VAR = value  # Only if "as VAR" is present
            PARTIAL-BLOCK-1
            f((yield foo))
            PARTIAL-BLOCK-2
        except:
            exc = False
            if not exit(mgr, *sys.exc_info()):
                raise
    finally:
        if exc:
            exit(mgr, None, None, None)


This PEP proposes to modify ``with`` block handling to instead become::

    mgr = (EXPR)
    exit = type(mgr).__exit__  # Not calling it yet
    ### --- NEW STUFF ---
    if the_block_contains_yield_points:  # known statically at compile time
        suspend = getattr(type(mgr), "__suspend__", lambda mgr: None)
        resume = getattr(type(mgr), "__resume__", lambda mgr: None)
    ### --- END OF NEW STUFF ---
    value = type(mgr).__enter__(mgr)
    exc = True
    try:
        try:
            VAR = value  # Only if "as VAR" is present
            PARTIAL-BLOCK-1
            ### --- NEW STUFF ---
            suspend(mgr)
            tmp = yield foo
            resume(mgr)
            f(tmp)
            ### --- END OF NEW STUFF ---
            PARTIAL-BLOCK-2
        except:
            exc = False
            if not exit(mgr, *sys.exc_info()):
                raise
    finally:
        if exc:
            exit(mgr, None, None, None)

Analogous suspend/resume calls are also wrapped around the ``yield``
points embedded inside the ``yield from``, ``await``, ``async with``,
and ``async for`` constructs.

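
To make the proposed protocol concrete, here is a minimal sketch of a
context manager that opts in to the new hooks.  The
``CURRENT_PRECISION`` global, the ``precision`` manager, and the
``averages`` generator are invented purely for this illustration;
under current Python the code runs, but the new methods are simply
never called::

    CURRENT_PRECISION = 17   # a made-up piece of global state

    class precision:
        """Temporarily override CURRENT_PRECISION for one dynamic scope."""

        def __init__(self, value):
            self.value = value

        def _swap(self):
            global CURRENT_PRECISION
            CURRENT_PRECISION, self.value = self.value, CURRENT_PRECISION

        def __enter__(self):
            self._swap()      # install our setting, stash the caller's

        def __exit__(self, *exc_info):
            self._swap()      # put the caller's setting back

        # New in this proposal:
        def __suspend__(self):
            self._swap()      # frame is being suspended: hide our setting

        def __resume__(self):
            self._swap()      # frame is running again: reinstall it

    def averages(stream):
        with precision(3):
            for x in stream:
                # Under this PEP, CURRENT_PRECISION is 3 only while this
                # generator's frame is actually executing, not while it
                # sits suspended at the ``yield``.
                yield round(x, CURRENT_PRECISION)

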

Nested blocks
-------------

Given this code::

    def f():
        with OUTER:
            with INNER:
                yield VALUE

then we perform the following operations in the following sequence::

    INNER.__suspend__()
    OUTER.__suspend__()
    yield VALUE
    OUTER.__resume__()
    INNER.__resume__()

Note that this ensures that the following is a valid refactoring::

    def f():
        with OUTER:
            yield from g()

    def g():
        with INNER:
            yield VALUE

Similarly, ``with`` statements with multiple context managers suspend
from right to left, and resume from left to right.

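
As a sanity check on the ordering rules, here is a toy tracing manager
(not part of the proposal, just an illustration) used in the
single-statement, multiple-manager form::

    class tracer:
        def __init__(self, name):
            self.name = name
        def __enter__(self):
            print(self.name, "enter")
        def __exit__(self, *exc_info):
            print(self.name, "exit")
        def __suspend__(self):
            print(self.name, "suspend")
        def __resume__(self):
            print(self.name, "resume")

    def f():
        with tracer("OUTER"), tracer("INNER"):
            yield "VALUE"

    # Under this proposal, driving the generator to completion would
    # print, one message per line, in this order:
    #
    #   OUTER enter,   INNER enter      (entering: left to right)
    #   INNER suspend, OUTER suspend    (at the yield: right to left)
    #   OUTER resume,  INNER resume     (on resumption: left to right)
    #   INNER exit,    OUTER exit       (exiting: right to left)

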

Other changes
-------------

``__suspend__`` and ``__resume__`` methods are added to
``warnings.catch_warnings`` and ``decimal.localcontext``.

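
For ``warnings.catch_warnings``, the change envisioned is essentially
to do on suspend/resume what it already does on exit/enter.  The
following standalone sketch is not the actual patch -- the real
implementation would live inside ``catch_warnings`` itself and would
also handle ``showwarning`` -- but it shows the idea::

    import warnings

    class catch_warnings_sketch:
        def __enter__(self):
            self._outside = warnings.filters            # caller's filter list
            warnings.filters = self._inside = self._outside[:]

        def __exit__(self, *exc_info):
            warnings.filters = self._outside

        def __suspend__(self):
            # Yielding out of the block: stash our filters and restore
            # whatever the caller had in place.
            self._inside = warnings.filters
            warnings.filters = self._outside

        def __resume__(self):
            # Back inside the block: the world may have changed while we
            # were suspended, so re-save it and reinstall our filters.
            self._outside = warnings.filters
            warnings.filters = self._inside

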

Rationale
=========

In the abstract, we gave an example of plausible but incorrect code::

    def f():
        with warnings.catch_warnings():
            for x in g():
                yield x

To make this correct in current Python, we need to instead write
something like::

    def f():
        it = iter(g())
        while True:
            with warnings.catch_warnings():
                try:
                    x = next(it)
                except StopIteration:
                    break
            yield x

OTOH, if this PEP is accepted then the original code will become
correct as-is.  Or if this isn't convincing, then here's another
example of broken code; fixing it requires even greater gyrations, and
these are left as an exercise for the reader::

    def f2():
        with warnings.catch_warnings(record=True) as w:
            for x in g():
                yield x
        assert len(w) == 1
        assert "xyzzy" in str(w[0].message)

And notice that this last example isn't artificial at all -- if you
squint, it turns out to be exactly how you write a test that an
asyncio-using coroutine ``g`` correctly raises a warning.  Similar
issues arise for pretty much any use of ``warnings.catch_warnings``,
``decimal.localcontext``, or ``numpy.errstate`` in asyncio-using code.
So there's clearly a real problem to solve here, and the growing
prominence of async code makes it increasingly urgent.

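
To make the asyncio connection explicit, such a test might look
roughly as follows.  This is a sketch only, assuming ``g`` is a
coroutine expected to emit a single warning containing ``"xyzzy"``;
the ``await`` hides yields back to the event loop inside the ``with``
block, so today this test is unreliable in exactly the way described
above::

    import warnings

    async def test_g_warns():
        with warnings.catch_warnings(record=True) as w:
            warnings.simplefilter("always")
            # While we are parked here waiting for g() to finish, other
            # tasks may run, and their warnings land in (or our warnings
            # leak out of) this catch_warnings block.
            await g()
        assert len(w) == 1
        assert "xyzzy" in str(w[0].message)

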

Alternative approaches
----------------------

The main alternative that has been proposed is to create some kind of
"task-local storage", analogous to "thread-local storage"
[#yury-task-local-proposal]_.  In essence, the idea would be that the
event loop would take care to allocate a new "task namespace" for each
task it schedules, and provide an API to fetch, at any given time, the
namespace corresponding to the currently executing task (a rough
sketch of what such an API might look like is given after the list
below).  While there are many details to be worked out
[#task-local-challenges]_, the basic idea seems doable, and it is an
especially natural way to handle the kind of global context that
arises at the top-level of async application frameworks (e.g., setting
up context objects in a web framework).  But it also has a number of
flaws:

* It only solves the problem of managing global state for coroutines
  that ``yield`` back to an asynchronous event loop.  But there
  actually isn't anything about this problem that's specific to
  asyncio -- as shown in the examples above, simple generators run
  into exactly the same issue.

* It creates an unnecessary coupling between event loops and code that
  needs to manage global state.  Obviously an async web framework
  needs to interact with some event loop API anyway, so it's not a big
  deal in that case.  But it's weird that ``warnings`` or ``decimal``
  or NumPy should have to call into an async library's API to access
  their internal state when they themselves involve no async code.
  Worse, since there are multiple event loop APIs in common use, it
  isn't clear how to choose which to integrate with.  (This could be
  somewhat mitigated by CPython providing a standard API for creating
  and switching "task-local domains" that asyncio, Twisted, tornado,
  etc. could then work with.)

* It's not at all clear that this can be made acceptably fast.  NumPy
  has to check the floating point error settings on every single
  arithmetic operation.  Checking a piece of data in thread-local
  storage is absurdly quick, because modern platforms have put massive
  resources into optimizing this case (e.g. dedicating a CPU register
  for this purpose); calling a method on an event loop to fetch a
  handle to a namespace and then doing lookup in that namespace is
  much slower.

  More importantly, this extra cost would be paid on *every* access to
  the global data, even for programs which are not otherwise using an
  event loop at all.  This PEP's proposal, by contrast, only affects
  code that actually mixes ``with`` blocks and ``yield`` statements,
  meaning that the users who experience the costs are the same users
  who also reap the benefits.

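
For concreteness, here is the promised sketch of what such a
task-local storage API might look like.  Every name in it is
hypothetical; it exists only to make the comparison above easier to
picture::

    import asyncio
    import weakref

    _namespaces = weakref.WeakKeyDictionary()   # task -> per-task dict

    def get_task_namespace():
        # Purely illustrative: look up (or create) the namespace for the
        # currently running asyncio task (3.6-era spelling of the lookup).
        task = asyncio.Task.current_task()
        if task is None:
            raise RuntimeError("no task is currently running")
        return _namespaces.setdefault(task, {})

    # A library such as decimal would then have to do something like
    #
    #     context = get_task_namespace().get("decimal_context", default)
    #
    # on every operation, instead of a single cheap thread-local read.
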

On the other hand, such tight integration between task context and the
event loop does potentially allow other features that are beyond the
scope of the current proposal.  For example, an event loop could note
which task namespace was in effect when a task called ``call_soon``,
and arrange that the callback when run would have access to the same
task namespace.  Whether this is useful, or even well-defined in the
case of cross-thread calls (what does it mean to have task-local
storage accessed from two threads simultaneously?), is left as a
puzzle for event loop implementors to ponder -- nothing in this
proposal rules out such enhancements.  It does seem, though, that such
features would be useful primarily for state that already has a tight
integration with the event loop -- while we might want a request id to
be preserved across ``call_soon``, most people would not expect::

    with warnings.catch_warnings():
        loop.call_soon(f)

to result in ``f`` being run with warnings disabled, which would be
the result if ``call_soon`` preserved global context in general.



Backwards compatibility
=======================

Because ``__suspend__`` and ``__resume__`` are optional and default to
no-ops, all existing context managers continue to work exactly as
before.

Speed-wise, this proposal adds additional overhead when entering a
``with`` block (where we must now check for the additional methods;
failed attribute lookup in CPython is rather slow, since it involves
allocating an ``AttributeError``), and additional overhead at
suspension points.  Since the position of ``with`` blocks and
suspension points is known statically, the compiler can
straightforwardly optimize away this overhead in all cases except
where one actually has a ``yield`` inside a ``with``.



Interaction with PEP 492
========================

PEP 492 added new asynchronous context managers, which are like
regular context managers, but instead of having regular methods
``__enter__`` and ``__exit__`` they have coroutine methods
``__aenter__`` and ``__aexit__``.

There are a few options for how to handle these:

1) Add ``__asuspend__`` and ``__aresume__`` coroutine methods.

   One potential difficulty here is that this would add a complication
   to an already complicated part of the bytecode interpreter.
   Consider code like::

       async def f():
           async with MGR:
               await g()

       @types.coroutine
       def g():
           yield 1

   In 3.5, ``f`` gets desugared to something like::

       @types.coroutine
       def f():
           yield from MGR.__aenter__()
           try:
               yield from g()
           finally:
               yield from MGR.__aexit__()

   With the addition of ``__asuspend__`` / ``__aresume__``, the inner
   ``yield from g()`` would have to be replaced by something like::

       for SUBVALUE in g():
           yield from MGR.__asuspend__()
           yield SUBVALUE
           yield from MGR.__aresume__()

   Notice that we've had to introduce a new temporary ``SUBVALUE`` to
   hold the value yielded from ``g()`` while we yield from
   ``MGR.__asuspend__()``.  Where does this temporary go?  Currently
   ``yield from`` is a single bytecode that doesn't modify the stack
   while looping.  Also, the above code isn't even complete, because
   it skips over the issue of how to direct ``send``/``throw`` calls
   to the right place at the right time...

2) Add plain ``__suspend__`` and ``__resume__`` methods.

3) Leave async context managers alone for now, until we have more
   experience with them.

It isn't entirely clear what use cases even exist in which an async
context manager would need to set coroutine-local state (i.e., like
thread-local state, but for a coroutine stack instead of an OS
thread), and couldn't do so via coordination with the coroutine
runner.  So this draft tentatively goes with option (3) and punts on
this question until later.



References
==========

.. [#yury-task-local-proposal] https://groups.google.com/forum/#!topic/python-tulip/zix5HQxtElg
   https://github.com/python/asyncio/issues/165

.. [#task-local-challenges] For example, we would have to decide
   whether there is a single task-local namespace shared by all users
   (in which case we need a way for multiple third-party libraries to
   adjudicate access to this namespace), or else if there are multiple
   task-local namespaces, then we need some mechanism for each library
   to arrange for their task-local namespaces to be created and
   destroyed at appropriate moments.  The preliminary patch linked
   from the github issue above doesn't seem to provide any mechanism
   for such lifecycle management.



Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: