python-peps/pep-0521.txt

PEP: 521
Title: Managing global context via 'with' blocks in generators and coroutines
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs@pobox.com>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2015
Python-Version: 3.6
Post-History: 29-Apr-2015


Abstract
========

While we generally try to avoid global state when possible, there
nonetheless exist a number of situations where it is agreed to be the
best approach.  In Python, the standard way of handling such cases is
to store the global state in global or thread-local storage, and then
use ``with`` blocks to limit modifications of this global state to a
single dynamic scope. Examples where this pattern is used include the
standard library's ``warnings.catch_warnings`` and
``decimal.localcontext``, NumPy's ``numpy.errstate`` (which exposes
the error-handling settings provided by the IEEE 754 floating point
standard), and the handling of logging context or HTTP request context
in many server application frameworks.

However, there is currently no ergonomic way to manage such local
changes to global state when writing a generator or coroutine. For
example, this code::

  def f():
      with warnings.catch_warnings():
          for x in g():
              yield x

may or may not successfully catch warnings raised by ``g()``, and may
or may not inadverdantly swallow warnings triggered elsewhere in the
code.  The context manager, which was intended to apply only to ``f``
and its callees, ends up having a dynamic scope that encompasses
arbitrary and unpredictable parts of its call\ **ers**. This problem
becomes particularly acute when writing asynchronous code, where
essentially all functions become coroutines.

Here, we propose to solve this problem by notifying context managers
whenever execution is suspended or resumed within their scope,
allowing them to restrict their effects appropriately.


Specification
=============

Two new, optional, methods are added to the context manager protocol:
``__suspend__`` and ``__resume__``.  If present, these methods will be
called whenever a frame's execution is suspended or resumed from
within the context of the ``with`` block.

More formally, consider the following code::

  with EXPR as VAR:
      PARTIAL-BLOCK-1
      f((yield foo))
      PARTIAL-BLOCK-2

Currently this is equivalent to the following code copied from PEP 343::

  mgr = (EXPR)
  exit = type(mgr).__exit__  # Not calling it yet
  value = type(mgr).__enter__(mgr)
  exc = True
  try:
      try:
          VAR = value  # Only if "as VAR" is present
          PARTIAL-BLOCK-1
          f((yield foo))
          PARTIAL-BLOCK-2
      except:
          exc = False
          if not exit(mgr, *sys.exc_info()):
              raise
  finally:
      if exc:
          exit(mgr, None, None, None)

This PEP proposes to modify ``with`` block handling to instead become::

  mgr = (EXPR)
  exit = type(mgr).__exit__  # Not calling it yet
  ### --- NEW STUFF ---
  if the_block_contains_yield_points:  # known statically at compile time
      suspend = getattr(type(mgr), "__suspend__", lambda: None)
      resume = getattr(type(mgr), "__resume__", lambda: None)
  ### --- END OF NEW STUFF ---
  value = type(mgr).__enter__(mgr)
  exc = True
  try:
      try:
          VAR = value  # Only if "as VAR" is present
          PARTIAL-BLOCK-1
          ### --- NEW STUFF ---
          suspend()
          tmp = yield foo
          resume()
          f(tmp)
          ### --- END OF NEW STUFF ---
          PARTIAL-BLOCK-2
      except:
          exc = False
          if not exit(mgr, *sys.exc_info()):
              raise
  finally:
      if exc:
          exit(mgr, None, None, None)

Analogous suspend/resume calls are also wrapped around the ``yield``
points embedded inside the ``yield from``, ``await``, ``async with``,
and ``async for`` constructs.


Nested blocks
-------------

Given this code::

  def f():
      with OUTER:
          with INNER:
              yield VALUE

then we perform the following operations in the following sequence::

  INNER.__suspend__()
  OUTER.__suspend__()
  yield VALUE
  OUTER.__resume__()
  INNER.__resume__()

Note that this ensures that the following is a valid refactoring::

  def f():
      with OUTER:
          yield from g()

  def g():
      with INNER
          yield VALUE

Similarly, ``with`` statements with multiple context managers suspend
from right to left, and resume from left to right.


Other changes
-------------

``__suspend__`` and ``__resume__`` methods are added to
``warnings.catch_warnings`` and ``decimal.localcontext``.


Rationale
=========

In the abstract, we gave an example of plausible but incorrect code::

  def f():
      with warnings.catch_warnings():
          for x in g():
              yield x

To make this correct in current Python, we need to instead write
something like::

  def f():
      with warnings.catch_warnings():
          it = iter(g())
      while True:
          with warnings.catch_warnings():
              try:
                  x = next(it)
              except StopIteration:
                  break
          yield x

OTOH, if this PEP is accepted then the original code will become
correct as-is.  Or if this isn't convincing, then here's another
example of broken code; fixing it requires even greater gyrations, and
these are left as an exercise for the reader::

  def f2():
      with warnings.catch_warnings(record=True) as w:
          for x in g():
              yield x
      assert len(w) == 1
      assert "xyzzy" in w[0].message

And notice that this last example isn't artificial at all -- if you
squint, it turns out to be exactly how you write a test that an
asyncio-using coroutine ``g`` correctly raises a warning.  Similar
issues arise for pretty much any use of ``warnings.catch_warnings``,
``decimal.localcontext``, or ``numpy.errstate`` in asyncio-using code.
So there's clearly a real problem to solve here, and the growing
prominence of async code makes it increasingly urgent.


Alternative approaches
----------------------

The main alternative that has been proposed is to create some kind of
"task-local storage", analogous to "thread-local storage"
[#yury-task-local-proposal]_. In essence, the idea would be that the
event loop would take care to allocate a new "task namespace" for each
task it schedules, and provide an API to at any given time fetch the
namespace corresponding to the currently executing task.  While there
are many details to be worked out [#task-local-challenges]_, the basic
idea seems doable, and it is an especially natural way to handle the
kind of global context that arises at the top-level of async
application frameworks (e.g., setting up context objects in a web
framework).  But it also has a number of flaws:

* It only solves the problem of managing global state for coroutines
  that ``yield`` back to an asynchronous event loop.  But there
  actually isn't anything about this problem that's specific to
  asyncio -- as shown in the examples above, simple generators run
  into exactly the same issue.

* It creates an unnecessary coupling between event loops and code that
  needs to manage global state. Obviously an async web framework needs
  to interact with some event loop API anyway, so it's not a big deal
  in that case. But it's weird that ``warnings`` or ``decimal`` or
  NumPy should have to call into an async library's API to access
  their internal state when they themselves involve no async code.
  Worse, since there are multiple event loop APIs in common use, it
  isn't clear how to choose which to integrate with.  (This could be
  somewhat mitigated by CPython providing a standard API for creating
  and switching "task-local domains" that asyncio, Twisted, tornado,
  etc. could then work with.)

* It's not at all clear that this can be made acceptably fast.  NumPy
  has to check the floating point error settings on every single
  arithmetic operation.  Checking a piece of data in thread-local
  storage is absurdly quick, because modern platforms have put massive
  resources into optimizing this case (e.g. dedicating a CPU register
  for this purpose); calling a method on an event loop to fetch a
  handle to a namespace and then doing lookup in that namespace is
  much slower.

  More importantly, this extra cost would be paid on *every* access to
  the global data, even for programs which are not otherwise using an
  event loop at all.  This PEP's proposal, by contrast, only affects
  code that actually mixes ``with`` blocks and ``yield`` statements,
  meaning that the users who experience the costs are the same users
  who also reap the benefits.

On the other hand, such tight integration between task context and the
event loop does potentially allow other features that are beyond the
scope of the current proposal.  For example, an event loop could note
which task namespace was in effect when a task called ``call_soon``,
and arrange that the callback when run would have access to the same
task namespace.  Whether this is useful, or even well-defined in the
case of cross-thread calls (what does it mean to have task-local
storage accessed from two threads simultaneously?), is left as a
puzzle for event loop implementors to ponder -- nothing in this
proposal rules out such enhancements as well.  It does seem though
that such features would be useful primarily for state that already
has a tight integration with the event loop -- while we might want a
request id to be preserved across ``call_soon``, most people would not
expect::

  with warnings.catch_warnings():
      loop.call_soon(f)

to result in ``f`` being run with warnings disabled, which would be
the result if ``call_soon`` preserved global context in general.


Backwards compatibility
=======================

Because ``__suspend__`` and ``__resume__`` are optional and default to
no-ops, all existing context managers continue to work exactly as
before.

Speed-wise, this proposal adds additional overhead when entering a
``with`` block (where we must now check for the additional methods;
failed attribute lookup in CPython is rather slow, since it involves
allocating an ``AttributeError``), and additional overhead at
suspension points.  Since the position of ``with`` blocks and
suspension points is known statically, the compiler can
straightforwardly optimize away this overhead in all cases except
where one actually has a ``yield`` inside a ``with``.


Interaction with PEP 492
========================

PEP 492 added new asynchronous context managers, which are like
regular context managers but instead of having regular methods
``__enter__`` and ``__exit__`` they have coroutine methods
``__aenter__`` and ``__aexit__``.

There are a few options for how to handle these:

1) Add ``__asuspend__`` and ``__aresume__`` coroutine methods.

   One potential difficulty here is that this would add a complication
   to an already complicated part of the bytecode
   interpreter. Consider code like::

     async def f():
         async with MGR:
             await g()

     @types.coroutine
     def g():
          yield 1

   In 3.5, ``f`` gets desugared to something like::

     @types.coroutine
     def f():
         yield from MGR.__aenter__()
         try:
             yield from g()
         finally:
             yield from MGR.__aexit__()

   With the addition of ``__asuspend__`` / ``__aresume__``, the
   ``yield from`` would have to replaced by something like::

         for SUBVALUE in g():
             yield from MGR.__asuspend__()
             yield SUBVALUE
             yield from MGR.__aresume__()

   Notice that we've had to introduce a new temporary ``SUBVALUE`` to
   hold the value yielded from ``g()`` while we yield from
   ``MGR.__asuspend__()``. Where does this temporary go? Currently
   ``yield from`` is a single bytecode that doesn't modify the stack
   while looping. Also, the above code isn't even complete, because it
   skips over the issue of how to direct ``send``/``throw`` calls to
   the right place at the right time...

2) Add plain ``__suspend__`` and ``__resume__`` methods.

3) Leave async context managers alone for now until we have more
   experience with them.

It isn't entirely clear what use cases even exist in which an async
context manager would need to set coroutine-local-state (= like
thread-local-state, but for a coroutine stack instead of an OS
thread), and couldn't do so via coordination with the coroutine
runner. So this draft tentatively goes with option (3) and punts on
this question until later.


References
==========

.. [#yury-task-local-proposal] https://groups.google.com/forum/#!topic/python-tulip/zix5HQxtElg
   https://github.com/python/asyncio/issues/165

.. [#task-local-challenges] For example, we would have to decide
   whether there is a single task-local namespace shared by all users
   (in which case we need a way for multiple third-party libraries to
   adjudicate access to this namespace), or else if there are multiple
   task-local namespaces, then we need some mechanism for each library
   to arrange for their task-local namespaces to be created and
   destroyed at appropriate moments.  The preliminary patch linked
   from the github issue above doesn't seem to provide any mechanism
   for such lifecycle management.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: