521 lines
17 KiB
ReStructuredText
521 lines
17 KiB
ReStructuredText
PEP: 419
|
|
Title: Protecting cleanup statements from interruptions
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Paul Colomiets <paul@colomiets.name>
|
|
Status: Deferred
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 06-Apr-2012
|
|
Python-Version: 3.3
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes a way to protect Python code from being interrupted
|
|
inside a finally clause or during context manager cleanup.
|
|
|
|
PEP Deferral
|
|
============
|
|
|
|
Further exploration of the concepts covered in this PEP has been deferred
|
|
for lack of a current champion interested in promoting the goals of the PEP
|
|
and collecting and incorporating feedback, and with sufficient available
|
|
time to do so effectively.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Python has two nice ways to do cleanup. One is a ``finally``
|
|
statement and the other is a context manager (usually called using a
|
|
``with`` statement). However, neither is protected from interruption
|
|
by ``KeyboardInterrupt`` or ``GeneratorExit`` caused by
|
|
``generator.throw()``. For example::
|
|
|
|
lock.acquire()
|
|
try:
|
|
print('starting')
|
|
do_something()
|
|
finally:
|
|
print('finished')
|
|
lock.release()
|
|
|
|
If ``KeyboardInterrupt`` occurs just after the second ``print()``
|
|
call, the lock will not be released. Similarly, the following code
|
|
using the ``with`` statement is affected::
|
|
|
|
from threading import Lock
|
|
|
|
class MyLock:
|
|
|
|
def __init__(self):
|
|
self._lock_impl = Lock()
|
|
|
|
def __enter__(self):
|
|
self._lock_impl.acquire()
|
|
print("LOCKED")
|
|
|
|
def __exit__(self):
|
|
print("UNLOCKING")
|
|
self._lock_impl.release()
|
|
|
|
lock = MyLock()
|
|
with lock:
|
|
do_something
|
|
|
|
If ``KeyboardInterrupt`` occurs near any of the ``print()`` calls, the
|
|
lock will never be released.
|
|
|
|
|
|
Coroutine Use Case
|
|
------------------
|
|
|
|
A similar case occurs with coroutines. Usually coroutine libraries
|
|
want to interrupt the coroutine with a timeout. The
|
|
``generator.throw()`` method works for this use case, but there is no
|
|
way of knowing if the coroutine is currently suspended from inside a
|
|
``finally`` clause.
|
|
|
|
An example that uses yield-based coroutines follows. The code looks
|
|
similar using any of the popular coroutine libraries Monocle [1]_,
|
|
Bluelet [2]_, or Twisted [3]_. ::
|
|
|
|
def run_locked():
|
|
yield connection.sendall('LOCK')
|
|
try:
|
|
yield do_something()
|
|
yield do_something_else()
|
|
finally:
|
|
yield connection.sendall('UNLOCK')
|
|
|
|
with timeout(5):
|
|
yield run_locked()
|
|
|
|
In the example above, ``yield something`` means to pause executing the
|
|
current coroutine and to execute coroutine ``something`` until it
|
|
finishes execution. Therefore, the coroutine library itself needs to
|
|
maintain a stack of generators. The ``connection.sendall()`` call waits
|
|
until the socket is writable and does a similar thing to what
|
|
``socket.sendall()`` does.
|
|
|
|
The ``with`` statement ensures that all code is executed within 5
|
|
seconds timeout. It does so by registering a callback in the main
|
|
loop, which calls ``generator.throw()`` on the top-most frame in the
|
|
coroutine stack when a timeout happens.
|
|
|
|
The ``greenlets`` extension works in a similar way, except that it
|
|
doesn't need ``yield`` to enter a new stack frame. Otherwise
|
|
considerations are similar.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Frame Flag 'f_in_cleanup'
|
|
-------------------------
|
|
|
|
A new flag on the frame object is proposed. It is set to ``True`` if
|
|
this frame is currently executing a ``finally`` clause. Internally,
|
|
the flag must be implemented as a counter of nested finally statements
|
|
currently being executed.
|
|
|
|
The internal counter also needs to be incremented during execution of
|
|
the ``SETUP_WITH`` and ``WITH_CLEANUP`` bytecodes, and decremented
|
|
when execution for these bytecodes is finished. This allows to also
|
|
protect ``__enter__()`` and ``__exit__()`` methods.
|
|
|
|
|
|
Function 'sys.setcleanuphook'
|
|
-----------------------------
|
|
|
|
A new function for the ``sys`` module is proposed. This function sets
|
|
a callback which is executed every time ``f_in_cleanup`` becomes
|
|
false. Callbacks get a frame object as their sole argument, so that
|
|
they can figure out where they are called from.
|
|
|
|
The setting is thread local and must be stored in the
|
|
``PyThreadState`` structure.
|
|
|
|
|
|
Inspect Module Enhancements
|
|
---------------------------
|
|
|
|
Two new functions are proposed for the ``inspect`` module:
|
|
``isframeincleanup()`` and ``getcleanupframe()``.
|
|
|
|
``isframeincleanup()``, given a frame or generator object as its sole
|
|
argument, returns the value of the ``f_in_cleanup`` attribute of a
|
|
frame itself or of the ``gi_frame`` attribute of a generator.
|
|
|
|
``getcleanupframe()``, given a frame object as its sole argument,
|
|
returns the innermost frame which has a true value of
|
|
``f_in_cleanup``, or ``None`` if no frames in the stack have a nonzero
|
|
value for that attribute. It starts to inspect from the specified
|
|
frame and walks to outer frames using ``f_back`` pointers, just like
|
|
``getouterframes()`` does.
|
|
|
|
|
|
Example
|
|
=======
|
|
|
|
An example implementation of a SIGINT handler that interrupts safely
|
|
might look like::
|
|
|
|
import inspect, sys, functools
|
|
|
|
def sigint_handler(sig, frame):
|
|
if inspect.getcleanupframe(frame) is None:
|
|
raise KeyboardInterrupt()
|
|
sys.setcleanuphook(functools.partial(sigint_handler, 0))
|
|
|
|
A coroutine example is out of scope of this document, because its
|
|
implementation depends very much on a trampoline (or main loop) used
|
|
by coroutine library.
|
|
|
|
|
|
Unresolved Issues
|
|
=================
|
|
|
|
Interruption Inside With Statement Expression
|
|
---------------------------------------------
|
|
|
|
Given the statement ::
|
|
|
|
with open(filename):
|
|
do_something()
|
|
|
|
Python can be interrupted after ``open()`` is called, but before the
|
|
``SETUP_WITH`` bytecode is executed. There are two possible
|
|
decisions:
|
|
|
|
* Protect ``with`` expressions. This would require another bytecode,
|
|
since currently there is no way of recognizing the start of the
|
|
``with`` expression.
|
|
|
|
* Let the user write a wrapper if he considers it important for the
|
|
use-case. A safe wrapper might look like this::
|
|
|
|
class FileWrapper(object):
|
|
|
|
def __init__(self, filename, mode):
|
|
self.filename = filename
|
|
self.mode = mode
|
|
|
|
def __enter__(self):
|
|
self.file = open(self.filename, self.mode)
|
|
|
|
def __exit__(self):
|
|
self.file.close()
|
|
|
|
Alternatively it can be written using the ``contextmanager()``
|
|
decorator::
|
|
|
|
@contextmanager
|
|
def open_wrapper(filename, mode):
|
|
file = open(filename, mode)
|
|
try:
|
|
yield file
|
|
finally:
|
|
file.close()
|
|
|
|
This code is safe, as the first part of the generator (before yield)
|
|
is executed inside the ``SETUP_WITH`` bytecode of the caller.
|
|
|
|
|
|
Exception Propagation
|
|
---------------------
|
|
|
|
Sometimes a ``finally`` clause or an ``__enter__()``/``__exit__()``
|
|
method can raise an exception. Usually this is not a problem, since
|
|
more important exceptions like ``KeyboardInterrupt`` or ``SystemExit``
|
|
should be raised instead. But it may be nice to be able to keep the
|
|
original exception inside a ``__context__`` attribute. So the cleanup
|
|
hook signature may grow an exception argument::
|
|
|
|
def sigint_handler(sig, frame)
|
|
if inspect.getcleanupframe(frame) is None:
|
|
raise KeyboardInterrupt()
|
|
sys.setcleanuphook(retry_sigint)
|
|
|
|
def retry_sigint(frame, exception=None):
|
|
if inspect.getcleanupframe(frame) is None:
|
|
raise KeyboardInterrupt() from exception
|
|
|
|
.. note::
|
|
|
|
There is no need to have three arguments like in the ``__exit__``
|
|
method since there is a ``__traceback__`` attribute in exception in
|
|
Python 3.
|
|
|
|
However, this will set the ``__cause__`` for the exception, which is
|
|
not exactly what's intended. So some hidden interpreter logic may be
|
|
used to put a ``__context__`` attribute on every exception raised in a
|
|
cleanup hook.
|
|
|
|
|
|
Interruption Between Acquiring Resource and Try Block
|
|
-----------------------------------------------------
|
|
|
|
The example from the first section is not totally safe. Let's take a
|
|
closer look::
|
|
|
|
lock.acquire()
|
|
try:
|
|
do_something()
|
|
finally:
|
|
lock.release()
|
|
|
|
The problem might occur if the code is interrupted just after
|
|
``lock.acquire()`` is executed but before the ``try`` block is
|
|
entered.
|
|
|
|
There is no way the code can be fixed unmodified. The actual fix
|
|
depends very much on the use case. Usually code can be fixed using a
|
|
``with`` statement::
|
|
|
|
with lock:
|
|
do_something()
|
|
|
|
However, for coroutines one usually can't use the ``with`` statement
|
|
because you need to ``yield`` for both the acquire and release
|
|
operations. So the code might be rewritten like this::
|
|
|
|
try:
|
|
yield lock.acquire()
|
|
do_something()
|
|
finally:
|
|
yield lock.release()
|
|
|
|
The actual locking code might need more code to support this use case,
|
|
but the implementation is usually trivial, like this: check if the
|
|
lock has been acquired and unlock if it is.
|
|
|
|
|
|
Handling EINTR Inside a Finally
|
|
-------------------------------
|
|
|
|
Even if a signal handler is prepared to check the ``f_in_cleanup``
|
|
flag, ``InterruptedError`` might be raised in the cleanup handler,
|
|
because the respective system call returned an ``EINTR`` error. The
|
|
primary use cases are prepared to handle this:
|
|
|
|
* Posix mutexes never return ``EINTR``
|
|
|
|
* Networking libraries are always prepared to handle ``EINTR``
|
|
|
|
* Coroutine libraries are usually interrupted with the ``throw()``
|
|
method, not with a signal
|
|
|
|
The platform-specific function ``siginterrupt()`` might be used to
|
|
remove the need to handle ``EINTR``. However, it may have hardly
|
|
predictable consequences, for example ``SIGINT`` a handler is never
|
|
called if the main thread is stuck inside an IO routine.
|
|
|
|
A better approach would be to have the code, which is usually used in
|
|
cleanup handlers, be prepared to handle ``InterruptedError``
|
|
explicitly. An example of such code might be a file-based lock
|
|
implementation.
|
|
|
|
``signal.pthread_sigmask`` can be used to block signals inside
|
|
cleanup handlers which can be interrupted with ``EINTR``.
|
|
|
|
|
|
Setting Interruption Context Inside Finally Itself
|
|
--------------------------------------------------
|
|
|
|
Some coroutine libraries may need to set a timeout for the finally
|
|
clause itself. For example::
|
|
|
|
try:
|
|
do_something()
|
|
finally:
|
|
with timeout(0.5):
|
|
try:
|
|
yield do_slow_cleanup()
|
|
finally:
|
|
yield do_fast_cleanup()
|
|
|
|
With current semantics, timeout will either protect the whole ``with``
|
|
block or nothing at all, depending on the implementation of each
|
|
library. What the author intended is to treat ``do_slow_cleanup`` as
|
|
ordinary code, and ``do_fast_cleanup`` as a cleanup (a
|
|
non-interruptible one).
|
|
|
|
A similar case might occur when using greenlets or tasklets.
|
|
|
|
This case can be fixed by exposing ``f_in_cleanup`` as a counter, and
|
|
by calling a cleanup hook on each decrement. A coroutine library may
|
|
then remember the value at timeout start, and compare it on each hook
|
|
execution.
|
|
|
|
But in practice, the example is considered to be too obscure to take
|
|
into account.
|
|
|
|
|
|
Modifying KeyboardInterrupt
|
|
---------------------------
|
|
|
|
It should be decided if the default ``SIGINT`` handler should be
|
|
modified to use the described mechanism. The initial proposition is
|
|
to keep old behavior, for two reasons:
|
|
|
|
* Most application do not care about cleanup on exit (either they do
|
|
not have external state, or they modify it in crash-safe way).
|
|
|
|
* Cleanup may take too much time, not giving user a chance to
|
|
interrupt an application.
|
|
|
|
The latter case can be fixed by allowing an unsafe break if a
|
|
``SIGINT`` handler is called twice, but it seems not worth the
|
|
complexity.
|
|
|
|
|
|
Alternative Python Implementations Support
|
|
==========================================
|
|
|
|
We consider ``f_in_cleanup`` an implementation detail. The actual
|
|
implementation may have some fake frame-like object passed to signal
|
|
handler, cleanup hook and returned from ``getcleanupframe()``. The
|
|
only requirement is that the ``inspect`` module functions work as
|
|
expected on these objects. For this reason, we also allow to pass a
|
|
generator object to the ``isframeincleanup()`` function, which removes
|
|
the need to use the ``gi_frame`` attribute.
|
|
|
|
It might be necessary to specify that ``getcleanupframe()`` must
|
|
return the same object that will be passed to cleanup hook at the next
|
|
invocation.
|
|
|
|
|
|
Alternative Names
|
|
=================
|
|
|
|
The original proposal had a ``f_in_finally`` frame attribute, as the
|
|
original intention was to protect ``finally`` clauses. But as it grew
|
|
up to protecting ``__enter__`` and ``__exit__`` methods too, the
|
|
``f_in_cleanup`` name seems better. Although the ``__enter__`` method
|
|
is not a cleanup routine, it at least relates to cleanup done by
|
|
context managers.
|
|
|
|
``setcleanuphook``, ``isframeincleanup`` and ``getcleanupframe`` can
|
|
be unobscured to ``set_cleanup_hook``, ``is_frame_in_cleanup`` and
|
|
``get_cleanup_frame``, although they follow the naming convention of
|
|
their respective modules.
|
|
|
|
|
|
Alternative Proposals
|
|
=====================
|
|
|
|
Propagating 'f_in_cleanup' Flag Automatically
|
|
---------------------------------------------
|
|
|
|
This can make ``getcleanupframe()`` unnecessary. But for yield-based
|
|
coroutines you need to propagate it yourself. Making it writable
|
|
leads to somewhat unpredictable behavior of ``setcleanuphook()``.
|
|
|
|
|
|
Add Bytecodes 'INCR_CLEANUP', 'DECR_CLEANUP'
|
|
--------------------------------------------
|
|
|
|
These bytecodes can be used to protect the expression inside the
|
|
``with`` statement, as well as making counter increments more explicit
|
|
and easy to debug (visible inside a disassembly). Some middle ground
|
|
might be chosen, like ``END_FINALLY`` and ``SETUP_WITH`` implicitly
|
|
decrementing the counter (``END_FINALLY`` is present at end of every
|
|
``with`` suite).
|
|
|
|
However, adding new bytecodes must be considered very carefully.
|
|
|
|
|
|
Expose 'f_in_cleanup' as a Counter
|
|
----------------------------------
|
|
|
|
The original intention was to expose a minimum of needed
|
|
functionality. However, as we consider the frame flag
|
|
``f_in_cleanup`` an implementation detail, we may expose it as a
|
|
counter.
|
|
|
|
Similarly, if we have a counter we may need to have the cleanup hook
|
|
called on every counter decrement. It's unlikely to have much
|
|
performance impact as nested finally clauses are an uncommon case.
|
|
|
|
|
|
Add code object flag 'CO_CLEANUP'
|
|
---------------------------------
|
|
|
|
As an alternative to set the flag inside the ``SETUP_WITH`` and
|
|
``WITH_CLEANUP`` bytecodes, we can introduce a flag ``CO_CLEANUP``.
|
|
When the interpreter starts to execute code with ``CO_CLEANUP`` set,
|
|
it sets ``f_in_cleanup`` for the whole function body. This flag is
|
|
set for code objects of ``__enter__`` and ``__exit__`` special
|
|
methods. Technically it might be set on functions called
|
|
``__enter__`` and ``__exit__``.
|
|
|
|
This seems to be less clear solution. It also covers the case where
|
|
``__enter__`` and ``__exit__`` are called manually. This may be
|
|
accepted either as a feature or as an unnecessary side-effect (or,
|
|
though unlikely, as a bug).
|
|
|
|
It may also impose a problem when ``__enter__`` or ``__exit__``
|
|
functions are implemented in C, as there is no code object to check
|
|
for the ``f_in_cleanup`` flag.
|
|
|
|
|
|
Have Cleanup Callback on Frame Object Itself
|
|
--------------------------------------------
|
|
|
|
The frame object may be extended to have a ``f_cleanup_callback``
|
|
member which is called when ``f_in_cleanup`` is reset to 0. This
|
|
would help to register different callbacks to different coroutines.
|
|
|
|
Despite its apparent beauty, this solution doesn't add anything, as
|
|
the two primary use cases are:
|
|
|
|
* Setting the callback in a signal handler. The callback is
|
|
inherently a single one for this case.
|
|
|
|
* Use a single callback per loop for the coroutine use case. Here, in
|
|
almost all cases, there is only one loop per thread.
|
|
|
|
|
|
No Cleanup Hook
|
|
---------------
|
|
|
|
The original proposal included no cleanup hook specification, as there
|
|
are a few ways to achieve the same using current tools:
|
|
|
|
* Using ``sys.settrace()`` and the ``f_trace`` callback. This may
|
|
impose some problem to debugging, and has a big performance impact
|
|
(although interrupting doesn't happen very often).
|
|
|
|
* Sleeping a bit more and trying again. For a coroutine library this
|
|
is easy. For signals it may be achieved using ``signal.alert``.
|
|
|
|
Both methods are considered too impractical and a way to catch exit
|
|
from ``finally`` clauses is proposed.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] Monocle
|
|
https://github.com/saucelabs/monocle
|
|
|
|
.. [2] Bluelet
|
|
https://github.com/sampsyo/bluelet
|
|
|
|
.. [3] Twisted: inlineCallbacks
|
|
https://twisted.org/documents/8.1.0/api/twisted.internet.defer.html
|
|
|
|
[4] Original discussion
|
|
\ https://mail.python.org/pipermail/python-ideas/2012-April/014705.html
|
|
|
|
[5] Implementation of PEP 419
|
|
\ https://github.com/python/cpython/issues/58935
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|