PEP: 707
Title: A simplified signature for __exit__ and __aexit__
Author: Irit Katriel <iritkatriel@gmail.com>
Discussions-To: https://discuss.python.org/t/24402
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Feb-2023
Python-Version: 3.12
Post-History: `02-Mar-2023 <https://discuss.python.org/t/24402/>`__,
Resolution:
Abstract
========
This PEP proposes to make the interpreter accept context managers whose
:meth:`~py3.11:object.__exit__` / :meth:`~py3.11:object.__aexit__` method
takes only a single exception instance,
while continuing to also support the current ``(typ, exc, tb)`` signature
for backwards compatibility.
This proposal is part of an ongoing effort to remove the redundancy of
the 3-item exception representation from the language, a relic of earlier
Python versions which now confuses language users while adding complexity
and overhead to the interpreter.
The proposed implementation uses introspection that is tailored to the
requirements of this use case. The solution ensures the safety of the new
feature by supporting it only in non-ambiguous cases. In particular, any
signature that *could* accept three arguments is assumed to expect them.
Because reliable introspection of callables is not currently possible in
Python, the solution proposed here is limited in that only the common types
of single-arg callables will be identified as such, while some of the more
esoteric ones will continue to be called with three arguments. This
imperfect solution was chosen among several imperfect alternatives in the
spirit of practicality. It is my hope that the discussion about this PEP
will explore the other options and lead us to the best way forward, which
may well be to remain with our imperfect status quo.
Motivation
==========
In the past, an exception was represented in many parts of Python by a
tuple of three elements: the type of the exception, its value, and its
traceback. While there were good reasons for this design at the time,
they no longer hold because the type and traceback can now be reliably
deduced from the exception instance. Over the last few years we saw
several efforts to simplify the representation of exceptions.
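As an illustration of this redundancy, the following sketch rebuilds the
legacy triplet from an exception instance alone and compares it with
``sys.exc_info()``. The helper name ``triplet_from`` is hypothetical and is
not part of any API.

```python
import sys


def triplet_from(exc):
    """Rebuild the legacy (typ, exc, tb) triplet (hypothetical helper)."""
    if exc is None:
        return (None, None, None)
    return (type(exc), exc, exc.__traceback__)


try:
    1 / 0
except ZeroDivisionError as err:
    legacy = sys.exc_info()      # the 3-tuple representation
    rebuilt = triplet_from(err)  # derived from the instance alone
```

Both tuples hold the same three objects, which is why the instance on its
own carries all of the information.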
Since 3.10 (see `CPython issue #70577 <https://github.com/python/cpython/issues/70577>`_),
the :mod:`py3.11:traceback` module's functions accept either a 3-tuple
as described above, or just an exception instance as a single argument.
Internally, the interpreter no longer represents exceptions as a triplet.
The triplet was `removed for the handled exception in 3.11
<https://github.com/python/cpython/pull/30122>`_ and
`for the raised exception in 3.12
<https://github.com/python/cpython/pull/101607>`_. As a consequence,
several APIs that expose the triplet can now be replaced by
simpler alternatives:

.. list-table::
   :header-rows: 1
   :widths: auto

   * -
     - Legacy API
     - Alternative
   * - Get handled exception (Python)
     - :func:`py3.12:sys.exc_info`
     - :func:`py3.12:sys.exception`
   * - Get handled exception (C)
     - :external+py3.12:c:func:`PyErr_GetExcInfo`
     - :external+py3.12:c:func:`PyErr_GetHandledException`
   * - Set handled exception (C)
     - :external+py3.12:c:func:`PyErr_SetExcInfo`
     - :external+py3.12:c:func:`PyErr_SetHandledException`
   * - Get raised exception (C)
     - :external+py3.12:c:func:`PyErr_Fetch`
     - :external+py3.12:c:func:`PyErr_GetRaisedException`
   * - Set raised exception (C)
     - :external+py3.12:c:func:`PyErr_Restore`
     - :external+py3.12:c:func:`PyErr_SetRaisedException`
   * - Construct an exception instance from the 3-tuple (C)
     - :external+py3.12:c:func:`PyErr_NormalizeException`
     - N/A

The current proposal is a step in this process, and considers the way
forward for one more case in which the 3-tuple representation has
leaked to the language. The motivation for all this work is twofold.
Simplify the implementation of the language
-------------------------------------------
The simplification gained by reducing the interpreter's internal
representation of the handled exception to a single object was significant.
Previously, the interpreter needed to push onto/pop
from the stack three items whenever it did anything with exceptions.
This increased stack depth (adding pressure on caches and registers) and
complicated some of the bytecodes. Reducing this to one item
`removed about 100 lines of code <https://github.com/python/cpython/pull/30122>`_
from ``ceval.c`` (the interpreter's eval loop implementation), and it was later
followed by the removal of the ``POP_EXCEPT_AND_RERAISE`` opcode which has
become simple enough to be `replaced by generic stack manipulation instructions
<https://github.com/python/cpython/issues/90360>`_. Micro-benchmarks showed
`a speedup of about 10% for catching and raising an exception, as well as
for creating generators
<https://github.com/faster-cpython/ideas/issues/106#issuecomment-990172363>`_.
To summarize, removing this redundancy in Python's internals simplified the
interpreter and made it faster.
The performance of invoking ``__exit__``/``__aexit__`` when leaving
a context manager can also be improved by replacing a multi-arg function
call with a single-arg one. Micro-benchmarks showed that entering and exiting
a context manager with single-arg ``__exit__`` is about 13% faster.
Simplify the language itself
----------------------------
One of the reasons for the popularity of Python is its simplicity. The
:func:`py3.11:sys.exc_info` triplet is cryptic for new learners,
and the redundancy in it is confusing for those who do understand it.
It will take multiple releases to get to a point where we can think of
deprecating ``sys.exc_info()``. However, we can relatively quickly reach a
stage where new learners do not need to know about it, or about the 3-tuple
representation, at least until they are maintaining legacy code.
Rationale
=========
The only reason to object today to the removal of the last remaining
appearances of the 3-tuple from the language is concern about the
disruption that such changes can bring. The goal of this PEP is to propose
a safe, gradual and minimally disruptive way to make this change in the
case of ``__exit__``, and with this to initiate a discussion of our options
for evolving its method signature.
In the case of the :mod:`py3.11:traceback` module's API, evolving the
functions to have a hybrid signature is relatively straightforward and
safe. The functions take one positional and two optional arguments, and
interpret them according to their types. This is safe when sentinels
are used for default values. The signatures of callbacks, which are
defined by the user's program, are harder to evolve.
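The hybrid-signature approach described above can be sketched roughly as
follows. This is an illustration only, not the ``traceback`` module's actual
code; the names ``_MISSING`` and ``parse_exc_args`` are hypothetical.

```python
_MISSING = object()  # hypothetical sentinel: "not passed", as distinct from None


def parse_exc_args(exc, /, value=_MISSING, tb=_MISSING):
    """Return (value, tb) for either calling convention (illustrative only)."""
    if (value is _MISSING) != (tb is _MISSING):
        raise ValueError("pass either one argument or all three")
    if value is _MISSING:
        # Single-arg form: the first argument is an exception instance or None.
        if exc is None:
            return None, None
        if isinstance(exc, BaseException):
            return exc, exc.__traceback__
        raise TypeError(f"expected an exception instance or None, got {exc!r}")
    # Legacy form: (typ, value, tb); typ is redundant and ignored here.
    return value, tb
```

A sentinel is used instead of ``None`` as the default because ``None`` is a
legitimate value in the legacy form, for example ``(None, None, None)`` when
no exception was raised.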
The safest option is to make the user explicitly indicate which signature
the callback is expecting, by marking it with an additional attribute or
giving it a different name. For example, we could make the interpreter
look for a ``__leave__`` method on the context manager, and call it with
a single arg if it exists (otherwise, it looks for ``__exit__`` and
continues as it does now). The introspection-based alternative proposed
here intends to make it more convenient for users to write new code,
because they can just use the single-arg version and remain unaware of
the legacy API. However, if the limitations of introspection are found
to be too severe, we should consider an explicit option. Having both
``__exit__`` and ``__leave__`` around for 5-10 years with similar
functionality is not ideal, but it is an option.
Let us now examine the limitations of the current proposal. It identifies
2-arg Python functions and ``METH_O`` C functions as having a single-arg
signature, and assumes that anything else is expecting 3 args. Obviously
it is possible to create false negatives for this heuristic (single-arg
callables that it will not identify). Context managers written in this
way won't work: they will continue to fail as they do now, because their
``__exit__`` function will be called with three arguments.
I believe that it will not be a problem in practice. First, all working
code will continue to work, so this is a limitation on new code rather
than a problem impacting existing code. Second, exotic callable types are
rarely used for ``__exit__`` and if one is needed, it can always be wrapped
by a plain vanilla method that delegates to the callable. For example, we
can rewrite this::

    class C:
        __enter__ = lambda self: self
        __exit__ = ExoticCallable()

as follows::

    class CM:
        __enter__ = lambda self: self
        _exit = ExoticCallable()
        __exit__ = lambda self, exc: CM._exit(exc)

While discussing the real-world impact of the problem in this PEP, it is
worth noting that most ``__exit__`` functions don't do anything with their
arguments. Typically, a context manager is implemented to ensure that some
cleanup actions take place upon exit. It is rarely appropriate for the
``__exit__`` function to handle exceptions raised within the context, and
they are typically allowed to propagate out of ``__exit__`` to the calling
function. This means that most ``__exit__`` functions do not access their
arguments at all, and we should take this into account when trying to
assess the impact of different solutions on Python's userbase.
Specification
=============
A context manager's ``__exit__``/``__aexit__`` method can have a single-arg
signature, in which case it is invoked by the interpreter with the argument
equal to an exception instance or ``None``:

.. code-block::

    >>> class C:
    ...     def __enter__(self):
    ...         return self
    ...     def __exit__(self, exc):
    ...         print(f'__exit__ called with: {exc!r}')
    ...
    >>> with C():
    ...     pass
    ...
    __exit__ called with: None
    >>> with C():
    ...     1/0
    ...
    __exit__ called with: ZeroDivisionError('division by zero')
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    ZeroDivisionError: division by zero

If ``__exit__``/``__aexit__`` has any other signature, it is invoked with
the 3-tuple ``(typ, exc, tb)`` as happens now:

.. code-block::

    >>> class C:
    ...     def __enter__(self):
    ...         return self
    ...     def __exit__(self, *exc):
    ...         print(f'__exit__ called with: {exc!r}')
    ...
    >>> with C():
    ...     pass
    ...
    __exit__ called with: (None, None, None)
    >>> with C():
    ...     1/0
    ...
    __exit__ called with: (<class 'ZeroDivisionError'>, ZeroDivisionError('division by zero'), <traceback object at 0x1039cb570>)
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    ZeroDivisionError: division by zero

These ``__exit__`` methods will also be called with a 3-tuple:

.. code-block::

    def __exit__(self, typ, *exc):
        pass

    def __exit__(self, typ, exc, tb):
        pass

A reference implementation is provided in
`CPython PR #101995 <https://github.com/python/cpython/pull/101995>`_.
When the interpreter reaches the end of the scope of a context manager,
and it is about to call the relevant ``__exit__`` or ``__aexit__`` function,
it introspects this function to determine whether it is the single-arg
or the legacy 3-arg version. In the draft PR, this introspection is performed
by the ``is_legacy___exit__`` function:

.. code-block:: c

    static int is_legacy___exit__(PyObject *exit_func) {
        if (PyMethod_Check(exit_func)) {
            PyObject *func = PyMethod_GET_FUNCTION(exit_func);
            if (PyFunction_Check(func)) {
                PyCodeObject *code = (PyCodeObject*)PyFunction_GetCode(func);
                if (code->co_argcount == 2 && !(code->co_flags & CO_VARARGS)) {
                    /* Python method that expects self + one more arg */
                    return false;
                }
            }
        }
        else if (PyCFunction_Check(exit_func)) {
            if (PyCFunction_GET_FLAGS(exit_func) == METH_O) {
                /* C function declared as single-arg */
                return false;
            }
        }
        return true;
    }

It is important to note that this is not a generic introspection function, but
rather one which is specifically designed for our use case. We know that
``exit_func`` is an attribute of the context manager class (taken from the
type of the object that provided ``__enter__``), and it is typically a function.
Furthermore, for this to be useful we need to identify enough single-arg forms,
but not necessarily all of them. What is critical for backwards compatibility is
that we will never misidentify a legacy ``exit_func`` as a single-arg one. So,
for example, ``__exit__(self, *args)`` and ``__exit__(self, exc_type, *args)``
both have the legacy form, even though they *could* be invoked with one arg.
In summary, an ``exit_func`` will be invoked with a single arg if:

* It is a ``PyMethod`` whose function has ``co_argcount`` equal to ``2`` (to count ``self``) and no vararg, or
* it is a ``PyCFunction`` with the ``METH_O`` flag.

Note that any performance cost of the introspection can be mitigated via
:pep:`specialization <659>`, so it won't be a problem if we need to make it more
sophisticated than this for some reason.
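For readers who prefer Python to C, the rule summarized above can be
approximated as follows. This is a sketch, not the reference implementation:
the function name ``is_legacy_exit`` is hypothetical, it assumes that
``exit_func`` is the bound method looked up on an instance, and it omits the
``METH_O`` branch because that flag is not exposed to Python code.

```python
import inspect
import types


def is_legacy_exit(exit_func):
    """Approximate the C check for Python-level methods (illustrative only)."""
    if isinstance(exit_func, types.MethodType):
        func = exit_func.__func__
        if isinstance(func, types.FunctionType):
            code = func.__code__
            # co_argcount counts self but excludes *args and keyword-only args.
            if code.co_argcount == 2 and not (code.co_flags & inspect.CO_VARARGS):
                return False  # self + exactly one more positional arg
    return True  # anything else is treated as the legacy 3-arg form


class Single:
    def __exit__(self, exc): ...


class Legacy:
    def __exit__(self, typ, exc, tb): ...


class Star:
    def __exit__(self, *args): ...


class TypStar:
    def __exit__(self, typ, *rest): ...
```

Under these assumptions only ``Single().__exit__`` is classified as
single-arg. ``Star`` and ``TypStar`` *could* accept one argument, but they
are treated as legacy because they could also accept three.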
Backwards Compatibility
=======================
All context managers that previously worked will continue to work in the
same way because the interpreter will call them with three args whenever
they can accept three args. There may be context managers that previously
did not work because their ``exit_func`` expected one argument, so the call
to ``__exit__`` would have caused a ``TypeError`` exception to be raised,
and now the call would succeed. This could theoretically change the
behaviour of existing code, but it is unlikely to be a problem in practice.
The backwards compatibility concerns will show up in some cases when libraries
try to migrate their context managers from the multi-arg to the single-arg
signature. If ``__exit__`` or ``__aexit__`` is called by any code other than
the interpreter's eval loop, the introspection does not automatically happen.
For example, this will occur where a context manager is subclassed and its
``__exit__`` method is called directly from the derived ``__exit__``. Such
context managers will need to migrate to the single-arg version with their
users, and may choose to offer a parallel API rather than breaking the
existing one. Alternatively, a superclass can stay with the signature
``__exit__(self, *args)``, and support both one and three args. Since
most context managers do not use the value of the arguments to ``__exit__``,
and simply allow the exception to propagate onward, this is likely to be the
common approach.
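A minimal sketch of the ``*args`` approach, with hypothetical class names.
Note that ``SingleArgChild`` would only be usable in a ``with`` statement
under this proposal; on current interpreters its ``__exit__`` can only be
called directly with one argument.

```python
class Resource:
    """Base class whose __exit__ accepts either calling convention."""

    def __init__(self):
        self.closed = False
        self.last_exc = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        # One arg: (exc,). Three args: (typ, exc, tb).
        self.last_exc = args[0] if len(args) == 1 else args[1]
        self.closed = True
        return False  # let any exception propagate


class LegacyChild(Resource):
    def __exit__(self, typ, exc, tb):
        return super().__exit__(typ, exc, tb)


class SingleArgChild(Resource):
    def __exit__(self, exc):
        return super().__exit__(exc)
```

Both subclasses can delegate to the superclass without either side needing
to know which signature the other uses.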
Security Implications
=====================
I am not aware of any.
How to Teach This
=================
The language tutorial will present the single-arg version, and the documentation
for context managers will include a section on the legacy signatures of
``__exit__`` and ``__aexit__``.
Reference Implementation
========================
`CPython PR #101995 <https://github.com/python/cpython/pull/101995>`_
implements the proposal of this PEP.
Rejected Ideas
==============
Support ``__leave__(self, exc)``
----------------------------------
One option considered was to support a method with a new name, such as ``__leave__``,
with the new signature. This basically makes the programmer explicitly declare
which signature they are intending to use, and avoids the need for introspection.
Different variations of this idea include different amounts of magic that can
help automate the equivalence between ``__leave__`` and ``__exit__``. For example,
`Mark Shannon suggested <https://github.com/faster-cpython/ideas/issues/550#issuecomment-1410120100>`_
that the type constructor would add a default implementation for each of ``__exit__``
and ``__leave__`` whenever one of them is defined on a class. This default
implementation acts as a trampoline that calls the user's function. This would
make inheritance work seamlessly, as well as the migration from ``__exit__`` to
``__leave__`` for particular classes. The interpreter would just need to call
``__leave__``, and that would call ``__exit__`` whenever necessary.
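The trampoline idea can be emulated in pure Python to get a feel for it.
The sketch below is an assumption-laden illustration, not the suggested
implementation: it uses a hypothetical mixin with ``__init_subclass__``
instead of the type constructor, and it only looks at methods defined
directly on each class.

```python
class LeaveExitTrampolines:
    """Hypothetical mixin: define one of __exit__/__leave__, get the other."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        has_exit = "__exit__" in cls.__dict__
        has_leave = "__leave__" in cls.__dict__
        if has_leave and not has_exit:
            def __exit__(self, typ, exc, tb):
                # Legacy callers reach the new-style method.
                return self.__leave__(exc)
            cls.__exit__ = __exit__
        elif has_exit and not has_leave:
            def __leave__(self, exc):
                # New-style callers reach the legacy method.
                typ = type(exc) if exc is not None else None
                tb = exc.__traceback__ if exc is not None else None
                return self.__exit__(typ, exc, tb)
            cls.__leave__ = __leave__


class NewStyle(LeaveExitTrampolines):
    def __init__(self):
        self.seen = []

    def __enter__(self):
        return self

    def __leave__(self, exc):
        self.seen.append(exc)
        return isinstance(exc, KeyError)  # suppress KeyError only


class OldStyle(LeaveExitTrampolines):
    def __init__(self):
        self.seen = []

    def __enter__(self):
        return self

    def __exit__(self, typ, exc, tb):
        self.seen.append(typ)
        return False
```

With this in place, today's ``with`` statement can drive ``NewStyle`` through
the generated ``__exit__``, and code that calls ``__leave__`` directly works
with ``OldStyle`` as well.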
While this suggestion has several advantages over the current proposal, it has
two drawbacks. The first is that it adds a new dunder name to the data model,
and we would end up with two dunders that mean the same thing, and only slightly
differ in their signatures. The second is that it would require the migration of
every ``__exit__`` to ``__leave__``, while with introspection it would not be
necessary to change the many ``__exit__(*arg)`` methods that do not access their
args. While it is not as simple as a grep for ``__exit__``, it is possible to write
an AST visitor that detects ``__exit__`` methods that can accept multiple arguments,
and which do access them.
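Such a visitor might look roughly like the sketch below. The names
``ExitArgUsageFinder`` and ``find_exit_methods_using_args`` are hypothetical,
and the check is deliberately simple: it reports ``__exit__``/``__aexit__``
definitions that can take more than one argument after ``self`` and that read
any of those arguments by name.

```python
import ast


class ExitArgUsageFinder(ast.NodeVisitor):
    """Collect line numbers of multi-arg __exit__ methods that use their args."""

    def __init__(self):
        self.hits = []

    def _check(self, node):
        if node.name in ("__exit__", "__aexit__"):
            args = node.args
            positional = args.posonlyargs + args.args
            names = {a.arg for a in positional[1:]}  # skip self
            if args.vararg is not None:
                names.add(args.vararg.arg)
            multi = len(positional) > 2 or args.vararg is not None
            used = any(
                isinstance(n, ast.Name)
                and isinstance(n.ctx, ast.Load)
                and n.id in names
                for stmt in node.body
                for n in ast.walk(stmt)
            )
            if multi and used:
                self.hits.append(node.lineno)
        self.generic_visit(node)

    visit_FunctionDef = _check
    visit_AsyncFunctionDef = _check


def find_exit_methods_using_args(source):
    finder = ExitArgUsageFinder()
    finder.visit(ast.parse(source))
    return finder.hits


SAMPLE = '''
class A:
    def __exit__(self, *args):
        self.close()

class B:
    def __exit__(self, typ, exc, tb):
        if typ is KeyError:
            return True

class C:
    def __exit__(self, exc):
        print(exc)
'''

hits = find_exit_methods_using_args(SAMPLE)
```

In this sample only the method in class ``B`` is reported: ``A`` ignores its
arguments and ``C`` already has the single-arg signature.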
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.