From 23e1d474ad64a50ed771e47802391b28e4c382c1 Mon Sep 17 00:00:00 2001 From: Irit Katriel <1055913+iritkatriel@users.noreply.github.com> Date: Thu, 2 Mar 2023 10:28:07 +0000 Subject: [PATCH] PEP 707: A simplified signature for __exit__ and __aexit__ (#3018) --- .github/CODEOWNERS | 1 + conf.py | 1 + pep-0707.rst | 393 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 395 insertions(+) create mode 100644 pep-0707.rst diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index d39689cc1..f96695da9 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -587,6 +587,7 @@ pep-0703.rst @ambv pep-0704.rst @brettcannon @pradyunsg # pep-0705.rst pep-0706.rst @encukou +pep-0707.rst @iritkatriel pep-0708.rst @dstufft pep-0709.rst @carljm # ... diff --git a/conf.py b/conf.py index 599f210f1..9fa30be5d 100644 --- a/conf.py +++ b/conf.py @@ -50,6 +50,7 @@ intersphinx_mapping = { 'python': ('https://docs.python.org/3/', None), 'packaging': ('https://packaging.python.org/en/latest/', None), 'py3.11': ('https://docs.python.org/3.11/', None), + 'py3.12': ('https://docs.python.org/3.12/', None), } intersphinx_disabled_reftypes = [] diff --git a/pep-0707.rst b/pep-0707.rst new file mode 100644 index 000000000..bf78f904b --- /dev/null +++ b/pep-0707.rst @@ -0,0 +1,393 @@ +PEP: 707 +Title: A simplified signature for __exit__ and __aexit__ +Author: Irit Katriel +Discussions-To: +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 18-Feb-2023 +Python-Version: 3.12 +Post-History: +Resolution: + + +Abstract +======== + +This PEP proposes to make the interpreter accept context managers whose +:meth:`~py3.11:object.__exit__` / :meth:`~py3.11:object.__aexit__` method +takes only a single exception instance, +while continuing to also support the current ``(typ, exc, tb)`` signature +for backwards compatibility. + +This proposal is part of an ongoing effort to remove the redundancy of +the 3-item exception representation from the language, a relic of earlier +Python versions which now confuses language users while adding complexity +and overhead to the interpreter. + +The proposed implementation uses introspection, which is tailored to the +requirements of this use case. The solution ensures the safety of the new +feature by supporting it only in non-ambiguous cases. In particular, any +signature that *could* accept three arguments is assumed to expect them. + +Because reliable introspection of callables is not currently possible in +Python, the solution proposed here is limited in that only the common types +of single-arg callables will be identified as such, while some of the more +esoteric ones will continue to be called with three arguments. This +imperfect solution was chosen among several imperfect alternatives in the +spirit of practicality. It is my hope that the discussion about this PEP +will explore the other options and lead us to the best way forward, which +may well be to remain with our imperfect status quo. + + +Motivation +========== + +In the past, an exception was represented in many parts of Python by a +tuple of three elements: the type of the exception, its value, and its +traceback. While there were good reasons for this design at the time, +they no longer hold because the type and traceback can now be reliably +deduced from the exception instance. Over the last few years we saw +several efforts to simplify the representation of exceptions. + +Since 3.10 in `CPython PR #70577 `_, +the :mod:`py3.11:traceback` module's functions accept either a 3-tuple +as described above, or just an exception instance as a single argument. + +Internally, the interpreter no longer represents exceptions as a triplet. +This was `removed for the handled exception in 3.11 +`_ and +`for the raised exception in 3.12 +`_. As a consequence, +several APIs that expose the triplet were deprecated in favour of +simpler alternatives: + +.. list-table:: + :header-rows: 1 + :widths: auto + + * - + - Deprecated + - Alternative + * - Get handled exception (Python) + - :func:`py3.12:sys.exc_info` + - :func:`py3.12:sys.exception` + * - Get handled exception (C) + - :external+py3.12:c:func:`PyErr_GetExcInfo` + - :external+py3.12:c:func:`PyErr_GetHandledException` + * - Set handled exception (C) + - :external+py3.12:c:func:`PyErr_SetExcInfo` + - :external+py3.12:c:func:`PyErr_SetHandledException` + * - Get raised exception (C) + - :external+py3.12:c:func:`PyErr_Fetch` + - :external+py3.12:c:func:`PyErr_GetRaisedException` + * - Set raised exception (C) + - :external+py3.12:c:func:`PyErr_Restore` + - :external+py3.12:c:func:`PyErr_SetRaisedException` + * - Construct an exception instance from the 3-tuple (C) + - :external+py3.12:c:func:`PyErr_NormalizeException` + - N/A + + +The current proposal is a step in this process, and considers the way +forward for one more case in which the 3-tuple representation has +leaked to the language. The motivation for all this work is twofold. + +Simplify the implementation of the language +------------------------------------------- + +The simplification gained by reducing the interpreter's internal +representation of the handled exception to a single object was significant. +Previously, the interpreter needed to push onto/pop +from the stack three items whenever it did anything with exceptions. +This increased stack depth (adding pressure on caches and registers) and +complicated some of the bytecodes. Reducing this to one item +`removed about 100 lines of code `_ +from ``ceval.c`` (the interpreter's eval loop implementation), and it was later +followed by the removal of the ``POP_EXCEPT_AND_RERAISE`` opcode which has +become simple enough to be `replaced by generic stack manipulation instructions +`_. Micro-benchmarks showed +`a speedup of about 10% for catching and raising an exception, as well as +for creating generators +`_. +To summarize, removing this redundancy in Python's internals simplified the +interpreter and made it faster. + +The performance of invoking ``__exit__``/``__aexit__`` when leaving +a context manager can be also improved by replacing a multi-arg function +call with a single-arg one. Micro-benchmarks showed that entering and exiting +a context manager with single-arg ``__exit__`` is about 13% faster. + +Simplify the language itself +---------------------------- + +One of the reasons for the popularity of Python is its simplicity. The +:func:`py3.11:sys.exc_info` triplet is cryptic for new learners, +and the redundancy in it is confusing for those who do understand it. + +It will take multiple releases to get to a point where we can think of +deprecating ``sys.exc_info()``. However, we can relatively quickly reach a +stage where new learners do not need to know about it, or about the 3-tuple +representation, at least until they are maintaining legacy code. + +Rationale +========= + +The only reason to object today to the removal of the last remaining +appearances of the 3-tuple from the language is the concerns about +disruption that such changes can bring. The goal of this PEP is to propose +a safe, gradual and minimally disruptive way to make this change in the +case of ``__exit__``, and with this to initiate a discussion of our options +for evolving its method signature. + +In the case of the :mod:`py3.11:traceback` module's API, evolving the +functions to have a hybrid signature is relatively straightforward and +safe. The functions take one positional and two optional arguments, and +interpret them according to their types. This is safe when sentinels +are used for default values. The signatures of callbacks, which are +defined by the user's program, are harder to evolve. + +The safest option is to make the user explicitly indicate which signature +the callback is expecting, by marking it with an additional attribute or +giving it a different name. For example, we could make the interpreter +look for a ``__leave__`` method on the context manager, and call it with +a single arg if it exists (otherwise, it looks for ``__exit__`` and +continues as it does now). The introspection-based alternative proposed +here intends to make it more convenient for users to write new code, +because they can just use the single-arg version and remain unaware of +the legacy API. However, if the limitations of introspection are found +to be too severe, we should consider an explicit option. Having both +``__exit__`` and ``__leave__`` around for 5-10 years with similar +functionality is not ideal, but it is an option. + +Let us now examine the limitations of the current proposal. It identifies +2-arg python functions and ``METH_O`` C functions as having a single-arg +signature, and assumes that anything else is expecting 3 args. Obviously +it is possible to create false negatives for this heuristic (single-arg +callables that it will not identify). Context managers written in this +way won't work, they will continue to fail as they do now when their +``__exit__`` function will be called with three arguments. + +I believe that it will not be a problem in practice. First, all working +code will continue to work, so this is a limitation on new code rather +than a problem impacting existing code. Second, exotic callable types are +rarely used for ``__exit__`` and if one is needed, it can always be wrapped +by a plain vanilla method that delegates to the callable. For example, we +can write this:: + + class C: + __enter__ = lambda self: self + __exit__ = ExoticCallable() + +as follows:: + + class CM: + __enter__ = lambda self: self + _exit = ExoticCallable() + __exit__ = lambda self, exc: CM._exit(exc) + +While discussing the real-world impact of the problem in this PEP, it is +worth noting that most ``__exit__`` functions don't do anything with their +arguments. Typically, a context manager is implemented to ensure that some +cleanup actions take place upon exit. It is rarely appropriate for the +``__exit__`` function to handle exceptions raised within the context, and +they are typically allowed to propagate out of ``__exit__`` to the calling +function. This means that most ``__exit__`` functions do not access their +arguments at all, and we should take this into account when trying to +assess the impact of different solutions on Python's userbase. + + +Specification +============= + +A context manager's ``__exit__``/``__aexit__`` method can have a single-arg +signature, in which case it is invoked by the interpreter with the argument +equal to an exception instance or ``None``: + +.. code-block:: + + >>> class C: + ... def __enter__(self): + ... return self + ... def __exit__(self, exc): + ... print(f'__exit__ called with: {exc!r}') + ... + >>> with C(): + ... pass + ... + __exit__ called with: None + >>> with C(): + ... 1/0 + ... + __exit__ called with: ZeroDivisionError('division by zero') + Traceback (most recent call last): + File "", line 2, in + ZeroDivisionError: division by zero + +If ``__exit__``/``__aexit__`` has any other signature, it is invoked with +the 3-tuple ``(typ, exc, tb)`` as happens now: + +.. code-block:: + + >>> class C: + ... def __enter__(self): + ... return self + ... def __exit__(self, *exc): + ... print(f'__exit__ called with: {exc!r}') + ... + >>> with C(): + ... pass + ... + __exit__ called with: (None, None, None) + >>> with C(): + ... 1/0 + ... + __exit__ called with: (, ZeroDivisionError('division by zero'), ) + Traceback (most recent call last): + File "", line 2, in + ZeroDivisionError: division by zero + + +These ``__exit__`` methods will also be called with a 3-tuple: + +.. code-block:: + + def __exit__(self, typ, *exc): + pass + + def __exit__(self, typ, exc, tb): + pass + +A reference implementation is provided in +`CPython PR #101995 `_. + +When the interpreter reaches the end of the scope of a context manager, +and it is about to call the relevant ``__exit__`` or ``__aexit__`` function, +it instrospects this function to determine whether it is the single-arg +or the legacy 3-arg version. In the draft PR, this introspection is performed +by the ``is_legacy___exit__`` function: + +.. code-block:: c + + static int is_legacy___exit__(PyObject *exit_func) { + if (PyMethod_Check(exit_func)) { + PyObject *func = PyMethod_GET_FUNCTION(exit_func); + if (PyFunction_Check(func)) { + PyCodeObject *code = (PyCodeObject*)PyFunction_GetCode(func); + if (code->co_argcount == 2 && !(code->co_flags & CO_VARARGS)) { + /* Python method that expects self + one more arg */ + return false; + } + } + } + else if (PyCFunction_Check(exit_func)) { + if (PyCFunction_GET_FLAGS(exit_func) == METH_O) { + /* C function declared as single-arg */ + return false; + } + } + return true; + } + +It is important to note that this is not a generic introspection function, but +rather one which is specifically designed for our use case. We know that +``exit_func`` is an attribute of the context manager class (taken from the +type of the object that provided ``__enter__``), and it is typically a function. +Furthermore, for this to be useful we need to identify enough single-arg forms, +but not necessarily all of them. What is critical for backwards compatibility is +that we will never misidentify a legacy ``exit_func`` as a single-arg one. So, +for example, ``__exit__(self, *args)`` and ``__exit__(self, exc_type, *args)`` +both have the legacy form, even though they *could* be invoked with one arg. + +In summary, an ``exit_func`` will be invoke with a single arg if: + +* It is a ``PyMethod`` with ``argcount`` ``2`` (to count ``self``) and no vararg, or +* it is a ``PyCFunction`` with the ``METH_O`` flag. + +Note that any performance cost of the introspection can be mitigated via +:pep:`specialization <564>`, so it won't be a problem if we need to make it more +sophisticated than this for some reason. + + +Backwards Compatibility +======================= + +All context managers that previously worked will continue to work in the +same way because the interpreter will call them with three args whenever +they can accept three args. There may be context managers that previously +did not work because their ``exit_func`` expected one argument, so the call +to ``__exit__`` would have caused a ``TypeError`` exception to be raised, +and now the call would succeed. This could theoretically change the +behaviour of existing code, but it is unlikely to be a problem in practice. + +The backwards compatibility concerns will show up in some cases when libraries +try to migrate their context managers from the multi-arg to the single-arg +signature. If ``__exit__`` or ``__aexit__`` is called by any code other than +the interpreter's eval loop, the introspection does not automatically happen. +For example, this will occur where a context manager is subclassed and its +``__exit__`` method is called directly from the derived ``__exit__``. Such +context managers will need to migrate to the single-arg version with their +users, and may choose to offer a parallel API rather than breaking the +existing one. Alternatively, a superclass can stay with the signature +``__exit__(self, *args)``, and support both one and three args. Since +most context managers do not use the value of the arguments to ``__exit__``, +and simply allow the exception to propagate onward, this is likely to be the +common approach. + + +Security Implications +===================== + +I am not aware of any. + +How to Teach This +================= + +The language tutorial will present the single-arg version, and the documentation +for context managers will include a section on the legacy signatures of +``__exit__`` and ``__aexit__``. + + +Reference Implementation +======================== + +`CPython PR #101995 `_ +implements the proposal of this PEP. + + +Rejected Ideas +============== + +Support ``__leave__(self, exc)`` +---------------------------------- + +It was considered to support a method by a new name, such as ``__leave__``, +with the new signature. This basically makes the programmer explicitly declare +which signature they are intending to use, and avoid the need for introspection. + +Different variations of this idea include different amounts of magic that can +help automate the equivalence between ``__leave__`` and ``__exit__``. For example, +`Mark Shannon suggested `_ +that the type constructor would add a default implementation for each of ``__exit__`` +and ``__leave__`` whenever one of them is defined on a class. This default +implementation acts as a trampoline that calls the user's function. This would +make inheritance work seamlessly, as well as the migration from ``__exit__`` to +``__leave__`` for particular classes. The interpreter would just need to call +``__leave__``, and that would call ``__exit__`` whenever necessary. + +While this suggestion has several advantages over the current proposal, it has +two drawbacks. The first is that it adds a new dunder name to the data model, +and we would end up with two dunders that mean the same thing, and only slightly +differ in their signatures. The second is that it would require the migration of +every ``__exit__`` to ``__leave__``, while with introspection it would not be +necessary to change the many ``__exit__(*arg)`` methods that do not access their +args. While it is not as simple as a grep for ``__exit__``, it is possible to write +an AST visitor that detects ``__exit__`` methods that can accept multiple arguments, +and which do access them. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.