PEP 709: Inlined comprehensions (#3029)
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM> Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
parent 6d182da522
commit cf5741c181
@@ -588,6 +588,7 @@ pep-0704.rst @brettcannon @pradyunsg
# pep-0705.rst
pep-0706.rst @encukou
pep-0708.rst @dstufft
pep-0709.rst @carljm
# ...
# pep-0754.txt
# ...

@@ -0,0 +1,320 @@
PEP: 709
Title: Inlined comprehensions
Author: Carl Meyer <carl@oddbird.net>
Sponsor: Guido van Rossum <guido@python.org>
Discussions-To: https://discuss.python.org/t/pep-709-inlined-comprehensions/24240
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Feb-2023
Python-Version: 3.12
Post-History: `25-Feb-2023 <https://discuss.python.org/t/pep-709-inlined-comprehensions/24240>`__


Abstract
========

Comprehensions are currently compiled as nested functions, which provides
isolation of the comprehension's iteration variable, but is inefficient at
runtime. This PEP proposes to inline list, dictionary, and set comprehensions
into the function where they are defined, and provide the expected isolation by
pushing/popping clashing locals on the stack. This change makes comprehensions
much faster: up to 2x faster for a microbenchmark of a comprehension alone,
translating to an 11% speedup for one sample benchmark derived from real-world
code that makes heavy use of comprehensions in the context of doing actual
work.


Motivation
==========

Comprehensions are a popular and widely-used feature of the Python language.
The nested-function compilation of comprehensions optimizes for compiler
simplicity at the expense of performance of user code. It is possible to
provide near-identical semantics (see `Backwards Compatibility`_) with much
better runtime performance for all users of comprehensions, with only a small
increase in compiler complexity.


Rationale
=========

Inlining is a common compiler optimization in many languages. Generalized
inlining of function calls at compile time in Python is near-impossible, since
call targets may be patched at runtime. Comprehensions are a special case,
where we have a call target known statically in the compiler that can neither
be patched (barring undocumented and unsupported fiddling with bytecode
directly) nor escape.

Inlining also permits other compiler optimizations of bytecode to be more
effective, because they can now "see through" the comprehension bytecode,
instead of it being an opaque call.

Normally a performance improvement would not require a PEP. In this case, the
simplest and most efficient implementation results in some user-visible effects,
so this is not just a performance improvement, it is a (small) change to the
language.


Specification
=============

Given a simple comprehension::

    def f(lst):
        return [x for x in lst]

The compiler currently emits the following bytecode for the function ``f``:

.. code-block:: text

    1           0 RESUME                   0

    2           2 LOAD_CONST               1 (<code object <listcomp> at 0x...)
                4 MAKE_FUNCTION            0
                6 LOAD_FAST                0 (lst)
                8 GET_ITER
               10 CALL                     0
               20 RETURN_VALUE

    Disassembly of <code object <listcomp> at 0x...>:
    2           0 RESUME                   0
                2 BUILD_LIST               0
                4 LOAD_FAST                0 (.0)
          >>    6 FOR_ITER                 4 (to 18)
               10 STORE_FAST               1 (x)
               12 LOAD_FAST                1 (x)
               14 LIST_APPEND              2
               16 JUMP_BACKWARD            6 (to 6)
          >>   18 END_FOR
               20 RETURN_VALUE

The bytecode for the comprehension is in a separate code object. Each time
``f()`` is called, a new single-use function object is allocated (by
``MAKE_FUNCTION``), called (allocating and then destroying a new frame on the
Python stack), and then immediately thrown away.
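
The separate code object can be observed from Python itself. The following is
a minimal sketch for illustration, not part of the reference implementation;
the helper name ``nested_code_objects`` is hypothetical, and its result
depends on how the running interpreter compiles comprehensions.

```python
import types


def f(lst):
    return [x for x in lst]


def nested_code_objects(func):
    """Return the names of code objects stored among func's constants.

    Hypothetical helper, for illustration only.
    """
    return [
        const.co_name
        for const in func.__code__.co_consts
        if isinstance(const, types.CodeType)
    ]


# With nested-function compilation the list contains '<listcomp>';
# with inlining there is no separate code object to find.
print(nested_code_objects(f))
```

An empty list here indicates that the comprehension's bytecode lives directly
in the code object of ``f``.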

Under this PEP, the compiler will emit the following bytecode for ``f()``
instead:

.. code-block:: text

    1           0 RESUME                   0

    2           2 LOAD_FAST                0 (lst)
                4 GET_ITER
                6 LOAD_FAST_AND_CLEAR      1 (x)
                8 SWAP                     2
               10 BUILD_LIST               0
               12 SWAP                     2
          >>   14 FOR_ITER                 4 (to 26)
               18 STORE_FAST               1 (x)
               20 LOAD_FAST                1 (x)
               22 LIST_APPEND              2
               24 JUMP_BACKWARD            6 (to 14)
          >>   26 END_FOR
               28 SWAP                     2
               30 STORE_FAST               1 (x)
               32 RETURN_VALUE

There is no longer a separate code object, nor creation of a single-use function
object, nor any need to create and destroy a Python frame.

Isolation of the ``x`` iteration variable is achieved by the combination of the
new ``LOAD_FAST_AND_CLEAR`` opcode at offset ``6``, which saves any outer value
of ``x`` on the stack before running the comprehension, and the ``STORE_FAST``
at offset ``30``, which restores the outer value of ``x`` (if any) after
running the comprehension.
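
The isolation guarantee is the same before and after this change, so it can be
checked with ordinary code. A small sketch (the function body is illustrative
only):

```python
def f(lst):
    x = "outer"
    items = [x for x in lst]
    # The comprehension's ``x`` must not leak into the enclosing function.
    return x, items


print(f([1, 2]))  # ('outer', [1, 2])
```

The outer ``x`` keeps its value regardless of whether the comprehension runs
in its own frame or is inlined.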

If the comprehension accesses variables from the outer scope, inlining avoids
the need to place these variables in a cell, allowing the comprehension (and all
other code in the outer function) to access them as normal fast locals instead.
This provides further performance gains.
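
Whether a variable was placed in a cell is visible through the code object's
``co_cellvars`` attribute. The sketch below uses a hypothetical function
``scale``; the printed tuple depends on how the running interpreter compiles
the comprehension.

```python
def scale(lst, factor):
    # ``factor`` is read inside the comprehension but owned by ``scale``.
    return [x * factor for x in lst]


# Names that the compiler placed in cells for this function.
# An empty tuple means ``factor`` is accessed as a plain fast local.
print(scale.__code__.co_cellvars)
print(scale([1, 2, 3], 10))  # [10, 20, 30]
```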

Only comprehensions occurring inside functions, where fast-locals
(``LOAD_FAST``/``STORE_FAST``) are used, will be inlined. Module-level
comprehensions will continue to create and call a function.

Generator expressions are currently never inlined in the reference
implementation of this PEP. In the future, some generator expressions may be
inlined, where the returned generator object does not leak.


Backwards Compatibility
=======================

Comprehension inlining will cause the following visible behavior changes. No
changes in the standard library or test suite were necessary to adapt to these
changes in the implementation, suggesting the impact in user code is likely to
be minimal.

Specialized tools depending on undocumented details of compiler bytecode output
may of course be affected in ways beyond the below, but these tools already must
adapt to bytecode changes in each Python version.

locals() includes outer variables
---------------------------------

Calling ``locals()`` within a comprehension will include all locals of the
function containing the comprehension. For example, given the following
function::

    def f(lst):
        return [locals() for x in lst]

Calling ``f([1])`` in current Python will return::

    [{'.0': <list_iterator object at 0x7f8d37170460>, 'x': 1}]

where ``.0`` is an internal implementation detail: the synthetic sole argument
to the comprehension "function".

Under this PEP, it will instead return::

    [{'lst': [1], 'x': 1}]

This now includes the outer ``lst`` variable as a local, and eliminates the
synthetic ``.0``.
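
Code that needs to work under both behaviors can rely only on the iteration
variable being present. A minimal sketch of such a check (the variable name
``snapshot`` is illustrative):

```python
def f(lst):
    return [locals() for x in lst]


snapshot = f([1])[0]
# 'x' is present either way; the other keys depend on how the
# comprehension was compiled ('.0' when nested, 'lst' when inlined).
print(sorted(snapshot))
```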

No comprehension frame in tracebacks
------------------------------------

Under this PEP, a comprehension will no longer have its own dedicated frame in
a stack trace. For example, given this function::

    def g():
        raise RuntimeError("boom")

    def f():
        return [g() for x in [1]]

Currently, calling ``f()`` results in the following traceback:

.. code-block:: text

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in f
      File "<stdin>", line 5, in <listcomp>
      File "<stdin>", line 2, in g
    RuntimeError: boom

Note the dedicated frame for ``<listcomp>``.

Under this PEP, the traceback looks like this instead:

.. code-block:: text

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in f
      File "<stdin>", line 2, in g
    RuntimeError: boom

There is no longer an extra frame for the list comprehension. The frame for the
``f`` function has the correct line number for the comprehension, however, so
this simply makes the traceback more compact without losing any useful
information.
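
The frames in a traceback can also be inspected programmatically with the
standard ``traceback`` module. The sketch below reuses ``f`` and ``g`` from
the example; the wrapper ``frame_names`` is a hypothetical helper added for
illustration.

```python
import traceback


def g():
    raise RuntimeError("boom")


def f():
    return [g() for x in [1]]


def frame_names():
    """Return the function names recorded in the traceback of f()."""
    try:
        f()
    except RuntimeError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


# The list starts with 'frame_names' and 'f' and ends with 'g'; a
# '<listcomp>' entry appears in between only when the comprehension
# runs in its own frame.
print(frame_names())
```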

It is theoretically possible that code using warnings with the ``stacklevel``
argument could observe a behavior change due to the frame stack change. In
practice, however, this seems unlikely. It would require a warning raised in
library code that is always called through a comprehension in that same
library, where the warning is using a ``stacklevel`` of 3+ to bypass the
comprehension and its containing function and point to a calling frame outside
the library. In such a scenario it would usually be simpler and more reliable
to raise the warning closer to the calling code and bypass fewer frames.
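
To make the scenario concrete, the sketch below shows which frame sits two
levels above a helper that is called through a comprehension. It uses the
CPython-specific ``sys._getframe`` as a stand-in for the frame that
``warnings.warn(..., stacklevel=3)`` would attribute a warning to; that
correspondence is an assumption made for illustration, and all function names
are hypothetical.

```python
import sys


def helper():
    # Frame 0 is helper itself; frame 2 is two callers up, roughly the
    # frame a warning issued here with stacklevel=3 would point at.
    return sys._getframe(2).f_code.co_name


def library_entry():
    # The helper is always reached through a comprehension.
    return [helper() for _ in [0]]


def user_code():
    return library_entry()[0]


# 'library_entry' when the comprehension has its own frame,
# 'user_code' when the comprehension is inlined.
print(user_code())
```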


UnboundLocalError instead of NameError
--------------------------------------

Although the value of the comprehension iteration variable is saved and
restored to provide isolation, it still becomes a local variable of the outer
function under this PEP. This implies a small behavior change in a function
where the comprehension iteration variable is accessed outside the
comprehension without ever being set outside the comprehension::

    def f(lst):
        items = [x for x in lst]
        return x

Under this PEP, calling ``f()`` will raise ``UnboundLocalError``, where
currently it raises ``NameError``. ``UnboundLocalError`` is a subclass of
``NameError``, so this should not impact code catching ``NameError``.
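
A handler written against ``NameError`` therefore keeps working under both
behaviors. A minimal sketch (the wrapper ``classify`` is a hypothetical
helper, and the example assumes no global named ``x`` exists):

```python
def f(lst):
    items = [x for x in lst]
    return x


def classify():
    """Return the name of the exception class raised by f([1])."""
    try:
        f([1])
    except NameError as exc:  # also catches UnboundLocalError
        return type(exc).__name__
    return None


print(classify())
```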


How to Teach This
=================

It is not intuitively obvious that comprehension syntax will or should result
in creation and call of a nested function. For new users not already accustomed
to the prior behavior, I suspect the new behavior in this PEP will be more
intuitive and require less explanation. ("Why is there a ``<listcomp>`` line in
my traceback when I didn't define any such function? What is this ``.0``
variable I see in ``locals()``?")


Security Implications
=====================

None known.


Reference Implementation
========================

This PEP has a reference implementation in the form of `a PR against the CPython
main branch <https://github.com/python/cpython/pull/101441>`_ which passes all
tests.

The reference implementation performs the micro-benchmark ``./python -m pyperf
timeit -s 'l = [1]' '[x for x in l]'`` 1.96x faster than the ``main`` branch (in
a build compiled with ``--enable-optimizations``).
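
Where ``pyperf`` is not installed, a rough local measurement of the same
statement can be taken with the standard library's ``timeit`` module. This is
only a sketch: it is less rigorous than ``pyperf``, and the repeat and loop
counts below are arbitrary choices, not values from this PEP.

```python
import timeit

# Same setup and statement as the pyperf command line above.
# timeit compiles the statement inside a function, so the
# comprehension is measured in a function scope.
loops = 100_000
timings = timeit.repeat("[x for x in l]", setup="l = [1]", repeat=5, number=loops)

best_ns = min(timings) / loops * 1e9
print(f"best of 5: {best_ns:.1f} ns per loop")
```

Comparing the figure between two interpreter builds gives an approximate
speedup ratio; absolute numbers vary by machine.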

The reference implementation performs the ``comprehensions`` benchmark in the
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite
(which is not a micro-benchmark of comprehensions alone, but tests
real-world-derived code doing realistic work using comprehensions) 11% faster
than the ``main`` branch (again in optimized builds). Other benchmarks in
pyperformance (none of which use comprehensions heavily) don't show any impact
outside the noise.

The implementation has no impact on non-comprehension code.


Rejected Ideas
==============

More efficient comprehension calling, without inlining
------------------------------------------------------

An `alternate approach <https://github.com/python/cpython/pull/101310>`_
introduces a new opcode for "calling" a comprehension in streamlined fashion
without the need to create a throwaway function object, but still creating a new
Python frame. This avoids all of the visible effects listed under
`Backwards Compatibility`_, and provides roughly half of the performance benefit
(1.5x improvement on the micro-benchmark, 4% improvement on the
``comprehensions`` benchmark in pyperformance). It also requires adding a new
pointer to the ``_PyInterpreterFrame`` struct and a new ``Py_INCREF`` on each
frame construction, meaning (unlike this PEP) it has a (very small) performance
cost for all code. It also provides less scope for future optimizations.

This PEP takes the position that full inlining offers sufficient additional
performance to more than justify the behavior changes.

Inlining module-level comprehensions
------------------------------------

Module-level comprehensions are generally called only once (when the module is
imported), so optimizing their performance is low priority. Inlining them would
require separate code paths in the compiler to handle a module global namespace
dictionary instead of fast-locals. It would be difficult or impossible to avoid
breaking semantics, since the comprehension iteration variable itself would be
a module global which might be referenced inside other functions that in turn
could be called within the comprehension.
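
The semantics at stake can be shown with a short module-level example. The
sketch below runs the module source through ``exec`` so that it behaves as
module-level code regardless of where the snippet is pasted; the names
``show``, ``source``, and ``namespace`` are illustrative.

```python
source = '''
x = "global"

def show():
    # Reads the module global ``x``.
    return x

result = [show() for x in range(2)]
'''

namespace = {}
exec(source, namespace)

# The comprehension's ``x`` is isolated from the module global, so
# show() keeps seeing "global". Storing the iteration variable in the
# module globals would make show() observe 0 and 1 instead.
print(namespace["result"])  # ['global', 'global']
print(namespace["x"])       # global
```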


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.