PEP 709: Inlined comprehensions (#3029)
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
parent 6d182da522
commit cf5741c181
@@ -588,6 +588,7 @@ pep-0704.rst @brettcannon @pradyunsg
 # pep-0705.rst
 pep-0706.rst @encukou
 pep-0708.rst @dstufft
+pep-0709.rst @carljm
 # ...
 # pep-0754.txt
 # ...
@@ -0,0 +1,320 @@
PEP: 709
Title: Inlined comprehensions
Author: Carl Meyer <carl@oddbird.net>
Sponsor: Guido van Rossum <guido@python.org>
Discussions-To: https://discuss.python.org/t/pep-709-inlined-comprehensions/24240
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Feb-2023
Python-Version: 3.12
Post-History: `25-Feb-2023 <https://discuss.python.org/t/pep-709-inlined-comprehensions/24240>`__


Abstract
========

Comprehensions are currently compiled as nested functions, which provides
isolation of the comprehension's iteration variable, but is inefficient at
runtime. This PEP proposes to inline list, dictionary, and set comprehensions
into the function where they are defined, and provide the expected isolation by
pushing/popping clashing locals on the stack. This change makes comprehensions
much faster: up to 2x faster for a microbenchmark of a comprehension alone,
translating to an 11% speedup for one sample benchmark derived from real-world
code that makes heavy use of comprehensions in the context of doing actual
work.


Motivation
==========

Comprehensions are a popular and widely used feature of the Python language.
The nested-function compilation of comprehensions optimizes for compiler
simplicity at the expense of performance of user code. It is possible to
provide near-identical semantics (see `Backwards Compatibility`_) with much
better runtime performance for all users of comprehensions, with only a small
increase in compiler complexity.


Rationale
=========

Inlining is a common compiler optimization in many languages. Generalized
inlining of function calls at compile time in Python is near-impossible, since
call targets may be patched at runtime. Comprehensions are a special case,
where we have a call target known statically in the compiler that can neither
be patched (barring undocumented and unsupported fiddling with bytecode
directly) nor escape.

Inlining also permits other compiler optimizations of bytecode to be more
effective, because they can now "see through" the comprehension bytecode,
instead of it being an opaque call.

Normally a performance improvement would not require a PEP. In this case, the
simplest and most efficient implementation results in some user-visible effects,
so this is not just a performance improvement; it is also a (small) change to
the language.


Specification
=============

Given a simple comprehension::

    def f(lst):
        return [x for x in lst]

The compiler currently emits the following bytecode for the function ``f``:

.. code-block:: text

    1           0 RESUME                   0

    2           2 LOAD_CONST               1 (<code object <listcomp> at 0x...)
                4 MAKE_FUNCTION            0
                6 LOAD_FAST                0 (lst)
                8 GET_ITER
               10 CALL                     0
               20 RETURN_VALUE

    Disassembly of <code object <listcomp> at 0x...>:
    2           0 RESUME                   0
                2 BUILD_LIST               0
                4 LOAD_FAST                0 (.0)
          >>    6 FOR_ITER                 4 (to 18)
               10 STORE_FAST               1 (x)
               12 LOAD_FAST                1 (x)
               14 LIST_APPEND              2
               16 JUMP_BACKWARD            6 (to 6)
          >>   18 END_FOR
               20 RETURN_VALUE

The bytecode for the comprehension is in a separate code object. Each time
``f()`` is called, a new single-use function object is allocated (by
``MAKE_FUNCTION``), called (allocating and then destroying a new frame on the
Python stack), and then immediately thrown away.

Under this PEP, the compiler will emit the following bytecode for ``f()``
instead:

.. code-block:: text

    1           0 RESUME                   0

    2           2 LOAD_FAST                0 (lst)
                4 GET_ITER
                6 LOAD_FAST_AND_CLEAR      1 (x)
                8 SWAP                     2
               10 BUILD_LIST               0
               12 SWAP                     2
          >>   14 FOR_ITER                 4 (to 26)
               18 STORE_FAST               1 (x)
               20 LOAD_FAST                1 (x)
               22 LIST_APPEND              2
               24 JUMP_BACKWARD            6 (to 14)
          >>   26 END_FOR
               28 SWAP                     2
               30 STORE_FAST               1 (x)
               32 RETURN_VALUE

There is no longer a separate code object, nor creation of a single-use function
object, nor any need to create and destroy a Python frame.

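One way to observe this from Python code, sketched here with only the standard
library, is to look through the constants of ``f``'s code object for a nested
``<listcomp>`` code object. The helper name ``has_nested_comprehension_code``
is hypothetical (it is not part of this PEP or of CPython), and the printed
result depends on whether the running interpreter inlines comprehensions.

```python
import sys
import types


def f(lst):
    return [x for x in lst]


def has_nested_comprehension_code(func):
    """Return True if func's constants hold a separate <listcomp> code object."""
    return any(
        isinstance(const, types.CodeType) and const.co_name == "<listcomp>"
        for const in func.__code__.co_consts
    )


# True when the comprehension is compiled as a nested function,
# False when it has been inlined into f.
print(sys.version_info[:2], has_nested_comprehension_code(f))
```

The check relies on the ``<listcomp>`` code object name shown in the
disassembly above; ``dis.dis(f)`` shows the same information in full.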

Isolation of the ``x`` iteration variable is achieved by the combination of the
new ``LOAD_FAST_AND_CLEAR`` opcode at offset ``6``, which saves any outer value
of ``x`` on the stack before running the comprehension, and the ``STORE_FAST``
at offset ``30``, which restores the outer value of ``x`` (if any) after running
the comprehension.

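The isolation guarantee itself is unchanged and can be checked with a small
example. This is an illustrative sketch; the variable names are arbitrary and
the behavior shown is the same with or without inlining.

```python
def f(lst):
    x = "outer"
    # x is rebound only for the duration of the comprehension.
    squares = [x * x for x in lst]
    # After the comprehension, the outer value of x is visible again.
    return x, squares


result = f([1, 2, 3])
print(result)
```

With inlining, the outer ``"outer"`` value is what the save/restore pair
described above puts aside and brings back.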

If the comprehension accesses variables from the outer scope, inlining avoids
the need to place these variables in a cell, allowing the comprehension (and all
other code in the outer function) to access them as normal fast locals instead.
This provides further performance gains.

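A rough way to see this effect is to inspect ``co_cellvars`` on the outer
function's code object. The function ``scale`` below is a made-up example. With
nested-function compilation, ``factor`` must be a cell variable so that the
comprehension's closure can reach it; with inlining, it is expected to stay an
ordinary fast local.

```python
import sys


def scale(values, factor):
    # factor is read inside the comprehension but defined in the outer function.
    return [v * factor for v in values]


# Names listed here are stored in cells rather than as plain fast locals.
cell_names = scale.__code__.co_cellvars
print(sys.version_info[:2], cell_names)
print(scale([1, 2], 10))
```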

Only comprehensions occurring inside functions, where fast-locals
(``LOAD_FAST``/``STORE_FAST``) are used, will be inlined. Module-level
comprehensions will continue to create and call a function.

Generator expressions are currently never inlined in the reference
implementation of this PEP. In the future, some generator expressions may be
inlined, where the returned generator object does not leak.


Backwards Compatibility
=======================

Comprehension inlining will cause the following visible behavior changes. No
changes in the standard library or test suite were necessary to adapt to these
changes in the implementation, suggesting the impact in user code is likely to
be minimal.

Specialized tools depending on undocumented details of compiler bytecode output
may of course be affected in ways beyond those listed below, but these tools
already must adapt to bytecode changes in each Python version.

locals() includes outer variables
---------------------------------

Calling ``locals()`` within a comprehension will include all locals of the
function containing the comprehension. E.g. given the following function::

    def f(lst):
        return [locals() for x in lst]

Calling ``f([1])`` in current Python will return::

    [{'.0': <list_iterator object at 0x7f8d37170460>, 'x': 1}]

where ``.0`` is an internal implementation detail: the synthetic sole argument
to the comprehension "function".

Under this PEP, it will instead return::

    [{'lst': [1], 'x': 1}]

This now includes the outer ``lst`` variable as a local, and eliminates the
synthetic ``.0``.

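The following sketch reports which of the two behaviors the running interpreter
exhibits by looking only at the keys of the returned mapping. Checking for the
``.0`` key is simply a convenient marker taken from the outputs shown above.

```python
import sys


def f(lst):
    return [locals() for x in lst]


# Sorted key names of the mapping produced inside the comprehension.
keys = sorted(f([1])[0])
print(sys.version_info[:2], keys)

if ".0" in keys:
    print("comprehension ran in its own nested function")
else:
    print("comprehension shares the locals of f")
```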

No comprehension frame in tracebacks
------------------------------------

Under this PEP, a comprehension will no longer have its own dedicated frame in
a stack trace. For example, given this function::

    def g():
        raise RuntimeError("boom")

    def f():
        return [g() for x in [1]]

Currently, calling ``f()`` results in the following traceback:

.. code-block:: text

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in f
      File "<stdin>", line 5, in <listcomp>
      File "<stdin>", line 2, in g
    RuntimeError: boom

Note the dedicated frame for ``<listcomp>``.

Under this PEP, the traceback looks like this instead:

.. code-block:: text

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in f
      File "<stdin>", line 2, in g
    RuntimeError: boom

There is no longer an extra frame for the list comprehension. The frame for the
``f`` function has the correct line number for the comprehension, however, so
this simply makes the traceback more compact without losing any useful
information.

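The frame difference can also be observed programmatically. The sketch below
uses ``traceback.extract_tb`` from the standard library to list the function
name of each traceback frame; the wrapper ``frame_names`` is a hypothetical
helper added only for this illustration, so it appears as the first frame.

```python
import sys
import traceback


def g():
    raise RuntimeError("boom")


def f():
    return [g() for x in [1]]


def frame_names():
    """Call f() and return the function name of each traceback frame."""
    try:
        f()
    except RuntimeError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


names = frame_names()
# A "<listcomp>" entry between "f" and "g" indicates a dedicated frame.
print(sys.version_info[:2], names)
```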

It is theoretically possible that code using warnings with the ``stacklevel``
argument could observe a behavior change due to the frame stack change. In
practice, however, this seems unlikely. It would require a warning raised in
library code that is always called through a comprehension in that same
library, where the warning is using a ``stacklevel`` of 3+ to bypass the
comprehension and its containing function and point to a calling frame outside
the library. In such a scenario it would usually be simpler and more reliable
to raise the warning closer to the calling code and bypass fewer frames.


UnboundLocalError instead of NameError
--------------------------------------

Although the value of the comprehension iteration variable is saved and
restored to provide isolation, it still becomes a local variable of the outer
function under this PEP. This implies a small behavior change in a function
where the comprehension iteration variable is accessed outside the
comprehension without ever being set outside the comprehension::

    def f(lst):
        items = [x for x in lst]
        return x

Under this PEP, calling ``f()`` will raise ``UnboundLocalError``, where
currently it raises ``NameError``. ``UnboundLocalError`` is a subclass of
``NameError``, so this should not impact code catching ``NameError``.

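A minimal sketch of why existing handlers keep working: an ``except NameError``
clause catches the exception in both cases, and only the concrete type differs.
The iteration variable is named ``elem`` here (rather than ``x``) purely to
avoid colliding with any global of the same name; that naming is an assumption
of this example, not part of the PEP.

```python
import sys


def f(lst):
    items = [elem for elem in lst]
    return elem  # never assigned outside the comprehension


caught = None
try:
    f([1])
except NameError as exc:
    # Catches plain NameError and its subclass UnboundLocalError alike.
    caught = type(exc).__name__

print(sys.version_info[:2], caught)
```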


How to Teach This
=================

It is not intuitively obvious that comprehension syntax will or should result
in creation and call of a nested function. For new users not already accustomed
to the prior behavior, I suspect the new behavior in this PEP will be more
intuitive and require less explanation. ("Why is there a ``<listcomp>`` line in
my traceback when I didn't define any such function? What is this ``.0``
variable I see in ``locals()``?")


Security Implications
=====================

None known.


Reference Implementation
========================

This PEP has a reference implementation in the form of `a PR against the CPython main
branch <https://github.com/python/cpython/pull/101441>`_ which passes all tests.

The reference implementation performs the micro-benchmark ``./python -m pyperf
timeit -s 'l = [1]' '[x for x in l]'`` 1.96x faster than the ``main`` branch (in a
build compiled with ``--enable-optimizations``).

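For readers without ``pyperf`` installed, a much rougher approximation of this
measurement can be made with the standard-library ``timeit`` module. This is
only a sketch and is not equivalent to the ``pyperf`` command above: it also
times the calls to the ``build`` wrapper (a hypothetical name) and to the
lambda, and it does none of ``pyperf``'s calibration or noise reduction.

```python
import timeit


def build(l):
    # The comprehension is inside a function, so it is eligible for inlining.
    return [x for x in l]


data = [1]
number = 100_000

# Total wall-clock seconds for `number` calls.
seconds = timeit.timeit(lambda: build(data), number=number)
print(f"{seconds / number * 1e9:.1f} ns per call")
```

Comparing the figure printed by two different interpreter builds gives a coarse
indication only; the numbers quoted in this PEP come from ``pyperf``.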

The reference implementation performs the ``comprehensions`` benchmark in the
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite
(which is not a micro-benchmark of comprehensions alone, but tests
real-world-derived code doing realistic work using comprehensions) 11% faster
than the ``main`` branch (again in optimized builds). Other benchmarks in
pyperformance (none of which use comprehensions heavily) don't show any impact
outside the noise.

The implementation has no impact on non-comprehension code.


Rejected Ideas
==============

More efficient comprehension calling, without inlining
------------------------------------------------------

An `alternate approach <https://github.com/python/cpython/pull/101310>`_
introduces a new opcode for "calling" a comprehension in streamlined fashion
without the need to create a throwaway function object, but still creating a new
Python frame. This avoids all of the visible effects listed under `Backwards
Compatibility`_, and provides roughly half of the performance benefit (1.5x
improvement on the microbenchmark, 4% improvement on the ``comprehensions``
benchmark in pyperformance). It also requires adding a new pointer to the
``_PyInterpreterFrame`` struct and a new ``Py_INCREF`` on each frame
construction, meaning (unlike this PEP) it has a (very small) performance cost
for all code. It also provides less scope for future optimizations.

This PEP takes the position that full inlining offers sufficient additional
performance to more than justify the behavior changes.

Inlining module-level comprehensions
------------------------------------

Module-level comprehensions are generally called only once (when the module is
imported), so optimizing their performance is low priority. Inlining them would
require separate code paths in the compiler to handle a module global namespace
dictionary instead of fast-locals. It would be difficult or impossible to avoid
breaking semantics, since the comprehension iteration variable itself would be
a module global which might be referenced inside other functions that in turn
could be called within the comprehension.

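The semantic hazard can be made concrete with a small module body. In the
sketch below the module source is run through ``exec`` with its own namespace
dictionary so that it behaves like a real module; the names ``read_x`` and
``result`` are made up for the example.

```python
source = '''
x = "global"

def read_x():
    # Looks x up in the module globals at call time.
    return x

result = [read_x() for x in range(2)]
'''

namespace = {}
exec(source, namespace)  # run the snippet as if it were a module body

print(namespace["result"], namespace["x"])
```

Because the comprehension's ``x`` is isolated from the module global ``x``,
``read_x()`` sees ``"global"`` on every iteration. If the iteration variable
were naively stored as a module global, ``read_x()`` would observe ``0`` and
``1`` instead.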


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.