<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 659 – Specializing Adaptive Interpreter | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0659/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 659 – Specializing Adaptive Interpreter | peps.python.org'>
<meta property="og:description" content="In order to perform well, virtual machines for dynamic languages must specialize the code that they execute to the types and values in the program being run. This specialization is often associated with “JIT” compilers, but is beneficial even without ma...">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0659/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="In order to perform well, virtual machines for dynamic languages must specialize the code that they execute to the types and values in the program being run. This specialization is often associated with “JIT” compilers, but is beneficial even without ma...">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<ul class="breadcrumbs">
<li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li>
<li><a href="../pep-0000/">PEP Index</a> &raquo; </li>
<li>PEP 659</li>
</ul>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 659 – Specializing Adaptive Interpreter</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Mark Shannon &lt;mark&#32;&#97;t&#32;hotpy.org&gt;</dd>
<dt class="field-even">Status<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd>
<dt class="field-odd">Type<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Non-normative PEP containing background, guidelines or other information relevant to the Python ecosystem">Informational</abbr></dd>
<dt class="field-even">Created<span class="colon">:</span></dt>
<dd class="field-even">13-Apr-2021</dd>
<dt class="field-odd">Post-History<span class="colon">:</span></dt>
<dd class="field-odd">11-May-2021</dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a><ul>
<li><a class="reference internal" href="#performance">Performance</a></li>
</ul>
</li>
<li><a class="reference internal" href="#implementation">Implementation</a><ul>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#quickening">Quickening</a></li>
<li><a class="reference internal" href="#adaptive-instructions">Adaptive instructions</a></li>
<li><a class="reference internal" href="#specialization">Specialization</a></li>
<li><a class="reference internal" href="#ancillary-data">Ancillary data</a></li>
<li><a class="reference internal" href="#example-families-of-instructions">Example families of instructions</a><ul>
<li><a class="reference internal" href="#load-attr">LOAD_ATTR</a></li>
<li><a class="reference internal" href="#load-global">LOAD_GLOBAL</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#compatibility">Compatibility</a></li>
<li><a class="reference internal" href="#costs">Costs</a><ul>
<li><a class="reference internal" href="#memory-use">Memory use</a><ul>
<li><a class="reference internal" href="#comparing-memory-use-to-3-10">Comparing memory use to 3.10</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#security-implications">Security Implications</a></li>
<li><a class="reference internal" href="#rejected-ideas">Rejected Ideas</a><ul>
<li><a class="reference internal" href="#storing-data-caches-before-the-bytecode">Storing data caches before the bytecode.</a></li>
</ul>
</li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<div class="pep-banner canonical-doc sticky-banner admonition important">
<p class="admonition-title">Important</p>
<p>This PEP is a historical document. The up-to-date, canonical documentation can now be found at <a class="reference external" href="https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-pep659" title="(in Python v3.13)"><span class="xref std std-ref">Specializing Adaptive Interpreter</span></a>.</p>
<p class="close-button">×</p>
<p>See <a class="pep reference internal" href="../pep-0001/" title="PEP 1 PEP Purpose and Guidelines">PEP 1</a> for how to propose changes.</p>
</div>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p>In order to perform well, virtual machines for dynamic languages must
specialize the code that they execute to the types and values in the
program being run. This specialization is often associated with “JIT”
compilers, but is beneficial even without machine code generation.</p>
<p>A specializing, adaptive interpreter is one that speculatively specializes
on the types or values it is currently operating on, and adapts to changes
in those types and values.</p>
<p>Specialization gives us improved performance, and adaptation allows the
interpreter to rapidly change when the pattern of usage in a program alters,
limiting the amount of additional work caused by mis-specialization.</p>
<p>This PEP proposes using a specializing, adaptive interpreter that specializes
code aggressively, but over a very small region, and is able to adjust to
mis-specialization rapidly and at low cost.</p>
<p>Adding a specializing, adaptive interpreter to CPython will bring significant
performance improvements. It is hard to come up with meaningful numbers,
as it depends very much on the benchmarks and on work that has not yet happened.
Extensive experimentation suggests speedups of up to 50%.
Even if the speedup were only 25%, this would still be a worthwhile enhancement.</p>
</section>
<section id="motivation">
<h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2>
<p>Python is widely acknowledged as slow.
Whilst Python will never attain the performance of low-level languages like C,
Fortran, or even Java, we would like it to be competitive with fast
implementations of scripting languages, like V8 for JavaScript or LuaJIT for
Lua.
Specifically, we want to achieve these performance goals with CPython to
benefit all users of Python including those unable to use PyPy or
other alternative virtual machines.</p>
<p>Achieving these performance goals is a long way off, and will require a lot of
engineering effort, but we can make a significant step towards those goals by
speeding up the interpreter.
Both academic research and practical implementations have shown that a fast
interpreter is a key part of a fast virtual machine.</p>
<p>Typical optimizations for virtual machines are expensive, so a long “warm up”
time is required to gain confidence that the cost of optimization is justified.
In order to get speed-ups rapidly, without noticeable warmup times,
the VM should speculate that specialization is justified even after a few
executions of a function. To do that effectively, the interpreter must be able
to optimize and de-optimize continually and very cheaply.</p>
<p>By using adaptive and speculative specialization at the granularity of
individual virtual machine instructions,
we get a faster interpreter that also generates profiling information
for more sophisticated optimizations in the future.</p>
</section>
<section id="rationale">
<h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2>
<p>There are many practical ways to speed-up a virtual machine for a dynamic
language.
However, specialization is the most important, both in itself and as an
enabler of other optimizations.
Therefore it makes sense to focus our efforts on specialization first,
if we want to improve the performance of CPython.</p>
<p>Specialization is typically done in the context of a JIT compiler,
but research shows specialization in an interpreter can boost performance
significantly, even outperforming a naive compiler <a class="footnote-reference brackets" href="#id6" id="id1">[1]</a>.</p>
<p>There have been several ways of doing this proposed in the academic
literature, but most attempt to optimize regions larger than a
single bytecode <a class="footnote-reference brackets" href="#id6" id="id2">[1]</a> <a class="footnote-reference brackets" href="#id7" id="id3">[2]</a>.
Using larger regions than a single instruction requires code to handle
de-optimization in the middle of a region.
Specialization at the level of individual bytecodes makes de-optimization
trivial, as it cannot occur in the middle of a region.</p>
<p>By speculatively specializing individual bytecodes, we can gain significant
performance improvements without anything but the most local,
and trivial to implement, de-optimizations.</p>
<p>The closest approach to this PEP in the literature is
“Inline Caching meets Quickening” <a class="footnote-reference brackets" href="#id8" id="id4">[3]</a>.
This PEP has the advantages of inline caching,
but adds the ability to quickly de-optimize, making the performance
more robust in cases where specialization fails or is not stable.</p>
<section id="performance">
<h3><a class="toc-backref" href="#performance" role="doc-backlink">Performance</a></h3>
<p>The speedup from specialization is hard to determine, as many specializations
depend on other optimizations. Speedups seem to be in the range 10% - 60%.</p>
<ul class="simple">
<li>Most of the speedup comes directly from specialization. The largest
contributors are speedups to attribute lookup, global variables, and calls.</li>
<li>A small, but useful, fraction is from improved dispatch such as
super-instructions and other optimizations enabled by quickening.</li>
</ul>
</section>
</section>
<section id="implementation">
<h2><a class="toc-backref" href="#implementation" role="doc-backlink">Implementation</a></h2>
<section id="overview">
<h3><a class="toc-backref" href="#overview" role="doc-backlink">Overview</a></h3>
<p>Any instruction that would benefit from specialization will be replaced by an
“adaptive” form of that instruction. When executed, the adaptive instructions
will specialize themselves in response to the types and values that they see.
This process is known as “quickening”.</p>
<p>Once an instruction in a code object has executed enough times,
that instruction will be “specialized” by replacing it with a new instruction
that is expected to execute faster for that operation.</p>
</section>
<section id="quickening">
<h3><a class="toc-backref" href="#quickening" role="doc-backlink">Quickening</a></h3>
<p>Quickening is the process of replacing slow instructions with faster variants.</p>
<p>Quickened code has a number of advantages over immutable bytecode:</p>
<ul class="simple">
<li>It can be changed at runtime.</li>
<li>It can use super-instructions that span lines and take multiple operands.</li>
<li>It does not need to handle tracing as it can fallback to the original
bytecode for that.</li>
</ul>
<p>In order that tracing can be supported, the quickened instruction format
should match the immutable, user visible, bytecode format:
16-bit instructions of 8-bit opcode followed by 8-bit operand.</p>
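<p>As a toy illustration of this format (the opcode number below is an assumption for illustration, not CPython's actual encoding), one 16-bit code unit can be packed and unpacked like so:</p>

```python
# Toy sketch of the 16-bit instruction format: an 8-bit opcode followed
# by an 8-bit operand.
def pack(opcode: int, oparg: int) -> bytes:
    assert 0 <= opcode < 256 and 0 <= oparg < 256
    return bytes([opcode, oparg])

def unpack(unit: bytes) -> tuple[int, int]:
    opcode, oparg = unit
    return opcode, oparg

LOAD_ATTR = 106            # assumed opcode number, for illustration only
unit = pack(LOAD_ATTR, 3)  # oparg 3: index of the attribute name
assert unpack(unit) == (106, 3)
```

<p>Keeping the quickened format identical to the user-visible format is what lets tracing fall back to the original bytecode without any translation step.</p>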
</section>
<section id="adaptive-instructions">
<h3><a class="toc-backref" href="#adaptive-instructions" role="doc-backlink">Adaptive instructions</a></h3>
<p>Each instruction that would benefit from specialization is replaced by an
adaptive version during quickening. For example,
the <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR</span></code> instruction would be replaced with <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_ADAPTIVE</span></code>.</p>
<p>Each adaptive instruction periodically attempts to specialize itself.</p>
</section>
<section id="specialization">
<h3><a class="toc-backref" href="#specialization" role="doc-backlink">Specialization</a></h3>
<p>CPython bytecode contains many instructions that represent high-level
operations, and would benefit from specialization. Examples include <code class="docutils literal notranslate"><span class="pre">CALL</span></code>,
<code class="docutils literal notranslate"><span class="pre">LOAD_ATTR</span></code>, <code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL</span></code> and <code class="docutils literal notranslate"><span class="pre">BINARY_ADD</span></code>.</p>
<p>Introducing a “family” of specialized instructions for each of these
instructions allows effective specialization,
since each new instruction is specialized to a single task.
Each family will include an “adaptive” instruction, that maintains a counter
and attempts to specialize itself when that counter reaches zero.</p>
<p>Each family will also include one or more specialized instructions that
perform the equivalent of the generic operation much faster provided their
inputs are as expected.
Each specialized instruction will maintain a saturating counter which will
be incremented whenever the inputs are as expected. Should the inputs not
be as expected, the counter will be decremented and the generic operation
will be performed.
If the counter reaches the minimum value, the instruction is de-optimized by
simply replacing its opcode with the adaptive version.</p>
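<p>The counter scheme described above can be sketched as follows (the saturation limit and starting value are assumptions; CPython's actual constants differ):</p>

```python
# Sketch of the saturating-counter de-optimization scheme: expected inputs
# increment the counter (saturating at a maximum); unexpected inputs fall
# back to the generic operation and decrement it, and reaching the minimum
# de-optimizes the instruction back to its adaptive form.
MAX_COUNTER = 255   # assumed saturation limit

class SpecializedInstruction:
    def __init__(self):
        self.counter = MAX_COUNTER // 2
        self.deoptimized = False

    def execute(self, inputs_as_expected: bool) -> str:
        if inputs_as_expected:
            self.counter = min(self.counter + 1, MAX_COUNTER)
            return "fast path"
        self.counter -= 1
        if self.counter <= 0:
            self.deoptimized = True   # opcode becomes the adaptive version
            return "de-optimized"
        return "generic fallback"
```

<p>Because de-optimization only rewrites a single opcode, it can happen at any point in execution, which is what keeps mis-specialization cheap.</p>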
</section>
<section id="ancillary-data">
<h3><a class="toc-backref" href="#ancillary-data" role="doc-backlink">Ancillary data</a></h3>
<p>Most families of specialized instructions will require more information than
can fit in an 8-bit operand. To hold it, a number of 16-bit entries immediately
following the instruction are used to store this data. This is a form of inline
cache, an “inline data cache”. Unspecialized, or adaptive, instructions will
use the first entry of this cache as a counter, and simply skip over the others.</p>
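<p>The layout can be pictured with a toy model (the meanings of the individual entries below are assumptions for illustration):</p>

```python
# Toy model of an inline data cache: 16-bit entries sit directly after the
# instruction they belong to, and the interpreter skips over them when
# advancing. An adaptive instruction uses entry 0 as its counter.
code = [
    ("LOAD_ATTR_ADAPTIVE", 0),  # oparg: index of the attribute name
    ("CACHE", 53),              # entry 0: specialization counter
    ("CACHE", 0),               # entry 1: e.g. a type version tag
    ("CACHE", 0),               # entry 2: e.g. an index to load from
    ("RETURN_VALUE", 0),
]

def next_instruction(i: int, cache_entries: int = 3) -> int:
    """Advance past an instruction and its inline cache entries."""
    return i + 1 + cache_entries

assert code[next_instruction(0)][0] == "RETURN_VALUE"
```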
</section>
<section id="example-families-of-instructions">
<h3><a class="toc-backref" href="#example-families-of-instructions" role="doc-backlink">Example families of instructions</a></h3>
<section id="load-attr">
<h4><a class="toc-backref" href="#load-attr" role="doc-backlink">LOAD_ATTR</a></h4>
<p>The <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR</span></code> instruction loads the named attribute of the object on top of the stack,
then replaces the object on top of the stack with the attribute.</p>
<p>This is an obvious candidate for specialization. Attributes might belong to
a normal instance, a class, a module, or one of many other special cases.</p>
<p><code class="docutils literal notranslate"><span class="pre">LOAD_ATTR</span></code> would initially be quickened to <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_ADAPTIVE</span></code> which
would track how often it is executed, and call the <code class="docutils literal notranslate"><span class="pre">_Py_Specialize_LoadAttr</span></code>
internal function when executed enough times, or jump to the original
<code class="docutils literal notranslate"><span class="pre">LOAD_ATTR</span></code> instruction to perform the load. When optimizing, the kind
of the attribute would be examined, and if a suitable specialized instruction
was found, it would replace <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_ADAPTIVE</span></code> in place.</p>
<p>Specialization for <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR</span></code> might include:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_INSTANCE_VALUE</span></code> A common case where the attribute is stored in
the object's value array, and not shadowed by an overriding descriptor.</li>
<li><code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_MODULE</span></code> Load an attribute from a module.</li>
<li><code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_SLOT</span></code> Load an attribute from an object whose
class defines <code class="docutils literal notranslate"><span class="pre">__slots__</span></code>.</li>
</ul>
<p>Note how this allows optimizations that complement other optimizations.
The <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_INSTANCE_VALUE</span></code> works well with the “lazy dictionary” used for
many objects.</p>
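<p>A minimal sketch of the <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_INSTANCE_VALUE</span></code> fast path, using hypothetical names (the guard and cache fields below are assumptions, not CPython's actual structures):</p>

```python
# Hypothetical sketch: the cache records the type's version tag and the
# attribute's index in the instance's values array. The fast path is one
# guard plus an indexed load; a failed guard de-optimizes.
class Deoptimize(Exception):
    """Guard failed: fall back to the generic LOAD_ATTR."""

class Point:
    version_tag = 7            # stand-in for the type's version tag
    def __init__(self, x, y):
        self.values = [x, y]   # stand-in for the instance's values array

def load_attr_instance_value(obj, cache):
    if type(obj).version_tag != cache["type_version"]:
        raise Deoptimize       # the class changed shape since specialization
    return obj.values[cache["index"]]

cache = {"type_version": 7, "index": 1}   # filled in by the specializer
assert load_attr_instance_value(Point(3, 4), cache) == 4
```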
</section>
<section id="load-global">
<h4><a class="toc-backref" href="#load-global" role="doc-backlink">LOAD_GLOBAL</a></h4>
<p>The <code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL</span></code> instruction looks up a name in the global namespace
and then, if not present in the global namespace,
looks it up in the builtins namespace.
In 3.9 the C code for the <code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL</span></code> instruction includes code to check
whether the whole code object should be modified to add a cache,
code to check whether either the globals or builtins namespace has changed,
code to look up the value in a cache, and fallback code.
This makes it complicated and bulky.
It also performs many redundant operations even when supposedly optimized.</p>
<p>Using a family of instructions makes the code more maintainable and faster,
as each instruction only needs to handle one concern.</p>
<p>Specializations would include:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL_ADAPTIVE</span></code> would operate like <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR_ADAPTIVE</span></code> above.</li>
<li><code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL_MODULE</span></code> can be specialized for the case where the value is in
the globals namespace. After checking that the keys of the namespace have
not changed, it can load the value from the stored index.</li>
<li><code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL_BUILTIN</span></code> can be specialized for the case where the value is
in the builtins namespace. It needs to check that the keys of the global
namespace have not been added to, and that the builtins namespace has not
changed. Note that we don't care if the values of the global namespace
have changed, just the keys.</li>
</ul>
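<p>The keys-version guard for <code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL_MODULE</span></code> can be modelled with a toy versioned dictionary (the names and layout here are assumptions for illustration):</p>

```python
# Toy model of the keys-version guard: adding a key bumps the version and
# invalidates caches, while changing a value leaves them valid.
class VersionedDict:
    def __init__(self):
        self.keys, self.values, self.keys_version = [], [], 0
    def store(self, key, value):
        if key in self.keys:
            self.values[self.keys.index(key)] = value  # values change freely
        else:
            self.keys.append(key); self.values.append(value)
            self.keys_version += 1                     # new key: bump version

def load_global_module(globs, cache):
    if globs.keys_version != cache["keys_version"]:
        return None                                    # de-optimize
    return globs.values[cache["index"]]

g = VersionedDict(); g.store("x", 10)
cache = {"keys_version": g.keys_version, "index": g.keys.index("x")}
g.store("x", 99)    # changing a value keeps the guard valid
assert load_global_module(g, cache) == 99
g.store("y", 1)     # adding a key invalidates the cache
assert load_global_module(g, cache) is None
```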
<p>See <a class="footnote-reference brackets" href="#id9" id="id5">[4]</a> for a full implementation.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>This PEP outlines the mechanisms for managing specialization, and does not
specify the particular optimizations to be applied.
It is likely that details, or even the entire implementation, may change
as the code is further developed.</p>
</div>
</section>
</section>
</section>
<section id="compatibility">
<h2><a class="toc-backref" href="#compatibility" role="doc-backlink">Compatibility</a></h2>
<p>There will be no change to the language, library or API.</p>
<p>The only way that users will be able to detect the presence of the new
interpreter is through timing execution, the use of debugging tools,
or measuring memory use.</p>
</section>
<section id="costs">
<h2><a class="toc-backref" href="#costs" role="doc-backlink">Costs</a></h2>
<section id="memory-use">
<h3><a class="toc-backref" href="#memory-use" role="doc-backlink">Memory use</a></h3>
<p>An obvious concern with any scheme that performs any sort of caching is
“how much more memory does it use?”.
The short answer is “not that much”.</p>
<section id="comparing-memory-use-to-3-10">
<h4><a class="toc-backref" href="#comparing-memory-use-to-3-10" role="doc-backlink">Comparing memory use to 3.10</a></h4>
<p>CPython 3.10 uses 2 bytes per instruction until the execution count
reaches ~2000, at which point it allocates another byte per instruction plus
32 bytes for each instruction with a cache (<code class="docutils literal notranslate"><span class="pre">LOAD_GLOBAL</span></code> and <code class="docutils literal notranslate"><span class="pre">LOAD_ATTR</span></code>).</p>
<p>The following table shows the additional bytes per instruction to support the
3.10 opcache or the proposed adaptive interpreter, on a 64 bit machine.</p>
<table class="docutils align-default">
<tbody>
<tr class="row-odd"><td>Version</td>
<td>3.10 cold</td>
<td>3.10 hot</td>
<td>3.11</td>
</tr>
<tr class="row-even"><td>Specialised</td>
<td>0%</td>
<td>~15%</td>
<td>~25%</td>
</tr>
<tr class="row-odd"><td>code</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr class="row-even"><td>opcache_map</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr class="row-odd"><td>opcache/data</td>
<td>0</td>
<td>4.8</td>
<td>4</td>
</tr>
<tr class="row-even"><td>Total</td>
<td>2</td>
<td>7.8</td>
<td>6</td>
</tr>
</tbody>
</table>
<p><code class="docutils literal notranslate"><span class="pre">3.10</span> <span class="pre">cold</span></code> is before the code has reached the ~2000 limit.
<code class="docutils literal notranslate"><span class="pre">3.10</span> <span class="pre">hot</span></code> shows the cache use once the threshold is reached.</p>
<p>The relative memory use depends on how much code is “hot” enough to trigger
creation of the cache in 3.10. The break-even point, where the memory used
by 3.10 is the same as for 3.11, is ~70%.</p>
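<p>The break-even figure follows directly from the totals in the table: 3.10 uses 2 bytes per instruction cold and 7.8 hot, while 3.11 uses a flat 6, so the two are equal when the hot fraction satisfies 2 + 5.8·<em>f</em> = 6:</p>

```python
# Worked check of the ~70% break-even point between 3.10 and 3.11,
# using the per-instruction totals from the table above.
cold_310, hot_310, total_311 = 2.0, 7.8, 6.0   # bytes per instruction

# 3.10 averages cold + (hot - cold) * f when a fraction f of code is hot;
# solve for the f at which that equals 3.11's flat cost.
f = (total_311 - cold_310) / (hot_310 - cold_310)
assert round(f, 2) == 0.69   # ~70%, matching the text
```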
<p>It is also worth noting that the actual bytecode is only part of a code
object. Code objects also include names, constants and quite a lot of
debugging information.</p>
<p>In summary, for most applications where many of the functions are relatively
unused, 3.11 will consume more memory than 3.10, but not by much.</p>
</section>
</section>
</section>
<section id="security-implications">
<h2><a class="toc-backref" href="#security-implications" role="doc-backlink">Security Implications</a></h2>
<p>None</p>
</section>
<section id="rejected-ideas">
<h2><a class="toc-backref" href="#rejected-ideas" role="doc-backlink">Rejected Ideas</a></h2>
<p>By implementing a specializing adaptive interpreter with inline data caches,
we are implicitly rejecting many alternative ways to optimize CPython.
However, it is worth emphasizing that some ideas, such as just-in-time
compilation, have not been rejected, merely deferred.</p>
<section id="storing-data-caches-before-the-bytecode">
<h3><a class="toc-backref" href="#storing-data-caches-before-the-bytecode" role="doc-backlink">Storing data caches before the bytecode.</a></h3>
<p>An earlier implementation of this PEP for 3.11 alpha used a different caching
scheme as described below:</p>
<blockquote>
<div>Quickened instructions will be stored in an array (it is neither necessary nor
desirable to store them in a Python object) with the same format as the
original bytecode. Ancillary data will be stored in a separate array.<p>Each instruction will use 0 or more data entries.
Each instruction within a family must have the same amount of data allocated,
although some instructions may not use all of it.
Instructions that cannot be specialized, e.g. <code class="docutils literal notranslate"><span class="pre">POP_TOP</span></code>,
do not need any entries.
Experiments show that 25% to 30% of instructions can be usefully specialized.
Different families will need different amounts of data,
but most need 2 entries (16 bytes on a 64 bit machine).</p>
<p>In order to support larger functions than 256 instructions,
we compute the offset of the first data entry for instructions
as <code class="docutils literal notranslate"><span class="pre">(instruction</span> <span class="pre">offset)//2</span> <span class="pre">+</span> <span class="pre">(quickened</span> <span class="pre">operand)</span></code>.</p>
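<p>A sketch of that offset arithmetic (illustrative values only):</p>

```python
# Sketch of the rejected scheme's offset computation: the first data entry
# for an instruction is at (instruction offset)//2 + (quickened operand).
def first_data_entry(instruction_offset: int, quickened_operand: int) -> int:
    return instruction_offset // 2 + quickened_operand

# The //2 term shifts the reachable window for instructions later in the
# function, which is how functions longer than 256 instructions can be
# served by an 8-bit operand.
assert first_data_entry(100, 7) == 57
```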
<p>Compared to the opcache in Python 3.10, this design:</p>
<ul class="simple">
<li>is faster; it requires no memory reads to compute the offset.
3.10 requires two reads, which are dependent.</li>
<li>uses much less memory, as the data can be different sizes for different
instruction families, and doesn't need an additional array of offsets.</li>
<li>can support much larger functions, up to about 5000 instructions
per function. 3.10 can support about 1000.</li>
</ul>
</div></blockquote>
<p>We rejected this scheme as the inline cache approach is both faster
and simpler.</p>
</section>
</section>
<section id="references">
<h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2>
<aside class="footnote-list brackets">
<aside class="footnote brackets" id="id6" role="doc-footnote">
<dt class="label" id="id6">[1]<em> (<a href='#id1'>1</a>, <a href='#id2'>2</a>) </em></dt>
<dd>The construction of high-performance virtual machines for
dynamic languages, Mark Shannon 2011.
<a class="reference external" href="https://theses.gla.ac.uk/2975/1/2011shannonphd.pdf">https://theses.gla.ac.uk/2975/1/2011shannonphd.pdf</a></aside>
<aside class="footnote brackets" id="id7" role="doc-footnote">
<dt class="label" id="id7">[<a href="#id3">2</a>]</dt>
<dd>Dynamic Interpretation for Dynamic Scripting Languages
<a class="reference external" href="https://www.scss.tcd.ie/publications/tech-reports/reports.09/TCD-CS-2009-37.pdf">https://www.scss.tcd.ie/publications/tech-reports/reports.09/TCD-CS-2009-37.pdf</a></aside>
<aside class="footnote brackets" id="id8" role="doc-footnote">
<dt class="label" id="id8">[<a href="#id4">3</a>]</dt>
<dd>Inline Caching meets Quickening
<a class="reference external" href="https://www.unibw.de/ucsrl/pubs/ecoop10.pdf/view">https://www.unibw.de/ucsrl/pubs/ecoop10.pdf/view</a></aside>
<aside class="footnote brackets" id="id9" role="doc-footnote">
<dt class="label" id="id9">[<a href="#id5">4</a>]</dt>
<dd>The adaptive and specialized instructions are implemented in
<a class="reference external" href="https://github.com/python/cpython/blob/main/Python/ceval.c">https://github.com/python/cpython/blob/main/Python/ceval.c</a><p>The optimizations are implemented in:
<a class="reference external" href="https://github.com/python/cpython/blob/main/Python/specialize.c">https://github.com/python/cpython/blob/main/Python/specialize.c</a></p>
</aside>
</aside>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0659.rst">https://github.com/python/peps/blob/main/peps/pep-0659.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0659.rst">2024-10-29 10:45:35 GMT</a></p>
</article>
<nav id="pep-sidebar">
<h2>Contents</h2>
<ul>
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a><ul>
<li><a class="reference internal" href="#performance">Performance</a></li>
</ul>
</li>
<li><a class="reference internal" href="#implementation">Implementation</a><ul>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#quickening">Quickening</a></li>
<li><a class="reference internal" href="#adaptive-instructions">Adaptive instructions</a></li>
<li><a class="reference internal" href="#specialization">Specialization</a></li>
<li><a class="reference internal" href="#ancillary-data">Ancillary data</a></li>
<li><a class="reference internal" href="#example-families-of-instructions">Example families of instructions</a><ul>
<li><a class="reference internal" href="#load-attr">LOAD_ATTR</a></li>
<li><a class="reference internal" href="#load-global">LOAD_GLOBAL</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#compatibility">Compatibility</a></li>
<li><a class="reference internal" href="#costs">Costs</a><ul>
<li><a class="reference internal" href="#memory-use">Memory use</a><ul>
<li><a class="reference internal" href="#comparing-memory-use-to-3-10">Comparing memory use to 3.10</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#security-implications">Security Implications</a></li>
<li><a class="reference internal" href="#rejected-ideas">Rejected Ideas</a><ul>
<li><a class="reference internal" href="#storing-data-caches-before-the-bytecode">Storing data caches before the bytecode.</a></li>
</ul>
</li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
<br>
<a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0659.rst">Page Source (GitHub)</a>
</nav>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>