Reformat PEP 659 to obey 80 column limit. (#2458)
This commit is contained in:
parent 04eb44995d
commit 293dd4e107

pep-0659.rst

@@ -11,21 +11,26 @@ Post-History: 11-May-2021
Abstract
========

In order to perform well, virtual machines for dynamic languages must
specialize the code that they execute to the types and values in the
program being run. This specialization is often associated with "JIT"
compilers, but is beneficial even without machine code generation.

A specializing, adaptive interpreter is one that speculatively specializes
on the types or values it is currently operating on, and adapts to changes
in those types and values.

Specialization gives us improved performance, and adaptation allows the
interpreter to rapidly change when the pattern of usage in a program alters,
limiting the amount of additional work caused by mis-specialization.

This PEP proposes using a specializing, adaptive interpreter that specializes
code aggressively, but over a very small region, and is able to adjust to
mis-specialization rapidly and at low cost.

Adding a specializing, adaptive interpreter to CPython will bring significant
performance improvements. It is hard to come up with meaningful numbers, as
they depend very much on the benchmarks and on work that has not yet
happened. Extensive experimentation suggests speedups of up to 50%. Even if
the speedup were only 25%, this would still be a worthwhile enhancement.

@@ -33,44 +38,62 @@ Motivation
==========

Python is widely acknowledged as slow.
Whilst Python will never attain the performance of low-level languages like C,
Fortran, or even Java, we would like it to be competitive with fast
implementations of scripting languages, like V8 for JavaScript or LuaJIT for
Lua.
Specifically, we want to achieve these performance goals with CPython to
benefit all users of Python, including those unable to use PyPy or
other alternative virtual machines.

Achieving these performance goals is a long way off, and will require a lot of
engineering effort, but we can make a significant step towards those goals by
speeding up the interpreter.
Both academic research and practical implementations have shown that a fast
interpreter is a key part of a fast virtual machine.

Typical optimizations for virtual machines are expensive, so a long "warm up"
time is required to gain confidence that the cost of optimization is justified.
In order to get speed-ups rapidly, without noticeable warmup times,
the VM should speculate that specialization is justified even after a few
executions of a function. To do that effectively, the interpreter must be able
to optimize and de-optimize continually and very cheaply.

By using adaptive and speculative specialization at the granularity of
individual virtual machine instructions,
we get a faster interpreter that also generates profiling information
for more sophisticated optimizations in the future.

Rationale
=========

There are many practical ways to speed up a virtual machine for a dynamic
language.
However, specialization is the most important, both in itself and as an
enabler of other optimizations.
Therefore it makes sense to focus our efforts on specialization first,
if we want to improve the performance of CPython.

Specialization is typically done in the context of a JIT compiler,
but research shows specialization in an interpreter can boost performance
significantly, even outperforming a naive compiler [1]_.

Several ways of doing this have been proposed in the academic literature,
but most attempt to optimize regions larger than a
single bytecode [1]_ [2]_.
Using larger regions than a single instruction requires code to handle
de-optimization in the middle of a region.
Specialization at the level of individual bytecodes makes de-optimization
trivial, as it cannot occur in the middle of a region.

By speculatively specializing individual bytecodes, we can gain significant
performance improvements without anything but the most local,
and trivial to implement, de-optimizations.

The closest approach to this PEP in the literature is
"Inline Caching meets Quickening" [3]_.
This PEP has the advantages of inline caching, but adds the ability to
quickly de-optimize, making the performance more robust in cases where
specialization fails or is not stable.

Performance

@@ -78,11 +101,14 @@ Performance
The expected speedup of 50% can be broken roughly down as follows:

* In the region of 30% from specialization. Much of that is from
  specialization of calls, with improvements in instructions that are already
  specialized, such as ``LOAD_ATTR`` and ``LOAD_GLOBAL``, contributing much of
  the remainder. Specialization of operations adds a small amount.
* About 10% from improved dispatch, such as super-instructions
  and other optimizations enabled by quickening.
* Further increases in the benefits of other optimizations,
  as they can exploit, or be exploited by, specialization.

Implementation
==============

@@ -90,12 +116,15 @@ Implementation
Overview
--------

Once any instruction in a code object has executed a few times,
that code object will be "quickened" by allocating a new array for the
bytecode that can be modified at runtime, and is not constrained as the
``code.co_code`` object is. From that point onwards, whenever any
instruction in that code object is executed, it will use the quickened form.

Any instruction that would benefit from specialization will be replaced by an
"adaptive" form of that instruction. When executed, the adaptive instructions
will specialize themselves in response to the types and values that they see.
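
As a rough illustration, the per-code-object state might look like the
following sketch, assuming a simple warm-up counter; the names and fields
are illustrative, not CPython's actual structures::

    #include <stdint.h>

    /* Hypothetical per-code-object state.  The interpreter runs the
       immutable co_code bytes until the warm-up counter expires; then a
       mutable copy is allocated and used instead. */
    typedef struct {
        const uint16_t *co_code;   /* immutable bytecode */
        uint16_t *quickened;       /* runtime-mutable copy, NULL until warm */
        int warmup;                /* executions left before quickening */
        int n_instructions;        /* code length, in 16-bit code units */
    } code_state;

    static const uint16_t *
    select_bytecode(code_state *cs)
    {
        return cs->quickened != NULL ? cs->quickened : cs->co_code;
    }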
Quickening
----------

@@ -106,62 +135,85 @@ Quickened code has a number of advantages over the normal bytecode:

* It can be changed at runtime.
* It can use super-instructions that span lines and take multiple operands.
* It does not need to handle tracing, as it can fall back to the normal
  bytecode for that.

In order that tracing can be supported, and quickening performed quickly,
the quickened instruction format should match the normal bytecode format:
16-bit instructions of an 8-bit opcode followed by an 8-bit operand.
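
For illustration, that shared format could be modelled as follows; the
names are illustrative (CPython actually packs the two bytes into a single
16-bit code unit)::

    #include <stdint.h>

    /* One 16-bit code unit: an 8-bit opcode followed by an 8-bit operand.
       Quickened code keeps this exact shape, so tracing can fall back to
       the normal bytecode, and quickening is a straight copy. */
    typedef struct {
        uint8_t opcode;
        uint8_t oparg;   /* also serves as a partial index into the
                            specialization data, described below */
    } code_unit;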
Adaptive instructions
---------------------

Each instruction that would benefit from specialization is replaced by an
adaptive version during quickening. For example,
the ``LOAD_ATTR`` instruction would be replaced with ``LOAD_ATTR_ADAPTIVE``.

Each adaptive instruction maintains a counter,
and periodically attempts to specialize itself.
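
A minimal sketch of that mechanism, with all names hypothetical and the
retry interval assumed rather than specified::

    #include <stdbool.h>
    #include <stdint.h>

    enum { ADAPTIVE_RETRY_INTERVAL = 64 };  /* assumed, not specified */

    /* Stand-in for the real specialization logic: examines the current
       operands and, on success, rewrites the opcode in the quickened
       array to a specialized variant. */
    static bool try_specialize_load_attr(uint8_t *opcode_slot);

    /* Run for each execution of LOAD_ATTR_ADAPTIVE.  Returns true if
       the instruction rewrote itself, in which case the interpreter
       re-dispatches on the new opcode; otherwise the generic LOAD_ATTR
       behaviour is performed. */
    static bool
    load_attr_adaptive(uint8_t *opcode_slot, uint8_t *counter)
    {
        if ((*counter)-- == 0) {
            *counter = ADAPTIVE_RETRY_INTERVAL;
            return try_specialize_load_attr(opcode_slot);
        }
        return false;
    }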
Specialization
--------------

CPython bytecode contains many bytecodes that represent high-level operations,
and would benefit from specialization. Examples include ``CALL_FUNCTION``,
``LOAD_ATTR``, ``LOAD_GLOBAL`` and ``BINARY_ADD``.

Introducing a "family" of specialized instructions for each of these
instructions allows effective specialization,
since each new instruction is specialized to a single task.
Each family will include an "adaptive" instruction
that maintains a counter and periodically attempts to specialize itself.
Each family will also include one or more specialized instructions that
perform the equivalent of the generic operation much faster, provided their
inputs are as expected.
Each specialized instruction will maintain a saturating counter which will
be incremented whenever the inputs are as expected. Should the inputs not
be as expected, the counter will be decremented and the generic operation
will be performed.
If the counter reaches the minimum value, the instruction is de-optimized by
simply replacing its opcode with the adaptive version.
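
A sketch of the hit/miss bookkeeping this implies; the bounds are
illustrative, since the PEP does not fix them::

    #include <stdint.h>

    enum { COUNTER_MAX = 255, COUNTER_MIN = 0 };  /* illustrative bounds */

    /* Hit: the inputs were as expected, so saturate the counter upward. */
    static void
    record_hit(uint8_t *counter)
    {
        if (*counter < COUNTER_MAX) {
            (*counter)++;
        }
    }

    /* Miss: decrement the counter and let the generic operation run; at
       the minimum, de-optimize by restoring the adaptive opcode. */
    static void
    record_miss(uint8_t *counter, uint8_t *opcode_slot,
                uint8_t adaptive_opcode)
    {
        if (*counter > COUNTER_MIN) {
            (*counter)--;
        }
        else {
            *opcode_slot = adaptive_opcode;  /* de-optimize in place */
        }
    }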
Ancillary data
--------------

Most families of specialized instructions will require more information than
can fit in an 8-bit operand. To provide this, an array of specialization data
entries will be maintained alongside the new instruction array. For
instructions that need specialization data, the operand in the quickened
array will serve as a partial index, along with the offset of the
instruction, to find the first specialization data entry for that
instruction.
Each entry will be 8 bytes (for a 64 bit machine). The data in an entry,
and the number of entries needed, will vary from instruction to instruction.
Data layout
-----------

Quickened instructions will be stored in an array (it is neither necessary nor
desirable to store them in a Python object) with the same format as the
original bytecode. Ancillary data will be stored in a separate array.

Each instruction will use 0 or more data entries.
Each instruction within a family must have the same amount of data allocated,
although some instructions may not use all of it.
Instructions that cannot be specialized, e.g. ``POP_TOP``,
do not need any entries.
Experiments show that 25% to 30% of instructions can be usefully specialized.
Different families will need different amounts of data,
but most need 2 entries (16 bytes on a 64 bit machine).

In order to support functions larger than 256 instructions,
we compute the offset of the first data entry for an instruction
as ``(instruction offset)//2 + (quickened operand)``.
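
A sketch of that lookup, under one plausible reading: instruction offsets
are indices into the quickened array, and since on average at most one data
entry is consumed per two instructions, halving the offset tracks the
running total while the 8-bit operand corrects the difference for each
particular instruction::

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint64_t payload;  /* 8 bytes per entry on a 64 bit machine */
    } spec_data_entry;

    /* Compute the first data entry for the instruction at instr_offset,
       combining the halved offset with the quickened operand. */
    static spec_data_entry *
    first_data_entry(spec_data_entry *data_array,
                     size_t instr_offset, uint8_t quickened_operand)
    {
        return &data_array[instr_offset / 2 + quickened_operand];
    }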
Compared to the opcache in Python 3.10, this design:

* is faster; it requires no memory reads to compute the offset.
  3.10 requires two reads, which are dependent.
* uses much less memory, as the data can be different sizes for different
  instruction families, and doesn't need an additional array of offsets.
* can support much larger functions, up to about 5000 instructions
  per function. 3.10 can support about 1000.
Example families of instructions

@@ -170,64 +222,86 @@ Example families of instructions

CALL_FUNCTION
'''''''''''''

The ``CALL_FUNCTION`` instruction calls the (N+1)th item on the stack with
the top N items on the stack as arguments.
This is an obvious candidate for specialization. For example, the call in
``len(x)`` is represented as the bytecode ``CALL_FUNCTION 1``.
In this case we would always expect the object ``len`` to be the function.
We probably don't want to specialize for ``len``
(although we might for ``type`` and ``isinstance``), but it would be beneficial
to specialize for builtin functions taking a single argument.
A fast check that the underlying function is a builtin function taking a single
argument (``METHOD_O``) would allow us to avoid a sequence of checks for the
number of parameters and keyword arguments.

``CALL_FUNCTION_ADAPTIVE`` would track how often it is executed, and call
``call_function_optimize`` when executed enough times, or jump to
``CALL_FUNCTION`` otherwise. When optimizing, the kind of the function would
be checked and, if a suitable specialized instruction was found,
it would replace ``CALL_FUNCTION_ADAPTIVE`` in place.
Specializations might include:

* ``CALL_FUNCTION_PY_SIMPLE``: Calls to Python functions with
  exactly matching parameters.
* ``CALL_FUNCTION_PY_DEFAULTS``: Calls to Python functions with more
  parameters and default values. Since the exact number of defaults needed is
  known, the instruction needs to do no additional checking or computation;
  just copy some defaults.
* ``CALL_BUILTIN_O``: The example given above for calling builtin methods
  taking exactly one argument; see the sketch below.
* ``CALL_BUILTIN_VECTOR``: For calling builtin functions taking
  vector arguments.
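
A sketch of the ``CALL_BUILTIN_O`` fast path; ``METH_O`` and the
``PyCFunction_GET_*`` macros are CPython's real C API, and this assumes the
``METHOD_O`` check described above corresponds to the ``METH_O`` flag::

    #include <Python.h>

    /* Hypothetical fast path: one flag test replaces the generic
       sequence of parameter-count and keyword-argument checks. */
    static PyObject *
    call_builtin_o(PyObject *callable, PyObject *arg)
    {
        if (PyCFunction_Check(callable) &&
            PyCFunction_GET_FLAGS(callable) == METH_O)
        {
            return PyCFunction_GET_FUNCTION(callable)(
                PyCFunction_GET_SELF(callable), arg);
        }
        return NULL;  /* miss: fall back to the generic call and
                         decrement the saturating counter */
    }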
Note how this allows optimizations that complement other optimizations.
For example, if the Python and C call stacks were decoupled and the data stack
were contiguous, then Python-to-Python calls could be made very fast.

LOAD_GLOBAL
'''''''''''

The ``LOAD_GLOBAL`` instruction looks up a name in the global namespace
and then, if not present in the global namespace,
looks it up in the builtins namespace.
In 3.9 the C code for ``LOAD_GLOBAL`` includes code to check whether the
whole code object should be modified to add a cache, checks on both the
global and builtins namespaces, code to look up the value in a cache, and
fallback code.
This makes it complicated and bulky.
It also performs many redundant operations even when supposedly optimized.

Using a family of instructions makes the code more maintainable and faster,
as each instruction only needs to handle one concern.
Specializations would include:

* ``LOAD_GLOBAL_ADAPTIVE`` would operate like ``CALL_FUNCTION_ADAPTIVE`` above.
* ``LOAD_GLOBAL_MODULE`` can be specialized for the case where the value is in
  the globals namespace. After checking that the keys of the namespace have
  not changed, it can load the value from the stored index; a sketch follows
  this list.
* ``LOAD_GLOBAL_BUILTIN`` can be specialized for the case where the value is
  in the builtins namespace. It needs to check that the keys of the global
  namespace have not been added to, and that the builtins namespace has not
  changed. Note that we don't care if the values of the global namespace
  have changed, just the keys.
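
A sketch of the ``LOAD_GLOBAL_MODULE`` fast path; the cache layout and the
dict accessors are hypothetical stand-ins for whatever the implementation
would actually use::

    #include <Python.h>
    #include <stdint.h>

    /* Hypothetical ancillary data for LOAD_GLOBAL_MODULE. */
    typedef struct {
        uint64_t keys_version;  /* version of globals' keys at cache time */
        uint16_t index;         /* slot the value was found at */
    } load_global_cache;

    /* Stand-ins for internal dict accessors (hypothetical). */
    static uint64_t dict_keys_version(PyObject *dict);
    static PyObject *dict_entry_value(PyObject *dict, uint16_t index);

    static PyObject *
    load_global_module(PyObject *globals, load_global_cache *cache)
    {
        if (dict_keys_version(globals) == cache->keys_version) {
            /* The keys are unchanged, so the stored index is still
               valid; the value is re-read from the dict, so changed
               values cost nothing. */
            return dict_entry_value(globals, cache->index);
        }
        return NULL;  /* miss: decrement the counter, run generic path */
    }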
See [4]_ for a full implementation.

.. note::

   This PEP outlines the mechanisms for managing specialization, and does not
   specify the particular optimizations to be applied.
   The above scheme is just one possible scheme.
   Many others are possible and may well be better.

Compatibility
=============

There will be no change to the language, library or API.

The only way that users will be able to detect the presence of the new
interpreter is through timing execution, the use of debugging tools,
or measuring memory use.

Costs

@@ -236,13 +310,14 @@ Costs
Memory use
----------

An obvious concern with any scheme that performs any sort of caching is
"how much more memory does it use?".
The short answer is "none".

Comparing memory use to 3.10
''''''''''''''''''''''''''''

The following table shows the additional bytes per instruction to support the
3.10 opcache or the proposed adaptive interpreter, on a 64 bit machine.

================ ===== ======== ===== =====
Version          3.10  3.10 opt 3.11  3.11

@@ -256,10 +331,13 @@ or the proposed adaptive interpreter, on a 64 bit machine.
================ ===== ======== ===== =====

``3.10`` is the current version of 3.10, which uses 32 bytes per entry.
``3.10 opt`` is a hypothetical improved version of 3.10 that uses 24 bytes
per entry.

Even if one third of all instructions were specialized (a high proportion),
the memory use is still less than that of 3.10. With a more realistic 25%,
memory use is basically the same as the hypothetical improved version of 3.10.
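
As a rough, illustrative calculation (assuming each quickened instruction
costs 2 extra bytes, each specialized instruction uses the typical 2 data
entries of 16 bytes, and the 3.10 opcache covers a similar fraction of
instructions): at one third specialized, 3.11 adds about 2 + 16/3, roughly
7.3 bytes per instruction, versus 32/3, roughly 10.7, for 3.10; at 25% it
adds 2 + 4 = 6 bytes, matching the 0.25 * 24 = 6 bytes of the improved 3.10.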
Security Implications

@@ -277,7 +355,8 @@ Too many to list.

References
==========

.. [1] The construction of high-performance virtual machines for
   dynamic languages, Mark Shannon 2010.
   http://theses.gla.ac.uk/2975/1/2011shannonphd.pdf

.. [2] Dynamic Interpretation for Dynamic Scripting Languages

@@ -286,7 +365,8 @@ References
.. [3] Inline Caching meets Quickening
   https://www.unibw.de/ucsrl/pubs/ecoop10.pdf/view

.. [4] Adaptive specializing examples
   (This will be moved to a more permanent location, once this PEP is accepted)
   https://gist.github.com/markshannon/556ccc0e99517c25a70e2fe551917c03