PEP 683: Immortal Objects v3 (#2372)

This is mostly changes in response to https://mail.python.org/archives/list/python-dev@python.org/thread/KDAR6CCMPOX36GQJUDWHQBKRD5USNV3B/.  Also, we increase the focus on the immutability of per-object runtime state.
Eric Snow 2022-02-28 17:55:07 -07:00 committed by GitHub
parent 07e537864a
commit 0a6375d4e3
1 changed files with 223 additions and 104 deletions


@ -7,7 +7,7 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History: 15-Feb-2022
Post-History: 15-Feb-2022, 19-Feb-2022, 28-Feb-2022
Resolution:
@ -19,29 +19,64 @@ Currently the CPython runtime maintains a
allocated memory of each object. Because of this, otherwise immutable
objects are actually mutable. This can have a large negative impact
on CPU and memory performance, especially for approaches to increasing
Python's scalability. The solution proposed here provides a way
to mark an object as one for which that per-object
runtime state should not change.
Python's scalability.
Specifically, if an object's refcount matches a very specific value
(defined below) then that object is treated as "immortal". If an object
is immortal then its refcount will never be modified by ``Py_INCREF()``,
etc. Consequently, the refcount will never reach 0, so that object will
never be cleaned up (unless explicitly done, e.g. during runtime
finalization). Additionally, all other per-object runtime state
for an immortal object will be considered immutable.
This proposal mandates that, internally, CPython will support marking
an object as one for which that runtime state will no longer change.
Consequently, such an object's refcount will never reach 0, and so the
object will never be cleaned up. We call these objects "immortal".
(Normally, only a relatively small number of internal objects
will ever be immortal.) The fundamental improvement here
is that now an object can be truly immutable.
This approach has some possible negative impact, which is explained
below, along with mitigations. A critical requirement for this change
is that the performance regression be no more than 2-3%. Anything worse
than performance-neutral requires that the other benefits are proportionally
large. Aside from specific applications, the fundamental improvement
here is that now an object can be truly immutable.
Scope
-----
(This proposal is meant to be CPython-specific and to affect only
internal implementation details. There are some slight exceptions
to that which are explained below. See `Backward Compatibility`_,
`Public Refcount Details`_, and `scope`_.)
Object immortality is meant to be an internal-only feature. So this
proposal does not include any changes to public API or behavior
(with one exception). As usual, we may still add some private
(yet publicly accessible) API to do things like immortalize an object
or tell if one is immortal. Any effort to expose this feature to users
would need to be proposed separately.
There is one exception to "no change in behavior": refcounting semantics
for immortal objects will differ in some cases from user expectations.
This exception, and the solution, are discussed below.
Most of this PEP focuses on an internal implementation that satisfies
the above mandate. However, those implementation details are not meant
to be strictly prescriptive. Instead, at the least they are included
to help illustrate the technical considerations required by the mandate.
The actual implementation may deviate somewhat as long as it satisfies
the constraints outlined below. Furthermore, the acceptability of any
specific implementation detail described below does not depend on
the status of this PEP, unless explicitly specified.
For example, the particular details of:
* how to mark something as immortal
* how to recognize something as immortal
* which subset of functionally immortal objects are marked as immortal
* which memory-management activities are skipped or modified for immortal objects
are not only CPython-specific but are also private implementation
details that are expected to change in subsequent versions.
Implementation Summary
----------------------
Here's a high-level look at the implementation:
If an object's refcount matches a very specific value (defined below)
then that object is treated as immortal. The CPython C-API and runtime
will not modify the refcount (or other runtime state) of an immortal
object.
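The summary above can be sketched as a toy model. This is only an illustration under stated assumptions: ``toy_object``, ``TOY_IMMORTAL_BIT``, and the ``toy_*`` functions are hypothetical names invented here, the bit position (2^60) is an example value, and none of this is CPython's actual API or implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not CPython's real API. */
#define TOY_IMMORTAL_BIT    ((int64_t)1 << 60)
#define TOY_IMMORTAL_REFCNT (TOY_IMMORTAL_BIT + (TOY_IMMORTAL_BIT / 2))

typedef struct {
    int64_t refcnt;   /* plays the role of ob_refcnt */
} toy_object;

/* An object counts as immortal when the marker bit is set in its refcount. */
static int toy_is_immortal(const toy_object *op) {
    return (op->refcnt & TOY_IMMORTAL_BIT) != 0;
}

/* Mark an object as immortal by storing the special refcount value. */
static void toy_immortalize(toy_object *op) {
    op->refcnt = TOY_IMMORTAL_REFCNT;
}

/* Incref leaves the refcount of an immortal object untouched. */
static void toy_incref(toy_object *op) {
    if (toy_is_immortal(op)) {
        return;
    }
    op->refcnt++;
}

/* Decref leaves the refcount of an immortal object untouched.
   Returns 1 when the count reaches 0 (where a real runtime would
   deallocate the object), otherwise 0. */
static int toy_decref(toy_object *op) {
    if (toy_is_immortal(op)) {
        return 0;
    }
    assert(op->refcnt > 0);
    op->refcnt--;
    return op->refcnt == 0;
}
```

In this model the immortal object's memory is never written by incref or decref, which is the property the copy-on-write and per-interpreter use cases depend on.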
Aside from the change to refcounting semantics, there is one other
possible negative impact to consider. A naive implementation of the
approach described below makes CPython roughly 4% slower. However,
the implementation is performance-neutral once known mitigations
are applied.
Motivation
@ -153,7 +188,7 @@ Impact
Benefits
--------
Most notably, the cases described in the two examples above stand
Most notably, the cases described in the above examples stand
to benefit greatly from immortal objects. Projects using pre-fork
can drop their workarounds. For the per-interpreter GIL project,
immortal objects greatly simplify the solution for existing static
@ -167,10 +202,9 @@ usage. This is reflected in most of the above cases.
Performance
-----------
A naive implementation shows `a 4% slowdown`_.
Several promising mitigation strategies will be pursued in the effort
to bring it closer to performance-neutral. See the `mitigation`_
section below.
A naive implementation shows `a 4% slowdown`_. We have demonstrated
a return to performance-neutral with a handful of basic mitigations
applied. See the `mitigation`_ section below.
On the positive side, immortal objects save a significant amount of
memory when used with a pre-fork model. Also, immortal objects provide
@ -182,59 +216,52 @@ performance.
Backward Compatibility
----------------------
This proposal is meant to be completely compatible. It focuses strictly
on internal implementation details. It does not involve changes to any
public API, other than a few minor changes in behavior related to refcounts
(but only for immortal objects):
Ideally this internal-only feature would be completely compatible.
However, it does involve a change to refcount semantics in some cases.
Only immortal objects are affected, but this includes high-use objects
like ``None``, ``True``, and ``False``.
Specifically, when an immortal object is involved:
* code that inspects the refcount will see a really, really large value
* the new noop behavior may break code that:
* depends specifically on the refcount to always increment or decrement
(or have a specific value from ``Py_SET_REFCNT()``)
* relies on any specific refcount value, other than 0
* relies on any specific refcount value, other than 0 or 1
* directly manipulates the refcount to store extra information there
* in 32-bit pre-3.11 `Stable ABI`_ extensions,
objects may leak due to `Accidental Immortality`_
* such extensions may crash due to `Accidental De-Immortalizing`_
Again, those changes in behavior only apply to immortal objects, not
most of the objects a user will access. Furthermore, users cannot mark
an object as immortal so no user-created objects will ever have that
changed behavior. Users that rely on any of the changing behavior for
global (builtin) objects are already in trouble.
global (builtin) objects are already in trouble. So the overall impact
should be small.
Also note that code which checks for refleaks should keep working fine,
unless it checks for hard-coded small values relative to some immortal
object. The problems noticed by `Pyston`_ shouldn't apply here since
we do not modify the refcount.
See `Public Refcount Details`_ and `scope`_ below for further discussion.
Stable ABI
----------
The approach is also compatible with extensions compiled to the stable
ABI. Unfortunately, they will modify the refcount and invalidate all
the performance benefits of immortal objects. However, the high bit
of the refcount `will still match _Py_IMMORTAL_REFCNT <_Py_IMMORTAL_REFCNT_>`_
so we can still identify such objects as immortal. At worst, objects
in that situation would feel the effects described in the `Motivation`_
section. Even then the overall impact is unlikely to be significant.
Also see `_Py_IMMORTAL_REFCNT`_ below.
See `Public Refcount Details`_ below for further discussion.
Accidental Immortality
----------------------
''''''''''''''''''''''
Hypothetically, a regular object could be incref'ed so much that it
reaches the magic value needed to be considered immortal. That means
it would accidentally never be cleaned up (by going back to 0).
Hypothetically, a non-immortal object could be incref'ed so much
that it reaches the magic value needed to be considered immortal.
That means it would accidentally never be cleaned up
(by going back to 0).
While it isn't impossible, this accidental scenario is so unlikely
that we need not worry. Even if done deliberately by using
``Py_INCREF()`` in a tight loop and each iteration only took 1 CPU
cycle, it would take 2^61 cycles (on a 64-bit processor). At a fast
5 GHz that would still take nearly 500,000,000 seconds (over 5,000 days)!
If that CPU were 32-bit then it is (technically) more possible though
still highly unlikely.
On 64-bit builds, this accidental scenario is so unlikely that we need
not worry. Even if done deliberately by using ``Py_INCREF()`` in a
tight loop and each iteration only took 1 CPU cycle, it would take
2^60 cycles (if the immortal bit were 2^60). At a fast 5 GHz that would
still take nearly 250,000,000 seconds (over 2,500 days)!
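The 64-bit estimate can be checked with a few lines of arithmetic. The helper names below are hypothetical, and the inputs (an immortal bit of 2^60, one incref per cycle, a 5 GHz clock) are the illustrative assumptions used in the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed to perform 2^bit increfs at one incref per CPU cycle.
   Both parameters are illustrative assumptions taken from the text. */
static double seconds_to_reach_bit(unsigned bit, double clock_hz) {
    assert(bit < 64 && clock_hz > 0.0);
    uint64_t increfs = (uint64_t)1 << bit;
    return (double)increfs / clock_hz;
}

/* Convert seconds to days (86,400 seconds per day). */
static double seconds_to_days(double seconds) {
    return seconds / 86400.0;
}
```

Calling ``seconds_to_reach_bit(60, 5e9)`` gives a value a little above 2.3 × 10^8 seconds, and ``seconds_to_days()`` turns that into more than 2,500 days, in line with the figures quoted above.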
Also note that it is doubly unlikely to be a problem because it wouldn't
matter until the refcount got back to 0 and the object was cleaned up.
@ -245,9 +272,106 @@ would be noticed.
Again, the only realistic way that the magic refcount would be reached
(and then reversed) is if it were done deliberately. (Of course, the
same thing could be done efficiently using ``Py_SET_REFCNT()`` though
that would be even less of an accident.) At that point we don't
that would be even less of an accident.) At that point we don't
consider it a concern of this proposal.
On 32-bit builds it isn't so obvious. Let's say the magic refcount
were 2^30. Using the same specs as above, it would take roughly
0.2 seconds to accidentally immortalize an object. Under reasonable
conditions, it is still highly unlikely that an object would be accidentally
immortalized. It would have to meet these criteria:
* targeting a non-immortal object (so not one of the high-use builtins)
* the extension increfs without a corresponding decref
(e.g. returns from a function or method)
* no other code decrefs the object in the meantime
Even at a much less frequent rate of increfs it would not take long to reach
accidental immortality (on 32-bit). However, then it would have to run
through the same number of (now noop-ing) decrefs before that one object
would be effectively leaking. This is highly unlikely, especially because
the calculations assume no decrefs.
Furthermore, this isn't all that different from how such 32-bit extensions
can already incref an object past 2^31 and turn the refcount negative.
If that were an actual problem then we would have heard about it.
Between all of the above cases, the proposal doesn't consider
accidental immortality a problem.
Stable ABI
''''''''''
The implementation approach described in this PEP is compatible
with extensions compiled to the stable ABI (with the exception
of `Accidental Immortality`_ and `Accidental De-Immortalizing`_).
Due to the nature of the stable ABI, unfortunately, such extensions
use versions of ``Py_INCREF()``, etc. that directly modify the object's
``ob_refcnt`` field. This will invalidate all the performance benefits
of immortal objects.
However, we do ensure that immortal objects (mostly) stay immortal
in that situation. We set the initial refcount of immortal objects to
a value high above the magic refcount value, but one that still matches
the high bit. Thus we can still identify such objects as immortal.
(See `_Py_IMMORTAL_REFCNT`_.) At worst, objects in that situation
would feel the effects described in the `Motivation`_ section.
Even then the overall impact is unlikely to be significant.
Accidental De-Immortalizing
'''''''''''''''''''''''''''
32-bit builds of older stable ABI extensions can take `Accidental Immortality`_
to the next level.
Hypothetically, such an extension could incref an object to a value on
the next highest bit above the magic refcount value. For example, if
the magic value were 2^30 and the initial immortal refcount were thus
2^30 + 2^29 then it would take 2^29 increfs by the extension to reach
a value of 2^31, making the object non-immortal.
(Of course, a refcount that high would probably already cause a crash,
regardless of immortal objects.)
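The numbers in this example can be verified with a small model. The names below are hypothetical and the values (magic bit 2^30, initial refcount 2^30 + 2^29) are the ones assumed in the paragraph above. Unsigned 32-bit arithmetic is used here purely so the sketch avoids signed-overflow undefined behavior; a real ``Py_ssize_t`` refcount is signed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit values matching the example in the text. */
#define TOY32_IMMORTAL_BIT    ((uint32_t)1 << 30)
#define TOY32_IMMORTAL_REFCNT (TOY32_IMMORTAL_BIT + (TOY32_IMMORTAL_BIT / 2))

/* The immortality test looks only at the marker bit. */
static int toy32_is_immortal(uint32_t refcnt) {
    return (refcnt & TOY32_IMMORTAL_BIT) != 0;
}

/* Model an older extension that changes the raw field directly,
   bypassing any immortality check. */
static uint32_t toy32_raw_incref(uint32_t refcnt, uint32_t times) {
    assert(times <= UINT32_MAX - refcnt);
    return refcnt + times;
}
```

Starting from ``TOY32_IMMORTAL_REFCNT``, applying 2^29 - 1 raw increfs still leaves the marker bit set, while exactly 2^29 raw increfs yields 2^31, where the marker bit is clear and the object no longer tests as immortal.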
The more problematic case is where such a 32-bit stable ABI extension
goes crazy decref'ing an already immortal object. Continuing with the
above example, it would take 2^29 asymmetric decrefs to drop below the
magic immortal refcount value. So an object like ``None`` could be
made mortal and subject to decref. That still wouldn't be a problem
until somehow the decrefs continue on that object until it reaches 0.
For many immortal objects, like ``None``, the extension will crash
the process if it tries to dealloc the object. For the other
immortal objects, the dealloc might be okay. However, there will
be runtime code expecting the formerly-immortal object to be around
forever. That code will probably crash.
Again, the likelihood of this happening is extremely small, even on
32-bit builds. It would require roughly a billion decrefs on that
one object without a corresponding incref. The most likely scenario is
the following:
A "new" reference to ``None`` is returned by many functions and methods.
Unlike with non-immortal objects, the 3.11 runtime will almost never
incref ``None`` before giving it to the extension. However, the
extension *will* decref it when done with it (unless it returns it).
Each time that exchange happens with the one object, we get one step
closer to a crash.
How realistic is it that some form of that exchange (with a single
object) will happen a billion times in the lifetime of a Python process
on 32-bit? If it is a problem, how could it be addressed?
As to how realistic, the answer isn't clear currently. However, the
mitigation is simple enough that we can safely proceed under the
assumption that it would be a problem.
Here are some possible solutions (only needed on 32-bit):
* periodically reset the refcount for immortal objects
(only enable this if a stable ABI extension is imported?)
* special-case immortal objects in tp_dealloc() for the relevant types
(but not int, due to frequency?)
* provide a runtime flag for disabling immortality
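As a rough illustration of the first option in the list above, a periodic reset could look something like the following. This is a hypothetical sketch, not a proposed implementation: the function name, the flat array of refcounts, and the 32-bit constants are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit values; see the example in the text above. */
#define TOY32_IMMORTAL_BIT    ((uint32_t)1 << 30)
#define TOY32_IMMORTAL_REFCNT (TOY32_IMMORTAL_BIT + (TOY32_IMMORTAL_BIT / 2))

/* Restore any refcount that still carries the marker bit but has
   drifted away from the initial immortal value. Returns the number
   of entries that were reset. Non-immortal entries are left alone. */
static size_t toy32_reset_immortal_refcounts(uint32_t *refcnts, size_t n) {
    assert(refcnts != NULL || n == 0);
    size_t reset = 0;
    for (size_t i = 0; i < n; i++) {
        if ((refcnts[i] & TOY32_IMMORTAL_BIT) != 0
                && refcnts[i] != TOY32_IMMORTAL_REFCNT) {
            refcnts[i] = TOY32_IMMORTAL_REFCNT;
            reset++;
        }
    }
    return reset;
}
```

Such a reset only helps if it runs before the drift exceeds 2^29 in either direction, since after that point the marker bit is already lost and the object can no longer be recognized as immortal.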
Alternate Python Implementations
--------------------------------
@ -318,8 +442,10 @@ to the following questions:
As part of this proposal, we must make sure that users can clearly
understand on which parts of the refcount behavior they can rely and
which are considered implementation details. Specifically, they should
use the existing public refcount-related API and the only refcount value
with any meaning is 0. All other values are considered "not 0".
use the existing public refcount-related API and the only refcount
values with any meaning are 0 and 1. (Some code relies on 1 as an
indicator that the object can be safely modified.) All other values
are considered "not 0 or 1".
This information will be clarified in the `documentation <Documentation_>`_.
@ -343,27 +469,15 @@ Constraints
* be careful when immortalizing objects that we don't actually expect
to persist until runtime finalization.
* be careful when immortalizing objects that are not otherwise immutable
* ``__del__`` and weakrefs must continue working properly
.. _scope:
Scope of Changes
----------------
Object immortality is not meant to be a public feature but rather an
internal one. So the proposal does *not* include adding any new
public C-API, nor any Python API. However, this does not prevent
us from adding (publicly accessible) private API to do things
like immortalize an object or tell if one is immortal.
The particular details of:
* how to mark something as immortal
* how to recognize something as immortal
* which subset of functionally immortal objects are marked as immortal
* which memory-management activities are skipped or modified for immortal objects
are not only Cpython-specific but are also private implementation
details that are expected to change in subsequent versions.
Regarding "truly" immutable objects, this PEP doesn't impact the
effective immutability of any objects, other than the per-object
runtime state (e.g. refcount). So whether or not some immortal object
is truly (or even effectively) immutable can only be settled separately
from this proposal. For example, str objects are generally considered
immutable, but ``PyUnicodeObject`` holds some lazily cached data. This
PEP has no influence on how that state affects str immutability.
Immortal Mutable Objects
------------------------
@ -390,9 +504,6 @@ it immortal, we no longer incur the extra overhead during incref/decref.
We explore this idea further in the `mitigation`_ section below.
(Note that we are still investigating the impact on GC
of immortalizing containers.)
Implicitly Immortal Objects
---------------------------
@ -437,14 +548,18 @@ _Py_IMMORTAL_REFCNT
We will add two internal constants::
#define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
#define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))
_Py_IMMORTAL_BIT - has the top-most available bit set (e.g. 2^62)
_Py_IMMORTAL_REFCNT - has the two top-most available bits set
The refcount for immortal objects will be set to ``_Py_IMMORTAL_REFCNT``.
However, to check if an object is immortal we will compare its refcount
against just the bit::
The actual top-most bit depends on existing uses for refcount bits,
e.g. the sign bit or some GC uses. We will use the highest bit possible
after consideration of existing uses.
(op->ob_refcnt & _Py_IMMORTAL_BIT) != 0
The refcount for immortal objects will be set to ``_Py_IMMORTAL_REFCNT``
(meaning the value will be halfway between ``_Py_IMMORTAL_BIT`` and the
value at the next highest bit). However, to check if an object is
immortal we will compare (bitwise-and) its refcount against just
``_Py_IMMORTAL_BIT``.
The difference means that an immortal object will still be considered
immortal, even if somehow its refcount were modified (e.g. by an older
@ -471,24 +586,21 @@ API that exposes refcounts (unchanged but may now return large values):
(Note that ``_Py_RefTotal`` and ``sys.gettotalrefcount()``
will not be affected.)
Also, immortal objects will not participate in GC.
Immortal Global Objects
-----------------------
All objects that we expect to be shared globally (between interpreters)
will be made immortal. That includes the following:
All runtime-global (builtin) objects will be made immortal.
That includes the following:
* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
small ints)
All such objects will be immutable. In the case of the static types,
they will only be effectively immutable. ``PyTypeObject`` has some mutable
state (``tp_dict`` and ``tp_subclasses``), but we can work around this
by storing that state on ``PyInterpreterState`` instead of on the
respective static type object. Then the ``__dict__``, etc. getter
will do a lookup on the current interpreter, if appropriate, instead
of using ``tp_dict``.
The question of making them actually immutable (e.g. for
per-interpreter GIL) is not in the scope of this PEP.
Object Cleanup
--------------
@ -515,6 +627,8 @@ accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or
``PyInterpreterState`` field. We may need to add a tracking mechanism
to the runtime state for a small number of objects.
None of the cleanup will have a significant effect on performance.
.. _mitigation:
Performance Regression Mitigation
@ -557,11 +671,18 @@ However, we will update the documentation to make public guarantees
about refcount behavior more clear. That includes, specifically:
* ``Py_INCREF()`` - change "Increment the reference count for object o."
to "Acquire a new reference to object o."
to "Indicate taking a new reference to object o."
* ``Py_DECREF()`` - change "Decrement the reference count for object o."
to "Release a reference to object o."
to "Indicate no longer using a previously taken reference to object o."
* similar for ``Py_XINCREF()``, ``Py_XDECREF()``, ``Py_NewRef()``,
``Py_XNewRef()``, ``Py_Clear()``, ``Py_REFCNT()``, and ``Py_SET_REFCNT()``
``Py_XNewRef()``, ``Py_Clear()``
* ``Py_REFCNT()`` - add "The refcounts 0 and 1 have specific meanings
and all others only mean code somewhere is using the object,
regardless of the value.
0 means the object is not used and will be cleaned up.
1 means code holds exactly a single reference."
* ``Py_SET_REFCNT()`` - refer to ``Py_REFCNT()`` about how values over 1
may be substituted with some other value
We *may* also add a note about immortal objects to the following,
to help reduce any surprise users may have with the change:
@ -586,9 +707,7 @@ https://github.com/python/cpython/pull/19474
Open Issues
===========
* is there any other impact on GC?
* `are the copy-on-write benefits real? <https://mail.python.org/archives/list/python-dev@python.org/message/J53GY7XKFOI4KWHSTTA7FUL7TJLE7WG6/>`__
* must the fate of this PEP be tied to acceptance of a per-interpreter GIL PEP?
* how realistic is the `Accidental De-Immortalizing`_ concern?
References