921 lines
38 KiB
ReStructuredText
921 lines
38 KiB
ReStructuredText
PEP: 683
|
|
Title: Immortal Objects, Using a Fixed Refcount
|
|
Author: Eric Snow <ericsnowcurrently@gmail.com>, Eddie Elizondo <eduardo.elizondorueda@gmail.com>
|
|
Discussions-To: https://discuss.python.org/t/18183
|
|
Status: Accepted
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 10-Feb-2022
|
|
Python-Version: 3.12
|
|
Post-History: `16-Feb-2022 <https://mail.python.org/archives/list/python-dev@python.org/thread/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/>`__,
|
|
`19-Feb-2022 <https://mail.python.org/archives/list/python-dev@python.org/thread/KDAR6CCMPOX36GQJUDWHQBKRD5USNV3B/>`__,
|
|
`28-Feb-2022 <https://mail.python.org/archives/list/python-dev@python.org/thread/MI22URMVKC63OFMZTALHFZKAKVGAT4UF/>`__,
|
|
`12-Aug-2022 <https://discuss.python.org/t/18183>`__,
|
|
Resolution: https://discuss.python.org/t/18183/26
|
|
|
|
|
|
PEP Acceptance Conditions
|
|
=========================
|
|
|
|
The PEP was accepted with conditions:
|
|
|
|
* we must apply the primary proposal
|
|
in `Solutions for Accidental De-Immortalization`_
|
|
(reset the immortal refcount in ``tp_dealloc()``)
|
|
* types without this may not be immortalized (in CPython's code)
|
|
* the PEP must be updated with final benchmark results once
|
|
the implementation is finalized
|
|
* we will have one last round of discussion about those results at that point
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
Currently the CPython runtime maintains a
|
|
`small amount of mutable state <Runtime Object State_>`_ in the
|
|
allocated memory of each object. Because of this, otherwise immutable
|
|
objects are actually mutable. This can have a large negative impact
|
|
on CPU and memory performance, especially for approaches to increasing
|
|
Python's scalability.
|
|
|
|
This proposal mandates that, internally, CPython will support marking
|
|
an object as one for which that runtime state will no longer change.
|
|
Consequently, such an object's refcount will never reach 0, and thus
|
|
the object will never be cleaned up (except when the runtime knows
|
|
it's safe to do so, like during runtime finalization).
|
|
We call these objects "immortal". (Normally, only a relatively small
|
|
number of internal objects will ever be immortal.)
|
|
The fundamental improvement here is that now an object
|
|
can be truly immutable.
|
|
|
|
Scope
|
|
-----
|
|
|
|
Object immortality is meant to be an internal-only feature, so this
|
|
proposal does not include any changes to public API or behavior
|
|
(with one exception). As usual, we may still add some private
|
|
(yet publicly accessible) API to do things like immortalize an object
|
|
or tell if one is immortal. Any effort to expose this feature to users
|
|
would need to be proposed separately.
|
|
|
|
There is one exception to "no change in behavior": refcounting semantics
|
|
for immortal objects will differ in some cases from user expectations.
|
|
This exception, and the solution, are discussed below.
|
|
|
|
Most of this PEP focuses on an internal implementation that satisfies
|
|
the above mandate. However, those implementation details are not meant
|
|
to be strictly proscriptive. Instead, at the least they are included
|
|
to help illustrate the technical considerations required by the mandate.
|
|
The actual implementation may deviate somewhat as long as it satisfies
|
|
the constraints outlined below. Furthermore, the acceptability of any
|
|
specific implementation detail described below does not depend on
|
|
the status of this PEP, unless explicitly specified.
|
|
|
|
For example, the particular details of:
|
|
|
|
* how to mark something as immortal
|
|
* how to recognize something as immortal
|
|
* which subset of functionally immortal objects are marked as immortal
|
|
* which memory-management activities are skipped or modified for immortal objects
|
|
|
|
are not only CPython-specific but are also private implementation
|
|
details that are expected to change in subsequent versions.
|
|
|
|
Implementation Summary
|
|
----------------------
|
|
|
|
Here's a high-level look at the implementation:
|
|
|
|
If an object's refcount matches a very specific value (defined below)
|
|
then that object is treated as immortal. The CPython C-API and runtime
|
|
will not modify the refcount (or other runtime state) of an immortal
|
|
object. The runtime will now be explicitly responsible for deallocating
|
|
all immortal objects during finalization, unless statically allocated.
|
|
(See `Object Cleanup`_ below.)
|
|
|
|
Aside from the change to refcounting semantics, there is one other
|
|
possible negative impact to consider. The threshold for an "acceptable"
|
|
performance penalty for immortal objects is 2% (the consensus at the
|
|
2022 Language Summit). A naive implementation of the approach described
|
|
below makes CPython roughly 4% slower. However, the implementation
|
|
is ~performance-neutral~ once known mitigations are applied.
|
|
|
|
TODO: Update the performance impact for the latest branch
|
|
(both for GCC and for clang).
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
As noted above, currently all objects are effectively mutable. That
|
|
includes "immutable" objects like ``str`` instances. This is because
|
|
every object's refcount is frequently modified as the object is used
|
|
during execution. This is especially significant for a number of
|
|
commonly used global (builtin) objects, e.g. ``None``. Such objects
|
|
are used a lot, both in Python code and internally. That adds up to
|
|
a consistent high volume of refcount changes.
|
|
|
|
The effective mutability of all Python objects has a concrete impact
|
|
on parts of the Python community, e.g. projects that aim for
|
|
scalability like Instagram or the effort to make the GIL
|
|
per-interpreter. Below we describe several ways in which refcount
|
|
modification has a real negative effect on such projects.
|
|
None of that would happen for objects that are truly immutable.
|
|
|
|
Reducing CPU Cache Invalidation
|
|
-------------------------------
|
|
|
|
Every modification of a refcount causes the corresponding CPU cache
|
|
line to be invalidated. This has a number of effects.
|
|
|
|
For one, the write must be propagated to other cache levels
|
|
and to main memory. This has small effect on all Python programs.
|
|
Immortal objects would provide a slight relief in that regard.
|
|
|
|
On top of that, multi-core applications pay a price. If two threads
|
|
(running simultaneously on distinct cores) are interacting with the
|
|
same object (e.g. ``None``) then they will end up invalidating each
|
|
other's caches with each incref and decref. This is true even for
|
|
otherwise immutable objects like ``True``, ``0``, and ``str`` instances.
|
|
CPython's GIL helps reduce this effect, since only one thread runs at a
|
|
time, but it doesn't completely eliminate the penalty.
|
|
|
|
Avoiding Data Races
|
|
-------------------
|
|
|
|
Speaking of multi-core, we are considering making the GIL
|
|
a per-interpreter lock, which would enable true multi-core parallelism.
|
|
Among other things, the GIL currently protects against races between
|
|
multiple concurrent threads that may incref or decref the same object.
|
|
Without a shared GIL, two running interpreters could not safely share
|
|
any objects, even otherwise immutable ones like ``None``.
|
|
|
|
This means that, to have a per-interpreter GIL, each interpreter must
|
|
have its own copy of *every* object. That includes the singletons and
|
|
static types. We have a viable strategy for that but it will require
|
|
a meaningful amount of extra effort and extra complexity.
|
|
|
|
The alternative is to ensure that all shared objects are truly immutable.
|
|
There would be no races because there would be no modification. This
|
|
is something that the immortality proposed here would enable for
|
|
otherwise immutable objects. With immortal objects,
|
|
support for a per-interpreter GIL
|
|
becomes much simpler.
|
|
|
|
Avoiding Copy-on-Write
|
|
----------------------
|
|
|
|
For some applications it makes sense to get the application into
|
|
a desired initial state and then fork the process for each worker.
|
|
This can result in a large performance improvement, especially
|
|
memory usage. Several enterprise Python users (e.g. Instagram,
|
|
YouTube) have taken advantage of this. However, the above
|
|
refcount semantics drastically reduce the benefits and
|
|
have led to some sub-optimal workarounds.
|
|
|
|
Also note that "fork" isn't the only operating system mechanism
|
|
that uses copy-on-write semantics. Another example is ``mmap``.
|
|
Any such utility will potentially benefit from fewer copy-on-writes
|
|
when immortal objects are involved, when compared to using only
|
|
"mortal" objects.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
The proposed solution is obvious enough that both of this proposal's
|
|
authors came to the same conclusion (and implementation, more or less)
|
|
independently. The Pyston project `uses a similar approach <Pyston_>`_.
|
|
Other designs were also considered. Several possibilities have also
|
|
been discussed on python-dev in past years.
|
|
|
|
Alternatives include:
|
|
|
|
* use a high bit to mark "immortal" but do not change ``Py_INCREF()``
|
|
* add an explicit flag to objects
|
|
* implement via the type (``tp_dealloc()`` is a no-op)
|
|
* track via the object's type object
|
|
* track with a separate table
|
|
|
|
Each of the above makes objects immortal, but none of them address
|
|
the performance penalties from refcount modification described above.
|
|
|
|
In the case of per-interpreter GIL, the only realistic alternative
|
|
is to move all global objects into ``PyInterpreterState`` and add
|
|
one or more lookup functions to access them. Then we'd have to
|
|
add some hacks to the C-API to preserve compatibility for the
|
|
may objects exposed there. The story is much, much simpler
|
|
with immortal objects.
|
|
|
|
|
|
Impact
|
|
======
|
|
|
|
Benefits
|
|
--------
|
|
|
|
Most notably, the cases described in the above examples stand
|
|
to benefit greatly from immortal objects. Projects using pre-fork
|
|
can drop their workarounds. For the per-interpreter GIL project,
|
|
immortal objects greatly simplifies the solution for existing static
|
|
types, as well as objects exposed by the public C-API.
|
|
|
|
In general, a strong immutability guarantee for objects enables Python
|
|
applications to scale better, particularly in
|
|
`multi-process deployments <Facebook>`_. This is because they can then
|
|
leverage multi-core parallelism without such a significant tradeoff in
|
|
memory usage as they now have. The cases we just described, as well as
|
|
those described above in `Motivation`_, reflect this improvement.
|
|
|
|
Performance
|
|
-----------
|
|
|
|
A naive implementation shows `a 4% slowdown`_. We have demonstrated
|
|
a return to ~performance-neutral~ with a handful of basic mitigations
|
|
applied. See the `mitigations`_ section below.
|
|
|
|
On the positive side, immortal objects save a significant amount of
|
|
memory when used `with a pre-fork model <Facebook>`_. Also, immortal
|
|
objects provide opportunities for specialization in the eval loop that
|
|
would improve performance.
|
|
|
|
.. _a 4% slowdown: https://github.com/python/cpython/pull/19474#issuecomment-1032944709
|
|
|
|
TODO: Update the performance impact for the latest branch
|
|
(both for GCC and for clang).
|
|
|
|
Backward Compatibility
|
|
----------------------
|
|
|
|
Ideally this internal-only feature would be completely compatible.
|
|
However, it does involve a change to refcount semantics in some cases.
|
|
Only immortal objects are affected, but this includes high-use objects
|
|
like ``None``, ``True``, and ``False``.
|
|
|
|
Specifically, when an immortal object is involved:
|
|
|
|
* code that inspects the refcount will see a really, really large value
|
|
* the new noop behavior may break code that:
|
|
|
|
* depends specifically on the refcount to always increment or decrement
|
|
(or have a specific value from ``Py_SET_REFCNT()``)
|
|
* relies on any specific refcount value, other than 0 or 1
|
|
* directly manipulates the refcount to store extra information there
|
|
|
|
* in 32-bit pre-3.12 `Stable ABI`_ extensions,
|
|
objects may leak due to `Accidental Immortality`_
|
|
* such extensions may crash due to `Accidental De-Immortalizing`_
|
|
|
|
Again, those changes in behavior only apply to immortal objects,
|
|
not the vast majority of objects a user will use. Furthermore,
|
|
users cannot mark an object as immortal so no user-created objects
|
|
will ever have that changed behavior. Users that rely on any of
|
|
the changing behavior for global (builtin) objects are already
|
|
in trouble. So the overall impact should be small.
|
|
|
|
Also note that code which checks for refleaks should keep working fine,
|
|
unless it checks for hard-coded small values relative to some immortal
|
|
object. The problems noticed by `Pyston`_ shouldn't apply here since
|
|
we do not modify the refcount.
|
|
|
|
See `Public Refcount Details`_ below for further discussion.
|
|
|
|
Accidental Immortality
|
|
''''''''''''''''''''''
|
|
|
|
Hypothetically, a non-immortal object could be incref'ed so much
|
|
that it reaches the magic value needed to be considered immortal.
|
|
That means it would never be decref'ed all the way back to 0, so it
|
|
would accidentally leak (never be cleaned up).
|
|
|
|
With 64-bit refcounts, this accidental scenario is so unlikely that
|
|
we need not worry. Even if done deliberately by using ``Py_INCREF()``
|
|
in a tight loop and each iteration only took 1 CPU cycle, it would take
|
|
2^60 cycles (if the immortal bit were 2^60). At a fast 5 GHz that would
|
|
still take nearly 250,000,000 seconds (over 2,500 days)!
|
|
|
|
Also note that it is doubly unlikely to be a problem because it wouldn't
|
|
matter until the refcount would have gotten back to 0 and the object
|
|
cleaned up. So any object that hit that magic "immortal" refcount value
|
|
would have to be decref'ed that many times again before the change
|
|
in behavior would be noticed.
|
|
|
|
Again, the only realistic way that the magic refcount would be reached
|
|
(and then reversed) is if it were done deliberately. (Of course, the
|
|
same thing could be done efficiently using ``Py_SET_REFCNT()`` though
|
|
that would be even less of an accident.) At that point we don't
|
|
consider it a concern of this proposal.
|
|
|
|
On builds with much smaller maximum refcounts, like 32-bit platforms,
|
|
the consequences aren't so obvious. Let's say the magic refcount
|
|
were 2^30. Using the same specs as above, it would take roughly
|
|
4 seconds to accidentally immortalize an object. Under reasonable
|
|
conditions, it is still highly unlikely that an object be accidentally
|
|
immortalized. It would have to meet these criteria:
|
|
|
|
* targeting a non-immortal object (so not one of the high-use builtins)
|
|
* the extension increfs without a corresponding decref
|
|
(e.g. returns from a function or method)
|
|
* no other code decrefs the object in the meantime
|
|
|
|
Even at a much less frequent rate it would not take long to reach
|
|
accidental immortality (on 32-bit). However, then it would have to run
|
|
through the same number of (now noop-ing) decrefs before that one object
|
|
would be effectively leaking. This is highly unlikely, especially because
|
|
the calculations assume no decrefs.
|
|
|
|
Furthermore, this isn't all that different from how such 32-bit extensions
|
|
can already incref an object past 2^31 and turn the refcount negative.
|
|
If that were an actual problem then we would have heard about it.
|
|
|
|
Between all of the above cases, the proposal doesn't consider
|
|
accidental immortality a problem.
|
|
|
|
Stable ABI
|
|
''''''''''
|
|
|
|
The implementation approach described in this PEP is compatible
|
|
with extensions compiled to the stable ABI (with the exception
|
|
of `Accidental Immortality`_ and `Accidental De-Immortalizing`_).
|
|
Due to the nature of the stable ABI, unfortunately, such extensions
|
|
use versions of ``Py_INCREF()``, etc. that directly modify the object's
|
|
``ob_refcnt`` field. This will invalidate all the performance benefits
|
|
of immortal objects.
|
|
|
|
However, we do ensure that immortal objects (mostly) stay immortal
|
|
in that situation. We set the initial refcount of immortal objects to
|
|
a value for which we can identify the object as immortal and which
|
|
continues to do so even if the refcount is modified by an extension.
|
|
(For example, suppose we used one of the high refcount bits to indicate
|
|
that an object was immortal. We would set the initial refcount to a
|
|
higher value that still matches the bit, like halfway to the next bit.
|
|
See `_Py_IMMORTAL_REFCNT`_.)
|
|
At worst, objects in that situation would feel the effects
|
|
described in the `Motivation`_ section. Even then
|
|
the overall impact is unlikely to be significant.
|
|
|
|
Accidental De-Immortalizing
|
|
'''''''''''''''''''''''''''
|
|
|
|
32-bit builds of older stable ABI extensions can take
|
|
`Accidental Immortality`_ to the next level.
|
|
|
|
Hypothetically, such an extension could incref an object to a value on
|
|
the next highest bit above the magic refcount value. For example, if
|
|
the magic value were 2^30 and the initial immortal refcount were thus
|
|
2^30 + 2^29 then it would take 2^29 increfs by the extension to reach
|
|
a value of 2^31, making the object non-immortal.
|
|
(Of course, a refcount that high would probably already cause a crash,
|
|
regardless of immortal objects.)
|
|
|
|
The more problematic case is where such a 32-bit stable ABI extension
|
|
goes crazy decref'ing an already immortal object. Continuing with the
|
|
above example, it would take 2^29 asymmetric decrefs to drop below the
|
|
magic immortal refcount value. So an object like ``None`` could be
|
|
made mortal and subject to decref. That still wouldn't be a problem
|
|
until somehow the decrefs continue on that object until it reaches 0.
|
|
For statically allocated immortal objects, like ``None``, the extension
|
|
would crash the process if it tried to dealloc the object. For any
|
|
other immortal objects, the dealloc might be okay. However, there
|
|
might be runtime code expecting the formerly-immortal object to be
|
|
around forever. That code would probably crash.
|
|
|
|
Again, the likelihood of this happening is extremely small, even on
|
|
32-bit builds. It would require roughly a billion decrefs on that
|
|
one object without a corresponding incref. The most likely scenario is
|
|
the following:
|
|
|
|
A "new" reference to ``None`` is returned by many functions and methods.
|
|
Unlike with non-immortal objects, the 3.12 runtime will basically never
|
|
incref ``None`` before giving it to the extension. However, the
|
|
extension *will* decref it when done with it (unless it returns it).
|
|
Each time that exchange happens with the one object, we get one step
|
|
closer to a crash.
|
|
|
|
How realistic is it that some form of that exchange (with a single
|
|
object) will happen a billion times in the lifetime of a Python process
|
|
on 32-bit? If it is a problem, how could it be addressed?
|
|
|
|
As to how realistic, the answer isn't clear currently. However, the
|
|
mitigation is simple enough that we can safely proceed under the
|
|
assumption that it would not be a problem.
|
|
|
|
We look at possible solutions
|
|
`later on <Solutions for Accidental De-Immortalization>`_.
|
|
|
|
Alternate Python Implementations
|
|
--------------------------------
|
|
|
|
This proposal is CPython-specific. However, it does relate to the
|
|
behavior of the C-API, which may affect other Python implementations.
|
|
Consequently, the effect of changed behavior described in
|
|
`Backward Compatibility`_ above also applies here (e.g. if another
|
|
implementation is tightly coupled to specific refcount values, other
|
|
than 0, or on exactly how refcounts change, then they may impacted).
|
|
|
|
Security Implications
|
|
---------------------
|
|
|
|
This feature has no known impact on security.
|
|
|
|
Maintainability
|
|
---------------
|
|
|
|
This is not a complex feature so it should not cause much mental
|
|
overhead for maintainers. The basic implementation doesn't touch
|
|
much code so it should have much impact on maintainability. There
|
|
may be some extra complexity due to performance penalty mitigation.
|
|
However, that should be limited to where we immortalize all objects
|
|
post-init and later explicitly deallocate them during runtime
|
|
finalization. The code for this should be relatively concentrated.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
The approach involves these fundamental changes:
|
|
|
|
* add `_Py_IMMORTAL_REFCNT`_ (the magic value) to the internal C-API
|
|
* update ``Py_INCREF()`` and ``Py_DECREF()`` to no-op for objects
|
|
that match the magic refcount
|
|
* do the same for any other API that modifies the refcount
|
|
* stop modifying ``PyGC_Head`` for immortal GC objects ("containers")
|
|
* ensure that all immortal objects are cleaned up during
|
|
runtime finalization
|
|
|
|
Then setting any object's refcount to ``_Py_IMMORTAL_REFCNT``
|
|
makes it immortal.
|
|
|
|
(There are other minor, internal changes which are not described here.)
|
|
|
|
In the following sub-sections we dive into the most significant details.
|
|
First we will cover some conceptual topics, followed by more concrete
|
|
aspects like specific affected APIs.
|
|
|
|
Public Refcount Details
|
|
-----------------------
|
|
|
|
In `Backward Compatibility`_ we introduced possible ways that user code
|
|
might be broken by the change in this proposal. Any contributing
|
|
misunderstanding by users is likely due in large part to the names of
|
|
the refcount-related API and to how the documentation explains those
|
|
API (and refcounting in general).
|
|
|
|
Between the names and the docs, we can clearly see answers
|
|
to the following questions:
|
|
|
|
* what behavior do users expect?
|
|
* what guarantees do we make?
|
|
* do we indicate how to interpret the refcount value they receive?
|
|
* what are the use cases under which a user would set an object's
|
|
refcount to a specific value?
|
|
* are users setting the refcount of objects they did not create?
|
|
|
|
As part of this proposal, we must make sure that users can clearly
|
|
understand on which parts of the refcount behavior they can rely and
|
|
which are considered implementation details. Specifically, they should
|
|
use the existing public refcount-related API and the only refcount
|
|
values with any meaning are 0 and 1. (Some code relies on 1 as an
|
|
indicator that the object can be safely modified.) All other values
|
|
are considered "not 0 or 1".
|
|
|
|
This information will be clarified
|
|
in the `documentation <Documentation_>`_.
|
|
|
|
Arguably, the existing refcount-related API should be modified to reflect
|
|
what we want users to expect. Something like the following:
|
|
|
|
* ``Py_INCREF()`` -> ``Py_ACQUIRE_REF()`` (or only support ``Py_NewRef()``)
|
|
* ``Py_DECREF()`` -> ``Py_RELEASE_REF()``
|
|
* ``Py_REFCNT()`` -> ``Py_HAS_REFS()``
|
|
* ``Py_SET_REFCNT()`` -> ``Py_RESET_REFS()`` and ``Py_SET_NO_REFS()``
|
|
|
|
However, such a change is not a part of this proposal. It is included
|
|
here to demonstrate the tighter focus for user expectations that would
|
|
benefit this change.
|
|
|
|
Constraints
|
|
-----------
|
|
|
|
* ensure that otherwise immutable objects can be truly immutable
|
|
* minimize performance penalty for normal Python use cases
|
|
* be careful when immortalizing objects that we don't actually expect
|
|
to persist until runtime finalization.
|
|
* be careful when immortalizing objects that are not otherwise immutable
|
|
* ``__del__`` and weakrefs must continue working properly
|
|
|
|
Regarding "truly" immutable objects, this PEP doesn't impact the
|
|
effective immutability of any objects, other than the per-object
|
|
runtime state (e.g. refcount). So whether or not some immortal object
|
|
is truly (or even effectively) immutable can only be settled separately
|
|
from this proposal. For example, str objects are generally considered
|
|
immutable, but ``PyUnicodeObject`` holds some lazily cached data. This
|
|
PEP has no influence on how that state affects str immutability.
|
|
|
|
Immortal Mutable Objects
|
|
------------------------
|
|
|
|
Any object can be marked as immortal. We do not propose any
|
|
restrictions or checks. However, in practice the value of making an
|
|
object immortal relates to its mutability and depends on the likelihood
|
|
it would be used for a sufficient portion of the application's lifetime.
|
|
Marking a mutable object as immortal can make sense in some situations.
|
|
|
|
Many of the use cases for immortal objects center on immutability, so
|
|
that threads can safely and efficiently share such objects without
|
|
locking. For this reason a mutable object, like a dict or list, would
|
|
never be shared (and thus no immortality). However, immortality may
|
|
be appropriate if there is sufficient guarantee that the normally
|
|
mutable object won't actually be modified.
|
|
|
|
On the other hand, some mutable objects will never be shared between
|
|
threads (at least not without a lock like the GIL). In some cases it
|
|
may be practical to make some of those immortal too. For example,
|
|
``sys.modules`` is a per-interpreter dict that we do not expect to
|
|
ever get freed until the corresponding interpreter is finalized
|
|
(assuming it isn't replaced). By making it immortal, we would
|
|
no longer incur the extra overhead during incref/decref.
|
|
|
|
We explore this idea further in the `mitigations`_ section below.
|
|
|
|
Implicitly Immortal Objects
|
|
---------------------------
|
|
|
|
If an immortal object holds a reference to a normal (mortal) object
|
|
then that held object is effectively immortal. This is because that
|
|
object's refcount can never reach 0 until the immortal object releases
|
|
it.
|
|
|
|
Examples:
|
|
|
|
* containers like ``dict`` and ``list``
|
|
* objects that hold references internally like ``PyTypeObject`` with
|
|
its ``tp_subclasses`` and ``tp_weaklist``
|
|
* an object's type (held in ``ob_type``)
|
|
|
|
Such held objects are thus implicitly immortal for as long as they are
|
|
held. In practice, this should have no real consequences since it
|
|
really isn't a change in behavior. The only difference is that the
|
|
immortal object (holding the reference) doesn't ever get cleaned up.
|
|
|
|
We do not propose that such implicitly immortal objects be changed
|
|
in any way. They should not be explicitly marked as immortal just
|
|
because they are held by an immortal object. That would provide
|
|
no advantage over doing nothing.
|
|
|
|
Un-Immortalizing Objects
|
|
------------------------
|
|
|
|
This proposal does not include any mechanism for taking an immortal
|
|
object and returning it to a "normal" condition. Currently there
|
|
is no need for such an ability.
|
|
|
|
On top of that, the obvious approach is to simply set the refcount
|
|
to a small value. However, at that point there is no way in knowing
|
|
which value would be safe. Ideally we'd set it to the value that it
|
|
would have been if it hadn't been made immortal. However, that value
|
|
will have long been lost. Hence the complexities involved make it less
|
|
likely that an object could safely be un-immortalized, even if we
|
|
had a good reason to do so.
|
|
|
|
_Py_IMMORTAL_REFCNT
|
|
-------------------
|
|
|
|
We will add two internal constants::
|
|
|
|
_Py_IMMORTAL_BIT - has the top-most available bit set (e.g. 2^62)
|
|
_Py_IMMORTAL_REFCNT - has the two top-most available bits set
|
|
|
|
The actual top-most bit depends on existing uses for refcount bits,
|
|
e.g. the sign bit or some GC uses. We will use the highest bit possible
|
|
after consideration of existing uses.
|
|
|
|
The refcount for immortal objects will be set to ``_Py_IMMORTAL_REFCNT``
|
|
(meaning the value will be halfway between ``_Py_IMMORTAL_BIT`` and the
|
|
value at the next highest bit). However, to check if an object is
|
|
immortal we will compare (bitwise-and) its refcount against just
|
|
``_Py_IMMORTAL_BIT``.
|
|
|
|
The difference means that an immortal object will still be considered
|
|
immortal, even if somehow its refcount were modified (e.g. by an older
|
|
stable ABI extension).
|
|
|
|
Note that top two bits of the refcount are already reserved for other
|
|
uses. That's why we are using the third top-most bit.
|
|
|
|
The implementation is also open to using other values for the immortal
|
|
bit, such as the sign bit or 2^31 (for saturated refcounts on 64-bit).
|
|
|
|
Affected API
|
|
------------
|
|
|
|
API that will now ignore immortal objects:
|
|
|
|
* (public) ``Py_INCREF()``
|
|
* (public) ``Py_DECREF()``
|
|
* (public) ``Py_SET_REFCNT()``
|
|
* (private) ``_Py_NewReference()``
|
|
|
|
API that exposes refcounts (unchanged but may now return large values):
|
|
|
|
* (public) ``Py_REFCNT()``
|
|
* (public) ``sys.getrefcount()``
|
|
|
|
(Note that ``_Py_RefTotal``, and consequently ``sys.gettotalrefcount()``,
|
|
will not be affected.)
|
|
|
|
TODO: clarify the status of ``_Py_RefTotal``.
|
|
|
|
Also, immortal objects will not participate in GC.
|
|
|
|
Immortal Global Objects
|
|
-----------------------
|
|
|
|
All runtime-global (builtin) objects will be made immortal.
|
|
That includes the following:
|
|
|
|
* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
|
|
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
|
|
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
|
|
small ints)
|
|
|
|
The question of making the full objects actually immutable (e.g.
|
|
for per-interpreter GIL) is not in the scope of this PEP.
|
|
|
|
Object Cleanup
|
|
--------------
|
|
|
|
In order to clean up all immortal objects during runtime finalization,
|
|
we must keep track of them.
|
|
|
|
For GC objects ("containers") we'll leverage the GC's permanent
|
|
generation by pushing all immortalized containers there. During
|
|
runtime shutdown, the strategy will be to first let the runtime try
|
|
to do its best effort of deallocating these instances normally. Most
|
|
of the module deallocation will now be handled by
|
|
``pylifecycle.c:finalize_modules()`` where we clean up the remaining
|
|
modules as best as we can. It will change which modules are available
|
|
during ``__del__``, but that's already explicitly undefined behavior
|
|
in the docs. Optionally, we could do some topological ordering
|
|
to guarantee that user modules will be deallocated first before
|
|
the stdlib modules. Finally, anything left over (if any) can be found
|
|
through the permanent generation GC list which we can clear
|
|
after ``finalize_modules()`` is done.
|
|
|
|
For non-container objects, the tracking approach will vary on a
|
|
case-by-case basis. In nearly every case, each such object is directly
|
|
accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or
|
|
``PyInterpreterState`` field. We may need to add a tracking mechanism
|
|
to the runtime state for a small number of objects.
|
|
|
|
None of the cleanup will have a significant effect on performance.
|
|
|
|
.. _mitigations:
|
|
|
|
Performance Regression Mitigations
|
|
----------------------------------
|
|
|
|
In the interest of clarity, here are some of the ways we are going
|
|
to try to recover some of the `4% performance <Performance_>`_
|
|
we lose with the naive implementation of immortal objects.
|
|
|
|
Note that none of this section is actually part of the proposal.
|
|
|
|
at the end of runtime init, mark all objects as immortal
|
|
''''''''''''''''''''''''''''''''''''''''''''''''''''''''
|
|
|
|
We can apply the concept from
|
|
`Immortal Mutable Objects`_ in the pursuit of getting back some of
|
|
that 4% performance we lose with the naive implementation of immortal
|
|
objects. At the end of runtime init we can mark *all* objects as
|
|
immortal and avoid the extra cost in incref/decref. We only need
|
|
to worry about immutability with objects that we plan on sharing
|
|
between threads without a GIL.
|
|
|
|
drop unnecessary hard-coded refcount operations
|
|
'''''''''''''''''''''''''''''''''''''''''''''''
|
|
|
|
Parts of the C-API interact specifically with objects that we know
|
|
to be immortal, like ``Py_RETURN_NONE``. Such functions and macros
|
|
can be updated to drop any refcount operations.
|
|
|
|
specialize for immortal objects in the eval loop
|
|
''''''''''''''''''''''''''''''''''''''''''''''''
|
|
|
|
There are opportunities to optimize operations in the eval loop
|
|
involving specific known immortal objects (e.g. ``None``). The
|
|
general mechanism is described in :pep:`659`. Also see `Pyston`_.
|
|
|
|
other possibilities
|
|
'''''''''''''''''''
|
|
|
|
* mark every interned string as immortal
|
|
* mark the "interned" dict as immortal if shared else share all interned strings
|
|
* (Larry,MAL) mark all constants unmarshalled for a module as immortal
|
|
* (Larry,MAL) allocate (immutable) immortal objects in their own memory page(s)
|
|
* saturated refcounts using the 32 least-significant bits
|
|
|
|
Solutions for Accidental De-Immortalization
|
|
-------------------------------------------
|
|
|
|
In the `Accidental De-Immortalizing`_ section we outlined a possible
|
|
negative consequence of immortal objects. Here we look at some
|
|
of the options to deal with that.
|
|
|
|
Note that we enumerate solutions here to illustrate that satisfactory
|
|
options are available, rather than to dictate how the problem will
|
|
be solved.
|
|
|
|
Also note the following:
|
|
|
|
* this only matters in the 32-bit stable-ABI case
|
|
* it only affects immortal objects
|
|
* there are no user-defined immortal objects, only built-in types
|
|
* most immortal objects will be statically allocated
|
|
(and thus already must fail if ``tp_dealloc()`` is called)
|
|
* only a handful of immortal objects will be used often enough
|
|
to possibly face this problem in practice (e.g. ``None``)
|
|
* the main problem to solve is crashes coming from ``tp_dealloc()``
|
|
|
|
One fundamental observation for a solution is that we can reset
|
|
an immortal object's refcount to ``_Py_IMMORTAL_REFCNT``
|
|
when some condition is met.
|
|
|
|
With all that in mind, a simple, yet effective, solution would be
|
|
to reset an immortal object's refcount in ``tp_dealloc()``.
|
|
``NoneType`` and ``bool`` already have a ``tp_dealloc()`` that calls
|
|
``Py_FatalError()`` if triggered. The same goes for other types based
|
|
on certain conditions, like ``PyUnicodeObject`` (depending on
|
|
``unicode_is_singleton()``), ``PyTupleObject``, and ``PyTypeObject``.
|
|
In fact, the same check is important for all statically declared object.
|
|
For those types, we would instead reset the refcount. For the
|
|
remaining cases we would introduce the check. In all cases,
|
|
the overhead of the check in ``tp_dealloc()`` should be too small
|
|
to matter.
|
|
|
|
Other (less practical) solutions:
|
|
|
|
* periodically reset the refcount for immortal objects
|
|
* only do that for high-use objects
|
|
* only do it if a stable-ABI extension has been imported
|
|
* provide a runtime flag for disabling immortality
|
|
|
|
(`The discussion thread <https://mail.python.org/archives/list/python-dev@python.org/message/OXAYWH47ZGLOWXTNKCIW4YE5PXGHNT4Y/>`__
|
|
has further detail.)
|
|
|
|
Regardless of the solution we end up with, we can do something else
|
|
later if necessary.
|
|
|
|
TODO: Add a note indicating that the implemented solution does not
|
|
affect the overall ~performance-neutral~ outcome.
|
|
|
|
Documentation
|
|
-------------
|
|
|
|
The immortal objects behavior and API are internal, implementation
|
|
details and will not be added to the documentation.
|
|
|
|
However, we will update the documentation to make public guarantees
|
|
about refcount behavior more clear. That includes, specifically:
|
|
|
|
* ``Py_INCREF()`` - change "Increment the reference count for object o."
|
|
to "Indicate taking a new reference to object o."
|
|
* ``Py_DECREF()`` - change "Decrement the reference count for object o."
|
|
to "Indicate no longer using a previously taken reference to object o."
|
|
* similar for ``Py_XINCREF()``, ``Py_XDECREF()``, ``Py_NewRef()``,
|
|
``Py_XNewRef()``, ``Py_Clear()``
|
|
* ``Py_REFCNT()`` - add "The refcounts 0 and 1 have specific meanings
|
|
and all others only mean code somewhere is using the object,
|
|
regardless of the value.
|
|
0 means the object is not used and will be cleaned up.
|
|
1 means code holds exactly a single reference."
|
|
* ``Py_SET_REFCNT()`` - refer to ``Py_REFCNT()`` about how values over 1
|
|
may be substituted with some over value
|
|
|
|
We *may* also add a note about immortal objects to the following,
|
|
to help reduce any surprise users may have with the change:
|
|
|
|
* ``Py_SET_REFCNT()`` (a no-op for immortal objects)
|
|
* ``Py_REFCNT()`` (value may be surprisingly large)
|
|
* ``sys.getrefcount()`` (value may be surprisingly large)
|
|
|
|
Other API that might benefit from such notes are currently undocumented.
|
|
We wouldn't add such a note anywhere else (including for ``Py_INCREF()``
|
|
and ``Py_DECREF()``) since the feature is otherwise transparent to users.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
The implementation is proposed on GitHub:
|
|
|
|
https://github.com/python/cpython/pull/19474
|
|
|
|
|
|
Open Issues
|
|
===========
|
|
|
|
* how realistic is the `Accidental De-Immortalizing`_ concern?
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. _Pyston: https://mail.python.org/archives/list/python-dev@python.org/message/JLHRTBJGKAENPNZURV4CIJSO6HI62BV3/
|
|
|
|
.. _Facebook: https://www.facebook.com/watch/?v=437636037237097&t=560
|
|
|
|
Prior Art
|
|
---------
|
|
|
|
* `Pyston`_
|
|
|
|
Discussions
|
|
-----------
|
|
|
|
This was discussed in December 2021 on python-dev:
|
|
|
|
* https://mail.python.org/archives/list/python-dev@python.org/thread/7O3FUA52QGTVDC6MDAV5WXKNFEDRK5D6/#TBTHSOI2XRWRO6WQOLUW3X7S5DUXFAOV
|
|
* https://mail.python.org/archives/list/python-dev@python.org/thread/PNLBJBNIQDMG2YYGPBCTGOKOAVXRBJWY
|
|
|
|
Runtime Object State
|
|
--------------------
|
|
|
|
Here is the internal state that the CPython runtime keeps
|
|
for each Python object:
|
|
|
|
* `PyObject.ob_refcnt`_: the object's `refcount <refcounting_>`_
|
|
* `_PyGC_Head <PyGC_Head>`_: (optional) the object's node in a list of `"GC" objects <refcounting_>`_
|
|
* `_PyObject_HEAD_EXTRA <PyObject_HEAD_EXTRA>`_: (optional) the object's node in the list of heap objects
|
|
|
|
``ob_refcnt`` is part of the memory allocated for every object.
|
|
However, ``_PyObject_HEAD_EXTRA`` is allocated only if CPython was built
|
|
with ``Py_TRACE_REFS`` defined. ``PyGC_Head`` is allocated only if the
|
|
object's type has ``Py_TPFLAGS_HAVE_GC`` set. Typically this is only
|
|
container types (e.g. ``list``). Also note that ``PyObject.ob_refcnt``
|
|
and ``_PyObject_HEAD_EXTRA`` are part of ``PyObject_HEAD``.
|
|
|
|
.. _PyObject.ob_refcnt: https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/object.h#L107
|
|
.. _PyGC_Head: https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/internal/pycore_gc.h#L11-L20
|
|
.. _PyObject_HEAD_EXTRA: https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/object.h#L68-L72
|
|
|
|
.. _refcounting:
|
|
|
|
Reference Counting, with Cyclic Garbage Collection
|
|
--------------------------------------------------
|
|
|
|
Garbage collection is a memory management feature of some programming
|
|
languages. It means objects are cleaned up (e.g. memory freed)
|
|
once they are no longer used.
|
|
|
|
Refcounting is one approach to garbage collection. The language runtime
|
|
tracks how many references are held to an object. When code takes
|
|
ownership of a reference to an object or releases it, the runtime
|
|
is notified and it increments or decrements the refcount accordingly.
|
|
When the refcount reaches 0, the runtime cleans up the object.
|
|
|
|
With CPython, code must explicitly take or release references using
|
|
the C-API's ``Py_INCREF()`` and ``Py_DECREF()``. These macros happen
|
|
to directly modify the object's refcount (unfortunately, since that
|
|
causes ABI compatibility issues if we want to change our garbage
|
|
collection scheme). Also, when an object is cleaned up in CPython,
|
|
it also releases any references (and resources) it owns
|
|
(before it's memory is freed).
|
|
|
|
Sometimes objects may be involved in reference cycles, e.g. where
|
|
object A holds a reference to object B and object B holds a reference
|
|
to object A. Consequently, neither object would ever be cleaned up
|
|
even if no other references were held (i.e. a memory leak). The
|
|
most common objects involved in cycles are containers.
|
|
|
|
CPython has dedicated machinery to deal with reference cycles, which
|
|
we call the "cyclic garbage collector", or often just
|
|
"garbage collector" or "GC". Don't let the name confuse you.
|
|
It only deals with breaking reference cycles.
|
|
|
|
See the docs for a more detailed explanation of refcounting
|
|
and cyclic garbage collection:
|
|
|
|
* https://docs.python.org/3.11/c-api/intro.html#reference-counts
|
|
* https://docs.python.org/3.11/c-api/refcounting.html
|
|
* https://docs.python.org/3.11/c-api/typeobj.html#c.PyObject.ob_refcnt
|
|
* https://docs.python.org/3.11/c-api/gcsupport.html
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|