PEP 556: Threaded garbage collection (#399)
This commit is contained in:
parent
a70a538ec7
commit
cb7ebc00b4
|
@ -0,0 +1,364 @@
|
|||
PEP: 556
|
||||
Title: Threaded garbage collection
|
||||
Author: Antoine Pitrou <solipsis@pitrou.net>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 2017-09-08
|
||||
Python-Version: 3.7
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes a new optional mode of operation for CPython's cyclic
|
||||
garbage collector (GC) where implicit (i.e. opportunistic) collections
|
||||
happen in a dedicated thread rather than synchronously.
|
||||
|
||||
|
||||
Terminology
|
||||
===========
|
||||
|
||||
An "implicit" GC run (or "implicit" collection) is one that is triggered
|
||||
opportunistically based on a certain heuristic computed over allocation
|
||||
statistics, whenever a new allocation is requested. Details of the
|
||||
heuristic are not relevant to this PEP, as it does not propose to change it.
|
||||
|
||||
An "explicit" GC run (or "explicit" collection) is one that is requested
|
||||
programmatically by an API call such as ``gc.collect``.
|
||||
|
||||
"Threaded" refers to the fact that GC runs happen in a dedicated thread
|
||||
separate from sequential execution of application code. It does not mean
|
||||
"concurrent" (the Global Interpreter Lock, or GIL, still serializes
|
||||
execution among Python threads *including* the dedicated GC thread)
|
||||
nor "parallel" (the GC is not able to distribute its work onto several
|
||||
threads at once to lower wall-clock latencies of GC runs).
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
The mode of operation for the GC has always been to perform implicit
|
||||
collections synchronously. That is, whenever the aforementioned heuristic
|
||||
is activated, execution of application code in the current thread is
|
||||
suspended and the GC is launched in order to reclaim dead reference
|
||||
cycles.
|
||||
|
||||
There is a catch, though. Over the course of reclaiming dead reference
|
||||
cycles (and any ancillary objects hanging at those cycles), the GC can
|
||||
execute arbitrary finalization code in the form of ``__del__`` methods
|
||||
and ``weakref`` callbacks. Over the years, Python has been used for more
|
||||
and more sophisticated purposes, and it is increasinly common for
|
||||
finalization code to perform complex tasks, for example in distributed
|
||||
systems where loss of an object may require notifying other (logical
|
||||
or physical) nodes.
|
||||
|
||||
Interrupting application code at arbitrary points to execute finalization
|
||||
code that may rely on a consistent internal state and/or on acquiring
|
||||
synchronization primitives give rise to reentrancy issues that even the
|
||||
most seasoned experts have trouble fixing properly [1]_.
|
||||
|
||||
This PEP bases itself on the observation that, despite the apparent
|
||||
similarities, same-thread reentrancy is a fundamentally harder
|
||||
problem than multi-thread synchronization. Instead of letting each
|
||||
developer or library author struggle with extremely hard reentrancy
|
||||
issues, one by one, this PEP proposes to allow the GC to run in a
|
||||
separate thread where well-known multi-thread synchronization practices
|
||||
are sufficient.
|
||||
|
||||
|
||||
Proposal
|
||||
========
|
||||
|
||||
Under this PEP, the GC has two modes of operation:
|
||||
|
||||
* "serial", which is the default and legacy mode, where an implicit GC
|
||||
run is performed immediately in the thread that detects such an implicit
|
||||
run is desired (based on the aforementioned allocation heuristic).
|
||||
|
||||
* "threaded", which can be explicitly enabled at runtime on a per-process
|
||||
basis, where implicit GC runs are *scheduled* whenever the allocation
|
||||
heuristic is triggered, but run in a dedicated background thread.
|
||||
|
||||
Hard reentrancy problems which plague sophisticated uses of finalization
|
||||
callbacks in the "serial" mode become relatively easy multi-thread
|
||||
synchronization problems in the "threaded" mode of operation.
|
||||
|
||||
The GC also traditionally allows for explicit GC runs, using the Python
|
||||
API ``gc.collect`` and the C API ``PyGC_Collect``. The visible semantics
|
||||
of these two APIs are left unchanged: they perform a GC run immediately
|
||||
when called, and only return when the GC run is finished.
|
||||
|
||||
|
||||
New public APIs
|
||||
===============
|
||||
|
||||
Two new Python APIs are added to the ``gc`` module:
|
||||
|
||||
* ``gc.set_mode(mode)`` sets the current mode of operation (either "serial"
|
||||
or "threaded"). If setting to "serial" and the current mode is
|
||||
"threaded", then the function also waits for the GC thread to end.
|
||||
|
||||
* ``gc.get_mode()`` returns the current mode of operation.
|
||||
|
||||
It is allowed to switch back and forth between modes of operation.
|
||||
|
||||
|
||||
Intended use
|
||||
============
|
||||
|
||||
Given the per-process nature of the switch and its repercussions on
|
||||
semantics of all finalization callbacks, it is recommended that it is
|
||||
set at the beginning of an application's code (and/or in initializers
|
||||
for child processes e.g. when using ``multiprocessing``). Library functions
|
||||
should probably not mess with this setting, just as they shouldn't call
|
||||
``gc.enable`` or ``gc.disable``, but there's nothing to prevent them from
|
||||
doing so.
|
||||
|
||||
|
||||
Internal details
|
||||
================
|
||||
|
||||
``gc`` module
|
||||
-------------
|
||||
|
||||
An internal flag ``gc_is_threaded`` is added, telling whether GC is serial
|
||||
or threaded.
|
||||
|
||||
An internal structure ``gc_mutex`` is added to avoid two GC runs at once:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
static struct {
|
||||
PyThread_type_lock collecting; /* taken when collecting */
|
||||
PyThreadState *owner; /* whichever thread is currently collecting
|
||||
(NULL if no collection is taking place) */
|
||||
} gc_mutex;
|
||||
|
||||
An internal structure ``gc_thread`` is added to handle synchronization with
|
||||
the GC thread:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
static struct {
|
||||
PyThread_type_lock wakeup; /* acts as an event
|
||||
to wake up the GC thread */
|
||||
int collection_requested; /* non-zero if collection requested */
|
||||
PyThread_type_lock done; /* acts as an event signaling
|
||||
the GC thread has exited */
|
||||
} gc_thread;
|
||||
|
||||
|
||||
``threading`` module
|
||||
--------------------
|
||||
|
||||
Two private functions are added to the ``threading`` module:
|
||||
|
||||
* ``threading._ensure_dummy_thread(name)`` creates and registers a ``Thread``
|
||||
instance for the current thread with the given *name*, and returns it.
|
||||
|
||||
* ``threading._remove_dummy_thread(thread)`` removes the given *thread*
|
||||
(as returned by ``_ensure_dummy_thread``) from the threading module's
|
||||
internal state.
|
||||
|
||||
The purpose of these two functions is to improve debugging and introspection
|
||||
by letting ``threading.current_thread()`` return a more meaningfully-named
|
||||
object when called inside a finalization callback in the GC thread.
|
||||
|
||||
|
||||
Pseudo-code
|
||||
===========
|
||||
|
||||
Here is a proposed pseudo-code for the main primitives, public and internal,
|
||||
required for implementing this PEP. All of them will be implemented in C
|
||||
and live inside the ``gc`` module, unless otherwise noted:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def collect_with_callback(generation):
|
||||
"""
|
||||
Collect up to the given *generation*.
|
||||
"""
|
||||
# Same code as currently (see collect_with_callback() in gcmodule.c)
|
||||
|
||||
|
||||
def collect_generations():
|
||||
"""
|
||||
Collect as many generations as desired by the heuristic.
|
||||
"""
|
||||
# Same code as currently (see collect_generations() in gcmodule.c)
|
||||
|
||||
|
||||
def lock_and_collect(generation=-1):
|
||||
me = PyThreadState_GET()
|
||||
if gc_mutex.owner == me:
|
||||
# reentrant GC collection request, bail out
|
||||
return
|
||||
Py_BEGIN_ALLOW_THREADS
|
||||
gc_mutex.lock.acquire()
|
||||
Py_END_ALLOW_THREADS
|
||||
gc_mutex.owner = me
|
||||
try:
|
||||
if generation >= 0:
|
||||
return collect_generation(generation)
|
||||
else:
|
||||
return collect_generations()
|
||||
finally:
|
||||
gc_mutex.owner = NULL
|
||||
gc_mutex.lock.release()
|
||||
|
||||
|
||||
def schedule_gc_request():
|
||||
"""
|
||||
Ask the GC thread to run an implicit collection.
|
||||
"""
|
||||
assert gc_is_threaded == True
|
||||
# Note this is extremely fast if a collection is already requested
|
||||
if gc_thread.collection_requested == False:
|
||||
gc_thread.collection_requested = True
|
||||
gc_thread.wakeup.release()
|
||||
|
||||
|
||||
def is_implicit_gc_desired():
|
||||
"""
|
||||
Whether an implicit GC run is currently desired based on allocation
|
||||
stats. Return a generation number, or -1 if none desired.
|
||||
"""
|
||||
# Same heuristic as currently (see _PyObject_GC_Alloc in gcmodule.c)
|
||||
|
||||
|
||||
def PyGC_Malloc():
|
||||
# Update allocation statistics (same code as currently, omitted for brievity)
|
||||
if is_implicit_gc_desired():
|
||||
if gc_is_threaded:
|
||||
schedule_gc_request()
|
||||
else:
|
||||
lock_and_collect()
|
||||
# Go ahead with allocation (same code as currently, omitted for brievity)
|
||||
|
||||
|
||||
def gc_thread(interp_state):
|
||||
"""
|
||||
Dedicated loop for threaded GC.
|
||||
"""
|
||||
# Init Python thread state (omitted, see t_bootstrap in _threadmodule.c)
|
||||
# Optional: init thread in Python threading module, for better introspection
|
||||
me = threading._ensure_dummy_thread(name="GC thread")
|
||||
|
||||
while gc_is_threaded == True:
|
||||
Py_BEGIN_ALLOW_THREADS
|
||||
gc_thread.wakeup.acquire()
|
||||
Py_END_ALLOW_THREADS
|
||||
if gc_thread.collection_requested != 0:
|
||||
gc_thread.collection_requested = 0
|
||||
lock_and_collect(generation=-1)
|
||||
|
||||
threading._remove_dummy_thread(me)
|
||||
# Signal we're exiting
|
||||
gc_thread.done.release()
|
||||
# Free Python thread state (omitted)
|
||||
|
||||
|
||||
def gc.set_mode(mode):
|
||||
"""
|
||||
Set current GC mode. This is a process-global setting.
|
||||
"""
|
||||
if mode == "threaded":
|
||||
if not gc_is_threaded == False:
|
||||
# Launch thread
|
||||
gc_thread.done.acquire(block=False) # should not fail
|
||||
gc_is_threaded = True
|
||||
PyThread_start_new_thread(gc_thread)
|
||||
elif mode == "serial":
|
||||
if gc_is_threaded == True:
|
||||
# Wake up thread, asking it to end
|
||||
gc_is_threaded = False
|
||||
gc_thread..wakeup.release()
|
||||
# Wait for thread exit
|
||||
Py_BEGIN_ALLOW_THREADS
|
||||
gc_thread.done.acquire()
|
||||
Py_END_ALLOW_THREADS
|
||||
gc_thread.done.release()
|
||||
else:
|
||||
raise ValueError("unsupported mode %r" % (mode,))
|
||||
|
||||
|
||||
def gc.get_mode(mode):
|
||||
"""
|
||||
Get current GC mode.
|
||||
"""
|
||||
return "threaded" if gc_is_threaded else "serial"
|
||||
|
||||
|
||||
def gc.collect(generation=2):
|
||||
"""
|
||||
Schedule collection of the given generation and wait for it to
|
||||
finish.
|
||||
"""
|
||||
return lock_and_collect(collection)
|
||||
|
||||
|
||||
Discussion
|
||||
==========
|
||||
|
||||
Default mode
|
||||
------------
|
||||
|
||||
One may wonder whether the default mode should simply be changed to "threaded".
|
||||
For multi-threaded applications, it would probably not be a problem:
|
||||
those applications must already be prepared for finalization handlers to
|
||||
be run in arbitrary threads. In single-thread applications, however, it
|
||||
is currently guaranteed that finalizers will always be called in the main
|
||||
thread. Breaking this property may induce subtle behaviour changes or bugs,
|
||||
for example if finalizers rely on some thread-local values.
|
||||
|
||||
Explicit collections
|
||||
--------------------
|
||||
|
||||
One may ask why explicit collections should not also be delegated to the
|
||||
background thread. The answer is it doesn't really matter: since
|
||||
``gc.collect`` and ``PyGC_Collect`` actually *wait* for the collection to
|
||||
end (breaking this property would break compatibility), delegating the
|
||||
actual work to a background thread wouldn't ease synchronization with the
|
||||
thread requesting an explicit collection.
|
||||
|
||||
In the end, this PEP choses the behaviour that seems simpler to implement
|
||||
based on the pseudo-code above.
|
||||
|
||||
|
||||
Open issues
|
||||
===========
|
||||
|
||||
``gc.set_mode`` should probably be protected against multiple concurrent
|
||||
invocations. Also, it should raise when called from *inside* a GC run
|
||||
(i.e. from a finalizer).
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
No actual implementation exists as of yet.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [1] https://bugs.python.org/issue14976
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
Loading…
Reference in New Issue