PEP 556: Threaded garbage collection (#399)

2017-09-08 16:24:56 +02:00 · 2017-09-08 16:24:56 +02:00 · cb7ebc00b4
parent a70a538ec7
commit cb7ebc00b4
1 changed files with 364 additions and 0 deletions
--- a/pep-0556.rst
+++ b/pep-0556.rst
@@ -0,0 +1,364 @@
+PEP: 556
+Title: Threaded garbage collection
+Author: Antoine Pitrou <solipsis@pitrou.net>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 2017-09-08
+Python-Version: 3.7
+Post-History:
+
+
+Abstract
+========
+
+This PEP proposes a new optional mode of operation for CPython's cyclic
+garbage collector (GC) where implicit (i.e. opportunistic) collections
+happen in a dedicated thread rather than synchronously.
+
+
+Terminology
+===========
+
+An "implicit" GC run (or "implicit" collection) is one that is triggered
+opportunistically based on a certain heuristic computed over allocation
+statistics, whenever a new allocation is requested.  Details of the
+heuristic are not relevant to this PEP, as it does not propose to change it.
+
+An "explicit" GC run (or "explicit" collection) is one that is requested
+programmatically by an API call such as ``gc.collect``.
+
+"Threaded" refers to the fact that GC runs happen in a dedicated thread
+separate from sequential execution of application code.  It does not mean
+"concurrent" (the Global Interpreter Lock, or GIL, still serializes
+execution among Python threads *including* the dedicated GC thread)
+nor "parallel" (the GC is not able to distribute its work onto several
+threads at once to lower wall-clock latencies of GC runs).
+
+
+Rationale
+=========
+
+The mode of operation for the GC has always been to perform implicit
+collections synchronously.  That is, whenever the aforementioned heuristic
+is activated, execution of application code in the current thread is
+suspended and the GC is launched in order to reclaim dead reference
+cycles.
+
+There is a catch, though.  Over the course of reclaiming dead reference
+cycles (and any ancillary objects hanging off those cycles), the GC can
+execute arbitrary finalization code in the form of ``__del__`` methods
+and ``weakref`` callbacks.  Over the years, Python has been used for more
+and more sophisticated purposes, and it is increasingly common for
+finalization code to perform complex tasks, for example in distributed
+systems where loss of an object may require notifying other (logical
+or physical) nodes.
+
+Interrupting application code at arbitrary points to execute finalization
+code that may rely on a consistent internal state and/or on acquiring
+synchronization primitives gives rise to reentrancy issues that even the
+most seasoned experts have trouble fixing properly [1]_.
+
+This PEP bases itself on the observation that, despite the apparent
+similarities, same-thread reentrancy is a fundamentally harder
+problem than multi-thread synchronization.  Instead of letting each
+developer or library author struggle with extremely hard reentrancy
+issues, one by one, this PEP proposes to allow the GC to run in a
+separate thread where well-known multi-thread synchronization practices
+are sufficient.
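+
+As a minimal illustration of the reentrancy hazard (a hypothetical sketch,
+not code taken from the referenced issue), consider a finalizer that needs
+a lock already held by the code the GC interrupted.  The names ``Node``,
+``critical_section`` and ``events`` are invented for this example, and the
+explicit ``gc.collect()`` call merely stands in for an implicit collection
+triggered by an allocation.  The finalizer only probes the lock; a blocking
+acquire at that point would deadlock the thread:
+
+.. code-block:: python
+
+   import gc
+   import threading
+
+   lock = threading.Lock()  # non-reentrant lock guarding shared state
+   events = []
+
+   class Node:
+       def __init__(self):
+           self.me = self  # reference cycle: only the cyclic GC reclaims it
+
+       def __del__(self):
+           # Runs in whichever thread performs the collection.
+           if lock.acquire(blocking=False):
+               lock.release()
+               events.append("acquired")
+           else:
+               events.append("would deadlock")
+
+   def critical_section():
+       with lock:
+           Node()        # immediately becomes a dead reference cycle
+           gc.collect()  # stand-in for an implicit, same-thread collection
+
+Had the finalizer run in a separate thread, an ordinary blocking acquire
+would simply wait until the critical section released the lock.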
+
+
+Proposal
+========
+
+Under this PEP, the GC has two modes of operation:
+
+* "serial", which is the default and legacy mode, where an implicit GC
+  run is performed immediately in the thread that detects such an implicit
+  run is desired (based on the aforementioned allocation heuristic).
+
+* "threaded", which can be explicitly enabled at runtime on a per-process
+  basis, where implicit GC runs are *scheduled* whenever the allocation
+  heuristic is triggered, but run in a dedicated background thread.
+
+Hard reentrancy problems which plague sophisticated uses of finalization
+callbacks in the "serial" mode become relatively easy multi-thread
+synchronization problems in the "threaded" mode of operation.
+
+The GC also traditionally allows for explicit GC runs, using the Python
+API ``gc.collect`` and the C API ``PyGC_Collect``.  The visible semantics
+of these two APIs are left unchanged: they perform a GC run immediately
+when called, and only return when the GC run is finished.
+
+
+New public APIs
+===============
+
+Two new Python APIs are added to the ``gc`` module:
+
+* ``gc.set_mode(mode)`` sets the current mode of operation (either "serial"
+  or "threaded").  If setting to "serial" and the current mode is
+  "threaded", then the function also waits for the GC thread to end.
+
+* ``gc.get_mode()`` returns the current mode of operation.
+
+It is allowed to switch back and forth between modes of operation.
+
+
+Intended use
+============
+
+Given the per-process nature of the switch and its repercussions on
+semantics of all finalization callbacks, it is recommended that it be
+set at the beginning of an application's code (and/or in initializers
+for child processes e.g. when using ``multiprocessing``).  Library functions
+should probably not mess with this setting, just as they shouldn't call
+``gc.enable`` or ``gc.disable``, but there's nothing to prevent them from
+doing so.
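+
+A hedged sketch of such start-up code follows.  ``gc.set_mode`` and
+``gc.get_mode`` are the APIs proposed by this PEP and are absent from
+interpreters that do not implement it, so the hypothetical helper
+``enable_threaded_gc`` probes for them and falls back to the legacy
+behaviour:
+
+.. code-block:: python
+
+   import gc
+
+   def enable_threaded_gc():
+       """Switch to threaded GC if available; return the resulting mode."""
+       set_mode = getattr(gc, "set_mode", None)
+       if set_mode is None:
+           # Proposed API is missing: implicit collections stay synchronous.
+           return "serial"
+       set_mode("threaded")
+       return gc.get_mode()
+
+   if __name__ == "__main__":
+       mode = enable_threaded_gc()  # call once, before application code runs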
+
+
+Internal details
+================
+
+``gc`` module
+-------------
+
+An internal flag ``gc_is_threaded`` is added, telling whether GC is serial
+or threaded.
+
+An internal structure ``gc_mutex`` is added to avoid two GC runs at once:
+
+.. code-block:: c
+
+   static struct {
+       PyThread_type_lock lock;  /* taken when collecting */
+       PyThreadState *owner;  /* whichever thread is currently collecting
+                                 (NULL if no collection is taking place) */
+   } gc_mutex;
+
+An internal structure ``gc_thread`` is added to handle synchronization with
+the GC thread:
+
+.. code-block:: c
+
+   static struct {
+      PyThread_type_lock wakeup; /* acts as an event
+                                    to wake up the GC thread */
+      int collection_requested; /* non-zero if collection requested */
+      PyThread_type_lock done; /* acts as an event signaling
+                                  the GC thread has exited */
+   } gc_thread;
+
+
+``threading`` module
+--------------------
+
+Two private functions are added to the ``threading`` module:
+
+* ``threading._ensure_dummy_thread(name)`` creates and registers a ``Thread``
+  instance for the current thread with the given *name*, and returns it.
+
+* ``threading._remove_dummy_thread(thread)`` removes the given *thread*
+  (as returned by ``_ensure_dummy_thread``) from the threading module's
+  internal state.
+
+The purpose of these two functions is to improve debugging and introspection
+by letting ``threading.current_thread()`` return a more meaningfully-named
+object when called inside a finalization callback in the GC thread.
+
+
+Pseudo-code
+===========
+
+Here is proposed pseudo-code for the main primitives, public and internal,
+required for implementing this PEP.  All of them will be implemented in C
+and live inside the ``gc`` module, unless otherwise noted:
+
+.. code-block:: python
+
+   def collect_with_callback(generation):
+       """
+       Collect up to the given *generation*.
+       """
+       # Same code as currently (see collect_with_callback() in gcmodule.c)
+
+
+   def collect_generations():
+       """
+       Collect as many generations as desired by the heuristic.
+       """
+       # Same code as currently (see collect_generations() in gcmodule.c)
+
+
+   def lock_and_collect(generation=-1):
+       me = PyThreadState_GET()
+       if gc_mutex.owner == me:
+           # reentrant GC collection request, bail out
+           return
+       Py_BEGIN_ALLOW_THREADS
+       gc_mutex.lock.acquire()
+       Py_END_ALLOW_THREADS
+       gc_mutex.owner = me
+       try:
+           if generation >= 0:
+               return collect_with_callback(generation)
+           else:
+               return collect_generations()
+       finally:
+           gc_mutex.owner = NULL
+           gc_mutex.lock.release()
+
+
+   def schedule_gc_request():
+       """
+       Ask the GC thread to run an implicit collection.
+       """
+       assert gc_is_threaded == True
+       # Note this is extremely fast if a collection is already requested
+       if gc_thread.collection_requested == False:
+           gc_thread.collection_requested = True
+           gc_thread.wakeup.release()
+
+
+   def is_implicit_gc_desired():
+       """
+       Whether an implicit GC run is currently desired based on allocation
+       stats.  Return a generation number, or -1 if none desired.
+       """
+       # Same heuristic as currently (see _PyObject_GC_Alloc in gcmodule.c)
+
+
+   def PyGC_Malloc():
+       # Update allocation statistics (same code as currently, omitted for brevity)
+       if is_implicit_gc_desired() >= 0:
+           if gc_is_threaded:
+               schedule_gc_request()
+           else:
+               lock_and_collect()
+       # Go ahead with allocation (same code as currently, omitted for brevity)
+
+
+   def gc_thread(interp_state):
+       """
+       Dedicated loop for threaded GC.
+       """
+       # Init Python thread state (omitted, see t_bootstrap in _threadmodule.c)
+       # Optional: init thread in Python threading module, for better introspection
+       me = threading._ensure_dummy_thread(name="GC thread")
+
+       while gc_is_threaded == True:
+           Py_BEGIN_ALLOW_THREADS
+           gc_thread.wakeup.acquire()
+           Py_END_ALLOW_THREADS
+           if gc_thread.collection_requested != 0:
+               gc_thread.collection_requested = 0
+               lock_and_collect(generation=-1)
+
+       threading._remove_dummy_thread(me)
+       # Signal we're exiting
+       gc_thread.done.release()
+       # Free Python thread state (omitted)
+
+
+   def gc.set_mode(mode):
+       """
+       Set current GC mode.  This is a process-global setting.
+       """
+       if mode == "threaded":
+           if gc_is_threaded == False:
+               # Launch thread
+               gc_thread.done.acquire(block=False)  # should not fail
+               gc_is_threaded = True
+               PyThread_start_new_thread(gc_thread)
+       elif mode == "serial":
+           if gc_is_threaded == True:
+               # Wake up thread, asking it to end
+               gc_is_threaded = False
+               gc_thread.wakeup.release()
+               # Wait for thread exit
+               Py_BEGIN_ALLOW_THREADS
+               gc_thread.done.acquire()
+               Py_END_ALLOW_THREADS
+               gc_thread.done.release()
+       else:
+           raise ValueError("unsupported mode %r" % (mode,))
+
+
+   def gc.get_mode():
+       """
+       Get current GC mode.
+       """
+       return "threaded" if gc_is_threaded else "serial"
+
+
+   def gc.collect(generation=2):
+       """
+       Schedule collection of the given generation and wait for it to
+       finish.
+       """
+       return lock_and_collect(generation)
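+
+The wakeup/request handshake can also be emulated in pure Python for
+experimentation.  The sketch below is a model built on assumptions, not
+the proposed C implementation: the names ``GCThreadSketch``, ``Cycle`` and
+``demo`` are hypothetical, a ``threading.Semaphore`` stands in for the
+``wakeup`` lock, ``Thread.join`` for the ``done`` lock, and ``gc.collect()``
+for ``lock_and_collect()``.  Automatic collection is disabled during the
+demonstration so that only the background thread can reclaim the cycle:
+
+.. code-block:: python
+
+   import gc
+   import threading
+
+   class GCThreadSketch:
+       def __init__(self):
+           self._wakeup = threading.Semaphore(0)  # models gc_thread.wakeup
+           self._requested = False  # models gc_thread.collection_requested
+           self._threaded = False   # models gc_is_threaded
+           self._thread = None
+
+       def start(self):
+           self._threaded = True
+           self._thread = threading.Thread(target=self._run, name="GC thread")
+           self._thread.start()
+
+       def schedule(self):
+           # Cheap when a collection is already pending.
+           if not self._requested:
+               self._requested = True
+               self._wakeup.release()
+
+       def stop(self):
+           self._threaded = False
+           self._wakeup.release()  # wake the thread so it notices the change
+           self._thread.join()     # models waiting on gc_thread.done
+
+       def _run(self):
+           while self._threaded:
+               self._wakeup.acquire()
+               if self._requested:
+                   self._requested = False
+                   gc.collect()
+
+   class Cycle:
+       def __init__(self, finalized, names):
+           self.me = self  # reference cycle
+           self.finalized = finalized
+           self.names = names
+
+       def __del__(self):
+           # Record which thread ran the finalizer.
+           self.names.append(threading.current_thread().name)
+           self.finalized.set()
+
+   def demo():
+       finalized, names = threading.Event(), []
+       was_enabled = gc.isenabled()
+       gc.disable()  # keep implicit collections out of the main thread
+       collector = GCThreadSketch()
+       collector.start()
+       try:
+           Cycle(finalized, names)  # dead cycle, not yet reclaimed
+           collector.schedule()
+           finalized.wait(timeout=10)
+       finally:
+           collector.stop()
+           if was_enabled:
+               gc.enable()
+       return names
+
+In this model the finalizer records the name of the background thread
+rather than that of the thread which created the cycle.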
+
+
+Discussion
+==========
+
+Default mode
+------------
+
+One may wonder whether the default mode should simply be changed to "threaded".
+For multi-threaded applications, it would probably not be a problem:
+those applications must already be prepared for finalization handlers to
+be run in arbitrary threads.  In single-thread applications, however, it
+is currently guaranteed that finalizers will always be called in the main
+thread.  Breaking this property may induce subtle behaviour changes or bugs,
+for example if finalizers rely on some thread-local values.
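+
+The thread-local case can be sketched as follows.  This is an illustrative
+example only: ``local``, ``finalizer`` and ``run_in_both_threads`` are
+hypothetical names, and an ordinary ``threading.Thread`` stands in for the
+GC thread:
+
+.. code-block:: python
+
+   import threading
+
+   local = threading.local()
+   local.connection = "main-thread connection"  # set in this thread only
+   seen = []
+
+   def finalizer():
+       # Thread-local attributes are invisible from other threads.
+       seen.append(getattr(local, "connection", None))
+
+   def run_in_both_threads():
+       finalizer()  # same thread: the value is found
+       worker = threading.Thread(target=finalizer, name="GC thread")
+       worker.start()
+       worker.join()  # other thread: the value is missing
+       return seen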
+
+Explicit collections
+--------------------
+
+One may ask why explicit collections should not also be delegated to the
+background thread.  The answer is it doesn't really matter: since
+``gc.collect`` and ``PyGC_Collect`` actually *wait* for the collection to
+end (breaking this property would break compatibility), delegating the
+actual work to a background thread wouldn't ease synchronization with the
+thread requesting an explicit collection.
+
+In the end, this PEP chooses the behaviour that seems simpler to implement
+based on the pseudo-code above.
+
+
+Open issues
+===========
+
+``gc.set_mode`` should probably be protected against multiple concurrent
+invocations.  Also, it should raise when called from *inside* a GC run
+(i.e. from a finalizer).
+
+
+Implementation
+==============
+
+No actual implementation exists as of yet.
+
+
+References
+==========
+
+.. [1] https://bugs.python.org/issue14976
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: