PEP 734: Updates After Discussion (#3664)

Eric Snow 2024-02-27 11:49:36 -07:00 committed by GitHub
parent c55835e170
commit 13022a6d12
1 changed file with 223 additions and 155 deletions


@@ -25,9 +25,9 @@ This PEP proposes to add a new module, ``interpreters``, to support
inspecting, creating, and running code in multiple interpreters in the
current process. This includes ``Interpreter`` objects that represent
the underlying interpreters. The module will also provide a basic
``Queue`` class for communication between interpreters.
Finally, we will add a new ``concurrent.futures.InterpreterPoolExecutor``
based on the ``interpreters`` module.
Introduction
@@ -92,7 +92,7 @@ Interpreters and Threads
------------------------
Thread states are related to interpreter states in much the same way
that OS threads and processes are related (at a high level). To
begin with, the relationship is one-to-many.
A thread state belongs to a single interpreter (and stores
a pointer to it). That thread state is never used for a different
@@ -276,106 +276,6 @@ interpreters. Without one, multiple interpreters are a much less
useful feature.
Specification
=============
@@ -407,7 +307,7 @@ The module defines the following functions:
for it. The interpreter doesn't do anything on its own and is
not inherently tied to any OS thread. That only happens when
something is actually run in the interpreter
(e.g. ``Interpreter.exec()``), and only while running.
The interpreter may or may not have thread states ready to use,
but that is strictly an internal implementation detail.
@@ -439,7 +339,7 @@ Attributes and methods:
It refers only to whether there is an OS thread
running a script (code) in the interpreter's ``__main__`` module.
That basically means whether or not ``Interpreter.exec()``
is running in some OS thread. Code running in sub-threads
is ignored.
@@ -454,7 +354,7 @@ Attributes and methods:
``prepare_main()`` is helpful for initializing the
globals for an interpreter before running code in it.
* ``exec(code, /)``
Execute the given source code in the interpreter
(in the current OS thread), using its ``__main__`` module.
It doesn't return anything.
@@ -465,39 +365,59 @@ Attributes and methods:
the globals and locals.
The code running in the current OS thread (a different
interpreter) is effectively paused until ``Interpreter.exec()``
finishes. To avoid pausing it, create a new ``threading.Thread``
and call ``Interpreter.exec()`` in it
(like ``Interpreter.call_in_thread()`` does).
``Interpreter.exec()`` does not reset the interpreter's state nor
the ``__main__`` module, neither before nor after, so each
successive call picks up where the last one left off. This can
be useful for running some code to initialize an interpreter
(e.g. with imports) before later performing some repeated task.
If there is an uncaught exception, it will be propagated into
the calling interpreter as an ``ExecutionFailed``. The full error
display of the original exception, generated relative to the
called interpreter, is preserved on the propagated ``ExecutionFailed``.
That includes the full traceback, with all the extra info like
syntax error details and chained exceptions.
If the ``ExecutionFailed`` is not caught then that full error display
will be shown, much like it would be if the propagated exception
had been raised in the main interpreter and uncaught. Having
the full traceback is particularly useful when debugging.
If exception propagation is not desired then an explicit
try-except should be used around the *code* passed to
``Interpreter.exec()``. Likewise any error handling that depends
on specific information from the exception must use an explicit
try-except around the given *code*, since ``ExecutionFailed``
will not preserve that information.
* ``call(callable, /)``
Call the callable object in the interpreter.
The return value is discarded. If the callable raises an exception
then it gets propagated as an ``ExecutionFailed`` exception,
in the same way as ``Interpreter.exec()``.
For now only plain functions are supported and only ones that
take no arguments and have no cell vars. Free globals are resolved
against the target interpreter's ``__main__`` module.
In the future, we can add support for arguments, closures,
and a broader variety of callables, at least partly via pickle.
We can also consider not discarding the return value.
The initial restrictions are in place to allow us to get the basic
functionality of the module out to users sooner.
* ``call_in_thread(callable, /) -> threading.Thread``
Essentially, apply ``Interpreter.call()`` in a new thread.
Return values are discarded and exceptions are not propagated.
``call_in_thread()`` is roughly equivalent to::

   def task():
       interp.call(func)
   t = threading.Thread(target=task)
   t.start()
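
Taken together, here is a minimal sketch of these methods in use,
assuming the module behaves exactly as specified above::

   import interpreters

   interp = interpreters.create()
   interp.exec('import math; x = 9')    # state persists between calls
   interp.exec('print(math.sqrt(x))')   # prints 3.0, using the earlier state

   try:
       interp.exec('1/0')
   except interpreters.ExecutionFailed:
       pass   # the ZeroDivisionError's full error display is preserved

   def task():   # a plain function: no arguments, no cell vars
       print('running in the subinterpreter')

   interp.call(task)                  # blocks; failures raise ExecutionFailed
   t = interp.call_in_thread(task)    # the same, but in a new thread
   t.join()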
@@ -518,7 +438,7 @@ the back and each "get" pops the next one off the front. Every added
object will be popped off in the order it was pushed on.
Only objects that are specifically supported for passing
between interpreters may be sent through an ``interpreters.Queue``.
Note that the actual objects aren't sent, but rather their
underlying data. However, the popped object will still be
strictly equivalent to the original.
@@ -526,10 +446,12 @@ See `Shareable Objects`_.
The module defines the following functions:
* ``create_queue(maxsize=0, *, syncobj=False) -> Queue``
Create a new queue. If the maxsize is zero or negative then the
queue is unbounded.
"syncobj" is used as the default for ``put()`` and ``put_nowait()``.
Queue Objects
-------------
@@ -552,7 +474,8 @@ Attributes and methods:
used for a pipe.
* ``maxsize``
(read-only) Number of items allowed in the queue.
Zero means "unbounded".
* ``__hash__()``
Return the hash of the queue's ``id``. This is the same
@@ -579,18 +502,25 @@ Attributes and methods:
This is only a snapshot of the state at the time of the call.
Other threads or interpreters may cause this to change.
* ``put(obj, timeout=None, *, syncobj=None)``
Add the object to the queue.
If ``maxsize > 0`` and the queue is full then this blocks until
a free slot is available. If *timeout* is a positive number
then it blocks at most that many seconds and then raises
``interpreters.QueueFull``. Otherwise it blocks forever.
If "syncobj" is true then the object must be
`shareable <Shareable Objects_>`_, which means the object's data
is passed through rather than the object itself.
If "syncobj" is false then all objects are supported. However,
there are some performance penalties and every object is copied
(e.g. via pickle). Thus mutable objects will never be
automatically synchronized between interpreters.
If "syncobj" is None (the default) then the queue's default
value is used.
* ``put_nowait(obj, *, syncobj=None)``
Like ``put()`` but effectively with an immediate timeout.
Thus if the queue is full, it immediately raises
``interpreters.QueueFull``.
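
For example, a minimal sketch of both modes, assuming the queue API
as specified here::

   import interpreters

   queue = interpreters.create_queue()

   queue.put(('spam', 42), syncobj=True)   # shareable: data passed through
   queue.put({'eggs': 2}, syncobj=False)   # not shareable: copied via pickle

   assert queue.get() == ('spam', 42)
   assert queue.get() == {'eggs': 2}       # an equal copy, not the same object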
@@ -609,8 +539,8 @@ Attributes and methods:
Shareable Objects
-----------------
``Interpreter.prepare_main()`` only works with "shareable" objects.
The same goes for ``interpreters.Queue`` (optionally).
A "shareable" object is one which may be passed from one interpreter
to another. The object is not necessarily actually directly shared
@@ -640,7 +570,7 @@ Here's the initial list of supported objects:
* ``bool`` (``True``/``False``)
* ``None``
* ``tuple`` (only with shareable items)
* ``interpreters.Queue``
* ``memoryview`` (underlying buffer actually shared)
Note that the last two on the list, queues and ``memoryview``, are
@@ -655,12 +585,13 @@ a token back and forth through a queue to indicate safety
(see `Synchronization`_), or by assigning sub-range exclusivity
to individual interpreters.
Most objects will be shared through queues (``interpreters.Queue``),
as interpreters communicate information between each other.
Less frequently, objects will be shared through ``prepare_main()``
to set up an interpreter prior to running code in it. However,
``prepare_main()`` is the primary way that queues are shared,
to provide another interpreter with a means
of further communication.
Finally, a reminder: for a few types the actual object is shared,
whereas for the rest only the underlying data is shared, whether
@@ -675,9 +606,9 @@ had been shared directly, whether or not it actually was.
That's a slightly different and stronger promise than just equality.
The guarantee is especially important for mutable objects, like
``interpreters.Queue`` and ``memoryview``. Mutating the object
in one interpreter will always be reflected immediately in every
other interpreter sharing the object.
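
For example, a sketch of that guarantee for a ``memoryview`` over
a writable buffer, assuming the API specified above::

   import interpreters

   buf = bytearray(8)
   interp = interpreters.create()
   interp.prepare_main(data=memoryview(buf))
   interp.exec('data[0] = 42')   # mutates the one shared buffer
   assert buf[0] == 42           # immediately visible in this interpreter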
Synchronization
---------------
@@ -692,8 +623,8 @@ However, interpreters cannot share objects which means they cannot
share ``threading.Lock`` objects.
The ``interpreters`` module does not provide any such dedicated
synchronization primitives. Instead, ``interpreters.Queue``
objects provide everything one might need.
For example, if there's a shared resource that needs managed
access then a queue may be used to manage it, where the interpreters
@@ -709,7 +640,7 @@ pass an object around to indicate who can use the resource::
   def worker():
       interp = interpreters.create()
       interp.prepare_main(control=control, data=data)
       interp.exec("""if True:
           from mymodule import edit_data
           while True:
               token = control.get()
@@ -731,12 +662,12 @@ pass an object around to indicate who can use the resource::
Exceptions
----------
* ``ExecutionFailed``
Raised from ``Interpreter.exec()`` and ``Interpreter.call()``
when there's an uncaught exception.
The error display for this exception includes the traceback
of the uncaught exception, which gets shown after the normal
error display, much like happens for ``ExceptionGroup``.
Attributes:
@@ -766,7 +697,18 @@ InterpreterPoolExecutor
Along with the new ``interpreters`` module, there will be a new
``concurrent.futures.InterpreterPoolExecutor``. Each worker executes
in its own thread with its own subinterpreter. Communication may
still be done through ``interpreters.Queue`` objects,
set with the initializer.
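
A sketch of how that might look, assuming the executor follows the
same basic API as ``concurrent.futures.ThreadPoolExecutor`` (the
constructor arguments are not specified here)::

   from concurrent.futures import InterpreterPoolExecutor

   def work(n):
       return n * n   # runs in a worker's own subinterpreter

   with InterpreterPoolExecutor(max_workers=4) as executor:
       squares = list(executor.map(work, range(10)))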
sys.implementation.supports_isolated_interpreters
-------------------------------------------------
Python implementations are not required to support subinterpreters,
though most major ones do. If an implementation does support them
then ``sys.implementation.supports_isolated_interpreters`` will be
set to ``True``. Otherwise it will be ``False``. If the feature
is not supported then importing the ``interpreters`` module will
raise an ``ImportError``.
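
For example, code meant to run on other implementations might guard
the import (using ``getattr`` in case the attribute itself is absent)::

   import sys

   if getattr(sys.implementation, 'supports_isolated_interpreters', False):
       import interpreters
   else:
       interpreters = None   # fall back to threads or processes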
Examples
--------
@@ -818,7 +760,7 @@ via workers in sub-threads.
   def worker():
       interp = interpreters.create()
       interp.prepare_main(tasks=tasks, results=results)
       interp.exec("""if True:
           from mymodule import handle_request, capture_exception
           while True:
@@ -880,7 +822,7 @@ so the code takes advantage of directly sharing ``memoryview`` buffers.
   def worker(id):
       interp = interpreters.create()
       interp.prepare_main(data=buf, results=results, tasks=tasks)
       interp.exec("""if True:
           from mymodule import reduce_chunk
           while True:
@@ -914,6 +856,132 @@ so the code takes advantage of directly sharing ``memoryview`` buffers.
   use_results(results)
Rationale
=========
A Minimal API
-------------
Since the core dev team has no real experience with
how users will make use of multiple interpreters in Python code, this
proposal purposefully keeps the initial API as lean and minimal as
possible. The objective is to provide a well-considered foundation
on which further (more advanced) functionality may be added later,
as appropriate.
That said, the proposed design incorporates lessons learned from
existing use of subinterpreters by the community, from existing stdlib
modules, and from other programming languages. It also factors in
experience from using subinterpreters in the CPython test suite and
using them in `concurrency benchmarks`_.
.. _concurrency benchmarks:
https://github.com/ericsnowcurrently/concurrency-benchmarks
create(), create_queue()
------------------------
Typically, users call a type to create instances of the type, at which
point the object's resources get provisioned. The ``interpreters``
module takes a different approach, where users must call ``create()``
to get a new interpreter or ``create_queue()`` for a new queue.
Calling ``interpreters.Interpreter()`` directly only returns a wrapper
around an existing interpreter (likewise for
``interpreters.Queue()``).
This is because interpreters (and queues) are special resources.
They exist globally in the process and are not managed/owned by the
current interpreter. Thus the ``interpreters`` module makes creating
an interpreter (or queue) a visibly distinct operation from creating
an instance of ``interpreters.Interpreter``
(or ``interpreters.Queue``).
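
For example, a sketch of the distinction, assuming (as described
above) that ``Interpreter()`` accepts the ID of an existing
interpreter::

   import interpreters

   interp = interpreters.create()                 # provisions a new interpreter
   wrapper = interpreters.Interpreter(interp.id)  # only wraps the existing one
   assert wrapper.id == interp.id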
Interpreter.prepare_main() Sets Multiple Variables
--------------------------------------------------
``prepare_main()`` may be seen as a setter function of sorts.
It supports setting multiple names at once,
e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters
set one item at a time. The main reason is for efficiency.
To set a value in the interpreter's ``__main__.__dict__``, the
implementation must first switch the OS thread to the identified
interpreter, which involves some non-negligible overhead. After
setting the value it must switch back.
Furthermore, there is some additional overhead to the mechanism
by which it passes objects between interpreters, which can be
reduced in aggregate if multiple values are set at once.
Therefore, ``prepare_main()`` supports setting multiple
values at once.
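
For example, a sketch of the difference in overhead, where each call
switches to the target interpreter and back::

   import interpreters

   interp = interpreters.create()

   # One switch to the interpreter (and back), with both values set:
   interp.prepare_main(spam=1, eggs=2)

   # The same end result, but with two switches:
   interp.prepare_main(spam=1)
   interp.prepare_main(eggs=2)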
Propagating Exceptions
----------------------
An uncaught exception from a subinterpreter,
via ``Interpreter.exec()``,
could either be (effectively) ignored,
like ``threading.Thread()`` does,
or propagated, like the builtin ``exec()`` does.
Since ``Interpreter.exec()`` is a synchronous operation,
like the builtin ``exec()``, uncaught exceptions are propagated.
However, such exceptions are not raised directly. That's because
interpreters are isolated from each other and must not share objects,
including exceptions. That could be addressed by raising a surrogate
of the exception, whether a summary, a copy, or a proxy that wraps it.
Any of those could preserve the traceback, which is useful for
debugging. The ``ExecutionFailed`` that gets raised
is such a surrogate.
There's another concern to consider. If a propagated exception isn't
immediately caught, it will bubble up through the call stack until
caught (or not). In the case that code somewhere else may catch it,
it is helpful to identify that the exception came from a subinterpreter
(i.e. a "remote" source), rather than from the current interpreter.
That's why ``Interpreter.exec()`` raises ``ExecutionFailed`` and why
it is a plain ``Exception``, rather than a copy or proxy with a class
that matches the original exception. For example, an uncaught
``ValueError`` from a subinterpreter would never get caught in a later
``try: ... except ValueError: ...``. Instead, ``ExecutionFailed``
must be handled directly.
In contrast, exceptions propagated from ``Interpreter.call()`` do not
involve ``ExecutionFailed`` but are raised directly, as though originating
in the calling interpreter. This is because ``Interpreter.call()`` is
a higher-level method that uses pickle to support objects that can't
normally be passed between interpreters.
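
For example, a sketch of why a later ``except ValueError:`` never
matches, given the behavior described above::

   import interpreters

   interp = interpreters.create()
   try:
       interp.exec('raise ValueError("spam")')
   except ValueError:
       pass   # never reached; the surrogate is not a ValueError
   except interpreters.ExecutionFailed:
       pass   # the surrogate lands here, traceback preserved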
Limited Object Sharing
----------------------
As noted in `Interpreter Isolation`_, only a small number of builtin
objects may be truly shared between interpreters. In all other cases
objects can only be shared indirectly, through copies or proxies.
The set of objects that are shareable as copies through queues
(and ``Interpreter.prepare_main()``) is limited for the sake of
efficiency.
Supporting sharing of *all* objects is possible (via pickle)
but not part of this proposal. For one thing, the limited set makes
it clear that only an efficient implementation is being used.
Furthermore, pickling mutable objects would violate the guarantee
that "shared" objects be equivalent (and stay that way).
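
For example, the following plain-Python sketch shows why pickling
would break that guarantee for mutable objects::

   import pickle

   original = [1, 2, 3]
   copy = pickle.loads(pickle.dumps(original))   # what pickle "sharing" produces

   original.append(4)
   assert copy == [1, 2, 3]   # the copy is already out of sync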
Objects vs. ID Proxies
----------------------
For both interpreters and queues, the low-level module makes use of
proxy objects that expose the underlying state by their corresponding
process-global IDs. In both cases the state is likewise process-global
and will be used by multiple interpreters. Thus they aren't suitable
to be implemented as ``PyObject``, which is only really an option for
interpreter-specific data. That's why the ``interpreters`` module
instead provides objects that are weakly associated through the ID.
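
For example, a sketch of that weak association, assuming (as
described above) that ``Queue()`` accepts an existing queue's ID::

   import interpreters

   queue = interpreters.create_queue()
   proxy = interpreters.Queue(queue.id)   # a second wrapper, same global state
   assert hash(proxy) == hash(queue)      # __hash__ is the hash of the ID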
Rejected Ideas
==============