PEP 734: Updates After Discussion (#3664)
parent c55835e170
commit 13022a6d12
@@ -25,9 +25,9 @@ This PEP proposes to add a new module, ``interpreters``, to support
 inspecting, creating, and running code in multiple interpreters in the
 current process.  This includes ``Interpreter`` objects that represent
 the underlying interpreters.  The module will also provide a basic
-``Queue`` class for communication between interpreters.  Finally, we
-will add a new ``concurrent.futures.InterpreterPoolExecutor`` based
-on the ``interpreters`` module.
+``Queue`` class for communication between interpreters.
+Finally, we will add a new ``concurrent.futures.InterpreterPoolExecutor``
+based on the ``interpreters`` module.


 Introduction
@@ -92,7 +92,7 @@ Interpreters and Threads
 ------------------------

 Thread states are related to interpreter states in much the same way
-that OS threads and processes are related (at a hight level).  To
+that OS threads and processes are related (at a high level).  To
 begin with, the relationship is one-to-many.
 A thread state belongs to a single interpreter (and stores
 a pointer to it).  That thread state is never used for a different
@@ -276,106 +276,6 @@ interpreters.  Without one, multiple interpreters are a much less
 useful feature.


-Rationale
-=========
-
-A Minimal API
--------------
-
-Since the core dev team has no real experience with
-how users will make use of multiple interpreters in Python code, this
-proposal purposefully keeps the initial API as lean and minimal as
-possible.  The objective is to provide a well-considered foundation
-on which further (more advanced) functionality may be added later,
-as appropriate.
-
-That said, the proposed design incorporates lessons learned from
-existing use of subinterpreters by the community, from existing stdlib
-modules, and from other programming languages.  It also factors in
-experience from using subinterpreters in the CPython test suite and
-using them in `concurrency benchmarks`_.
-
-.. _concurrency benchmarks:
-   https://github.com/ericsnowcurrently/concurrency-benchmarks
-
-Interpreter.prepare_main() Sets Multiple Variables
---------------------------------------------------
-
-``prepare_main()`` may be seen as a setter function of sorts.
-It supports setting multiple names at once,
-e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters
-set one item at a time.  The main reason is for efficiency.
-
-To set a value in the interpreter's ``__main__.__dict__``, the
-implementation must first switch the OS thread to the identified
-interpreter, which involves some non-negligible overhead.  After
-setting the value it must switch back.
-Furthermore, there is some additional overhead to the mechanism
-by which it passes objects between interpreters, which can be
-reduced in aggregate if multiple values are set at once.
-
-Therefore, ``prepare_main()`` supports setting multiple
-values at once.
-
-Propagating Exceptions
-----------------------
-
-An uncaught exception from a subinterpreter,
-via ``Interpreter.exec_sync()``,
-could either be (effectively) ignored, like ``threading.Thread()`` does,
-or propagated, like the builtin ``exec()`` does.  Since ``exec_sync()``
-is a synchronous operation, like the builtin ``exec()``,
-uncaught exceptions are propagated.
-
-However, such exceptions are not raised directly.  That's because
-interpreters are isolated from each other and must not share objects,
-including exceptions.  That could be addressed by raising a surrogate
-of the exception, whether a summary, a copy, or a proxy that wraps it.
-Any of those could preserve the traceback, which is useful for
-debugging.  The ``ExecFailure`` that gets raised
-is such a surrogate.
-
-There's another concern to consider.  If a propagated exception isn't
-immediately caught, it will bubble up through the call stack until
-caught (or not).  In the case that code somewhere else may catch it,
-it is helpful to identify that the exception came from a subinterpreter
-(i.e. a "remote" source), rather than from the current interpreter.
-That's why ``Interpreter.exec_sync()`` raises ``ExecFailure`` and why
-it is a plain ``Exception``, rather than a copy or proxy with a class
-that matches the original exception.  For example, an uncaught
-``ValueError`` from a subinterpreter would never get caught in a later
-``try: ... except ValueError: ...``.  Instead, ``ExecFailure``
-must be handled directly.
-
-Limited Object Sharing
-----------------------
-
-As noted in `Interpreter Isolation`_, only a small number of builtin
-objects may be truly shared between interpreters.  In all other cases
-objects can only be shared indirectly, through copies or proxies.
-
-The set of objects that are shareable as copies through queues
-(and ``Interpreter.prepare_main()``) is limited for the sake of
-efficiency.
-
-Supporting sharing of *all* objects is possible (via pickle)
-but not part of this proposal.  For one thing, it's helpful to know
-that only an efficient implementation is being used.  Furthermore,
-for mutable objects pickling would violate the guarantee that "shared"
-objects be equivalent (and stay that way).
-
-Objects vs. ID Proxies
-----------------------
-
-For both interpreters and queues, the low-level module makes use of
-proxy objects that expose the underlying state by their corresponding
-process-global IDs.  In both cases the state is likewise process-global
-and will be used by multiple interpreters.  Thus they aren't suitable
-to be implemented as ``PyObject``, which is only really an option for
-interpreter-specific data.  That's why the ``interpreters`` module
-instead provides objects that are weakly associated through the ID.
-
-
 Specification
 =============

|
@@ -407,7 +307,7 @@ The module defines the following functions:
   for it.  The interpreter doesn't do anything on its own and is
   not inherently tied to any OS thread.  That only happens when
   something is actually run in the interpreter
-  (e.g. ``Interpreter.exec_sync()``), and only while running.
+  (e.g. ``Interpreter.exec()``), and only while running.
   The interpreter may or may not have thread states ready to use,
   but that is strictly an internal implementation detail.

@@ -439,7 +339,7 @@ Attributes and methods:

   It refers only to if there is an OS thread
   running a script (code) in the interpreter's ``__main__`` module.
-  That basically means whether or not ``Interpreter.exec_sync()``
+  That basically means whether or not ``Interpreter.exec()``
   is running in some OS thread.  Code running in sub-threads
   is ignored.

@@ -454,7 +354,7 @@ Attributes and methods:
   ``prepare_main()`` is helpful for initializing the
   globals for an interpreter before running code in it.

-* ``exec_sync(code, /)``
+* ``exec(code, /)``

   Execute the given source code in the interpreter
   (in the current OS thread), using its ``__main__`` module.
   It doesn't return anything.
@@ -465,39 +365,59 @@ Attributes and methods:
   the globals and locals.

   The code running in the current OS thread (a different
-  interpreter) is effectively paused until ``exec_sync()``
+  interpreter) is effectively paused until ``Interpreter.exec()``
   finishes.  To avoid pausing it, create a new ``threading.Thread``
-  and call ``exec_sync()`` in it.
+  and call ``Interpreter.exec()`` in it
+  (like ``Interpreter.call_in_thread()`` does).

-  ``exec_sync()`` does not reset the interpreter's state nor
+  ``Interpreter.exec()`` does not reset the interpreter's state nor
   the ``__main__`` module, neither before nor after, so each
   successive call picks up where the last one left off.  This can
   be useful for running some code to initialize an interpreter
   (e.g. with imports) before later performing some repeated task.

   If there is an uncaught exception, it will be propagated into
-  the calling interpreter as a ``ExecFailure``, which
-  preserves enough information for a helpful error display.  That
-  means if the ``ExecFailure`` isn't caught then the full
-  traceback of the propagated exception, including details about
-  syntax errors, etc., will be displayed.  Having the full
-  traceback is particularly useful when debugging.
+  the calling interpreter as an ``ExecutionFailed``.  The full error
+  display of the original exception, generated relative to the
+  called interpreter, is preserved on the propagated ``ExecutionFailed``.
+  That includes the full traceback, with all the extra info like
+  syntax error details and chained exceptions.
+  If the ``ExecutionFailed`` is not caught then that full error display
+  will be shown, much like it would be if the propagated exception
+  had been raised in the main interpreter and uncaught.  Having
+  the full traceback is particularly useful when debugging.

   If exception propagation is not desired then an explicit
   try-except should be used around the *code* passed to
-  ``exec_sync()``.  Likewise any error handling that depends
+  ``Interpreter.exec()``.  Likewise any error handling that depends
   on specific information from the exception must use an explicit
-  try-except around the given *code*, since ``ExecFailure``
+  try-except around the given *code*, since ``ExecutionFailed``
   will not preserve that information.

-* ``run(code, /) -> threading.Thread``
+* ``call(callable, /)``

-  Create a new thread and call ``exec_sync()`` in it.
-  Exceptions are not propagated.
+  Call the callable object in the interpreter.
+  The return value is discarded.  If the callable raises an exception
+  then it gets propagated as an ``ExecutionFailed`` exception,
+  in the same way as ``Interpreter.exec()``.

-  This is roughly equivalent to::
+  For now only plain functions are supported and only ones that
+  take no arguments and have no cell vars.  Free globals are resolved
+  against the target interpreter's ``__main__`` module.
+
+  In the future, we can add support for arguments, closures,
+  and a broader variety of callables, at least partly via pickle.
+  We can also consider not discarding the return value.
+  The initial restrictions are in place to allow us to get the basic
+  functionality of the module out to users sooner.
+
+* ``call_in_thread(callable, /) -> threading.Thread``
+
+  Essentially, apply ``Interpreter.call()`` in a new thread.
+  Return values are discarded and exceptions are not propagated.
+
+  ``call_in_thread()`` is roughly equivalent to::

      def task():
-         interp.exec_sync(code)
+         interp.run(func)
      t = threading.Thread(target=task)
      t.start()

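The propagation semantics above can be approximated in today's Python. The sketch below is illustrative only: the ``interpreters`` module is not yet importable, so the builtin ``exec()`` stands in for running code in another interpreter, and a hypothetical ``ExecutionFailedSketch`` class stands in for the proposed ``ExecutionFailed`` surrogate.

```python
import traceback

class ExecutionFailedSketch(Exception):
    """Hypothetical stand-in for the proposed ExecutionFailed.

    A plain Exception carrying only the formatted error display,
    not the original exception object (interpreters must not
    share objects, including exceptions).
    """

def exec_in_fake_interpreter(code):
    # Analogous to Interpreter.exec(): run code against a fresh
    # __main__-like namespace and propagate uncaught exceptions
    # as a surrogate rather than directly.
    ns = {}
    try:
        exec(code, ns)
    except Exception:
        raise ExecutionFailedSketch(traceback.format_exc()) from None

# An uncaught ValueError is never catchable as ValueError by the
# caller; only the surrogate exception can be handled.
try:
    exec_in_fake_interpreter("raise ValueError('boom')")
except ValueError:
    caught = "ValueError"
except ExecutionFailedSketch as exc:
    caught = "surrogate"
    assert "ValueError: boom" in str(exc)  # error display preserved
print(caught)  # → surrogate
```

This mirrors why a later ``except ValueError:`` in the calling interpreter never fires: the surrogate is a plain ``Exception`` with a different class.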
@@ -518,7 +438,7 @@ the back and each "get" pops the next one off the front.  Every added
 object will be popped off in the order it was pushed on.

 Only objects that are specifically supported for passing
-between interpreters may be sent through a ``Queue``.
+between interpreters may be sent through an ``interpreters.Queue``.
 Note that the actual objects aren't sent, but rather their
 underlying data.  However, the popped object will still be
 strictly equivalent to the original.
@@ -526,10 +446,12 @@ See `Shareable Objects`_.

 The module defines the following functions:

-* ``create_queue(maxsize=0) -> Queue``
+* ``create_queue(maxsize=0, *, syncobj=False) -> Queue``

   Create a new queue.  If the maxsize is zero or negative then the
   queue is unbounded.

+  "syncobj" is used as the default for ``put()`` and ``put_nowait()``.
+
 Queue Objects
 -------------

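The ``maxsize`` convention here matches the existing stdlib ``queue.Queue``, where zero or a negative value means unbounded. That existing behavior, which the proposed ``create_queue()`` mirrors, can be checked directly:

```python
import queue

# queue.Queue uses the same convention the proposed create_queue()
# documents: maxsize <= 0 means the queue is unbounded.
q = queue.Queue(maxsize=0)
for i in range(1000):
    q.put_nowait(i)          # never raises queue.Full when unbounded
assert q.qsize() == 1000

# A positive maxsize bounds the queue and a full queue rejects puts.
bounded = queue.Queue(maxsize=1)
bounded.put_nowait("x")
full = False
try:
    bounded.put_nowait("y")
except queue.Full:
    full = True
assert full
```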
@@ -552,7 +474,8 @@ Attributes and methods:
   used for a pipe.

 * ``maxsize``

-  Number of items allowed in the queue.  Zero means "unbounded".
+  (read-only) Number of items allowed in the queue.
+  Zero means "unbounded".

 * ``__hash__()``

   Return the hash of the queue's ``id``.  This is the same
@@ -579,18 +502,25 @@ Attributes and methods:
   This is only a snapshot of the state at the time of the call.
   Other threads or interpreters may cause this to change.

-* ``put(obj, timeout=None)``
+* ``put(obj, timeout=None, *, syncobj=None)``

   Add the object to the queue.

-  The object must be `shareable <Shareable Objects_>`_, which means
-  the object's data is passed through rather than the object itself.
-
   If ``maxsize > 0`` and the queue is full then this blocks until
   a free slot is available.  If *timeout* is a positive number
   then it only blocks at least that many seconds and then raises
   ``interpreters.QueueFull``.  Otherwise it blocks forever.

-* ``put_nowait(obj)``
+  If "syncobj" is true then the object must be
+  `shareable <Shareable Objects_>`_, which means the object's data
+  is passed through rather than the object itself.
+  If "syncobj" is false then all objects are supported.  However,
+  there are some performance penalties and all objects are copies
+  (e.g. via pickle).  Thus mutable objects will never be
+  automatically synchronized between interpreters.
+  If "syncobj" is None (the default) then the queue's default
+  value is used.
+
+* ``put_nowait(obj, *, syncobj=None)``

   Like ``put()`` but effectively with an immediate timeout.
   Thus if the queue is full, it immediately raises
   ``interpreters.QueueFull``.
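The distinction the ``syncobj`` flag draws (data passed through versus a pickle round-trip copy) can be demonstrated with plain ``pickle``, which is what the hunk above names as the copying mechanism. Nothing here uses the proposed module; it only shows why pickled mutable objects are never automatically synchronized:

```python
import pickle

# A pickle round-trip produces an independent copy, which is what a
# queue with syncobj=False would hand to the receiving interpreter.
original = [1, 2, 3]
received = pickle.loads(pickle.dumps(original))

assert received == original      # equivalent at the moment of transfer
original.append(4)               # ...but later mutations do not carry over
assert received == [1, 2, 3]     # the copy is not synchronized
```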
@@ -609,8 +539,8 @@ Attributes and methods:
 Shareable Objects
 -----------------

-Both ``Interpreter.prepare_main()`` and ``Queue`` work only with
-"shareable" objects.
+``Interpreter.prepare_main()`` only works with "shareable" objects.
+The same goes for ``interpreters.Queue`` (optionally).

 A "shareable" object is one which may be passed from one interpreter
 to another.  The object is not necessarily actually directly shared
@@ -640,7 +570,7 @@ Here's the initial list of supported objects:
 * ``bool`` (``True``/``False``)
 * ``None``
 * ``tuple`` (only with shareable items)
-* ``Queue``
+* ``interpreters.Queue``
 * ``memoryview`` (underlying buffer actually shared)

 Note that the last two on the list, queues and ``memoryview``, are
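The parenthetical "underlying buffer actually shared" describes behavior that ``memoryview`` already exhibits within a single interpreter, which is worth seeing concretely: writes through the view are immediately visible via the original object, because no copy is made.

```python
# A memoryview exposes the buffer of its source object directly, so
# mutating through the view is reflected in the original object.
buf = bytearray(4)
view = memoryview(buf)
view[0] = 0xFF
assert buf[0] == 0xFF  # same underlying buffer, not a copy
```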
@@ -655,12 +585,13 @@ a token back and forth through a queue to indicate safety
 (see `Synchronization`_), or by assigning sub-range exclusivity
 to individual interpreters.

-Most objects will be shared through queues (``Queue``), as interpreters
-communicate information between each other.  Less frequently, objects
-will be shared through ``prepare_main()`` to set up an interpreter
-prior to running code in it.  However, ``prepare_main()`` is the
-primary way that queues are shared, to provide another interpreter
-with a means of further communication.
+Most objects will be shared through queues (``interpreters.Queue``),
+as interpreters communicate information between each other.
+Less frequently, objects will be shared through ``prepare_main()``
+to set up an interpreter prior to running code in it.  However,
+``prepare_main()`` is the primary way that queues are shared,
+to provide another interpreter with a means
+of further communication.

 Finally, a reminder: for a few types the actual object is shared,
 whereas for the rest only the underlying data is shared, whether
@@ -675,9 +606,9 @@ had been shared directly, whether or not it actually was.
 That's a slightly different and stronger promise than just equality.

 The guarantee is especially important for mutable objects, like
-``Queue`` and ``memoryview``.  Mutating the object in one interpreter
-will always be reflected immediately in every other interpreter
-sharing the object.
+``interpreters.Queue`` and ``memoryview``.  Mutating the object
+in one interpreter will always be reflected immediately in every
+other interpreter sharing the object.

 Synchronization
 ---------------
@@ -692,8 +623,8 @@ However, interpreters cannot share objects which means they cannot
 share ``threading.Lock`` objects.

 The ``interpreters`` module does not provide any such dedicated
-synchronization primitives.  Instead, ``Queue`` objects provide
-everything one might need.
+synchronization primitives.  Instead, ``interpreters.Queue``
+objects provide everything one might need.

 For example, if there's a shared resource that needs managed
 access then a queue may be used to manage it, where the interpreters
@@ -709,7 +640,7 @@ pass an object around to indicate who can use the resource::
     def worker():
         interp = interpreters.create()
         interp.prepare_main(control=control, data=data)
-        interp.exec_sync("""if True:
+        interp.exec("""if True:
             from mymodule import edit_data
             while True:
                 token = control.get()
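The token-passing pattern in this example can be sketched with today's stdlib, using ``threading`` and ``queue.Queue`` standing in for a subinterpreter and an ``interpreters.Queue``. This is an analogy only, not the proposed API:

```python
import queue
import threading

control = queue.Queue()   # stands in for interpreters.create_queue()
data = bytearray(1)       # the managed shared resource

def worker():
    # Wait for the token, mutate the resource, hand the token back.
    token = control.get()
    data[0] += 1
    control.put(token)

t = threading.Thread(target=worker)
t.start()
control.put("token")      # grant the worker access to the resource
t.join()
token = control.get()     # take the token back before touching data
assert data[0] == 1
```

Holding the token is what makes access to ``data`` safe: only the side currently holding it may mutate the resource.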
@@ -731,12 +662,12 @@ pass an object around to indicate who can use the resource::

 Exceptions
 ----------

-* ``ExecFailure``
+* ``ExecutionFailed``

-  Raised from ``Interpreter.exec_sync()`` when there's an
-  uncaught exception.  The error display for this exception
-  includes the traceback of the uncaught exception, which gets
-  shown after the normal error display, much like happens for
-  ``ExceptionGroup``.
+  Raised from ``Interpreter.exec()`` and ``Interpreter.call()``
+  when there's an uncaught exception.
+  The error display for this exception includes the traceback
+  of the uncaught exception, which gets shown after the normal
+  error display, much like happens for ``ExceptionGroup``.

   Attributes:

@@ -766,7 +697,18 @@ InterpreterPoolExecutor
 Along with the new ``interpreters`` module, there will be a new
 ``concurrent.futures.InterpreterPoolExecutor``.  Each worker executes
 in its own thread with its own subinterpreter.  Communication may
-still be done through ``Queue`` objects, set with the initializer.
+still be done through ``interpreters.Queue`` objects,
+set with the initializer.
+
+sys.implementation.supports_isolated_interpreters
+-------------------------------------------------
+
+Python implementations are not required to support subinterpreters,
+though most major ones do.  If an implementation does support them
+then ``sys.implementation.supports_isolated_interpreters`` will be
+set to ``True``.  Otherwise it will be ``False``.  If the feature
+is not supported then importing the ``interpreters`` module will
+raise an ``ImportError``.

 Examples
 --------
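Code that wants to degrade gracefully on implementations without subinterpreter support can probe the flag defensively. Using ``getattr`` with a default keeps the check working even on versions where the attribute does not exist yet (the flag is part of this proposal, not of already-released Pythons):

```python
import sys

# On implementations/versions without the attribute, fall back
# to False rather than raising AttributeError.
supported = getattr(sys.implementation,
                    "supports_isolated_interpreters", False)

if supported:
    print("subinterpreters available")
else:
    print("falling back to threads or processes")
```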
@@ -818,7 +760,7 @@ via workers in sub-threads.
     def worker():
         interp = interpreters.create()
         interp.prepare_main(tasks=tasks, results=results)
-        interp.exec_sync("""if True:
+        interp.exec("""if True:
             from mymodule import handle_request, capture_exception

             while True:
@@ -880,7 +822,7 @@ so the code takes advantage of directly sharing ``memoryview`` buffers.
     def worker(id):
         interp = interpreters.create()
         interp.prepare_main(data=buf, results=results, tasks=tasks)
-        interp.exec_sync("""if True:
+        interp.exec("""if True:
             from mymodule import reduce_chunk

             while True:
@@ -914,6 +856,132 @@ so the code takes advantage of directly sharing ``memoryview`` buffers.
     use_results(results)


+Rationale
+=========
+
+A Minimal API
+-------------
+
+Since the core dev team has no real experience with
+how users will make use of multiple interpreters in Python code, this
+proposal purposefully keeps the initial API as lean and minimal as
+possible.  The objective is to provide a well-considered foundation
+on which further (more advanced) functionality may be added later,
+as appropriate.
+
+That said, the proposed design incorporates lessons learned from
+existing use of subinterpreters by the community, from existing stdlib
+modules, and from other programming languages.  It also factors in
+experience from using subinterpreters in the CPython test suite and
+using them in `concurrency benchmarks`_.
+
+.. _concurrency benchmarks:
+   https://github.com/ericsnowcurrently/concurrency-benchmarks
+
+create(), create_queue()
+------------------------
+
+Typically, users call a type to create instances of the type, at which
+point the object's resources get provisioned.  The ``interpreters``
+module takes a different approach, where users must call ``create()``
+to get a new interpreter or ``create_queue()`` for a new queue.
+Calling ``interpreters.Interpreter()`` directly only returns a wrapper
+around an existing interpreter (likewise for
+``interpreters.Queue()``).
+
+This is because interpreters (and queues) are special resources.
+They exist globally in the process and are not managed/owned by the
+current interpreter.  Thus the ``interpreters`` module makes creating
+an interpreter (or queue) a visibly distinct operation from creating
+an instance of ``interpreters.Interpreter``
+(or ``interpreters.Queue``).
+
+Interpreter.prepare_main() Sets Multiple Variables
+--------------------------------------------------
+
+``prepare_main()`` may be seen as a setter function of sorts.
+It supports setting multiple names at once,
+e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters
+set one item at a time.  The main reason is for efficiency.
+
+To set a value in the interpreter's ``__main__.__dict__``, the
+implementation must first switch the OS thread to the identified
+interpreter, which involves some non-negligible overhead.  After
+setting the value it must switch back.
+Furthermore, there is some additional overhead to the mechanism
+by which it passes objects between interpreters, which can be
+reduced in aggregate if multiple values are set at once.
+
+Therefore, ``prepare_main()`` supports setting multiple
+values at once.
+
+Propagating Exceptions
+----------------------
+
+An uncaught exception from a subinterpreter,
+via ``Interpreter.exec()``,
+could either be (effectively) ignored,
+like ``threading.Thread()`` does,
+or propagated, like the builtin ``exec()`` does.
+Since ``Interpreter.exec()`` is a synchronous operation,
+like the builtin ``exec()``, uncaught exceptions are propagated.
+
+However, such exceptions are not raised directly.  That's because
+interpreters are isolated from each other and must not share objects,
+including exceptions.  That could be addressed by raising a surrogate
+of the exception, whether a summary, a copy, or a proxy that wraps it.
+Any of those could preserve the traceback, which is useful for
+debugging.  The ``ExecutionFailed`` that gets raised
+is such a surrogate.
+
+There's another concern to consider.  If a propagated exception isn't
+immediately caught, it will bubble up through the call stack until
+caught (or not).  In the case that code somewhere else may catch it,
+it is helpful to identify that the exception came from a subinterpreter
+(i.e. a "remote" source), rather than from the current interpreter.
+That's why ``Interpreter.exec()`` raises ``ExecutionFailed`` and why
+it is a plain ``Exception``, rather than a copy or proxy with a class
+that matches the original exception.  For example, an uncaught
+``ValueError`` from a subinterpreter would never get caught in a later
+``try: ... except ValueError: ...``.  Instead, ``ExecutionFailed``
+must be handled directly.
+
+In contrast, exceptions propagated from ``Interpreter.call()`` do not
+involve ``ExecutionFailed`` but are raised directly, as though originating
+in the calling interpreter.  This is because ``Interpreter.call()`` is
+a higher level method that uses pickle to support objects that can't
+normally be passed between interpreters.
+
+Limited Object Sharing
+----------------------
+
+As noted in `Interpreter Isolation`_, only a small number of builtin
+objects may be truly shared between interpreters.  In all other cases
+objects can only be shared indirectly, through copies or proxies.
+
+The set of objects that are shareable as copies through queues
+(and ``Interpreter.prepare_main()``) is limited for the sake of
+efficiency.
+
+Supporting sharing of *all* objects is possible (via pickle)
+but not part of this proposal.  For one thing, it's helpful to know
+in those cases that only an efficient implementation is being used.
+Furthermore, in those cases supporting mutable objects via pickling
+would violate the guarantee that "shared" objects be equivalent
+(and stay that way).
+
+Objects vs. ID Proxies
+----------------------
+
+For both interpreters and queues, the low-level module makes use of
+proxy objects that expose the underlying state by their corresponding
+process-global IDs.  In both cases the state is likewise process-global
+and will be used by multiple interpreters.  Thus they aren't suitable
+to be implemented as ``PyObject``, which is only really an option for
+interpreter-specific data.  That's why the ``interpreters`` module
+instead provides objects that are weakly associated through the ID.
+
+
 Rejected Ideas
 ==============
