PEP 734: Updates After Discussion (#3664)

Eric Snow 2024-02-27 11:49:36 -07:00 committed by GitHub
parent c55835e170
commit 13022a6d12
1 changed file with 223 additions and 155 deletions


@@ -25,9 +25,9 @@ This PEP proposes to add a new module, ``interpreters``, to support
inspecting, creating, and running code in multiple interpreters in the
current process. This includes ``Interpreter`` objects that represent
the underlying interpreters. The module will also provide a basic
``Queue`` class for communication between interpreters.
Finally, we will add a new ``concurrent.futures.InterpreterPoolExecutor``
based on the ``interpreters`` module.
Introduction
@@ -92,7 +92,7 @@ Interpreters and Threads
------------------------
Thread states are related to interpreter states in much the same way
that OS threads and processes are related (at a high level). To
begin with, the relationship is one-to-many.
A thread state belongs to a single interpreter (and stores
a pointer to it). That thread state is never used for a different
@@ -276,106 +276,6 @@ interpreters. Without one, multiple interpreters are a much less
useful feature.
Specification
=============
@@ -407,7 +307,7 @@ The module defines the following functions:
for it. The interpreter doesn't do anything on its own and is
not inherently tied to any OS thread. That only happens when
something is actually run in the interpreter
(e.g. ``Interpreter.exec()``), and only while running.
The interpreter may or may not have thread states ready to use,
but that is strictly an internal implementation detail.
@@ -439,7 +339,7 @@ Attributes and methods:
It refers only to whether there is an OS thread
running a script (code) in the interpreter's ``__main__`` module.
That basically means whether or not ``Interpreter.exec()``
is running in some OS thread. Code running in sub-threads
is ignored.
@@ -454,7 +354,7 @@ Attributes and methods:
``prepare_main()`` is helpful for initializing the
globals for an interpreter before running code in it.
* ``exec(code, /)``
Execute the given source code in the interpreter
(in the current OS thread), using its ``__main__`` module.
It doesn't return anything.
@@ -465,39 +365,59 @@ Attributes and methods:
the globals and locals.
The code running in the current OS thread (a different
interpreter) is effectively paused until ``Interpreter.exec()``
finishes. To avoid pausing it, create a new ``threading.Thread``
and call ``Interpreter.exec()`` in it
(like ``Interpreter.call_in_thread()`` does).
``Interpreter.exec()`` does not reset the interpreter's state nor
the ``__main__`` module, neither before nor after, so each
successive call picks up where the last one left off. This can
be useful for running some code to initialize an interpreter
(e.g. with imports) before later performing some repeated task.
If there is an uncaught exception, it will be propagated into
the calling interpreter as an ``ExecutionFailed``. The full error
display of the original exception, generated relative to the
called interpreter, is preserved on the propagated ``ExecutionFailed``.
That includes the full traceback, with all the extra info like
syntax error details and chained exceptions.
If the ``ExecutionFailed`` is not caught then that full error display
will be shown, much like it would be if the propagated exception
had been raised in the main interpreter and uncaught. Having
the full traceback is particularly useful when debugging.
If exception propagation is not desired then an explicit
try-except should be used around the *code* passed to
``Interpreter.exec()``. Likewise any error handling that depends
on specific information from the exception must use an explicit
try-except around the given *code*, since ``ExecutionFailed``
will not preserve that information.
* ``call(callable, /)``
Call the callable object in the interpreter.
The return value is discarded. If the callable raises an exception
then it gets propagated as an ``ExecutionFailed`` exception,
in the same way as ``Interpreter.exec()``.
For now only plain functions are supported and only ones that
take no arguments and have no cell vars. Free globals are resolved
against the target interpreter's ``__main__`` module.
In the future, we can add support for arguments, closures,
and a broader variety of callables, at least partly via pickle.
We can also consider not discarding the return value.
The initial restrictions are in place to allow us to get the basic
functionality of the module out to users sooner.
* ``call_in_thread(callable, /) -> threading.Thread``
Essentially, apply ``Interpreter.call()`` in a new thread.
Return values are discarded and exceptions are not propagated.
``call_in_thread()`` is roughly equivalent to::

   def task():
       interp.call(func)
   t = threading.Thread(target=task)
   t.start()
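
Taken together, here is a minimal sketch of these methods in use,
assuming the module behaves exactly as specified above::

   import interpreters

   interp = interpreters.create()
   interp.exec('import math; x = 9')    # state persists between calls
   interp.exec('print(math.sqrt(x))')   # prints 3.0, using the earlier state

   try:
       interp.exec('1/0')
   except interpreters.ExecutionFailed:
       pass   # the ZeroDivisionError's full error display is preserved

   def task():   # a plain function: no arguments, no cell vars
       print('running in the subinterpreter')

   interp.call(task)                  # blocks; failures raise ExecutionFailed
   t = interp.call_in_thread(task)    # the same, but in a new thread
   t.join()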
@@ -518,7 +438,7 @@ the back and each "get" pops the next one off the front. Every added
object will be popped off in the order it was pushed on.
Only objects that are specifically supported for passing
between interpreters may be sent through an ``interpreters.Queue``.
Note that the actual objects aren't sent, but rather their
underlying data. However, the popped object will still be
strictly equivalent to the original.
@@ -526,10 +446,12 @@ See `Shareable Objects`_.
The module defines the following functions:
* ``create_queue(maxsize=0, *, syncobj=False) -> Queue``
Create a new queue. If the maxsize is zero or negative then the
queue is unbounded.
"syncobj" is used as the default for ``put()`` and ``put_nowait()``.
Queue Objects
-------------
@@ -552,7 +474,8 @@ Attributes and methods:
used for a pipe.
* ``maxsize``
(read-only) Number of items allowed in the queue.
Zero means "unbounded".
* ``__hash__()``
Return the hash of the queue's ``id``. This is the same
@@ -579,18 +502,25 @@ Attributes and methods:
This is only a snapshot of the state at the time of the call.
Other threads or interpreters may cause this to change.
* ``put(obj, timeout=None, *, syncobj=None)``
Add the object to the queue.
If ``maxsize > 0`` and the queue is full then this blocks until
a free slot is available. If *timeout* is a positive number
then it blocks at most that many seconds and then raises
``interpreters.QueueFull``. Otherwise it blocks forever.
If "syncobj" is true then the object must be
`shareable <Shareable Objects_>`_, which means the object's data
is passed through rather than the object itself.
If "syncobj" is false then all objects are supported. However,
there are some performance penalties and every object is copied
(e.g. via pickle). Thus mutable objects will never be
automatically synchronized between interpreters.
If "syncobj" is None (the default) then the queue's default
value is used.
* ``put_nowait(obj, *, syncobj=None)``
Like ``put()`` but effectively with an immediate timeout.
Thus if the queue is full, it immediately raises
``interpreters.QueueFull``.
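
For example, a minimal sketch of both modes, assuming the queue API
as specified here::

   import interpreters

   queue = interpreters.create_queue()

   queue.put(('spam', 42), syncobj=True)   # shareable: data passed through
   queue.put({'eggs': 2}, syncobj=False)   # not shareable: copied via pickle

   assert queue.get() == ('spam', 42)
   assert queue.get() == {'eggs': 2}       # an equal copy, not the same object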
@@ -609,8 +539,8 @@ Attributes and methods:
Shareable Objects
-----------------
``Interpreter.prepare_main()`` only works with "shareable" objects.
The same goes for ``interpreters.Queue`` (optionally).
A "shareable" object is one which may be passed from one interpreter
to another. The object is not necessarily actually directly shared
@@ -640,7 +570,7 @@ Here's the initial list of supported objects:
* ``bool`` (``True``/``False``)
* ``None``
* ``tuple`` (only with shareable items)
* ``interpreters.Queue``
* ``memoryview`` (underlying buffer actually shared)
Note that the last two on the list, queues and ``memoryview``, are
@@ -655,12 +585,13 @@ a token back and forth through a queue to indicate safety
(see `Synchronization`_), or by assigning sub-range exclusivity
to individual interpreters.
Most objects will be shared through queues (``interpreters.Queue``),
as interpreters communicate information between each other.
Less frequently, objects will be shared through ``prepare_main()``
to set up an interpreter prior to running code in it. However,
``prepare_main()`` is the primary way that queues are shared,
to provide another interpreter with a means
of further communication.
Finally, a reminder: for a few types the actual object is shared,
whereas for the rest only the underlying data is shared, whether
@@ -675,9 +606,9 @@ had been shared directly, whether or not it actually was.
That's a slightly different and stronger promise than just equality.
The guarantee is especially important for mutable objects, like
``interpreters.Queue`` and ``memoryview``. Mutating the object
in one interpreter will always be reflected immediately in every
other interpreter sharing the object.
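
For example, a sketch of that guarantee for a ``memoryview`` over
a writable buffer, assuming the API specified above::

   import interpreters

   buf = bytearray(8)
   interp = interpreters.create()
   interp.prepare_main(data=memoryview(buf))
   interp.exec('data[0] = 42')   # mutates the one shared buffer
   assert buf[0] == 42           # immediately visible in this interpreter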
Synchronization
---------------
@@ -692,8 +623,8 @@ However, interpreters cannot share objects which means they cannot
share ``threading.Lock`` objects.
The ``interpreters`` module does not provide any such dedicated
synchronization primitives. Instead, ``interpreters.Queue``
objects provide everything one might need.
For example, if there's a shared resource that needs managed
access then a queue may be used to manage it, where the interpreters
@@ -709,7 +640,7 @@ pass an object around to indicate who can use the resource::
   def worker():
       interp = interpreters.create()
       interp.prepare_main(control=control, data=data)
       interp.exec("""if True:
           from mymodule import edit_data
           while True:
               token = control.get()
@@ -731,12 +662,12 @@ pass an object around to indicate who can use the resource::
Exceptions
----------
* ``ExecutionFailed``
Raised from ``Interpreter.exec()`` and ``Interpreter.call()``
when there's an uncaught exception.
The error display for this exception includes the traceback
of the uncaught exception, which gets shown after the normal
error display, much like happens for ``ExceptionGroup``.
Attributes:
@@ -766,7 +697,18 @@ InterpreterPoolExecutor
Along with the new ``interpreters`` module, there will be a new
``concurrent.futures.InterpreterPoolExecutor``. Each worker executes
in its own thread with its own subinterpreter. Communication may
still be done through ``interpreters.Queue`` objects,
set with the initializer.
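
A sketch of how that might look, assuming the executor follows the
same basic API as ``concurrent.futures.ThreadPoolExecutor`` (the
constructor arguments are not specified here)::

   from concurrent.futures import InterpreterPoolExecutor

   def work(n):
       return n * n   # runs in a worker's own subinterpreter

   with InterpreterPoolExecutor(max_workers=4) as executor:
       squares = list(executor.map(work, range(10)))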
sys.implementation.supports_isolated_interpreters
-------------------------------------------------
Python implementations are not required to support subinterpreters,
though most major ones do. If an implementation does support them
then ``sys.implementation.supports_isolated_interpreters`` will be
set to ``True``. Otherwise it will be ``False``. If the feature
is not supported then importing the ``interpreters`` module will
raise an ``ImportError``.
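
For example, code meant to run on other implementations might guard
the import (using ``getattr`` in case the attribute itself is absent)::

   import sys

   if getattr(sys.implementation, 'supports_isolated_interpreters', False):
       import interpreters
   else:
       interpreters = None   # fall back to threads or processes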
Examples
--------
@@ -818,7 +760,7 @@ via workers in sub-threads.
   def worker():
       interp = interpreters.create()
       interp.prepare_main(tasks=tasks, results=results)
       interp.exec("""if True:
           from mymodule import handle_request, capture_exception
           while True:
@@ -880,7 +822,7 @@ so the code takes advantage of directly sharing ``memoryview`` buffers.
   def worker(id):
       interp = interpreters.create()
       interp.prepare_main(data=buf, results=results, tasks=tasks)
       interp.exec("""if True:
           from mymodule import reduce_chunk
           while True:
@@ -914,6 +856,132 @@ so the code takes advantage of directly sharing ``memoryview`` buffers.
   use_results(results)
Rationale
=========
A Minimal API
-------------
Since the core dev team has no real experience with
how users will make use of multiple interpreters in Python code, this
proposal purposefully keeps the initial API as lean and minimal as
possible. The objective is to provide a well-considered foundation
on which further (more advanced) functionality may be added later,
as appropriate.
That said, the proposed design incorporates lessons learned from
existing use of subinterpreters by the community, from existing stdlib
modules, and from other programming languages. It also factors in
experience from using subinterpreters in the CPython test suite and
using them in `concurrency benchmarks`_.
.. _concurrency benchmarks:
https://github.com/ericsnowcurrently/concurrency-benchmarks
create(), create_queue()
------------------------
Typically, users call a type to create instances of the type, at which
point the object's resources get provisioned. The ``interpreters``
module takes a different approach, where users must call ``create()``
to get a new interpreter or ``create_queue()`` for a new queue.
Calling ``interpreters.Interpreter()`` directly only returns a wrapper
around an existing interpreter (likewise for
``interpreters.Queue()``).
This is because interpreters (and queues) are special resources.
They exist globally in the process and are not managed/owned by the
current interpreter. Thus the ``interpreters`` module makes creating
an interpreter (or queue) a visibly distinct operation from creating
an instance of ``interpreters.Interpreter``
(or ``interpreters.Queue``).
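
For example, a sketch of the distinction, assuming (as described
above) that ``Interpreter()`` accepts the ID of an existing
interpreter::

   import interpreters

   interp = interpreters.create()                 # provisions a new interpreter
   wrapper = interpreters.Interpreter(interp.id)  # only wraps the existing one
   assert wrapper.id == interp.id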
Interpreter.prepare_main() Sets Multiple Variables
--------------------------------------------------
``prepare_main()`` may be seen as a setter function of sorts.
It supports setting multiple names at once,
e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters
set one item at a time. The main reason is for efficiency.
To set a value in the interpreter's ``__main__.__dict__``, the
implementation must first switch the OS thread to the identified
interpreter, which involves some non-negligible overhead. After
setting the value it must switch back.
Furthermore, there is some additional overhead to the mechanism
by which it passes objects between interpreters, which can be
reduced in aggregate if multiple values are set at once.
Therefore, ``prepare_main()`` supports setting multiple
values at once.
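
For example, a sketch of the difference in overhead, where each call
switches to the target interpreter and back::

   import interpreters

   interp = interpreters.create()

   # One switch to the interpreter (and back), with both values set:
   interp.prepare_main(spam=1, eggs=2)

   # The same end result, but with two switches:
   interp.prepare_main(spam=1)
   interp.prepare_main(eggs=2)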
Propagating Exceptions
----------------------
An uncaught exception from a subinterpreter,
via ``Interpreter.exec()``,
could either be (effectively) ignored,
like ``threading.Thread()`` does,
or propagated, like the builtin ``exec()`` does.
Since ``Interpreter.exec()`` is a synchronous operation,
like the builtin ``exec()``, uncaught exceptions are propagated.
However, such exceptions are not raised directly. That's because
interpreters are isolated from each other and must not share objects,
including exceptions. That could be addressed by raising a surrogate
of the exception, whether a summary, a copy, or a proxy that wraps it.
Any of those could preserve the traceback, which is useful for
debugging. The ``ExecutionFailed`` that gets raised
is such a surrogate.
There's another concern to consider. If a propagated exception isn't
immediately caught, it will bubble up through the call stack until
caught (or not). In the case that code somewhere else may catch it,
it is helpful to identify that the exception came from a subinterpreter
(i.e. a "remote" source), rather than from the current interpreter.
That's why ``Interpreter.exec()`` raises ``ExecutionFailed`` and why
it is a plain ``Exception``, rather than a copy or proxy with a class
that matches the original exception. For example, an uncaught
``ValueError`` from a subinterpreter would never get caught in a later
``try: ... except ValueError: ...``. Instead, ``ExecutionFailed``
must be handled directly.
In contrast, exceptions propagated from ``Interpreter.call()`` do not
involve ``ExecutionFailed`` but are raised directly, as though originating
in the calling interpreter. This is because ``Interpreter.call()`` is
a higher-level method that uses pickle to support objects that can't
normally be passed between interpreters.
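
For example, a sketch of why a later ``except ValueError:`` never
matches, given the behavior described above::

   import interpreters

   interp = interpreters.create()
   try:
       interp.exec('raise ValueError("spam")')
   except ValueError:
       pass   # never reached; the surrogate is not a ValueError
   except interpreters.ExecutionFailed:
       pass   # the surrogate lands here, traceback preserved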
Limited Object Sharing
----------------------
As noted in `Interpreter Isolation`_, only a small number of builtin
objects may be truly shared between interpreters. In all other cases
objects can only be shared indirectly, through copies or proxies.
The set of objects that are shareable as copies through queues
(and ``Interpreter.prepare_main()``) is limited for the sake of
efficiency.
Supporting sharing of *all* objects is possible (via pickle)
but not part of this proposal. For one thing, the limited set makes
it clear that only an efficient implementation is being used.
Furthermore, pickling mutable objects would violate the guarantee
that "shared" objects be equivalent (and stay that way).
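
For example, the following plain-Python sketch shows why pickling
would break that guarantee for mutable objects::

   import pickle

   original = [1, 2, 3]
   copy = pickle.loads(pickle.dumps(original))   # what pickle "sharing" produces

   original.append(4)
   assert copy == [1, 2, 3]   # the copy is already out of sync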
Objects vs. ID Proxies
----------------------
For both interpreters and queues, the low-level module makes use of
proxy objects that expose the underlying state by their corresponding
process-global IDs. In both cases the state is likewise process-global
and will be used by multiple interpreters. Thus they aren't suitable
to be implemented as ``PyObject``, which is only really an option for
interpreter-specific data. That's why the ``interpreters`` module
instead provides objects that are weakly associated through the ID.
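
For example, a sketch of that weak association, assuming (as
described above) that ``Queue()`` accepts an existing queue's ID::

   import interpreters

   queue = interpreters.create_queue()
   proxy = interpreters.Queue(queue.id)   # a second wrapper, same global state
   assert hash(proxy) == hash(queue)      # __hash__ is the hash of the ID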
Rejected Ideas
==============