PEP 554: Seasonal Updates. (#944)
parent
ad0f49e0a2
commit
1ff0e1d362
223
pep-0554.rst
|
@ -15,7 +15,7 @@ Abstract
|
|||
|
||||
CPython has supported multiple interpreters in the same process (AKA
|
||||
"subinterpreters") since version 1.5 (1997). The feature has been
|
||||
available via the C-API. [c-api]_ Subinterpreters operate in
|
||||
available via the C-API. [c-api]_ Subinterpreters operate in
|
||||
`relative isolation from one another <Interpreter Isolation_>`_, which
|
||||
provides the basis for an
|
||||
`alternative concurrency model <Concurrency_>`_.
|
||||
|
@ -51,6 +51,7 @@ At first only the following types will be supported for sharing:
|
|||
* str
|
||||
* int
|
||||
* PEP 3118 buffer objects (via ``send_buffer()``)
|
||||
* PEP 554 channels
|
||||
|
||||
Support for other basic types (e.g. bool, float, Ellipsis) will be added later.
|
||||
|
||||
|
@ -152,7 +153,7 @@ For sharing data between interpreters:
|
|||
| | | receiving end of the channel and wait. |
|
||||
| | | Associate the interpreter with the channel. |
|
||||
+---------------------------+-------------------------------------------------+
|
||||
| .send_nowait(obj) | | Like send(), but Fail if not received. |
|
||||
| .send_nowait(obj) | | Like send(), but fail if not received. |
|
||||
+---------------------------+-------------------------------------------------+
|
||||
| .send_buffer(obj) | | Send the object's (PEP 3118) buffer to the |
|
||||
| | | receiving end of the channel and wait. |
|
||||
|
@ -242,6 +243,24 @@ Handling an exception
|
|||
except interpreters.RunFailedError as exc:
|
||||
print(f"got the error from the subinterpreter: {exc}")
|
||||
|
||||
Re-raising an exception
|
||||
-----------------------
|
||||
|
||||
::
|
||||
|
||||
interp = interpreters.create()
|
||||
try:
|
||||
try:
|
||||
interp.run(tw.dedent("""
|
||||
raise KeyError
|
||||
"""))
|
||||
except interpreters.RunFailedError as exc:
|
||||
raise exc.__cause__
|
||||
except KeyError:
|
||||
print("got a KeyError from the subinterpreter")
|
||||
|
||||
Note that this pattern is a candidate for later improvement.
|
||||
|
||||
Synchronize using a channel
|
||||
---------------------------
|
||||
|
||||
|
@ -494,8 +513,8 @@ each with different goals. Most center on correctness and usability.
|
|||
|
||||
One class of concurrency models focuses on isolated threads of
|
||||
execution that interoperate through some message passing scheme. A
|
||||
notable example is `Communicating Sequential Processes`_ (CSP), upon
|
||||
which Go's concurrency is based. The isolation inherent to
|
||||
notable example is `Communicating Sequential Processes`_ (CSP) (upon
|
||||
which Go's concurrency is roughly based). The isolation inherent to
|
||||
subinterpreters makes them well-suited to this approach.
|
||||
|
||||
Shared data
|
||||
|
@ -521,9 +540,9 @@ There are a number of valid solutions, several of which may be
|
|||
appropriate to support in Python. This proposal provides a single basic
|
||||
solution: "channels". Ultimately, any other solution will look similar
|
||||
to the proposed one, which will set the precedent. Note that the
|
||||
implementation of ``Interpreter.run()`` can be done in a way that allows
|
||||
for multiple solutions to coexist, but doing so is not technically
|
||||
a part of the proposal here.
|
||||
implementation of ``Interpreter.run()`` will be done in a way that
|
||||
allows for multiple solutions to coexist, but doing so is not
|
||||
technically a part of the proposal here.
|
||||
|
||||
Regarding the proposed solution, "channels", it is a basic, opt-in data
|
||||
sharing mechanism that draws inspiration from pipes, queues, and CSP's
|
||||
|
@ -534,7 +553,8 @@ channels have two operations: send and receive. A key characteristic
|
|||
of those operations is that channels transmit data derived from Python
|
||||
objects rather than the objects themselves. When objects are sent,
|
||||
their data is extracted. When the "object" is received in the other
|
||||
interpreter, the data is converted back into an object.
|
||||
interpreter, the data is converted back into an object owned by that
|
||||
interpreter.
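As a rough illustration of the round trip described above, the following sketch reduces a ``str`` or ``int`` to raw data and rebuilds an equal object from it. The helper names ``extract_data()`` and ``rebuild_object()`` and the tagged-bytes format are assumptions made for this example only; the PEP does not specify how the runtime represents the extracted data.

```python
def extract_data(obj):
    """Reduce a str or int to a (type tag, raw bytes) pair."""
    if type(obj) is str:
        return ("str", obj.encode("utf-8"))
    if type(obj) is int:
        # Signed big-endian, with room for the sign bit.
        size = (obj.bit_length() + 8) // 8
        return ("int", obj.to_bytes(size, "big", signed=True))
    raise ValueError(f"{type(obj).__name__} is not shareable in this sketch")


def rebuild_object(data):
    """Build a fresh object from the pair made by extract_data()."""
    tag, raw = data
    if tag == "str":
        return raw.decode("utf-8")
    if tag == "int":
        return int.from_bytes(raw, "big", signed=True)
    raise ValueError(f"unknown tag {tag!r}")


sent = "spam and eggs"
received = rebuild_object(extract_data(sent))
```

Only the data crosses the boundary here, so the receiving side builds and owns its own object rather than holding a reference to the sender's.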
|
||||
|
||||
To make this work, the mutable shared state will be managed by the
|
||||
Python runtime, not by any of the interpreters. Initially we will
|
||||
|
@ -552,6 +572,7 @@ channels to the following:
|
|||
* str
|
||||
* int
|
||||
* PEP 3118 buffer objects (via ``send_buffer()``)
|
||||
* channels
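A check against part of this list might look like the sketch below. It covers only ``str``, ``int``, and buffer support; the function names are hypothetical, and the exact-type comparison is an assumption based on ``bool`` being named among the types to be added later.

```python
def is_basic_shareable(obj):
    """True for exact str/int instances only (hypothetical helper)."""
    # type() rather than isinstance(): bool subclasses int but is
    # listed as a type to be supported later.
    return type(obj) in (str, int)


def supports_buffer(obj):
    """True if obj exposes the PEP 3118 buffer protocol."""
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False
```

Objects for which ``supports_buffer()`` is true would go through ``send_buffer()`` rather than ``send()``.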
|
||||
|
||||
Limiting the initial shareable types is a practical matter, reducing
|
||||
the potential complexity of the initial implementation. There are a
|
||||
|
@ -589,11 +610,11 @@ Finally, some potential isolation is missing due to the current design
|
|||
of CPython. Work is currently under way to address gaps in this
|
||||
area:
|
||||
|
||||
* interpreters share the GIL
|
||||
* interpreters share memory management (e.g. allocators, gc)
|
||||
* GC is not run per-interpreter [global-gc]_
|
||||
* at-exit handlers are not run per-interpreter [global-atexit]_
|
||||
* extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
|
||||
* interpreters share memory management (e.g. allocators, gc)
|
||||
* interpreters share the GIL
|
||||
|
||||
Existing Usage
|
||||
--------------
|
||||
|
@ -683,7 +704,7 @@ The module also provides the following class:
|
|||
"channels" keyword argument is provided (and is a mapping of
|
||||
attribute names to channels) then it is added to the interpreter's
|
||||
execution namespace (the interpreter's "__main__" module). If any
|
||||
of the values are not are not RecvChannel or SendChannel instances
|
||||
of the values are not RecvChannel or SendChannel instances
|
||||
then ValueError gets raised.
|
||||
|
||||
This may not be called on an already running interpreter. Doing
|
||||
|
@ -737,9 +758,9 @@ interpreters, we create a surrogate of the exception and its traceback
|
|||
(see ``traceback.TracebackException``), set it to ``__cause__`` on a
|
||||
new ``RunFailedError``, and raise that.
|
||||
|
||||
Raising (a proxy of) the exception is problematic since it's harder to
|
||||
distinguish between an error in the ``run()`` call and an uncaught
|
||||
exception from the subinterpreter.
|
||||
Raising (a proxy of) the exception directly is problematic since it's
|
||||
harder to distinguish between an error in the ``run()`` call and an
|
||||
uncaught exception from the subinterpreter.
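The wrapping behavior can be approximated in a single interpreter. In the sketch below, ``run_sketch()`` and this ``RunFailedError`` are stand-ins rather than the proposed API: the code is executed with ``exec()`` in a fresh namespace, and the original exception object is reused as ``__cause__`` where the PEP describes a surrogate.

```python
import textwrap as tw


class RunFailedError(RuntimeError):
    """Stand-in for interpreters.RunFailedError (not the real class)."""


def run_sketch(source):
    """Run source in a fresh namespace, wrapping any uncaught exception."""
    try:
        exec(tw.dedent(source), {"__name__": "__main__"})
    except Exception as exc:
        # Assumption: reuse the exception itself instead of a surrogate.
        raise RunFailedError(f"{type(exc).__name__}: {exc}") from exc


def demo():
    try:
        try:
            run_sketch("""
                raise KeyError("spam")
                """)
        except RunFailedError as exc:
            # Unwrap and re-raise the error from the executed code.
            raise exc.__cause__
    except KeyError:
        return "got a KeyError from the subinterpreter"
```

An error in the call itself (for example, calling ``run_sketch()`` with no arguments) surfaces as a plain ``TypeError``, while anything raised by the executed code arrives wrapped, which is the distinction the paragraph above relies on.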
|
||||
|
||||
|
||||
API for sharing data
|
||||
|
@ -763,14 +784,15 @@ whether an object is shareable or not:
|
|||
a cross-interpreter way, whether via a proxy, a copy, or some other
|
||||
means.
|
||||
|
||||
This proposal provides two ways to do share such objects between
|
||||
This proposal provides two ways to share such objects between
|
||||
interpreters.
|
||||
|
||||
First, shareable objects may be passed to ``run()`` as keyword arguments,
|
||||
where they are effectively injected into the target interpreter's
|
||||
``__main__`` module. This is mainly intended for sharing meta-objects
|
||||
(e.g. channels) between interpreters, as it is less useful to pass other
|
||||
objects (like ``bytes``) to ``run``.
|
||||
First, channels may be passed to ``run()`` via the ``channels``
|
||||
keyword argument, where they are effectively injected into the target
|
||||
interpreter's ``__main__`` module. While passing arbitrary shareable
|
||||
objects this way is possible, doing so is mainly intended for sharing
|
||||
meta-objects (e.g. channels) between interpreters. It is less useful
|
||||
to pass other objects (like ``bytes``) to ``run`` directly.
|
||||
|
||||
Second, the main mechanism for sharing objects (i.e. their data) between
|
||||
interpreters is through channels. A channel is a simplex FIFO similar
|
||||
|
@ -778,6 +800,9 @@ to a pipe. The main difference is that channels can be associated with
|
|||
zero or more interpreters on either end. Unlike queues, which are also
|
||||
many-to-many, channels have no buffer.
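To make "no buffer" concrete, here is a toy rendezvous channel built on threads rather than interpreters. The class ``RendezvousChannel`` and its internals are assumptions for illustration, and this ``NotReceivedError`` is a stand-in for the exception of the same name in the proposal.

```python
import queue
import threading


class NotReceivedError(Exception):
    """Stand-in for the proposal's NotReceivedError."""


class RendezvousChannel:
    """Toy unbuffered channel between threads (not interpreters)."""

    def __init__(self):
        self._send_lock = threading.Lock()    # one hand-off at a time
        self._state_lock = threading.Lock()   # guards _waiting
        self._waiting = 0                     # receivers blocked in recv()
        self._slot = queue.Queue(maxsize=1)   # carries the item in flight
        self._taken = threading.Semaphore(0)  # signals "item was received"

    def recv(self):
        with self._state_lock:
            self._waiting += 1
        obj = self._slot.get()                # block until something is sent
        with self._state_lock:
            self._waiting -= 1
        self._taken.release()                 # let the sender return
        return obj

    def send(self, obj):
        with self._send_lock:
            self._slot.put(obj)
            self._taken.acquire()             # wait until it is received

    def send_nowait(self, obj):
        with self._send_lock:
            with self._state_lock:
                if self._waiting == 0:
                    raise NotReceivedError("no receiver is waiting")
            self._slot.put(obj)
            self._taken.acquire()
```

``send()`` returns only after some receiver has taken the object, so nothing is ever queued up behind it; any number of threads may call either method, which mirrors the many-to-many association described above.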
|
||||
|
||||
The ``interpreters`` module provides the following functions and
|
||||
classes related to channels:
|
||||
|
||||
``create_channel()``::
|
||||
|
||||
Create a new channel and return (recv, send), the RecvChannel and
|
||||
|
@ -802,24 +827,25 @@ many-to-many, channels have no buffer.
|
|||
``RecvChannel(id)``::
|
||||
|
||||
The receiving end of a channel. An interpreter may use this to
|
||||
receive objects from another interpreter. At first only bytes will
|
||||
be supported.
|
||||
receive objects from another interpreter. At first only a few of
|
||||
the simple, immutable builtin types will be supported.
|
||||
|
||||
id:
|
||||
|
||||
The channel's unique ID.
|
||||
The channel's unique ID. This is shared with the "send" end.
|
||||
|
||||
interpreters:
|
||||
|
||||
The list of associated interpreters: those that have called
|
||||
the "recv()" or "__next__()" methods and haven't called
|
||||
"release()" (and the channel hasn't been explicitly closed).
|
||||
the "recv()" method and haven't called "release()" (and the
|
||||
channel hasn't been explicitly closed).
|
||||
|
||||
recv():
|
||||
|
||||
Return the next object (i.e. the data from the sent object) from
|
||||
the channel. If none have been sent then wait until the next
|
||||
send. This associates the current interpreter with the channel.
|
||||
send. This associates the current interpreter with the "recv"
|
||||
end of the channel.
|
||||
|
||||
If the channel is already closed then raise ChannelClosedError.
|
||||
If the channel isn't closed but the current interpreter already
|
||||
|
@ -848,7 +874,7 @@ many-to-many, channels have no buffer.
|
|||
to 0, the channel is actually marked as closed. The Python
|
||||
runtime will garbage collect all closed channels, though it may
|
||||
not be immediately. Note that "release()" is automatically called
|
||||
in behalf of the current interpreter when the channel is no longer
|
||||
on behalf of the current interpreter when the channel is no longer
|
||||
used (i.e. has no references) in that interpreter.
|
||||
|
||||
This operation is idempotent. Return True if "release()" has not
|
||||
|
@ -857,21 +883,21 @@ many-to-many, channels have no buffer.
|
|||
close(force=False):
|
||||
|
||||
Close both ends of the channel (in all interpreters). This means
|
||||
that any further use of the channel raises ChannelClosedError. If
|
||||
the channel is not empty then raise ChannelNotEmptyError (if
|
||||
"force" is False) or discard the remaining objects (if "force"
|
||||
is True) and close it.
|
||||
that any further use of the channel anywhere raises
|
||||
ChannelClosedError. If the channel is not empty then raise
|
||||
ChannelNotEmptyError (if "force" is False) or discard the
|
||||
remaining objects (if "force" is True) and close it.
|
||||
|
||||
|
||||
``SendChannel(id)``::
|
||||
|
||||
The sending end of a channel. An interpreter may use this to send
|
||||
objects to another interpreter. At first only bytes will be
|
||||
supported.
|
||||
objects to another interpreter. At first only a few of
|
||||
the simple, immutable builtin types will be supported.
|
||||
|
||||
id:
|
||||
|
||||
The channel's unique ID.
|
||||
The channel's unique ID. This is shared with the "recv" end.
|
||||
|
||||
interpreters:
|
||||
|
||||
|
@ -882,8 +908,9 @@ many-to-many, channels have no buffer.
|
|||
|
||||
Send the object (i.e. its data) to the receiving end of the
|
||||
channel. Wait until the object is received. If the
|
||||
object is not shareable then ValueError is raised. Currently
|
||||
only bytes are supported.
|
||||
object is not shareable then ValueError is raised. This
|
||||
associates the current interpreter with the "send" end of the
|
||||
channel.
|
||||
|
||||
If the channel is already closed then raise ChannelClosedError.
|
||||
If the channel isn't closed but the current interpreter already
|
||||
|
@ -892,9 +919,10 @@ many-to-many, channels have no buffer.
|
|||
|
||||
send_nowait(obj):
|
||||
|
||||
Send the object to the receiving end of the channel. If the other
|
||||
end is not currently receiving then raise NotReceivedError.
|
||||
Otherwise this is the same as "send()".
|
||||
Send the object to the receiving end of the channel. If no
|
||||
interpreter is currently receiving (waiting on the other end)
|
||||
then raise NotReceivedError. Otherwise this is the same as
|
||||
"send()".
|
||||
|
||||
send_buffer(obj):
|
||||
|
||||
|
@ -918,9 +946,9 @@ many-to-many, channels have no buffer.
|
|||
Close both ends of the channel (in all interpreters). No matter
|
||||
what, the "send" end of the channel is immediately closed. If the
|
||||
channel is empty then close the "recv" end immediately too.
|
||||
Otherwise wait until the channel is empty before closing it (if
|
||||
"force" is False) or discard the remaining items and close
|
||||
immediately (if "force" is True).
|
||||
Otherwise, if "force" is False, close the "recv" end (and hence
|
||||
the full channel) once the channel becomes empty; or, if "force"
|
||||
is True, discard the remaining items and close immediately.
|
||||
|
||||
Note that ``send_buffer()`` is similar to how
|
||||
``multiprocessing.Connection`` works. [mp-conn]_
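For comparison, the existing ``multiprocessing`` behavior looks like this. The sketch uses only the documented ``Pipe()``, ``send_bytes()``, and ``recv_bytes()`` calls and keeps both ends in one process for brevity.

```python
from multiprocessing import Pipe

# duplex=False gives a one-way pipe: (receiving end, sending end).
recv_end, send_end = Pipe(duplex=False)

payload = bytearray(b"spam and eggs")
send_end.send_bytes(payload)        # accepts any bytes-like object
received = recv_end.recv_bytes()    # returns the data as a bytes object

send_end.close()
recv_end.close()
```

The receiver gets the buffer's contents as ``bytes``, not the original ``bytearray`` object.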
|
||||
|
@ -929,53 +957,10 @@ Note that ``send_buffer()`` is similar to how
|
|||
Open Questions
|
||||
==============
|
||||
|
||||
* "force" argument to ``ch.release()``?
|
||||
* add a "tp_share" type slot instead of using a global registry
|
||||
for shareable types?
|
||||
|
||||
|
||||
Open Implementation Questions
|
||||
=============================
|
||||
|
||||
Does every interpreter think that their thread is the "main" thread?
|
||||
--------------------------------------------------------------------
|
||||
|
||||
(This is more of an implementation detail that an issue for the PEP.)
|
||||
|
||||
CPython's interpreter implementation identifies the OS thread in which
|
||||
it was started as the "main" thread. The interpreter the has slightly
|
||||
different behavior depending on if the current thread is the main one
|
||||
or not. This presents a problem in cases where "main thread" is meant
|
||||
to imply "main thread in the main interpreter" [main-thread]_, where
|
||||
the main interpreter is the initial one.
|
||||
|
||||
Disallow subinterpreters in the main thread?
|
||||
--------------------------------------------
|
||||
|
||||
(This is more of an implementation detail that an issue for the PEP.)
|
||||
|
||||
This is a specific case of the above issue. Currently in CPython,
|
||||
"we need a main \*thread\* in order to sensibly manage the way signal
|
||||
handling works across different platforms". [main-thread]_
|
||||
|
||||
Since signal handlers are part of the interpreter state, running a
|
||||
subinterpreter in the main thread means that the main interpreter
|
||||
can no longer properly handle signals (since it's effectively paused).
|
||||
|
||||
Furthermore, running a subinterpreter in the main thread would
|
||||
conceivably allow setting signal handlers on that interpreter, which
|
||||
would likewise impact signal handling when that interpreter isn't
|
||||
running or is running in a different thread.
|
||||
|
||||
Ultimately, running subinterpreters in the main OS thread introduces
|
||||
complications to the signal handling implementation. So it may make
|
||||
the most sense to disallow running subinterpreters in the main thread.
|
||||
Support for it could be considered later. The downside is that folks
|
||||
wanting to try out subinterpreters would be required to take the extra
|
||||
step of using threads. This could slow adoption and experimentation,
|
||||
whereas without the restriction there's less of an obstacle.
|
||||
|
||||
|
||||
Deferred Functionality
|
||||
======================
|
||||
|
||||
|
@ -1048,10 +1033,11 @@ Syntactic Support
|
|||
|
||||
The ``Go`` language provides a concurrency model based on CSP, so
|
||||
it's similar to the concurrency model that subinterpreters support.
|
||||
``Go`` provides syntactic support, as well several builtin concurrency
|
||||
primitives, to make concurrency a first-class feature. Conceivably,
|
||||
similar syntactic (and builtin) support could be added to Python using
|
||||
subinterpreters. However, that is *way* outside the scope of this PEP!
|
||||
However, ``Go`` also provides syntactic support, as well as several builtin
|
||||
concurrency primitives, to make concurrency a first-class feature.
|
||||
Conceivably, similar syntactic (and builtin) support could be added to
|
||||
Python using subinterpreters. However, that is *way* outside the scope
|
||||
of this PEP!
|
||||
|
||||
Multiprocessing
|
||||
---------------
|
||||
|
@ -1072,19 +1058,21 @@ raise an ImportError if unsupported.
|
|||
|
||||
Alternately we could support opting in to subinterpreter support.
|
||||
However, that would probably exclude many more modules (unnecessarily)
|
||||
than the opt-out approach.
|
||||
than the opt-out approach. Also, note that PEP 489 defined that an
|
||||
extension's use of the PEP's machinery implies support for
|
||||
subinterpreters.
|
||||
|
||||
The scope of adding the ModuleDef slot and fixing up the import
|
||||
machinery is non-trivial, but could be worth it. It all depends on
|
||||
how many extension modules break under subinterpreters. Given the
|
||||
relatively few cases we know of through mod_wsgi, we can leave this
|
||||
for later.
|
||||
how many extension modules break under subinterpreters. Given that
|
||||
there are relatively few cases we know of through mod_wsgi, we can
|
||||
leave this for later.
|
||||
|
||||
Poisoning channels
|
||||
------------------
|
||||
|
||||
CSP has the concept of poisoning a channel. Once a channel has been
|
||||
poisoned, and ``send()`` or ``recv()`` call on it will raise a special
|
||||
poisoned, any ``send()`` or ``recv()`` call on it would raise a special
|
||||
exception, effectively ending execution in the interpreter that tried
|
||||
to use the poisoned channel.
|
||||
|
||||
|
@ -1092,15 +1080,6 @@ This could be accomplished by adding a ``poison()`` method to both ends
|
|||
of the channel. The ``close()`` method can be used in this way
|
||||
(mostly), but these semantics are relatively specialized and can wait.
|
||||
|
||||
Sending channels over channels
|
||||
------------------------------
|
||||
|
||||
Some advanced usage of subinterpreters could take advantage of the
|
||||
ability to send channels over channels, in addition to bytes. Given
|
||||
that channels will already be multi-interpreter safe, supporting then
|
||||
in ``RecvChannel.recv()`` wouldn't be a big change. However, this can
|
||||
wait until the basic functionality has been ironed out.
|
||||
|
||||
Resetting __main__
|
||||
------------------
|
||||
|
||||
|
@ -1161,7 +1140,7 @@ Per Antoine Pitrou [async]_::
|
|||
on (probably a file descriptor?).
|
||||
|
||||
A possible solution is to provide async implementations of the blocking
|
||||
channel methods (``__next__()``, ``recv()``, and ``send()``). However,
|
||||
channel methods (``recv()`` and ``send()``). However,
|
||||
the basic functionality of subinterpreters does not depend on async and
|
||||
can be added later.
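One way such async variants could be layered on top of blocking methods is to run the blocking call in a worker thread. This is only a sketch of that idea: ``recv_async()`` is a hypothetical helper, and a ``queue.Queue`` (which, unlike a channel, is buffered) stands in for the receiving end.

```python
import asyncio
import queue


async def recv_async(blocking_recv):
    """Await a blocking recv() without stalling the event loop."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, blocking_recv)


async def main():
    chan = queue.Queue()    # stand-in for a channel's "recv" end
    pending = asyncio.create_task(recv_async(chan.get))
    await asyncio.sleep(0)  # give the receive a chance to start
    chan.put("spam")        # stand-in for a send from elsewhere
    return await pending


result = asyncio.run(main())
```

The event loop stays free while the worker thread waits, at the cost of one thread per pending receive.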
|
||||
|
||||
|
@ -1320,6 +1299,39 @@ Rejected possible solutions:
|
|||
to do something similar
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
The implementation of the PEP has 4 parts:
|
||||
|
||||
* the high-level module described in this PEP (mostly a light wrapper
|
||||
around a low-level C extension)
|
||||
* the low-level C extension module
|
||||
* additions to the ("private") C-API needed by the low-level module
|
||||
* secondary fixes/changes in the CPython runtime that facilitate
|
||||
the low-level module (among other benefits)
|
||||
|
||||
These are at various levels of completion, with more done the lower
|
||||
you go:
|
||||
|
||||
* the high-level module has been, at best, roughly implemented.
|
||||
However, fully implementing it will be almost trivial.
|
||||
* the low-level module is mostly complete. The bulk of the
|
||||
implementation was merged into master in December 2018 as the
|
||||
"_xxsubinterpreters" module (for the sake of testing subinterpreter
|
||||
functionality). Only 3 parts of the implementation remain:
|
||||
"send_nowait()", "send_buffer()", and exception propagation. All three
|
||||
have been mostly finished, but were blocked by work related to ceval.
|
||||
That blocker is basically resolved now and finishing the low-level module
|
||||
will not require extensive work.
|
||||
* all necessary C-API work has been finished
|
||||
* all anticipated work in the runtime has been finished
|
||||
|
||||
The implementation effort for PEP 554 is being tracked as part of
|
||||
a larger project aimed at improving multi-core support in CPython.
|
||||
[multi-core-project]_
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
|
@ -1389,6 +1401,9 @@ References
|
|||
.. [pypy]
|
||||
https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
|
||||
|
||||
.. [multi-core-project]
|
||||
https://github.com/ericsnowcurrently/multi-core-python
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
|