PEP 554: Seasonal Updates. (#944)

Eric Snow 2019-03-23 00:12:14 -06:00 committed by GitHub
parent ad0f49e0a2
commit 1ff0e1d362
1 changed file with 119 additions and 104 deletions


@ -15,7 +15,7 @@ Abstract
CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997). The feature has been
available via the C-API. [c-api]_ Subinterpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_, which
provides the basis for an
`alternative concurrency model <Concurrency_>`_.
@ -51,6 +51,7 @@ At first only the following types will be supported for sharing:
* str
* int
* PEP 3118 buffer objects (via ``send_buffer()``)
* PEP 554 channels
Support for other basic types (e.g. bool, float, Ellipsis) will be added later.
@ -152,7 +153,7 @@ For sharing data between interpreters:
| | | receiving end of the channel and wait. |
| | | Associate the interpreter with the channel. |
+---------------------------+-------------------------------------------------+
| .send_nowait(obj) | | Like send(), but fail if not received. |
+---------------------------+-------------------------------------------------+
| .send_buffer(obj) | | Send the object's (PEP 3118) buffer to the |
| | | receiving end of the channel and wait. |
@ -242,6 +243,24 @@ Handling an exception
    except interpreters.RunFailedError as exc:
        print(f"got the error from the subinterpreter: {exc}")
Re-raising an exception
-----------------------
::

    interp = interpreters.create()
    try:
        try:
            interp.run(tw.dedent("""
                raise KeyError
                """))
        except interpreters.RunFailedError as exc:
            raise exc.__cause__
    except KeyError:
        print("got a KeyError from the subinterpreter")

Note that this pattern is a candidate for later improvement.
Synchronize using a channel
---------------------------
@ -494,8 +513,8 @@ each with different goals. Most center on correctness and usability.
One class of concurrency models focuses on isolated threads of
execution that interoperate through some message passing scheme. A
notable example is `Communicating Sequential Processes`_ (CSP) (upon
which Go's concurrency is roughly based). The isolation inherent to
subinterpreters makes them well-suited to this approach.
Shared data
@ -521,9 +540,9 @@ There are a number of valid solutions, several of which may be
appropriate to support in Python. This proposal provides a single basic
solution: "channels". Ultimately, any other solution will look similar
to the proposed one, which will set the precedent. Note that the
implementation of ``Interpreter.run()`` will be done in a way that
allows for multiple solutions to coexist, but doing so is not
technically a part of the proposal here.
Regarding the proposed solution, "channels", it is a basic, opt-in data
sharing mechanism that draws inspiration from pipes, queues, and CSP's
@ -534,7 +553,8 @@ channels have two operations: send and receive. A key characteristic
of those operations is that channels transmit data derived from Python
objects rather than the objects themselves. When objects are sent,
their data is extracted. When the "object" is received in the other
interpreter, the data is converted back into an object owned by that
interpreter.
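The "data, not objects" behavior described above can be illustrated with a toy, single-process sketch. The ``ToyChannel`` class here is purely hypothetical (it is not the proposed API): the "send" end extracts the raw data of a shareable object (just ``bytes`` here), and the "recv" end rebuilds a distinct object, owned by the receiving side, from that data.

```python
import collections

class ToyChannel:
    """Hypothetical stand-in (NOT the proposed API) illustrating that
    channels transmit data derived from objects, not the objects."""

    def __init__(self):
        self._data = collections.deque()

    def send(self, obj):
        # Only a shareable type (here just bytes) may be sent; its
        # raw data is extracted at the "send" end.
        if not isinstance(obj, bytes):
            raise ValueError(f"unshareable type: {type(obj).__name__}")
        self._data.append(bytearray(obj))   # copy out the raw data

    def recv(self):
        # The "recv" end rebuilds a new object, owned by the receiving
        # side, from the transmitted data.
        return bytes(self._data.popleft())

ch = ToyChannel()
orig = b"spam"
ch.send(orig)
got = ch.recv()
assert got == orig        # same data...
assert got is not orig    # ...but a distinct object
```

The key point the sketch demonstrates is that the receiving interpreter never holds a reference to the sender's object, only to a fresh object built from the transmitted data.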
To make this work, the mutable shared state will be managed by the
Python runtime, not by any of the interpreters. Initially we will
@ -552,6 +572,7 @@ channels to the following:
* str
* int
* PEP 3118 buffer objects (via ``send_buffer()``)
* channels
Limiting the initial shareable types is a practical matter, reducing
the potential complexity of the initial implementation. There are a
@ -589,11 +610,11 @@ Finally, some potential isolation is missing due to the current design
of CPython. Improvements are currently going on to address gaps in this
area:
* GC is not run per-interpreter [global-gc]_
* at-exit handlers are not run per-interpreter [global-atexit]_
* extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
* interpreters share memory management (e.g. allocators, gc)
* interpreters share the GIL
Existing Usage
--------------
@ -683,7 +704,7 @@ The module also provides the following class:
"channels" keyword argument is provided (and is a mapping of
attribute names to channels) then it is added to the interpreter's
execution namespace (the interpreter's "__main__" module). If any
of the values are not RecvChannel or SendChannel instances
then ValueError gets raised.
This may not be called on an already running interpreter. Doing
@ -737,9 +758,9 @@ interpreters, we create a surrogate of the exception and its traceback
(see ``traceback.TracebackException``), set it to ``__cause__`` on a
new ``RunFailedError``, and raise that.
Raising (a proxy of) the exception directly is problematic since it's
harder to distinguish between an error in the ``run()`` call and an
uncaught exception from the subinterpreter.
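The surrogate-and-wrap approach described above can be sketched with the stdlib ``traceback`` module. All names here (``run_and_wrap``, the ``snapshot`` attribute) are illustrative, not the actual implementation: the uncaught exception is captured as a ``traceback.TracebackException`` (which is safe to move between interpreters, since it holds no frames), and a new ``RunFailedError`` is raised instead.

```python
import traceback

class RunFailedError(RuntimeError):
    """Sketch of the proposed error raised when run() fails."""

def run_and_wrap(code):
    # Stand-in for Interpreter.run(): on an uncaught exception,
    # build a surrogate snapshot of it and raise RunFailedError
    # in the calling interpreter instead.
    try:
        exec(code, {"__name__": "__main__"})
    except Exception as exc:
        surrogate = traceback.TracebackException.from_exception(exc)
        err = RunFailedError("uncaught exception in subinterpreter")
        err.snapshot = surrogate  # illustrative attribute name
        raise err from exc

try:
    run_and_wrap("raise KeyError('spam')")
except RunFailedError as exc:
    report = "".join(exc.snapshot.format())
    assert "KeyError" in report
```

Because the caller catches ``RunFailedError`` rather than the original exception type, an error in the ``run()`` call itself remains distinguishable from a failure inside the subinterpreter.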
API for sharing data
@ -763,14 +784,15 @@ whether an object is shareable or not:
a cross-interpreter way, whether via a proxy, a copy, or some other
means.
This proposal provides two ways to share such objects between
interpreters.
First, channels may be passed to ``run()`` via the ``channels``
keyword argument, where they are effectively injected into the target
interpreter's ``__main__`` module. While passing arbitrary shareable
objects this way is possible, doing so is mainly intended for sharing
meta-objects (e.g. channels) between interpreters. It is less useful
to pass other objects (like ``bytes``) to ``run`` directly.
Second, the main mechanism for sharing objects (i.e. their data) between
interpreters is through channels. A channel is a simplex FIFO similar
@ -778,6 +800,9 @@ to a pipe. The main difference is that channels can be associated with
zero or more interpreters on either end. Unlike queues, which are also
many-to-many, channels have no buffer.
The ``interpreters`` module provides the following functions and
classes related to channels:
``create_channel()``::
Create a new channel and return (recv, send), the RecvChannel and
@ -802,24 +827,25 @@ many-to-many, channels have no buffer.
``RecvChannel(id)``::
The receiving end of a channel. An interpreter may use this to
receive objects from another interpreter. At first only a few of
the simple, immutable builtin types will be supported.
id:
The channel's unique ID. This is shared with the "send" end.
interpreters:
The list of associated interpreters: those that have called
the "recv()" or "__next__()" methods and haven't called
"release()" (and the channel hasn't been explicitly closed).
the "recv()" method and haven't called "release()" (and the
channel hasn't been explicitly closed).
recv():
Return the next object (i.e. the data from the sent object) from
the channel. If none have been sent then wait until the next
send. This associates the current interpreter with the "recv"
end of the channel.
If the channel is already closed then raise ChannelClosedError.
If the channel isn't closed but the current interpreter already
@ -848,7 +874,7 @@ many-to-many, channels have no buffer.
to 0, the channel is actually marked as closed. The Python
runtime will garbage collect all closed channels, though it may
not be immediately. Note that "release()" is automatically called
on behalf of the current interpreter when the channel is no longer
used (i.e. has no references) in that interpreter.
This operation is idempotent. Return True if "release()" has not
@ -857,21 +883,21 @@ many-to-many, channels have no buffer.
close(force=False):
Close both ends of the channel (in all interpreters). This means
that any further use of the channel anywhere raises
ChannelClosedError. If the channel is not empty then raise
ChannelNotEmptyError (if "force" is False) or discard the
remaining objects (if "force" is True) and close it.
``SendChannel(id)``::
The sending end of a channel. An interpreter may use this to send
objects to another interpreter. At first only a few of
the simple, immutable builtin types will be supported.
id:
The channel's unique ID. This is shared with the "recv" end.
interpreters:
@ -882,8 +908,9 @@ many-to-many, channels have no buffer.
Send the object (i.e. its data) to the receiving end of the
channel. Wait until the object is received. If the
object is not shareable then ValueError is raised. This
associates the current interpreter with the "send" end of the
channel.
If the channel is already closed then raise ChannelClosedError.
If the channel isn't closed but the current interpreter already
@ -892,9 +919,10 @@ many-to-many, channels have no buffer.
send_nowait(obj):
Send the object to the receiving end of the channel. If no
interpreter is currently receiving (waiting on the other end)
then raise NotReceivedError. Otherwise this is the same as
"send()".
send_buffer(obj):
@ -918,9 +946,9 @@ many-to-many, channels have no buffer.
Close both ends of the channel (in all interpreters). No matter
what the "send" end of the channel is immediately closed. If the
channel is empty then close the "recv" end immediately too.
Otherwise, if "force" is False, close the "recv" end (and hence
the full channel) once the channel becomes empty; or, if "force"
is True, discard the remaining items and close immediately.
Note that ``send_buffer()`` is similar to how
``multiprocessing.Connection`` works. [mp-conn]_
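The unbuffered rendezvous behavior specified above for ``send()`` and ``send_nowait()`` can be sketched with threads. ``ToyRendezvousChannel`` is a toy stand-in (not the proposed implementation): ``send()`` blocks until a receiver takes the item, while ``send_nowait()`` raises if no receiver is currently waiting.

```python
import threading

class ToyRendezvousChannel:
    """Toy unbuffered channel: send() blocks until a receiver takes
    the item; send_nowait() fails if no receiver is waiting."""

    class NotReceivedError(Exception):
        pass

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._has_item = False
        self._receivers_waiting = 0

    def recv(self):
        with self._cond:
            self._receivers_waiting += 1
            while not self._has_item:
                self._cond.wait()
            self._receivers_waiting -= 1
            self._has_item = False
            item = self._item
            self._cond.notify_all()   # wake the blocked sender
            return item

    def send(self, obj):
        with self._cond:
            while self._has_item:     # one in-flight item at a time
                self._cond.wait()
            self._item = obj
            self._has_item = True
            self._cond.notify_all()
            while self._has_item:     # wait until actually received
                self._cond.wait()

    def send_nowait(self, obj):
        with self._cond:
            if self._receivers_waiting == 0:
                raise self.NotReceivedError
            self._item = obj
            self._has_item = True
            self._cond.notify_all()

ch = ToyRendezvousChannel()
try:
    ch.send_nowait(b"spam")           # nobody is receiving yet
except ToyRendezvousChannel.NotReceivedError:
    pass

results = []
t = threading.Thread(target=lambda: results.append(ch.recv()))
t.start()
ch.send(b"spam")                      # blocks until the thread receives
t.join()
assert results == [b"spam"]
```

The "no buffer" property is what distinguishes channels from queues here: a ``send()`` does not complete until some receiver has taken the data.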
@ -929,53 +957,10 @@ Note that ``send_buffer()`` is similar to how
Open Questions
==============
* "force" argument to ``ch.release()``?
* add a "tp_share" type slot instead of using a global registry
for shareable types?
Deferred Functionality
======================
@ -1048,10 +1033,11 @@ Syntactic Support
The ``Go`` language provides a concurrency model based on CSP, so
it's similar to the concurrency model that subinterpreters support.
However, ``Go`` also provides syntactic support, as well as several
builtin concurrency primitives, to make concurrency a first-class
feature. Conceivably, similar syntactic (and builtin) support could be
added to Python using subinterpreters. However, that is *way* outside
the scope of this PEP!
Multiprocessing
---------------
@ -1072,19 +1058,21 @@ raise an ImportError if unsupported.
Alternately we could support opting in to subinterpreter support.
However, that would probably exclude many more modules (unnecessarily)
than the opt-out approach. Also, note that PEP 489 defined that an
extension's use of the PEP's machinery implies support for
subinterpreters.
The scope of adding the ModuleDef slot and fixing up the import
machinery is non-trivial, but could be worth it. It all depends on
how many extension modules break under subinterpreters. Given that
there are relatively few cases we know of through mod_wsgi, we can
leave this for later.
Poisoning channels
------------------
CSP has the concept of poisoning a channel. Once a channel has been
poisoned, any ``send()`` or ``recv()`` call on it would raise a special
exception, effectively ending execution in the interpreter that tried
to use the poisoned channel.
@ -1092,15 +1080,6 @@ This could be accomplished by adding a ``poison()`` method to both ends
of the channel. The ``close()`` method can be used in this way
(mostly), but these semantics are relatively specialized and can wait.
Resetting __main__
------------------
@ -1161,7 +1140,7 @@ Per Antoine Pitrou [async]_::
on (probably a file descriptor?).
A possible solution is to provide async implementations of the blocking
channel methods (``recv()`` and ``send()``). However,
the basic functionality of subinterpreters does not depend on async and
can be added later.
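One possible shape for such async support is simply to run the blocking channel operation in a thread-pool executor so the event loop stays responsive. This is a sketch under stated assumptions: ``recv_async`` is a hypothetical helper name, and a plain ``queue.Queue`` stands in for a channel's blocking ``recv()``.

```python
import asyncio
import queue

# Stand-in for a channel with a blocking recv(): here, a plain queue.
ch = queue.Queue()

async def recv_async(chan):
    # Run the blocking receive in the default thread-pool executor so
    # the event loop is not blocked while waiting.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, chan.get)

async def main():
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, ch.put, b"spam")   # simulate a sender
    return await recv_async(ch)

result = asyncio.run(main())
assert result == b"spam"
```

A real async API would presumably be built into the channel objects themselves, but the executor approach shows why async support can be layered on later without changing the basic blocking semantics.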
@ -1320,6 +1299,39 @@ Rejected possible solutions:
to do something similar
Implementation
==============
The implementation of the PEP has 4 parts:
* the high-level module described in this PEP (mostly a light wrapper
around a low-level C extension)
* the low-level C extension module
* additions to the ("private") C-API needed by the low-level module
* secondary fixes/changes in the CPython runtime that facilitate
the low-level module (among other benefits)
These are at various levels of completion, with more done the lower
you go:
* the high-level module has been, at best, roughly implemented.
However, fully implementing it will be almost trivial.
* the low-level module is mostly complete. The bulk of the
implementation was merged into master in December 2018 as the
"_xxsubinterpreters" module (for the sake of testing subinterpreter
functionality). Only 3 parts of the implementation remain:
"send_wait()", "send_buffer()", and exception propagation. All three
have been mostly finished, but were blocked by work related to ceval.
That blocker is basically resolved now and finishing the low-level
module will not require extensive work.
* all necessary C-API work has been finished
* all anticipated work in the runtime has been finished
The implementation effort for PEP 554 is being tracked as part of
a larger project aimed at improving multi-core support in CPython.
[multi-core-project]_
References
==========
@ -1389,6 +1401,9 @@ References
.. [pypy]
https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
.. [multi-core-project]
https://github.com/ericsnowcurrently/multi-core-python
Copyright
=========