PEP 554: updates for latest feedback (#1378)
parent edb30ef04b
commit 5b7d9991b7
pep-0554.rst (255)
@@ -18,8 +18,8 @@ CPython has supported multiple interpreters in the same process (AKA
 "subinterpreters") since version 1.5 (1997). The feature has been
 available via the C-API. [c-api]_ Subinterpreters operate in
 `relative isolation from one another <Interpreter Isolation_>`_, which
-provides the basis for an
-`alternative concurrency model <Concurrency_>`_.
+facilitates novel alternative approaches to
+`concurrency <Concurrency_>`_.
 
 This proposal introduces the stdlib ``interpreters`` module. The module
 will be `provisional <Provisional Status_>`_. It exposes the basic
@@ -27,10 +27,28 @@ functionality of subinterpreters already provided by the C-API, along
 with new (basic) functionality for sharing data between interpreters.
 
 
+A Disclaimer about the GIL
+==========================
+
+To avoid any confusion up front: This PEP is unrelated to any efforts
+to stop sharing the GIL between subinterpreters. At most this proposal
+will allow users to take advantage of any results of work on the GIL.
+The position here is that exposing subinterpreters to Python code is
+worth doing, even if they still share the GIL.
+
+
 Proposal
 ========
 
-The ``interpreters`` module will be added to the stdlib. It will
+The ``interpreters`` module will be added to the stdlib. To help
+authors of extension modules, a new page will be added to the
+`Extending Python <extension-docs_>`_ docs. More information on both
+is found in the immediately following sections.
+
+The "interpreters" Module
+-------------------------
+
+The ``interpreters`` module will
 provide a high-level interface to subinterpreters and wrap a new
 low-level ``_interpreters`` (in the same way as the ``threading``
 module). See the `Examples`_ section for concrete usage and use cases.
@@ -72,6 +90,8 @@ For creating and using interpreters:
 +----------------------------------+----------------------------------------------+
 | ``get_current() -> Interpreter`` | Get the currently running interpreter.       |
 +----------------------------------+----------------------------------------------+
+| ``get_main() -> Interpreter``    | Get the main interpreter.                    |
++----------------------------------+----------------------------------------------+
 | ``create() -> Interpreter``      | Initialize a new (idle) Python interpreter.  |
 +----------------------------------+----------------------------------------------+
 
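The table above only sketches the surface of the proposed module. As a rough, runnable illustration of how those functions might relate to each other, here is a toy stand-in; the registry, the `Interpreter` class body, and the "current interpreter is always main" shortcut are all assumptions made for illustration, not the real proposed implementation:

```python
import itertools

class Interpreter:
    """Toy stand-in for the proposed interpreters.Interpreter type."""
    _ids = itertools.count()

    def __init__(self):
        self.id = next(Interpreter._ids)

# A registry of live interpreters; the real module would track actual
# CPython interpreter states, not plain Python objects like these.
_registry = {}

def _new_interpreter():
    interp = Interpreter()
    _registry[interp.id] = interp
    return interp

_main = _new_interpreter()   # the main interpreter always exists

def create():
    """Initialize a new (idle) interpreter, per the proposed API."""
    return _new_interpreter()

def get_main():
    """Get the main interpreter."""
    return _main

def get_current():
    """Get the currently running interpreter (always main in this toy)."""
    return _main

def list_all():
    """Return all existing interpreters."""
    return list(_registry.values())
```

In the real proposal, `get_current()` would depend on which interpreter executes the call; the toy model has only one thread of execution, so it always answers with main.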
@@ -188,6 +208,17 @@ For sharing data between interpreters:
 | ``ChannelReleasedError`` | ``ChannelClosedError`` | The channel is released (but not yet closed).  |
 +--------------------------+------------------------+------------------------------------------------+
 
+"Extending Python" Docs
+-----------------------
+
+Many extension modules do not support use in subinterpreters. The
+authors and users of such extension modules will both benefit when they
+are updated to support subinterpreters. To help with that, a new page
+will be added to the `Extending Python <extension-docs_>`_ docs.
+
+This page will explain how to implement PEP 489 support and how to move
+from global module state to per-interpreter.
+
 
 Examples
 ========
@@ -482,9 +513,9 @@ In the `Interpreter Isolation`_ section below we identify ways in
 which isolation in CPython's subinterpreters is incomplete. Most
 notable is extension modules that use C globals to store internal
 state. PEP 3121 and PEP 489 provide a solution for most of the
-problem, but one still remains. [petr-c-ext]_ Until that is resolved,
-C extension authors will face extra difficulty to support
-subinterpreters.
+problem, but one still remains. [petr-c-ext]_ Until that is resolved
+(see PEP 573), C extension authors will face extra difficulty
+to support subinterpreters.
 
 Consequently, projects that publish extension modules may face an
 increased maintenance burden as their users start using subinterpreters,
@@ -501,6 +532,38 @@ is to offset those costs. The position of this PEP is that the actual
 extra maintenance burden will be small and well below the threshold at
 which subinterpreters are worth it.
 
+* "creating a new concurrency API deserves much more thought and
+  experimentation, so the new module shouldn't go into the stdlib
+  right away, if ever"
+
+  Introducing an API for a new concurrency model, like happened with
+  asyncio, is an extremely large project that requires a lot of careful
+  consideration. It is not something that can be done as simply as this
+  PEP proposes and likely deserves significant time on PyPI to mature.
+  (See `Nathaniel's post <nathaniel-asyncio_>`_ on python-dev.)
+
+  However, this PEP does not propose any new concurrency API. At most
+  it exposes minimal tools (e.g. subinterpreters, channels) which may
+  be used to write code that follows patterns associated with (relatively)
+  new-to-Python `concurrency models <Concurrency_>`_. Those tools could
+  also be used as the basis for APIs for such concurrency models.
+  Again, this PEP does not propose any such API.
+
+* "there is no point to exposing subinterpreters if they still share
+  the GIL"
+* "the effort to make the GIL per-interpreter is disruptive and risky"
+
+  A common misconception is that this PEP also includes a promise that
+  subinterpreters will no longer share the GIL. When that is clarified,
+  the next question is "what is the point?". This is already answered
+  at length in this PEP. Just to be clear, the value lies in:
+
+  * increased exposure of the existing feature, which helps improve
+    the code health of the entire CPython runtime
+  * exposure of the (mostly) isolated execution of subinterpreters
+  * preparation for per-interpreter GIL
+  * encouragement of experimentation
 
 
 About Subinterpreters
 =====================
@@ -673,6 +736,10 @@ The module provides the following functions::
 
       Return the currently running interpreter.
 
+   get_main() => Interpreter
+
+      Return the main interpreter.
+
    create() -> Interpreter
 
       Initialize a new Python interpreter and return it. The
@@ -807,8 +874,17 @@ to pass other objects (like ``bytes``) to ``run`` directly.
 Second, the main mechanism for sharing objects (i.e. their data) between
 interpreters is through channels. A channel is a simplex FIFO similar
 to a pipe. The main difference is that channels can be associated with
-zero or more interpreters on either end. Unlike queues, which are also
-many-to-many, channels have no buffer.
+zero or more interpreters on either end. Like queues, which are also
+many-to-many, channels are buffered (though they also offer methods
+with unbuffered semantics).
+
+Python objects are not shared between interpreters. However, in some
+cases the data those objects wrap is actually shared and not just copied.
+One example is PEP 3118 buffers. In those cases the object in the
+original interpreter is kept alive until the shared data in the other
+interpreter is no longer used. Then object destruction can happen like
+normal in the original interpreter, along with the previously shared
+data.
 
 The ``interpreters`` module provides the following functions related
 to channels::
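The semantics just described (a one-way buffered FIFO whose blocking `send()` waits until the object is received, alongside unbuffered-style convenience methods) can be sketched with today's `threading` and `queue` primitives. `SimplexChannel` and its method bodies are illustrative stand-ins under those assumptions, not the proposed cross-interpreter implementation:

```python
import queue
import threading

class SimplexChannel:
    """Sketch: a one-way buffered FIFO where send() blocks until the
    object is received, while send_nowait() just queues the object."""

    def __init__(self):
        self._items = queue.Queue()

    def send(self, data):
        received = threading.Event()
        self._items.put((data, received))
        received.wait()               # block until recv() takes it

    def send_nowait(self, data):
        done = threading.Event()
        done.set()                    # nothing to wait for
        self._items.put((data, done))

    def recv(self):
        data, received = self._items.get()   # wait for the next send
        received.set()                       # unblock a blocked sender
        return data
```

A `send()` blocked in one thread is released the moment another thread's `recv()` takes the item, mirroring the "wait until the object is received" rule; `send_nowait()` queues and returns immediately, which is the buffered behavior described above.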
@@ -817,13 +893,8 @@ to channels::
 
      Create a new channel and return (recv, send), the RecvChannel
      and SendChannel corresponding to the ends of the channel. The
-     channel is not closed and destroyed (i.e. garbage-collected)
-     until the number of associated interpreters returns to 0
-     (including when the channel is explicitly closed).
-
-     An interpreter gets associated with a channel by calling its
-     "send()" or "recv()" method. That association gets dropped
-     by calling "release()" on the channel.
+     lifetime of the channel is determined by associations between
+     interpreters and the channel's ends (see below).
 
      Both ends of the channel are supported "shared" objects (i.e.
      may be safely shared by different interpreters). Thus they
@@ -848,18 +919,15 @@ The module also provides the following channel-related classes::
    interpreters => [Interpreter]:
 
       The list of interpreters associated with the "recv" end of
-      the channel. That means those that have called the "recv()"
-      (or "recv_nowait()") method, still hold a reference to the
-      channel end, and haven't called "release()". If the
-      channel has been closed then raise
-      ChannelClosedError.
+      the channel. (See below for more on how interpreters are
+      associated with channels.) If the channel has been closed
+      then raise ChannelClosedError.
 
    recv():
 
       Return the next object (i.e. the data from the sent object)
       from the channel. If none have been sent then wait until
-      the next send. This associates the current interpreter
-      with the "recv" end of the channel.
+      the next send.
 
       If the channel is already closed then raise ChannelClosedError.
       If the channel isn't closed but the current interpreter already
@@ -876,26 +944,18 @@ The module also provides the following channel-related classes::
    release() -> bool:
 
-      No longer associate the current interpreter with the channel
-      (on the "recv" end) and block any future association (via the
-      "recv()" or ``recv_nowait()`` methods). If the interpreter
-      was never associated with the channel then still block any
-      future association. The "send" end of the channel is
-      unaffected by a released "recv" end.
+      No longer associate the current interpreter with the channel
+      (on the "recv" end) and block any future association. If the
+      interpreter was never associated with the channel then still
+      block any future association. The "send" end of the channel
+      is unaffected by a released "recv" end.
 
       Once an interpreter is no longer associated with the "recv"
       end of the channel, any "recv()" and "recv_nowait()" calls
       from that interpreter will fail (even ongoing calls). See
       "recv()" for details.
 
-      Once the number of associated interpreters on both ends drops
-      to 0, the channel is actually marked as closed. The Python
-      runtime will garbage collect all closed channels, though it
-      may not happen immediately.
-
-      Note that the interpreter automatically loses its association
-      with the channel end when it is no longer used (i.e. has no
-      references) in that interpreter, as though "release()"
-      were called.
+      See below for more on how association relates to auto-closing
+      a channel.
 
       This operation is idempotent. Return True if "release()"
       has not been called before by the current interpreter.
@@ -929,11 +989,9 @@ The module also provides the following channel-related classes::
 
       Send the object (i.e. its data) to the "recv" end of the
       channel. Wait until the object is received. If the object
-      is not shareable then ValueError is raised. This associates
-      the current interpreter with the "send" end of the channel.
+      is not shareable then ValueError is raised.
 
-      This associates the current interpreter with the "send" end
-      of the channel. If the channel send was already released
+      If this channel end was already released
       by the interpreter then raise ChannelReleasedError. If
       the channel is already closed then raise
       ChannelClosedError.
@@ -943,13 +1001,16 @@ The module also provides the following channel-related classes::
 
       Send the object to the "recv" end of the channel. This
       behaves the same as "send()", except for the waiting part.
       If no interpreter is currently receiving (waiting on the
-      other end) then return False. Otherwise return True.
+      other end) then queue the object and return False. Otherwise
+      return True.
 
    send_buffer(obj):
 
       Send a MemoryView of the object rather than the object.
       Otherwise this is the same as "send()". Note that the
      object must implement the PEP 3118 buffer protocol.
+      The buffer will always be released in the original
+      interpreter, like normal.
 
    send_buffer_nowait(obj):
 
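``send_buffer()`` leans on the PEP 3118 buffer protocol, which already lets two objects share one block of memory without copying; a minimal single-interpreter illustration of that sharing (ordinary stdlib behavior, not the proposed cross-interpreter machinery):

```python
# A bytearray exposes its memory via the PEP 3118 buffer protocol.
buf = bytearray(b"hello")

# memoryview wraps the same memory; no bytes are copied.
view = memoryview(buf)
view[0] = ord("H")

# The mutation made through the view is visible through the original
# object -- this is what sharing (rather than copying) the data means.
assert bytes(buf) == b"Hello"

# The exporting object is kept alive while any view exists; releasing
# the view lets normal destruction proceed, as the text describes.
view.release()
```

The cross-interpreter case adds one wrinkle: the exporting object lives in the sending interpreter, so its destruction must be deferred until the receiving interpreter drops the shared data, per the paragraph above.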
@@ -977,12 +1038,81 @@ Note that ``send_buffer()`` is similar to how
 ``multiprocessing.Connection`` works. [mp-conn]_
 
 
+Channel Association
+-------------------
+
+Each end (send/recv) of each channel is associated with a set of
+interpreters. This association effectively means "the channel end
+is available to that interpreter". It has ramifications on
+introspection and on how channels are automatically closed.
+
+When a channel is created, both ends are immediately associated with
+the current interpreter. When a channel end is passed to an interpreter
+via ``Interpreter.run(..., channels=...)`` then that interpreter is
+associated with the channel end. Likewise when a channel end is sent
+through another channel, the receiving interpreter is associated with
+the sent channel end.
+
+A channel end is explicitly released by an interpreter through the
+``release()`` method. It is also done automatically for an interpreter
+when the last ``*Channel`` object for the end in that interpreter is
+garbage-collected, as though ``release()`` were called.
+
+Calling ``*Channel.close()`` automatically releases the channel in all
+interpreters for both ends.
+
+Once the number of associated interpreters on both ends drops
+to 0, the channel is actually closed. The Python runtime will
+garbage-collect all closed channels, though it may not happen
+immediately.
+
+Consequently, ``*Channel.interpreters`` means those interpreters to
+which the channel end was sent, that still hold a reference to the
+channel end, and that haven't called ``release()``.
+
+
 Open Questions
 ==============
 
 * add a "tp_share" type slot instead of using a global registry
   for shareable types?
 
 * impact of data sharing on cache performance in multi-core scenarios?
   (see [cache-line-ping-pong]_)
 
+* strictly disallow subinterpreter import of extension modules without
+  PEP 489 support?
+
+* add "isolated" mode to subinterpreters API?
+
+  An "isolated" mode for subinterpreters would mean an interpreter in
+  that mode is especially restricted. It might include any of the
+  following:
+
+  * ImportError when importing ext. module without PEP 489 support
+  * no daemon threads
+  * no threads at all
+  * no multiprocessing
+
+  For now the default would be ``False``, but it would become ``True``
+  later.
+
+* add a shareable synchronization primitive?
+
+  This would be ``_threading.Lock`` (or something like it) where
+  interpreters would actually share the underlying mutex. This would
+  provide much better efficiency than blocking channel ops. The main
+  concern is that locks and channels don't mix well (as learned in Go).
+
+* add readiness callback support to channels?
+
+  This is an alternative to channel buffering. It is probably
+  unnecessary, but may have enough advantages to consider it for the
+  high-level API. It may also be better suited only to the low-level
+  implementation.
+
+* also track which interpreters are using a channel end?
+
 
 Deferred Functionality
 ======================
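The association rules in the "Channel Association" hunk amount to simple per-end set bookkeeping with auto-close once both sets empty out. A sketch of that logic; `ChannelState` and its method names are invented here for illustration:

```python
class ChannelState:
    """Sketch of the association bookkeeping described above: each end
    tracks the ids of associated interpreters, and the channel closes
    once no interpreter is associated with either end."""

    def __init__(self, creator_id):
        # Both ends start out associated with the creating interpreter.
        self.send_assoc = {creator_id}
        self.recv_assoc = {creator_id}
        self.closed = False

    def _end(self, end):
        return self.send_assoc if end == "send" else self.recv_assoc

    def associate(self, end, interp_id):
        """E.g. the end was passed via run(..., channels=...)."""
        self._end(end).add(interp_id)

    def release(self, end, interp_id):
        """Explicit release(), or the last reference was collected."""
        self._end(end).discard(interp_id)
        if not self.send_assoc and not self.recv_assoc:
            self.closed = True   # auto-close: no associations remain
```

This also illustrates why the creating interpreter matters: since both ends start associated with it, a channel cannot be auto-closed out from under its creator before the creator releases its own ends.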
@@ -1211,7 +1341,7 @@ Pipes and Queues
 With the proposed object passing mechanism of "channels", other similar
 basic types aren't required to achieve the minimal useful functionality
 of subinterpreters. Such types include pipes (like channels, but
-one-to-one) and queues (like channels, but buffered). See below in
+one-to-one) and queues (like channels, but more generic). See below in
 `Rejected Ideas` for more information.
 
 Even though these types aren't part of this proposal, they may still
@@ -1234,9 +1364,12 @@ when the object gets received on the other end. One way to work around
 this is to return a locked ``threading.Lock`` from ``SendChannel.send()``
 that unlocks once the object is received.
 
-This matters for buffered channels (i.e. queues). For unbuffered
-channels it is a non-issue. So this can be dealt with once channels
-support buffering.
+Alternately, the proposed ``SendChannel.send()`` (blocking) and
+``SendChannel.send_nowait()`` provide an explicit distinction that is
+less likely to confuse users.
+
+Note that returning a lock would matter for buffered channels
+(i.e. queues). For unbuffered channels it is a non-issue.
 
 Add a "reraise" method to RunFailedError
 ----------------------------------------
|
|||
Use queues instead of channels
|
||||
------------------------------
|
||||
|
||||
The main difference between queues and channels is that queues support
|
||||
buffering. This would complicate the blocking semantics of ``recv()``
|
||||
and ``send()``. Also, queues can be built on top of channels.
|
||||
Queues and buffered channels are almost the same thing. The main
|
||||
difference is that channels has a stronger relationship with context
|
||||
(i.e. the associated interpreter).
|
||||
|
||||
The name "Channel" was used instead of "Queue" to avoid confusion with
|
||||
the stdlib ``queue`` module.
|
||||
|
||||
Note that buffering in channels does complicate the blocking semantics
|
||||
of ``recv()`` and ``send()``. Also, queues can be built on top of
|
||||
unbuffered channels.
|
||||
|
||||
"enumerate"
|
||||
-----------
|
||||
|
@@ -1392,6 +1532,16 @@ require extra runtime modifications. It would also make the module's
 implementation overly complicated. Finally, it might not even make
 the module easier to understand.
 
+Only associate interpreters upon use
+------------------------------------
+
+Associate interpreters with channel ends only once ``recv()``,
+``send()``, etc. are called.
+
+Doing this is potentially confusing and can also lead to unexpected
+races where a channel is auto-closed before it can be used in the
+original (creating) interpreter.
+
 
 Implementation
 ==============
@@ -1495,6 +1645,15 @@ References
 .. [multi-core-project]
    https://github.com/ericsnowcurrently/multi-core-python
 
+.. [cache-line-ping-pong]
+   https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/
+
+.. _nathaniel-asyncio:
+   https://mail.python.org/archives/list/python-dev@python.org/message/TUEAZNZHVJGGLL4OFD32OW6JJDKM6FAS/
+
+.. _extension-docs:
+   https://docs.python.org/3/extending/index.html
+
 
 Copyright
 =========