PEP 554: updates for latest feedback (#1378)

Eric Snow 2020-04-21 10:47:03 -06:00 committed by GitHub
parent edb30ef04b
commit 5b7d9991b7
1 changed files with 207 additions and 48 deletions


@ -18,8 +18,8 @@ CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997). The feature has been
available via the C-API. [c-api]_ Subinterpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_, which
facilitates novel alternative approaches to
`concurrency <Concurrency_>`_.
This proposal introduces the stdlib ``interpreters`` module. The module
will be `provisional <Provisional Status_>`_. It exposes the basic
@ -27,10 +27,28 @@ functionality of subinterpreters already provided by the C-API, along
with new (basic) functionality for sharing data between interpreters.
A Disclaimer about the GIL
==========================
To avoid any confusion up front: This PEP is unrelated to any efforts
to stop sharing the GIL between subinterpreters. At most this proposal
will allow users to take advantage of any results of work on the GIL.
The position here is that exposing subinterpreters to Python code is
worth doing, even if they still share the GIL.
Proposal
========
The ``interpreters`` module will be added to the stdlib. To help
authors of extension modules, a new page will be added to the
`Extending Python <extension-docs_>`_ docs. More information on both
is found in the immediately following sections.
The "interpreters" Module
-------------------------
The ``interpreters`` module will
provide a high-level interface to subinterpreters and wrap a new
low-level ``_interpreters`` module (in the same way as the ``threading``
module). See the `Examples`_ section for concrete usage and use cases.
@ -72,6 +90,8 @@ For creating and using interpreters:
+----------------------------------+----------------------------------------------+
| ``get_current() -> Interpreter`` | Get the currently running interpreter. |
+----------------------------------+----------------------------------------------+
| ``get_main() -> Interpreter`` | Get the main interpreter. |
+----------------------------------+----------------------------------------------+
| ``create() -> Interpreter`` | Initialize a new (idle) Python interpreter. |
+----------------------------------+----------------------------------------------+
@ -188,6 +208,17 @@ For sharing data between interpreters:
| ``ChannelReleasedError`` | ``ChannelClosedError`` | The channel is released (but not yet closed). |
+--------------------------+------------------------+------------------------------------------------+
"Extending Python" Docs
-----------------------
Many extension modules do not support use in subinterpreters. The
authors and users of such extension modules will both benefit when they
are updated to support subinterpreters. To help with that, a new page
will be added to the `Extending Python <extension-docs_>`_ docs.
This page will explain how to implement PEP 489 support and how to move
from global module state to per-interpreter state.
Examples
========
@ -482,9 +513,9 @@ In the `Interpreter Isolation`_ section below we identify ways in
which isolation in CPython's subinterpreters is incomplete. Most
notable is extension modules that use C globals to store internal
state. PEP 3121 and PEP 489 provide a solution for most of the
problem, but one still remains. [petr-c-ext]_ Until that is resolved
(see PEP 573), C extension authors will face extra difficulty
to support subinterpreters.
Consequently, projects that publish extension modules may face an
increased maintenance burden as their users start using subinterpreters,
@ -501,6 +532,38 @@ is to offset those costs. The position of this PEP is that the actual
extra maintenance burden will be small and well below the threshold at
which subinterpreters are worth it.
* "creating a new concurrency API deserves much more thought and
experimentation, so the new module shouldn't go into the stdlib
right away, if ever"
Introducing an API for a new concurrency model, as happened with
asyncio, is an extremely large project that requires a lot of careful
consideration. It is not something that can be done as simply as this
PEP proposes and likely deserves significant time on PyPI to mature.
(See Nathaniel's post on python-dev. [nathaniel-asyncio]_)
However, this PEP does not propose any new concurrency API. At most
it exposes minimal tools (e.g. subinterpreters, channels) which may
be used to write code that follows patterns associated with (relatively)
new-to-Python `concurrency models <Concurrency_>`_. Those tools could
also be used as the basis for APIs for such concurrency models.
Again, this PEP does not propose any such API.
* "there is no point to exposing subinterpreters if they still share
the GIL"
* "the effort to make the GIL per-interpreter is disruptive and risky"
A common misconception is that this PEP also includes a promise that
subinterpreters will no longer share the GIL. When that is clarified,
the next question is "what is the point?". This is already answered
at length in this PEP. Just to be clear, the value lies in:

* increased exposure of the existing feature, which helps improve
  the code health of the entire CPython runtime
* exposing the (mostly) isolated execution of subinterpreters
* preparation for a per-interpreter GIL
* encouraging experimentation
About Subinterpreters
=====================
@ -673,6 +736,10 @@ The module provides the following functions::
Return the currently running interpreter.
get_main() -> Interpreter
Return the main interpreter.
create() -> Interpreter
Initialize a new Python interpreter and return it. The
@ -807,8 +874,17 @@ to pass other objects (like ``bytes``) to ``run`` directly.
Second, the main mechanism for sharing objects (i.e. their data) between
interpreters is through channels. A channel is a simplex FIFO similar
to a pipe. The main difference is that channels can be associated with
zero or more interpreters on either end. Like queues, which are also
many-to-many, channels are buffered (though they also offer methods
with unbuffered semantics).
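The channel API above is proposal-stage, so it cannot be demonstrated directly. As a rough, runnable analogy (with threads standing in for interpreters, and an ad-hoc ``None`` sentinel that is not part of the proposal), a ``queue.Queue`` already provides the buffered, many-to-many FIFO semantics described:

```python
import queue
import threading

# A queue.Queue as a stand-in for a proposed channel: a buffered FIFO
# that many senders and many receivers may share.  (Real channels would
# carry data between interpreters, not threads, and come as a distinct
# RecvChannel/SendChannel pair.)
ch = queue.Queue()
results = []

def worker():
    while True:
        obj = ch.get()            # analogous to RecvChannel.recv()
        if obj is None:           # ad-hoc sentinel, not part of the proposal
            break
        results.append(obj * 2)

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    ch.put(i)                     # analogous to SendChannel.send_nowait():
ch.put(None)                      #   buffered, no waiting for a receiver
t.join()
print(results)                    # [0, 2, 4]
```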
Python objects are not shared between interpreters. However, in some
cases the data those objects wrap is actually shared and not just copied.
One example is PEP 3118 buffers. In those cases the object in the
original interpreter is kept alive until the shared data in the other
interpreter is no longer used. Then object destruction can happen like
normal in the original interpreter, along with the previously shared
data.
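Within a single interpreter, the same share-not-copy behavior can be seen with a plain ``memoryview`` (a PEP 3118 consumer); this is only an analogy for the cross-interpreter case above, but it shows both the sharing and the keep-alive rule:

```python
# A memoryview shares the wrapped data rather than copying it, and it
# keeps the exporting object's buffer pinned until the view is released.
buf = bytearray(b"spam")
view = memoryview(buf)

buf[0:1] = b"S"                   # mutate the original object...
assert view.tobytes() == b"Spam"  # ...and the shared view sees the change

try:
    buf.append(0)                 # resizing while exported is refused
except BufferError:
    pass

view.release()                    # drop the share...
buf.append(0)                     # ...and the exporter is free again
```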
The ``interpreters`` module provides the following functions related
to channels::
@ -817,13 +893,8 @@ to channels::
Create a new channel and return (recv, send), the RecvChannel
and SendChannel corresponding to the ends of the channel. The
lifetime of the channel is determined by associations between
interpreters and the channel's ends (see below).
Both ends of the channel are supported "shared" objects (i.e.
may be safely shared by different interpreters). Thus they
@ -848,18 +919,15 @@ The module also provides the following channel-related classes::
interpreters => [Interpreter]:
The list of interpreters associated with the "recv" end of
the channel. (See below for more on how interpreters are
associated with channels.) If the channel has been closed
then raise ChannelClosedError.
recv():
Return the next object (i.e. the data from the sent object)
from the channel. If none have been sent then wait until
the next send.
If the channel is already closed then raise ChannelClosedError.
If the channel isn't closed but the current interpreter already
@ -876,26 +944,18 @@ The module also provides the following channel-related classes::
release() -> bool:
No longer associate the current interpreter with the channel
(on the "recv" end) and block any future association. If the
interpreter was never associated with the channel then still
block any future association. The "send" end of the channel
is unaffected by a released "recv" end.
Once an interpreter is no longer associated with the "recv"
end of the channel, any "recv()" and "recv_nowait()" calls
from that interpreter will fail (even ongoing calls). See
"recv()" for details.
See below for more on how association relates to auto-closing
a channel.
This operation is idempotent. Return True if "release()"
has not been called before by the current interpreter.
@ -929,11 +989,9 @@ The module also provides the following channel-related classes::
Send the object (i.e. its data) to the "recv" end of the
channel. Wait until the object is received. If the object
is not shareable then ValueError is raised.
If this channel end was already released
by the interpreter then raise ChannelReleasedError. If
the channel is already closed then raise
ChannelClosedError.
@ -943,13 +1001,16 @@ The module also provides the following channel-related classes::
Send the object to the "recv" end of the channel. This
behaves the same as "send()", except for the waiting part.
If no interpreter is currently receiving (waiting on the
other end) then queue the object and return False. Otherwise
return True.
send_buffer(obj):
Send a MemoryView of the object rather than the object.
Otherwise this is the same as "send()". Note that the
object must implement the PEP 3118 buffer protocol.
The buffer will always be released in the original
interpreter, like normal.
send_buffer_nowait(obj):
@ -977,12 +1038,81 @@ Note that ``send_buffer()`` is similar to how
``multiprocessing.Connection`` works. [mp-conn]_
Channel Association
-------------------
Each end (send/recv) of each channel is associated with a set of
interpreters. This association effectively means "the channel end
is available to that interpreter". It has ramifications on
introspection and on how channels are automatically closed.
When a channel is created, both ends are immediately associated with
the current interpreter. When a channel end is passed to an interpreter
via ``Interpreter.run(..., channels=...)`` then that interpreter is
associated with the channel end. Likewise when a channel end is sent
through another channel, the receiving interpreter is associated with
the sent channel end.
A channel end is explicitly released by an interpreter through the
``release()`` method. It is also done automatically for an interpreter
when the last ``*Channel`` object for the end in that interpreter is
garbage-collected, as though ``release()`` were called.
Calling ``*Channel.close()`` automatically releases the channel in all
interpreters for both ends.
Once the number of associated interpreters on both ends drops
to 0, the channel is actually closed. The Python runtime will
garbage-collect all closed channels, though it may not happen
immediately.
Consequently, ``*Channel.interpreters`` lists the interpreters to
which the channel end was sent, that still hold a reference to the
channel end, and that haven't called ``release()``.
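The association rules above can be modeled in a few lines. This is a toy sketch with invented names, not the proposed implementation: each end tracks a set of associated interpreters, and the channel closes once both sets are empty.

```python
class ChannelState:
    """Toy model of channel-end association (illustrative only)."""

    def __init__(self, creator):
        # Creating a channel associates both ends with the creator.
        self.send_end = {creator}
        self.recv_end = {creator}
        self.closed = False

    def associate(self, end, interp):
        # e.g. the end was passed via Interpreter.run(..., channels=...)
        # or received through another channel.
        getattr(self, end).add(interp)

    def release(self, end, interp):
        # Explicit release(), or the last *Channel object for this end
        # in that interpreter being garbage-collected.
        getattr(self, end).discard(interp)
        if not self.send_end and not self.recv_end:
            self.closed = True    # now eligible for garbage collection

ch = ChannelState("main")
ch.associate("recv_end", "worker")
ch.release("send_end", "main")
ch.release("recv_end", "main")
assert not ch.closed              # "worker" still holds the recv end
ch.release("recv_end", "worker")
assert ch.closed
```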
Open Questions
==============
* add a "tp_share" type slot instead of using a global registry
for shareable types?
* impact of data sharing on cache performance in multi-core scenarios?
(see [cache-line-ping-pong]_)
* strictly disallow subinterpreter import of extension modules without
PEP 489 support?
* add "isolated" mode to subinterpreters API?
An "isolated" mode for subinterpreters would mean an interpreter in
that mode is especially restricted. It might include any of the
following:
* ImportError when importing ext. module without PEP 489 support
* no daemon threads
* no threads at all
* no multiprocessing
For now the default would be ``False``, but it would become ``True``
later.
* add a shareable synchronization primitive?
This would be ``_threading.Lock`` (or something like it) where
interpreters would actually share the underlying mutex. This would
provide much better efficiency than blocking channel ops. The main
concern is that locks and channels don't mix well (as learned in Go).
* add readiness callback support to channels?
This is an alternative to channel buffering. It is probably
unnecessary, but may have enough advantages to consider it for the
high-level API. It may also be better only for the low-level
implementation.
* also track which interpreters are using a channel end?
Deferred Functionality
======================
@ -1211,7 +1341,7 @@ Pipes and Queues
With the proposed object-passing mechanism of "channels", other similar
basic types aren't required to achieve the minimal useful functionality
of subinterpreters. Such types include pipes (like channels, but
one-to-one) and queues (like channels, but more generic). See below in
`Rejected Ideas` for more information.
Even though these types aren't part of this proposal, they may still
@ -1234,9 +1364,12 @@ when the object gets received on the other end. One way to work around
this is to return a locked ``threading.Lock`` from ``SendChannel.send()``
that unlocks once the object is received.
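That workaround can be sketched with real ``threading`` primitives. In this illustration (threads standing in for interpreters, a plain list standing in for the channel), ``send()`` hands back an already-acquired lock and the receiver releases it on receipt, so the sender can block until the object has actually arrived:

```python
import threading

channel = []  # stand-in for the channel's buffer
log = []

def send(obj):
    received = threading.Lock()
    received.acquire()            # stays locked until the object is received
    channel.append((obj, received))
    return received

def recv():
    obj, received = channel.pop(0)
    received.release()            # signal "received" back to the sender
    return obj

pending = send("payload")         # queue the object; the lock is still held
t = threading.Thread(target=lambda: log.append(recv()))
t.start()
pending.acquire()                 # blocks until recv() releases the lock
t.join()
assert log == ["payload"]
```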
Alternately, the proposed ``SendChannel.send()`` (blocking) and
``SendChannel.send_nowait()`` provide an explicit distinction that is
less likely to confuse users.
Note that returning a lock would matter for buffered channels
(i.e. queues). For unbuffered channels it is a non-issue.
Add a "reraise" method to RunFailedError
----------------------------------------
@ -1306,9 +1439,16 @@ ends up being slightly more complicated, requiring naming the pipes.
Use queues instead of channels
------------------------------
Queues and buffered channels are almost the same thing. The main
difference is that channels have a stronger relationship with context
(i.e. the associated interpreter).
The name "Channel" was used instead of "Queue" to avoid confusion with
the stdlib ``queue`` module.
Note that buffering in channels does complicate the blocking semantics
of ``recv()`` and ``send()``. Also, queues can be built on top of
unbuffered channels.
"enumerate"
-----------
@ -1392,6 +1532,16 @@ require extra runtime modifications. It would also make the module's
implementation overly complicated. Finally, it might not even make
the module easier to understand.
Only associate interpreters upon use
------------------------------------
Associate interpreters with channel ends only once ``recv()``,
``send()``, etc. are called.
Doing this is potentially confusing and also can lead to unexpected
races where a channel is auto-closed before it can be used in the
original (creating) interpreter.
Implementation
==============
@ -1495,6 +1645,15 @@ References
.. [multi-core-project]
https://github.com/ericsnowcurrently/multi-core-python
.. [cache-line-ping-pong]
https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/
.. [nathaniel-asyncio]
https://mail.python.org/archives/list/python-dev@python.org/message/TUEAZNZHVJGGLL4OFD32OW6JJDKM6FAS/
.. _extension-docs: https://docs.python.org/3/extending/index.html
Copyright
=========