PEP 554: updates for latest feedback (#1378)

Eric Snow 2020-04-21 10:47:03 -06:00 committed by GitHub
parent edb30ef04b
commit 5b7d9991b7
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 207 additions and 48 deletions


@@ -18,8 +18,8 @@ CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997). The feature has been
available via the C-API. [c-api]_ Subinterpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_, which
facilitates novel alternative approaches to
`concurrency <Concurrency_>`_.
This proposal introduces the stdlib ``interpreters`` module. The module
will be `provisional <Provisional Status_>`_. It exposes the basic
@@ -27,10 +27,28 @@ functionality of subinterpreters already provided by the C-API, along
with new (basic) functionality for sharing data between interpreters.
A Disclaimer about the GIL
==========================
To avoid any confusion up front: This PEP is unrelated to any efforts
to stop sharing the GIL between subinterpreters. At most this proposal
will allow users to take advantage of any results of work on the GIL.
The position here is that exposing subinterpreters to Python code is
worth doing, even if they still share the GIL.
Proposal
========
The ``interpreters`` module will be added to the stdlib. To help
authors of extension modules, a new page will be added to the
`Extending Python <extension-docs_>`_ docs. More information on both
is found in the immediately following sections.
The "interpreters" Module
-------------------------
The ``interpreters`` module will
provide a high-level interface to subinterpreters and wrap a new
low-level ``_interpreters`` (in the same way as the ``threading``
module). See the `Examples`_ section for concrete usage and use cases.
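For orientation, here is a minimal sketch of what using the proposed
high-level API could look like, assuming the module lands under the
name ``interpreters`` with the functions and methods listed in this
PEP (including the newly added ``get_main()``)::

   import interpreters

   interp = interpreters.create()        # a new, idle subinterpreter
   print(interp.is_running())            # False until run() is called
   interp.run("print('spam')")           # execute code in that interpreter

   main = interpreters.get_main()        # the main interpreter
   current = interpreters.get_current()  # the interpreter running this code
   print(current.id == main.id)          # True when run in the main interpreter
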
@@ -72,6 +90,8 @@ For creating and using interpreters:
+----------------------------------+----------------------------------------------+
| ``get_current() -> Interpreter`` | Get the currently running interpreter.       |
+----------------------------------+----------------------------------------------+
| ``get_main() -> Interpreter``     | Get the main interpreter.                    |
+----------------------------------+----------------------------------------------+
| ``create() -> Interpreter``       | Initialize a new (idle) Python interpreter.  |
+----------------------------------+----------------------------------------------+
@@ -188,6 +208,17 @@ For sharing data between interpreters:
| ``ChannelReleasedError`` | ``ChannelClosedError`` | The channel is released (but not yet closed).  |
+--------------------------+------------------------+------------------------------------------------+
"Extending Python" Docs
-----------------------
Many extension modules do not support use in subinterpreters. The
authors and users of such extension modules will both benefit when they
are updated to support subinterpreters. To help with that, a new page
will be added to the `Extending Python <extension-docs_>`_ docs.
This page will explain how to implement PEP 489 support and how to
move from global module state to per-interpreter state.
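For context, pure-Python modules already get per-interpreter behavior
for free, since each interpreter imports its own copy of a module; the
new docs page is about giving extension modules the same behavior via
PEP 489 multi-phase initialization and per-module state. A rough
sketch of the Python-level effect, assuming the proposed
``interpreters`` module is available::

   import interpreters

   interp = interpreters.create()
   # Module-level state set in the subinterpreter is invisible here,
   # because each interpreter imports its own copy of the module.
   interp.run("import string; string.CUSTOM_FLAG = True")

   import string
   print(getattr(string, 'CUSTOM_FLAG', None))   # None in this interpreter
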
Examples
========
@@ -482,9 +513,9 @@ In the `Interpreter Isolation`_ section below we identify ways in
which isolation in CPython's subinterpreters is incomplete. Most
notable are extension modules that use C globals to store internal
state. PEP 3121 and PEP 489 provide a solution for most of the
problem, but one still remains. [petr-c-ext]_ Until that is resolved
(see PEP 573), C extension authors will face extra difficulty
supporting subinterpreters.
Consequently, projects that publish extension modules may face an
increased maintenance burden as their users start using subinterpreters,
@@ -501,6 +532,38 @@ is to offset those costs. The position of this PEP is that the actual
extra maintenance burden will be small and well below the threshold at
which subinterpreters are worth it.
* "creating a new concurrency API deserves much more thought and
experimentation, so the new module shouldn't go into the stdlib
right away, if ever"
Introducing an API for a new concurrency model, as happened with
asyncio, is an extremely large project that requires a lot of careful
consideration. It is not something that can be done as simply as this
PEP proposes and likely deserves significant time on PyPI to mature.
(See `Nathaniel's post <nathaniel-asyncio>`_ on python-dev.)
However, this PEP does not propose any new concurrency API. At most
it exposes minimal tools (e.g. subinterpreters, channels) which may
be used to write code that follows patterns associated with (relatively)
new-to-Python `concurrency models <Concurrency_>`_. Those tools could
also be used as the basis for APIs for such concurrency models.
Again, this PEP does not propose any such API.
* "there is no point to exposing subinterpreters if they still share
the GIL"
* "the effort to make the GIL per-interpreter is disruptive and risky"
A common misconception is that this PEP also includes a promise that
subinterpreters will no longer share the GIL. When that is clarified,
the next question is "what is the point?". This is already answered
at length in this PEP. Just to be clear, the value lies in::
* increase exposure of the existing feature, which helps improve
the code health of the entire CPython runtime
* expose the (mostly) isolated execution of subinterpreters
* preparation for per-interpreter GIL
* encourage experimentation
About Subinterpreters
=====================
@@ -673,6 +736,10 @@ The module provides the following functions::
Return the currently running interpreter.
get_main() => Interpreter
Return the main interpreter.
create() -> Interpreter
Initialize a new Python interpreter and return it. The
@@ -807,8 +874,17 @@ to pass other objects (like ``bytes``) to ``run`` directly.
Second, the main mechanism for sharing objects (i.e. their data) between
interpreters is through channels. A channel is a simplex FIFO similar
to a pipe. The main difference is that channels can be associated with
zero or more interpreters on either end. Like queues, which are also
many-to-many, channels are buffered (though they also offer methods
with unbuffered semantics).
Python objects are not shared between interpreters. However, in some
cases the data those objects wrap is actually shared and not just
copied. One example is PEP 3118 buffers. In those cases the object
in the original interpreter is kept alive until the shared data in the
other interpreter is no longer used. Then object destruction can
happen as normal in the original interpreter, along with the
previously shared data.
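As an illustration, here is a hedged sketch of such buffer sharing,
assuming a channel-creation function along the lines proposed below
(``create_channel()`` returning a ``(recv, send)`` pair), the
``send_buffer_nowait()`` method, and channel ends passed via
``channels=...`` being bound by name in the subinterpreter's
``__main__`` (as in the examples elsewhere in this PEP)::

   import textwrap
   import interpreters

   interp = interpreters.create()
   recv, send = interpreters.create_channel()

   buf = bytearray(b'spam')       # supports the PEP 3118 buffer protocol
   send.send_buffer_nowait(buf)   # the underlying buffer is shared, not copied

   # The subinterpreter receives a memoryview over the same memory, so the
   # original bytearray is kept alive until that view is no longer used there.
   interp.run(textwrap.dedent("""
       view = reader.recv()
       view[:4] = b'eggs'         # writes through to the sender's buffer
       del view                   # done with the shared data
       """),
       channels=dict(reader=recv))

   print(bytes(buf))              # b'eggs', since the data was shared
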
The ``interpreters`` module provides the following functions related
to channels::
@@ -817,13 +893,8 @@ to channels::
Create a new channel and return (recv, send), the RecvChannel
and SendChannel corresponding to the ends of the channel. The
lifetime of the channel is determined by associations between
interpreters and the channel's ends (see below).
Both ends of the channel are supported "shared" objects (i.e.
may be safely shared by different interpreters). Thus they
@@ -848,18 +919,15 @@ The module also provides the following channel-related classes::
interpreters => [Interpreter]:
The list of interpreters associated with the "recv" end of
the channel. (See below for more on how interpreters are
associated with channels.) If the channel has been closed
then raise ChannelClosedError.
recv():
Return the next object (i.e. the data from the sent object)
from the channel. If none have been sent then wait until
the next send.
If the channel is already closed then raise ChannelClosedError.
If the channel isn't closed but the current interpreter already
@@ -876,26 +944,18 @@ The module also provides the following channel-related classes::
release() -> bool:
No longer associate the current interpreter with the channel
(on the "recv" end) and block any future association. If the
interpreter was never associated with the channel then still
block any future association. The "send" end of the channel
is unaffected by a released "recv" end.
Once an interpreter is no longer associated with the "recv"
end of the channel, any "recv()" and "recv_nowait()" calls
from that interpreter will fail (even ongoing calls). See
"recv()" for details.
See below for more on how association relates to auto-closing
a channel.
This operation is idempotent. Return True if "release()"
has not been called before by the current interpreter.
@@ -929,11 +989,9 @@ The module also provides the following channel-related classes::
Send the object (i.e. its data) to the "recv" end of the
channel. Wait until the object is received. If the object
is not shareable then ValueError is raised.
If this channel end was already released
by the interpreter then raise ChannelReleasedError. If
the channel is already closed then raise
ChannelClosedError.
@@ -943,13 +1001,16 @@ The module also provides the following channel-related classes::
Send the object to the "recv" end of the channel. This
behaves the same as "send()", except for the waiting part.
If no interpreter is currently receiving (waiting on the
other end) then queue the object and return False. Otherwise
return True.
send_buffer(obj):
Send a MemoryView of the object rather than the object.
Otherwise this is the same as "send()". Note that the
object must implement the PEP 3118 buffer protocol.
The buffer will always be released in the original
interpreter, like normal.
send_buffer_nowait(obj):
@@ -977,12 +1038,81 @@ Note that ``send_buffer()`` is similar to how
``multiprocessing.Connection`` works. [mp-conn]_
Channel Association
-------------------
Each end (send/recv) of each channel is associated with a set of
interpreters. This association effectively means "the channel end
is available to that interpreter". It has ramifications on
introspection and on how channels are automatically closed.
When a channel is created, both ends are immediately associated with
the current interpreter. When a channel end is passed to an interpreter
via ``Interpreter.run(..., channels=...)`` then that interpreter is
associated with the channel end. Likewise when a channel end is sent
through another channel, the receiving interpreter is associated with
the sent channel end.
A channel end is explicitly released by an interpreter through the
``release()`` method. It is also done automatically for an interpreter
when the last ``*Channel`` object for the end in that interpreter is
garbage-collected, as though ``release()`` were called.
Calling ``*Channel.close()`` automatically releases the channel in all
interpreters for both ends.
Once the number of associated interpreters on both ends drops
to 0, the channel is actually closed. The Python runtime will
garbage-collect all closed channels, though it may not happen
immediately.
Consequently, ``*Channel.interpreters`` lists the interpreters to
which the channel end was sent, which still hold a reference to the
channel end, and which haven't called ``release()``.
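To make the lifecycle concrete, here is a hedged sketch under the same
assumptions as the earlier examples (the proposed ``interpreters``
module, ``create_channel()``, and ``channels=...`` binding channel
ends by name in the subinterpreter)::

   import interpreters

   # Creating a channel associates both ends with the current interpreter.
   recv, send = interpreters.create_channel()

   interp = interpreters.create()
   send.send_nowait('spam')       # queued; channels are buffered

   # Passing the "recv" end via run() associates it with interp as well.
   interp.run("obj = reader.recv()", channels=dict(reader=recv))
   print(recv.interpreters)       # now includes this interpreter and interp

   # Drop this interpreter's associations.  The channel is closed only
   # once every association, on both ends, is gone (or close() is called).
   recv.release()
   send.release()
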
Open Questions
==============
* add a "tp_share" type slot instead of using a global registry
for shareable types?
* impact of data sharing on cache performance in multi-core scenarios?
(see [cache-line-ping-pong]_)
* strictly disallow subinterpreter import of extension modules without
PEP 489 support?
* add "isolated" mode to subinterpreters API?
An "isolated" mode for subinterpreters would mean an interpreter in
that mode is especially restricted. It might include any of the
following::
* ImportError when importing ext. module without PEP 489 support
* no daemon threads
* no threads at all
* no multiprocessing
For now the default would be ``False``, but it would become ``True``
later.
* add a shareable synchronization primitive?
This would be ``_threading.Lock`` (or something like it) where
interpreters would actually share the underlying mutex. This would
provide much better efficiency than blocking channel ops. The main
concern is that locks and channels don't mix well (as learned in Go).
* add readiness callback support to channels?
This is an alternative to channel buffering. It is probably
unnecessary, but may have enough advantages to consider it for the
high-level API. It may also be better only for the low-level
implementation.
* also track which interpreters are using a channel end?
Deferred Functionality
======================
@@ -1211,7 +1341,7 @@ Pipes and Queues
With the proposed object passing mechanism of "channels", other similar
basic types aren't required to achieve the minimal useful functionality
of subinterpreters. Such types include pipes (like channels, but
one-to-one) and queues (like channels, but more generic). See below in
`Rejected Ideas` for more information.
Even though these types aren't part of this proposal, they may still
@@ -1234,9 +1364,12 @@ when the object gets received on the other end. One way to work around
this is to return a locked ``threading.Lock`` from ``SendChannel.send()``
that unlocks once the object is received.
Alternatively, the proposed ``SendChannel.send()`` (blocking) and
``SendChannel.send_nowait()`` provide an explicit distinction that is
less likely to confuse users.
Note that returning a lock would matter for buffered channels
(i.e. queues). For unbuffered channels it is a non-issue.
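A small sketch of that distinction, using the proposed API within a
single interpreter (and a helper thread) purely to show the blocking
behavior::

   import threading
   import interpreters

   recv, send = interpreters.create_channel()

   # send_nowait() queues the object; it returns False when no interpreter
   # is currently receiving on the other end.
   print(send.send_nowait('ham'))   # False: nothing is waiting in recv()
   print(recv.recv_nowait())        # 'ham'

   # send() blocks until the object has been received, which is exactly
   # what a returned lock would otherwise have to signal.
   t = threading.Thread(target=send.send, args=('spam',))
   t.start()
   print(recv.recv())               # 'spam'; this unblocks the sender
   t.join()
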
Add a "reraise" method to RunFailedError Add a "reraise" method to RunFailedError
---------------------------------------- ----------------------------------------
@ -1306,9 +1439,16 @@ ends up being slightly more complicated, requiring naming the pipes.
Use queues instead of channels Use queues instead of channels
------------------------------ ------------------------------
Queues and buffered channels are almost the same thing. The main
difference is that channels have a stronger relationship with context
(i.e. the associated interpreter).
The name "Channel" was used instead of "Queue" to avoid confusion with
the stdlib ``queue`` module.
Note that buffering in channels does complicate the blocking semantics
of ``recv()`` and ``send()``. Also, queues can be built on top of
unbuffered channels.
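Given that the proposed channels are buffered, a queue-style facade is
mostly a thin rename over a channel pair. A minimal sketch (again
assuming the proposed API, and leaving out ``maxsize``, timeouts, and
the other ``queue.Queue`` features)::

   import interpreters

   class SimpleQueue:
       """A minimal queue-like wrapper around a channel (illustrative only)."""

       def __init__(self):
           self._recv, self._send = interpreters.create_channel()

       def put(self, obj):
           # Queue the object without waiting for a receiver.
           self._send.send_nowait(obj)

       def get(self):
           # Block until an object is available.
           return self._recv.recv()

   q = SimpleQueue()
   q.put('spam')
   print(q.get())                   # 'spam'
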
"enumerate" "enumerate"
----------- -----------
@ -1392,6 +1532,16 @@ require extra runtime modifications. It would also make the module's
implementation overly complicated. Finally, it might not even make implementation overly complicated. Finally, it might not even make
the module easier to understand. the module easier to understand.
Only associate interpreters upon use
------------------------------------
Associate interpreters with channel ends only once ``recv()``,
``send()``, etc. are called.
Doing this is potentially confusing and also can lead to unexpected
races where a channel is auto-closed before it can be used in the
original (creating) interpreter.
Implementation
==============
@@ -1495,6 +1645,15 @@ References
.. [multi-core-project]
https://github.com/ericsnowcurrently/multi-core-python
.. [cache-line-ping-pong]
https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/
.. [nathaniel-asyncio]
https://mail.python.org/archives/list/python-dev@python.org/message/TUEAZNZHVJGGLL4OFD32OW6JJDKM6FAS/
.. [extension-docs]
https://docs.python.org/3/extending/index.html
Copyright
=========