diff --git a/pep-0554.rst b/pep-0554.rst index cd5e4552a..9af0dd3fd 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -18,8 +18,8 @@ CPython has supported multiple interpreters in the same process (AKA "subinterpreters") since version 1.5 (1997). The feature has been available via the C-API. [c-api]_ Subinterpreters operate in `relative isolation from one another `_, which -provides the basis for an -`alternative concurrency model `_. +facilitates novel alternative approaches to +`concurrency `_. This proposal introduces the stdlib ``interpreters`` module. The module will be `provisional `_. It exposes the basic @@ -27,10 +27,28 @@ functionality of subinterpreters already provided by the C-API, along with new (basic) functionality for sharing data between interpreters. +A Disclaimer about the GIL +========================== + +To avoid any confusion up front: This PEP is unrelated to any efforts +to stop sharing the GIL between subinterpreters. At most this proposal +will allow users to take advantage of any results of work on the GIL. +The position here is that exposing subinterpreters to Python code is +worth doing, even if they still share the GIL. + + Proposal ======== -The ``interpreters`` module will be added to the stdlib. It will +The ``interpreters`` module will be added to the stdlib. To help +authors of extension modules, a new page will be added to the +`Extending Python `_ docs. More information on both +is found in the immediately following sections. + +The "interpreters" Module +------------------------- + +The ``interpreters`` module will provide a high-level interface to subinterpreters and wrap a new low-level ``_interpreters`` (in the same way as the ``threading`` module). See the `Examples`_ section for concrete usage and use cases. @@ -72,6 +90,8 @@ For creating and using interpreters: +----------------------------------+----------------------------------------------+ | ``get_current() -> Interpreter`` | Get the currently running interpreter. 
| +----------------------------------+----------------------------------------------+ +| ``get_main() -> Interpreter`` | Get the main interpreter. | ++----------------------------------+----------------------------------------------+ | ``create() -> Interpreter`` | Initialize a new (idle) Python interpreter. | +----------------------------------+----------------------------------------------+ @@ -188,6 +208,17 @@ For sharing data between interpreters: | ``ChannelReleasedError`` | ``ChannelClosedError`` | The channel is released (but not yet closed). | +--------------------------+------------------------+------------------------------------------------+ +"Extending Python" Docs +----------------------- + +Many extension modules do not support use in subinterpreters. The +authors and users of such extension modules will both benefit when they +are updated to support subinterpreters. To help with that, a new page +will be added to the `Extending Python `_ docs. + +This page will explain how to implement PEP 489 support and how to move +from global module state to per-interpreter state. + Examples ======== @@ -482,9 +513,9 @@ In the `Interpreter Isolation`_ section below we identify ways in which isolation in CPython's subinterpreters is incomplete. Most notable is extension modules that use C globals to store internal state. PEP 3121 and PEP 489 provide a solution for most of the -problem, but one still remains. [petr-c-ext]_ Until that is resolved, -C extension authors will face extra difficulty to support -subinterpreters. +problem, but one still remains. [petr-c-ext]_ Until that is resolved +(see PEP 573), C extension authors will face extra difficulty +supporting subinterpreters. Consequently, projects that publish extension modules may face an increased maintenance burden as their users start using subinterpreters, @@ -501,6 +532,38 @@ is to offset those costs. 
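As a reading aid only (not part of the proposed diff), the buffered channel semantics this PEP describes, where ``send_nowait()`` queues the object and returns ``False`` when no receiver is currently waiting, and ``recv()`` blocks until an object is available, can be approximated in present-day Python. The following mock is the editor's illustration using ``queue`` and ``threading`` within a single interpreter; it is not the proposed cross-interpreter implementation, and the class name is invented for this sketch.

```python
import queue
import threading


class MockChannel:
    """Editor's illustrative stand-in for one proposed channel.

    Mimics, within a single interpreter, the buffered FIFO semantics
    described in the PEP.  It is NOT the proposed implementation.
    """

    def __init__(self):
        self._buffer = queue.Queue()
        self._receivers_waiting = 0
        self._lock = threading.Lock()

    def send_nowait(self, obj):
        # Queue the object; report whether a receiver was already waiting.
        with self._lock:
            waiting = self._receivers_waiting > 0
        self._buffer.put(obj)
        return waiting

    def recv(self):
        # Block until an object is available.
        with self._lock:
            self._receivers_waiting += 1
        try:
            return self._buffer.get()
        finally:
            with self._lock:
                self._receivers_waiting -= 1


ch = MockChannel()
assert ch.send_nowait(b"eggs") is False  # no receiver was waiting
assert ch.recv() == b"eggs"              # the queued object is delivered
```

The mock only shows the queue-and-report behavior of ``send_nowait()``; association, release, and cross-interpreter sharing have no analogue here.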
The position of this PEP is that the actual extra maintenance burden will be small and well below the threshold at which subinterpreters are worth it. +* "creating a new concurrency API deserves much more thought and + experimentation, so the new module shouldn't go into the stdlib + right away, if ever" + +Introducing an API for a new concurrency model, as happened with +asyncio, is an extremely large project that requires a lot of careful +consideration. It is not something that can be done as simply as this +PEP proposes and likely deserves significant time on PyPI to mature. +(See `Nathaniel's post `_ on python-dev.) + +However, this PEP does not propose any new concurrency API. At most +it exposes minimal tools (e.g. subinterpreters, channels) which may +be used to write code that follows patterns associated with (relatively) +new-to-Python `concurrency models `_. Those tools could +also be used as the basis for APIs for such concurrency models. +Again, this PEP does not propose any such API. + +* "there is no point to exposing subinterpreters if they still share + the GIL" +* "the effort to make the GIL per-interpreter is disruptive and risky" + +A common misconception is that this PEP also includes a promise that +subinterpreters will no longer share the GIL. When that is clarified, +the next question is "what is the point?". This is already answered +at length in this PEP. Just to be clear, the value lies in:: + + * increasing exposure of the existing feature, which helps improve + the code health of the entire CPython runtime + * exposing the (mostly) isolated execution of subinterpreters + * preparing for per-interpreter GIL + * encouraging experimentation + About Subinterpreters ===================== @@ -673,6 +736,10 @@ The module provides the following functions:: Return the currently running interpreter. + get_main() => Interpreter + + Return the main interpreter. + create() -> Interpreter Initialize a new Python interpreter and return it. 
The @@ -807,8 +874,17 @@ to pass other objects (like ``bytes``) to ``run`` directly. Second, the main mechanism for sharing objects (i.e. their data) between interpreters is through channels. A channel is a simplex FIFO similar to a pipe. The main difference is that channels can be associated with -zero or more interpreters on either end. Unlike queues, which are also -many-to-many, channels have no buffer. +zero or more interpreters on either end. Like queues, which are also +many-to-many, channels are buffered (though they also offer methods +with unbuffered semantics). + +Python objects are not shared between interpreters. However, in some +cases the data those objects wrap is actually shared and not just copied. +One example is PEP 3118 buffers. In those cases the object in the +original interpreter is kept alive until the shared data in the other +interpreter is no longer used. Then object destruction can happen like +normal in the original interpreter, along with the previously shared +data. The ``interpreters`` module provides the following functions related to channels:: @@ -817,13 +893,8 @@ to channels:: Create a new channel and return (recv, send), the RecvChannel and SendChannel corresponding to the ends of the channel. The - channel is not closed and destroyed (i.e. garbage-collected) - until the number of associated interpreters returns to 0 - (including when the channel is explicitly closed). - - An interpreter gets associated with a channel by calling its - "send()" or "recv()" method. That association gets dropped - by calling "release()" on the channel. + lifetime of the channel is determined by associations between + interpreters and the channel's ends (see below). Both ends of the channel are supported "shared" objects (i.e. may be safely shared by different interpreters. 
Thus they @@ -848,18 +919,15 @@ The module also provides the following channel-related classes:: interpreters => [Interpreter]: The list of interpreters associated with the "recv" end of - the channel. That means those that have called the "recv()" - (or "recv_nowait()") method, still hold a reference to the - channel end, and haven't called "release()". If the - channel has been closed then raise - ChannelClosedError. + the channel. (See below for more on how interpreters are + associated with channels.) If the channel has been closed + then raise ChannelClosedError. recv(): Return the next object (i.e. the data from the sent object) from the channel. If none have been sent then wait until - the next send. This associates the current interpreter - with the "recv" end of the channel. + the next send. If the channel is already closed then raise ChannelClosedError. If the channel isn't closed but the current interpreter already @@ -876,26 +944,18 @@ The module also provides the following channel-related classes:: release() -> bool: No longer associate the current interpreter with the channel - (on the "recv" end) and block any future association (via the - "recv()" or ``recv_nowait()`` methods). If the interpreter - was never associated with the channel then still block any - future association. The "send" end of the channel is - unaffected by a released "recv" end. + (on the "recv" end) and block any future association. If the + interpreter was never associated with the channel then still + block any future association. The "send" end of the channel + is unaffected by a released "recv" end. Once an interpreter is no longer associated with the "recv" end of the channel, any "recv()" and "recv_nowait()" calls from that interpreter will fail (even ongoing calls). See "recv()" for details. - Once the number of associated interpreters on both ends drops - to 0, the channel is actually marked as closed. 
The Python - runtime will garbage collect all closed channels, though it - may not happen immediately. - - Note that the interpreter automatically loses its association - with the channel end when it is no longer used (i.e. has no - references) in that interpreter, as though "release()" - were called. + See below for more on how association relates to auto-closing + a channel. This operation is idempotent. Return True if "release()" has not been called before by the current interpreter. @@ -929,11 +989,9 @@ The module also provides the following channel-related classes:: Send the object (i.e. its data) to the "recv" end of the channel. Wait until the object is received. If the object - is not shareable then ValueError is raised. This associates - the current interpreter with the "send" end of the channel. + is not shareable then ValueError is raised. - This associates the current interpreter with the "send" end - of the channel. If the channel send was already released + If this channel end was already released by the interpreter then raise ChannelReleasedError. If the channel is already closed then raise ChannelClosedError. @@ -943,13 +1001,16 @@ The module also provides the following channel-related classes:: Send the object to the "recv" end of the channel. This behaves the same as "send()", except for the waiting part. If no interpreter is currently receiving (waiting on the - other end) then return False. Otherwise return True. + other end) then queue the object and return False. Otherwise + return True. send_buffer(obj): Send a MemoryView of the object rather than the object. Otherwise this is the same as "send()". Note that the object must implement the PEP 3118 buffer protocol. + The buffer will always be released in the original + interpreter, like normal. send_buffer_nowait(obj): @@ -977,12 +1038,81 @@ Note that ``send_buffer()`` is similar to how ``multiprocessing.Connection`` works. 
[mp-conn]_ +Channel Association +------------------- + +Each end (send/recv) of each channel is associated with a set of +interpreters. This association effectively means "the channel end +is available to that interpreter". It has ramifications for +introspection and for how channels are automatically closed. + +When a channel is created, both ends are immediately associated with +the current interpreter. When a channel end is passed to an interpreter +via ``Interpreter.run(..., channels=...)``, that interpreter is +associated with the channel end. Likewise when a channel end is sent +through another channel, the receiving interpreter is associated with +the sent channel end. + +A channel end is explicitly released by an interpreter through the +``release()`` method. It is also done automatically for an interpreter +when the last ``*Channel`` object for the end in that interpreter is +garbage-collected, as though ``release()`` were called. + +Calling ``*Channel.close()`` automatically releases the channel in all +interpreters for both ends. + +Once the number of associated interpreters on both ends drops +to 0, the channel is actually closed. The Python runtime will +garbage-collect all closed channels, though it may not happen +immediately. + +Consequently, ``*Channel.interpreters`` means those interpreters to +which the channel end was sent, which still hold a reference to the +channel end, and which haven't called ``release()``. + + Open Questions ============== * add a "tp_share" type slot instead of using a global registry for shareable types? +* impact of data sharing on cache performance in multi-core scenarios? + (see [cache-line-ping-pong]_) + +* strictly disallow subinterpreter import of extension modules without + PEP 489 support? + +* add "isolated" mode to subinterpreters API? + +An "isolated" mode for subinterpreters would mean an interpreter in +that mode is especially restricted. It might include any of the +following:: + + * ImportError when importing ext. 
module without PEP 489 support + no daemon threads + no threads at all + no multiprocessing + +For now the default would be ``False``, but it would become ``True`` +later. + +* add a shareable synchronization primitive? + +This would be ``_threading.Lock`` (or something like it) where +interpreters would actually share the underlying mutex. This would +provide much better efficiency than blocking channel ops. The main +concern is that locks and channels don't mix well (as learned in Go). + +* add readiness callback support to channels? + +This is an alternative to channel buffering. It is probably +unnecessary, but may have enough advantages to consider it for the +high-level API. It may also be better only for the low-level +implementation. + +* also track which interpreters are using a channel end? + Deferred Functionality ====================== @@ -1211,7 +1341,7 @@ Pipes and Queues With the proposed object passing machanism of "channels", other similar basic types aren't required to achieve the minimal useful functionality of subinterpreters. Such types include pipes (like channels, but -one-to-one) and queues (like channels, but buffered). See below in +one-to-one) and queues (like channels, but more generic). See below in `Rejected Ideas` for more information. Even though these types aren't part of this proposal, they may still @@ -1234,9 +1364,12 @@ when the object gets received on the other end. One way to work around this is to return a locked ``threading.Lock`` from ``SendChannel.send()`` that unlocks once the object is received. -This matters for buffered channels (i.e. queues). For unbuffered -channels it is a non-issue. So this can be dealt with once channels -support buffering. +Alternatively, the proposed ``SendChannel.send()`` (blocking) and +``SendChannel.send_nowait()`` provide an explicit distinction that is +less likely to confuse users. + +Note that returning a lock would matter for buffered channels +(i.e. queues). 
For unbuffered channels it is a non-issue. Add a "reraise" method to RunFailedError ---------------------------------------- @@ -1306,9 +1439,16 @@ ends up being slightly more complicated, requiring naming the pipes. Use queues instead of channels ------------------------------ -The main difference between queues and channels is that queues support -buffering. This would complicate the blocking semantics of ``recv()`` -and ``send()``. Also, queues can be built on top of channels. +Queues and buffered channels are almost the same thing. The main +difference is that channels have a stronger relationship with context +(i.e. the associated interpreter). + +The name "Channel" was used instead of "Queue" to avoid confusion with +the stdlib ``queue`` module. + +Note that buffering in channels does complicate the blocking semantics +of ``recv()`` and ``send()``. Also, queues can be built on top of +unbuffered channels. "enumerate" ----------- @@ -1392,6 +1532,16 @@ require extra runtime modifications. It would also make the module's implementation overly complicated. Finally, it might not even make the module easier to understand. +Only associate interpreters upon use +------------------------------------ + +Associate interpreters with channel ends only once ``recv()``, +``send()``, etc. are called. + +Doing this is potentially confusing and can also lead to unexpected +races where a channel is auto-closed before it can be used in the +original (creating) interpreter. + Implementation ============== @@ -1495,6 +1645,15 @@ References .. [multi-core-project] https://github.com/ericsnowcurrently/multi-core-python +.. [cache-line-ping-pong] + https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/ + +.. [nathaniel-asyncio] + https://mail.python.org/archives/list/python-dev@python.org/message/TUEAZNZHVJGGLL4OFD32OW6JJDKM6FAS/ + +.. [extension-docs] + https://docs.python.org/3/extending/index.html + Copyright =========