PEP 554: updates for latest feedback (#1378)

Eric Snow 2020-04-21 10:47:03 -06:00 committed by GitHub
parent edb30ef04b
commit 5b7d9991b7
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 207 additions and 48 deletions


@@ -18,8 +18,8 @@ CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997). The feature has been
available via the C-API. [c-api]_ Subinterpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_, which
facilitates novel alternative approaches to
`concurrency <Concurrency_>`_.
This proposal introduces the stdlib ``interpreters`` module. The module
will be `provisional <Provisional Status_>`_. It exposes the basic
@@ -27,10 +27,28 @@ functionality of subinterpreters already provided by the C-API, along
with new (basic) functionality for sharing data between interpreters.
A Disclaimer about the GIL
==========================
To avoid any confusion up front: This PEP is unrelated to any efforts
to stop sharing the GIL between subinterpreters. At most this proposal
will allow users to take advantage of any results of work on the GIL.
The position here is that exposing subinterpreters to Python code is
worth doing, even if they still share the GIL.
Proposal
========
The ``interpreters`` module will be added to the stdlib. To help
authors of extension modules, a new page will be added to the
`Extending Python <extension-docs_>`_ docs. More information on both
is found in the immediately following sections.
The "interpreters" Module
-------------------------
The ``interpreters`` module will
provide a high-level interface to subinterpreters and wrap a new
low-level ``_interpreters`` (in the same way as the ``threading``
module). See the `Examples`_ section for concrete usage and use cases.
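For orientation, here is a minimal sketch of what using the proposed
high-level API could look like, assuming the module lands under the
name ``interpreters`` with the functions and methods listed in this
PEP (including the newly added ``get_main()``)::

   import interpreters

   interp = interpreters.create()        # a new, idle subinterpreter
   print(interp.is_running())            # False until run() is called
   interp.run("print('spam')")           # execute code in that interpreter

   main = interpreters.get_main()        # the main interpreter
   current = interpreters.get_current()  # the interpreter running this code
   print(current.id == main.id)          # True when run in the main interpreter
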
@@ -72,6 +90,8 @@ For creating and using interpreters:
+----------------------------------+----------------------------------------------+
| ``get_current() -> Interpreter`` | Get the currently running interpreter.       |
+----------------------------------+----------------------------------------------+
| ``get_main() -> Interpreter``     | Get the main interpreter.                    |
+----------------------------------+----------------------------------------------+
| ``create() -> Interpreter``       | Initialize a new (idle) Python interpreter.  |
+----------------------------------+----------------------------------------------+
@@ -188,6 +208,17 @@ For sharing data between interpreters:
| ``ChannelReleasedError`` | ``ChannelClosedError`` | The channel is released (but not yet closed).  |
+--------------------------+------------------------+------------------------------------------------+
"Extending Python" Docs
-----------------------
Many extension modules do not support use in subinterpreters. The
authors and users of such extension modules will both benefit when they
are updated to support subinterpreters. To help with that, a new page
will be added to the `Extending Python <extension-docs_>`_ docs.
This page will explain how to implement PEP 489 support and how to
move from global module state to per-interpreter state.
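For context, pure-Python modules already get per-interpreter behavior
for free, since each interpreter imports its own copy of a module; the
new docs page is about giving extension modules the same behavior via
PEP 489 multi-phase initialization and per-module state. A rough
sketch of the Python-level effect, assuming the proposed
``interpreters`` module is available::

   import interpreters

   interp = interpreters.create()
   # Module-level state set in the subinterpreter is invisible here,
   # because each interpreter imports its own copy of the module.
   interp.run("import string; string.CUSTOM_FLAG = True")

   import string
   print(getattr(string, 'CUSTOM_FLAG', None))   # None in this interpreter
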
Examples
========
@@ -482,9 +513,9 @@ In the `Interpreter Isolation`_ section below we identify ways in
which isolation in CPython's subinterpreters is incomplete. Most
notable are extension modules that use C globals to store internal
state. PEP 3121 and PEP 489 provide a solution for most of the
problem, but one still remains. [petr-c-ext]_ Until that is resolved
(see PEP 573), C extension authors will face extra difficulty
supporting subinterpreters.
Consequently, projects that publish extension modules may face an
increased maintenance burden as their users start using subinterpreters,
@@ -501,6 +532,38 @@ is to offset those costs. The position of this PEP is that the actual
extra maintenance burden will be small and well below the threshold at
which subinterpreters are worth it.
* "creating a new concurrency API deserves much more thought and
experimentation, so the new module shouldn't go into the stdlib
right away, if ever"
Introducing an API for a new concurrency model, as happened with
asyncio, is an extremely large project that requires a lot of careful
consideration. It is not something that can be done as simply as this
PEP proposes and likely deserves significant time on PyPI to mature.
(See `Nathaniel's post <nathaniel-asyncio>`_ on python-dev.)
However, this PEP does not propose any new concurrency API. At most
it exposes minimal tools (e.g. subinterpreters, channels) which may
be used to write code that follows patterns associated with (relatively)
new-to-Python `concurrency models <Concurrency_>`_. Those tools could
also be used as the basis for APIs for such concurrency models.
Again, this PEP does not propose any such API.
* "there is no point to exposing subinterpreters if they still share
the GIL"
* "the effort to make the GIL per-interpreter is disruptive and risky"
A common misconception is that this PEP also includes a promise that
subinterpreters will no longer share the GIL. When that is clarified,
the next question is "what is the point?". This is already answered
at length in this PEP. Just to be clear, the value lies in::
* increase exposure of the existing feature, which helps improve
the code health of the entire CPython runtime
* expose the (mostly) isolated execution of subinterpreters
* preparation for per-interpreter GIL
* encourage experimentation
About Subinterpreters
=====================
@@ -673,6 +736,10 @@ The module provides the following functions::
Return the currently running interpreter.
get_main() => Interpreter
Return the main interpreter.
create() -> Interpreter
Initialize a new Python interpreter and return it. The
@@ -807,8 +874,17 @@ to pass other objects (like ``bytes``) to ``run`` directly.
Second, the main mechanism for sharing objects (i.e. their data) between
interpreters is through channels. A channel is a simplex FIFO similar
to a pipe. The main difference is that channels can be associated with
zero or more interpreters on either end. Like queues, which are also
many-to-many, channels are buffered (though they also offer methods
with unbuffered semantics).
Python objects are not shared between interpreters. However, in some
cases the data those objects wrap is actually shared and not just
copied. One example is PEP 3118 buffers. In those cases the object
in the original interpreter is kept alive until the shared data in the
other interpreter is no longer used. Then object destruction can
happen as normal in the original interpreter, along with the
previously shared data.
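As an illustration, here is a hedged sketch of such buffer sharing,
assuming a channel-creation function along the lines proposed below
(``create_channel()`` returning a ``(recv, send)`` pair), the
``send_buffer_nowait()`` method, and channel ends passed via
``channels=...`` being bound by name in the subinterpreter's
``__main__`` (as in the examples elsewhere in this PEP)::

   import textwrap
   import interpreters

   interp = interpreters.create()
   recv, send = interpreters.create_channel()

   buf = bytearray(b'spam')       # supports the PEP 3118 buffer protocol
   send.send_buffer_nowait(buf)   # the underlying buffer is shared, not copied

   # The subinterpreter receives a memoryview over the same memory, so the
   # original bytearray is kept alive until that view is no longer used there.
   interp.run(textwrap.dedent("""
       view = reader.recv()
       view[:4] = b'eggs'         # writes through to the sender's buffer
       del view                   # done with the shared data
       """),
       channels=dict(reader=recv))

   print(bytes(buf))              # b'eggs', since the data was shared
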
The ``interpreters`` module provides the following functions related
to channels::
@@ -817,13 +893,8 @@ to channels::
Create a new channel and return (recv, send), the RecvChannel
and SendChannel corresponding to the ends of the channel. The
lifetime of the channel is determined by associations between
interpreters and the channel's ends (see below).
Both ends of the channel are supported "shared" objects (i.e.
may be safely shared by different interpreters). Thus they
@@ -848,18 +919,15 @@ The module also provides the following channel-related classes::
interpreters => [Interpreter]:
The list of interpreters associated with the "recv" end of
the channel. (See below for more on how interpreters are
associated with channels.) If the channel has been closed
then raise ChannelClosedError.
recv():
Return the next object (i.e. the data from the sent object)
from the channel. If none have been sent then wait until
the next send.
If the channel is already closed then raise ChannelClosedError.
If the channel isn't closed but the current interpreter already
@@ -876,26 +944,18 @@ The module also provides the following channel-related classes::
release() -> bool:
No longer associate the current interpreter with the channel
(on the "recv" end) and block any future association. If the
interpreter was never associated with the channel then still
block any future association. The "send" end of the channel
is unaffected by a released "recv" end.
Once an interpreter is no longer associated with the "recv"
end of the channel, any "recv()" and "recv_nowait()" calls
from that interpreter will fail (even ongoing calls). See
"recv()" for details.
See below for more on how association relates to auto-closing
a channel.
This operation is idempotent. Return True if "release()"
has not been called before by the current interpreter.
@@ -929,11 +989,9 @@ The module also provides the following channel-related classes::
Send the object (i.e. its data) to the "recv" end of the
channel. Wait until the object is received. If the object
is not shareable then ValueError is raised.
If this channel end was already released
by the interpreter then raise ChannelReleasedError. If
the channel is already closed then raise
ChannelClosedError.
@@ -943,13 +1001,16 @@ The module also provides the following channel-related classes::
Send the object to the "recv" end of the channel. This
behaves the same as "send()", except for the waiting part.
If no interpreter is currently receiving (waiting on the
other end) then queue the object and return False. Otherwise
return True.
send_buffer(obj):
Send a MemoryView of the object rather than the object.
Otherwise this is the same as "send()". Note that the
object must implement the PEP 3118 buffer protocol.
The buffer will always be released in the original
interpreter, like normal.
send_buffer_nowait(obj):
@@ -977,12 +1038,81 @@ Note that ``send_buffer()`` is similar to how
``multiprocessing.Connection`` works. [mp-conn]_
Channel Association
-------------------
Each end (send/recv) of each channel is associated with a set of
interpreters. This association effectively means "the channel end
is available to that interpreter". It has ramifications on
introspection and on how channels are automatically closed.
When a channel is created, both ends are immediately associated with
the current interpreter. When a channel end is passed to an interpreter
via ``Interpreter.run(..., channels=...)`` then that interpreter is
associated with the channel end. Likewise when a channel end is sent
through another channel, the receiving interpreter is associated with
the sent channel end.
A channel end is explicitly released by an interpreter through the
``release()`` method. It is also done automatically for an interpreter
when the last ``*Channel`` object for the end in that interpreter is
garbage-collected, as though ``release()`` were called.
Calling ``*Channel.close()`` automatically releases the channel in all
interpreters for both ends.
Once the number of associated interpreters on both ends drops
to 0, the channel is actually closed. The Python runtime will
garbage-collect all closed channels, though it may not happen
immediately.
Consequently, ``*Channel.interpreters`` lists the interpreters to
which the channel end was sent, which still hold a reference to the
channel end, and which haven't called ``release()``.
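To make the lifecycle concrete, here is a hedged sketch under the same
assumptions as the earlier examples (the proposed ``interpreters``
module, ``create_channel()``, and ``channels=...`` binding channel
ends by name in the subinterpreter)::

   import interpreters

   # Creating a channel associates both ends with the current interpreter.
   recv, send = interpreters.create_channel()

   interp = interpreters.create()
   send.send_nowait('spam')       # queued; channels are buffered

   # Passing the "recv" end via run() associates it with interp as well.
   interp.run("obj = reader.recv()", channels=dict(reader=recv))
   print(recv.interpreters)       # now includes this interpreter and interp

   # Drop this interpreter's associations.  The channel is closed only
   # once every association, on both ends, is gone (or close() is called).
   recv.release()
   send.release()
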
Open Questions
==============
* add a "tp_share" type slot instead of using a global registry
for shareable types?
* impact of data sharing on cache performance in multi-core scenarios?
(see [cache-line-ping-pong]_)
* strictly disallow subinterpreter import of extension modules without
PEP 489 support?
* add "isolated" mode to subinterpreters API?
An "isolated" mode for subinterpreters would mean an interpreter in
that mode is especially restricted. It might include any of the
following::
* ImportError when importing ext. module without PEP 489 support
* no daemon threads
* no threads at all
* no multiprocessing
For now the default would be ``False``, but it would become ``True``
later.
* add a shareable synchronization primitive?
This would be ``_threading.Lock`` (or something like it) where
interpreters would actually share the underlying mutex. This would
provide much better efficiency than blocking channel ops. The main
concern is that locks and channels don't mix well (as learned in Go).
* add readiness callback support to channels?
This is an alternative to channel buffering. It is probably
unnecessary, but may have enough advantages to consider it for the
high-level API. It may also be better only for the low-level
implementation.
* also track which interpreters are using a channel end?
Deferred Functionality
======================
@@ -1211,7 +1341,7 @@ Pipes and Queues
With the proposed object passing mechanism of "channels", other similar
basic types aren't required to achieve the minimal useful functionality
of subinterpreters. Such types include pipes (like channels, but
one-to-one) and queues (like channels, but more generic). See below in
`Rejected Ideas` for more information.
Even though these types aren't part of this proposal, they may still
@@ -1234,9 +1364,12 @@ when the object gets received on the other end. One way to work around
this is to return a locked ``threading.Lock`` from ``SendChannel.send()``
that unlocks once the object is received.
Alternatively, the proposed ``SendChannel.send()`` (blocking) and
``SendChannel.send_nowait()`` provide an explicit distinction that is
less likely to confuse users.
Note that returning a lock would matter for buffered channels
(i.e. queues). For unbuffered channels it is a non-issue.
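A small sketch of that distinction, using the proposed API within a
single interpreter (and a helper thread) purely to show the blocking
behavior::

   import threading
   import interpreters

   recv, send = interpreters.create_channel()

   # send_nowait() queues the object; it returns False when no interpreter
   # is currently receiving on the other end.
   print(send.send_nowait('ham'))   # False: nothing is waiting in recv()
   print(recv.recv_nowait())        # 'ham'

   # send() blocks until the object has been received, which is exactly
   # what a returned lock would otherwise have to signal.
   t = threading.Thread(target=send.send, args=('spam',))
   t.start()
   print(recv.recv())               # 'spam'; this unblocks the sender
   t.join()
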
Add a "reraise" method to RunFailedError Add a "reraise" method to RunFailedError
---------------------------------------- ----------------------------------------
@ -1306,9 +1439,16 @@ ends up being slightly more complicated, requiring naming the pipes.
Use queues instead of channels Use queues instead of channels
------------------------------ ------------------------------
Queues and buffered channels are almost the same thing. The main
difference is that channels have a stronger relationship with context
(i.e. the associated interpreter).
The name "Channel" was used instead of "Queue" to avoid confusion with
the stdlib ``queue`` module.
Note that buffering in channels does complicate the blocking semantics
of ``recv()`` and ``send()``. Also, queues can be built on top of
unbuffered channels.
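Given that the proposed channels are buffered, a queue-style facade is
mostly a thin rename over a channel pair. A minimal sketch (again
assuming the proposed API, and leaving out ``maxsize``, timeouts, and
the other ``queue.Queue`` features)::

   import interpreters

   class SimpleQueue:
       """A minimal queue-like wrapper around a channel (illustrative only)."""

       def __init__(self):
           self._recv, self._send = interpreters.create_channel()

       def put(self, obj):
           # Queue the object without waiting for a receiver.
           self._send.send_nowait(obj)

       def get(self):
           # Block until an object is available.
           return self._recv.recv()

   q = SimpleQueue()
   q.put('spam')
   print(q.get())                   # 'spam'
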
"enumerate" "enumerate"
----------- -----------
@ -1392,6 +1532,16 @@ require extra runtime modifications. It would also make the module's
implementation overly complicated. Finally, it might not even make implementation overly complicated. Finally, it might not even make
the module easier to understand. the module easier to understand.
Only associate interpreters upon use
------------------------------------
Associate interpreters with channel ends only once ``recv()``,
``send()``, etc. are called.
Doing this is potentially confusing and also can lead to unexpected
races where a channel is auto-closed before it can be used in the
original (creating) interpreter.
Implementation
==============
@@ -1495,6 +1645,15 @@ References
.. [multi-core-project]
https://github.com/ericsnowcurrently/multi-core-python
.. [cache-line-ping-pong]
https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/
.. [nathaniel-asyncio]
https://mail.python.org/archives/list/python-dev@python.org/message/TUEAZNZHVJGGLL4OFD32OW6JJDKM6FAS/
.. [extension-docs]
https://docs.python.org/3/extending/index.html
Copyright
=========