PEP 554: updates after feedback (#1388)

2020-04-29 17:48:23 -06:00 · 2020-04-29 17:48:23 -06:00 · 08a58eccaa
parent e589d83236
commit 08a58eccaa
1 changed files with 170 additions and 66 deletions
--- a/pep-0554.rst
+++ b/pep-0554.rst
@ -8,7 +8,7 @@ Content-Type: text/x-rst
 Created: 2017-09-05
 Python-Version: 3.9
 Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017,
-              09-May-2018
+              09-May-2018, 20-Apr-2020
 Abstract
@ -106,7 +106,7 @@ For creating and using interpreters:
 +----------------------------------------+-----------------------------------------------------+
 | ``.is_running() -> bool``              | Is the interpreter currently executing code?        |
 +----------------------------------------+-----------------------------------------------------+
-| ``.destroy()``                         | Finalize and destroy the interpreter.               |
+| ``.close()``                           | Finalize and destroy the interpreter.               |
 +----------------------------------------+-----------------------------------------------------+
 | ``.run(src_str, /, *, channels=None)`` | | Run the given source code in the interpreter.     |
 |                                        | | (This blocks the current thread until done.)      |
@ -738,7 +738,8 @@ The module provides the following functions::
   get_main() => Interpreter
-      Return the main interpreter.
+      Return the main interpreter.  If the Python implementation
      has no concept of a main interpreter then return None.
   create() -> Interpreter
@ -763,7 +764,7 @@ The module also provides the following class::
         code.  Calling this on the current interpreter will always
         return True.
-      destroy():
+      close():
         Finalize and destroy the interpreter.
@ -925,9 +926,13 @@ The module also provides the following channel-related classes::
      recv():
-         Return the next object (i.e. the data from the sent object)
+         Return the next object from the channel.  If none have been
-         from the channel.  If none have been sent then wait until
+         sent then wait until the next send.
-         the next send.
+
         At the least, the object will be equivalent to the sent object.
         That will almost always mean the same type with the same data,
         though it could also be a compatible proxy.  Regardless, it may
         use a copy of that data or actually share the data.
         If the channel is already closed then raise ChannelClosedError.
         If the channel isn't closed but the current interpreter already
@ -1085,17 +1090,27 @@ Open Questions
 * add "isolated" mode to subinterpreters API?
-An "isolated" mode for subinterpreters would mean an interpreter in
+There are various ways that an interpreter could potentially operate
-that mode is especially restricted.  It might include any of the
+in a more isolated/restricted way::
 following::
   * ImportError when importing ext. module without PEP 489 support
   * no daemon threads
   * no threads at all
   * no multiprocessing
   * ...
-For now the default would be ``False``, but it would become ``True``
+This could be facilitated via settings (separate or an int flag) on
-later.
+the ``PyConfig`` struct on each ``PyInterpreterState``.  (This would
 require moving ``_PyInterpreterState_SetConfig()`` to the public C-API.)
 By default the settings would all be False, for backward compatibility.
 The ``interpreters`` module, however, would likely use a more
 restrictive default (e.g. always require PEP 489 support).  This would
 effectively be the "isolated" mode.  It would make sense to add an arg
 to ``interpreters.create()`` to disable "isolated" mode (at least the
 PEP 489 part), since then extension authors could test their modules
 under subinterpreters (without having to release a potentially broken
 build with PEP 489 support).
 * add a shareable synchronization primitive?
@ -1104,15 +1119,49 @@ interpreters would actually share the underlying mutex.  This would
 provide much better efficiency than blocking channel ops.  The main
 concern is that locks and channels don't mix well (as learned in Go).
 * add readiness callback support to channels?
 This is an alternative to channel buffering.  It is probably
 unnecessary, but may have enough advantages to consider it for the
 high-level API.  It may also be better only for the low-level
 implementation.
 * also track which interpreters are using a channel end?
 * auto-run in a thread?
 The PEP proposes a hard separation between subinterpreters and threads:
 if you want to run in a thread you must create the thread yourself and
 call ``run()`` in it.  However, it might be convenient if ``run()``
 could do that for you, meaning there would be less boilerplate.
 Furthermore, we anticipate that users will want to run in a thread much
 more often than not.  So it would make sense to make this the default
 behavior.  We would add a kw-only param "threaded" (default ``True``)
 to ``run()`` to allow the run-in-the-current-thread operation.
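As a rough illustration of the "threaded" idea above, the following sketch uses a hypothetical ``SketchInterpreter`` class. It is not the proposed ``interpreters`` API: ``exec()`` in the current interpreter stands in for real isolation, and returning the ``threading.Thread`` object from ``run()`` is an assumption, since the text above does not say what a threaded ``run()`` would return.

```python
import threading


class SketchInterpreter:
    """Hypothetical stand-in; exec() replaces a real subinterpreter."""

    def __init__(self):
        # Stands in for the interpreter's __main__ namespace.
        self.namespace = {}

    def run(self, src_str, /, *, threaded=True):
        if not threaded:
            # Run-in-the-current-thread operation: blocks until done.
            exec(src_str, self.namespace)
            return None
        # Default behavior under this open question: run in a new thread.
        # Returning the thread is an assumption made for this sketch.
        thread = threading.Thread(target=exec, args=(src_str, self.namespace))
        thread.start()
        return thread


interp = SketchInterpreter()
worker = interp.run("x = 6 * 7")            # threaded by default
worker.join()
interp.run("y = x + 1", threaded=False)     # opt out of the thread
```

With such a default the caller no longer writes the thread boilerplate, but must still decide when (or whether) to wait for the run to finish.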
 * what to do about BaseException propagation?
 The exception types that inherit from ``BaseException`` (aside from
 ``Exception``) are usually treated specially.  These types are:
 ``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``.  It may
 make sense to treat them specially when it comes to propagation from
 ``run()``.  Here are some options::
   * propagate like normal via RunFailedError
   * do not propagate (handle them somehow in the subinterpreter)
   * propagate them directly (avoid RunFailedError)
   * propagate them directly (set RunFailedError as __cause__)
 TODO
 ======
 * add a more detailed description of channel lifespan
 A state machine diagram may be most effective.  Relevant questions:
 * How does an interpreter detach from the receiving end of a channel
   that is never empty?
 * What happens if an interpreter deletes the last reference to a
   non-empty channel?
 * On the receiving end, or on the sending end?
 * run the CPython test suite in a subinterpreter and see what shakes out
 Deferred Functionality
 ======================
@ -1143,18 +1192,6 @@ Typically functions that have a ``block`` argument also have a
 functions that otherwise block, like the channel ``recv()`` and
 ``send()`` methods.  We can add it later if needed.
 get_main()
 ----------
 CPython has a concept of a "main" interpreter.  This is the initial
 interpreter created during CPython's runtime initialization.  It may
 be useful to identify the main interpreter.  For instance, the main
 interpreter should not be destroyed.  However, for the basic
 functionality of a high-level API a ``get_main()`` function is not
 necessary.  Furthermore, there is no requirement that a Python
 implementation have a concept of a main interpreter.  So until there's
 a clear need we'll leave ``get_main()`` out.
 Interpreter.run_in_thread()
 ---------------------------
@ -1318,6 +1355,15 @@ channel methods (``recv()``, and ``send()``).  However,
 the basic functionality of subinterpreters does not depend on async and
 can be added later.
 Alternately, "readiness callbacks" could be used to simplify use in
 async scenarios.  This would mean adding an optional ``callback``
 (kw-only) parameter to the ``recv_nowait()`` and ``send_nowait()``
 channel methods.  The callback would be called once the object was sent
 or received (respectively).
 (Note that making channels buffered makes readiness callbacks less
 important.)
 Support for iteration
 ---------------------
@ -1340,9 +1386,9 @@ Pipes and Queues
 With the proposed object passing mechanism of "channels", other similar
 basic types aren't required to achieve the minimal useful functionality
-of subinterpreters.  Such types include pipes (like channels, but
+of subinterpreters.  Such types include pipes (like unbuffered channels,
-one-to-one) and queues (like channels, but more generic).  See below in
+but one-to-one) and queues (like channels, but more generic).  See below
-`Rejected Ideas` for more information.
+in `Rejected Ideas` for more information.
 Even though these types aren't part of this proposal, they may still
 be useful in the context of concurrency.  Adding them later is entirely
@ -1350,12 +1396,6 @@ reasonable.  They could be trivially implemented as wrappers around
 channels.  Alternatively they could be implemented for efficiency at the
 same low level as channels.
 Buffering
 ---------
 The proposed channels are unbuffered.  This simplifies the API and
 implementation.  If buffering is desirable we can add it later.
 Return a lock from send()
 -------------------------
@ -1371,26 +1411,6 @@ less likely to confuse users.
 Note that returning a lock would matter for buffered channels
 (i.e. queues).  For unbuffered channels it is a non-issue.
 Add a "reraise" method to RunFailedError
 ----------------------------------------
 While having ``__cause__`` set on ``RunFailedError`` helps produce a
 more useful traceback, it's less helpful when handling the original
 error.  To help facilitate this, we could add
 ``RunFailedError.reraise()``.  This method would enable the following
 pattern::
   try:
       interp.run(script)
   except RunFailedError as exc:
       try:
           exc.reraise()
       except MyException:
           ...
 This would be made even simpler if there existed a ``__reraise__``
 protocol.
 Support prioritization in channels
 ----------------------------------
@ -1411,6 +1431,51 @@ will require significant work, especially when it comes to complex
 objects and most especially for mutable containers of mutable
 complex objects.
 Make exceptions shareable
 -------------------------
 Exceptions are propagated out of ``run()`` calls, so it isn't a big
 leap to make them shareable in channels.  However, as noted elsewhere,
 it isn't essential (or particularly common) so we can wait on doing
 that.
 Make RunFailedError.__cause__ lazy
 ----------------------------------
 An uncaught exception in a subinterpreter (from ``run()``) is copied
 to the calling interpreter and set as ``__cause__`` on a
 ``RunFailedError`` which is then raised.  That copying part involves
 some sort of deserialization in the calling interpreter, which can be
 expensive (e.g. due to imports) yet is not always necessary.
 So it may be useful to use an ``ExceptionProxy`` type to wrap the
 serialized exception and only deserialize it when needed.  That could
 be via ``ExceptionProxy.__getattribute__()`` or perhaps through
 ``RunFailedError.resolve()`` (which would raise the deserialized
 exception and set ``RunFailedError.__cause__`` to the exception).
 It may also make sense to have ``RunFailedError.__cause__`` be a
 descriptor that does the lazy deserialization (and sets ``__cause__``
 on the ``RunFailedError`` instance).
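The ``resolve()`` variant could look roughly like the sketch below. The ``RunFailedError`` class here is a hypothetical stand-in, and using ``pickle`` as the serialization format is an assumption; the text above does not say how the exception would be serialized.

```python
import pickle


class RunFailedError(RuntimeError):
    """Hypothetical stand-in that keeps the failure in serialized form."""

    def __init__(self, serialized_exc):
        super().__init__("interpreter run failed")
        # Bytes that would have been produced in the subinterpreter.
        self._serialized = serialized_exc

    def resolve(self):
        """Deserialize on first use, record it as __cause__, then raise it."""
        if self.__cause__ is None:
            # The potentially expensive step happens only when requested.
            self.__cause__ = pickle.loads(self._serialized)
        raise self.__cause__


err = RunFailedError(pickle.dumps(ValueError("boom")))
try:
    err.resolve()
except ValueError as exc:
    print("resolved:", exc)
```

Callers that only log the ``RunFailedError`` never pay the deserialization cost; only those that call ``resolve()`` do.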
 Serialize everything through channels
 -------------------------------------
 We could use pickle (or marshal) to serialize everything sent through
 channels.  Doing this is potentially inefficient, but it may be a
 matter of convenience in the end.  We can add it later, but trying to
 remove it later would be significantly more painful.
 Return a value from ``run()``
 -----------------------------
 Currently ``run()`` always returns None.  One idea is to return the
 return value from whatever the subinterpreter ran.  However, for now
 it doesn't make sense.  The only thing folks can run is a string of
 code (i.e. a script).  This is equivalent to ``PyRun_StringFlags()``,
 ``exec()``, or a module body.  None of those "return" anything.  We can
 revisit this once ``run()`` supports functions, etc.
 Rejected Ideas
 ==============
@ -1440,15 +1505,11 @@ Use queues instead of channels
 ------------------------------
 Queues and buffered channels are almost the same thing.  The main
-difference is that channels has a stronger relationship with context
+difference is that channels have a stronger relationship with context
 (i.e. the associated interpreter).
 The name "Channel" was used instead of "Queue" to avoid confusion with
-the stdlib ``queue`` module.
+the stdlib ``queue.Queue``.
 Note that buffering in channels does complicate the blocking semantics
 of ``recv()`` and ``send()``.  Also, queues can be built on top of
 unbuffered channels.
 "enumerate"
 -----------
@ -1542,6 +1603,49 @@ Doing this is potentially confusing and also can lead to unexpected
 races where a channel is auto-closed before it can be used in the
 original (creating) interpreter.
 Add a "reraise" method to RunFailedError
 ----------------------------------------
 While having ``__cause__`` set on ``RunFailedError`` helps produce a
 more useful traceback, it's less helpful when handling the original
 error.  To help facilitate this, we could add
 ``RunFailedError.reraise()``.  This method would enable the following
 pattern::
   try:
       try:
           interp.run(script)
       except RunFailedError as exc:
           exc.reraise()
   except MyException:
       ...
 This would be made even simpler if there existed a ``__reraise__``
 protocol.
 All that said, this is completely unnecessary.  Using ``__cause__``
 is good enough::
   try:
       try:
           interp.run(script)
       except RunFailedError as exc:
           raise exc.__cause__
   except MyException:
       ...
 Note that in extreme cases it may require a little extra boilerplate::
   try:
       try:
           interp.run(script)
       except RunFailedError as exc:
           if exc.__cause__ is not None:
               raise exc.__cause__
           raise  # re-raise
   except MyException:
       ...
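The ``__cause__`` pattern above can be exercised without a real subinterpreter. In this sketch ``RunFailedError``, ``MyException`` and ``fake_run()`` are hypothetical stand-ins: ``fake_run()`` simply calls ``exec()`` in the current interpreter and wraps any failure, which only approximates what ``interp.run()`` would do.

```python
class RunFailedError(RuntimeError):
    """Stand-in for the PEP's RunFailedError (hypothetical definition)."""


class MyException(Exception):
    """Application-specific error raised by the script."""


def fake_run(script):
    """Stand-in for interp.run(): run the script and wrap any failure."""
    try:
        exec(script, {"MyException": MyException})
    except Exception as exc:
        # Mirrors the described behavior: the original error is __cause__.
        raise RunFailedError(str(exc)) from exc


def run_and_unwrap(script):
    try:
        try:
            fake_run(script)
        except RunFailedError as exc:
            if exc.__cause__ is not None:
                raise exc.__cause__
            raise  # nothing to unwrap, so re-raise the RunFailedError
    except MyException as exc:
        return f"handled: {exc}"
    return "ok"


print(run_and_unwrap("raise MyException('boom')"))
print(run_and_unwrap("x = 1"))
```

The outer ``except MyException`` clause handles the original error type directly, which is the behavior ``reraise()`` was meant to provide.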
 Implementation
 ==============