PEP 554: updates after feedback (#1388)

2020-04-29 17:48:23 -06:00 · 2020-04-29 17:48:23 -06:00 · 08a58eccaa
parent e589d83236
commit 08a58eccaa
1 changed file with 170 additions and 66 deletions
--- a/pep-0554.rst
+++ b/pep-0554.rst
@ -8,7 +8,7 @@ Content-Type: text/x-rst
 Created: 2017-09-05
 Python-Version: 3.9
 Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017,
-              09-May-2018
+              09-May-2018, 20-Apr-2020


 Abstract
@ -106,7 +106,7 @@ For creating and using interpreters:
 +----------------------------------------+-----------------------------------------------------+
 | ``.is_running() -> bool``              | Is the interpreter currently executing code?        |
 +----------------------------------------+-----------------------------------------------------+
-| ``.destroy()``                         | Finalize and destroy the interpreter.               |
+| ``.close()``                           | Finalize and destroy the interpreter.               |
 +----------------------------------------+-----------------------------------------------------+
 | ``.run(src_str, /, *, channels=None)`` | | Run the given source code in the interpreter.     |
 |                                        | | (This blocks the current thread until done.)      |
@ -738,7 +738,8 @@ The module provides the following functions::

   get_main() => Interpreter

-      Return the main interpreter.
+      Return the main interpreter.  If the Python implementation
+      has no concept of a main interpreter then return None.

   create() -> Interpreter

@ -763,7 +764,7 @@ The module also provides the following class::
         code.  Calling this on the current interpreter will always
         return True.

-      destroy():
+      close():

         Finalize and destroy the interpreter.

@ -925,9 +926,13 @@ The module also provides the following channel-related classes::

      recv():

-         Return the next object (i.e. the data from the sent object)
-         from the channel.  If none have been sent then wait until
-         the next send.
+         Return the next object from the channel.  If none have been
+         sent then wait until the next send.
+
+         At the least, the object will be equivalent to the sent object.
+         That will almost always mean the same type with the same data,
+         though it could also be a compatible proxy.  Regardless, it may
+         use a copy of that data or actually share the data.

         If the channel is already closed then raise ChannelClosedError.
         If the channel isn't closed but the current interpreter already
@ -1085,17 +1090,27 @@ Open Questions

 * add "isolated" mode to subinterpreters API?

-An "isolated" mode for subinterpreters would mean an interpreter in
-that mode is especially restricted.  It might include any of the
-following::
+There are various ways that an interpreter could potentially operate
+in a more isolated/restricted way::

   * ImportError when importing ext. module without PEP 489 support
   * no daemon threads
   * no threads at all
   * no multiprocessing
+   * ...

-For now the default would be ``False``, but it would become ``True``
-later.
+This could be facilitated via settings (separate or an int flag) on
+the ``PyConfig`` struct on each ``PyInterpreterState``.  (This would
+require moving ``_PyInterpreterState_SetConfig()`` to the public C-API.)
+By default the settings would all be False, for backward compatibility.
+
+The ``interpreters`` module, however, would likely use a more
+restrictive default (e.g. always require PEP 489 support).  This would
+effectively be the "isolated" mode.  It would make sense to add an arg
+to ``interpreters.create()`` to disable "isolated" mode (at least the
+PEP 489 part), since then extension authors could test their modules
+under subinterpreters (without having to release a potentially broken
+build with PEP 489 support).

 * add a shareable synchronization primitive?

@ -1104,15 +1119,49 @@ interpreters would actually share the underlying mutex.  This would
 provide much better efficiency than blocking channel ops.  The main
 concern is that locks and channels don't mix well (as learned in Go).

-* add readiness callback support to channels?
-
-This is an alternative to channel buffering.  It is probably
-unnecessary, but may have enough advantages to consider it for the
-high-level API.  It may also be better only for the low-level
-implementation.
-
 * also track which interpreters are using a channel end?

+* auto-run in a thread?
+
+The PEP proposes a hard separation between subinterpreters and threads:
+if you want to run in a thread you must create the thread yourself and
+call ``run()`` in it.  However, it might be convenient if ``run()``
+could do that for you, meaning there would be less boilerplate.
+
+Furthermore, we anticipate that users will want to run in a thread much
+more often than not.  So it would make sense to make this the default
+behavior.  We would add a kw-only param "threaded" (default ``True``)
+to ``run()`` to allow the run-in-the-current-thread operation.
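
For reference, the boilerplate that the "auto-run in a thread" question would remove looks roughly like the following. This is only a sketch: `FakeInterpreter` and `run_in_new_thread` are hypothetical names, and `exec()` stands in for real subinterpreter execution, which would isolate state.

```python
import threading


class FakeInterpreter:
    """Hypothetical stand-in; a real subinterpreter would isolate its state."""

    def __init__(self):
        self.namespace = {}

    def run(self, src_str):
        # Like the proposed run(): blocks the calling thread until done.
        exec(src_str, self.namespace)


def run_in_new_thread(interp, src_str):
    """What a caller writes when run() only runs in the current thread."""
    t = threading.Thread(target=interp.run, args=(src_str,))
    t.start()
    return t


interp = FakeInterpreter()
thread = run_in_new_thread(interp, "result = 6 * 7")
thread.join()  # wait for the code to finish before reading results
```

A `threaded=True` default on `run()` would fold the `Thread` creation into the call itself; what such a call would return is not specified in the text above.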
+
+* what to do about BaseException propagation?
+
+The exception types that inherit from ``BaseException`` (aside from
+``Exception``) are usually treated specially.  These types are:
+``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``.  It may
+make sense to treat them specially when it comes to propagation from
+``run()``.  Here are some options::
+
+   * propagate like normal via RunFailedError
+   * do not propagate (handle them somehow in the subinterpreter)
+   * propagate them directly (avoid RunFailedError)
+   * propagate them directly (set RunFailedError as __cause__)
+
+
+TODO
+======
+
+* add a more detailed description of channel lifespan
+
+A state machine diagram may be most effective.  Relevant questions:
+
+ * How does an interpreter detach from the receiving end of a channel
+   that is never empty?
+ * What happens if an interpreter deletes the last reference to a
+   non-empty channel?
+ * On the receiving end, or on the sending end?
+
+* run the CPython test suite in a subinterpreter and see what shakes out
+

 Deferred Functionality
 ======================
@ -1143,18 +1192,6 @@ Typically functions that have a ``block`` argument also have a
 functions that otherwise block, like the channel ``recv()`` and
 ``send()`` methods.  We can add it later if needed.

-get_main()
----------
-
-CPython has a concept of a "main" interpreter.  This is the initial
-interpreter created during CPython's runtime initialization.  It may
-be useful to identify the main interpreter.  For instance, the main
-interpreter should not be destroyed.  However, for the basic
-functionality of a high-level API a ``get_main()`` function is not
-necessary.  Furthermore, there is no requirement that a Python
-implementation have a concept of a main interpreter.  So until there's
-a clear need we'll leave ``get_main()`` out.
-
 Interpreter.run_in_thread()
 ---------------------------

@ -1318,6 +1355,15 @@ channel methods (``recv()``, and ``send()``).  However,
 the basic functionality of subinterpreters does not depend on async and
 can be added later.

+Alternately, "readiness callbacks" could be used to simplify use in
+async scenarios.  This would mean adding an optional ``callback``
+(kw-only) parameter to the ``recv_nowait()`` and ``send_nowait()``
+channel methods.  The callback would be called once the object was sent
+or received (respectively).
+
+(Note that making channels buffered makes readiness callbacks less
+important.)
+
 Support for iteration
 ---------------------

@ -1340,9 +1386,9 @@ Pipes and Queues

 With the proposed object passing mechanism of "channels", other similar
 basic types aren't required to achieve the minimal useful functionality
-of subinterpreters.  Such types include pipes (like channels, but
-one-to-one) and queues (like channels, but more generic).  See below in
-`Rejected Ideas` for more information.
+of subinterpreters.  Such types include pipes (like unbuffered channels,
+but one-to-one) and queues (like channels, but more generic).  See below
+in `Rejected Ideas` for more information.

 Even though these types aren't part of this proposal, they may still
 be useful in the context of concurrency.  Adding them later is entirely
@ -1350,12 +1396,6 @@ reasonable.  The could be trivially implemented as wrappers around
 channels.  Alternatively they could be implemented for efficiency at the
 same low level as channels.

-Buffering
---------
-
-The proposed channels are unbuffered.  This simplifies the API and
-implementation.  If buffering is desirable we can add it later.
-
 Return a lock from send()
 -------------------------

@ -1371,26 +1411,6 @@ less likely to confuse users.
 Note that returning a lock would matter for buffered channels
 (i.e. queues).  For unbuffered channels it is a non-issue.

-Add a "reraise" method to RunFailedError
----------------------------------------
-
-While having ``__cause__`` set on ``RunFailedError`` helps produce a
-more useful traceback, it's less helpful when handling the original
-error.  To help facilitate this, we could add
-``RunFailedError.reraise()``.  This method would enable the following
-pattern::
-
-   try:
-       interp.run(script)
-   except RunFailedError as exc:
-       try:
-           exc.reraise()
-       except MyException:
-           ...
-
-This would be made even simpler if there existed a ``__reraise__``
-protocol.
-
 Support prioritization in channels
 ----------------------------------

@ -1411,6 +1431,51 @@ will require significant work, especially when it comes to complex
 objects and most especially for mutable containers of mutable
 complex objects.

+Make exceptions shareable
+-------------------------
+
+Exceptions are propagated out of ``run()`` calls, so it isn't a big
+leap to make them shareable in channels.  However, as noted elsewhere,
+it isn't essential (or particularly common) so we can wait on doing
+that.
+
+Make RunFailedError.__cause__ lazy
+----------------------------------
+
+An uncaught exception in a subinterpreter (from ``run()``) is copied
+to the calling interpreter and set as ``__cause__`` on a
+``RunFailedError`` which is then raised.  That copying part involves
+some sort of deserialization in the calling interpreter, which can be
+expensive (e.g. due to imports) yet is not always necessary.
+
+So it may be useful to use an ``ExceptionProxy`` type to wrap the
+serialized exception and only deserialize it when needed.  That could
+be via ``ExceptionProxy.__getattribute__()`` or perhaps through
+``RunFailedError.resolve()`` (which would raise the deserialized
+exception and set ``RunFailedError.__cause__`` to the exception).
+
+It may also make sense to have ``RunFailedError.__cause__`` be a
+descriptor that does the lazy deserialization (and sets ``__cause__``
+on the ``RunFailedError`` instance).
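
One way the `resolve()` variant might look is sketched below. It assumes the exception crosses interpreters as pickled bytes, which the text does not specify; the constructor signature and the `_serialized` attribute are hypothetical.

```python
import pickle


class RunFailedError(RuntimeError):
    """Hypothetical stand-in that holds the serialized exception until needed."""

    def __init__(self, msg, serialized):
        super().__init__(msg)
        # Assumption: the exception arrives as bytes from pickle.dumps().
        self._serialized = serialized

    def resolve(self):
        """Deserialize on demand, record it as __cause__, and raise it."""
        exc = pickle.loads(self._serialized)
        self.__cause__ = exc
        raise exc


err = RunFailedError("run failed", pickle.dumps(ValueError("bad input")))
# No deserialization cost has been paid yet; __cause__ is still unset.
try:
    err.resolve()
except ValueError as exc:
    resolved = exc
```

Callers that only log the `RunFailedError` never pay for the deserialization; only those that call `resolve()` do.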
+
+Serialize everything through channels
+-------------------------------------
+
+We could use pickle (or marshal) to serialize everything sent through
+channels.  Doing this is potentially inefficient, but it may be a
+matter of convenience in the end.  We can add it later, but trying to
+remove it later would be significantly more painful.
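
As a rough illustration of what pickle-based transfer would mean for callers, the sketch below round-trips an object through bytes. The helper names are hypothetical and no real channel is involved.

```python
import pickle


def serialize_for_send(obj):
    """Hypothetical helper: what a pickle-based send() might do internally."""
    return pickle.dumps(obj)


def deserialize_on_recv(data):
    """Hypothetical helper: what a pickle-based recv() might do internally."""
    return pickle.loads(data)


sent = {"name": "job-1", "values": [1, 2, 3]}
received = deserialize_on_recv(serialize_for_send(sent))
# The receiver gets an equal copy, never the original object.
```

The copy is equal to the original but is a distinct object, so mutations on one side are not visible on the other.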
+
+Return a value from ``run()``
+-----------------------------
+
+Currently ``run()`` always returns None.  One idea is to return the
+return value from whatever the subinterpreter ran.  However, for now
+it doesn't make sense.  The only thing folks can run is a string of
+code (i.e. a script).  This is equivalent to ``PyRun_StringFlags()``,
+``exec()``, or a module body.  None of those "return" anything.  We can
+revisit this once ``run()`` supports functions, etc.
+

 Rejected Ideas
 ==============
@ -1440,15 +1505,11 @@ Use queues instead of channels
 ------------------------------

 Queues and buffered channels are almost the same thing.  The main
-difference is that channels has a stronger relationship with context
+difference is that channels have a stronger relationship with context
 (i.e. the associated interpreter).

 The name "Channel" was used instead of "Queue" to avoid confusion with
-the stdlib ``queue`` module.
-
-Note that buffering in channels does complicate the blocking semantics
-of ``recv()`` and ``send()``.  Also, queues can be built on top of
-unbuffered channels.
+the stdlib ``queue.Queue``.

 "enumerate"
 -----------
@ -1542,6 +1603,49 @@ Doing this is potentially confusing and also can lead to unexpected
 races where a channel is auto-closed before it can be used in the
 original (creating) interpreter.

+Add a "reraise" method to RunFailedError
+----------------------------------------
+
+While having ``__cause__`` set on ``RunFailedError`` helps produce a
+more useful traceback, it's less helpful when handling the original
+error.  To help facilitate this, we could add
+``RunFailedError.reraise()``.  This method would enable the following
+pattern::
+
+   try:
+       try:
+           interp.run(script)
+       except RunFailedError as exc:
+           exc.reraise()
+   except MyException:
+       ...
+
+This would be made even simpler if there existed a ``__reraise__``
+protocol.
+
+All that said, this is completely unnecessary.  Using ``__cause__``
+is good enough::
+
+   try:
+       try:
+           interp.run(script)
+       except RunFailedError as exc:
+           raise exc.__cause__
+   except MyException:
+       ...
+
+Note that in extreme cases it may require a little extra boilerplate::
+
+   try:
+       try:
+           interp.run(script)
+       except RunFailedError as exc:
+           if exc.__cause__ is not None:
+               raise exc.__cause__
+           raise  # re-raise
+   except MyException:
+       ...
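
The `__cause__` pattern above can be exercised without subinterpreters. In this sketch `fake_run()` is a hypothetical stand-in for `interp.run()` that uses `exec()` and chains the original error onto a locally defined `RunFailedError`.

```python
class MyException(Exception):
    """Stand-in for an application error raised by the script."""


class RunFailedError(RuntimeError):
    """Hypothetical stand-in for the PEP's RunFailedError."""


def fake_run(script):
    """Hypothetical stand-in for interp.run(): wraps any failure."""
    try:
        exec(script, {"MyException": MyException})
    except Exception as exc:
        raise RunFailedError(str(exc)) from exc


def run_and_unwrap(script):
    try:
        try:
            fake_run(script)
        except RunFailedError as exc:
            if exc.__cause__ is not None:
                raise exc.__cause__
            raise  # no original exception to unwrap
    except MyException as exc:
        return f"handled: {exc}"
    return "ok"
```

Calling `run_and_unwrap("raise MyException('boom')")` reaches the `except MyException` branch, while a script that succeeds falls through to the final return.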
+

 Implementation
 ==============