PEP 554: Address feedback. (#426)

parent cc8ae3d31d
commit e9be941a26

 pep-0554.rst | 370
@@ -6,7 +6,7 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2017-09-05
 Python-Version: 3.7
-Post-History:
+Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017


 Abstract
@@ -29,9 +29,8 @@ Proposal
 The ``interpreters`` module will be added to the stdlib. It will
 provide a high-level interface to subinterpreters and wrap the low-level
-``_interpreters`` module. The proposed API is inspired by the
-``threading`` module. See the `Examples`_ section for concrete usage
-and use cases.
+``_interpreters`` module. See the `Examples`_ section for concrete
+usage and use cases.

 API for interpreters
 --------------------
@@ -79,9 +78,10 @@ The module also provides the following class:
       Run the provided Python source code in the interpreter. Any
       keyword arguments are added to the interpreter's execution
-      namespace. If any of the values are not supported for sharing
-      between interpreters then RuntimeError gets raised. Currently
-      only channels (see "create_channel()" below) are supported.
+      namespace (the interpreter's "__main__" module). If any of the
+      values are not supported for sharing between interpreters then
+      ValueError gets raised. Currently only channels (see
+      "create_channel()" below) are supported.

       This may not be called on an already running interpreter. Doing
       so results in a RuntimeError.
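To illustrate the sharing rule described above, here is a short sketch
against the proposed API (the ``interpreters`` module does not exist yet,
so this is illustrative only)::

   interp = interpreters.create()
   r, s = interpreters.create_channel()

   # A channel end is currently the only kind of sharable object; it is
   # passed as a keyword argument and shows up in the subinterpreter's
   # __main__ namespace under that name.
   interp.run("data = reader.recv_nowait()", reader=r)

   # Anything else is rejected with ValueError.
   try:
       interp.run("print(x)", x=42)
   except ValueError:
       print("only channels may be shared (for now)")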
@@ -161,43 +161,45 @@ channels have no buffer.
    interpreters:

-      The list of associated interpreters (those that have called
-      the "recv()" method).
-
-   __next__():
-
-      Return the next object from the channel. If none have been sent
-      then wait until the next send.
+      The list of associated interpreters: those that have called
+      the "recv()" or "__next__()" methods and haven't called "close()".

    recv():

       Return the next object from the channel. If none have been sent
-      then wait until the next send. If the channel has been closed
-      then EOFError is raised.
+      then wait until the next send. This associates the current
+      interpreter with the channel.
+
+      If the channel is already closed (see the close() method)
+      then raise EOFError. If the channel isn't closed, but the current
+      interpreter already called the "close()" method (which drops its
+      association with the channel) then raise ValueError.

    recv_nowait(default=None):

       Return the next object from the channel. If none have been sent
-      then return the default. If the channel has been closed
-      then EOFError is raised.
+      then return the default. Otherwise, this is the same as the
+      "recv()" method.

    close():

       No longer associate the current interpreter with the channel (on
-      the receiving end). This is a noop if the interpreter isn't
-      already associated. Once an interpreter is no longer associated
-      with the channel, subsequent (or current) send() and recv() calls
-      from that interpreter will raise EOFError.
+      the receiving end) and block future association (via the "recv()"
+      method). If the interpreter was never associated with the channel
+      then still block future association. Once an interpreter is no
+      longer associated with the channel, subsequent (or current) send()
+      and recv() calls from that interpreter will raise ValueError
+      (or EOFError if the channel is actually marked as closed).

-      Once number of associated interpreters on both ends drops to 0,
-      the channel is actually marked as closed. The Python runtime
-      will garbage collect all closed channels. Note that "close()" is
-      automatically called when it is no longer used in the current
-      interpreter.
+      Once the number of associated interpreters on both ends drops
+      to 0, the channel is actually marked as closed. The Python
+      runtime will garbage collect all closed channels, though it may
+      not be immediately. Note that "close()" is automatically called
+      on behalf of the current interpreter when the channel is no longer
+      used (i.e. has no references) in that interpreter.

-      This operation is idempotent. Return True if the current
-      interpreter was still associated with the receiving end of the
-      channel and False otherwise.
+      This operation is idempotent. Return True if "close()" has not
+      been called before by the current interpreter.
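A sketch of the receiving-end semantics described above, again against the
proposed (not yet existing) API::

   r, s = interpreters.create_channel()

   obj = r.recv_nowait()       # nothing sent yet, so the default (None) is returned
   obj = r.recv_nowait(b'')    # or an explicit default

   r.close()                   # True: first close() from this interpreter
   r.close()                   # False: idempotent

   try:
       r.recv()                # this end is closed for the current interpreter
   except (ValueError, EOFError):
       pass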


 ``SendChannel(id)``::
@@ -217,36 +219,26 @@ channels have no buffer.
    send(obj):

-      Send the object to the receiving end of the channel. Wait until
-      the object is received. If the channel does not support the
-      object then TypeError is raised. Currently only bytes are
-      supported. If the channel has been closed then EOFError is
-      raised.
+      Send the object to the receiving end of the channel. Wait until
+      the object is received. If the channel does not support the
+      object then ValueError is raised. Currently only bytes are
+      supported.
+
+      If the channel is already closed (see the close() method)
+      then raise EOFError. If the channel isn't closed, but the current
+      interpreter already called the "close()" method (which drops its
+      association with the channel) then raise ValueError.

    send_nowait(obj):

       Send the object to the receiving end of the channel. If the
-      object is received then return True. Otherwise return False.
-      If the channel does not support the object then TypeError is
-      raised. If the channel has been closed then EOFError is raised.
+      object is received then return True. If not then return False.
+      Otherwise, this is the same as the "send()" method.

    close():

-      No longer associate the current interpreter with the channel (on
-      the sending end). This is a noop if the interpreter isn't already
-      associated. Once an interpreter is no longer associated with the
-      channel, subsequent (or current) send() and recv() calls from that
-      interpreter will raise EOFError.
-
-      Once number of associated interpreters on both ends drops to 0,
-      the channel is actually marked as closed. The Python runtime
-      will garbage collect all closed channels. Note that "close()" is
-      automatically called when it is no longer used in the current
-      interpreter.
-
-      This operation is idempotent. Return True if the current
-      interpreter was still associated with the sending end of the
-      channel and False otherwise.
+      This is the same as "RecvChannel.close()", but applied to the
+      sending end of the channel.
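And the sending end, mirroring the receiving-end sketch above (illustrative
only)::

   r, s = interpreters.create_channel()

   if not s.send_nowait(b'spam'):    # False: no interpreter was waiting in recv()
       print("nobody listening yet")

   try:
       s.send([1, 2, 3])             # only bytes are supported for now
   except ValueError:
       print("unsupported object type")

   s.close()                         # same semantics as RecvChannel.close()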


 Examples
@@ -281,15 +273,15 @@ Pre-populate an interpreter
 ::

    interp = interpreters.create()
-   interp.run("""if True:
+   interp.run(tw.dedent("""
       import some_lib
       import an_expensive_module
       some_lib.set_up()
-      """)
+      """))
    wait_for_request()
-   interp.run("""if True:
+   interp.run(tw.dedent("""
      some_lib.handle_request()
-     """)
+     """))

 Handling an exception
 ---------------------
@@ -298,9 +290,9 @@ Handling an exception
    interp = interpreters.create()
    try:
-      interp.run("""if True:
+      interp.run(tw.dedent("""
         raise KeyError
-        """)
+        """))
    except KeyError:
       print("got the error from the subinterpreter")
@@ -312,12 +304,12 @@ Synchronize using a channel
    interp = interpreters.create()
    r, s = interpreters.create_channel()
    def run():
-      interp.run("""if True:
+      interp.run(tw.dedent("""
         reader.recv()
         print("during")
         reader.close()
-        """,
-        reader=r)
+        """),
+        reader=r))
    t = threading.Thread(target=run)
    print('before')
    t.start()
@@ -334,13 +326,13 @@ Sharing a file descriptor
    r1, s1 = interpreters.create_channel()
    r2, s2 = interpreters.create_channel()
    def run():
-      interp.run("""if True:
+      interp.run(tw.dedent("""
         fd = int.from_bytes(
            reader.recv(), 'big')
         for line in os.fdopen(fd):
            print(line)
         writer.send(b'')
-        """,
+        """),
        reader=r1, writer=s2)
    t = threading.Thread(target=run)
    t.start()
@@ -356,19 +348,19 @@ Passing objects via pickle
    interp = interpreters.create()
    r, s = interpreters.create_channel()
-   interp.run("""if True:
+   interp.run(tw.dedent("""
      import pickle
-     """,
+     """),
      reader=r)
    def run():
-      interp.run("""if True:
+      interp.run(tw.dedent("""
        data = reader.recv()
        while data:
          obj = pickle.loads(data)
          do_something(obj)
          data = reader.recv()
        reader.close()
-       """,
+       """),
       reader=r)
    t = threading.Thread(target=run)
    t.start()
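Note that the ``tw.dedent()`` calls in these revised examples presume an
alias along the lines of ``import textwrap as tw``, which the examples do
not show. A fully self-contained variant of the pre-population example
would look something like::

   import textwrap as tw
   import interpreters   # proposed module; not in the stdlib yet

   interp = interpreters.create()
   interp.run(tw.dedent("""
       import an_expensive_module
       """))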
@@ -386,6 +378,27 @@ isolation within the same process. This can be leveraged in number
 of ways. Furthermore, subinterpreters provide a well-defined framework
 in which such isolation may be extended.

+Nick Coghlan explained some of the benefits through a comparison with
+multi-processing [benefits]_::
+
+   [I] expect that communicating between subinterpreters is going
+   to end up looking an awful lot like communicating between
+   subprocesses via shared memory.
+
+   The trade-off between the two models will then be that one still
+   just looks like a single process from the point of view of the
+   outside world, and hence doesn't place any extra demands on the
+   underlying OS beyond those required to run CPython with a single
+   interpreter, while the other gives much stricter isolation
+   (including isolating C globals in extension modules), but also
+   demands much more from the OS when it comes to its IPC
+   capabilities.
+
+   The security risk profiles of the two approaches will also be quite
+   different, since using subinterpreters won't require deliberately
+   poking holes in the process isolation that operating systems give
+   you by default.
+
 CPython has supported subinterpreters, with increasing levels of
 support, since version 1.5. While the feature has the potential
 to be a powerful tool, subinterpreters have suffered from neglect
@@ -442,7 +455,8 @@ Consequently, projects that publish extension modules may face an
 increased maintenance burden as their users start using subinterpreters,
 where their modules may break. This situation is limited to modules
 that use C globals (or use libraries that use C globals) to store
-internal state.
+internal state. For numpy, the reported-bug rate is one every 6
+months. [bug-rate]_

 Ultimately this comes down to a question of how often it will be a
 problem in practice: how many projects would be affected, how often
@@ -545,11 +559,12 @@ Existing Usage
 --------------

 Subinterpreters are not a widely used feature. In fact, the only
-documented case of wide-spread usage is
-`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_. On the one
-hand, this case provides confidence that existing subinterpreter support
-is relatively stable. On the other hand, there isn't much of a sample
-size from which to judge the utility of the feature.
+documented cases of wide-spread usage are
+`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_ and
+`JEP <https://github.com/ninia/jep>`_. On the one hand, these cases
+provide confidence that existing subinterpreter support is relatively
+stable. On the other hand, there isn't much of a sample size from which
+to judge the utility of the feature.


 Provisional Status
@@ -566,8 +581,18 @@ remove it.
 Alternate Python Implementations
 ================================

+I'll be soliciting feedback from the different Python implementors about
+subinterpreter support.
+
+Multiple-interpreter support in the major Python implementations:
+
 TBD

+* jython: yes [jython]_
+* ironpython: yes?
+* pypy: maybe not? [pypy]_
+* micropython: ???
+

 Open Questions
 ==============
@@ -585,11 +610,24 @@ interpreters get better isolation relative to memory management (which
 is necessary to stop sharing the GIL between interpreters). So the
 semantics of how the exceptions propagate needs to be resolved.

+Possible solutions:
+
+* convert at the boundary (e.g. ``subprocess.CalledProcessError``)
+* wrap in a proxy at the boundary (including with support for
+  something like ``err.raise()`` to propagate the traceback).
+* return the exception (or its proxy) from ``run()`` instead of
+  raising it
+* return a result object (like ``subprocess`` does) [result-object]_
+* throw the exception away and expect users to deal with unhandled
+  exceptions explicitly in the script they pass to ``run()``
+  (they can pass error info out via channels, as sketched below);
+  with threads you have to do something similar
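The last option is already expressible with the proposed primitives. A
rough sketch, following the threading pattern of the earlier examples
(``do_work()`` stands in for whatever the script actually does)::

   import textwrap as tw
   import threading

   interp = interpreters.create()
   r, s = interpreters.create_channel()

   def run():
       interp.run(tw.dedent("""
           try:
               do_work()
           except Exception as exc:
               errors.send(repr(exc).encode('utf-8'))
           else:
               errors.send(b'')
           """),
           errors=s)

   t = threading.Thread(target=run)
   t.start()
   failure = r.recv()          # empty bytes means success
   t.join()
   if failure:
       print("subinterpreter failed:", failure.decode('utf-8'))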

 Initial support for buffers in channels
 ---------------------------------------

 An alternative to support for bytes in channels is support for
-read-only buffers (the PEP 3119 kind). Then ``recv()`` would return
+read-only buffers (the PEP 3118 kind). Then ``recv()`` would return
 a memoryview to expose the buffer in a zero-copy way. This is similar
 to what ``multiprocessing.Connection`` supports. [mp-conn]
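Under that approach a consumer would slice the received buffer without
copying, along these lines (hypothetical, since channels currently carry
``bytes``)::

   view = r.recv()                     # would be a read-only memoryview
   header, body = view[:4], view[4:]   # zero-copy slices
   size = int.from_bytes(header, 'big')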
@@ -597,6 +635,68 @@ Switching to such an approach would help resolve questions of how
 passing bytes through channels will work once we isolate memory
 management in interpreters.

+Does every interpreter think that their thread is the "main" thread?
+--------------------------------------------------------------------
+
+CPython's interpreter implementation identifies the OS thread in which
+it was started as the "main" thread. The interpreter then has slightly
+different behavior depending on if the current thread is the main one
+or not. This presents a problem in cases where "main thread" is meant
+to imply "main thread in the main interpreter" [main-thread]_, where
+the main interpreter is the initial one.
+
+Disallow subinterpreters in the main thread?
+--------------------------------------------
+
+This is a specific case of the above issue. Currently in CPython,
+"we need a main \*thread\* in order to sensibly manage the way signal
+handling works across different platforms". [main-thread]_
+
+Since signal handlers are part of the interpreter state, running a
+subinterpreter in the main thread means that the main interpreter
+can no longer properly handle signals (since it's effectively paused).
+
+Furthermore, running a subinterpreter in the main thread would
+conceivably allow setting signal handlers on that interpreter, which
+would likewise impact signal handling when that interpreter isn't
+running or is running in a different thread.
+
+Ultimately, running subinterpreters in the main OS thread introduces
+complications to the signal handling implementation. So it may make
+the most sense to disallow running subinterpreters in the main thread.
+Support for it could be considered later. The downside is that folks
+wanting to try out subinterpreters would be required to take the extra
+step of using threads. This could slow adoption and experimentation,
+whereas without the restriction there's less of an obstacle.
+
+Pass channels explicitly to run()?
+----------------------------------
+
+Nick Coghlan suggested [explicit-channels]_ that we may want something
+more explicit than the keyword args of ``run()`` (``**shared``)::
+
+   The subprocess.run() comparison does make me wonder whether this
+   might be a more future-proof signature for Interpreter.run() though:
+
+       def run(source_str, /, *, channels=None):
+           ...
+
+   That way channels can be a namespace *specifically* for passing in
+   channels, and can be reported as such on RunResult. If we decide
+   to allow arbitrary shared objects in the future, or add flag options
+   like "reraise=True" to reraise exceptions from the subinterpreter
+   in the current interpreter, we'd have that ability, rather than
+   having the entire potential keyword namespace taken up for passing
+   shared objects.
+
+and::
+
+   It does occur to me that if we wanted to align with the way the
+   `runpy` module spells that concept, we'd call the option
+   `init_globals`, but I'm thinking it will be better to only allow
+   channels to be passed through directly, and require that everything
+   else be sent through a channel.
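For comparison, the two shapes side by side (``channels=`` is only a
suggestion at this point; the PEP as written proposes ``**shared``)::

   # As currently proposed:
   interp.run("data = reader.recv()", reader=r)

   # Under the suggested alternative signature:
   interp.run("data = reader.recv()", channels={'reader': r})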


 Deferred Functionality
 ======================
@@ -619,11 +719,13 @@ This suffers from the same problem as sharing objects between
 interpreters via queues. The minimal solution (running a source string)
 is sufficient for us to get the feature out where it can be explored.

-timeout arg to pop() and push()
--------------------------------
+timeout arg to recv() and send()
+--------------------------------

 Typically functions that have a ``block`` argument also have a
-``timeout`` argument. We can add it later if needed.
+``timeout`` argument. It sometimes makes sense to do likewise for
+functions that otherwise block, like the channel ``recv()`` and
+``send()`` methods. We can add it later if needed.
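If the argument is ever added, it would presumably follow the existing
stdlib convention (compare ``queue.Queue.get(block=True, timeout=None)``),
hypothetically::

   obj = r.recv(timeout=5)    # hypothetical; not part of this proposal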

 get_main()
 ----------
@@ -732,13 +834,29 @@ desireable and you want to execute in a fresh ``__main__``. Also,
 you don't necessarily want to leak objects there that you aren't using
 any more.

-Solutions include:
+Note that the following won't work right because it will clear too much
+(e.g. ``__name__`` and the other "__dunder__" attributes)::
+
+   interp.run('globals().clear()')
+
+Possible solutions include:

 * a ``create()`` arg to indicate resetting ``__main__`` after each
   ``run`` call
 * an ``Interpreter.reset_main`` flag to support opting in or out
   after the fact
 * an ``Interpreter.reset_main()`` method to opt in when desired
 * ``importlib.util.reset_globals()`` [reset_globals]_

+Also note that resetting ``__main__`` does nothing about state stored
+in other modules. So any solution would have to be clear about the
+scope of what is being reset. Conceivably we could invent a mechanism
+by which any (or every) module could be reset, unlike ``reload()``
+which does not clear the module before loading into it. Regardless,
+since ``__main__`` is the execution namespace of the interpreter,
+resetting it has a much more direct correlation to interpreters and
+their dynamic state than does resetting other modules. So a more
+generic module reset mechanism may prove unnecessary.
+
 This isn't a critical feature initially. It can wait until later
 if desirable.
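For comparison, a manual approximation that spares the dunders is already
possible but clumsy, which is part of the appeal of a dedicated reset
mechanism (a sketch, assuming the ``tw`` alias used in the examples above;
note that it leaves its own loop variable behind)::

   interp.run(tw.dedent("""
       for _name in [n for n in globals() if not n.startswith('__')]:
           del globals()[_name]
       """))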
@@ -760,6 +878,70 @@ would be a good candidate for the first effort at expanding the types
 that channels support. They aren't strictly necessary for the initial
 API.

+Integration with async
+----------------------
+
+Per Antoine Pitrou [async]_::
+
+   Has any thought been given to how FIFOs could integrate with async
+   code driven by an event loop (e.g. asyncio)? I think the model of
+   executing several asyncio (or Tornado) applications each in their
+   own subinterpreter may prove quite interesting to reconcile multi-
+   core concurrency with ease of programming. That would require the
+   FIFOs to be able to synchronize on something an event loop can wait
+   on (probably a file descriptor?).
+
+A possible solution is to provide async implementations of the blocking
+channel methods (``__next__()``, ``recv()``, and ``send()``). However,
+the basic functionality of subinterpreters does not depend on async and
+can be added later.
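Until such methods exist, the usual interim pattern would be to push the
blocking call into an executor, e.g. with a hypothetical helper like this
(using only today's asyncio)::

   import asyncio

   async def channel_recv(rch):
       """Await the next object from a RecvChannel without blocking the loop."""
       loop = asyncio.get_event_loop()
       return await loop.run_in_executor(None, rch.recv)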

+Support for iteration
+---------------------
+
+Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
+``__next__()``) may be useful. A trivial implementation would use the
+``recv()`` method, similar to how files do iteration. Since this isn't
+a fundamental capability and has a simple analog, adding iteration
+support can wait until later.
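The trivial implementation mentioned above amounts to something like the
following (a sketch, written as a free function rather than a method)::

   def iter_channel(rch):
       """Yield objects from a RecvChannel until it is closed."""
       while True:
           try:
               yield rch.recv()
           except EOFError:
               return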

+Channel context managers
+------------------------
+
+Context manager support on ``RecvChannel`` and ``SendChannel`` may be
+helpful. The implementation would be simple, wrapping a call to
+``close()`` like files do. As with iteration, this can wait.
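In the meantime, ``contextlib.closing()`` already provides the same shape,
since both channel ends expose a ``close()`` method::

   import contextlib

   with contextlib.closing(r) as reader:
       data = reader.recv_nowait(b'')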

+Pipes and Queues
+----------------
+
+With the proposed object passing mechanism of "channels", other similar
+basic types aren't required to achieve the minimal useful functionality
+of subinterpreters. Such types include pipes (like channels, but
+one-to-one) and queues (like channels, but buffered). See below in
+`Rejected Ideas` for more information.
+
+Even though these types aren't part of this proposal, they may still
+be useful in the context of concurrency. Adding them later is entirely
+reasonable. They could be trivially implemented as wrappers around
+channels. Alternatively they could be implemented for efficiency at the
+same low level as channels.
+
+interpreters.RunFailedError
+---------------------------
+
+As currently proposed, ``Interpreter.run()`` offers you no way to
+distinguish an error coming from a subinterpreter from any other
+error in the current interpreter. Your only option would be to
+explicitly wrap your ``run()`` call in a ``try: ... except Exception:``.
+
+If this is a problem in practice then we could add something like
+``interpreters.RunFailedError`` and raise that in ``run()``, chaining
+the actual error.
+
+Of course, this depends on how we resolve `Leaking exceptions across
+interpreters`_.
+

 Rejected Ideas
 ==============
@@ -846,6 +1028,36 @@ References
 .. [mp-conn]
    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection

+.. [bug-rate]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
+
+.. [benefits]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
+
+.. [main-thread]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149566.html
+
+.. [explicit-channels]
+   https://mail.python.org/pipermail/python-dev/2017-September/149562.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149565.html
+
+.. [reset_globals]
+   https://mail.python.org/pipermail/python-dev/2017-September/149545.html
+
+.. [async]
+   https://mail.python.org/pipermail/python-dev/2017-September/149420.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149585.html
+
+.. [result-object]
+   https://mail.python.org/pipermail/python-dev/2017-September/149562.html
+
+.. [jython]
+   https://mail.python.org/pipermail/python-ideas/2017-May/045771.html
+
+.. [pypy]
+   https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
+

 Copyright
 =========