PEP 554: Address feedback. (#426)

Eric Snow 2017-09-22 17:51:38 -06:00 committed by GitHub
parent cc8ae3d31d
commit e9be941a26
1 changed files with 291 additions and 79 deletions


@ -6,7 +6,7 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-05
Python-Version: 3.7
Post-History:
Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017
Abstract
@ -29,9 +29,8 @@ Proposal
The ``interpreters`` module will be added to the stdlib. It will
provide a high-level interface to subinterpreters and wrap the low-level
``_interpreters`` module. The proposed API is inspired by the
``threading`` module. See the `Examples`_ section for concrete usage
and use cases.
``_interpreters`` module. See the `Examples`_ section for concrete
usage and use cases.
API for interpreters
--------------------
@ -79,9 +78,10 @@ The module also provides the following class:
Run the provided Python source code in the interpreter. Any
keyword arguments are added to the interpreter's execution
namespace. If any of the values are not supported for sharing
between interpreters then RuntimeError gets raised. Currently
only channels (see "create_channel()" below) are supported.
namespace (the interpreter's "__main__" module). If any of the
values are not supported for sharing between interpreters then
ValueError gets raised. Currently only channels (see
"create_channel()" below) are supported.
This may not be called on an already running interpreter. Doing
so results in a RuntimeError.
@ -161,43 +161,45 @@ channels have no buffer.
interpreters:
The list of associated interpreters (those that have called
the "recv()" method).
__next__():
Return the next object from the channel. If none have been sent
then wait until the next send.
The list of associated interpreters: those that have called
the "recv()" or "__next__()" methods and haven't called "close()".
recv():
Return the next object from the channel. If none have been sent
then wait until the next send. If the channel has been closed
then EOFError is raised.
then wait until the next send. This associates the current
interpreter with the channel.
If the channel is already closed (see the close() method)
then raise EOFError. If the channel isn't closed, but the current
interpreter already called the "close()" method (which drops its
association with the channel) then raise ValueError.
recv_nowait(default=None):
Return the next object from the channel. If none have been sent
then return the default. If the channel has been closed
then EOFError is raised.
then return the default. Otherwise, this is the same as the
"recv()" method.
close():
No longer associate the current interpreter with the channel (on
the receiving end). This is a noop if the interpreter isn't
already associated. Once an interpreter is no longer associated
with the channel, subsequent (or current) send() and recv() calls
from that interpreter will raise EOFError.
the receiving end) and block future association (via the "recv()"
method). If the interpreter was never associated with the channel
then still block future association. Once an interpreter is no
longer associated with the channel, subsequent (or current) send()
and recv() calls from that interpreter will raise ValueError
(or EOFError if the channel is actually marked as closed).
Once number of associated interpreters on both ends drops to 0,
the channel is actually marked as closed. The Python runtime
will garbage collect all closed channels. Note that "close()" is
automatically called when it is no longer used in the current
interpreter.
Once the number of associated interpreters on both ends drops
to 0, the channel is actually marked as closed. The Python
runtime will garbage collect all closed channels, though it may
not be immediately. Note that "close()" is automatically called
on behalf of the current interpreter when the channel is no longer
used (i.e. has no references) in that interpreter.
This operation is idempotent. Return True if the current
interpreter was still associated with the receiving end of the
channel and False otherwise.
This operation is idempotent. Return True if "close()" has not
been called before by the current interpreter.
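The association rules above can be modeled with plain bookkeeping. The following is only a sketch of one reading of those rules: ``ToyChannel`` and its ``use()`` method are hypothetical names, interpreter ids are passed in explicitly, no data is transferred, and the point at which the channel becomes closed is an assumption.

```python
class ToyChannel:
    """Hypothetical bookkeeping model of channel association (no data moves)."""

    def __init__(self):
        self.associated = {"recv": set(), "send": set()}
        self.released = {"recv": set(), "send": set()}
        self.closed = False

    def use(self, end, interp_id):
        """The checks recv() or send() would make before doing any work."""
        if self.closed:
            raise EOFError("channel is closed")
        if interp_id in self.released[end]:
            raise ValueError("interpreter already called close() on this end")
        self.associated[end].add(interp_id)

    def close(self, end, interp_id):
        """Drop the association and block future association."""
        first_call = interp_id not in self.released[end]
        was_associated = interp_id in self.associated[end]
        self.associated[end].discard(interp_id)
        self.released[end].add(interp_id)
        # Assumption: the channel closes when the last association goes away.
        if was_associated and not any(self.associated.values()):
            self.closed = True
        return first_call


chan = ToyChannel()
chan.use("send", 0)             # interpreter 0 associates with the sending end
chan.use("recv", 1)             # interpreter 1 associates with the receiving end
first = chan.close("recv", 1)   # first close() by interpreter 1
second = chan.close("recv", 1)  # repeated close() is harmless
```

In this model a later ``chan.use("recv", 1)`` raises ValueError while interpreter 0 is still associated, and any ``use()`` raises EOFError once interpreter 0 has also called ``close()``.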
``SendChannel(id)``::
@ -217,36 +219,26 @@ channels have no buffer.
send(obj):
Send the object to the receiving end of the channel. Wait until
the object is received. If the channel does not support the
object then TypeError is raised. Currently only bytes are
supported. If the channel has been closed then EOFError is
raised.
Send the object to the receiving end of the channel. Wait until
the object is received. If the channel does not support the
object then ValueError is raised. Currently only bytes are
supported.
If the channel is already closed (see the close() method)
then raise EOFError. If the channel isn't closed, but the current
interpreter already called the "close()" method (which drops its
association with the channel) then raise ValueError.
send_nowait(obj):
Send the object to the receiving end of the channel. If the
object is received then return True. Otherwise return False.
If the channel does not support the object then TypeError is
raised. If the channel has been closed then EOFError is raised.
object is received then return True. If not then return False.
Otherwise, this is the same as the "send()" method.
close():
No longer associate the current interpreter with the channel (on
the sending end). This is a noop if the interpreter isn't already
associated. Once an interpreter is no longer associated with the
channel, subsequent (or current) send() and recv() calls from that
interpreter will raise EOFError.
Once number of associated interpreters on both ends drops to 0,
the channel is actually marked as closed. The Python runtime
will garbage collect all closed channels. Note that "close()" is
automatically called when it is no longer used in the current
interpreter.
This operation is idempotent. Return True if the current
interpreter was still associated with the sending end of the
channel and False otherwise.
This is the same as "RecvChannel.close()", but applied to the
sending end of the channel.
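Since channels have no buffer, "send()" behaves like a rendezvous: it returns only after the object has been received. A minimal sketch of that behavior within a single interpreter, using threads and ``queue.Queue`` as stand-ins, might look like the following. ``ToyUnbufferedChannel`` is a hypothetical name, and the sketch assumes a single sender.

```python
import queue
import threading


class ToyUnbufferedChannel:
    """Hypothetical single-sender rendezvous built on queue.Queue."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, obj):
        if not isinstance(obj, bytes):
            # The proposal currently supports only bytes.
            raise ValueError("only bytes are supported")
        self._q.put(obj)
        self._q.join()        # wait until recv() has taken the object

    def recv(self):
        obj = self._q.get()   # wait until an object has been sent
        self._q.task_done()   # release the waiting sender
        return obj


chan = ToyUnbufferedChannel()
received = []
receiver = threading.Thread(target=lambda: received.append(chan.recv()))
receiver.start()
chan.send(b"spam")            # returns only after the receiver took the object
receiver.join()
```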
Examples
@ -281,15 +273,15 @@ Pre-populate an interpreter
::
interp = interpreters.create()
interp.run("""if True:
interp.run(tw.dedent("""
import some_lib
import an_expensive_module
some_lib.set_up()
""")
"""))
wait_for_request()
interp.run("""if True:
interp.run(tw.dedent("""
some_lib.handle_request()
""")
"""))
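The change from ``"""if True:`` to ``tw.dedent(...)`` in these examples addresses the same problem in two ways: source text indented to match the surrounding code is not valid at module level. The sketch below shows both approaches, assuming ``tw`` is an alias for the ``textwrap`` module and using ``exec()`` as a stand-in for ``Interpreter.run()``.

```python
import textwrap as tw  # assumed meaning of the "tw" alias used in the examples

indented = """
    x = 1
    y = x + 1
    """

# Compiling the indented text directly fails with an unexpected indent.
try:
    compile(indented, "<script>", "exec")
except IndentationError:
    raw_compiles = False
else:
    raw_compiles = True

# Older spelling: the indented lines become the body of an "if" block.
if_true_ns = {}
exec("""if True:
    x = 1
    y = x + 1
    """, if_true_ns)

# Newer spelling: strip the common leading whitespace first.
dedent_ns = {}
exec(tw.dedent(indented), dedent_ns)
```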
Handling an exception
---------------------
@ -298,9 +290,9 @@ Handling an exception
interp = interpreters.create()
try:
interp.run("""if True:
interp.run(tw.dedent("""
raise KeyError
""")
"""))
except KeyError:
print("got the error from the subinterpreter")
@ -312,12 +304,12 @@ Synchronize using a channel
interp = interpreters.create()
r, s = interpreters.create_channel()
def run():
interp.run("""if True:
interp.run(tw.dedent("""
reader.recv()
print("during")
reader.close()
""",
reader=r)
"""),
reader=r)
t = threading.Thread(target=run)
print('before')
t.start()
@ -334,13 +326,13 @@ Sharing a file descriptor
r1, s1 = interpreters.create_channel()
r2, s2 = interpreters.create_channel()
def run():
interp.run("""if True:
interp.run(tw.dedent("""
fd = int.from_bytes(
reader.recv(), 'big')
for line in os.fdopen(fd):
print(line)
writer.send(b'')
""",
"""),
reader=r1, writer=s2)
t = threading.Thread(target=run)
t.start()
@ -356,19 +348,19 @@ Passing objects via pickle
interp = interpreters.create()
r, s = interpreters.create_channel()
interp.run("""if True:
interp.run(tw.dedent("""
import pickle
""",
"""),
reader=r)
def run():
interp.run("""if True:
interp.run(tw.dedent("""
data = reader.recv()
while data:
obj = pickle.loads(data)
do_something(obj)
data = reader.recv()
reader.close()
""",
"""),
reader=r)
t = threading.Thread(target=run)
t.start()
@ -386,6 +378,27 @@ isolation within the same process. This can be leveraged in a number
of ways. Furthermore, subinterpreters provide a well-defined framework
in which such isolation may be extended.
Nick Coghlan explained some of the benefits through a comparison with
multi-processing [benefits]_::
[I] expect that communicating between subinterpreters is going
to end up looking an awful lot like communicating between
subprocesses via shared memory.
The trade-off between the two models will then be that one still
just looks like a single process from the point of view of the
outside world, and hence doesn't place any extra demands on the
underlying OS beyond those required to run CPython with a single
interpreter, while the other gives much stricter isolation
(including isolating C globals in extension modules), but also
demands much more from the OS when it comes to its IPC
capabilities.
The security risk profiles of the two approaches will also be quite
different, since using subinterpreters won't require deliberately
poking holes in the process isolation that operating systems give
you by default.
CPython has supported subinterpreters, with increasing levels of
support, since version 1.5. While the feature has the potential
to be a powerful tool, subinterpreters have suffered from neglect
@ -442,7 +455,8 @@ Consequently, projects that publish extension modules may face an
increased maintenance burden as their users start using subinterpreters,
where their modules may break. This situation is limited to modules
that use C globals (or use libraries that use C globals) to store
internal state.
internal state. For numpy, the reported-bug rate is one every 6
months. [bug-rate]_
Ultimately this comes down to a question of how often it will be a
problem in practice: how many projects would be affected, how often
@ -545,11 +559,12 @@ Existing Usage
--------------
Subinterpreters are not a widely used feature. In fact, the only
documented case of wide-spread usage is
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_. On the one
hand, this case provides confidence that existing subinterpreter support
is relatively stable. On the other hand, there isn't much of a sample
size from which to judge the utility of the feature.
documented cases of wide-spread usage are
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_ and
`JEP <https://github.com/ninia/jep>`_. On the one hand, these cases
provide confidence that existing subinterpreter support is relatively
stable. On the other hand, there isn't much of a sample size from which
to judge the utility of the feature.
Provisional Status
@ -566,8 +581,18 @@ remove it.
Alternate Python Implementations
================================
I'll be soliciting feedback from the different Python implementors about
subinterpreter support.
Multiple-interpreter support in the major Python implementations:
TBD
* jython: yes [jython]_
* ironpython: yes?
* pypy: maybe not? [pypy]_
* micropython: ???
Open Questions
==============
@ -585,11 +610,24 @@ interpreters get better isolation relative to memory management (which
is necessary to stop sharing the GIL between interpreters). So the
semantics of how the exceptions propagate needs to be resolved.
Possible solutions:
* convert at the boundary (e.g. ``subprocess.CalledProcessError``)
* wrap in a proxy at the boundary (including with support for
something like ``err.raise()`` to propagate the traceback).
* return the exception (or its proxy) from ``run()`` instead of
raising it
* return a result object (like ``subprocess`` does) [result-object]_
* throw the exception away and expect users to deal with unhandled
exceptions explicitly in the script they pass to ``run()``
(they can pass error info out via channels); with threads you have
to do something similar
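Two of the options above, converting at the boundary and returning a result object, can be combined in a small sketch. Everything here is illustrative: ``run_isolated()`` and the fields of ``RunResult`` are hypothetical, and ``exec()`` stands in for running code in a subinterpreter. The key idea is that only plain strings cross the boundary, never the exception object itself.

```python
import traceback
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    """Hypothetical result object; holds only strings, never the exception."""
    exc_type: Optional[str] = None
    exc_message: Optional[str] = None
    traceback_text: Optional[str] = None

    def check(self):
        """Re-raise a converted error in the caller, if there was one."""
        if self.exc_type is not None:
            raise RuntimeError(f"{self.exc_type}: {self.exc_message}")


def run_isolated(source):
    """Run source in a fresh namespace and report failure as data."""
    try:
        exec(source, {"__name__": "__main__"})
    except Exception as exc:
        return RunResult(type(exc).__name__, str(exc), traceback.format_exc())
    return RunResult()


ok = run_isolated("x = 1")
failed = run_isolated("raise KeyError('spam')")
```

Callers that prefer exceptions can call ``failed.check()``, which raises in the current interpreter using only the converted information.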
Initial support for buffers in channels
---------------------------------------
An alternative to support for bytes in channels is support for
read-only buffers (the PEP 3119 kind). Then ``recv()`` would return
read-only buffers (the PEP 3118 kind). Then ``recv()`` would return
a memoryview to expose the buffer in a zero-copy way. This is similar
to what ``multiprocessing.Connection`` supports. [mp-conn]_
@ -597,6 +635,68 @@ Switching to such an approach would help resolve questions of how
passing bytes through channels will work once we isolate memory
management in interpreters.
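For reference, the zero-copy behavior of a read-only ``memoryview`` can be seen within a single interpreter. This sketch only illustrates the buffer protocol itself; it says nothing about how a channel would hand the buffer to another interpreter.

```python
payload = bytearray(b"hello world")

# A read-only view exposes the same memory without copying it.
view = memoryview(payload).toreadonly()
window = view[6:]              # slicing a memoryview does not copy either

payload[6] = ord("W")          # change the underlying buffer
seen = window.tobytes()        # the view observes the change: b"World"

try:
    view[0] = 0                # writes through the view are rejected
except TypeError:
    write_blocked = True
else:
    write_blocked = False
```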
Does every interpreter think that their thread is the "main" thread?
--------------------------------------------------------------------
CPython's interpreter implementation identifies the OS thread in which
it was started as the "main" thread. The interpreter then has slightly
different behavior depending on if the current thread is the main one
or not. This presents a problem in cases where "main thread" is meant
to imply "main thread in the main interpreter" [main-thread]_, where
the main interpreter is the initial one.
Disallow subinterpreters in the main thread?
--------------------------------------------
This is a specific case of the above issue. Currently in CPython,
"we need a main \*thread\* in order to sensibly manage the way signal
handling works across different platforms". [main-thread]_
Since signal handlers are part of the interpreter state, running a
subinterpreter in the main thread means that the main interpreter
can no longer properly handle signals (since it's effectively paused).
Furthermore, running a subinterpreter in the main thread would
conceivably allow setting signal handlers on that interpreter, which
would likewise impact signal handling when that interpreter isn't
running or is running in a different thread.
Ultimately, running subinterpreters in the main OS thread introduces
complications to the signal handling implementation. So it may make
the most sense to disallow running subinterpreters in the main thread.
Support for it could be considered later. The downside is that folks
wanting to try out subinterpreters would be required to take the extra
step of using threads. This could slow adoption and experimentation,
whereas without the restriction there's less of an obstacle.
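The dependence of signal handling on the main thread is already visible in current CPython without subinterpreters: installing a handler from any other thread fails. A small demonstration:

```python
import signal
import threading

errors = []


def try_install_handler():
    # signal.signal() may only be called from the main thread.
    try:
        signal.signal(signal.SIGINT, signal.SIG_DFL)
    except ValueError as exc:
        errors.append(exc)


worker = threading.Thread(target=try_install_handler)
worker.start()
worker.join()

# How code can ask whether it is running in the main thread.
in_main_thread = threading.current_thread() is threading.main_thread()
```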
Pass channels explicitly to run()?
----------------------------------
Nick Coghlan suggested [explicit-channels]_ that we may want something more explicit than
the keyword args of ``run()`` (``**shared``)::
The subprocess.run() comparison does make me wonder whether this
might be a more future-proof signature for Interpreter.run() though:
def run(source_str, /, *, channels=None):
...
That way channels can be a namespace *specifically* for passing in
channels, and can be reported as such on RunResult. If we decide
to allow arbitrary shared objects in the future, or add flag options
like "reraise=True" to reraise exceptions from the subinterpreter
in the current interpreter, we'd have that ability, rather than
having the entire potential keyword namespace taken up for passing
shared objects.
and::
It does occur to me that if we wanted to align with the way the
`runpy` module spells that concept, we'd call the option
`init_globals`, but I'm thinking it will be better to only allow
channels to be passed through directly, and require that everything
else be sent through a channel.
Deferred Functionality
======================
@ -619,11 +719,13 @@ This suffers from the same problem as sharing objects between
interpreters via queues. The minimal solution (running a source string)
is sufficient for us to get the feature out where it can be explored.
timeout arg to pop() and push()
-------------------------------
timeout arg to recv() and send()
--------------------------------
Typically functions that have a ``block`` argument also have a
``timeout`` argument. We can add it later if needed.
``timeout`` argument. It sometimes makes sense to do likewise for
functions that otherwise block, like the channel ``recv()`` and
``send()`` methods. We can add it later if needed.
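``queue.Queue.get()`` is one existing function that pairs ``block`` with ``timeout``. The sketch below uses it as a stand-in for a channel to show what a timeout on ``recv()`` could look like; ``recv_with_timeout()`` is a hypothetical helper, not part of the proposal.

```python
import queue


def recv_with_timeout(chan, timeout):
    """Hypothetical helper: wait up to `timeout` seconds for an object."""
    try:
        return chan.get(block=True, timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"nothing received within {timeout}s") from None


chan = queue.Queue()   # stand-in for the receiving end of a channel
try:
    recv_with_timeout(chan, 0.01)
except TimeoutError:
    timed_out = True
else:
    timed_out = False

chan.put(b"spam")
value = recv_with_timeout(chan, 0.01)
```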
get_main()
----------
@ -732,13 +834,29 @@ desirable and you want to execute in a fresh ``__main__``. Also,
you don't necessarily want to leak objects there that you aren't using
any more.
Solutions include:
Note that the following won't work right because it will clear too much
(e.g. ``__name__`` and the other "__dunder__" attributes)::
interp.run('globals().clear()')
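The effect is easy to reproduce with ``exec()`` and an ordinary dict standing in for the interpreter's ``__main__`` namespace. The ``reset_namespace()`` helper is hypothetical and shows one narrower alternative that keeps the dunder names.

```python
def make_namespace():
    return {"__name__": "__main__", "__doc__": None, "leftover": 123}


# Clearing everything also removes __name__ and the other dunder names.
cleared = make_namespace()
exec("globals().clear()", cleared)


def reset_namespace(ns):
    """Hypothetical helper: drop everything except "__dunder__" names."""
    for name in list(ns):
        if not (name.startswith("__") and name.endswith("__")):
            del ns[name]


kept = make_namespace()
reset_namespace(kept)
```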
Possible solutions include:
* a ``create()`` arg to indicate resetting ``__main__`` after each
``run`` call
* an ``Interpreter.reset_main`` flag to support opting in or out
after the fact
* an ``Interpreter.reset_main()`` method to opt in when desired
* ``importlib.util.reset_globals()`` [reset_globals]_
Also note that resetting ``__main__`` does nothing about state stored
in other modules. So any solution would have to be clear about the
scope of what is being reset. Conceivably we could invent a mechanism
by which any (or every) module could be reset, unlike ``reload()``
which does not clear the module before loading into it. Regardless,
since ``__main__`` is the execution namespace of the interpreter,
resetting it has a much more direct correlation to interpreters and
their dynamic state than does resetting other modules. So a more
generic module reset mechanism may prove unnecessary.
This isn't a critical feature initially. It can wait until later
if desirable.
@ -760,6 +878,70 @@ would be a good candidate for the first effort at expanding the types
that channels support. They aren't strictly necessary for the initial
API.
Integration with async
----------------------
Per Antoine Pitrou [async]_::
Has any thought been given to how FIFOs could integrate with async
code driven by an event loop (e.g. asyncio)? I think the model of
executing several asyncio (or Tornado) applications each in their
own subinterpreter may prove quite interesting to reconcile multi-
core concurrency with ease of programming. That would require the
FIFOs to be able to synchronize on something an event loop can wait
on (probably a file descriptor?).
A possible solution is to provide async implementations of the blocking
channel methods (``__next__()``, ``recv()``, and ``send()``). However,
the basic functionality of subinterpreters does not depend on async and
can be added later.
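One way to bridge a blocking receive into an event loop, without any file descriptor support, is to run it in an executor thread. The sketch below assumes nothing about the channel implementation: ``queue.Queue`` stands in for the receiving end and ``async_recv()`` is a hypothetical helper.

```python
import asyncio
import queue
import threading


async def async_recv(chan):
    """Hypothetical helper: await a blocking get() without blocking the loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, chan.get)


async def main():
    chan = queue.Queue()   # stand-in for the receiving end of a channel
    # Simulate another interpreter sending a little later.
    threading.Timer(0.05, chan.put, args=(b"ping",)).start()
    return await async_recv(chan)


result = asyncio.run(main())
```

This costs a thread per pending receive, which is why synchronizing on something the event loop can wait on directly, as suggested in the quote, may scale better.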
Support for iteration
---------------------
Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
``__next__()``) may be useful. A trivial implementation would use the
``recv()`` method, similar to how files do iteration. Since this isn't
a fundamental capability and has a simple analog, adding iteration
support can wait until later.
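A sketch of such a trivial implementation follows. ``ToyRecvChannel`` is a hypothetical, buffered stand-in (real channels are unbuffered); the relevant part is ``__next__()``, which turns the EOFError from ``recv()`` into StopIteration.

```python
import queue


class ToyRecvChannel:
    """Hypothetical buffered stand-in for the receiving end of a channel."""

    _CLOSED = object()

    def __init__(self):
        self._q = queue.Queue()

    def feed(self, obj):
        self._q.put(obj)

    def mark_closed(self):
        self._q.put(self._CLOSED)

    def recv(self):
        obj = self._q.get()
        if obj is self._CLOSED:
            self._q.put(obj)   # stay closed for any later recv() calls
            raise EOFError("channel is closed")
        return obj

    def __iter__(self):
        return self

    def __next__(self):
        # Iteration ends when the channel is closed.
        try:
            return self.recv()
        except EOFError:
            raise StopIteration from None


chan = ToyRecvChannel()
chan.feed(b"a")
chan.feed(b"b")
chan.mark_closed()
items = list(chan)
```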
Channel context managers
------------------------
Context manager support on ``RecvChannel`` and ``SendChannel`` may be
helpful. The implementation would be simple, wrapping a call to
``close()`` like files do. As with iteration, this can wait.
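Until such support exists, ``contextlib.closing()`` already gives the same effect for any object with a ``close()`` method. In this sketch ``ToyChannelEnd`` is a hypothetical stand-in for ``RecvChannel`` or ``SendChannel``.

```python
from contextlib import closing


class ToyChannelEnd:
    """Hypothetical stand-in for one end of a channel."""

    def __init__(self):
        self.closed = False

    def close(self):
        first_call = not self.closed
        self.closed = True
        return first_call


end = ToyChannelEnd()
with closing(end) as chan:
    inside = chan.closed   # still open inside the block
after = end.closed         # close() was called on exit
```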
Pipes and Queues
----------------
With the proposed object passing mechanism of "channels", other similar
basic types aren't required to achieve the minimal useful functionality
of subinterpreters. Such types include pipes (like channels, but
one-to-one) and queues (like channels, but buffered). See below in
`Rejected Ideas`_ for more information.
Even though these types aren't part of this proposal, they may still
be useful in the context of concurrency. Adding them later is entirely
reasonable. They could be trivially implemented as wrappers around
channels. Alternatively they could be implemented for efficiency at the
same low level as channels.
interpreters.RunFailedError
---------------------------
As currently proposed, ``Interpreter.run()`` offers you no way to
distinguish an error coming from a subinterpreter from any other
error in the current interpreter. Your only option would be to
explicitly wrap your ``run()`` call in a ``try: ... except Exception:``.
If this is a problem in practice then we could add something like
``interpreters.RunFailedError`` and raise that in ``run()``, chaining
the actual error.
Of course, this depends on how we resolve `Leaking exceptions across
interpreters`_.
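A sketch of the chaining idea, with ``exec()`` standing in for execution in a subinterpreter. The ``run()`` function here is a hypothetical module-level stand-in for ``Interpreter.run()``, and the exception's base class is an assumption.

```python
class RunFailedError(RuntimeError):
    """Hypothetical error raised when the executed script fails."""


def run(source):
    """Stand-in for Interpreter.run(): wrap any failure from the script."""
    try:
        exec(source, {"__name__": "__main__"})
    except Exception as exc:
        # Chain the original error so it stays available as __cause__.
        raise RunFailedError("script failed") from exc


try:
    run("raise KeyError('spam')")
except RunFailedError as err:
    cause = err.__cause__
```

With this shape, ``except RunFailedError`` catches only failures from the script, while errors raised by the caller's own code propagate unchanged.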
Rejected Ideas
==============
@ -846,6 +1028,36 @@ References
.. [mp-conn]
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection
.. [bug-rate]
https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
.. [benefits]
https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
.. [main-thread]
https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
https://mail.python.org/pipermail/python-dev/2017-September/149566.html
.. [explicit-channels]
https://mail.python.org/pipermail/python-dev/2017-September/149562.html
https://mail.python.org/pipermail/python-dev/2017-September/149565.html
.. [reset_globals]
https://mail.python.org/pipermail/python-dev/2017-September/149545.html
.. [async]
https://mail.python.org/pipermail/python-dev/2017-September/149420.html
https://mail.python.org/pipermail/python-dev/2017-September/149585.html
.. [result-object]
https://mail.python.org/pipermail/python-dev/2017-September/149562.html
.. [jython]
https://mail.python.org/pipermail/python-ideas/2017-May/045771.html
.. [pypy]
https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
Copyright
=========