PEP 554: Address feedback. (#426)

parent cc8ae3d31d
commit e9be941a26

 pep-0554.rst | 370
@@ -6,7 +6,7 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2017-09-05
 Python-Version: 3.7
-Post-History:
+Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017


 Abstract
@@ -29,9 +29,8 @@ Proposal
 The ``interpreters`` module will be added to the stdlib. It will
 provide a high-level interface to subinterpreters and wrap the low-level
-``_interpreters`` module. The proposed API is inspired by the
-``threading`` module. See the `Examples`_ section for concrete usage
-and use cases.
+``_interpreters`` module. See the `Examples`_ section for concrete
+usage and use cases.

 API for interpreters
 --------------------
@@ -79,9 +78,10 @@ The module also provides the following class:
       Run the provided Python source code in the interpreter. Any
       keyword arguments are added to the interpreter's execution
-      namespace. If any of the values are not supported for sharing
-      between interpreters then RuntimeError gets raised. Currently
-      only channels (see "create_channel()" below) are supported.
+      namespace (the interpreter's "__main__" module). If any of the
+      values are not supported for sharing between interpreters then
+      ValueError gets raised. Currently only channels (see
+      "create_channel()" below) are supported.

       This may not be called on an already running interpreter. Doing
       so results in a RuntimeError.
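To illustrate the sharing rule described above, here is a short sketch
against the proposed API (the ``interpreters`` module does not exist yet,
so this is illustrative only)::

   interp = interpreters.create()
   r, s = interpreters.create_channel()

   # A channel end is currently the only kind of sharable object; it is
   # passed as a keyword argument and shows up in the subinterpreter's
   # __main__ namespace under that name.
   interp.run("data = reader.recv_nowait()", reader=r)

   # Anything else is rejected with ValueError.
   try:
       interp.run("print(x)", x=42)
   except ValueError:
       print("only channels may be shared (for now)")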
@@ -161,43 +161,45 @@ channels have no buffer.
    interpreters:

-      The list of associated interpreters (those that have called
-      the "recv()" method).
-
-   __next__():
-
-      Return the next object from the channel. If none have been sent
-      then wait until the next send.
+      The list of associated interpreters: those that have called
+      the "recv()" or "__next__()" methods and haven't called "close()".

    recv():

       Return the next object from the channel. If none have been sent
-      then wait until the next send. If the channel has been closed
-      then EOFError is raised.
+      then wait until the next send. This associates the current
+      interpreter with the channel.
+
+      If the channel is already closed (see the close() method)
+      then raise EOFError. If the channel isn't closed, but the current
+      interpreter already called the "close()" method (which drops its
+      association with the channel) then raise ValueError.

    recv_nowait(default=None):

       Return the next object from the channel. If none have been sent
-      then return the default. If the channel has been closed
-      then EOFError is raised.
+      then return the default. Otherwise, this is the same as the
+      "recv()" method.

    close():

       No longer associate the current interpreter with the channel (on
-      the receiving end). This is a noop if the interpreter isn't
-      already associated. Once an interpreter is no longer associated
-      with the channel, subsequent (or current) send() and recv() calls
-      from that interpreter will raise EOFError.
+      the receiving end) and block future association (via the "recv()"
+      method). If the interpreter was never associated with the channel
+      then still block future association. Once an interpreter is no
+      longer associated with the channel, subsequent (or current) send()
+      and recv() calls from that interpreter will raise ValueError
+      (or EOFError if the channel is actually marked as closed).

-      Once number of associated interpreters on both ends drops to 0,
-      the channel is actually marked as closed. The Python runtime
-      will garbage collect all closed channels. Note that "close()" is
-      automatically called when it is no longer used in the current
-      interpreter.
+      Once the number of associated interpreters on both ends drops
+      to 0, the channel is actually marked as closed. The Python
+      runtime will garbage collect all closed channels, though it may
+      not be immediately. Note that "close()" is automatically called
+      on behalf of the current interpreter when the channel is no longer
+      used (i.e. has no references) in that interpreter.

-      This operation is idempotent. Return True if the current
-      interpreter was still associated with the receiving end of the
-      channel and False otherwise.
+      This operation is idempotent. Return True if "close()" has not
+      been called before by the current interpreter.
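A sketch of the receiving-end semantics described above, again against the
proposed (not yet existing) API::

   r, s = interpreters.create_channel()

   obj = r.recv_nowait()       # nothing sent yet, so the default (None) is returned
   obj = r.recv_nowait(b'')    # or an explicit default

   r.close()                   # True: first close() from this interpreter
   r.close()                   # False: idempotent

   try:
       r.recv()                # this end is closed for the current interpreter
   except (ValueError, EOFError):
       pass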


 ``SendChannel(id)``::
@@ -217,36 +219,26 @@ channels have no buffer.
    send(obj):

-      Send the object to the receiving end of the channel. Wait until
-      the object is received. If the channel does not support the
-      object then TypeError is raised. Currently only bytes are
-      supported. If the channel has been closed then EOFError is
-      raised.
+      Send the object to the receiving end of the channel. Wait until
+      the object is received. If the channel does not support the
+      object then ValueError is raised. Currently only bytes are
+      supported.
+
+      If the channel is already closed (see the close() method)
+      then raise EOFError. If the channel isn't closed, but the current
+      interpreter already called the "close()" method (which drops its
+      association with the channel) then raise ValueError.

    send_nowait(obj):

       Send the object to the receiving end of the channel. If the
-      object is received then return True. Otherwise return False.
-      If the channel does not support the object then TypeError is
-      raised. If the channel has been closed then EOFError is raised.
+      object is received then return True. If not then return False.
+      Otherwise, this is the same as the "send()" method.

    close():

-      No longer associate the current interpreter with the channel (on
-      the sending end). This is a noop if the interpreter isn't already
-      associated. Once an interpreter is no longer associated with the
-      channel, subsequent (or current) send() and recv() calls from that
-      interpreter will raise EOFError.
-
-      Once number of associated interpreters on both ends drops to 0,
-      the channel is actually marked as closed. The Python runtime
-      will garbage collect all closed channels. Note that "close()" is
-      automatically called when it is no longer used in the current
-      interpreter.
-
-      This operation is idempotent. Return True if the current
-      interpreter was still associated with the sending end of the
-      channel and False otherwise.
+      This is the same as "RecvChannel.close()", but applied to the
+      sending end of the channel.
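And the sending end, mirroring the receiving-end sketch above (illustrative
only)::

   r, s = interpreters.create_channel()

   if not s.send_nowait(b'spam'):    # False: no interpreter was waiting in recv()
       print("nobody listening yet")

   try:
       s.send([1, 2, 3])             # only bytes are supported for now
   except ValueError:
       print("unsupported object type")

   s.close()                         # same semantics as RecvChannel.close()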


 Examples
@@ -281,15 +273,15 @@ Pre-populate an interpreter
 ::

    interp = interpreters.create()
-   interp.run("""if True:
+   interp.run(tw.dedent("""
       import some_lib
       import an_expensive_module
       some_lib.set_up()
-      """)
+      """))
    wait_for_request()
-   interp.run("""if True:
+   interp.run(tw.dedent("""
      some_lib.handle_request()
-     """)
+     """))

 Handling an exception
 ---------------------
@@ -298,9 +290,9 @@ Handling an exception
    interp = interpreters.create()
    try:
-      interp.run("""if True:
+      interp.run(tw.dedent("""
         raise KeyError
-        """)
+        """))
    except KeyError:
       print("got the error from the subinterpreter")
@@ -312,12 +304,12 @@ Synchronize using a channel
    interp = interpreters.create()
    r, s = interpreters.create_channel()
    def run():
-      interp.run("""if True:
+      interp.run(tw.dedent("""
         reader.recv()
         print("during")
         reader.close()
-        """,
-        reader=r)
+        """),
+        reader=r))
    t = threading.Thread(target=run)
    print('before')
    t.start()
@@ -334,13 +326,13 @@ Sharing a file descriptor
    r1, s1 = interpreters.create_channel()
    r2, s2 = interpreters.create_channel()
    def run():
-      interp.run("""if True:
+      interp.run(tw.dedent("""
         fd = int.from_bytes(
            reader.recv(), 'big')
         for line in os.fdopen(fd):
            print(line)
         writer.send(b'')
-        """,
+        """),
        reader=r1, writer=s2)
    t = threading.Thread(target=run)
    t.start()
@@ -356,19 +348,19 @@ Passing objects via pickle
    interp = interpreters.create()
    r, s = interpreters.create_channel()
-   interp.run("""if True:
+   interp.run(tw.dedent("""
      import pickle
-     """,
+     """),
      reader=r)
    def run():
-      interp.run("""if True:
+      interp.run(tw.dedent("""
        data = reader.recv()
        while data:
          obj = pickle.loads(data)
          do_something(obj)
          data = reader.recv()
        reader.close()
-       """,
+       """),
       reader=r)
    t = threading.Thread(target=run)
    t.start()
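Note that the ``tw.dedent()`` calls in these revised examples presume an
alias along the lines of ``import textwrap as tw``, which the examples do
not show. A fully self-contained variant of the pre-population example
would look something like::

   import textwrap as tw
   import interpreters   # proposed module; not in the stdlib yet

   interp = interpreters.create()
   interp.run(tw.dedent("""
       import an_expensive_module
       """))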
@@ -386,6 +378,27 @@ isolation within the same process. This can be leveraged in number
 of ways. Furthermore, subinterpreters provide a well-defined framework
 in which such isolation may be extended.

+Nick Coghlan explained some of the benefits through a comparison with
+multi-processing [benefits]_::
+
+   [I] expect that communicating between subinterpreters is going
+   to end up looking an awful lot like communicating between
+   subprocesses via shared memory.
+
+   The trade-off between the two models will then be that one still
+   just looks like a single process from the point of view of the
+   outside world, and hence doesn't place any extra demands on the
+   underlying OS beyond those required to run CPython with a single
+   interpreter, while the other gives much stricter isolation
+   (including isolating C globals in extension modules), but also
+   demands much more from the OS when it comes to its IPC
+   capabilities.
+
+   The security risk profiles of the two approaches will also be quite
+   different, since using subinterpreters won't require deliberately
+   poking holes in the process isolation that operating systems give
+   you by default.
+
 CPython has supported subinterpreters, with increasing levels of
 support, since version 1.5. While the feature has the potential
 to be a powerful tool, subinterpreters have suffered from neglect
@@ -442,7 +455,8 @@ Consequently, projects that publish extension modules may face an
 increased maintenance burden as their users start using subinterpreters,
 where their modules may break. This situation is limited to modules
 that use C globals (or use libraries that use C globals) to store
-internal state.
+internal state. For numpy, the reported-bug rate is one every 6
+months. [bug-rate]_

 Ultimately this comes down to a question of how often it will be a
 problem in practice: how many projects would be affected, how often
@@ -545,11 +559,12 @@ Existing Usage
 --------------

 Subinterpreters are not a widely used feature. In fact, the only
-documented case of wide-spread usage is
-`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_. On the one
-hand, this case provides confidence that existing subinterpreter support
-is relatively stable. On the other hand, there isn't much of a sample
-size from which to judge the utility of the feature.
+documented cases of wide-spread usage are
+`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_ and
+`JEP <https://github.com/ninia/jep>`_. On the one hand, these cases
+provide confidence that existing subinterpreter support is relatively
+stable. On the other hand, there isn't much of a sample size from which
+to judge the utility of the feature.


 Provisional Status
@@ -566,8 +581,18 @@ remove it.
 Alternate Python Implementations
 ================================

+I'll be soliciting feedback from the different Python implementors about
+subinterpreter support.
+
+Multiple-interpreter support in the major Python implementations:
+
 TBD

+* jython: yes [jython]_
+* ironpython: yes?
+* pypy: maybe not? [pypy]_
+* micropython: ???
+

 Open Questions
 ==============
@@ -585,11 +610,24 @@ interpreters get better isolation relative to memory management (which
 is necessary to stop sharing the GIL between interpreters). So the
 semantics of how the exceptions propagate needs to be resolved.

+Possible solutions:
+
+* convert at the boundary (e.g. ``subprocess.CalledProcessError``)
+* wrap in a proxy at the boundary (including with support for
+  something like ``err.raise()`` to propagate the traceback).
+* return the exception (or its proxy) from ``run()`` instead of
+  raising it
+* return a result object (like ``subprocess`` does) [result-object]_
+* throw the exception away and expect users to deal with unhandled
+  exceptions explicitly in the script they pass to ``run()``
+  (they can pass error info out via channels, as sketched below);
+  with threads you have to do something similar
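The last option is already expressible with the proposed primitives. A
rough sketch, following the threading pattern of the earlier examples
(``do_work()`` stands in for whatever the script actually does)::

   import textwrap as tw
   import threading

   interp = interpreters.create()
   r, s = interpreters.create_channel()

   def run():
       interp.run(tw.dedent("""
           try:
               do_work()
           except Exception as exc:
               errors.send(repr(exc).encode('utf-8'))
           else:
               errors.send(b'')
           """),
           errors=s)

   t = threading.Thread(target=run)
   t.start()
   failure = r.recv()          # empty bytes means success
   t.join()
   if failure:
       print("subinterpreter failed:", failure.decode('utf-8'))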

 Initial support for buffers in channels
 ---------------------------------------

 An alternative to support for bytes in channels is support for
-read-only buffers (the PEP 3119 kind). Then ``recv()`` would return
+read-only buffers (the PEP 3118 kind). Then ``recv()`` would return
 a memoryview to expose the buffer in a zero-copy way. This is similar
 to what ``multiprocessing.Connection`` supports. [mp-conn]
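Under that approach a consumer would slice the received buffer without
copying, along these lines (hypothetical, since channels currently carry
``bytes``)::

   view = r.recv()                     # would be a read-only memoryview
   header, body = view[:4], view[4:]   # zero-copy slices
   size = int.from_bytes(header, 'big')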
@@ -597,6 +635,68 @@ Switching to such an approach would help resolve questions of how
 passing bytes through channels will work once we isolate memory
 management in interpreters.

+Does every interpreter think that their thread is the "main" thread?
+--------------------------------------------------------------------
+
+CPython's interpreter implementation identifies the OS thread in which
+it was started as the "main" thread. The interpreter then has slightly
+different behavior depending on if the current thread is the main one
+or not. This presents a problem in cases where "main thread" is meant
+to imply "main thread in the main interpreter" [main-thread]_, where
+the main interpreter is the initial one.
+
+Disallow subinterpreters in the main thread?
+--------------------------------------------
+
+This is a specific case of the above issue. Currently in CPython,
+"we need a main \*thread\* in order to sensibly manage the way signal
+handling works across different platforms". [main-thread]_
+
+Since signal handlers are part of the interpreter state, running a
+subinterpreter in the main thread means that the main interpreter
+can no longer properly handle signals (since it's effectively paused).
+
+Furthermore, running a subinterpreter in the main thread would
+conceivably allow setting signal handlers on that interpreter, which
+would likewise impact signal handling when that interpreter isn't
+running or is running in a different thread.
+
+Ultimately, running subinterpreters in the main OS thread introduces
+complications to the signal handling implementation. So it may make
+the most sense to disallow running subinterpreters in the main thread.
+Support for it could be considered later. The downside is that folks
+wanting to try out subinterpreters would be required to take the extra
+step of using threads. This could slow adoption and experimentation,
+whereas without the restriction there's less of an obstacle.
+
+Pass channels explicitly to run()?
+----------------------------------
+
+Nick Coghlan suggested [explicit-channels]_ that we may want something
+more explicit than the keyword args of ``run()`` (``**shared``)::
+
+   The subprocess.run() comparison does make me wonder whether this
+   might be a more future-proof signature for Interpreter.run() though:
+
+       def run(source_str, /, *, channels=None):
+           ...
+
+   That way channels can be a namespace *specifically* for passing in
+   channels, and can be reported as such on RunResult. If we decide
+   to allow arbitrary shared objects in the future, or add flag options
+   like "reraise=True" to reraise exceptions from the subinterpreter
+   in the current interpreter, we'd have that ability, rather than
+   having the entire potential keyword namespace taken up for passing
+   shared objects.
+
+and::
+
+   It does occur to me that if we wanted to align with the way the
+   `runpy` module spells that concept, we'd call the option
+   `init_globals`, but I'm thinking it will be better to only allow
+   channels to be passed through directly, and require that everything
+   else be sent through a channel.
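For comparison, the two shapes side by side (``channels=`` is only a
suggestion at this point; the PEP as written proposes ``**shared``)::

   # As currently proposed:
   interp.run("data = reader.recv()", reader=r)

   # Under the suggested alternative signature:
   interp.run("data = reader.recv()", channels={'reader': r})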


 Deferred Functionality
 ======================
@@ -619,11 +719,13 @@ This suffers from the same problem as sharing objects between
 interpreters via queues. The minimal solution (running a source string)
 is sufficient for us to get the feature out where it can be explored.

-timeout arg to pop() and push()
--------------------------------
+timeout arg to recv() and send()
+--------------------------------

 Typically functions that have a ``block`` argument also have a
-``timeout`` argument. We can add it later if needed.
+``timeout`` argument. It sometimes makes sense to do likewise for
+functions that otherwise block, like the channel ``recv()`` and
+``send()`` methods. We can add it later if needed.
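If the argument is ever added, it would presumably follow the existing
stdlib convention (compare ``queue.Queue.get(block=True, timeout=None)``),
hypothetically::

   obj = r.recv(timeout=5)    # hypothetical; not part of this proposal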

 get_main()
 ----------
@@ -732,13 +834,29 @@ desireable and you want to execute in a fresh ``__main__``. Also,
 you don't necessarily want to leak objects there that you aren't using
 any more.

-Solutions include:
+Note that the following won't work right because it will clear too much
+(e.g. ``__name__`` and the other "__dunder__" attributes)::
+
+   interp.run('globals().clear()')
+
+Possible solutions include:

 * a ``create()`` arg to indicate resetting ``__main__`` after each
   ``run`` call
 * an ``Interpreter.reset_main`` flag to support opting in or out
   after the fact
 * an ``Interpreter.reset_main()`` method to opt in when desired
 * ``importlib.util.reset_globals()`` [reset_globals]_

+Also note that resetting ``__main__`` does nothing about state stored
+in other modules. So any solution would have to be clear about the
+scope of what is being reset. Conceivably we could invent a mechanism
+by which any (or every) module could be reset, unlike ``reload()``
+which does not clear the module before loading into it. Regardless,
+since ``__main__`` is the execution namespace of the interpreter,
+resetting it has a much more direct correlation to interpreters and
+their dynamic state than does resetting other modules. So a more
+generic module reset mechanism may prove unnecessary.
+
 This isn't a critical feature initially. It can wait until later
 if desirable.
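For comparison, a manual approximation that spares the dunders is already
possible but clumsy, which is part of the appeal of a dedicated reset
mechanism (a sketch, assuming the ``tw`` alias used in the examples above;
note that it leaves its own loop variable behind)::

   interp.run(tw.dedent("""
       for _name in [n for n in globals() if not n.startswith('__')]:
           del globals()[_name]
       """))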
@@ -760,6 +878,70 @@ would be a good candidate for the first effort at expanding the types
 that channels support. They aren't strictly necessary for the initial
 API.

+Integration with async
+----------------------
+
+Per Antoine Pitrou [async]_::
+
+   Has any thought been given to how FIFOs could integrate with async
+   code driven by an event loop (e.g. asyncio)? I think the model of
+   executing several asyncio (or Tornado) applications each in their
+   own subinterpreter may prove quite interesting to reconcile multi-
+   core concurrency with ease of programming. That would require the
+   FIFOs to be able to synchronize on something an event loop can wait
+   on (probably a file descriptor?).
+
+A possible solution is to provide async implementations of the blocking
+channel methods (``__next__()``, ``recv()``, and ``send()``). However,
+the basic functionality of subinterpreters does not depend on async and
+can be added later.
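Until such methods exist, the usual interim pattern would be to push the
blocking call into an executor, e.g. with a hypothetical helper like this
(using only today's asyncio)::

   import asyncio

   async def channel_recv(rch):
       """Await the next object from a RecvChannel without blocking the loop."""
       loop = asyncio.get_event_loop()
       return await loop.run_in_executor(None, rch.recv)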

+Support for iteration
+---------------------
+
+Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
+``__next__()``) may be useful. A trivial implementation would use the
+``recv()`` method, similar to how files do iteration. Since this isn't
+a fundamental capability and has a simple analog, adding iteration
+support can wait until later.
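The trivial implementation mentioned above amounts to something like the
following (a sketch, written as a free function rather than a method)::

   def iter_channel(rch):
       """Yield objects from a RecvChannel until it is closed."""
       while True:
           try:
               yield rch.recv()
           except EOFError:
               return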

+Channel context managers
+------------------------
+
+Context manager support on ``RecvChannel`` and ``SendChannel`` may be
+helpful. The implementation would be simple, wrapping a call to
+``close()`` like files do. As with iteration, this can wait.
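In the meantime, ``contextlib.closing()`` already provides the same shape,
since both channel ends expose a ``close()`` method::

   import contextlib

   with contextlib.closing(r) as reader:
       data = reader.recv_nowait(b'')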

+Pipes and Queues
+----------------
+
+With the proposed object passing mechanism of "channels", other similar
+basic types aren't required to achieve the minimal useful functionality
+of subinterpreters. Such types include pipes (like channels, but
+one-to-one) and queues (like channels, but buffered). See below in
+`Rejected Ideas` for more information.
+
+Even though these types aren't part of this proposal, they may still
+be useful in the context of concurrency. Adding them later is entirely
+reasonable. They could be trivially implemented as wrappers around
+channels. Alternatively they could be implemented for efficiency at the
+same low level as channels.
+
+interpreters.RunFailedError
+---------------------------
+
+As currently proposed, ``Interpreter.run()`` offers you no way to
+distinguish an error coming from a subinterpreter from any other
+error in the current interpreter. Your only option would be to
+explicitly wrap your ``run()`` call in a ``try: ... except Exception:``.
+
+If this is a problem in practice then we could add something like
+``interpreters.RunFailedError`` and raise that in ``run()``, chaining
+the actual error.
+
+Of course, this depends on how we resolve `Leaking exceptions across
+interpreters`_.
+

 Rejected Ideas
 ==============
@@ -846,6 +1028,36 @@ References
 .. [mp-conn]
    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection

+.. [bug-rate]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
+
+.. [benefits]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
+
+.. [main-thread]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149566.html
+
+.. [explicit-channels]
+   https://mail.python.org/pipermail/python-dev/2017-September/149562.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149565.html
+
+.. [reset_globals]
+   https://mail.python.org/pipermail/python-dev/2017-September/149545.html
+
+.. [async]
+   https://mail.python.org/pipermail/python-dev/2017-September/149420.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149585.html
+
+.. [result-object]
+   https://mail.python.org/pipermail/python-dev/2017-September/149562.html
+
+.. [jython]
+   https://mail.python.org/pipermail/python-ideas/2017-May/045771.html
+
+.. [pypy]
+   https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
+

 Copyright
 =========