PEP 554: updates after feedback (#1388)

Eric Snow 2020-04-29 17:48:23 -06:00 committed by GitHub
parent e589d83236
commit 08a58eccaa
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 170 additions and 66 deletions


@ -8,7 +8,7 @@ Content-Type: text/x-rst
Created: 2017-09-05
Python-Version: 3.9
Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017,
09-May-2018
09-May-2018, 20-Apr-2020
Abstract
@ -106,7 +106,7 @@ For creating and using interpreters:
+----------------------------------------+-----------------------------------------------------+
| ``.is_running() -> bool`` | Is the interpreter currently executing code? |
+----------------------------------------+-----------------------------------------------------+
| ``.destroy()`` | Finalize and destroy the interpreter. |
| ``.close()`` | Finalize and destroy the interpreter. |
+----------------------------------------+-----------------------------------------------------+
| ``.run(src_str, /, *, channels=None)`` | | Run the given source code in the interpreter. |
| | | (This blocks the current thread until done.) |
@ -738,7 +738,8 @@ The module provides the following functions::
get_main() -> Interpreter
Return the main interpreter.
Return the main interpreter. If the Python implementation
has no concept of a main interpreter then return None.
create() -> Interpreter
@ -763,7 +764,7 @@ The module also provides the following class::
code. Calling this on the current interpreter will always
return True.
destroy():
close():
Finalize and destroy the interpreter.
@ -925,9 +926,13 @@ The module also provides the following channel-related classes::
recv():
Return the next object (i.e. the data from the sent object)
from the channel. If none have been sent then wait until
the next send.
Return the next object from the channel. If none have been
sent then wait until the next send.
At the least, the object will be equivalent to the sent object.
That will almost always mean the same type with the same data,
though it could also be a compatible proxy. Regardless, it may
use a copy of that data or actually share the data.
If the channel is already closed then raise ChannelClosedError.
If the channel isn't closed but the current interpreter already
@ -1085,17 +1090,27 @@ Open Questions
* add "isolated" mode to subinterpreters API?
An "isolated" mode for subinterpreters would mean an interpreter in
that mode is especially restricted. It might include any of the
following::
There are various ways that an interpreter could potentially operate
in a more isolated/restricted way::
    * ImportError when importing ext. module without PEP 489 support
    * no daemon threads
    * no threads at all
    * no multiprocessing
    * ...
For now the default would be ``False``, but it would become ``True``
later.
This could be facilitated via settings (separate flags or a single int
flag) on the ``PyConfig`` struct on each ``PyInterpreterState``. (This
would require moving ``_PyInterpreterState_SetConfig()`` to the public
C-API.)
By default the settings would all be False, for backward compatibility.
The ``interpreters`` module, however, would likely use a more
restrictive default (e.g. always require PEP 489 support). This would
effectively be the "isolated" mode. It would make sense to add an arg
to ``interpreters.create()`` to disable "isolated" mode (at least the
PEP 489 part), since then extension authors could test their modules
under subinterpreters (without having to release a potentially broken
build with PEP 489 support).
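The "int flag" variant mentioned above could look roughly like the following sketch, which packs the listed restrictions into one ``enum.IntFlag`` bitmask. ``Isolation``, ``LEGACY``, ``ISOLATED`` and the ``create()`` stand-in are hypothetical names chosen for illustration. None of them exist in ``PyConfig`` or the proposed ``interpreters`` module, and the particular flags in ``ISOLATED`` are only an example of a stricter default.

```python
import enum

class Isolation(enum.IntFlag):
    """Hypothetical per-interpreter restriction flags (not a real CPython API)."""
    REQUIRE_PEP489 = 1       # ImportError for extensions without PEP 489 support
    NO_DAEMON_THREADS = 2
    NO_THREADS = 4
    NO_MULTIPROCESSING = 8

# C-API default: everything off, for backward compatibility.
LEGACY = Isolation(0)

# An example of a stricter default the interpreters module might choose
# (effectively the "isolated" mode). The exact set is an assumption.
ISOLATED = Isolation.REQUIRE_PEP489 | Isolation.NO_DAEMON_THREADS

def create(*, isolation=ISOLATED):
    """Hypothetical stand-in for interpreters.create().

    A real version would copy the flags into the new interpreter's config;
    this sketch only normalizes and returns them so the defaults can be checked.
    """
    return Isolation(isolation)
```

With this shape, an extension author could relax only the PEP 489 requirement, e.g. ``create(isolation=ISOLATED & ~Isolation.REQUIRE_PEP489)``, while keeping the other restrictions.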
* add a shareable synchronization primitive?
@ -1104,15 +1119,49 @@ interpreters would actually share the underlying mutex. This would
provide much better efficiency than blocking channel ops. The main
concern is that locks and channels don't mix well (as learned in Go).
* add readiness callback support to channels?
This is an alternative to channel buffering. It is probably
unnecessary, but may have enough advantages to consider it for the
high-level API. It may also be better only for the low-level
implementation.
* also track which interpreters are using a channel end?
* auto-run in a thread?
The PEP proposes a hard separation between subinterpreters and threads:
if you want to run in a thread you must create the thread yourself and
call ``run()`` in it. However, it might be convenient if ``run()``
could do that for you, meaning there would be less boilerplate.
Furthermore, we anticipate that users will want to run in a thread much
more often than not. So it would make sense to make this the default
behavior. We would add a keyword-only parameter ``threaded`` (default
``True``) to ``run()``; passing ``threaded=False`` would keep the
run-in-the-current-thread operation.
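Here is a minimal sketch of the ``threaded`` idea. ``Interpreter`` is a hypothetical stand-in that imitates a subinterpreter with ``exec()`` in a private namespace, so there is no real isolation. Returning the ``threading.Thread`` from ``run()`` is an assumption made so callers can ``join()`` it. The PEP does not specify a return value for this mode.

```python
import threading

class Interpreter:
    """Hypothetical stand-in for the PEP's Interpreter class.

    exec() in a fresh namespace only imitates running in a subinterpreter;
    there is no real isolation here.
    """

    def __init__(self):
        self.namespace = {}

    def _run_blocking(self, src_str):
        exec(src_str, self.namespace)

    def run(self, src_str, /, *, channels=None, threaded=True):
        # channels is accepted for signature parity but ignored in this sketch.
        if not threaded:
            self._run_blocking(src_str)   # block the calling thread until done
            return None
        t = threading.Thread(target=self._run_blocking, args=(src_str,))
        t.start()
        return t   # assumption: hand back the thread so callers can join() it
```

The threaded default has one consequence worth noting. An exception raised by the code would surface in the worker thread, where ``threading.excepthook`` reports it, rather than in the caller. ``RunFailedError`` propagation would therefore need extra plumbing.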
* what to do about BaseException propagation?
The exception types that inherit from ``BaseException`` (aside from
``Exception``) are usually treated specially. These types are:
``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``. It may
make sense to treat them specially when it comes to propagation from
``run()``. Here are some options::
    * propagate like normal via RunFailedError
    * do not propagate (handle them somehow in the subinterpreter)
    * propagate them directly (avoid RunFailedError)
    * propagate them directly (set RunFailedError as __cause__)
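To make the last option concrete, here is a sketch that treats the three special types differently from ordinary exceptions. ``run()`` and ``RunFailedError`` are local stand-ins, with ``exec()`` substituting for a subinterpreter. Picking option 4 is purely illustrative; the PEP does not record that decision.

```python
class RunFailedError(RuntimeError):
    """Local stand-in for the PEP's RunFailedError."""

# The BaseException subclasses singled out above.
SPECIAL = (KeyboardInterrupt, SystemExit, GeneratorExit)

def run(src_str):
    """Hypothetical run(): exec() stands in for a subinterpreter."""
    try:
        exec(src_str, {})
    except SPECIAL as exc:
        # Option 4: propagate the special exception directly,
        # recording a RunFailedError as its __cause__.
        raise exc from RunFailedError(f"{type(exc).__name__} raised in subinterpreter")
    except Exception as exc:
        # Ordinary exceptions keep the normal path: wrapped in RunFailedError
        # with the original exception as __cause__.
        raise RunFailedError(f"{type(exc).__name__}: {exc}") from exc
```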
TODO
======
* add a more detailed description of channel lifespan
A state machine diagram may be most effective. Relevant questions:
* How does an interpreter detach from the receiving end of a channel
that is never empty?
* What happens if an interpreter deletes the last reference to a
non-empty channel?
* On the receiving end, or on the sending end?
* run the CPython test suite in a subinterpreter and see what shakes out
Deferred Functionality
======================
@ -1143,18 +1192,6 @@ Typically functions that have a ``block`` argument also have a
functions that otherwise block, like the channel ``recv()`` and
``send()`` methods. We can add it later if needed.
get_main()
----------
CPython has a concept of a "main" interpreter. This is the initial
interpreter created during CPython's runtime initialization. It may
be useful to identify the main interpreter. For instance, the main
interpreter should not be destroyed. However, for the basic
functionality of a high-level API a ``get_main()`` function is not
necessary. Furthermore, there is no requirement that a Python
implementation have a concept of a main interpreter. So until there's
a clear need we'll leave ``get_main()`` out.
Interpreter.run_in_thread()
---------------------------
@ -1318,6 +1355,15 @@ channel methods (``recv()``, and ``send()``). However,
the basic functionality of subinterpreters does not depend on async and
can be added later.
Alternatively, "readiness callbacks" could be used to simplify use in
async scenarios. This would mean adding an optional ``callback``
(kw-only) parameter to the ``recv_nowait()`` and ``send_nowait()``
channel methods. The callback would be called once the object was sent
or received (respectively).
(Note that making channels buffered makes readiness callbacks less
important.)
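The callback variant could be sketched with a toy, single-process channel like the one below. ``ToyChannel`` is hypothetical and does not cross interpreters. It also keeps unmatched sends pending instead of rejecting them, so that the sender's callback has something to fire on.

```python
from collections import deque

class ToyChannel:
    """Hypothetical single-process channel with readiness callbacks."""

    def __init__(self):
        self._pending = deque()   # (obj, on_received) pairs waiting for a receiver
        self._waiting = deque()   # receiver callbacks waiting for an object

    def send_nowait(self, obj, *, callback=None):
        if self._waiting:
            # A receiver registered interest earlier: deliver the object now.
            self._waiting.popleft()(obj)
            if callback is not None:
                callback()        # the object was received right away
        else:
            # Nobody is receiving yet; fire the callback once someone does.
            self._pending.append((obj, callback))

    def recv_nowait(self, default=None, *, callback=None):
        if self._pending:
            obj, on_received = self._pending.popleft()
            if on_received is not None:
                on_received()     # tell the sender its object was received
            return obj
        if callback is not None:
            self._waiting.append(callback)  # called with the next sent object
        return default
```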
Support for iteration
---------------------
@ -1340,9 +1386,9 @@ Pipes and Queues
With the proposed object passing mechanism of "channels", other similar
basic types aren't required to achieve the minimal useful functionality
of subinterpreters. Such types include pipes (like channels, but
one-to-one) and queues (like channels, but more generic). See below in
`Rejected Ideas` for more information.
of subinterpreters. Such types include pipes (like unbuffered channels,
but one-to-one) and queues (like channels, but more generic). See below
in `Rejected Ideas` for more information.
Even though these types aren't part of this proposal, they may still
be useful in the context of concurrency. Adding them later is entirely
@ -1350,12 +1396,6 @@ reasonable. The could be trivially implemented as wrappers around
channels. Alternatively they could be implemented for efficiency at the
same low level as channels.
Buffering
---------
The proposed channels are unbuffered. This simplifies the API and
implementation. If buffering is desirable we can add it later.
Return a lock from send()
-------------------------
@ -1371,26 +1411,6 @@ less likely to confuse users.
Note that returning a lock would matter for buffered channels
(i.e. queues). For unbuffered channels it is a non-issue.
Add a "reraise" method to RunFailedError
----------------------------------------
While having ``__cause__`` set on ``RunFailedError`` helps produce a
more useful traceback, it's less helpful when handling the original
error. To help facilitate this, we could add
``RunFailedError.reraise()``. This method would enable the following
pattern::
    try:
        interp.run(script)
    except RunFailedError as exc:
        try:
            exc.reraise()
        except MyException:
            ...
This would be made even simpler if there existed a ``__reraise__``
protocol.
Support prioritization in channels
----------------------------------
@ -1411,6 +1431,51 @@ will require significant work, especially when it comes to complex
objects and most especially for mutable containers of mutable
complex objects.
Make exceptions shareable
-------------------------
Exceptions are propagated out of ``run()`` calls, so it isn't a big
leap to make them shareable in channels. However, as noted elsewhere,
it isn't essential (or particularly common), so we can wait on doing
that.
Make RunFailedError.__cause__ lazy
----------------------------------
An uncaught exception in a subinterpreter (from ``run()``) is copied
to the calling interpreter and set as ``__cause__`` on a
``RunFailedError`` which is then raised. That copying part involves
some sort of deserialization in the calling interpreter, which can be
expensive (e.g. due to imports) yet is not always necessary.
So it may be useful to use an ``ExceptionProxy`` type to wrap the
serialized exception and only deserialize it when needed. That could
be via ``ExceptionProxy.__getattribute__()`` or perhaps through
``RunFailedError.resolve()`` (which would raise the deserialized
exception and set ``RunFailedError.__cause__`` to the exception).
It may also make sense to have ``RunFailedError.__cause__`` be a
descriptor that does the lazy deserialization (and sets ``__cause__``)
on the ``RunFailedError`` instance.
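Here is a minimal sketch of the proxy idea. It assumes the exception crosses interpreters as ``pickle`` data, since the PEP does not specify the serialization format. ``ExceptionProxy`` is hypothetical. It uses ``__getattr__`` rather than ``__getattribute__`` because ``__getattr__`` only runs for attributes the proxy lacks, which keeps the proxy's own bookkeeping from triggering deserialization.

```python
import pickle

class ExceptionProxy:
    """Hypothetical lazy wrapper around a serialized exception."""

    def __init__(self, data):
        self._data = data   # serialized exception, as copied from the subinterpreter
        self._exc = None    # filled in on first use

    def resolve(self):
        """Deserialize on first call (the potentially expensive step) and cache."""
        if self._exc is None:
            self._exc = pickle.loads(self._data)
        return self._exc

    def __getattr__(self, name):
        if name.startswith("_"):
            # Keep private/dunder lookups (e.g. from copy or pickle) from
            # recursing through resolve() before __init__ has run.
            raise AttributeError(name)
        # Any other missing attribute (e.g. .args) forces deserialization.
        return getattr(self.resolve(), name)
```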
Serialize everything through channels
-------------------------------------
We could use pickle (or marshal) to serialize everything sent through
channels. Doing this is potentially inefficient, but it may be a
matter of convenience in the end. We can add it later, but trying to
remove it later would be significantly more painful.
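Roughly, the idea amounts to a wrapper that serializes on ``send()`` and deserializes on ``recv()``. In this sketch ``queue.Queue`` stands in for a channel (note that it is buffered, unlike the proposed channels), and ``PickleChannel`` is a hypothetical name.

```python
import pickle
import queue

class PickleChannel:
    """Hypothetical channel wrapper: pickles on send, unpickles on recv.

    queue.Queue stands in for a real cross-interpreter channel here.
    """

    def __init__(self):
        self._q = queue.Queue()

    def send(self, obj):
        self._q.put(pickle.dumps(obj))   # only bytes cross the "channel"

    def recv(self):
        return pickle.loads(self._q.get())
```

Because only bytes cross the boundary, the receiver always gets a copy. That copy is both the convenience and the inefficiency mentioned above.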
Return a value from ``run()``
-----------------------------
Currently ``run()`` always returns None. One idea is to return the
return value from whatever the subinterpreter ran. However, for now
it doesn't make sense. The only thing folks can run is a string of
code (i.e. a script). This is equivalent to ``PyRun_StringFlags()``,
``exec()``, or a module body. None of those "return" anything. We can
revisit this once ``run()`` supports functions, etc.
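The point about scripts can be checked with ``exec()``, one of the equivalents named above. It returns ``None`` even when the code ends with an expression, so any "result" only survives as a side effect in the namespace.

```python
ns = {}
result = exec("x = 40\nx + 2", ns)
print(result)      # None: the value of "x + 2" is discarded
print(ns["x"])     # 40: only side effects in the namespace survive
```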
Rejected Ideas
==============
@ -1440,15 +1505,11 @@ Use queues instead of channels
------------------------------
Queues and buffered channels are almost the same thing. The main
difference is that channels has a stronger relationship with context
difference is that channels have a stronger relationship with context
(i.e. the associated interpreter).
The name "Channel" was used instead of "Queue" to avoid confusion with
the stdlib ``queue`` module.
Note that buffering in channels does complicate the blocking semantics
of ``recv()`` and ``send()``. Also, queues can be built on top of
unbuffered channels.
the stdlib ``queue.Queue``.
"enumerate"
-----------
@ -1542,6 +1603,49 @@ Doing this is potentially confusing and also can lead to unexpected
races where a channel is auto-closed before it can be used in the
original (creating) interpreter.
Add a "reraise" method to RunFailedError
----------------------------------------
While having ``__cause__`` set on ``RunFailedError`` helps produce a
more useful traceback, it's less helpful when handling the original
error. To help facilitate this, we could add
``RunFailedError.reraise()``. This method would enable the following
pattern::
    try:
        try:
            interp.run(script)
        except RunFailedError as exc:
            exc.reraise()
    except MyException:
        ...
This would be made even simpler if there existed a ``__reraise__``
protocol.
All that said, this is completely unnecessary. Using ``__cause__``
is good enough::
    try:
        try:
            interp.run(script)
        except RunFailedError as exc:
            raise exc.__cause__
    except MyException:
        ...
Note that in extreme cases it may require a little extra boilerplate::
    try:
        try:
            interp.run(script)
        except RunFailedError as exc:
            if exc.__cause__ is not None:
                raise exc.__cause__
            raise  # re-raise
    except MyException:
        ...
Implementation
==============