PEP 554: updates based on feedback (#1390)

Eric Snow 2020-05-01 16:05:01 -06:00 committed by GitHub
parent 11b2de41a7
commit 6f19fe2521
1 changed files with 204 additions and 250 deletions


@ -8,7 +8,7 @@ Content-Type: text/x-rst
Created: 2017-09-05
Python-Version: 3.9
Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017,
09-May-2018, 20-Apr-2020
09-May-2018, 20-Apr-2020, 01-May-2020
Abstract
@ -69,7 +69,6 @@ At first only the following types will be supported for sharing:
* bytes
* str
* int
* PEP 3118 buffer objects (via ``send_buffer()``)
* PEP 554 channels
Support for other basic types (e.g. bool, float, Ellipsis) will be added later.
@ -83,17 +82,17 @@ the `"interpreters" Module API`_ section below.
For creating and using interpreters:
+----------------------------------+----------------------------------------------+
| signature | description |
+==================================+==============================================+
| ``list_all() -> [Interpreter]`` | Get all existing interpreters. |
+----------------------------------+----------------------------------------------+
| ``get_current() -> Interpreter`` | Get the currently running interpreter. |
+----------------------------------+----------------------------------------------+
| ``get_main() -> Interpreter`` | Get the main interpreter. |
+----------------------------------+----------------------------------------------+
| ``create() -> Interpreter`` | Initialize a new (idle) Python interpreter. |
+----------------------------------+----------------------------------------------+
+---------------------------------------------+----------------------------------------------+
| signature | description |
+=============================================+==============================================+
| ``list_all() -> [Interpreter]`` | Get all existing interpreters. |
+---------------------------------------------+----------------------------------------------+
| ``get_current() -> Interpreter`` | Get the currently running interpreter. |
+---------------------------------------------+----------------------------------------------+
| ``get_main() -> Interpreter`` | Get the main interpreter. |
+---------------------------------------------+----------------------------------------------+
| ``create(*, isolated=True) -> Interpreter`` | Initialize a new (idle) Python interpreter. |
+---------------------------------------------+----------------------------------------------+
|
@ -104,6 +103,8 @@ For creating and using interpreters:
+----------------------------------------+-----------------------------------------------------+
| ``.id`` | The interpreter's ID (read-only). |
+----------------------------------------+-----------------------------------------------------+
| ``.isolated`` | The interpreter's mode (read-only). |
+----------------------------------------+-----------------------------------------------------+
| ``.is_running() -> bool`` | Is the interpreter currently executing code? |
+----------------------------------------+-----------------------------------------------------+
| ``.close()`` | Finalize and destroy the interpreter. |
@ -143,20 +144,12 @@ For sharing data between interpreters:
+------------------------------------------+-----------------------------------------------+
| ``.id`` | The channel's unique ID. |
+------------------------------------------+-----------------------------------------------+
| ``.interpreters`` | The list of associated interpreters. |
+------------------------------------------+-----------------------------------------------+
| ``.recv() -> object`` | | Get the next object from the channel, |
| | | and wait if none have been sent. |
| | | Associate the interpreter with the channel. |
+------------------------------------------+-----------------------------------------------+
| ``.recv_nowait(default=None) -> object`` | | Like recv(), but return the default |
| | | instead of waiting. |
+------------------------------------------+-----------------------------------------------+
| ``.release()`` | | No longer associate the current interpreter |
| | | with the channel (on the receiving end). |
+------------------------------------------+-----------------------------------------------+
| ``.close(force=False)`` | | Close the channel in all interpreters. |
+------------------------------------------+-----------------------------------------------+
|
@ -167,26 +160,11 @@ For sharing data between interpreters:
+------------------------------+--------------------------------------------------+
| ``.id`` | The channel's unique ID. |
+------------------------------+--------------------------------------------------+
| ``.interpreters`` | The list of associated interpreters. |
+------------------------------+--------------------------------------------------+
| ``.send(obj)`` | | Send the object (i.e. its data) to the |
| | | receiving end of the channel and wait. |
| | | Associate the interpreter with the channel. |
+------------------------------+--------------------------------------------------+
| ``.send_nowait(obj)`` | | Like send(), but return False if not received. |
+------------------------------+--------------------------------------------------+
| ``.send_buffer(obj)`` | | Send the object's (PEP 3118) buffer to the |
| | | receiving end of the channel and wait. |
| | | Associate the interpreter with the channel. |
+------------------------------+--------------------------------------------------+
| ``.send_buffer_nowait(obj)`` | | Like send_buffer(), but return False |
| | | if not received. |
+------------------------------+--------------------------------------------------+
| ``.release()`` | | No longer associate the current interpreter |
| | | with the channel (on the sending end). |
+------------------------------+--------------------------------------------------+
| ``.close(force=False)`` | | Close the channel in all interpreters. |
+------------------------------+--------------------------------------------------+
|
@ -203,21 +181,26 @@ For sharing data between interpreters:
+--------------------------+------------------------+------------------------------------------------+
| ``NotReceivedError`` | ``ChannelError`` | Nothing was waiting to receive a sent object. |
+--------------------------+------------------------+------------------------------------------------+
| ``ChannelClosedError`` | ``ChannelError`` | The channel is closed. |
+--------------------------+------------------------+------------------------------------------------+
| ``ChannelReleasedError`` | ``ChannelClosedError`` | The channel is released (but not yet closed). |
+--------------------------+------------------------+------------------------------------------------+
"Extending Python" Docs
-----------------------
Help for Extension Module Maintainers
-------------------------------------
Many extension modules do not support use in subinterpreters. The
authors and users of such extension modules will both benefit when they
are updated to support subinterpreters. To help with that, a new page
will be added to the `Extending Python <extension-docs_>`_ docs.
Many extension modules do not support use in subinterpreters yet. The
maintainers and users of such extension modules will both benefit when
they are updated to support subinterpreters. In the meantime users may
become confused by failures when using subinterpreters, which could
negatively impact extension maintainers. See `Concerns`_ below.
This page will explain how to implement PEP 489 support and how to move
from global module state to per-interpreter.
To mitigate that impact and accelerate compatibility, we will do the
following:
* be clear that extension modules are *not* required to support use in
subinterpreters
* raise ``ImportError`` when an incompatible (no PEP 489 support) module
is imported in a subinterpreter
* provide resources (e.g. docs) to help maintainers reach compatibility
* reach out to the maintainers of Cython and of the most used extension
modules (on PyPI) to get feedback and possibly provide assistance
Examples
@ -304,7 +287,6 @@ Synchronize using a channel
interp.run(tw.dedent("""
reader.recv()
print("during")
reader.release()
"""),
shared=dict(
reader=r,
@ -315,7 +297,6 @@ Synchronize using a channel
t.start()
print('after')
s.send(b'')
s.release()
Sharing a file descriptor
-------------------------
@ -366,7 +347,6 @@ Passing objects via marshal
obj = marshal.loads(data)
do_something(obj)
data = reader.recv()
reader.release()
"""))
t = threading.Thread(target=run)
t.start()
@ -396,7 +376,6 @@ Passing objects via pickle
obj = pickle.loads(data)
do_something(obj)
data = reader.recv()
reader.release()
"""))
t = threading.Thread(target=run)
t.start()
@ -564,6 +543,15 @@ at length in this PEP. Just to be clear, the value lies in::
* preparation for per-interpreter GIL
* encourage experimentation
* "data sharing can have a negative impact on cache performance
in multi-core scenarios"
(See [cache-line-ping-pong]_.)
This shouldn't be a problem for now as we have no immediate plans
to actually share data between interpreters, instead focusing
on copying.
About Subinterpreters
=====================
@ -635,7 +623,6 @@ channels to the following:
* bytes
* str
* int
* PEP 3118 buffer objects (via ``send_buffer()``)
* channels
Limiting the initial shareable types is a practical matter, reducing
@ -699,9 +686,14 @@ Provisional Status
The new ``interpreters`` module will be added with "provisional" status
(see PEP 411). This allows Python users to experiment with the feature
and provide feedback while still allowing us to adjust to that feedback.
The module will be provisional in Python 3.8 and we will make a decision
before the 3.9 release whether to keep it provisional, graduate it, or
remove it.
The module will be provisional in Python 3.9 and we will make a decision
before the 3.10 release whether to keep it provisional, graduate it, or
remove it. This PEP will be updated accordingly.
While the module is provisional, any changes to the API (or to behavior)
do not need to be reflected here, nor get approval by the BDFL-delegate.
However, such changes will still need to go through the normal processes
(BPO for smaller changes and python-dev/PEP for substantial ones).
Alternate Python Implementations
@ -741,13 +733,14 @@ The module provides the following functions::
Return the main interpreter. If the Python implementation
has no concept of a main interpreter then return None.
create() -> Interpreter
create(*, isolated=True) -> Interpreter
Initialize a new Python interpreter and return it. The
interpreter will be created in the current thread and will remain
idle until something is run in it. The interpreter may be used
in any thread and will run in whichever thread calls
``interp.run()``.
``interp.run()``. See "Interpreter Isolated Mode" below for
an explanation of the "isolated" parameter.
The module also provides the following class::
@ -756,7 +749,12 @@ The module also provides the following class::
id -> int:
The interpreter's ID (read-only).
The interpreter's ID. (read-only)
isolated -> bool:
Whether or not the interpreter is operating in "isolated" mode.
(read-only)
is_running() -> bool:
@ -820,7 +818,6 @@ The module also provides the following class::
Supported code: source text.
Uncaught Exceptions
-------------------
@ -881,7 +878,7 @@ with unbuffered semantics).
Python objects are not shared between interpreters. However, in some
cases the data those objects wrap is actually shared and not just copied.
One example is PEP 3118 buffers. In those cases the object in the
One example might be PEP 3118 buffers. In those cases the object in the
original interpreter is kept alive until the shared data in the other
interpreter is no longer used. Then object destruction can happen like
normal in the original interpreter, along with the previously shared
@ -893,9 +890,7 @@ to channels::
create_channel() -> (RecvChannel, SendChannel):
Create a new channel and return (recv, send), the RecvChannel
and SendChannel corresponding to the ends of the channel. The
lifetime of the channel is determined by associations between
interpreters and the channel's ends (see below).
and SendChannel corresponding to the ends of the channel.
Both ends of the channel are supported "shared" objects (i.e.
may be safely shared by different interpreters). Thus they
@ -917,13 +912,6 @@ The module also provides the following channel-related classes::
The channel's unique ID. This is shared with the "send" end.
interpreters => [Interpreter]:
The list of interpreters associated with the "recv" end of
the channel. (See below for more on how interpreters are
associated with channels.) If the channel has been closed
then raise ChannelClosedError.
recv():
Return the next object from the channel. If none have been
@ -934,47 +922,12 @@ The module also provides the following channel-related classes::
though it could also be a compatible proxy. Regardless, it may
use a copy of that data or actually share the data.
If the channel is already closed then raise ChannelClosedError.
If the channel isn't closed but the current interpreter already
called the "release()" method for the "recv" end then raise
ChannelReleasedError (which is a subclass of
ChannelClosedError).
recv_nowait(default=None):
Return the next object from the channel. If none have been
sent then return the default. Otherwise, this is the same
as the "recv()" method.
release() -> bool:
No longer associate the current interpreter with the channel
(on the "recv" end) and block any future association. If the
interpreter was never associated with the channel then still
block any future association. The "send" end of the channel
is unaffected by a released "recv" end.
Once an interpreter is no longer associated with the "recv"
end of the channel, any "recv()" and "recv_nowait()" calls
from that interpreter will fail (even ongoing calls). See
"recv()" for details.
See below for more on how association relates to auto-closing
a channel.
This operation is idempotent. Return True if "release()"
has not been called before by the current interpreter.
close(force=False):
Close both ends of the channel (in all interpreters). This
means that any further use of the channel anywhere raises
ChannelClosedError. If the channel is not empty then
raise ChannelNotEmptyError (if "force" is False) or
discard the remaining objects (if "force" is True)
and close it. Note that the behavior of closing
the "send" end is slightly different.
class SendChannel(id):
@ -986,21 +939,12 @@ The module also provides the following channel-related classes::
The channel's unique ID. This is shared with the "recv" end.
interpreters -> [Interpreter]:
Like "RecvChannel.interpreters" but for the "send" end.
send(obj):
Send the object (i.e. its data) to the "recv" end of the
channel. Wait until the object is received. If the object
is not shareable then ValueError is raised.
If this channel end was already released
by the interpreter then raise ChannelReleasedError. If
the channel is already closed then raise
ChannelClosedError.
send_nowait(obj):
Send the object to the "recv" end of the channel. This
@ -1009,158 +953,88 @@ The module also provides the following channel-related classes::
other end) then queue the object and return False. Otherwise
return True.
send_buffer(obj):
Channel Lifespan
----------------
Send a MemoryView of the object rather than the object.
Otherwise this is the same as "send()". Note that the
object must implement the PEP 3118 buffer protocol.
The buffer will always be released in the original
interpreter, like normal.
send_buffer_nowait(obj):
Send a MemoryView of the object rather than the object.
If the other end is not currently receiving then return
False. Otherwise return True.
release():
This is the same as "RecvChannel.release()", but applied
to the sending end of the channel.
close(force=False):
Close both ends of the channel (in all interpreters). No
matter what, the "send" end of the channel is immediately
closed. If the channel is empty then close the "recv"
end immediately too. Otherwise, if "force" is False,
close the "recv" end (and hence the full channel)
once the channel becomes empty; or, if "force"
is True, discard the remaining items and
close immediately.
Note that ``send_buffer()`` is similar to how
``multiprocessing.Connection`` works. [mp-conn]_
A channel is automatically closed and destroyed once there are no more
Python objects (e.g. ``RecvChannel`` and ``SendChannel``) referring
to it. So closing is effectively triggered via garbage collection of
those objects.
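The reference-driven lifespan described above can be modeled in plain Python. This is only a sketch under stated assumptions: ``_Channel``, ``RecvEnd``, ``SendEnd`` and this ``create_channel()`` are hypothetical stand-ins that live in a single interpreter, not the proposed API, and ``weakref.finalize`` plays the role of the automatic close.

```python
import gc
import weakref


class _Channel:
    """Hypothetical stand-in for the underlying channel."""


class _End:
    def __init__(self, channel):
        # A strong reference from each end keeps the channel alive.
        self._channel = channel


class RecvEnd(_End):
    pass


class SendEnd(_End):
    pass


closed = []  # records when the modeled channel is "closed"


def create_channel():
    ch = _Channel()
    # Runs once nothing refers to the channel any more.
    weakref.finalize(ch, closed.append, "channel closed")
    return RecvEnd(ch), SendEnd(ch)


recv, send = create_channel()

del recv
gc.collect()
state_after_first = list(closed)   # the send end still refers to the channel

del send
gc.collect()
state_after_second = list(closed)  # no references are left
```

Dropping one end leaves the channel open; it is only "closed" once the last referring object is gone.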
Channel Association
-------------------
.. _isolated-mode:
Each end (send/recv) of each channel is associated with a set of
interpreters. This association effectively means "the channel end
is available to that interpreter". It has ramifications on
introspection and on how channels are automatically closed.
Interpreter "Isolated" Mode
===========================
When a channel is created, both ends are immediately associated with
the current interpreter. When a channel end is passed to an interpreter
via ``Interpreter.run(..., channels=...)`` then that interpreter is
associated with the channel end. Likewise when a channel end is sent
through another channel, the receiving interpreter is associated with
the sent channel end.
By default, every new interpreter created by ``interpreters.create()``
has specific restrictions on any code it runs. This includes the
following:
A channel end is explicitly released by an interpreter through the
``release()`` method. It is also done automatically for an interpreter
when the last ``*Channel`` object for the end in that interpreter is
garbage-collected, as though ``release()`` were called.
* importing an extension module fails if it does not implement the
PEP 489 API
* new threads are not allowed (including daemon threads)
* ``os.fork()`` is not allowed (so no ``multiprocessing``)
* ``os.exec*()``, AKA "fork+exec", is not allowed (so no ``subprocess``)
Calling ``*Channel.close()`` automatically releases the channel in all
interpreters for both ends.
This represents the full "isolated" mode of subinterpreters. It is
applied when ``interpreters.create()`` is called with the "isolated"
keyword-only argument set to ``True`` (the default). If
``interpreters.create(isolated=False)`` is called then none of those
restrictions is applied.
Once the number of associated interpreters on both ends drops
to 0, the channel is actually closed. The Python runtime will
garbage-collect all closed channels, though it may not happen
immediately.
One advantage of this approach is that it allows extension maintainers
to check subinterpreter compatibility before they implement the PEP 489
API. Also note that ``isolated=False`` represents the historical
behavior when using the existing subinterpreters C-API, thus providing
backward compatibility. For the existing C-API itself, the default
remains ``isolated=False``. The same is true for the "main" module, so
existing use of Python will not change.
Consequently, ``*Channel.interpreters`` means those to which the
channel end was sent, still hold a reference to the channel end, and
haven't called ``release()``.
We may choose to later loosen some of the above restrictions or provide
a way to enable/disable granular restrictions individually. Regardless,
requiring PEP 489 support from extension modules will always be a
default restriction.
Open Questions
==============
Documentation
=============
* add a "tp_share" type slot instead of using a global registry
for shareable types?
The new stdlib docs page for the ``interpreters`` module will include
the following:
* impact of data sharing on cache performance in multi-core scenarios?
(see [cache-line-ping-pong]_)
* (at the top) a clear note that subinterpreter support in extension
modules is not required
* some explanation about what subinterpreters are
* brief examples of how to use subinterpreters and channels
* a summary of the limitations of subinterpreters
* (for extension maintainers) a link to the resources for ensuring
subinterpreter compatibility
* much of the API information in this PEP
* strictly disallow subinterpreter import of extension modules without
PEP 489 support?
A separate page will be added to the docs for resources to help
extension maintainers ensure their modules can be used safely in
subinterpreters, under `Extending Python <extension-docs_>`_. The page
will include the following information:
* add "isolated" mode to subinterpreters API?
* a summary of subinterpreters (similar to the one in the new
``interpreters`` module page and in the C-API docs)
* an explanation of how extension modules can be impacted
* how to implement PEP 489 support
* how to move from global module state to per-interpreter
* how to take advantage of PEP 384 (heap types), PEP 3121
(module state), and PEP 573
* strategies for dealing with 3rd party C libraries that keep their
own subinterpreter-incompatible global state
There are various ways that an interpreter could potentially operate
in a more isolated/restricted way::
Note that the documentation will play a large part in mitigating any
negative impact that the new ``interpreters`` module might have on
extension module maintainers.
* ImportError when importing ext. module without PEP 489 support
* no daemon threads
* no threads at all
* no multiprocessing
* ...
This could be facilitated via settings (separate or an int flag) on
the ``PyConfig`` struct on each ``PyInterpreterState``. (This would
require moving ``_PyInterpreterState_SetConfig()`` to the public C-API.)
By default the settings would all be False, for backward compatibility.
The ``interpreters`` module, however, would likely use a more
restrictive default (e.g. always require PEP 489 support). This would
effectively be the "isolated" mode. It would make sense to add an arg
to ``interpreters.create()`` to disable "isolated" mode (at least the
PEP 489 part), since then extension authors could test their modules
under subinterpreters (without having to release a potentially broken
build with PEP 489 support).
* add a shareable synchronization primitive?
This would be ``_threading.Lock`` (or something like it) where
interpreters would actually share the underlying mutex. This would
provide much better efficiency than blocking channel ops. The main
concern is that locks and channels don't mix well (as learned in Go).
* also track which interpreters are using a channel end?
* auto-run in a thread?
The PEP proposes a hard separation between subinterpreters and threads:
if you want to run in a thread you must create the thread yourself and
call ``run()`` in it. However, it might be convenient if ``run()``
could do that for you, meaning there would be less boilerplate.
Furthermore, we anticipate that users will want to run in a thread much
more often than not. So it would make sense to make this the default
behavior. We would add a kw-only param "threaded" (default ``True``)
to ``run()`` to allow the run-in-the-current-thread operation.
* what to do about BaseException propagation?
The exception types that inherit from ``BaseException`` (aside from
``Exception``) are usually treated specially. These types are:
``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``. It may
make sense to treat them specially when it comes to propagation from
``run()``. Here are some options::
* propagate like normal via RunFailedError
* do not propagate (handle them somehow in the subinterpreter)
* propagate them directly (avoid RunFailedError)
* propagate them directly (set RunFailedError as __cause__)
TODO
======
* add a more detailed description of channel lifespan
A state machine diagram may be most effective. Relevant questions:
* How does an interpreter detach from the receiving end of a channel
that is never empty?
* What happens if an interpreter deletes the last reference to a
non-empty channel?
* On the receiving end, or on the sending end?
* run the CPython test suite in a subinterpreter and see what shakes out
Also, the ``ImportError`` for incompatible extension modules will have
a message that clearly says it is due to missing subinterpreter
compatibility and that extensions are not required to provide it. This
will help set user expectations properly.
Deferred Functionality
@ -1476,6 +1350,86 @@ code (i.e. a script). This is equivalent to ``PyRun_StringFlags()``,
``exec()``, or a module body. None of those "return" anything. We can
revisit this once ``run()`` supports functions, etc.
Add a "tp_share" type slot
--------------------------
This would replace the current global registry for shareable types.
Expose which interpreters have actually *used* a channel end
------------------------------------------------------------
Currently we associate interpreters upon access to a channel. We would
keep a separate association list for "upon use" and expose that.
Add a shareable synchronization primitive
-----------------------------------------
This would be ``_threading.Lock`` (or something like it) where
interpreters would actually share the underlying mutex. This would
provide much better efficiency than blocking channel ops. The main
concern is that locks and channels don't mix well (as learned in Go).
Note that the same functionality as a lock can be achieved by passing
some sort of "token" object through a channel. "send()" would be
equivalent to releasing the lock and "recv()" to acquiring the lock.
If it proves desirable, we can add this later without much trouble.
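The token idea can be sketched with today's standard library. As an assumption made purely so the example runs, a ``queue.Queue`` shared by threads in one interpreter stands in for a channel; unlike the unbuffered channels proposed here, the queue buffers the token, so "releasing" never waits for a receiver.

```python
import queue
import threading

# Stand-in for a channel (assumption): a thread-safe queue in ONE interpreter.
channel = queue.Queue()
TOKEN = b""            # any shareable object would do
channel.put(TOKEN)     # the "lock" starts out released

counter = 0


def worker(iterations):
    global counter
    for _ in range(iterations):
        token = channel.get()      # like recv(): acquire the "lock"
        try:
            counter += 1           # critical section
        finally:
            channel.put(token)     # like send(): release the "lock"


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Only the holder of the single token may enter the critical section, so the increments never interleave.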
Propagate SystemExit and KeyboardInterrupt Differently
------------------------------------------------------
The exception types that inherit from ``BaseException`` (aside from
``Exception``) are usually treated specially. These types are:
``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``. It may
make sense to treat them specially when it comes to propagation from
``run()``. Here are some options::
* propagate like normal via RunFailedError
* do not propagate (handle them somehow in the subinterpreter)
* propagate them directly (avoid RunFailedError)
* propagate them directly (set RunFailedError as __cause__)
We aren't going to worry about handling them differently. Threads
already ignore ``SystemExit``, so for now we will follow that pattern.
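The thread behavior referred to above can be observed with the ``threading`` module alone. This is a small illustration of existing thread semantics, not of subinterpreter behavior; the ``events`` list and ``worker`` function are names invented for the example.

```python
import threading

events = []


def worker():
    events.append("worker started")
    # In the main thread this would end the process; in a worker thread
    # it only ends that thread, and nothing is reported.
    raise SystemExit(1)


t = threading.Thread(target=worker)
t.start()
t.join()

events.append("main continues")
print(events)
```

The main thread keeps running after the worker raises ``SystemExit``.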
Add an explicit release() and close() to channel end classes
------------------------------------------------------------
It can be convenient to have an explicit way to close a channel against
further global use. Likewise it could be useful to have an explicit
way to release one of the channel ends relative to the current
interpreter. Among other reasons, such a mechanism is useful for
communicating overall state between interpreters without the extra
boilerplate that passing objects through a channel directly would
require.
The challenge is getting automatic release/close right without making
it hard to understand. This is especially true when dealing with a
non-empty channel. We should be able to get by without release/close
for now.
Add SendChannel.send_buffer()
-----------------------------
This method would allow no-copy sending of an object through a channel
if it supports the PEP 3118 buffer protocol (e.g. memoryview).
Support for this is not fundamental to channels and can be added on
later without much disruption.
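What "no-copy" means for a PEP 3118 buffer can be shown within a single interpreter. This sketch does not use channels or the deferred ``send_buffer()``; it only demonstrates that a ``memoryview`` exposes the original object's memory rather than a copy.

```python
data = bytearray(b"hello world")

view = memoryview(data)    # no copy: the view exposes data's own buffer
window = view[6:]          # slicing a memoryview does not copy either
window[0] = ord("W")       # a write through the view changes data itself

copied = bytes(view)       # an explicit copy, independent from here on
data[0] = ord("H")         # not reflected in the copy

print(bytes(data), copied)

# Release the views once the shared memory is no longer needed.
window.release()
view.release()
```

The write through ``window`` is visible in ``data``, while the later change to ``data`` does not reach ``copied``.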
Auto-run in a thread
--------------------
The PEP proposes a hard separation between subinterpreters and threads:
if you want to run in a thread you must create the thread yourself and
call ``run()`` in it. However, it might be convenient if ``run()``
could do that for you, meaning there would be less boilerplate.
Furthermore, we anticipate that users will want to run in a thread much
more often than not. So it would make sense to make this the default
behavior. We would add a kw-only param "threaded" (default ``True``)
to ``run()`` to allow the run-in-the-current-thread operation.
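The boilerplate in question is small, as the following sketch suggests. ``FakeInterpreter`` is a hypothetical stand-in (it simply calls ``exec()`` in its own namespace) so that the example runs without the proposed module, and ``run_in_thread()`` is an illustrative helper name, not part of the proposal.

```python
import threading


class FakeInterpreter:
    """Hypothetical stand-in with a blocking run(), for illustration only."""

    def __init__(self):
        self.namespace = {}

    def run(self, source):
        # Blocks the calling thread until the code finishes.
        exec(source, self.namespace)


def run_in_thread(interp, source):
    """Start interp.run(source) in a new thread and return the thread."""
    t = threading.Thread(target=interp.run, args=(source,))
    t.start()
    return t


interp = FakeInterpreter()
t = run_in_thread(interp, "result = 6 * 7")
t.join()
print(interp.namespace["result"])
```

A "threaded" parameter on ``run()`` would fold a helper like this into the method itself.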
Rejected Ideas
==============