diff --git a/pep-0554.rst b/pep-0554.rst index 8f2925e89..2e1006468 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -8,7 +8,7 @@ Content-Type: text/x-rst Created: 2017-09-05 Python-Version: 3.9 Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017, - 09-May-2018, 20-Apr-2020 + 09-May-2018, 20-Apr-2020, 01-May-2020 Abstract @@ -69,7 +69,6 @@ At first only the following types will be supported for sharing: * bytes * str * int -* PEP 3118 buffer objects (via ``send_buffer()``) * PEP 554 channels Support for other basic types (e.g. bool, float, Ellipsis) will be added later. @@ -83,17 +82,17 @@ the `"interpreters" Module API`_ section below. For creating and using interpreters: -+----------------------------------+----------------------------------------------+ -| signature | description | -+==================================+==============================================+ -| ``list_all() -> [Interpreter]`` | Get all existing interpreters. | -+----------------------------------+----------------------------------------------+ -| ``get_current() -> Interpreter`` | Get the currently running interpreter. | -+----------------------------------+----------------------------------------------+ -| ``get_main() -> Interpreter`` | Get the main interpreter. | -+----------------------------------+----------------------------------------------+ -| ``create() -> Interpreter`` | Initialize a new (idle) Python interpreter. | -+----------------------------------+----------------------------------------------+ ++---------------------------------------------+----------------------------------------------+ +| signature | description | ++=============================================+==============================================+ +| ``list_all() -> [Interpreter]`` | Get all existing interpreters. | ++---------------------------------------------+----------------------------------------------+ +| ``get_current() -> Interpreter`` | Get the currently running interpreter. 
| ++---------------------------------------------+----------------------------------------------+ +| ``get_main() -> Interpreter`` | Get the main interpreter. | ++---------------------------------------------+----------------------------------------------+ +| ``create(*, isolated=True) -> Interpreter`` | Initialize a new (idle) Python interpreter. | ++---------------------------------------------+----------------------------------------------+ | @@ -104,6 +103,8 @@ For creating and using interpreters: +----------------------------------------+-----------------------------------------------------+ | ``.id`` | The interpreter's ID (read-only). | +----------------------------------------+-----------------------------------------------------+ +| ``.isolated`` | The interpreter's mode (read-only). | ++----------------------------------------+-----------------------------------------------------+ | ``.is_running() -> bool`` | Is the interpreter currently executing code? | +----------------------------------------+-----------------------------------------------------+ | ``.close()`` | Finalize and destroy the interpreter. | @@ -143,20 +144,12 @@ For sharing data between interpreters: +------------------------------------------+-----------------------------------------------+ | ``.id`` | The channel's unique ID. | +------------------------------------------+-----------------------------------------------+ -| ``.interpreters`` | The list of associated interpreters. | -+------------------------------------------+-----------------------------------------------+ | ``.recv() -> object`` | | Get the next object from the channel, | | | | and wait if none have been sent. | -| | | Associate the interpreter with the channel. | +------------------------------------------+-----------------------------------------------+ | ``.recv_nowait(default=None) -> object`` | | Like recv(), but return the default | | | | instead of waiting. 
| +------------------------------------------+-----------------------------------------------+ -| ``.release()`` | | No longer associate the current interpreter | -| | | with the channel (on the receiving end). | -+------------------------------------------+-----------------------------------------------+ -| ``.close(force=False)`` | | Close the channel in all interpreters. | -+------------------------------------------+-----------------------------------------------+ | @@ -167,26 +160,11 @@ For sharing data between interpreters: +------------------------------+--------------------------------------------------+ | ``.id`` | The channel's unique ID. | +------------------------------+--------------------------------------------------+ -| ``.interpreters`` | The list of associated interpreters. | -+------------------------------+--------------------------------------------------+ | ``.send(obj)`` | | Send the object (i.e. its data) to the | | | | receiving end of the channel and wait. | -| | | Associate the interpreter with the channel. | +------------------------------+--------------------------------------------------+ | ``.send_nowait(obj)`` | | Like send(), but return False if not received. | +------------------------------+--------------------------------------------------+ -| ``.send_buffer(obj)`` | | Send the object's (PEP 3118) buffer to the | -| | | receiving end of the channel and wait. | -| | | Associate the interpreter with the channel. | -+------------------------------+--------------------------------------------------+ -| ``.send_buffer_nowait(obj)`` | | Like send_buffer(), but return False | -| | | if not received. | -+------------------------------+--------------------------------------------------+ -| ``.release()`` | | No longer associate the current interpreter | -| | | with the channel (on the sending end). 
| -+------------------------------+--------------------------------------------------+ -| ``.close(force=False)`` | | Close the channel in all interpreters. | -+------------------------------+--------------------------------------------------+ | @@ -203,21 +181,26 @@ For sharing data between interpreters: +--------------------------+------------------------+------------------------------------------------+ | ``NotReceivedError`` | ``ChannelError`` | Nothing was waiting to receive a sent object. | +--------------------------+------------------------+------------------------------------------------+ -| ``ChannelClosedError`` | ``ChannelError`` | The channel is closed. | -+--------------------------+------------------------+------------------------------------------------+ -| ``ChannelReleasedError`` | ``ChannelClosedError`` | The channel is released (but not yet closed). | -+--------------------------+------------------------+------------------------------------------------+ -"Extending Python" Docs ------------------------ +Help for Extension Module Maintainers +------------------------------------- -Many extension modules do not support use in subinterpreters. The -authors and users of such extension modules will both benefit when they -are updated to support subinterpreters. To help with that, a new page -will be added to the `Extending Python `_ docs. +Many extension modules do not support use in subinterpreters yet. The +maintainers and users of such extension modules will both benefit when +they are updated to support subinterpreters. In the meantime users may +become confused by failures when using subinterpreters, which could +negatively impact extension maintainers. See `Concerns`_ below. -This page will explain how to implement PEP 489 support and how to move -from global module state to per-interpreter. 
+To mitigate that impact and accelerate compatibility, we will do the +following: + +* be clear that extension modules are *not* required to support use in + subinterpreters +* raise ``ImportError`` when an incompatible (no PEP 489 support) module + is imported in a subinterpreter +* provide resources (e.g. docs) to help maintainers reach compatibility +* reach out to the maintainers of Cython and of the most used extension + modules (on PyPI) to get feedback and possibly provide assistance Examples @@ -304,7 +287,6 @@ Synchronize using a channel interp.run(tw.dedent(""" reader.recv() print("during") - reader.release() """), shared=dict( reader=r, @@ -315,7 +297,6 @@ Synchronize using a channel t.start() print('after') s.send(b'') - s.release() Sharing a file descriptor ------------------------- @@ -366,7 +347,6 @@ Passing objects via marshal obj = marshal.loads(data) do_something(obj) data = reader.recv() - reader.release() """)) t = threading.Thread(target=run) t.start() @@ -396,7 +376,6 @@ Passing objects via pickle obj = pickle.loads(data) do_something(obj) data = reader.recv() - reader.release() """)) t = threading.Thread(target=run) t.start() @@ -564,6 +543,15 @@ at length in this PEP. Just to be clear, the value lies in:: * preparation for per-interpreter GIL * encourage experimentation +* "data sharing can have a negative impact on cache performance + in multi-core scenarios" + +(See [cache-line-ping-pong]_.) + +This shouldn't be a problem for now as we have no immediate plans +to actually share data between interpreters, instead focusing +on copying. + About Subinterpreters ===================== @@ -635,7 +623,6 @@ channels to the following: * bytes * str * int -* PEP 3118 buffer objects (via ``send_buffer()``) * channels Limiting the initial shareable types is a practical matter, reducing @@ -699,9 +686,14 @@ Provisional Status The new ``interpreters`` module will be added with "provisional" status (see PEP 411). 
This allows Python users to experiment with the feature and provide feedback while still allowing us to adjust to that feedback. -The module will be provisional in Python 3.8 and we will make a decision -before the 3.9 release whether to keep it provisional, graduate it, or -remove it. +The module will be provisional in Python 3.9 and we will make a decision +before the 3.10 release whether to keep it provisional, graduate it, or +remove it. This PEP will be updated accordingly. + +While the module is provisional, any changes to the API (or to behavior) +do not need to be reflected here, nor get approval by the BDFL-delegate. +However, such changes will still need to go through the normal processes +(BPO for smaller changes and python-dev/PEP for substantial ones). Alternate Python Implementations @@ -741,13 +733,14 @@ The module provides the following functions:: Return the main interpreter. If the Python implementation has no concept of a main interpreter then return None. - create() -> Interpreter + create(*, isolated=True) -> Interpreter Initialize a new Python interpreter and return it. The interpreter will be created in the current thread and will remain idle until something is run in it. The interpreter may be used in any thread and will run in whichever thread calls - ``interp.run()``. + ``interp.run()``. See "Interpreter Isolated Mode" below for + an explanation of the "isolated" parameter. The module also provides the following class:: @@ -756,7 +749,12 @@ The module also provides the following class:: id -> int: - The interpreter's ID (read-only). + The interpreter's ID. (read-only) + + isolated -> bool: + + Whether or not the interpreter is operating in "isolated" mode. + (read-only) is_running() -> bool: @@ -820,7 +818,6 @@ The module also provides the following class:: Supported code: source text. - Uncaught Exceptions ------------------- @@ -881,7 +878,7 @@ with unbuffered semantics). Python objects are not shared between interpreters. 
However, in some cases data those objects wrap is actually shared and not just copied. -One example is PEP 3118 buffers. In those cases the object in the +One example might be PEP 3118 buffers. In those cases the object in the original interpreter is kept alive until the shared data in the other interpreter is no longer used. Then object destruction can happen like normal in the original interpreter, along with the previously shared @@ -893,9 +890,7 @@ to channels:: create_channel() -> (RecvChannel, SendChannel): Create a new channel and return (recv, send), the RecvChannel - and SendChannel corresponding to the ends of the channel. The - lifetime of the channel is determined by associations between - intepreters and the channel's ends (see below). + and SendChannel corresponding to the ends of the channel. Both ends of the channel are supported "shared" objects (i.e. may be safely shared by different interpreters. Thus they @@ -917,13 +912,6 @@ The module also provides the following channel-related classes:: The channel's unique ID. This is shared with the "send" end. - interpreters => [Interpreter]: - - The list of interpreters associated with the "recv" end of - the channel. (See below for more on how interpreters are - associated with channels.) If the channel has been closed - then raise ChannelClosedError. - recv(): Return the next object from the channel. If none have been @@ -934,47 +922,12 @@ The module also provides the following channel-related classes:: though it could also be a compatible proxy. Regardless, it may use a copy of that data or actually share the data. - If the channel is already closed then raise ChannelClosedError. - If the channel isn't closed but the current interpreter already - called the "release()" method for the "recv" end then raise - ChannelReleasedError (which is a subclass of - ChannelClosedError). - recv_nowait(default=None): Return the next object from the channel. If none have been sent then return the default. 
Otherwise, this is the same as the "recv()" method. - release() -> bool: - - No longer associate the current interpreter with the channel - (on the "recv" end) and block any future association If the - interpreter was never associated with the channel then still - block any future association. The "send" end of the channel - is unaffected by a released "recv" end. - - Once an interpreter is no longer associated with the "recv" - end of the channel, any "recv()" and "recv_nowait()" calls - from that interpreter will fail (even ongoing calls). See - "recv()" for details. - - See below for more on how association relates to auto-closing - a channel. - - This operation is idempotent. Return True if "release()" - has not been called before by the current interpreter. - - close(force=False): - - Close both ends of the channel (in all interpreters). This - means that any further use of the channel anywhere raises - ChannelClosedError. If the channel is not empty then - raise ChannelNotEmptyError (if "force" is False) or - discard the remaining objects (if "force" is True) - and close it. Note that the behavior of closing - the "send" end is slightly different. - class SendChannel(id): @@ -986,21 +939,12 @@ The module also provides the following channel-related classes:: The channel's unique ID. This is shared with the "recv" end. - interpreters -> [Interpreter]: - - Like "RecvChannel.interpreters" but for the "send" end. - send(obj): Send the object (i.e. its data) to the "recv" end of the channel. Wait until the object is received. If the object is not shareable then ValueError is raised. - If this channel end was already released - by the interpreter then raise ChannelReleasedError. If - the channel is already closed then raise - ChannelClosedError. - send_nowait(obj): Send the object to the "recv" end of the channel. This @@ -1009,158 +953,88 @@ The module also provides the following channel-related classes:: other end) then queue the object and return False. 
Otherwise return True. - send_buffer(obj): +Channel Lifespan +---------------- - Send a MemoryView of the object rather than the object. - Otherwise this is the same as "send()". Note that the - object must implement the PEP 3118 buffer protocol. - The buffer will always be released in the original - interpreter, like normal. - - send_buffer_nowait(obj): - - Send a MemoryView of the object rather than the object. - If the other end is not currently receiving then return - False. Otherwise return True. - - release(): - - This is the same as "RecvChannel.release(), but applied - to the sending end of the channel. - - close(force=False): - - Close both ends of the channel (in all interpreters). No - matter what the "send" end of the channel is immediately - closed. If the channel is empty then close the "recv" - end immediately too. Otherwise, if "force" if False, - close the "recv" end (and hence the full channel) - once the channel becomes empty; or, if "force" - is True, discard the remaining items and - close immediately. - -Note that ``send_buffer()`` is similar to how -``multiprocessing.Connection`` works. [mp-conn]_ +A channel is automatically closed and destroyed once there are no more +Python objects (e.g. ``RecvChannel`` and ``SendChannel``) referring +to it. So closing is effectively triggered via garbage-collection of those +objects. -Channel Association -------------------- +.. _isolated-mode: -Each end (send/recv) of each channel is associated with a set of -interpreters. This association effectively means "the channel end -is available to that interpreter". It has ramifications on -introspection and on how channels are automatically closed. +Interpreter "Isolated" Mode +=========================== -When a channel is created, both ends are immediately associated with -the current interpreter. When a channel end is passed to an interpreter -via ``Interpreter.run(..., channels=...)`` then that interpreter is -associated with the channel end. 
Likewise when a channel end is sent -through another channel, the receiving interpreter is associated with -the sent channel end. +By default, every new interpreter created by ``interpreters.create()`` +has specific restrictions on any code it runs. This includes the +following: -A channel end is explicitly released by an interpreter through the -``release()`` method. It is also done automatically for an interpreter -when the last ``*Channel`` object for the end in that interpreter is -garbage-collected, as though ``release()`` were called. +* importing an extension module fails if it does not implement the + PEP 489 API +* new threads are not allowed (including daemon threads) +* ``os.fork()`` is not allowed (so no ``multiprocessing``) +* ``os.exec*()``, AKA "fork+exec", is not allowed (so no ``subprocess``) -Calling ``*Channel.close()`` automatically releases the channel in all -interpreters for both ends. +This represents the full "isolated" mode of subinterpreters. It is +applied when ``interpreters.create()`` is called with the "isolated" +keyword-only argument set to ``True`` (the default). If +``interpreters.create(isolated=False)`` is called then none of those +restrictions is applied. -Once the number of associated interpreters on both ends drops -to 0, the channel is actually closed. The Python runtime will -garbage-collect all closed channels, though it may not happen -immediately. +One advantage of this approach is that it allows extension maintainers +to check subinterpreter compatibility before they implement the PEP 489 +API. Also note that ``isolated=False`` represents the historical +behavior when using the existing subinterpreters C-API, thus providing +backward compatibility. For the existing C-API itself, the default +remains ``isolated=False``. The same is true for the "main" module, so +existing use of Python will not change. 
-Consequently, ``*Channel.interpreters`` means those to which the -channel end was sent, still hold a reference to the channel end, and -haven't called ``release()``. +We may choose to later loosen some of the above restrictions or provide +a way to enable/disable granular restrictions individually. Regardless, +requiring PEP 489 support from extension modules will always be a +default restriction. -Open Questions -============== +Documentation +============= -* add a "tp_share" type slot instead of using a global registry - for shareable types? +The new stdlib docs page for the ``interpreters`` module will include +the following: -* impact of data sharing on cache performance in multi-core scenarios? - (see [cache-line-ping-pong]_) +* (at the top) a clear note that subinterpreter support in extension + modules is not required +* some explanation about what subinterpreters are +* brief examples of how to use subinterpreters and channels +* a summary of the limitations of subinterpreters +* (for extension maintainers) a link to the resources for ensuring + subinterpreter compatibility +* much of the API information in this PEP -* strictly disallow subinterpreter import of extension modules without - PEP 489 support? +A separate page will be added to the docs for resources to help +extension maintainers ensure their modules can be used safely in +subinterpreters, under `Extending Python `. The page +will include the following information: -* add "isolated" mode to subinterpreters API? 
+* a summary about subinterpreters (similar to the same in the new + ``interpreters`` module page and in the C-API docs) +* an explanation of how extension modules can be impacted +* how to implement PEP 489 support +* how to move from global module state to per-interpreter +* how to take advantage of PEP 384 (heap types), PEP 3121 + (module state), and PEP 573 +* strategies for dealing with 3rd party C libraries that keep their + own subinterpreter-incompatible global state -There are various ways that an interpreter could potentially operate -in a more isolated/restricted way:: +Note that the documentation will play a large part in mitigating any +negative impact that the new ``interpreters`` module might have on +extension module maintainers. - * ImportError when importing ext. module without PEP 489 support - * no daemon threads - * no threads at all - * no multiprocessing - * ... - -This could be facilitated via settinga (separate or an int flag) on -the ``PyConfig`` struct on each ``PyInterpreterState``. (This would -require moving ``_PyInterpreterState_SetConfig()`` to the public C-API.) -By default the settings would all be False, for backward compatibility. - -The ``interpreters`` module, however, would likely use a more -restrictive default (e.g. always require PEP 489 support). This would -effectively be the "isolated" mode. It would make sense to add an arg -to ``interpreters.create()`` to disable "isolated" mode (at least the -PEP 489 part), since then extension authors could test their modules -under subinterpreters (without having to release a potentially broken -build with PEP 489 support). - -* add a shareable synchronization primitive? - -This would be ``_threading.Lock`` (or something like it) where -interpreters would actually share the underlying mutex. This would -provide much better efficiency than blocking channel ops. The main -concern is that locks and channels don't mix well (as learned in Go). 
- -* also track which interpreters are using a channel end? - -* auto-run in a thread? - -The PEP proposes a hard separation between subinterpreters and threads: -if you want to run in a thread you must create the thread yourself and -call ``run()`` in it. However, it might be convenient if ``run()`` -could do that for you, meaning there would be less boilerplate. - -Furthermore, we anticipate that users will want to run in a thread much -more often than not. So it would make sense to make this the default -behavior. We would add a kw-only param "threaded" (default ``True``) -to ``run()`` to allow the run-in-the-current-thread operation. - -* what to do about BaseException propagation? - -The exception types that inherit from ``BaseException`` (aside from -``Exception``) are usually treated specially. These types are: -``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``. It may -make sense to treat them specially when it comes to propagation from -``run()``. Here are some options:: - - * propagate like normal via RunFailedError - * do not propagate (handle them somehow in the subinterpreter) - * propagate them directly (avoid RunFailedError) - * propagate them directly (set RunFailedError as __cause__) - - -TODO -====== - -* add a more detailed description of channel lifespan - -A state machine diagram may be most effective. Relevant questions: - - * How does an interpreter detach from the receiving end of a channel - that is never empty? - * What happens if an interpreter deletes the last reference to a - non-empty channel? - * On the receiving end, or on the sending end? - -* run the CPython test suite in a subinterpreter and see what shakes out +Also, the ``ImportError`` for incompatible extension modules will have +a message that clearly says it is due to missing subinterpreter +compatibility and that extensions are not required to provide it. This +will help set user expectations properly. Deferred Functionality @@ -1476,6 +1350,86 @@ code (i.e. 
a script). This is equivalent to ``PyRun_StringFlags()``, ``exec()``, or a module body. None of those "return" anything. We can revisit this once ``run()`` supports functions, etc. +Add a "tp_share" type slot +-------------------------- + +This would replace the current global registry for shareable types. + +Expose which interpreters have actually *used* a channel end. +------------------------------------------------------------- + +Currently we associate interpreters upon access to a channel. We would +keep a separate association list for "upon use" and expose that. + +Add a shareable synchronization primitive +----------------------------------------- + +This would be ``_threading.Lock`` (or something like it) where +interpreters would actually share the underlying mutex. This would +provide much better efficiency than blocking channel ops. The main +concern is that locks and channels don't mix well (as learned in Go). + +Note that the same functionality as a lock can be achieved by passing +some sort of "token" object through a channel. "send()" would be +equivalent to releasing the lock and "recv()" to acquiring the lock. + +If it proves desirable, we can add this later without much trouble. + +Propagate SystemExit and KeyboardInterrupt Differently +------------------------------------------------------ + +The exception types that inherit from ``BaseException`` (aside from +``Exception``) are usually treated specially. These types are: +``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``. It may +make sense to treat them specially when it comes to propagation from +``run()``. Here are some options:: + + * propagate like normal via RunFailedError + * do not propagate (handle them somehow in the subinterpreter) + * propagate them directly (avoid RunFailedError) + * propagate them directly (set RunFailedError as __cause__) + +We aren't going to worry about handling them differently. 
Threads +already ignore ``SystemExit``, so for now we will follow that pattern. + +Add an explicit release() and close() to channel end classes +------------------------------------------------------------ + +It can be convenient to have an explicit way to close a channel against +further global use. Likewise it could be useful to have an explicit +way to release one of the channel ends relative to the current +interpreter. Among other reasons, such a mechanism is useful for +communicating overall state between interpreters without the extra +boilerplate that passing objects through a channel directly would +require. + +The challenge is getting automatic release/close right without making +it hard to understand. This is especially true when dealing with a +non-empty channel. We should be able to get by without release/close +for now. + +Add SendChannel.send_buffer() +----------------------------- + +This method would allow no-copy sending of an object through a channel +if it supports the PEP 3118 buffer protocol (e.g. memoryview). + +Support for this is not fundamental to channels and can be added on +later without much disruption. + +Auto-run in a thread +-------------------- + +The PEP proposes a hard separation between subinterpreters and threads: +if you want to run in a thread you must create the thread yourself and +call ``run()`` in it. However, it might be convenient if ``run()`` +could do that for you, meaning there would be less boilerplate. + +Furthermore, we anticipate that users will want to run in a thread much +more often than not. So it would make sense to make this the default +behavior. We would add a kw-only param "threaded" (default ``True``) +to ``run()`` to allow the run-in-the-current-thread operation. + Rejected Ideas ==============