1070 lines
46 KiB
Plaintext
1070 lines
46 KiB
Plaintext
PEP: 3156
|
||
Title: Asynchronous IO Support Rebooted
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Guido van Rossum <guido@python.org>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 12-Dec-2012
|
||
Post-History: 21-Dec-2012
|
||
|
||
Abstract
|
||
========
|
||
|
||
This is a proposal for asynchronous I/O in Python 3, starting with
|
||
Python 3.3. Consider this the concrete proposal that is missing from
|
||
PEP 3153. The proposal includes a pluggable event loop API, transport
|
||
and protocol abstractions similar to those in Twisted, and a
|
||
higher-level scheduler based on ``yield from`` (PEP 380). A reference
|
||
implementation is in the works under the code name tulip.
|
||
|
||
|
||
Introduction
|
||
============
|
||
|
||
The event loop is the place where most interoperability occurs. It
|
||
should be easy for (Python 3.3 ports of) frameworks like Twisted,
|
||
Tornado, or ZeroMQ to either adapt the default event loop
|
||
implementation to their needs using a lightweight wrapper or proxy, or
|
||
to replace the default event loop implementation with an adaptation of
|
||
their own event loop implementation. (Some frameworks, like Twisted,
|
||
have multiple event loop implementations. This should not be a
|
||
problem since these all have the same interface.)
|
||
|
||
It should even be possible for two different third-party frameworks to
|
||
interoperate, either by sharing the default event loop implementation
|
||
(each using its own adapter), or by sharing the event loop
|
||
implementation of either framework. In the latter case two levels of
|
||
adaptation would occur (from framework A's event loop to the standard
|
||
event loop interface, and from there to framework B's event loop).
|
||
Which event loop implementation is used should be under control of the
|
||
main program (though a default policy for event loop selection is
|
||
provided).
|
||
|
||
Thus, two separate APIs are defined:
|
||
|
||
- getting and setting the current event loop object
|
||
- the interface of a conforming event loop and its minimum guarantees
|
||
|
||
An event loop implementation may provide additional methods and
|
||
guarantees.
|
||
|
||
The event loop interface does not depend on ``yield from``. Rather, it
|
||
uses a combination of callbacks, additional interfaces (transports and
|
||
protocols), and Futures. The latter are similar to those defined in
|
||
PEP 3148, but have a different implementation and are not tied to
|
||
threads. In particular, they have no wait() method; the user is
|
||
expected to use callbacks.
|
||
|
||
For users (like myself) who don't like using callbacks, a scheduler is
|
||
provided for writing asynchronous I/O code as coroutines using the PEP
|
||
380 ``yield from`` expressions. The scheduler is not pluggable;
|
||
pluggability occurs at the event loop level, and the scheduler should
|
||
work with any conforming event loop implementation.
|
||
|
||
For interoperability between code written using coroutines and other
|
||
async frameworks, the scheduler has a Task class that behaves like a
|
||
Future. A framework that interoperates at the event loop level can
|
||
wait for a Future to complete by adding a callback to the Future.
|
||
Likewise, the scheduler offers an operation to suspend a coroutine
|
||
until a callback is called.
|
||
|
||
Limited interoperability with threads is provided by the event loop
|
||
interface; there is an API to submit a function to an executor (see
|
||
PEP 3148) which returns a Future that is compatible with the event
|
||
loop.
|
||
|
||
|
||
Non-goals
|
||
=========
|
||
|
||
Interoperability with systems like Stackless Python or
|
||
greenlets/gevent is not a goal of this PEP.
|
||
|
||
|
||
Specification
|
||
=============
|
||
|
||
Dependencies
|
||
------------
|
||
|
||
Python 3.3 is required. No new language or standard library features
|
||
beyond Python 3.3 are required. No third-party modules or packages
|
||
are required.
|
||
|
||
Module Namespace
|
||
----------------
|
||
|
||
The specification here will live in a new toplevel package. Different
|
||
components will live in separate submodules of that package. The
|
||
package will import common APIs from their respective submodules and
|
||
make them available as package attributes (similar to the way the
|
||
email package works).
|
||
|
||
The name of the toplevel package is currently unspecified. The
|
||
reference implementation uses the name 'tulip', but the name will
|
||
change to something more boring if and when the implementation is
|
||
moved into the standard library (hopefully for Python 3.4).
|
||
|
||
Until the boring name is chosen, this PEP will use 'tulip' as the
|
||
toplevel package name. Classes and functions given without a module
|
||
name are assumed to be accessed via the toplevel package.
|
||
|
||
Event Loop Policy: Getting and Setting the Event Loop
|
||
-----------------------------------------------------
|
||
|
||
To get the current event loop, use ``get_event_loop()``. This returns
|
||
an instance of the ``EventLoop`` class defined below or an equivalent
|
||
object. It is possible that ``get_event_loop()`` returns a different
|
||
object depending on the current thread, or depending on some other
|
||
notion of context.
|
||
|
||
To set the current event loop, use ``set_event_loop(event_loop)``,
|
||
where ``event_loop`` is an instance of the ``EventLoop`` class or
|
||
equivalent. This uses the same notion of context as
|
||
``get_event_loop()``.
|
||
|
||
For the benefit of unit tests and other special cases there's a third
|
||
policy function: ``new_event_loop()``, which creates and returns a new
|
||
EventLoop instance according to the policy's default rules. To make
|
||
this the current event loop, you must call ``set_event_loop()``.
|
||
|
||
To change the way the above three functions work
|
||
(including their notion of context), call
|
||
``set_event_loop_policy(policy)``, where ``policy`` is an event loop
|
||
policy object. The policy object can be any object that has methods
|
||
``get_event_loop()``, ``set_event_loop(event_loop)``
|
||
and ``new_event_loop()`` behaving like
|
||
the functions described above. The default event loop policy is an
|
||
instance of the class ``DefaultEventLoopPolicy``. The current event loop
|
||
policy object can be retrieved by calling ``get_event_loop_policy()``.
|
||
|
||
An event loop policy may but does not have to enforce that there is
|
||
only one event loop in existence. The default event loop policy does
|
||
not enforce this, but it does enforce that there is only one event
|
||
loop per thread.
|
||
|
||
Event Loop Interface
|
||
--------------------
|
||
|
||
A note about times: as usual in Python, all timeouts, intervals and
|
||
delays are measured in seconds, and may be ints or floats. The
|
||
accuracy and precision of the clock are up to the implementation; the
|
||
default implementation uses ``time.monotonic()``.
|
||
|
||
A note about callbacks and Handlers: any function that takes a
|
||
callback and a variable number of arguments for it can also be given a
|
||
Handler object instead of the callback. Then no arguments should be
|
||
given, and the Handler should represent an immediate callback (as
|
||
returned from ``call_soon()``), not a delayed callback (as returned
|
||
from ``call_later()``). If the Handler is already cancelled, the
|
||
call is a no-op.
|
||
|
||
A conforming event loop object has the following methods:
|
||
|
||
- ``run()``. Runs the event loop until there is nothing left to do.
|
||
This means, in particular:
|
||
|
||
- No more calls scheduled with ``call_later()``,
|
||
``call_repeatedly()``, ``call_soon()``, or
|
||
``call_soon_threadsafe()``, except for cancelled calls.
|
||
|
||
- No more registered file descriptors. It is up to the registering
|
||
party to unregister a file descriptor when it is closed.
|
||
|
||
Note: ``run()`` blocks until the termination condition is met,
|
||
or until ``stop()`` is called.
|
||
|
||
Note: if you schedule a call with ``call_repeatedly()``, ``run()``
|
||
will not exit until you cancel it.
|
||
|
||
TBD: How many variants of this do we really need?
|
||
|
||
- ``run_forever()``. Runs the event loop until ``stop()`` is called.
|
||
|
||
- ``run_until_complete(future, timeout=None)``. Runs the event loop
|
||
until the Future is done. If a timeout is given, it waits at most
|
||
that long. If the Future is done, its result is returned, or its
|
||
exception is raised; if the timeout expires before the Future is
|
||
done, or if ``stop()`` is called, ``TimeoutError`` is raised (but
|
||
the Future is not cancelled). This cannot be called when the event
|
||
loop is already running.
|
||
|
||
Note: This API is most useful for tests and the like. It should not
|
||
be used as a substitute for ``yield from future`` or other ways to
|
||
wait for a Future (e.g. registering a done callback).
|
||
|
||
- ``run_once(timeout=None)``. Run the event loop for a little while.
|
||
If a timeout is given, an I/O poll made will block at most that
|
||
long; otherwise, an I/O poll is not constrained in time.
|
||
|
||
Note: Exactlly how much work this does is up to the implementation.
|
||
One constraint: if a callback immediately schedules itself using
|
||
``call_soon()``, causing an infinite loop, ``run_once()`` should
|
||
still return.
|
||
|
||
- ``stop()``. Stops the event loop as soon as it is convenient. It
|
||
is fine to restart the loop with ``run()`` (or one of its variants)
|
||
subsequently.
|
||
|
||
Note: How soon exactly is up to the implementation. All immediate
|
||
callbacks that were already scheduled to run before ``stop()`` is
|
||
called must still be run, but callbacks scheduled after it is called
|
||
(or scheduled to be run later) will not be run.
|
||
|
||
- ``close()``. Closes the event loop, releasing any resources it may
|
||
hold, such as the file descriptor used by ``epoll()`` or
|
||
``kqueue()``. This should not be called while the event loop is
|
||
running. It may be called multiple times.
|
||
|
||
- ``call_later(delay, callback, *args)``. Arrange for
|
||
``callback(*args)`` to be called approximately ``delay`` seconds in
|
||
the future, once, unless cancelled. Returns
|
||
a ``Handler`` object representing the callback, whose
|
||
``cancel()`` method can be used to cancel the callback.
|
||
|
||
- ``call_repeatedly(interval, callback, **args)``. Like ``call_later()``
|
||
but calls the callback repeatedly, every ``interval`` seconds,
|
||
until the ``Handler`` returned is cancelled. The first call is in
|
||
``interval`` seconds.
|
||
|
||
- ``call_soon(callback, *args)``. Equivalent to ``call_later(0,
|
||
callback, *args)``.
|
||
|
||
- ``call_soon_threadsafe(callback, *args)``. Like
|
||
``call_soon(callback, *args)``, but when called from another thread
|
||
while the event loop is blocked waiting for I/O, unblocks the event
|
||
loop. This is the *only* method that is safe to call from another
|
||
thread. (To schedule a callback for a
|
||
later time in a threadsafe manner, you can use
|
||
``ev.call_soon_threadsafe(ev.call_later, when, callback, *args)``.)
|
||
This is not safe to call from a signal handler (since it may use
|
||
locks).
|
||
|
||
- ``add_signal_handler(sig, callback, *args). Whenever signal ``sig``
|
||
is received, arrange for ``callback(*args)`` to be called. Returns
|
||
a ``Handler`` which can be used to cancel the signal callback.
|
||
(Cancelling the handler causes ``remove_signal_handler()`` to be
|
||
called the next time the signal arrives. Explicitly calling
|
||
``remove_signal_handler()`` is preferred.)
|
||
Specifying another callback for the same signal replaces the
|
||
previous handler (only one handler can be active per signal). The
|
||
``sig`` must be a valid sigal number defined in the ``signal``
|
||
module. If the signal cannot be handled this raises an exception:
|
||
``ValueError`` if it is not a valid signal or if it is an uncatchable
|
||
signale (e.g. ``SIGKILL``), ``RuntimeError`` if this particular event
|
||
loop instance cannot handle signals (since signals are global per
|
||
process, only an event loop associated with the main thread can
|
||
handle signals).
|
||
|
||
- ``remove_signal_handler(sig)``. Removes the handler for signal
|
||
``sig``, if one is set. Raises the same exceptions as
|
||
``add_signal_handler()`` (except that it may return ``False``
|
||
instead raising ``RuntimeError`` for uncatchable signals). Returns
|
||
``True``e\ if a handler was removed successfully, ``False`` if no
|
||
handler was set.
|
||
|
||
Some methods in the standard conforming interface return Futures:
|
||
|
||
- ``wrap_future(future)``. This takes a PEP 3148 Future (i.e., an
|
||
instance of ``concurrent.futures.Future``) and returns a Future
|
||
compatible with the event loop (i.e., a ``tulip.Future`` instance).
|
||
|
||
- ``run_in_executor(executor, callback, *args)``. Arrange to call
|
||
``callback(*args)`` in an executor (see PEP 3148). Returns a Future
|
||
whose result on success is the return value that call. This is
|
||
equivalent to ``wrap_future(executor.submit(callback, *args))``. If
|
||
``executor`` is ``None``, a default ``ThreadPoolExecutor`` with 5
|
||
threads is used.
|
||
|
||
- ``set_default_executor(executor)``. Set the default executor used
|
||
by ``run_in_executor()``.
|
||
|
||
- ``getaddrinfo(host, port, family=0, type=0, proto=0, flags=0)``.
|
||
Similar to the ``socket.getaddrinfo()`` function but returns a
|
||
Future. The Future's result on success will be a list of the same
|
||
format as returned by ``socket.getaddrinfo()``. The default
|
||
implementation calls ``socket.getaddrinfo()`` using
|
||
``run_in_executor()``, but other implementations may choose to
|
||
implement their own DNS lookup. The optional arguments *must* be
|
||
specified as keyword arguments.
|
||
|
||
- ``getnameinfo(sockaddr, flags=0)``. Similar to
|
||
``socket.getnameinfo()`` but returns a Future. The Future's result
|
||
on success will be a tuple ``(host, port)``. Same implementation
|
||
remarks as for ``getaddrinfo()``.
|
||
|
||
- ``create_connection(protocol_factory, host, port, **kwargs)``.
|
||
Creates a stream connection to a given host and port. This creates
|
||
an implementation-dependent Transport to represent the connection,
|
||
then calls ``protocol_factory()`` to instantiate (or retrieve) the
|
||
user's Protocol implementation, and finally ties the two together.
|
||
(See below for the definitions of Transport and Protocol.) The
|
||
user's Protocol implementation is created or retrieved by calling
|
||
``protocol_factory()`` without arguments(*). The return value is a
|
||
Future whose result on success is the ``(transport, protocol)``
|
||
pair; if a failure prevents the creation of a successful connection,
|
||
the Future will have an appropriate exception set. Note that when
|
||
the Future completes, the protocol's ``connection_made()`` method
|
||
has not yet been called; that will happen when the connection
|
||
handshake is complete.
|
||
|
||
(*) There is no requirement that ``protocol_factory`` is a class.
|
||
If your protocol class needs to have specific arguments passed to
|
||
its constructor, you can use ``lambda`` or ``functools.partial()``.
|
||
You can also pass a trivial ``lambda`` that returns a previously
|
||
constructed Protocol instance.
|
||
|
||
Optional keyword arguments:
|
||
|
||
- ``family``, ``proto``, ``flags``: Address familty,
|
||
protcol, and miscellaneous flags to be passed through
|
||
to ``getaddrinfo()``. These all default to ``0``.
|
||
(The socket type is always ``SOCK_STREAM``.)
|
||
|
||
- ``ssl``: Pass ``True`` to create an SSL transport (by default a
|
||
plain TCP is created). Or pass an ``ssl.SSLContext`` object to
|
||
override the default SSL context object to be used.
|
||
|
||
- ``start_serving(protocol_factory, host, port, **kwds)``. Enters a
|
||
loop that accepts connections. Returns a Future that completes once
|
||
the loop is set up to serve; its return value is None. Each time a
|
||
connection is accepted, ``protocol_factory`` is called without
|
||
arguments(*) to create a Protocol, a Transport is created to represent
|
||
the network side of the connection, and the two are tied together by
|
||
calling ``protocol.connection_made(transport)``.
|
||
|
||
(*) See footnote above for ``create_connection()``. However, since
|
||
``protocol_factory()`` is called once for each new incoming
|
||
connection, it is recommended that it return a new Protocol object
|
||
each time it is called.
|
||
|
||
Optional keyword arguments:
|
||
|
||
- ``family``, ``proto``, ``flags``: Address familty,
|
||
protcol, and miscellaneous flags to be passed through
|
||
to ``getaddrinfo()``. These all default to ``0``.
|
||
(The socket type is always ``SOCK_STREAM``.)
|
||
|
||
TBD: Support SSL? I don't even know how to do that synchronously,
|
||
and I suppose it needs a certificate.
|
||
|
||
TBD: Maybe make the Future's result an object that can be used to
|
||
control the serving loop, e.g. to stop serving, abort all active
|
||
connections, and (if supported) adjust the backlog or other
|
||
parameters? It could also have an API to inquire about active
|
||
connections. Alternatively, return a Future (subclass?) that only
|
||
completes if the loop stops serving due to an error, or if it cannot
|
||
be started? Cancelling it might stop the loop.
|
||
|
||
TBD: Some platforms may not be interested in implementing all of
|
||
these, e.g. start_serving() may be of no interest to mobile apps.
|
||
(Although, there's a Minecraft server on my iPad...)
|
||
|
||
The following methods for registering callbacks for file descriptors
|
||
are optional. If they are not implemented, accessing the method
|
||
(without calling it) returns AttributeError. The default
|
||
implementation provides them but the user normally doesn't use these
|
||
directly -- they are used by the transport implementations
|
||
exclusively. Also, on Windows these may be present or not depending
|
||
on whether a select-based or IOCP-based event loop is used. These
|
||
take integer file descriptors only, not objects with a fileno()
|
||
method. The file descriptor should represent something pollable --
|
||
i.e. no disk files.
|
||
|
||
- ``add_reader(fd, callback, *args)``. Arrange for
|
||
``callback(*args)`` to be called whenever file descriptor ``fd`` is
|
||
ready for reading. Returns a ``Handler`` object which can be
|
||
used to cancel the callback. Note that, unlike ``call_later()``,
|
||
the callback may be called many times. Calling ``add_reader()``
|
||
again for the same file descriptor implicitly cancels the previous
|
||
callback for that file descriptor. Note: cancelling the handler
|
||
may be delayed until the handler would be called. If you plan to
|
||
close ``fd``, you should use ``remove_reader(fd)`` instead.
|
||
(TBD: Change this to raise an exception if a handler
|
||
is already set.)
|
||
|
||
- ``add_writer(fd, callback, *args)``. Like ``add_reader()``,
|
||
but registers the callback for writing instead of for reading.
|
||
|
||
- ``remove_reader(fd)``. Cancels the current read callback for file
|
||
descriptor ``fd``, if one is set. A no-op if no callback is
|
||
currently set for the file descriptor. (The reason for providing
|
||
this alternate interface is that it is often more convenient to
|
||
remember the file descriptor than to remember the ``Handler``
|
||
object.) Returns ``True`` if a handler was removed, ``False``
|
||
if not.
|
||
|
||
- ``remove_writer(fd)``. This is to ``add_writer()`` as
|
||
``remove_reader()`` is to ``add_reader()``.
|
||
|
||
TBD: What about multiple callbacks per fd? The current semantics is
|
||
that ``add_reader()/add_writer()`` replace a previously registered
|
||
callback. Change this to raise an exception if a callback is already
|
||
registered.
|
||
|
||
The following methods for doing async I/O on sockets are optional.
|
||
They are alternative to the previous set of optional methods, intended
|
||
for transport implementations on Windows using IOCP (if the event loop
|
||
supports it). The socket argument has to be a non-blocking socket.
|
||
|
||
- ``sock_recv(sock, n)``. Receive up to ``n`` bytes from socket
|
||
``sock``. Returns a Future whose result on success will be a
|
||
bytes object on success.
|
||
|
||
- ``sock_sendall(sock, data)``. Send bytes ``data`` to the socket
|
||
``sock``. Returns a Future whose result on success will be
|
||
``None``. (TBD: Is it better to emulate ``sendall()`` or ``send()``
|
||
semantics? I think ``sendall()`` -- but perhaps it should still
|
||
be *named* ``send()``?)
|
||
|
||
- ``sock_connect(sock, address)``. Connect to the given address.
|
||
Returns a Future whose result on success will be ``None``.
|
||
|
||
- ``sock_accept(sock)``. Accept a connection from a socket. The
|
||
socket must be in listening mode and bound to an address. Returns a
|
||
Future whose result on success will be a tuple ``(conn, peer)``
|
||
where ``conn`` is a connected non-blocking socket and ``peer`` is
|
||
the peer address. (TBD: People tell me that this style of API is
|
||
too slow for high-volume servers. So there's also
|
||
``start_serving()`` above. Then do we still need this?)
|
||
|
||
TBD: Optional methods are not so good. Perhaps these should be
|
||
required? It may still depend on the platform which set is more
|
||
efficient. Another possibility: document these as "for transports
|
||
only" and the rest as "for anyone".
|
||
|
||
Callback Sequencing
|
||
-------------------
|
||
|
||
When two callbacks are scheduled for the same time, they are run
|
||
in the order in which they are registered. For example::
|
||
|
||
ev.call_soon(foo)
|
||
ev.call_soon(bar)
|
||
|
||
guarantees that ``foo()`` is called before ``bar()``.
|
||
|
||
If ``call_soon()`` is used, this guarantee is true even if the system
|
||
clock were to run backwards. This is also the case for
|
||
``call_later(0, callback, *args)``. However, if ``call_later()`` is
|
||
used with a nonzero delay, all bets are off if the system
|
||
clock were to runs backwards. (A good event loop implementation
|
||
should use ``time.monotonic()`` to avoid problems when the clock runs
|
||
backward. See PEP 418.)
|
||
|
||
Context
|
||
-------
|
||
|
||
All event loops have a notion of context. For the default event loop
|
||
implementation, the context is a thread. An event loop implementation
|
||
should run all callbacks in the same context. An event loop
|
||
implementation should run only one callback at a time, so callbacks
|
||
can assume automatic mutual exclusion with other callbacks scheduled
|
||
in the same event loop.
|
||
|
||
Exceptions
|
||
----------
|
||
|
||
There are two categories of exceptions in Python: those that derive
|
||
from the ``Exception`` class and those that derive from
|
||
``BaseException``. Exceptions deriving from ``Exception`` will
|
||
generally be caught and handled appropriately; for example, they will
|
||
be passed through by Futures, and they will be logged and ignored when
|
||
they occur in a callback.
|
||
|
||
However, exceptions deriving only from ``BaseException`` are never
|
||
caught, and will usually cause the program to terminate with a
|
||
traceback. (Examples of this category include ``KeyboardInterrupt``
|
||
and ``SystemExit``; it is usually unwise to treat these the same as
|
||
most other exceptions.)
|
||
|
||
The Handler Class
|
||
-----------------
|
||
|
||
The various methods for registering callbacks (e.g. ``call_later()``)
|
||
all return an object representing the registration that can be used to
|
||
cancel the callback. For want of a better name this object is called
|
||
a ``Handler``, although the user never needs to instantiate
|
||
instances of this class. There is one public method:
|
||
|
||
- ``cancel()``. Attempt to cancel the callback.
|
||
TBD: Exact specification.
|
||
|
||
Read-only public attributes:
|
||
|
||
- ``callback``. The callback function to be called.
|
||
|
||
- ``args``. The argument tuple with which to call the callback function.
|
||
|
||
- ``cancelled``. True if ``cancel()`` has been called.
|
||
|
||
Note that some callbacks (e.g. those registered with ``call_later()``)
|
||
are meant to be called only once. Others (e.g. those registered with
|
||
``add_reader()``) are meant to be called multiple times.
|
||
|
||
TBD: An API to call the callback (encapsulating the exception handling
|
||
necessary)? Should it record how many times it has been called?
|
||
Maybe this API should just be ``__call__()``? (But it should suppress
|
||
exceptions.)
|
||
|
||
TBD: Public attribute recording the realtime value when the callback
|
||
is scheduled? (Since this is needed anyway for storing it in a heap.)
|
||
|
||
Futures
|
||
-------
|
||
|
||
The ``tulip.Future`` class here is intentionally similar to the
|
||
``concurrent.futures.Future`` class specified by PEP 3148, but there
|
||
are slight differences. Whenever this PEP talks about Futures or
|
||
futures this should be understood to refer to ``tulip.Future`` unless
|
||
``concurrent.futures.Future`` is explicitly mentioned. The supported
|
||
public API is as follows, indicating the differences with PEP 3148:
|
||
|
||
- ``cancel()``. If the Future is already done (or cancelled), return
|
||
``False``. Otherwise, change the Future's state to cancelled (this
|
||
implies done), schedule the callbacks, and return ``True``.
|
||
|
||
- ``cancelled()``. Returns ``True`` if the Future was cancelled.
|
||
|
||
- ``running()``. Always returns ``False``. Difference with PEP 3148:
|
||
there is no "running" state.
|
||
|
||
- ``done()``. Returns ``True`` if the Future is done. Note that a
|
||
cancelled Future is considered done too (here and everywhere).
|
||
|
||
- ``result()``. Returns the result set with ``set_result()``, or
|
||
raises the exception set with ``set_exception()``. Raises
|
||
``CancelledError`` if cancelled. Difference with PEP 3148: This has
|
||
no timeout argument and does *not* wait; if the future is not yet
|
||
done, it raises an exception.
|
||
|
||
- ``exception()``. Returns the exception if set with
|
||
``set_exception()``, or ``None`` if a result was set with
|
||
``set_result()``. Raises ``CancelledError`` if cancelled.
|
||
Difference with PEP 3148: This has no timeout argument and does
|
||
*not* wait; if the future is not yet done, it raises an exception.
|
||
|
||
- ``add_done_callback(fn)``. Add a callback to be run when the Future
|
||
becomes done (or is cancelled). If the Future is already done (or
|
||
cancelled), schedules the callback to using ``call_soon()``.
|
||
Difference with PEP 3148: The callback is never called immediately,
|
||
and always in the context of the caller. (Typically, a context is a
|
||
thread.) You can think of this as calling the callback through
|
||
``call_soon()``. Note that the callback (unlike all other callbacks
|
||
defined in this PEP, and ignoring the convention from the section
|
||
"Callback Style" below) is always called with a single argument, the
|
||
Future object, and should not be a Handler object.
|
||
|
||
- ``set_result(result)``. The Future must not be done (nor cancelled)
|
||
already. This makes the Future done and schedules the callbacks.
|
||
Difference with PEP 3148: This is a public API.
|
||
|
||
- ``set_exception(exception)``. The Future must not be done (nor
|
||
cancelled) already. This makes the Future done and schedules the
|
||
callbacks. Difference with PEP 3148: This is a public API.
|
||
|
||
The internal method ``set_running_or_notify_cancel()`` is not
|
||
supported; there is no way to set the running state.
|
||
|
||
The following exceptions are defined:
|
||
|
||
- ``InvalidStateError``. Raised whenever the Future is not in a state
|
||
acceptable to the method being called (e.g. calling ``set_result()``
|
||
on a Future that is already done, or calling ``result()`` on a Future
|
||
that is not yet done).
|
||
|
||
- ``InvalidTimeoutError``. Raised by ``result()`` and ``exception()``
|
||
when a nonzero ``timeout`` argument is given.
|
||
|
||
- ``CancelledError``. An alias for
|
||
``concurrent.futures.CancelledError``. Raised when ``result()`` or
|
||
``exception()`` is called on a Future that is cancelled.
|
||
|
||
- ``TimeoutError``. An alias for ``concurrent.futures.TimeoutError``.
|
||
May be raised by ``EventLoop.run_until_complete()``.
|
||
|
||
A Future is associated with the default event loop when it is created.
|
||
(TBD: Optionally pass in an alternative event loop instance?)
|
||
|
||
A ``tulip.Future`` object is not acceptable to the ``wait()`` and
|
||
``as_completed()`` functions in the ``concurrent.futures`` package.
|
||
However, there are similar APIs ``tulip.wait()`` and
|
||
``tulip.as_completed()``, described below.
|
||
|
||
A ``tulip.Future`` object is acceptable to a ``yield from`` expression
|
||
when used in a coroutine. This is implemented through the
|
||
``__iter__()`` interface on the Future. See the section "Coroutines
|
||
and the Scheduler" below.
|
||
|
||
When a Future is garbage-collected, if it has an associated exception
|
||
but neither ``result()`` nor ``exception()`` nor ``__iter__()`` has
|
||
ever been called (or the latter hasn't raised the exception yet --
|
||
details TBD), the exception should be logged. TBD: At what level?
|
||
|
||
In the future (pun intended) we may unify ``tulip.Future`` and
|
||
``concurrent.futures.Future``, e.g. by adding an ``__iter__()`` method
|
||
to the latter that works with ``yield from``. To prevent accidentally
|
||
blocking the event loop by calling e.g. ``result()`` on a Future
|
||
that's not don yet, the blocking operation may detect that an event
|
||
loop is active in the current thread and raise an exception instead.
|
||
However the current PEP strives to have no dependencies beyond Python
|
||
3.3, so changes to ``concurrent.futures.Future`` are off the table for
|
||
now.
|
||
|
||
Transports
|
||
----------
|
||
|
||
A transport is an abstraction on top of a socket or something similar
|
||
(for example, a UNIX pipe or an SSL connection). Transports are
|
||
strongly influenced by Twisted and PEP 3153. Users rarely implement
|
||
or instantiate transports -- rather, event loops offer utility methods
|
||
to set up transports.
|
||
|
||
Transports work in conjunction with protocols. Protocols are
|
||
typically written without knowing or caring about the exact type of
|
||
transport used, and transports can be used with a wide variety of
|
||
protocols. For example, an HTTP client protocol implementation may be
|
||
used with either a plain socket transport or an SSL transport. The
|
||
plain socket transport can be used with many different protocols
|
||
besides HTTP (e.g. SMTP, IMAP, POP, FTP, IRC, SPDY).
|
||
|
||
Most connections have an asymmetric nature: the client and server
|
||
usually have very different roles and behaviors. Hence, the interface
|
||
between transport and protocol is also asymmetric. From the
|
||
protocol's point of view, *writing* data is done by calling the
|
||
``write()`` method on the transport object; this buffers the data and
|
||
returns immediately. However, the transport takes a more active role
|
||
in *reading* data: whenever some data is read from the socket (or
|
||
other data source), the transport calls the protocol's
|
||
``data_received()`` method.
|
||
|
||
Transports have the following public methods:
|
||
|
||
- ``write(data)``. Write some bytes. The argument must be a bytes
|
||
object. Returns ``None``. The transport is free to buffer the
|
||
bytes, but it must eventually cause the bytes to be transferred to
|
||
the entity at the other end, and it must maintain stream behavior.
|
||
That is, ``t.write(b'abc'); t.write(b'def')`` is equivalent to
|
||
``t.write(b'abcdef')``, as well as to::
|
||
|
||
t.write(b'a')
|
||
t.write(b'b')
|
||
t.write(b'c')
|
||
t.write(b'd')
|
||
t.write(b'e')
|
||
t.write(b'f')
|
||
|
||
- ``writelines(iterable)``. Equivalent to::
|
||
|
||
for data in iterable:
|
||
self.write(data)
|
||
|
||
- ``write_eof()``. Close the writing end of the connection.
|
||
Subsequent calls to ``write()`` are not allowed. Once all buffered
|
||
data is transferred, the transport signals to the other end that no
|
||
more data will be received. Some protocols don't support this
|
||
operation; in that case, calling ``write_eof()`` will raise an
|
||
exception. (Note: This used to be called ``half_close()``, but
|
||
unless you already know what it is for, that name doesn't indicate
|
||
*which* end is closed.)
|
||
|
||
- ``can_write_eof()``. Return ``True`` if the protocol supports
|
||
``write_eof()``, ``False`` if it does not. (This method is needed
|
||
because some protocols need to change their behavior when
|
||
``write_eof()`` is unavailable. For example, in HTTP, to send data
|
||
whose size is not known ahead of time, the end of the data is
|
||
typically indicated using ``write_eof()``; however, SSL does not
|
||
support this, and an HTTP protocol implementation would have to use
|
||
the "chunked" transfer encoding in this case. But if the data size
|
||
is known ahead of time, the best approach in both cases is to use
|
||
the Content-Length header.)
|
||
|
||
- ``pause()``. Suspend delivery of data to the protocol until a
|
||
subsequent ``resume()`` call. Between ``pause()`` and ``resume()``,
|
||
the protocol's ``data_received()`` method will not be called. This
|
||
has no effect on ``write()``.
|
||
|
||
- ``resume()``. Restart delivery of data to the protocol via
|
||
``data_received()``.
|
||
|
||
- ``close()``. Sever the connection with the entity at the other end.
|
||
Any data buffered by ``write()`` will (eventually) be transferred
|
||
before the connection is actually closed. The protocol's
|
||
``data_received()`` method will not be called again. Once all
|
||
buffered data has been flushed, the protocol's ``connection_lost()``
|
||
method will be called with ``None`` as the argument. Note that
|
||
this method does not wait for all that to happen.
|
||
|
||
- ``abort()``. Immediately sever the connection. Any data still
|
||
buffered by the transport is thrown away. Soon, the protocol's
|
||
``connection_lost()`` method will be called with ``None`` as
|
||
argument. (TBD: Distinguish in the ``connection_lost()`` argument
|
||
between ``close()``, ``abort()`` or a close initated by the other
|
||
end? Or add a transport method to inquire about this? Glyph's
|
||
proposal was to pass different exceptions for this purpose.)
|
||
|
||
TBD: Provide flow control the other way -- the transport may need to
|
||
suspend the protocol if the amount of data buffered becomes a burden.
|
||
Proposal: let the transport call ``protocol.pause()`` and
|
||
``protocol.resume()`` if they exist; if they don't exist, the
|
||
protocol doesn't support flow control. (Perhaps different names
|
||
to avoid confusion between protocols and transports?)
|
||
|
||
Protocols
|
||
---------
|
||
|
||
Protocols are always used in conjunction with transports. While a few
|
||
common protocols are provided (e.g. decent though not necessarily
|
||
excellent HTTP client and server implementations), most protocols will
|
||
be implemented by user code or third-party libraries.
|
||
|
||
A protocol must implement the following methods, which will be called
|
||
by the transport. Consider these callbacks that are always called by
|
||
the event loop in the right context. (See the "Context" section
|
||
above.)
|
||
|
||
- ``connection_made(transport)``. Indicates that the transport is
|
||
ready and connected to the entity at the other end. The protocol
|
||
should probably save the transport reference as an instance variable
|
||
(so it can call its ``write()`` and other methods later), and may
|
||
write an initial greeting or request at this point.
|
||
|
||
- ``data_received(data)``. The transport has read some bytes from the
|
||
connection. The argument is always a non-empty bytes object. There
|
||
are no guarantees about the minimum or maximum size of the data
|
||
passed along this way. ``p.data_received(b'abcdef')`` should be
|
||
treated exactly equivalent to::
|
||
|
||
p.data_received(b'abc')
|
||
p.data_received(b'def')
|
||
|
||
- ``eof_received()``. This is called when the other end called
|
||
``write_eof()`` (or something equivalent). The default
|
||
implementation calls ``close()`` on the transport, which causes
|
||
``connection_lost()`` to be called (eventually) on the protocol.
|
||
|
||
- ``connection_lost(exc)``. The transport has been closed or aborted,
|
||
has detected that the other end has closed the connection cleanly,
|
||
or has encountered an unexpected error. In the first three cases
|
||
the argument is ``None``; for an unexpected error, the argument is
|
||
the exception that caused the transport to give up. (TBD: Do we
|
||
need to distinguish between the first three cases?)
|
||
|
||
Here is a chart indicating the order and multiplicity of calls:
|
||
|
||
1. ``connection_made()`` -- exactly once
|
||
2. ``data_received()`` -- zero or more times
|
||
3. ``eof_received()`` -- at most once
|
||
4. ``connection_lost()`` -- exactly once
|
||
|
||
TBD: Discuss whether user code needs to do anything to make sure that
|
||
protocol and transport aren't garbage-collected prematurely.
|
||
|
||
Callback Style
|
||
--------------
|
||
|
||
Most interfaces taking a callback also take positional arguments. For
|
||
instance, to arrange for ``foo("abc", 42)`` to be called soon, you
|
||
call ``ev.call_soon(foo, "abc", 42)``. To schedule the call
|
||
``foo()``, use ``ev.call_soon(foo)``. This convention greatly reduces
|
||
the number of small lambdas required in typical callback programming.
|
||
|
||
This convention specifically does *not* support keyword arguments.
|
||
Keyword arguments are used to pass optional extra information about
|
||
the callback. This allows graceful evolution of the API without
|
||
having to worry about whether a keyword might be significant to a
|
||
callee somewhere. If you have a callback that *must* be called with a
|
||
keyword argument, you can use a lambda or ``functools.partial``. For
|
||
example::
|
||
|
||
ev.call_soon(functools.partial(foo, "abc", repeat=42))
|
||
|
||
Choosing an Event Loop Implementation
|
||
-------------------------------------
|
||
|
||
TBD. (This is about the choice to use e.g. select vs. poll vs. epoll,
|
||
and how to override the choice. Probably belongs in the event loop
|
||
policy.)
|
||
|
||
|
||
Coroutines and the Scheduler
|
||
============================
|
||
|
||
This is a separate toplevel section because its status is different
|
||
from the event loop interface. Usage of coroutines is optional, and
|
||
it is perfectly fine to write code using callbacks only. On the other
|
||
hand, there is only one implementation of the scheduler/coroutine API,
|
||
and if you're using coroutines, that's the one you're using.
|
||
|
||
Coroutines
|
||
----------
|
||
|
||
A coroutine is a generator that follows certain conventions. For
|
||
documentation purposes, all coroutines should be decorated with
|
||
``@tulip.coroutine``, but this cannot be strictly enforced.
|
||
|
||
Coroutines use the ``yield from`` syntax introduced in PEP 380,
|
||
instead of the original ``yield`` syntax.
|
||
|
||
The word "coroutine", like the word "generator", is used for two
|
||
different (though related) concepts:
|
||
|
||
- The function that defines a coroutine (a function definition
|
||
decorated with ``tulip.coroutine``). If disambiguation is needed,
|
||
we call this a *coroutine function*.
|
||
|
||
- The object obtained by calling a coroutine function. This object
|
||
represents a computation or an I/O operation (usually a combination)
|
||
that will complete eventually. For disambiguation we call it a
|
||
*coroutine object*.
|
||
|
||
Things a coroutine can do:
|
||
|
||
- ``result = yield from future`` -- suspends the coroutine until the
|
||
future is done, then returns the future's result, or raises its
|
||
exception, which will be propagated.
|
||
|
||
- ``result = yield from coroutine`` -- wait for another coroutine to
|
||
produce a result (or raise an exception, which will be propagated).
|
||
The ``coroutine`` expression must be a *call* to another coroutine.
|
||
|
||
- ``return result`` -- produce a result to the coroutine that is
|
||
waiting for this one using ``yield from``.
|
||
|
||
- ``raise exception`` -- raise an exception in the coroutine that is
|
||
waiting for this one using ``yield from``.
|
||
|
||
Calling a coroutine does not start its code running -- it is just a
|
||
generator, and the coroutine object returned by the call is really a
|
||
generator object, which doesn't do anything until you iterate over it.
|
||
In the case of a coroutine object, there are two basic ways to start
|
||
it running: call ``yield from coroutine`` from another coroutine
|
||
(assuming the other coroutine is already running!), or convert it to a
|
||
Task (see below).
|
||
|
||
Coroutines can only run when the event loop is running.
|
||
|
||
Waiting for Multiple Coroutines
|
||
-------------------------------
|
||
|
||
To wait for multiple coroutines or Futures, two APIs similar to the
|
||
``wait()`` and ``as_completed()`` APIs in the ``concurrent.futures``
|
||
package are provided:
|
||
|
||
- ``tulip.wait(fs, timeout=None, return_when=ALL_COMPLETED)``. This
|
||
is a coroutine that waits for the Futures or coroutines given by
|
||
``fs`` to complete. Coroutine arguments will be wrapped in Tasks
|
||
(see below). This returns a Future whose result on success is a
|
||
tuple of two sets of Futures, ``(done, pending)``, where ``done`` is
|
||
the set of original Futures (or wrapped coroutines) that are done
|
||
(or cancelled), and ``pending`` is the rest, i.e. those that are
|
||
still not done (nor cancelled). Optional arguments ``timeout`` and
|
||
``return_when`` have the same meaning and defaults as for
|
||
``concurrent.futures.wait()``: ``timeout``, if not ``None``,
|
||
specifies a timeout for the overall operation; ``return_when``,
|
||
specifies when to stop. The constants ``FIRST_COMPLETED``,
|
||
``FIRST_EXCEPTION``, ``ALL_COMPLETED`` are defined with the same
|
||
values and the same meanings as in PEP 3148:
|
||
|
||
- ``ALL_COMPLETED`` (default): Wait until all Futures are done or
|
||
completed (or until the timeout occurs).
|
||
|
||
- ``FIRST_COMPLETED``: Wait until at least one Future is done or
|
||
cancelled (or until the timeout occurs).
|
||
|
||
- ``FIRST_EXCEPTION``: Wait until at least one Future is done (not
|
||
cancelled) with an exception set. (The exclusion of cancelled
|
||
Futures from the filter is surprising, but PEP 3148 does it this
|
||
way.)
|
||
|
||
- ``tulip.as_completed(fs, timeout=None)``. Returns an iterator whose
|
||
values are Futures; waiting for successive values waits until the
|
||
next Future or coroutine from the set ``fs`` completes, and returns
|
||
its result (or raises its exception). The optional argument
|
||
``timeout`` has the same meaning and default as it does for
|
||
``concurrent.futures.wait()``: when the timeout occurs, the next
|
||
Future returned by the iterator will raise ``TimeoutError`` when
|
||
waited for. Example of use::
|
||
|
||
for f in as_completed(fs):
|
||
result = yield from f # May raise an exception.
|
||
# Use result.
|
||
|
||
Tasks
|
||
-----
|
||
|
||
A Task is an object that manages an independently running coroutine.
|
||
The Task interface is the same as the Future interface. The task
|
||
becomes done when its coroutine returns or raises an exception; if it
|
||
returns a result, that becomes the task's result, if it raises an
|
||
exception, that becomes the task's exception.
|
||
|
||
Cancelling a task that's not done yet prevents its coroutine from
|
||
completing; in this case an exception is thrown into the coroutine
|
||
that it may catch to further handle cancellation, but it doesn't have
|
||
to (this is done using the standard ``close()`` method on generators,
|
||
described in PEP 342).
|
||
|
||
Tasks are also useful for interoperating between coroutines and
|
||
callback-based frameworks like Twisted. After converting a coroutine
|
||
into a Task, callbacks can be added to the Task.
|
||
|
||
You may ask, why not convert all coroutines to Tasks? The
|
||
``@tulip.coroutine`` decorator could do this. This would slow things
|
||
down considerably in the case where one coroutine calls another (and
|
||
so on), as switching to a "bare" coroutine has much less overhead than
|
||
switching to a Task.
|
||
|
||
The Scheduler
|
||
-------------
|
||
|
||
The scheduler has no public interface. You interact with it by using
|
||
``yield from future`` and ``yield from task``. In fact, there is no
|
||
single object representing the scheduler -- its behavior is
|
||
implemented by the ``Task`` and ``Future`` classes using only the
|
||
public interface of the event loop, so it will work with third-party
|
||
event loop implementations, too.
|
||
|
||
Sleeping
|
||
--------
|
||
|
||
TBD: ``yield sleep(seconds)``. Can use ``sleep(0)`` to suspend to
|
||
poll for I/O.
|
||
|
||
Coroutines and Protocols
|
||
------------------------
|
||
|
||
The best way to use coroutines to implement protocols is probably to
|
||
use a streaming buffer that gets filled by ``data_received()`` and can
|
||
be read asynchronously using methods like ``read(n)`` and
|
||
``readline()`` that return a Future. When the connection is closed,
|
||
``read()`` should return a Future whose result is ``b''``, or raise an
|
||
exception if ``connection_closed()`` is called with an exception.
|
||
|
||
To write, the ``write()`` method (and friends) on the transport can be
|
||
used -- these do not return Futures. A standard protocol
|
||
implementation should be provided that sets this up and kicks off the
|
||
coroutine when ``connection_made()`` is called.
|
||
|
||
TBD: Be more specific.
|
||
|
||
Cancellation
|
||
------------
|
||
|
||
TBD. When a Task is cancelled its coroutine may see an exception at
|
||
any point where it is yielding to the scheduler (i.e., potentially at
|
||
any ``yield from`` operation). We need to spell out which exception
|
||
is raised.
|
||
|
||
Also TBD: timeouts.
|
||
|
||
|
||
Open Issues
|
||
===========
|
||
|
||
- A debugging API? E.g. something that logs a lot of stuff, or logs
|
||
unusual conditions (like queues filling up faster than they drain)
|
||
or even callbacks taking too much time...
|
||
|
||
- Do we need introspection APIs? E.g. asking for the read callback
|
||
given a file descriptor. Or when the next scheduled call is. Or
|
||
the list of file descriptors registered with callbacks.
|
||
|
||
- Transports may need a method that tries to return the address of the
|
||
socket (and another for the peer address). Although these depend on
|
||
the socket type and there may not always be a socket; then it should
|
||
return None. (Alternatively, there could be a method to return the
|
||
socket itself -- but it is conceivable that a transport implements
|
||
IP connections without using sockets, and what should it do then?)
|
||
|
||
- Need to handle os.fork(). (This may be up to the selector classes
|
||
in Tulip's case.)
|
||
|
||
- Perhaps start_serving() needs a way to pass in an existing socket
|
||
(e.g. gunicorn would need this). Ditto for create_connection().
|
||
|
||
- We might introduce explicit locks, though these will be a bit of a
|
||
pain to use, as we can't use the ``with lock: block`` syntax
|
||
(because to wait for a lock we'd have to use ``yield from``, which
|
||
the ``with`` statement can't do).
|
||
|
||
- Support for datagram protocols, "connected" or otherwise. Probably
|
||
need more socket I/O methods, e.g. ``sock_sendto()`` and
|
||
``sock_recvfrom()``. Or users can write their own (it's not rocket
|
||
science). Is it reasonable to map ``write()``, ``writelines()``,
|
||
``data_received()`` to single datagrams? Or should we have a
|
||
different ``datagram_received()`` method on datagram protocols?
|
||
(Glyph recommends the latter.) And then what instead of ``write()``?
|
||
Finally, do we need support for unconnected datagram protocols?
|
||
(That would mean wrappers for ``sendto()`` and ``recvfrom()``.)
|
||
|
||
- We may need APIs to control various timeouts. E.g. we may want to
|
||
limit the time spent in DNS resolution, connecting, ssl handshake,
|
||
idle connection, close/shutdown, even per session. Possibly it's
|
||
sufficient to add ``timeout`` keyword parameters to some methods,
|
||
and other timeouts can probably be implemented by clever use of
|
||
``call_later()`` and ``Task.cancel()``. But it's possible that some
|
||
operations need default timeouts, and we may want to change the
|
||
default for a specific operation globally (i.e., per event loop).
|
||
|
||
- An EventEmitter in the style of NodeJS? Or make this a separate
|
||
PEP? It's easy enough to do in user space, though it may benefit
|
||
from standardization. (See
|
||
https://github.com/mnot/thor/blob/master/thor/events.py and
|
||
https://github.com/mnot/thor/blob/master/doc/events.md for examples.)
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
- PEP 380 describes the semantics of ``yield from``. TBD: Greg
|
||
Ewing's tutorial.
|
||
|
||
- PEP 3148 describes ``concurrent.futures.Future``.
|
||
|
||
- PEP 3153, while rejected, has a good write-up explaining the need
|
||
to separate transports and protocols.
|
||
|
||
- Nick Coghlan wrote a nice blog post with some background, thoughts
|
||
about different approaches to async I/O, gevent, and how to use
|
||
futures with constructs like ``while``, ``for`` and ``with``:
|
||
http://python-notes.boredomandlaziness.org/en/latest/pep_ideas/async_programming.html
|
||
|
||
- TBD: references to the relevant parts of Twisted, Tornado, ZeroMQ,
|
||
pyftpdlib, libevent, libev, pyev, libuv, wattle, and so on.
|
||
|
||
|
||
Acknowledgments
|
||
===============
|
||
|
||
Apart from PEP 3153, influences include PEP 380 and Greg Ewing's
|
||
tutorial for ``yield from``, Twisted, Tornado, ZeroMQ, pyftpdlib, tulip
|
||
(the author's attempts at synthesis of all these), wattle (Steve
|
||
Dower's counter-proposal), numerous discussions on python-ideas from
|
||
September through December 2012, a Skype session with Steve Dower and
|
||
Dino Viehland, email exchanges with Ben Darnell, an audience with
|
||
Niels Provos (original author of libevent), and two in-person meetings
|
||
with several Twisted developers, including Glyph, Brian Warner, David
|
||
Reid, and Duncan McGreggor. Also, the author's previous work on async
|
||
support in the NDB library for Google App Engine was an important
|
||
influence.
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|