934 lines
39 KiB
Plaintext
934 lines
39 KiB
Plaintext
PEP: 3156
|
||
Title: Asynchronous IO Support Rebooted
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Guido van Rossum <guido@python.org>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 12-Dec-2012
|
||
Post-History: TBD
|
||
|
||
Abstract
|
||
========
|
||
|
||
This is a proposal for asynchronous I/O in Python 3, starting with
|
||
Python 3.3. Consider this the concrete proposal that is missing from
|
||
PEP 3153. The proposal includes a pluggable event loop API, transport
|
||
and protocol abstractions similar to those in Twisted, and a
|
||
higher-level scheduler based on ``yield from`` (PEP 380). A reference
|
||
implementation is in the works under the code name tulip.
|
||
|
||
|
||
Introduction
|
||
============
|
||
|
||
The event loop is the place where most interoperability occurs. It
|
||
should be easy for (Python 3.3 ports of) frameworks like Twisted,
|
||
Tornado, or ZeroMQ to either adapt the default event loop
|
||
implementation to their needs using a lightweight wrapper or proxy, or
|
||
to replace the default event loop implementation with an adaptation of
|
||
their own event loop implementation. (Some frameworks, like Twisted,
|
||
have multiple event loop implementations. This should not be a
|
||
problem since these all have the same interface.)
|
||
|
||
It should even be possible for two different third-party frameworks to
|
||
interoperate, either by sharing the default event loop implementation
|
||
(each using its own adapter), or by sharing the event loop
|
||
implementation of either framework. In the latter case two levels of
|
||
adaptation would occur (from framework A's event loop to the standard
|
||
event loop interface, and from there to framework B's event loop).
|
||
Which event loop implementation is used should be under control of the
|
||
main program (though a default policy for event loop selection is
|
||
provided).
|
||
|
||
Thus, two separate APIs are defined:
|
||
|
||
- getting and setting the current event loop object
|
||
- the interface of a conforming event loop and its minimum guarantees
|
||
|
||
An event loop implementation may provide additional methods and
|
||
guarantees.
|
||
|
||
The event loop interface does not depend on ``yield from``. Rather, it
|
||
uses a combination of callbacks, additional interfaces (transports and
|
||
protocols), and Futures. The latter are similar to those defined in
|
||
PEP 3148, but have a different implementation and are not tied to
|
||
threads. In particular, they have no wait() method; the user is
|
||
expected to use callbacks.
|
||
|
||
For users (like myself) who don't like using callbacks, a scheduler is
|
||
provided for writing asynchronous I/O code as coroutines using the PEP
|
||
380 ``yield from`` expressions. The scheduler is not pluggable;
|
||
pluggability occurs at the event loop level, and the scheduler should
|
||
work with any conforming event loop implementation.
|
||
|
||
For interoperability between code written using coroutines and other
|
||
async frameworks, the scheduler has a Task class that behaves like a
|
||
Future. A framework that interoperates at the event loop level can
|
||
wait for a Future to complete by adding a callback to the Future.
|
||
Likewise, the scheduler offers an operation to suspend a coroutine
|
||
until a callback is called.
|
||
|
||
Limited interoperability with threads is provided by the event loop
|
||
interface; there is an API to submit a function to an executor (see
|
||
PEP 3148) which returns a Future that is compatible with the event
|
||
loop.
|
||
|
||
|
||
Non-goals
|
||
=========
|
||
|
||
Interoperability with systems like Stackless Python or
|
||
greenlets/gevent is not a goal of this PEP.
|
||
|
||
|
||
Specification
|
||
=============
|
||
|
||
Dependencies
|
||
------------
|
||
|
||
Python 3.3 is required. No new language or standard library features
|
||
beyond Python 3.3 are required. No third-party modules or packages
|
||
are required.
|
||
|
||
Module Namespace
|
||
----------------
|
||
|
||
The specification here will live in a new toplevel package. Different
|
||
components will live in separate submodules of that package. The
|
||
package will import common APIs from their respective submodules and
|
||
make them available as package attributes (similar to the way the
|
||
email package works).
|
||
|
||
The name of the toplevel package is currently unspecified. The
|
||
reference implementation uses the name 'tulip', but the name will
|
||
change to something more boring if and when the implementation is
|
||
moved into the standard library (hopefully for Python 3.4).
|
||
|
||
Until the boring name is chosen, this PEP will use 'tulip' as the
|
||
toplevel package name. Classes and functions given without a module
|
||
name are assumed to be accessed via the toplevel package.
|
||
|
||
Event Loop Policy: Getting and Setting the Event Loop
|
||
-----------------------------------------------------
|
||
|
||
To get the current event loop, use ``get_event_loop()``. This returns
|
||
an instance of the ``EventLoop`` class defined below or an equivalent
|
||
object. It is possible that ``get_event_loop()`` returns a different
|
||
object depending on the current thread, or depending on some other
|
||
notion of context.
|
||
|
||
To set the current event loop, use ``set_event_loop(event_loop)``,
|
||
where ``event_loop`` is an instance of the ``EventLoop`` class or
|
||
equivalent. This uses the same notion of context as
|
||
``get_event_loop()``.
|
||
|
||
For the benefit of unit tests and other special cases there's a third
|
||
policy function: ``init_event_loop()``, which creates a new EventLoop
|
||
instance and calls ``set_event_loop()`` with it. TBD: Maybe we should
|
||
have a ``create_default_event_loop_instance()`` function instead?
|
||
|
||
To change the way the above three functions work
|
||
(including their notion of context), call
|
||
``set_event_loop_policy(policy)``, where ``policy`` is an event loop
|
||
policy object. The policy object can be any object that has methods
|
||
``get_event_loop()``, ``set_event_loop(event_loop)``
|
||
and ``init_event_loop()`` behaving like
|
||
the functions described above. The default event loop policy is an
|
||
instance of the class ``DefaultEventLoopPolicy``. The current event loop
|
||
policy object can be retrieved by calling ``get_event_loop_policy()``.
|
||
|
||
An event loop policy may but does not have to enforce that there is
|
||
only one event loop in existence. The default event loop policy does
|
||
not enforce this, but it does enforce that there is only one event
|
||
loop per thread.
|
||
|
||
Event Loop Interface
|
||
--------------------
|
||
|
||
(A note about times: as usual in Python, all timeouts, intervals and
|
||
delays are measured in seconds, and may be ints or floats. The
|
||
accuracy and precision of the clock are up to the implementation; the
|
||
default implementation uses ``time.monotonic()``.)
|
||
|
||
A conforming event loop object has the following methods:
|
||
|
||
- ``run()``. Runs the event loop until there is nothing left to do.
|
||
This means, in particular:
|
||
|
||
- No more calls scheduled with ``call_later()``,
|
||
``call_repeatedly()``, ``call_soon()``, or
|
||
``call_soon_threadsafe()``, except for cancelled calls.
|
||
|
||
- No more registered file descriptors. It is up to the registering
|
||
party to unregister a file descriptor when it is closed.
|
||
|
||
Note: ``run()`` blocks until the termination condition is met,
|
||
or until ``stop()`` is called.
|
||
|
||
Note: if you schedule a call with ``call_repeatedly()``, ``run()``
|
||
will not exit until you cancel it.
|
||
|
||
TBD: A method to run the loop forever, i.e. until ``stop()`` is called?
|
||
|
||
- ``stop()``. Stops the event loop as soon as it is convenient. It
|
||
is fine to restart the loop with ``run()`` (or one of its variants)
|
||
subsequently.
|
||
|
||
Note: How soon exactly is up to the implementation. All immediate
|
||
callbacks that were already scheduled to run before ``stop()`` is
|
||
called must still be run, but callbacks scheduled after it is called
|
||
(or scheduled to be run later) will not be run.
|
||
|
||
- ``run_until_complete(future, timeout=None)``. Runs the event loop
|
||
until the Future is done. If a timeout is given, it waits at most
|
||
that long. If the Future is done, its result is returned, or its
|
||
exception is raised; if the timeout expires before the Future is
|
||
done, or if ``stop()`` is called, ``TimeoutError`` is raised (but
|
||
the Future is not cancelled). This cannot be called when the event
|
||
loop is already running.
|
||
|
||
Note: This API is most useful for tests and the like. It should not
|
||
be used as a substitute for ``yield from future`` or other ways to
|
||
wait for a Future (e.g. registering a done callback).
|
||
|
||
- ``run_once(timeout=None)``. Run the event loop for a little while.
|
||
If a timeout is given, an I/O poll made will block at most that
|
||
long; otherwise, an I/O poll is not constrained in time.
|
||
|
||
Note: Exactlly how much work this does is up to the implementation.
|
||
One constraint: if a callback immediately schedules itself using
|
||
``call_soon()``, causing an infinite loop, ``run_once()`` should
|
||
still return.
|
||
|
||
- ``call_later(delay, callback, *args)``. Arrange for
|
||
``callback(*args)`` to be called approximately ``delay`` seconds in
|
||
the future, once, unless cancelled. Returns
|
||
a ``Handler`` object representing the callback, whose
|
||
``cancel()`` method can be used to cancel the callback.
|
||
|
||
- ``call_repeatedly(interval, callback, **args)``. Like ``call_later()``
|
||
but calls the callback repeatedly, every ``interval`` seconds,
|
||
until the ``Handler`` returned is cancelled. The first call is in
|
||
``interval`` seconds.
|
||
|
||
- ``call_soon(callback, *args)``. Equivalent to ``call_later(0,
|
||
callback, *args)``.
|
||
|
||
- ``call_soon_threadsafe(callback, *args)``. Like
|
||
``call_soon(callback, *args)``, but when called from another thread
|
||
while the event loop is blocked waiting for I/O, unblocks the event
|
||
loop. This is the *only* method that is safe to call from another
|
||
thread or from a signal handler. (To schedule a callback for a
|
||
later time in a threadsafe manner, you can use
|
||
``ev.call_soon_threadsafe(ev.call_later, when, callback, *args)``.)
|
||
|
||
- TBD: A way to register a callback that is already wrapped in a
|
||
``Handler``. Maybe ``call_soon()`` could just check
|
||
``isinstance(callback, Handler)``? It should silently skip
|
||
a cancelled callback.
|
||
|
||
Some methods in the standard conforming interface return Futures:
|
||
|
||
- ``wrap_future(future)``. This takes a PEP 3148 Future (i.e., an
|
||
instance of ``concurrent.futures.Future``) and returns a Future
|
||
compatible with the event loop (i.e., a ``tulip.Future`` instance).
|
||
|
||
- ``run_in_executor(executor, function, *args)``. Arrange to call
|
||
``function(*args)`` in an executor (see PEP 3148). Returns a Future
|
||
whose result on success is the return value that call. This is
|
||
equivalent to ``wrap_future(executor.submit(function, *args))``. If
|
||
``executor`` is ``None``, a default ``ThreadPoolExecutor`` with 5
|
||
threads is used. (TBD: Should the default executor be shared
|
||
between different event loops? Should we even have a default
|
||
executor? Should be be able to set its thread count? Shoul we even
|
||
have this method?)
|
||
|
||
- ``set_default_executor(executor)``. Set the default executor used
|
||
by ``run_in_executor()``.
|
||
|
||
- ``getaddrinfo(host, port, family=0, type=0, proto=0, flags=0)``.
|
||
Similar to the ``socket.getaddrinfo()`` function but returns a
|
||
Future. The Future's result on success will be a list of the same
|
||
format as returned by ``socket.getaddrinfo()``. The default
|
||
implementation calls ``socket.getaddrinfo()`` using
|
||
``run_in_executor()``, but other implementations may choose to
|
||
implement their own DNS lookup.
|
||
|
||
- ``getnameinfo(sockaddr, flags=0)``. Similar to
|
||
``socket.getnameinfo()`` but returns a Future. The Future's result
|
||
on success will be a tuple ``(host, port)``. Same implementation
|
||
remarks as for ``getaddrinfo()``.
|
||
|
||
- ``create_transport(...)``. Creates a transport. Returns a Future.
|
||
TBD: Signature. Do we pass in a protocol or protocol factory?
|
||
TBD: Should this be called create_connection()?
|
||
|
||
- ``start_serving(...)``. Enters a loop that accepts connections.
|
||
TBD: Signature. There are two possibilities:
|
||
|
||
1. You pass it a non-blocking socket that you have already prepared
|
||
with ``bind()`` and ``listen()`` (these system calls do not block
|
||
AFAIK), a protocol factory (I hesitate to use this word :-), and
|
||
optional flags that control the transport creation (e.g. ssl).
|
||
|
||
2. Instead of a socket, you pass it a host and port, and some more
|
||
optional flags (e.g. to control IPv4 vs IPv6, or to set the
|
||
backlog value to be passed to ``listen()``).
|
||
|
||
In either case, once it has a socket, it will wrap it in a
|
||
transport, and then enter a loop accepting connections (the best way
|
||
to implement such a loop depends on the platform). Each time a
|
||
connection is accepted, a transport and protocol are created for it.
|
||
|
||
This should return an object that can be used to control the serving
|
||
loop, e.g. to stop serving, abort all active connections, and (if
|
||
supported) adjust the backlog or other parameters. It may also have
|
||
an API to inquire about active connections. If version (2) is
|
||
selected, it should probably return a Future whose result on success
|
||
will be that control object, and which becomes done once the accept
|
||
loop is started.
|
||
|
||
TBD: It may be best to use version (2), since on some platforms the
|
||
best way to start a server may not involve sockets (but will still
|
||
involve transports and protocols).
|
||
|
||
TBD: Be more specific.
|
||
|
||
TBD: Some platforms may not be interested in implementing all of
|
||
these, e.g. start_serving() may be of no interest to mobile apps.
|
||
(Although, there's a Minecraft server on my iPad...)
|
||
|
||
The following methods for registering callbacks for file descriptors
|
||
are optional. If they are not implemented, accessing the method
|
||
(without calling it) returns AttributeError. The default
|
||
implementation provides them but the user normally doesn't use these
|
||
directly -- they are used by the transport implementations
|
||
exclusively. Also, on Windows these may be present or not depending
|
||
on whether a select-based or IOCP-based event loop is used. These
|
||
take integer file descriptors only, not objects with a fileno()
|
||
method. The file descriptor should represent something pollable --
|
||
i.e. no disk files.
|
||
|
||
- ``add_reader(fd, callback, *args)``. Arrange for
|
||
``callback(*args)`` to be called whenever file descriptor ``fd`` is
|
||
ready for reading. Returns a ``Handler`` object which can be
|
||
used to cancel the callback. Note that, unlike ``call_later()``,
|
||
the callback may be called many times. Calling ``add_reader()``
|
||
again for the same file descriptor implicitly cancels the previous
|
||
callback for that file descriptor. (TBD: Returning a
|
||
``Handler`` that can be cancelled seems awkward. Let's forget
|
||
about that.) (TBD: Change this to raise an exception if a handler
|
||
is already set.)
|
||
|
||
- ``add_writer(fd, callback, *args)``. Like ``add_reader()``,
|
||
but registers the callback for writing instead of for reading.
|
||
|
||
- ``remove_reader(fd)``. Cancels the current read callback for file
|
||
descriptor ``fd``, if one is set. A no-op if no callback is
|
||
currently set for the file descriptor. (The reason for providing
|
||
this alternate interface is that it is often more convenient to
|
||
remember the file descriptor than to remember the ``Handler``
|
||
object.) (TBD: Return ``True`` if a handler was removed, ``False``
|
||
if not.)
|
||
|
||
- ``remove_writer(fd)``. This is to ``add_writer()`` as
|
||
``remove_reader()`` is to ``add_reader()``.
|
||
|
||
- ``add_connector(fd, callback, *args)``. Like ``add_writer()`` but
|
||
meant to wait for ``connect()`` operations, which on some platforms
|
||
require different handling (e.g. ``WSAPoll()`` on Windows).
|
||
|
||
- ``remove_connector(fd)``. This is to ``remove_writer()`` as
|
||
``add_connector()`` is to ``add_writer()``.
|
||
|
||
TBD: What about multiple callbacks per fd? The current semantics is
|
||
that ``add_reader()/add_writer()`` replace a previously registered
|
||
callback. Change this to raise an exception if a callback is already
|
||
registered.
|
||
|
||
The following methods for doing async I/O on sockets are optional.
|
||
They are alternative to the previous set of optional methods, intended
|
||
for transport implementations on Windows using IOCP (if the event loop
|
||
supports it). The socket argument has to be a non-blocking socket.
|
||
|
||
- ``sock_recv(sock, n)``. Receive up to ``n`` bytes from socket
|
||
``sock``. Returns a Future whose result on success will be a
|
||
bytes object on success.
|
||
|
||
- ``sock_sendall(sock, data)``. Send bytes ``data`` to the socket
|
||
``sock``. Returns a Future whose result on success will be
|
||
``None``. (TBD: Is it better to emulate ``sendall()`` or ``send()``
|
||
semantics? I think ``sendall()`` -- but perhaps it should still
|
||
be *named* ``send()``?)
|
||
|
||
- ``sock_connect(sock, address)``. Connect to the given address.
|
||
Returns a Future whose result on success will be ``None``.
|
||
|
||
- ``sock_accept(sock)``. Accept a connection from a socket. The
|
||
socket must be in listening mode and bound to an address. Returns a
|
||
Future whose result on success will be a tuple ``(conn, peer)``
|
||
where ``conn`` is a connected non-blocking socket and ``peer`` is
|
||
the peer address. (TBD: People tell me that this style of API is
|
||
too slow for high-volume servers. So there's also
|
||
``start_serving()`` above. Then do we still need this?)
|
||
|
||
TBD: Optional methods are not so good. Perhaps these should be
|
||
required? It may still depend on the platform which set is more
|
||
efficient.
|
||
|
||
Callback Sequencing
|
||
-------------------
|
||
|
||
When two callbacks are scheduled for the same time, they are run
|
||
in the order in which they are registered. For example::
|
||
|
||
ev.call_soon(foo)
|
||
ev.call_soon(bar)
|
||
|
||
guarantees that ``foo()`` is called before ``bar()``.
|
||
|
||
If ``call_soon()`` is used, this guarantee is true even if the system
|
||
clock were to run backwards. This is also the case for
|
||
``call_later(0, callback, *args)``. However, if ``call_later()`` is
|
||
used with a nonzero delay, all bets are off if the system
|
||
clock were to runs backwards. (A good event loop implementation
|
||
should use ``time.monotonic()`` to avoid problems when the clock runs
|
||
backward. See PEP 418.)
|
||
|
||
Context
|
||
-------
|
||
|
||
All event loops have a notion of context. For the default event loop
|
||
implementation, the context is a thread. An event loop implementation
|
||
should run all callbacks in the same context. An event loop
|
||
implementation should run only one callback at a time, so callbacks
|
||
can assume automatic mutual exclusion with other callbacks scheduled
|
||
in the same event loop.
|
||
|
||
Exceptions
|
||
----------
|
||
|
||
There are two categories of exceptions in Python: those that derive
|
||
from the ``Exception`` class and those that derive from
|
||
``BaseException``. Exceptions deriving from ``Exception`` will
|
||
generally be caught and handled appropriately; for example, they will
|
||
be passed through by Futures, and they will be logged and ignored when
|
||
they occur in a callback.
|
||
|
||
However, exceptions deriving only from ``BaseException`` are never
|
||
caught, and will usually cause the program to terminate with a
|
||
traceback. (Examples of this category include ``KeyboardInterrupt``
|
||
and ``SystemExit``; it is usually unwise to treat these the same as
|
||
most other exceptions.)
|
||
|
||
The Handler Class
|
||
-----------------
|
||
|
||
The various methods for registering callbacks (e.g. ``call_later()``)
|
||
all return an object representing the registration that can be used to
|
||
cancel the callback. For want of a better name this object is called
|
||
a ``Handler``, although the user never needs to instantiate
|
||
instances of this class. There is one public method:
|
||
|
||
- ``cancel()``. Attempt to cancel the callback.
|
||
TBD: Exact specification.
|
||
|
||
Read-only public attributes:
|
||
|
||
- ``callback``. The callback function to be called.
|
||
|
||
- ``args``. The argument tuple with which to call the callback function.
|
||
|
||
- ``cancelled``. True if ``cancel()`` has been called.
|
||
|
||
Note that some callbacks (e.g. those registered with ``call_later()``)
|
||
are meant to be called only once. Others (e.g. those registered with
|
||
``add_reader()``) are meant to be called multiple times.
|
||
|
||
TBD: An API to call the callback (encapsulating the exception handling
|
||
necessary)? Should it record how many times it has been called?
|
||
Maybe this API should just be ``__call__()``? (But it should suppress
|
||
exceptions.)
|
||
|
||
TBD: Public attribute recording the realtime value when the callback
|
||
is scheduled? (Since this is needed anyway for storing it in a heap.)
|
||
|
||
Futures
|
||
-------
|
||
|
||
The ``tulip.Future`` class here is intentionally similar to the
|
||
``concurrent.futures.Future`` class specified by PEP 3148, but there
|
||
are slight differences. The supported public API is as follows,
|
||
indicating the differences with PEP 3148:
|
||
|
||
- ``cancel()``.
|
||
TBD: Exact specification.
|
||
|
||
- ``cancelled()``.
|
||
|
||
- ``running()``. Note that the meaning of this method is essentially
|
||
"cannot be cancelled and isn't done yet". (TBD: Would be nice if
|
||
this could be set *and* cleared in some cases, e.g. sock_recv().)
|
||
|
||
- ``done()``.
|
||
|
||
- ``result()``. Difference with PEP 3148: This has no timeout
|
||
argument and does *not* wait; if the future is not yet done, it
|
||
raises an exception.
|
||
|
||
- ``exception()``. Difference with PEP 3148: This has no timeout
|
||
argument and does *not* wait; if the future is not yet done, it
|
||
raises an exception.
|
||
|
||
- ``add_done_callback(fn)``. Difference with PEP 3148: The callback
|
||
is never called immediately, and always in the context of the
|
||
caller. (Typically, a context is a thread.) You can think of this
|
||
as calling the callback through ``call_soon_threadsafe()``. Note
|
||
that the callback (unlike all other callbacks defined in this PEP,
|
||
and ignoring the convention from the section "Callback Style" below)
|
||
is always called with a single argument, the Future object.
|
||
|
||
The internal methods defined in PEP 3148 are not supported. (TBD:
|
||
Maybe we do need to support these, in order to make it easy to write
|
||
user code that returns a Future?)
|
||
|
||
A ``tulip.Future`` object is not acceptable to the ``wait()`` and
|
||
``as_completed()`` functions in the ``concurrent.futures`` package.
|
||
|
||
A ``tulip.Future`` object is acceptable to a ``yield from`` expression
|
||
when used in a coroutine. This is implemented through the
|
||
``__iter__()`` interface on the Future. See the section "Coroutines
|
||
and the Scheduler" below.
|
||
|
||
When a Future is garbage-collected, if it has an associated exception
|
||
but neither ``result()`` nor ``exception()`` nor ``__iter__()`` has
|
||
ever been called (or the latter hasn't raised the exception yet --
|
||
details TBD), the exception should be logged. TBD: At what level?
|
||
|
||
In the future (pun intended) we may unify ``tulip.Future`` and
|
||
``concurrent.futures.Future``, e.g. by adding an ``__iter__()`` method
|
||
to the latter that works with ``yield from``. To prevent accidentally
|
||
blocking the event loop by calling e.g. ``result()`` on a Future
|
||
that's not don yet, the blocking operation may detect that an event
|
||
loop is active in the current thread and raise an exception instead.
|
||
However the current PEP strives to have no dependencies beyond Python
|
||
3.3, so changes to ``concurrent.futures.Future`` are off the table for
|
||
now.
|
||
|
||
Transports
|
||
----------
|
||
|
||
A transport is an abstraction on top of a socket or something similar
|
||
(for example, a UNIX pipe or an SSL connection). Transports are
|
||
strongly influenced by Twisted and PEP 3153. Users rarely implement
|
||
or instantiate transports -- rather, event loops offer utility methods
|
||
to set up transports.
|
||
|
||
Transports work in conjunction with protocols. Protocols are
|
||
typically written without knowing or caring about the exact type of
|
||
transport used, and transports can be used with a wide variety of
|
||
protocols. For example, an HTTP client protocol implementation may be
|
||
used with either a plain socket transport or an SSL transport. The
|
||
plain socket transport can be used with many different protocols
|
||
besides HTTP (e.g. SMTP, IMAP, POP, FTP, IRC, SPDY).
|
||
|
||
Most connections have an asymmetric nature: the client and server
|
||
usually have very different roles and behaviors. Hence, the interface
|
||
between transport and protocol is also asymmetric. From the
|
||
protocol's point of view, *writing* data is done by calling the
|
||
``write()`` method on the transport object; this buffers the data and
|
||
returns immediately. However, the transport takes a more active role
|
||
in *reading* data: whenever some data is read from the socket (or
|
||
other data source), the transport calls the protocol's
|
||
``data_received()`` method.
|
||
|
||
Transports have the following public methods:
|
||
|
||
- ``write(data)``. Write some bytes. The argument must be a bytes
|
||
object. Returns ``None``. The transport is free to buffer the
|
||
bytes, but it must eventually cause the bytes to be transferred to
|
||
the entity at the other end, and it must maintain stream behavior.
|
||
That is, ``t.write(b'abc'); t.write(b'def')`` is equivalent to
|
||
``t.write(b'abcdef')``, as well as to::
|
||
|
||
t.write(b'a')
|
||
t.write(b'b')
|
||
t.write(b'c')
|
||
t.write(b'd')
|
||
t.write(b'e')
|
||
t.write(b'f')
|
||
|
||
- ``writelines(iterable)``. Equivalent to::
|
||
|
||
for data in iterable:
|
||
self.write(data)
|
||
|
||
- ``write_eof()``. Close the writing end of the connection.
|
||
Subsequent calls to ``write()`` are not allowed. Once all buffered
|
||
data is transferred, the transport signals to the other end that no
|
||
more data will be received. Some protocols don't support this
|
||
operation; in that case, calling ``write_eof()`` will raise an
|
||
exception. (Note: This used to be called ``half_close()``, but
|
||
unless you already know what it is for, that name doesn't indicate
|
||
*which* end is closed.)
|
||
|
||
- ``can_write_eof()``. Return ``True`` if the protocol supports
|
||
``write_eof()``, ``False`` if it does not. (This method is needed
|
||
because some protocols need to change their behavior when
|
||
``write_eof()`` is unavailable. For example, in HTTP, to send data
|
||
whose size is not known ahead of time, the end of the data is
|
||
typically indicated using ``write_eof()``; however, SSL does not
|
||
support this, and an HTTP protocol implementation would have to use
|
||
the "chunked" transfer encoding in this case. But if the data size
|
||
is known ahead of time, the best approach in both cases is to use
|
||
the Content-Length header.)
|
||
|
||
- ``pause()``. Suspend delivery of data to the protocol until a
|
||
subsequent ``resume()`` call. Between ``pause()`` and ``resume()``,
|
||
the protocol's ``data_received()`` method will not be called. This
|
||
has no effect on ``write()``.
|
||
|
||
- ``resume()``. Restart delivery of data to the protocol via
|
||
``data_received()``.
|
||
|
||
- ``close()``. Sever the connection with the entity at the other end.
|
||
Any data buffered by ``write()`` will (eventually) be transferred
|
||
before the connection is actually closed. The protocol's
|
||
``data_received()`` method will not be called again. Once all
|
||
buffered data has been flushed, the protocol's ``connection_lost()``
|
||
method will be called with ``None`` as the argument. Note that
|
||
this method does not wait for all that to happen.
|
||
|
||
- ``abort()``. Immediately sever the connection. Any data still
|
||
buffered by the transport is thrown away. Soon, the protocol's
|
||
``connection_lost()`` method will be called with ``None`` as
|
||
argument. (TBD: Distinguish in the ``connection_lost()`` argument
|
||
between ``close()``, ``abort()`` or a close initated by the other
|
||
end? Or add a transport method to inquire about this? Glyph's
|
||
proposal was to pass different exceptions for this purpose.)
|
||
|
||
TBD: Provide flow control the other way -- the transport may need to
|
||
suspend the protocol if the amount of data buffered becomes a burden.
|
||
Proposal: let the transport call ``protocol.pause()`` and
|
||
``protocol.resume()`` if they exist; if they don't exist, the
|
||
protocol doesn't support flow control. (Perhaps different names
|
||
to avoid confusion between protocols and transports?)
|
||
|
||
Protocols
|
||
---------
|
||
|
||
Protocols are always used in conjunction with transports. While a few
|
||
common protocols are provided (e.g. decent though not necessarily
|
||
excellent HTTP client and server implementations), most protocols will
|
||
be implemented by user code or third-party libraries.
|
||
|
||
A protocol must implement the following methods, which will be called
|
||
by the transport. Consider these callbacks that are always called by
|
||
the event loop in the right context. (See the "Context" section
|
||
above.)
|
||
|
||
- ``connection_made(transport)``. Indicates that the transport is
|
||
ready and connected to the entity at the other end. The protocol
|
||
should probably save the transport reference as an instance variable
|
||
(so it can call its ``write()`` and other methods later), and may
|
||
write an initial greeting or request at this point.
|
||
|
||
- ``data_received(data)``. The transport has read some bytes from the
|
||
connection. The argument is always a non-empty bytes object. There
|
||
are no guarantees about the minimum or maximum size of the data
|
||
passed along this way. ``p.data_received(b'abcdef')`` should be
|
||
treated exactly equivalent to::
|
||
|
||
p.data_received(b'abc')
|
||
p.data_received(b'def')
|
||
|
||
- ``eof_received()``. This is called when the other end called
|
||
``write_eof()`` (or something equivalent). The default
|
||
implementation calls ``close()`` on the transport, which causes
|
||
``connection_lost()`` to be called (eventually) on the protocol.
|
||
|
||
- ``connection_lost(exc)``. The transport has been closed or aborted,
|
||
has detected that the other end has closed the connection cleanly,
|
||
or has encountered an unexpected error. In the first three cases
|
||
the argument is ``None``; for an unexpected error, the argument is
|
||
the exception that caused the transport to give up. (TBD: Do we
|
||
need to distinguish between the first three cases?)
|
||
|
||
Here is a chart indicating the order and multiplicity of calls:
|
||
|
||
1. ``connection_made()`` -- exactly once
|
||
2. ``data_received()`` -- zero or more times
|
||
3. ``eof_received()`` -- at most once
|
||
4. ``connection_lost()`` -- exactly once
|
||
|
||
TBD: Discuss whether user code needs to do anything to make sure that
|
||
protocol and transport aren't garbage-collected prematurely.
|
||
|
||
Callback Style
|
||
--------------
|
||
|
||
Most interfaces taking a callback also take positional arguments. For
|
||
instance, to arrange for ``foo("abc", 42)`` to be called soon, you
|
||
call ``ev.call_soon(foo, "abc", 42)``. To schedule the call
|
||
``foo()``, use ``ev.call_soon(foo)``. This convention greatly reduces
|
||
the number of small lambdas required in typical callback programming.
|
||
|
||
This convention specifically does *not* support keyword arguments.
|
||
Keyword arguments are used to pass optional extra information about
|
||
the callback. This allows graceful evolution of the API without
|
||
having to worry about whether a keyword might be significant to a
|
||
callee somewhere. If you have a callback that *must* be called with a
|
||
keyword argument, you can use a lambda or ``functools.partial``. For
|
||
example::
|
||
|
||
ev.call_soon(functools.partial(foo, "abc", repeat=42))
|
||
|
||
Choosing an Event Loop Implementation
|
||
-------------------------------------
|
||
|
||
TBD. (This is about the choice to use e.g. select vs. poll vs. epoll,
|
||
and how to override the choice. Probably belongs in the event loop
|
||
policy.)
|
||
|
||
|
||
Coroutines and the Scheduler
|
||
============================
|
||
|
||
This is a separate toplevel section because its status is different
|
||
from the event loop interface. Usage of coroutines is optional, and
|
||
it is perfectly fine to write code using callbacks only. On the other
|
||
hand, there is only one implementation of the scheduler/coroutine API,
|
||
and if you're using coroutines, that's the one you're using.
|
||
|
||
Coroutines
|
||
----------
|
||
|
||
A coroutine is a generator that follows certain conventions. For
|
||
documentation purposes, all coroutines should be decorated with
|
||
``@tulip.coroutine``, but this cannot be strictly enforced.
|
||
|
||
Coroutines use the ``yield from`` syntax introduced in PEP 380,
|
||
instead of the original ``yield`` syntax.
|
||
|
||
The word "coroutine", like the word "generator", is used for two
|
||
different (though related) concepts:
|
||
|
||
- The function that defines a coroutine (a function definition
|
||
decorated with ``tulip.coroutine``). If disambiguation is needed,
|
||
we call this a *coroutine function*.
|
||
|
||
- The object obtained by calling a coroutine function. This object
|
||
represents a computation or an I/O operation (usually a combination)
|
||
that will complete eventually. For disambiguation we call it a
|
||
*coroutine object*.
|
||
|
||
Things a coroutine can do:
|
||
|
||
- ``result = yield from future`` -- suspends the coroutine until the
|
||
future is done, then returns the future's result, or raises its
|
||
exception, which will be propagated.
|
||
|
||
- ``result = yield from coroutine`` -- wait for another coroutine to
|
||
produce a result (or raise an exception, which will be propagated).
|
||
The ``coroutine`` expression must be a *call* to another coroutine.
|
||
|
||
- ``results = yield from tulip.par(futures_and_coroutines)`` -- Wait
|
||
for a list of futures and/or coroutines to complete and return a
|
||
list of their results. If one of the futures or coroutines raises
|
||
an exception, that exception is propagated, after attempting to
|
||
cancel all other futures and coroutines in the list.
|
||
|
||
- ``return result`` -- produce a result to the coroutine that is
|
||
waiting for this one using ``yield from``.
|
||
|
||
- ``raise exception`` -- raise an exception in the coroutine that is
|
||
waiting for this one using ``yield from``.
|
||
|
||
Calling a coroutine does not start its code running -- it is just a
|
||
generator, and the coroutine object returned by the call is really a
|
||
generator object, which doesn't do anything until you iterate over it.
|
||
In the case of a coroutine object, there are two basic ways to start
|
||
it running: call ``yield from coroutine`` from another coroutine
|
||
(assuming the other coroutine is already running!), or convert it to a
|
||
Task.
|
||
|
||
Coroutines can only run when the event loop is running.
|
||
|
||
Tasks
|
||
-----
|
||
|
||
A Task is an object that manages an independently running coroutine.
|
||
The Task interface is the same as the Future interface. The task
|
||
becomes done when its coroutine returns or raises an exception; if it
|
||
returns a result, that becomes the task's result, if it raises an
|
||
exception, that becomes the task's exception.
|
||
|
||
Cancelling a task that's not done yet prevents its coroutine from
|
||
completing; in this case an exception is thrown into the coroutine
|
||
that it may catch to further handle cancellation, but it doesn't have
|
||
to (this is done using the standard ``close()`` method on generators,
|
||
described in PEP 342).
|
||
|
||
The ``par()`` function described above runs coroutines in parallel by
|
||
converting them to Tasks. (Arguments that are already Tasks or
|
||
Futures are not converted.)
|
||
|
||
Tasks are also useful for interoperating between coroutines and
|
||
callback-based frameworks like Twisted. After converting a coroutine
|
||
into a Task, callbacks can be added to the Task.
|
||
|
||
You may ask, why not convert all coroutines to Tasks? The
|
||
``@tulip.coroutine`` decorator could do this. This would slow things
|
||
down considerably in the case where one coroutine calls another (and
|
||
so on), as switching to a "bare" coroutine has much less overhead than
|
||
switching to a Task.
|
||
|
||
The Scheduler
|
||
-------------
|
||
|
||
The scheduler has no public interface. You interact with it by using
|
||
``yield from future`` and ``yield from task``. In fact, there is no
|
||
single object representing the scheduler -- its behavior is
|
||
implemented by the ``Task`` and ``Future`` classes using only the
|
||
public interface of the event loop, so it will work with third-party
|
||
event loop implementations, too.
|
||
|
||
Sleeping
|
||
--------
|
||
|
||
TBD: ``yield sleep(seconds)``. Can use ``sleep(0)`` to suspend to
|
||
poll for I/O.
|
||
|
||
Wait for First
|
||
--------------
|
||
|
||
TBD: Need an interface to wait for the first of a collection of Futures.
|
||
|
||
Coroutines and Protocols
|
||
------------------------
|
||
|
||
The best way to use coroutines to implement protocols is probably to
|
||
use a streaming buffer that gets filled by ``data_received()`` and can
|
||
be read asynchronously using methods like ``read(n)`` and
|
||
``readline()`` that return a Future. When the connection is closed,
|
||
``read()`` should return a Future whose result is ``b''``, or raise an
|
||
exception if ``connection_closed()`` is called with an exception.
|
||
|
||
To write, the ``write()`` method (and friends) on the transport can be
|
||
used -- these do not return Futures. A standard protocol
|
||
implementation should be provided that sets this up and kicks off the
|
||
coroutine when ``connection_made()`` is called.
|
||
|
||
TBD: Be more specific.
|
||
|
||
Cancellation
|
||
------------
|
||
|
||
TBD. When a Task is cancelled its coroutine may see an exception at
|
||
any point where it is yielding to the scheduler (i.e., potentially at
|
||
any ``yield from`` operation). We need to spell out which exception
|
||
is raised.
|
||
|
||
Also TBD: timeouts.
|
||
|
||
|
||
Open Issues
|
||
===========
|
||
|
||
- A debugging API? E.g. something that logs a lot of stuff, or logs
|
||
unusual conditions (like queues filling up faster than they drain)
|
||
or even callbacks taking too much time...
|
||
|
||
- Do we need introspection APIs? E.g. asking for the read callback
|
||
given a file descriptor. Or when the next scheduled call is. Or
|
||
the list of file descriptors registered with callbacks.
|
||
|
||
- Should we have ``future.add_callback(callback, *args)``, using the
|
||
convention from the section "Callback Style" above, or should we
|
||
stick with the PEP 3148 specification of
|
||
``future.add_done_callback(callback)`` which calls
|
||
``callback(future)``? (Glyph suggested using a different method
|
||
name since add_done_callback() does not guarantee that the callback
|
||
will be called in the right context.)
|
||
|
||
- Returning a Future is relatively expensive, and it is quite possible
|
||
that some types of calls *usually* complete immediately
|
||
(e.g. writing small amounts of data to a socket). A trick used by
|
||
Richard Oudkerk in the tulip project's proactor branch makes calls
|
||
like recv() either return a regular result or *raise* a Future. The
|
||
caller (likely a transport) must then write code like this::
|
||
|
||
try:
|
||
res = ev.sock_recv(sock, 8192)
|
||
except Future as f:
|
||
yield from sch.block_future(f)
|
||
res = f.result()
|
||
|
||
- Do we need a larger vocabulary of operations for combining
|
||
coroutines and/or futures? E.g. in addition to par() we could have
|
||
a way to run several coroutines sequentially (returning all results
|
||
or passing the result of one to the next and returning the final
|
||
result?). We might also introduce explicit locks (though these will
|
||
be a bit of a pain to use, as we can't use the ``with lock: block``
|
||
syntax). Anyway, I think all of these are easy enough to write
|
||
using ``Task``.
|
||
|
||
Proposal: ``f = yield from wait_one(fs)`` takes a set of Futures and
|
||
sets f to the first of those that is done. (Yes, this requires an
|
||
intermediate Future to wait for.) You can then write::
|
||
|
||
while fs:
|
||
f = tulip.wait_one(fs)
|
||
fs.remove(f)
|
||
<inspect f>
|
||
|
||
- Support for datagram protocols, "connected" or otherwise? Probably
|
||
need more socket I/O methods, e.g. ``sock_sendto()`` and
|
||
``sock_recvfrom()``. Or users can write their own (it's not rocket
|
||
science). Is it reasonable to map ``write()``, ``writelines()``,
|
||
``data_received()`` to single datagrams?
|
||
|
||
- Task or callback priorities? (I hope not.)
|
||
|
||
- An EventEmitter in the style of NodeJS? Or make this a separate
|
||
PEP? It's easy enough to do in user space, though it may benefit
|
||
from standardization. (See
|
||
https://github.com/mnot/thor/blob/master/thor/events.py and
|
||
https://github.com/mnot/thor/blob/master/doc/events.md for examples.)
|
||
|
||
|
||
Acknowledgments
|
||
===============
|
||
|
||
Apart from PEP 3153, influences include PEP 380 and Greg Ewing's
|
||
tutorial for ``yield from``, Twisted, Tornado, ZeroMQ, pyftpdlib, tulip
|
||
(the author's attempts at synthesis of all these), wattle (Steve
|
||
Dower's counter-proposal), numerous discussions on python-ideas from
|
||
September through December 2012, a Skype session with Steve Dower and
|
||
Dino Viehland, email exchanges with Ben Darnell, an audience with
|
||
Niels Provos (original author of libevent), and two in-person meetings
|
||
with several Twisted developers, including Glyph, Brian Warner, David
|
||
Reid, and Duncan McGreggor. Also, the author's previous work on async
|
||
support in the NDB library for Google App Engine was an important
|
||
influence.
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|