python-peps/pep-3156.txt

PEP: 3156
Title: Asynchronous IO Support Rebooted
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum <guido@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Dec-2012
Post-History: TBD

Abstract
========

This is a proposal for asynchronous I/O in Python 3, starting with
Python 3.3.  Consider this the concrete proposal that is missing from
PEP 3153.  The proposal includes a pluggable event loop API, transport
and protocol abstractions similar to those in Twisted, and a
higher-level scheduler based on yield-from (PEP 380).  A reference
implementation is in the works under the code name tulip.


Introduction
============

The event loop is the place where most interoperability occurs.  It
should be easy for (Python 3.3 ports of) frameworks like Twisted,
Tornado, or ZeroMQ to either adapt the default event loop
implementation to their needs using a lightweight wrapper or proxy, or
to replace the default event loop implementation with an adaptation of
their own event loop implementation.  (Some frameworks, like Twisted,
have multiple event loop implementations.  This should not be a
problem since these all have the same interface.)

It should even be possible for two different third-party frameworks to
interoperate, either by sharing the default event loop implementation
(each using its own adapter), or by sharing the event loop
implementation of either framework.  In the latter case two levels of
adaptation would occur (from framework A's event loop to the standard
event loop interface, and from there to framework B's event loop).
Which event loop implementation is used should be under control of the
main program (though a default policy for event loop selection is
provided).

Thus, two separate APIs are defined:

- getting and setting the current event loop object
- the interface of a conforming event loop and its minimum guarantees

An event loop implementation may provide additional methods and
guarantees.

The event loop interface does not depend on yield-from.  Rather, it
uses a combination of callbacks, additional interfaces (transports and
protocols), and Futures.  The latter are similar to those defined in
PEP 3148, but have a different implementation and are not tied to
threads.  In particular, they have no wait() method; the user is
expected to use callbacks.

For users (like myself) who don't like using callbacks, a scheduler is
provided for writing asynchronous I/O code as coroutines using the PEP
380 yield-from expressions.  The scheduler is not pluggable;
pluggability occurs at the event loop level, and the scheduler should
work with any conforming event loop implementation.

For interoperability between code written using coroutines and other
async frameworks, the scheduler has a Task class that behaves like a
Future.  A framework that interoperates at the event loop level can
wait for a Future to complete by adding a callback to the Future.
Likewise, the scheduler offers an operation to suspend a coroutine
until a callback is called.

Limited interoperability with threads is provided by the event loop
interface; there is an API to submit a function to an executor (see
PEP 3148) which returns a Future that is compatible with the event
loop.


Non-goals
=========

Interoperability with systems like Stackless Python or
greenlets/gevent is not a goal of this PEP.


Specification
=============

Dependencies
------------

Python 3.3 is required.  No new language or standard library features
beyond Python 3.3 are required.  No third-party modules or packages
are required.

Module Namespace
----------------

The specification here will live in a new toplevel package.  Different
components will live in separate submodules of that package.  The
package will import common APIs from their respective submodules and
make them available as package attributes (similar to the way the
email package works).

The name of the toplevel package is currently unspecified.  The
reference implementation uses the name 'tulip', but the name will
change to something more boring if and when the implementation is
moved into the standard library (hopefully for Python 3.4).

Until the boring name is chosen, this PEP will use 'tulip' as the
toplevel package name.  Classes and functions given without a module
name are assumed to be accessed via the toplevel package.

Event Loop Policy: Getting and Setting the Event Loop
-----------------------------------------------------

To get the current event loop, use ``get_event_loop()``.  This returns
an instance of the ``EventLoop`` class defined below or an equivalent
object.  It is possible that ``get_event_loop()`` returns a different
object depending on the current thread, or depending on some other
notion of context.

To set the current event loop, use ``set_event_loop(eventloop)``,
where ``eventloop`` is an instance of the ``EventLoop`` class or
equivalent.  This uses the same notion of context as
``get_event_loop()``.

To change the way ``get_event_loop()`` and ``set_event_loop()`` work
(including their notion of context), call
``set_event_loop_policy(policy)``, where ``policy`` is an event loop
policy object.  The policy object can be any object that has methods
``get_event_loop()`` and ``set_event_loop(eventloop)`` behaving like
the functions described above.  The default event loop policy is an
instance of the class ``DefaultEventLoopPolicy``.  The current event loop
policy object can be retrieved by calling ``get_event_loop_policy()``.

An event loop policy may but does not have to enforce that there is
only one event loop in existence.  The default event loop policy does
not enforce this, but it does enforce that there is only one event
loop per thread.

Event Loop Interface
--------------------

A conforming event loop object has the following methods:

..
  Look for a better way to format method docs.  PEP 12 doesn't seem to
  have one.  PEP 418 uses ^^^, which makes sub-headings.  PEP 3148
  uses a markup which generates rather heavy layout using blockquote,
  causing a blank line between each method heading and its
  description.  Also think of adding subheadings for different
  categories of methods.

- ``run()``.  Runs the event loop until there is nothing left to do.
  This means, in particular:
  
  - No more calls scheduled with ``call_later()`` (except for canceled
    calls).

  - No more registered file descriptors.  It is up to the registering
    party to unregister a file descriptor when it is closed.

- TBD: Do we need an API for stopping the event loop, given that we
  have the termination condition?  Is the termination condition
  compatible with other frameworks?

- TBD: Do we need an API to run the event loop for a little while
  (e.g.  a single iteration)?  If so, exactly what should it do?

- ``call_later(when, callback, *args)``.  Arrange for
  ``callback(*args)`` to be called approximately ``when`` seconds in
  the future, once, unless canceled.  As usual in Python, ``when`` may
  be a floating point number to represent smaller intervals.  Returns
  a ``DelayedCall`` object representing the callback, whose
  ``cancel()`` method can be used to cancel the callback.

- ``call_soon(callback, *args)``.  Equivalent to ``call_later(0,
  callback, *args)``.

- ``call_soon_threadsafe(callback, *args)``.  Like
  ``call_soon(callback, *args)``, but when called from another thread
  while the event loop is blocked waiting for I/O, unblocks the event
  loop.  This is the *only* method that is safe to call from another
  thread or from a signal handler.  (To schedule a callback for a
  later time in a threadsafe manner, you can use
  ``ev.call_soon_threadsafe(ev.call_later, when, callback, *args)``.)

- TBD: A way to register a callback that is already wrapped in a
  ``DelayedCall``.  Maybe ``call_soon()`` could just check
  ``isinstance(callback, DelayedCall)``?  It should silently skip
  a canceled callback.

Some methods return Futures:

- ``wrap_future(future)``.  This takes a PEP 3148 Future (i.e., an
  instance of ``concurrent.futures.Future``) and returns a Future
  compatible with the event loop (i.e., a ``tulip.Future`` instance).

- ``run_in_executor(executor, function, *args)``.  Arrange to call
  ``function(*args)`` in an executor (see PEP 3148).  Returns a Future
  whose result on success is the return value that call.  This is
  equivalent to ``wrap_future(executor.submit(function, *args))``.  If
  ``executor`` is ``None``, a default ``ThreadPoolExecutor`` with 5
  threads is used.  (TBD: Should the default executor be shared
  between different event loops?  Should we even have a default
  executor?  Should be be able to set its thread count?  Shoul we even
  have this method?)

- ``getaddrinfo(host, port, family=0, type=0, proto=0, flags=0)``.
  Similar to the ``socket.getaddrinfo()`` function but returns a
  Future.  The Future's result on success will be a list of the same
  format as returned by ``socket.getaddrinfo()``.  The default
  implementation calls ``socket.getaddrinfo()`` using
  ``run_in_executor()``, but other implementations may choose to
  implement their own DNS lookup.

- ``getnameinfo(sockaddr, flags)``.  Similar to
  ``socket.getnameinfo()`` but returns a Future.  The Future's result
  on success will be a tuple ``(host, port)``.  Same implementation
  remarks as for ``getaddrinfo()``.

- ``create_transport(...)``.  Creates a transport.  Returns a Future.
  TBD: Signature.  Do we pass in a protocol or protocol class?

- ``start_serving(...)``.  Enters a loop that accepts connections.
  TBD: Signature.  This definitely takes a protocol class.  Do we pass
  in a non-blocking socket that's already bound and listening, or do
  we pass in arguments that let it create and configure the socket
  properly?  (E.g. the event loop may know a better default backlog
  value for ``listen()`` than the typical application developer.)
  TBD: What does it return?  Something we can use to stop accepting
  without stopping the event loop?  A Future that completes when the
  socket is successfully accepting connections?

The following methods for registering callbacks for file descriptors
are optional.  If they are not implemented, accessing the method
(without calling it) returns AttributeError.  The default
implementation provides them but the user normally doesn't use these
directly -- they are used by the transport implementations
exclusively.  Also, on Windows these may be present or not depending
on whether a select-based or IOCP-based event loop is used.  These
take integer file descriptors only, not objects with a fileno()
method.  The file descriptor should represent something pollable --
i.e. no disk files.

- ``add_reader(fd, callback, *args)``.  Arrange for
  ``callback(*args)`` to be called whenever file descriptor ``fd`` is
  ready for reading.  Returns a ``DelayedCall`` object which can be
  used to cancel the callback.  Note that, unlike ``call_later()``,
  the callback may be called many times.  Calling ``add_reader()``
  again for the same file descriptor implicitly cancels the previous
  callback for that file descriptor.

- ``add_writer(fd, callback, *args)``.  Like ``add_reader()``,
  but registers the callback for writing instead of for reading.

- ``remove_reader(fd)``.  Cancels the current read callback for file
  descriptor ``fd``, if one is set.  A no-op if no callback is
  currently set for the file descriptor.  (The reason for providing
  this alternate interface is that it is often more convenient to
  remember the file descriptor than to remember the ``DelayedCall``
  object.)

- ``remove_writer(fd)``.  This is to ``add_writer()`` as
  ``remove_reader()`` is to ``add_reader()``.

The following methods for doing async I/O on sockets are optional.
They are alternative to the previous set of optional methods, intended
for transport implementations on Windows using IOCP (if the event loop
supports it).  The socket argument has to be a non-blocking socket.

- ``sock_recv(sock, n)``.  Receive up to ``n`` bytes from socket
  ``sock``.  Returns a Future whose result on success will be a
  bytes object on success.

- ``sock_sendall(sock, data)``.  Send bytes ``data`` to the socket
  ``sock``.  Returns a Future whose result on success will be
  ``None``.  (TBD: Is it better to emulate ``sendall()`` or ``send()``
  semantics?)

- ``sock_connect(sock, address)``.  Connect to the given address.
  Returns a Future whose result on success will be ``None``.

- ``sock_accept(sock)``.  Accept a connection from a socket.  The
  socket must be in listening mode and bound to an address.  Returns a
  Future whose result on success will be a tuple ``(conn, peer)``
  where ``conn`` is a connected non-blocking socket and ``peer`` is
  the peer address.  (TBD: People tell me that this style of API is
  too slow for high-volume servers.  So there's also
  ``start_serving()`` above.  Then do we still need this?)

Callback Sequencing
-------------------

When two callbacks are scheduled for the same time, they are run
in the order in which they are registered.  For example::

  ev.call_soon(foo)
  ev.call_soon(bar)

guarantees that ``foo()`` is called before ``bar()``.

If ``call_soon()`` is used, this guarantee is true even if the system
clock were to run backwards.  This is also the case for
``call_later(0, callback, *args)``.  However, if ``call_later()`` is
used with a nonzero ``when`` argument, all bets are off if the system
clock were to runs backwards.  (A good event loop implementation
should use ``time.monotonic()`` to avoid problems when the clock runs
backward.  See PEP 418.)

Context
-------

All event loops have a notion of context.  For the default event loop
implementation, the context is a thread.  An event loop implementation
should run all callbacks in the same context.  An event loop
implementation should run only one callback at a time, so callbacks
can assume automatic mutual exclusion with other callbacks scheduled
in the same event loop.

The DelayedCall Class
---------------------

The various methods for registering callbacks (e.g. ``call_later()``)
all return an object representing the registration that can be used to
cancel the callback.  For want of a better name this object is called
a ``DelayedCall``, although the user never needs to instantiate
instances of this class.  There is one public method:

- ``cancel()``.  Attempt to cancel the callback.

Read-only public attributes:

- ``callback``.  The callback function to be called.

- ``args``.  The argument tuple with which to call the callback function.

- ``canceled``.  True if ``cancel()`` has been called.

Note that some callbacks (e.g. those registered with ``call_later()``)
are meant to be called only once.  Others (e.g. those registered with
``add_reader()``) are meant to be called multiple times.

TBD: An API to call the callback (encapsulating the exception handling
necessary)?  Should it record how many times it has been called?
Maybe this API should just be ``__call__()``?  (But it should suppress
exceptions.)

TBD: Public attribute recording the realtime value when the callback
is scheduled?  (Since this is needed anyway for storing it in a heap.)

TBD: A better name for the class?

Futures
-------

The ``tulip.Future`` class here is intentionally similar to the
``concurrent.futures.Future`` class specified by PEP 3148, but there
are slight differences.  The supported public API is as follows,
indicating the differences with PEP 3148:

- ``cancel()``.

- ``cancelled()``.

- ``running()``.  Note that the meaning of this method is essentially
  "cannot be cancelled and isn't done yet".

- ``done()``.

- ``result()``.  Difference with PEP 3148: This has no timeout
  argument and does *not* wait; if the future is not yet done, it
  raises an exception.

- ``exception()``.  Difference with PEP 3148: This has no timeout
  argument and does *not* wait; if the future is not yet done, it
  raises an exception.

- ``add_done_callback(fn)``.  Difference with PEP 3148: The callback
  is never called immediately, and always in the context of the
  caller.  (Typically, a context is a thread.)  You can think of this
  as calling the callback through ``call_soon_threadsafe()``.  Note
  that the callback (unlike all other callbacks defined in this PEP)
  is always called with a single argument, the Future object.

The internal methods defined in PEP 3148 are not supported.

A ``tulip.Future`` object is not acceptable to the ``wait()`` and
``as_completed()`` functions in the ``concurrent.futures`` package.

A ``tulip.Future`` object is acceptable to a yield-from expression
when used in a coroutine.  See the section "Coroutines and the
Scheduler" below.

Transports
----------

TBD.

Protocols
---------

TBD.

Coroutines and the Scheduler
----------------------------

TBD.

Callback Style
--------------

Most interfaces taking a callback also take positional arguments.  For
instance, to arrange for ``foo("abc", 42)`` to be called soon, you
call ``ev.call_soon(foo, "abc", 42)``.  To schedule the call
``foo()``, use ``ev.call_soon(foo)``.  This convention greatly reduces
the number of small lambdas required in typical callback programming.

This convention specifically does *not* support keyword arguments.
Keyword arguments are used to pass optional extra information about
the callback.  This allows graceful evolution of the API without
having to worry about whether a keyword might be significant to a
callee somewhere.  If you have a callback that *must* be called with a
keyword argument, you can use a lambda or ``functools.partial``.  For
example::

  ev.call_soon(functools.partial(foo, "abc", repeat=42))

Choosing an Event Loop Implementation
-------------------------------------

TBD.  (This is about the choice to use e.g. select vs. poll vs. epoll,
and how to override the choice.  Probably belongs in the event loop
policy.)


Open Issues
===========

- How to spell the past tense of 'cancel'?  American usage prefers
  (though not absolutely dictates) 'canceled' (one ell), but outside
  the US 'cancelled' (two ells) prevails.  PEP 3148, whose author
  currently lives in Australia, uses ``cancelled()`` as a method name
  on its Future class.

- Do we need introspection APIs?  E.g. asking for the read callback
  given a file descriptor.  Or when the next scheduled call is.  Or
  the list of file descriptors registered with callbacks.

- Should we have ``future.add_callback(callback, *args)``, using the
  convention from the section "Callback Style" above, or should we
  stick with the PEP 3148 specification of
  ``future.add_done_callback(callback)`` which calls
  ``callback(future)``?  (Glyph suggested using a different method
  name since add_done_callback() does not guarantee that the callback
  will be called in the right context.)

- Returning a Future is relatively expensive, and it is quite possible
  that some types of calls *usually* complete immediately
  (e.g. writing small amounts of data to a socket).  A trick used by
  Richard Oudkerk in the tulip project's proactor branch makes calls
  like recv() either return a regular result or *raise* a Future.  The
  caller (likely a transport) must then write code like this::

    try:
      res = ev.sock_recv(sock, 8192)
    except Future as f:
      yield from sch.block_future(f)
      res = f.result()


Acknowledgments
===============

Apart from PEP 3153, influences include PEP 380 and Greg Ewing's
tutorial for yield-from, Twisted, Tornado, ZeroMQ, pyftpdlib, tulip
(the author's attempts at synthesis of all these), wattle (Steve
Dower's counter-proposal), numerous discussions on python-ideas from
September through December 2012, a Skype session with Steve Dower and
Dino Viehland, email exchanges with Ben Darnell, an audience with
Niels Provos (original author of libevent), and two in-person meetings
with several Twisted developers, including Glyph, Brian Warner, David
Reid, and Duncan McGreggor.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: