2011-06-26 05:15:11 -04:00
|
|
|
|
PEP: 3153
|
2011-06-25 06:33:30 -04:00
|
|
|
|
Title: Asynchronous IO support
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Last-Modified: $Date$
|
|
|
|
|
Author: Laurens Van Houtven <_@lvh.cc>
|
|
|
|
|
Status: Draft
|
|
|
|
|
Type: Standards Track
|
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
|
Created: 29-May-2011
|
|
|
|
|
Post-History: TBD
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
========
|
|
|
|
|
|
|
|
|
|
This PEP describes an abstraction of asynchronous IO for the Python
|
|
|
|
|
standard library.
|
|
|
|
|
|
|
|
|
|
The goal is to reach a abstraction that can be implemented by many
|
|
|
|
|
different asynchronous IO backends and provides a target for library
|
|
|
|
|
developers to write code portable between those different backends.
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
People who want to write asynchronous code in Python right now have a
|
|
|
|
|
few options:
|
|
|
|
|
|
|
|
|
|
- ``asyncore`` and ``asynchat``
|
|
|
|
|
- something bespoke, most likely based on the ``select`` module
|
|
|
|
|
- using a third party library, such as Twisted_ or gevent_
|
|
|
|
|
|
|
|
|
|
Unfortunately, each of these options has its downsides, which this PEP
|
|
|
|
|
tries to address.
|
|
|
|
|
|
|
|
|
|
Despite having been part of the Python standard library for a long time,
|
|
|
|
|
the asyncore module suffers from fundamental flaws following from
|
|
|
|
|
an inflexible API that does not stand up to the expectations of
|
|
|
|
|
a modern asynchronous networking module.
|
|
|
|
|
|
|
|
|
|
Moreover, its approach is too simplistic to provide developers with all
|
|
|
|
|
the tools they need in order to fully exploit the potential of asynchronous
|
|
|
|
|
networking.
|
|
|
|
|
|
|
|
|
|
The most popular solution right now used in production involves the
|
|
|
|
|
use of third party libraries. These often provide satisfactory
|
|
|
|
|
solutions, but there is a lack of compatibility between these
|
|
|
|
|
libraries, which tends to make codebases very tightly coupled to the
|
|
|
|
|
library they use.
|
|
|
|
|
|
|
|
|
|
This current lack of portability between different asynchronous IO
|
|
|
|
|
libraries causes a lot of duplicated effort for third party library
|
|
|
|
|
developers. A sufficiently powerful abstraction could mean that
|
|
|
|
|
asynchronous code gets written once, but used everywhere.
|
|
|
|
|
|
|
|
|
|
An eventual added goal would be for standard library implementations
|
|
|
|
|
of wire and network protocols to evolve towards being real protocol
|
|
|
|
|
implementations, as opposed to standalone libraries that do everything
|
|
|
|
|
including calling ``recv()`` blockingly. This means they could be
|
|
|
|
|
easily reused for both synchronous and asynchronous code.
|
|
|
|
|
|
|
|
|
|
.. _Twisted: http://www.twistedmatrix.com/
|
|
|
|
|
.. _gevent: http://www.gevent.org/
|
|
|
|
|
|
|
|
|
|
Communication abstractions
|
|
|
|
|
==========================
|
|
|
|
|
|
|
|
|
|
Transports
|
|
|
|
|
----------
|
|
|
|
|
|
|
|
|
|
Transports provide a uniform API for reading bytes from and writing
|
|
|
|
|
bytes to different kinds of connections. Transports in this PEP are
|
|
|
|
|
always ordered, reliable, bidirectional, stream-oriented two-endpoint
|
|
|
|
|
connections. This might be a TCP socket, an SSL connection, a pipe
|
|
|
|
|
(named or otherwise), a serial port... It may abstract a file descriptor
|
|
|
|
|
on POSIX platforms or a Handle on Windows or some other data structure
|
|
|
|
|
appropriate to a particular platform. It encapsulates all of the
|
|
|
|
|
particular implementation details of using that platform data structure
|
|
|
|
|
and presents a uniform interface for application developers.
|
|
|
|
|
|
|
|
|
|
Transports talk to two things: the other side of the connection on
|
|
|
|
|
one hand, and a protocol on the other. It's a bridge between the
|
|
|
|
|
specific underlying transfer mechanism and the protocol. Its job can
|
|
|
|
|
be described as allowing the protocol to just send and receive bytes,
|
|
|
|
|
taking care of all of the magic that needs to happen to those bytes
|
|
|
|
|
to be eventually sent across the wire.
|
|
|
|
|
|
|
|
|
|
The primary feature of a transport is sending bytes to a protocol and
|
|
|
|
|
receiving bytes from the underlying protocol. Writing to the transport
|
|
|
|
|
is done using the ``write`` and ``write_sequence`` methods. The latter
|
|
|
|
|
method is a performance optimization, to allow software to take
|
|
|
|
|
advantage of specific capabilities in some transport
|
|
|
|
|
mechanisms. Specifically, this allows transports to use writev_
|
|
|
|
|
instead of write_ or send_, also known as scatter/gather IO.
|
|
|
|
|
|
|
|
|
|
A transport can be paused and resumed. This will cause it to buffer
|
|
|
|
|
data coming from protocols and stop sending received data to the
|
|
|
|
|
protocol.
|
|
|
|
|
|
|
|
|
|
A transport can also be closed, half-closed and aborted. A closed
|
|
|
|
|
transport will finish writing all of the data queued in it to the
|
|
|
|
|
underlying mechanism, and will then stop reading or writing
|
|
|
|
|
data. Aborting a transport stops it, closing the connection without
|
|
|
|
|
sending any data that is still queued.
|
|
|
|
|
|
|
|
|
|
Further writes will result in exceptions being thrown. A half-closed
|
|
|
|
|
transport may not be written to anymore, but will still accept
|
|
|
|
|
incoming data.
|
|
|
|
|
|
|
|
|
|
Protocols
|
|
|
|
|
---------
|
|
|
|
|
|
|
|
|
|
Protocols are probably more familiar to new users. The terminology is
|
|
|
|
|
consistent with what you would expect from something called a
|
|
|
|
|
protocol: the protocols most people think of first, like HTTP, IRC,
|
|
|
|
|
SMTP... are all examples of something that would be implemented in a
|
|
|
|
|
protocol.
|
|
|
|
|
|
|
|
|
|
The shortest useful definition of a protocol is a (usually two-way)
|
|
|
|
|
bridge between the transport and the rest of the application logic. A
|
|
|
|
|
protocol will receive bytes from a transport and translates that
|
|
|
|
|
information into some behavior, typically resulting in some method
|
|
|
|
|
calls on an object. Similarly, application logic calls some methods on
|
|
|
|
|
the protocol, which the protocol translates into bytes and
|
|
|
|
|
communicates to the transport.
|
|
|
|
|
|
|
|
|
|
One of the simplest protocols is a line-based protocol, where data is
|
|
|
|
|
delimited by ``\r\n``. The protocol will receive bytes from the
|
|
|
|
|
transport and buffer them until there is at least one complete
|
|
|
|
|
line. Once that's done, it will pass this line along to some
|
|
|
|
|
object. Ideally that would be accomplished using a callable or even a
|
|
|
|
|
completely separate object composed by the protocol, but it could also
|
|
|
|
|
be implemented by subclassing (as is the case with Twisted's
|
|
|
|
|
``LineReceiver``). For the other direction, the protocol could have a
|
|
|
|
|
``write_line`` method, which adds the required ``\r\n`` and passes the
|
|
|
|
|
new bytes buffer on to the transport.
|
|
|
|
|
|
|
|
|
|
This PEP suggests a generalized ``LineReceiver`` called
|
|
|
|
|
``ChunkProtocol``, where a "chunk" is a message in a stream, delimited
|
|
|
|
|
by the specified delimiter. Instances take a delimiter and a callable
|
|
|
|
|
that will be called with a chunk of data once it's received (as
|
|
|
|
|
opposed to Twisted's subclassing behavior). ``ChunkProtocol`` also has
|
|
|
|
|
a ``write_chunk`` method analogous to the ``write_line`` method
|
|
|
|
|
described above.
|
|
|
|
|
|
|
|
|
|
Why separate protocols and transports?
|
|
|
|
|
--------------------------------------
|
|
|
|
|
|
|
|
|
|
This separation between protocol and transport often confuses people
|
|
|
|
|
who first come across it. In fact, the standard library itself does
|
|
|
|
|
not make this distinction in many cases, particularly not in the API
|
|
|
|
|
it provides to users.
|
|
|
|
|
|
|
|
|
|
It is nonetheless a very useful distinction. In the worst case, it
|
|
|
|
|
simplifies the implementation by clear separation of
|
|
|
|
|
concerns. However, it often serves the far more useful purpose of
|
|
|
|
|
being able to reuse protocols across different transports.
|
|
|
|
|
|
|
|
|
|
Consider a simple RPC protocol. The same bytes may be transferred
|
|
|
|
|
across many different transports, for example pipes or sockets. To
|
|
|
|
|
help with this, we separate the protocol out from the transport. The
|
|
|
|
|
protocol just reads and writes bytes, and doesn't really care what
|
|
|
|
|
mechanism is used to eventually transfer those bytes.
|
|
|
|
|
|
|
|
|
|
This also allows for protocols to be stacked or nested easily,
|
|
|
|
|
allowing for even more code reuse. A common example of this is
|
|
|
|
|
JSON-RPC: according to the specification, it can be used across both
|
|
|
|
|
sockets and HTTP[#jsonrpc]_ . In practice, it tends to be primarily
|
|
|
|
|
encapsulated in HTTP. The protocol-transport abstraction allows us to
|
|
|
|
|
build a stack of protocols and transports that allow you to use HTTP
|
|
|
|
|
as if it were a transport. For JSON-RPC, that might get you a stack
|
|
|
|
|
somewhat like this:
|
|
|
|
|
|
|
|
|
|
1. TCP socket transport
|
|
|
|
|
2. HTTP protocol
|
|
|
|
|
3. HTTP-based transport
|
|
|
|
|
4. JSON-RPC protocol
|
|
|
|
|
5. Application code
|
|
|
|
|
|
|
|
|
|
Flow control
|
|
|
|
|
============
|
|
|
|
|
|
|
|
|
|
Consumers
|
|
|
|
|
---------
|
|
|
|
|
|
|
|
|
|
Consumers consume bytes produced by producers. Together with
|
|
|
|
|
producers, they make flow control possible.
|
|
|
|
|
|
|
|
|
|
Consumers primarily play a passive role in flow control. They get
|
|
|
|
|
called whenever a producer has some data available. They then process
|
|
|
|
|
that data, and typically yield control back to the producer.
|
|
|
|
|
|
|
|
|
|
Consumers typically implement buffers of some sort. They make flow
|
|
|
|
|
control possible by telling their producer about the current status of
|
|
|
|
|
those buffers. A consumer can instruct a producer to stop producing
|
|
|
|
|
entirely, stop producing temporarily, or resume producing if it has
|
|
|
|
|
been told to pause previously.
|
|
|
|
|
|
|
|
|
|
Producers are registered to the consumer using the ``register``
|
|
|
|
|
method.
|
|
|
|
|
|
|
|
|
|
Producers
|
|
|
|
|
---------
|
|
|
|
|
|
|
|
|
|
Where consumers consume bytes, producers produce them.
|
|
|
|
|
|
|
|
|
|
Producers are modeled after the IPushProducer_ interface found in
|
|
|
|
|
Twisted. Although there is an IPullProducer_ as well, it is on the
|
|
|
|
|
whole far less interesting and therefore probably out of the scope of
|
|
|
|
|
this PEP.
|
|
|
|
|
|
|
|
|
|
Although producers can be told to stop producing entirely, the two
|
|
|
|
|
most interesting methods they have are ``pause`` and ``resume``. These
|
|
|
|
|
are usually called by the consumer, to signify whether it is ready to
|
|
|
|
|
process ("consume") more data or not. Consumers and producers
|
|
|
|
|
cooperate to make flow control possible.
|
|
|
|
|
|
|
|
|
|
In addition to the Twisted IPushProducer_ interface, producers have a
|
|
|
|
|
``half_register`` method which is called with the consumer when the
|
|
|
|
|
consumer tries to register that producer. In most cases, this will
|
|
|
|
|
just be a case of setting ``self.consumer = consumer``, but some
|
|
|
|
|
producers may require more complex preconditions or behavior when a
|
|
|
|
|
consumer is registered. End-users are not supposed to call this method
|
|
|
|
|
directly.
|
|
|
|
|
|
|
|
|
|
===========================
|
|
|
|
|
Considered API alternatives
|
|
|
|
|
===========================
|
|
|
|
|
|
|
|
|
|
Generators as producers
|
|
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
|
|
Generators have been suggested as way to implement producers. However,
|
|
|
|
|
there appear to be a few problems with this.
|
|
|
|
|
|
|
|
|
|
First of all, there is a conceptual problem. A generator, in a sense,
|
|
|
|
|
is "passive". It needs to be told, through a method call, to take
|
|
|
|
|
action. A producer is "active": it initiates those method calls. A
|
|
|
|
|
real producer has a symmetric relationship with it's consumer. In the
|
|
|
|
|
case of a generator-turned-producer, only the consumer would have a
|
|
|
|
|
reference, and the producer is blissfully unaware of the consumer's
|
|
|
|
|
existence.
|
|
|
|
|
|
|
|
|
|
This conceptual problem translates into a few technical issues as
|
|
|
|
|
well. After a successful ``write`` method call on its consumer, a
|
|
|
|
|
(push) producer is free to take action once more. In the case of a
|
|
|
|
|
generator, it would need to be told, either by asking for the next
|
|
|
|
|
object through the iteration protocol (a process which could block
|
|
|
|
|
indefinitely), or perhaps by throwing some kind of signal exception
|
|
|
|
|
into it.
|
|
|
|
|
|
|
|
|
|
This signaling setup may provide a technically feasible solution, but
|
|
|
|
|
it is still unsatisfactory. For one, this introduces unwarranted
|
|
|
|
|
complexity in the consumer, which now not only needs to understand how
|
|
|
|
|
to receive and process data, but also how to ask for new data and deal
|
|
|
|
|
with the case of no new data being available.
|
|
|
|
|
|
|
|
|
|
This latter edge case is particularly problematic. It needs to be
|
|
|
|
|
taken care of, since the entire operation is not allowed to
|
|
|
|
|
block. However, generators can not raise an exception on iteration
|
|
|
|
|
without terminating, thereby losing the state of the generator. As a
|
|
|
|
|
result, signaling a lack of available data would have to be done using
|
|
|
|
|
a sentinel value, instead of being done using th exception mechanism.
|
|
|
|
|
|
|
|
|
|
Last but not least, nobody produced actually working code
|
|
|
|
|
demonstrating how they could be used.
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
.. [#jsonrpc] Sections `2.1 <http://json-rpc.org/wiki/specification#a2.1JSON-RPCoverstreamconnections>`_ and
|
|
|
|
|
`2.2 <http://json-rpc.org/wiki/specification#a2.2JSON-RPCoverHTTP>`_ .
|
|
|
|
|
|
|
|
|
|
.. _writev: http://pubs.opengroup.org/onlinepubs/009695399/functions/writev.html
|
|
|
|
|
.. _write: http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
|
|
|
|
|
.. _send: http://pubs.opengroup.org/onlinepubs/009695399/functions/send.html
|
|
|
|
|
.. _IPushProducer: http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IPushProducer.html
|
|
|
|
|
.. _IPullProducer: http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IPullProducer.html
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
coding: utf-8
|
|
|
|
|
End:
|