298 lines
12 KiB
Plaintext
298 lines
12 KiB
Plaintext
PEP: 3153
|
||
Title: Asynchronous IO support
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Laurens Van Houtven <_@lvh.cc>
|
||
Status: Superseded
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 29-May-2011
|
||
Post-History:
|
||
Superseded-By: 3156
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP describes an abstraction of asynchronous IO for the Python
|
||
standard library.
|
||
|
||
The goal is to reach an abstraction that can be implemented by many
|
||
different asynchronous IO backends and provides a target for library
|
||
developers to write code portable between those different backends.
|
||
|
||
Rationale
|
||
=========
|
||
|
||
People who want to write asynchronous code in Python right now have a
|
||
few options:
|
||
|
||
- ``asyncore`` and ``asynchat``
|
||
- something bespoke, most likely based on the ``select`` module
|
||
- using a third party library, such as Twisted_ or gevent_
|
||
|
||
Unfortunately, each of these options has its downsides, which this PEP
|
||
tries to address.
|
||
|
||
Despite having been part of the Python standard library for a long
|
||
time, the asyncore module suffers from fundamental flaws following
|
||
from an inflexible API that does not stand up to the expectations of a
|
||
modern asynchronous networking module.
|
||
|
||
Moreover, its approach is too simplistic to provide developers with
|
||
all the tools they need in order to fully exploit the potential of
|
||
asynchronous networking.
|
||
|
||
The most popular solution right now used in production involves the
|
||
use of third party libraries. These often provide satisfactory
|
||
solutions, but there is a lack of compatibility between these
|
||
libraries, which tends to make codebases very tightly coupled to the
|
||
library they use.
|
||
|
||
This current lack of portability between different asynchronous IO
|
||
libraries causes a lot of duplicated effort for third party library
|
||
developers. A sufficiently powerful abstraction could mean that
|
||
asynchronous code gets written once, but used everywhere.
|
||
|
||
An eventual added goal would be for standard library implementations
|
||
of wire and network protocols to evolve towards being real protocol
|
||
implementations, as opposed to standalone libraries that do everything
|
||
including calling ``recv()`` blockingly. This means they could be
|
||
easily reused for both synchronous and asynchronous code.
|
||
|
||
.. _Twisted: http://www.twistedmatrix.com/
|
||
.. _gevent: http://www.gevent.org/
|
||
|
||
Communication abstractions
|
||
==========================
|
||
|
||
Transports
|
||
----------
|
||
|
||
Transports provide a uniform API for reading bytes from and writing
|
||
bytes to different kinds of connections. Transports in this PEP are
|
||
always ordered, reliable, bidirectional, stream-oriented two-endpoint
|
||
connections. This might be a TCP socket, an SSL connection, a pipe
|
||
(named or otherwise), a serial port... It may abstract a file
|
||
descriptor on POSIX platforms or a Handle on Windows or some other
|
||
data structure appropriate to a particular platform. It encapsulates
|
||
all of the particular implementation details of using that platform
|
||
data structure and presents a uniform interface for application
|
||
developers.
|
||
|
||
Transports talk to two things: the other side of the connection on one
|
||
hand, and a protocol on the other. It's a bridge between the specific
|
||
underlying transfer mechanism and the protocol. Its job can be
|
||
described as allowing the protocol to just send and receive bytes,
|
||
taking care of all of the magic that needs to happen to those bytes to
|
||
be eventually sent across the wire.
|
||
|
||
The primary feature of a transport is sending bytes to a protocol and
|
||
receiving bytes from the underlying protocol. Writing to the
|
||
transport is done using the ``write`` and ``write_sequence`` methods.
|
||
The latter method is a performance optimization, to allow software to
|
||
take advantage of specific capabilities in some transport mechanisms.
|
||
Specifically, this allows transports to use writev_ instead of write_
|
||
or send_, also known as scatter/gather IO.
|
||
|
||
A transport can be paused and resumed. This will cause it to buffer
|
||
data coming from protocols and stop sending received data to the
|
||
protocol.
|
||
|
||
A transport can also be closed, half-closed and aborted. A closed
|
||
transport will finish writing all of the data queued in it to the
|
||
underlying mechanism, and will then stop reading or writing data.
|
||
Aborting a transport stops it, closing the connection without sending
|
||
any data that is still queued.
|
||
|
||
Further writes will result in exceptions being thrown. A half-closed
|
||
transport may not be written to anymore, but will still accept
|
||
incoming data.
|
||
|
||
Protocols
|
||
---------
|
||
|
||
Protocols are probably more familiar to new users. The terminology is
|
||
consistent with what you would expect from something called a
|
||
protocol: the protocols most people think of first, like HTTP, IRC,
|
||
SMTP... are all examples of something that would be implemented in a
|
||
protocol.
|
||
|
||
The shortest useful definition of a protocol is a (usually two-way)
|
||
bridge between the transport and the rest of the application logic. A
|
||
protocol will receive bytes from a transport and translates that
|
||
information into some behavior, typically resulting in some method
|
||
calls on an object. Similarly, application logic calls some methods
|
||
on the protocol, which the protocol translates into bytes and
|
||
communicates to the transport.
|
||
|
||
One of the simplest protocols is a line-based protocol, where data is
|
||
delimited by ``\r\n``. The protocol will receive bytes from the
|
||
transport and buffer them until there is at least one complete line.
|
||
Once that's done, it will pass this line along to some object.
|
||
Ideally that would be accomplished using a callable or even a
|
||
completely separate object composed by the protocol, but it could also
|
||
be implemented by subclassing (as is the case with Twisted's
|
||
``LineReceiver``). For the other direction, the protocol could have a
|
||
``write_line`` method, which adds the required ``\r\n`` and passes the
|
||
new bytes buffer on to the transport.
|
||
|
||
This PEP suggests a generalized ``LineReceiver`` called
|
||
``ChunkProtocol``, where a "chunk" is a message in a stream, delimited
|
||
by the specified delimiter. Instances take a delimiter and a callable
|
||
that will be called with a chunk of data once it's received (as
|
||
opposed to Twisted's subclassing behavior). ``ChunkProtocol`` also
|
||
has a ``write_chunk`` method analogous to the ``write_line`` method
|
||
described above.
|
||
|
||
Why separate protocols and transports?
|
||
--------------------------------------
|
||
|
||
This separation between protocol and transport often confuses people
|
||
who first come across it. In fact, the standard library itself does
|
||
not make this distinction in many cases, particularly not in the API
|
||
it provides to users.
|
||
|
||
It is nonetheless a very useful distinction. In the worst case, it
|
||
simplifies the implementation by clear separation of concerns.
|
||
However, it often serves the far more useful purpose of being able to
|
||
reuse protocols across different transports.
|
||
|
||
Consider a simple RPC protocol. The same bytes may be transferred
|
||
across many different transports, for example pipes or sockets. To
|
||
help with this, we separate the protocol out from the transport. The
|
||
protocol just reads and writes bytes, and doesn't really care what
|
||
mechanism is used to eventually transfer those bytes.
|
||
|
||
This also allows for protocols to be stacked or nested easily,
|
||
allowing for even more code reuse. A common example of this is
|
||
JSON-RPC: according to the specification, it can be used across both
|
||
sockets and HTTP [#jsonrpc]_. In practice, it tends to be primarily
|
||
encapsulated in HTTP. The protocol-transport abstraction allows us to
|
||
build a stack of protocols and transports that allow you to use HTTP
|
||
as if it were a transport. For JSON-RPC, that might get you a stack
|
||
somewhat like this:
|
||
|
||
1. TCP socket transport
|
||
2. HTTP protocol
|
||
3. HTTP-based transport
|
||
4. JSON-RPC protocol
|
||
5. Application code
|
||
|
||
Flow control
|
||
============
|
||
|
||
Consumers
|
||
---------
|
||
|
||
Consumers consume bytes produced by producers. Together with
|
||
producers, they make flow control possible.
|
||
|
||
Consumers primarily play a passive role in flow control. They get
|
||
called whenever a producer has some data available. They then process
|
||
that data, and typically yield control back to the producer.
|
||
|
||
Consumers typically implement buffers of some sort. They make flow
|
||
control possible by telling their producer about the current status of
|
||
those buffers. A consumer can instruct a producer to stop producing
|
||
entirely, stop producing temporarily, or resume producing if it has
|
||
been told to pause previously.
|
||
|
||
Producers are registered to the consumer using the ``register``
|
||
method.
|
||
|
||
Producers
|
||
---------
|
||
|
||
Where consumers consume bytes, producers produce them.
|
||
|
||
Producers are modeled after the IPushProducer_ interface found in
|
||
Twisted. Although there is an IPullProducer_ as well, it is on the
|
||
whole far less interesting and therefore probably out of the scope of
|
||
this PEP.
|
||
|
||
Although producers can be told to stop producing entirely, the two
|
||
most interesting methods they have are ``pause`` and ``resume``.
|
||
These are usually called by the consumer, to signify whether it is
|
||
ready to process ("consume") more data or not. Consumers and
|
||
producers cooperate to make flow control possible.
|
||
|
||
In addition to the Twisted IPushProducer_ interface, producers have a
|
||
``half_register`` method which is called with the consumer when the
|
||
consumer tries to register that producer. In most cases, this will
|
||
just be a case of setting ``self.consumer = consumer``, but some
|
||
producers may require more complex preconditions or behavior when a
|
||
consumer is registered. End-users are not supposed to call this
|
||
method directly.
|
||
|
||
===========================
|
||
Considered API alternatives
|
||
===========================
|
||
|
||
Generators as producers
|
||
~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
Generators have been suggested as way to implement producers.
|
||
However, there appear to be a few problems with this.
|
||
|
||
First of all, there is a conceptual problem. A generator, in a sense,
|
||
is "passive". It needs to be told, through a method call, to take
|
||
action. A producer is "active": it initiates those method calls. A
|
||
real producer has a symmetric relationship with its consumer. In the
|
||
case of a generator-turned-producer, only the consumer would have a
|
||
reference, and the producer is blissfully unaware of the consumer's
|
||
existence.
|
||
|
||
This conceptual problem translates into a few technical issues as
|
||
well. After a successful ``write`` method call on its consumer, a
|
||
(push) producer is free to take action once more. In the case of a
|
||
generator, it would need to be told, either by asking for the next
|
||
object through the iteration protocol (a process which could block
|
||
indefinitely), or perhaps by throwing some kind of signal exception
|
||
into it.
|
||
|
||
This signaling setup may provide a technically feasible solution, but
|
||
it is still unsatisfactory. For one, this introduces unwarranted
|
||
complexity in the consumer, which now not only needs to understand how
|
||
to receive and process data, but also how to ask for new data and deal
|
||
with the case of no new data being available.
|
||
|
||
This latter edge case is particularly problematic. It needs to be
|
||
taken care of, since the entire operation is not allowed to block.
|
||
However, generators can not raise an exception on iteration without
|
||
terminating, thereby losing the state of the generator. As a result,
|
||
signaling a lack of available data would have to be done using a
|
||
sentinel value, instead of being done using th exception mechanism.
|
||
|
||
Last but not least, nobody produced actually working code
|
||
demonstrating how they could be used.
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [#jsonrpc] Sections `2.1 <http://json-rpc.org/wiki/specification#a2.1JSON-RPCoverstreamconnections>`_ and
|
||
`2.2 <http://json-rpc.org/wiki/specification#a2.2JSON-RPCoverHTTP>`_ .
|
||
|
||
.. _writev: http://pubs.opengroup.org/onlinepubs/009695399/functions/writev.html
|
||
.. _write: http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
|
||
.. _send: http://pubs.opengroup.org/onlinepubs/009695399/functions/send.html
|
||
.. _IPushProducer: http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IPushProducer.html
|
||
.. _IPullProducer: http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IPullProducer.html
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|