Add early draft of PEP to create a new async module and protocol.
This commit is contained in:
parent
e12e762a54
commit
a31b330d4c
|
@ -0,0 +1,294 @@
|
|||
PEP: XXX
|
||||
Title: Asynchronous IO support
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Laurens Van Houtven <_@lvh.cc>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 29-May-2011
|
||||
Post-History: TBD
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP describes an abstraction of asynchronous IO for the Python
|
||||
standard library.
|
||||
|
||||
The goal is to reach a abstraction that can be implemented by many
|
||||
different asynchronous IO backends and provides a target for library
|
||||
developers to write code portable between those different backends.
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
People who want to write asynchronous code in Python right now have a
|
||||
few options:
|
||||
|
||||
- ``asyncore`` and ``asynchat``
|
||||
- something bespoke, most likely based on the ``select`` module
|
||||
- using a third party library, such as Twisted_ or gevent_
|
||||
|
||||
Unfortunately, each of these options has its downsides, which this PEP
|
||||
tries to address.
|
||||
|
||||
Despite having been part of the Python standard library for a long time,
|
||||
the asyncore module suffers from fundamental flaws following from
|
||||
an inflexible API that does not stand up to the expectations of
|
||||
a modern asynchronous networking module.
|
||||
|
||||
Moreover, its approach is too simplistic to provide developers with all
|
||||
the tools they need in order to fully exploit the potential of asynchronous
|
||||
networking.
|
||||
|
||||
The most popular solution right now used in production involves the
|
||||
use of third party libraries. These often provide satisfactory
|
||||
solutions, but there is a lack of compatibility between these
|
||||
libraries, which tends to make codebases very tightly coupled to the
|
||||
library they use.
|
||||
|
||||
This current lack of portability between different asynchronous IO
|
||||
libraries causes a lot of duplicated effort for third party library
|
||||
developers. A sufficiently powerful abstraction could mean that
|
||||
asynchronous code gets written once, but used everywhere.
|
||||
|
||||
An eventual added goal would be for standard library implementations
|
||||
of wire and network protocols to evolve towards being real protocol
|
||||
implementations, as opposed to standalone libraries that do everything
|
||||
including calling ``recv()`` blockingly. This means they could be
|
||||
easily reused for both synchronous and asynchronous code.
|
||||
|
||||
.. _Twisted: http://www.twistedmatrix.com/
|
||||
.. _gevent: http://www.gevent.org/
|
||||
|
||||
Communication abstractions
|
||||
==========================
|
||||
|
||||
Transports
|
||||
----------
|
||||
|
||||
Transports provide a uniform API for reading bytes from and writing
|
||||
bytes to different kinds of connections. Transports in this PEP are
|
||||
always ordered, reliable, bidirectional, stream-oriented two-endpoint
|
||||
connections. This might be a TCP socket, an SSL connection, a pipe
|
||||
(named or otherwise), a serial port... It may abstract a file descriptor
|
||||
on POSIX platforms or a Handle on Windows or some other data structure
|
||||
appropriate to a particular platform. It encapsulates all of the
|
||||
particular implementation details of using that platform data structure
|
||||
and presents a uniform interface for application developers.
|
||||
|
||||
Transports talk to two things: the other side of the connection on
|
||||
one hand, and a protocol on the other. It's a bridge between the
|
||||
specific underlying transfer mechanism and the protocol. Its job can
|
||||
be described as allowing the protocol to just send and receive bytes,
|
||||
taking care of all of the magic that needs to happen to those bytes
|
||||
to be eventually sent across the wire.
|
||||
|
||||
The primary feature of a transport is sending bytes to a protocol and
|
||||
receiving bytes from the underlying protocol. Writing to the transport
|
||||
is done using the ``write`` and ``write_sequence`` methods. The latter
|
||||
method is a performance optimization, to allow software to take
|
||||
advantage of specific capabilities in some transport
|
||||
mechanisms. Specifically, this allows transports to use writev_
|
||||
instead of write_ or send_, also known as scatter/gather IO.
|
||||
|
||||
A transport can be paused and resumed. This will cause it to buffer
|
||||
data coming from protocols and stop sending received data to the
|
||||
protocol.
|
||||
|
||||
A transport can also be closed, half-closed and aborted. A closed
|
||||
transport will finish writing all of the data queued in it to the
|
||||
underlying mechanism, and will then stop reading or writing
|
||||
data. Aborting a transport stops it, closing the connection without
|
||||
sending any data that is still queued.
|
||||
|
||||
Further writes will result in exceptions being thrown. A half-closed
|
||||
transport may not be written to anymore, but will still accept
|
||||
incoming data.
|
||||
|
||||
Protocols
|
||||
---------
|
||||
|
||||
Protocols are probably more familiar to new users. The terminology is
|
||||
consistent with what you would expect from something called a
|
||||
protocol: the protocols most people think of first, like HTTP, IRC,
|
||||
SMTP... are all examples of something that would be implemented in a
|
||||
protocol.
|
||||
|
||||
The shortest useful definition of a protocol is a (usually two-way)
|
||||
bridge between the transport and the rest of the application logic. A
|
||||
protocol will receive bytes from a transport and translates that
|
||||
information into some behavior, typically resulting in some method
|
||||
calls on an object. Similarly, application logic calls some methods on
|
||||
the protocol, which the protocol translates into bytes and
|
||||
communicates to the transport.
|
||||
|
||||
One of the simplest protocols is a line-based protocol, where data is
|
||||
delimited by ``\r\n``. The protocol will receive bytes from the
|
||||
transport and buffer them until there is at least one complete
|
||||
line. Once that's done, it will pass this line along to some
|
||||
object. Ideally that would be accomplished using a callable or even a
|
||||
completely separate object composed by the protocol, but it could also
|
||||
be implemented by subclassing (as is the case with Twisted's
|
||||
``LineReceiver``). For the other direction, the protocol could have a
|
||||
``write_line`` method, which adds the required ``\r\n`` and passes the
|
||||
new bytes buffer on to the transport.
|
||||
|
||||
This PEP suggests a generalized ``LineReceiver`` called
|
||||
``ChunkProtocol``, where a "chunk" is a message in a stream, delimited
|
||||
by the specified delimiter. Instances take a delimiter and a callable
|
||||
that will be called with a chunk of data once it's received (as
|
||||
opposed to Twisted's subclassing behavior). ``ChunkProtocol`` also has
|
||||
a ``write_chunk`` method analogous to the ``write_line`` method
|
||||
described above.
|
||||
|
||||
Why separate protocols and transports?
|
||||
--------------------------------------
|
||||
|
||||
This separation between protocol and transport often confuses people
|
||||
who first come across it. In fact, the standard library itself does
|
||||
not make this distinction in many cases, particularly not in the API
|
||||
it provides to users.
|
||||
|
||||
It is nonetheless a very useful distinction. In the worst case, it
|
||||
simplifies the implementation by clear separation of
|
||||
concerns. However, it often serves the far more useful purpose of
|
||||
being able to reuse protocols across different transports.
|
||||
|
||||
Consider a simple RPC protocol. The same bytes may be transferred
|
||||
across many different transports, for example pipes or sockets. To
|
||||
help with this, we separate the protocol out from the transport. The
|
||||
protocol just reads and writes bytes, and doesn't really care what
|
||||
mechanism is used to eventually transfer those bytes.
|
||||
|
||||
This also allows for protocols to be stacked or nested easily,
|
||||
allowing for even more code reuse. A common example of this is
|
||||
JSON-RPC: according to the specification, it can be used across both
|
||||
sockets and HTTP[#jsonrpc]_ . In practice, it tends to be primarily
|
||||
encapsulated in HTTP. The protocol-transport abstraction allows us to
|
||||
build a stack of protocols and transports that allow you to use HTTP
|
||||
as if it were a transport. For JSON-RPC, that might get you a stack
|
||||
somewhat like this:
|
||||
|
||||
1. TCP socket transport
|
||||
2. HTTP protocol
|
||||
3. HTTP-based transport
|
||||
4. JSON-RPC protocol
|
||||
5. Application code
|
||||
|
||||
Flow control
|
||||
============
|
||||
|
||||
Consumers
|
||||
---------
|
||||
|
||||
Consumers consume bytes produced by producers. Together with
|
||||
producers, they make flow control possible.
|
||||
|
||||
Consumers primarily play a passive role in flow control. They get
|
||||
called whenever a producer has some data available. They then process
|
||||
that data, and typically yield control back to the producer.
|
||||
|
||||
Consumers typically implement buffers of some sort. They make flow
|
||||
control possible by telling their producer about the current status of
|
||||
those buffers. A consumer can instruct a producer to stop producing
|
||||
entirely, stop producing temporarily, or resume producing if it has
|
||||
been told to pause previously.
|
||||
|
||||
Producers are registered to the consumer using the ``register``
|
||||
method.
|
||||
|
||||
Producers
|
||||
---------
|
||||
|
||||
Where consumers consume bytes, producers produce them.
|
||||
|
||||
Producers are modeled after the IPushProducer_ interface found in
|
||||
Twisted. Although there is an IPullProducer_ as well, it is on the
|
||||
whole far less interesting and therefore probably out of the scope of
|
||||
this PEP.
|
||||
|
||||
Although producers can be told to stop producing entirely, the two
|
||||
most interesting methods they have are ``pause`` and ``resume``. These
|
||||
are usually called by the consumer, to signify whether it is ready to
|
||||
process ("consume") more data or not. Consumers and producers
|
||||
cooperate to make flow control possible.
|
||||
|
||||
In addition to the Twisted IPushProducer_ interface, producers have a
|
||||
``half_register`` method which is called with the consumer when the
|
||||
consumer tries to register that producer. In most cases, this will
|
||||
just be a case of setting ``self.consumer = consumer``, but some
|
||||
producers may require more complex preconditions or behavior when a
|
||||
consumer is registered. End-users are not supposed to call this method
|
||||
directly.
|
||||
|
||||
===========================
|
||||
Considered API alternatives
|
||||
===========================
|
||||
|
||||
Generators as producers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Generators have been suggested as way to implement producers. However,
|
||||
there appear to be a few problems with this.
|
||||
|
||||
First of all, there is a conceptual problem. A generator, in a sense,
|
||||
is "passive". It needs to be told, through a method call, to take
|
||||
action. A producer is "active": it initiates those method calls. A
|
||||
real producer has a symmetric relationship with it's consumer. In the
|
||||
case of a generator-turned-producer, only the consumer would have a
|
||||
reference, and the producer is blissfully unaware of the consumer's
|
||||
existence.
|
||||
|
||||
This conceptual problem translates into a few technical issues as
|
||||
well. After a successful ``write`` method call on its consumer, a
|
||||
(push) producer is free to take action once more. In the case of a
|
||||
generator, it would need to be told, either by asking for the next
|
||||
object through the iteration protocol (a process which could block
|
||||
indefinitely), or perhaps by throwing some kind of signal exception
|
||||
into it.
|
||||
|
||||
This signaling setup may provide a technically feasible solution, but
|
||||
it is still unsatisfactory. For one, this introduces unwarranted
|
||||
complexity in the consumer, which now not only needs to understand how
|
||||
to receive and process data, but also how to ask for new data and deal
|
||||
with the case of no new data being available.
|
||||
|
||||
This latter edge case is particularly problematic. It needs to be
|
||||
taken care of, since the entire operation is not allowed to
|
||||
block. However, generators can not raise an exception on iteration
|
||||
without terminating, thereby losing the state of the generator. As a
|
||||
result, signaling a lack of available data would have to be done using
|
||||
a sentinel value, instead of being done using th exception mechanism.
|
||||
|
||||
Last but not least, nobody produced actually working code
|
||||
demonstrating how they could be used.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [#jsonrpc] Sections `2.1 <http://json-rpc.org/wiki/specification#a2.1JSON-RPCoverstreamconnections>`_ and
|
||||
`2.2 <http://json-rpc.org/wiki/specification#a2.2JSON-RPCoverHTTP>`_ .
|
||||
|
||||
.. _writev: http://pubs.opengroup.org/onlinepubs/009695399/functions/writev.html
|
||||
.. _write: http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
|
||||
.. _send: http://pubs.opengroup.org/onlinepubs/009695399/functions/send.html
|
||||
.. _IPushProducer: http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IPushProducer.html
|
||||
.. _IPullProducer: http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IPullProducer.html
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
Loading…
Reference in New Issue