407 lines
13 KiB
Plaintext
407 lines
13 KiB
Plaintext
PEP: 3148
|
||
Title: futures - execute computations asynchronously
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Brian Quinlan <brian@sweetapp.com>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 16-Oct-2009
|
||
Python-Version: 3.2
|
||
Post-History:
|
||
|
||
========
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes a design for a package that facilitates the evaluation of
|
||
callables using threads and processes.
|
||
|
||
==========
|
||
Motivation
|
||
==========
|
||
|
||
Python currently has powerful primitives to construct multi-threaded and
|
||
multi-process applications but parallelizing simple operations requires a lot of
|
||
work i.e. explicitly launching processes/threads, constructing a work/results
|
||
queue, and waiting for completion or some other termination condition (e.g.
|
||
failure, timeout). It is also difficult to design an application with a global
|
||
process/thread limit when each component invents its own parallel execution
|
||
strategy.
|
||
|
||
=============
|
||
Specification
|
||
=============
|
||
|
||
Check Prime Example
|
||
-------------------
|
||
|
||
::
|
||
|
||
import futures
|
||
import math
|
||
|
||
PRIMES = [
|
||
112272535095293,
|
||
112582705942171,
|
||
112272535095293,
|
||
115280095190773,
|
||
115797848077099,
|
||
1099726899285419]
|
||
|
||
def is_prime(n):
|
||
if n % 2 == 0:
|
||
return False
|
||
|
||
sqrt_n = int(math.floor(math.sqrt(n)))
|
||
for i in range(3, sqrt_n + 1, 2):
|
||
if n % i == 0:
|
||
return False
|
||
return True
|
||
|
||
with futures.ProcessPoolExecutor() as executor:
|
||
for number, is_prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
|
||
print('%d is prime: %s' % (number, is_prime))
|
||
|
||
Web Crawl Example
|
||
-----------------
|
||
|
||
::
|
||
|
||
import futures
|
||
import urllib.request
|
||
|
||
URLS = ['http://www.foxnews.com/',
|
||
'http://www.cnn.com/',
|
||
'http://europe.wsj.com/',
|
||
'http://www.bbc.co.uk/',
|
||
'http://some-made-up-domain.com/']
|
||
|
||
def load_url(url, timeout):
|
||
return urllib.request.urlopen(url, timeout=timeout).read()
|
||
|
||
with futures.ThreadPoolExecutor(max_workers=5) as executor:
|
||
future_to_url = dict((executor.submit(load_url, url, 60), url)
|
||
for url in URLS)
|
||
|
||
for future in futures.as_completed(future_to_url):
|
||
url = future_to_url[future]
|
||
if future.exception() is not None:
|
||
print('%r generated an exception: %s' % (url,
|
||
future.exception()))
|
||
else:
|
||
print('%r page is %d bytes' % (url, len(future.result())))
|
||
|
||
Interface
|
||
---------
|
||
|
||
The proposed package provides two core classes: `Executor` and `Future`.
|
||
An `Executor` receives asynchronous work requests (in terms of a callable and
|
||
its arguments) and returns a `Future` to represent the execution of that
|
||
work request.
|
||
|
||
Executor
|
||
''''''''
|
||
|
||
`Executor` is an abstract class that provides methods to execute calls
|
||
asynchronously.
|
||
|
||
`submit(fn, *args, **kwargs)`
|
||
|
||
Schedules the callable to be executed as fn(*\*args*, *\*\*kwargs*) and returns
|
||
a `Future` instance representing the execution of the function.
|
||
|
||
This is an abstract method and must be implemented by Executor subclasses.
|
||
|
||
`map(func, *iterables, timeout=None)`
|
||
|
||
Equivalent to map(*func*, *\*iterables*) but executed asynchronously and
|
||
possibly out-of-order. The returned iterator raises a `TimeoutError` if
|
||
`__next__()` is called and the result isn't available after *timeout* seconds
|
||
from the original call to `map()`. If *timeout* is not specified or
|
||
``None`` then there is no limit to the wait time. If a call raises an exception
|
||
then that exception will be raised when its value is retrieved from the
|
||
iterator.
|
||
|
||
`shutdown(wait=True)`
|
||
|
||
Signal the executor that it should free any resources that it is using when
|
||
the currently pending futures are done executing. Calls to
|
||
`Executor.submit` and `Executor.map` and made after shutdown will raise
|
||
`RuntimeError`.
|
||
|
||
If wait is `True` then the executor will not return until all the pending
|
||
futures are done executing and the resources associated with the executor
|
||
have been freed.
|
||
|
||
`__enter__()`
|
||
`__exit__(exc_type, exc_val, exc_tb)`
|
||
|
||
When using an executor as a context manager, `__exit__` will call
|
||
`Executor.shutdown(wait=True)`.
|
||
|
||
|
||
ProcessPoolExecutor
|
||
'''''''''''''''''''
|
||
|
||
The `ProcessPoolExecutor` class is an `Executor` subclass that uses a pool of
|
||
processes to execute calls asynchronously. The callable objects and arguments
|
||
passed to `ProcessPoolExecutor.submit` must be serializeable according to the
|
||
same limitations as the multiprocessing module.
|
||
|
||
`__init__(max_workers)`
|
||
|
||
Executes calls asynchronously using a pool of a most *max_workers*
|
||
processes. If *max_workers* is ``None`` or not given then as many worker
|
||
processes will be created as the machine has processors.
|
||
|
||
ThreadPoolExecutor
|
||
''''''''''''''''''
|
||
|
||
The `ThreadPoolExecutor` class is an `Executor` subclass that uses a pool of
|
||
threads to execute calls asynchronously.
|
||
|
||
Deadlock can occur when the callable associated with a `Future` waits on
|
||
the results of another `Future`. For example:
|
||
|
||
::
|
||
|
||
import time
|
||
def wait_on_b():
|
||
time.sleep(5)
|
||
print(b.result()) # b will never complete because it is waiting on a.
|
||
return 5
|
||
|
||
def wait_on_a():
|
||
time.sleep(5)
|
||
print(a.result()) # a will never complete because it is waiting on b.
|
||
return 6
|
||
|
||
|
||
executor = ThreadPoolExecutor(max_workers=2)
|
||
a = executor.submit(wait_on_b)
|
||
b = executor.submit(wait_on_a)
|
||
|
||
And:
|
||
|
||
::
|
||
|
||
def wait_on_future():
|
||
f = executor.submit(pow, 5, 2)
|
||
# This will never complete because there is only one worker thread and
|
||
# it is executing this function.
|
||
print(f.result())
|
||
|
||
executor = ThreadPoolExecutor(max_workers=1)
|
||
executor.submit(wait_on_future)
|
||
|
||
`__init__(max_workers)`
|
||
|
||
Executes calls asynchronously using a pool of at most *max_workers* threads.
|
||
|
||
Future Objects
|
||
''''''''''''''
|
||
|
||
The `Future` class encapsulates the asynchronous execution of a function
|
||
or method call. `Future` instances are returned by `Executor.submit`.
|
||
|
||
`cancel()`
|
||
|
||
Attempt to cancel the call. If the call is currently being executed then
|
||
it cannot be cancelled and the method will return `False`, otherwise the call
|
||
will be cancelled and the method will return `True`.
|
||
|
||
`cancelled()`
|
||
|
||
Return `True` if the call was successfully cancelled.
|
||
|
||
`Future.running()`
|
||
|
||
Return `True` if the call is currently being executed and cannot be cancelled.
|
||
|
||
`Future.done()`
|
||
|
||
Return `True` if the call was successfully cancelled or finished running.
|
||
|
||
`result(timeout=None)`
|
||
|
||
Return the value returned by the call. If the call hasn't yet completed then
|
||
this method will wait up to *timeout* seconds. If the call hasn't completed
|
||
in *timeout* seconds then a `TimeoutError` will be raised. If *timeout*
|
||
is not specified or ``None`` then there is no limit to the wait time.
|
||
|
||
If the future is cancelled before completing then `CancelledError` will
|
||
be raised.
|
||
|
||
If the call raised then this method will raise the same exception.
|
||
|
||
`exception(timeout=None)`
|
||
|
||
Return the exception raised by the call. If the call hasn't yet completed
|
||
then this method will wait up to *timeout* seconds. If the call hasn't
|
||
completed in *timeout* seconds then a `TimeoutError` will be raised.
|
||
If *timeout* is not specified or ``None`` then there is no limit to the wait
|
||
time.
|
||
|
||
If the future is cancelled before completing then `CancelledError` will
|
||
be raised.
|
||
|
||
If the call completed without raising then ``None`` is returned.
|
||
|
||
Internal Future Methods
|
||
^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
The following `Future` methods are meant for use in unit tests and `Executor`
|
||
implementations.
|
||
|
||
`set_running_or_notify_cancel()`
|
||
|
||
Should be called by `Executor` implementations before executing the work
|
||
associated with the `Future`.
|
||
|
||
If the method returns `False` then the `Future` was cancelled i.e.
|
||
`Future.cancel` was called and returned `True`. Any threads waiting on the
|
||
`Future` completing (i.e. through `as_completed()` or `wait()`) will be woken
|
||
up.
|
||
|
||
If the method returns `True` then the `Future` was not cancelled and has been
|
||
put in the running state i.e. calls to `Future.running()` will return `True`.
|
||
|
||
This method can only be called once and cannot be called after
|
||
`Future.set_result()` or `Future.set_exception()` have been called.
|
||
|
||
`set_result(result)`
|
||
|
||
Sets the result of the work associated with the `Future`.
|
||
|
||
`set_exception(exception)`
|
||
|
||
Sets the result of the work associated with the `Future` to the given
|
||
`Exception`.
|
||
|
||
Module Functions
|
||
''''''''''''''''
|
||
|
||
`wait(fs, timeout=None, return_when=ALL_COMPLETED)`
|
||
|
||
Wait for the `Future` instances in the given sequence to complete. Returns a
|
||
named 2-tuple of sets. The first set, named "finished", contains the futures
|
||
that completed (finished or were cancelled) before the wait completed. The
|
||
second set, named "not_finished", contains uncompleted futures.
|
||
|
||
This method should always be called using keyword arguments, which are:
|
||
|
||
*fs* is the sequence of Future instances that should be waited on.
|
||
|
||
*timeout* can be used to control the maximum number of seconds to wait before
|
||
returning. If timeout is not specified or None then there is no limit to the
|
||
wait time.
|
||
|
||
*return_when* indicates when the method should return. It must be one of the
|
||
following constants:
|
||
|
||
============================= ==================================================
|
||
Constant Description
|
||
============================= ==================================================
|
||
`FIRST_COMPLETED` The method will return when any future finishes or
|
||
is cancelled.
|
||
`FIRST_EXCEPTION` The method will return when any future finishes by
|
||
raising an exception. If not future raises an
|
||
exception then it is equivalent to ALL_COMPLETED.
|
||
`ALL_COMPLETED` The method will return when all calls finish.
|
||
============================= ==================================================
|
||
|
||
`as_completed(fs, timeout=None)`
|
||
|
||
Returns an iterator over the Future instances given by *fs* that yields futures
|
||
as they complete (finished or were cancelled). Any futures that completed
|
||
before `as_completed()` was called will be yielded first. The returned iterator
|
||
raises a `TimeoutError` if `__next__()` is called and the result isn't available
|
||
after *timeout* seconds from the original call to `as_completed()`. If
|
||
*timeout* is not specified or `None` then there is no limit to the wait time.
|
||
|
||
=========
|
||
Rationale
|
||
=========
|
||
|
||
The proposed design of this module was heavily influenced by the the Java
|
||
java.util.concurrent package [1]_. The conceptual basis of the module, as in
|
||
Java, is the Future class, which represents the progress and result of an
|
||
asynchronous computation. The Future class makes little commitment to the
|
||
evaluation mode being used e.g. it can be be used to represent lazy or eager
|
||
evaluation, for evaluation using threads, processes or remote procedure call.
|
||
|
||
Futures are created by concrete implementations of the Executor class
|
||
(called ExecutorService in Java). The reference implementation provides
|
||
classes that use either a process a thread pool to eagerly evaluate
|
||
computations.
|
||
|
||
Futures have already been seen in Python as part of a popular Python
|
||
cookbook recipe [2]_ and have discussed on the Python-3000 mailing list [3]_.
|
||
|
||
The proposed design is explicit i.e. it requires that clients be aware that
|
||
they are consuming Futures. It would be possible to design a module that
|
||
would return proxy objects (in the style of `weakref`) that could be used
|
||
transparently. It is possible to build a proxy implementation on top of
|
||
the proposed explicit mechanism.
|
||
|
||
The proposed design does not introduce any changes to Python language syntax
|
||
or semantics. Special syntax could be introduced [4]_ to mark function and
|
||
method calls as asynchronous. A proxy result would be returned while the
|
||
operation is eagerly evaluated asynchronously, and execution would only
|
||
block if the proxy object were used before the operation completed.
|
||
|
||
Anh Hai Trinh proposed a simpler but more limited API concept [5]_.
|
||
|
||
========================
|
||
Reference Implementation
|
||
========================
|
||
|
||
The reference implementation [6]_ contains a complete implementation of the
|
||
proposed design. It has been tested on Linux and Mac OS X.
|
||
|
||
==========
|
||
References
|
||
==========
|
||
|
||
.. [1]
|
||
`java.util.concurrent` package documentation
|
||
`http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html`
|
||
|
||
.. [2]
|
||
Python Cookbook recipe 84317, "Easy threading with Futures"
|
||
`http://code.activestate.com/recipes/84317/`
|
||
|
||
.. [3]
|
||
`Python-3000` thread, "mechanism for handling asynchronous concurrency"
|
||
`http://mail.python.org/pipermail/python-3000/2006-April/000960.html`
|
||
|
||
.. [4]
|
||
`Python 3000` thread, "Futures in Python 3000 (was Re: mechanism for handling asynchronous concurrency)"
|
||
`http://mail.python.org/pipermail/python-3000/2006-April/000970.html`
|
||
|
||
.. [5]
|
||
A discussion of `stream`, a similar concept proposed by Anh Hai Trinh
|
||
`http://www.mail-archive.com/stdlib-sig@python.org/msg00480.html`
|
||
|
||
.. [6]
|
||
Reference `futures` implementation
|
||
`http://code.google.com/p/pythonfutures/source/browse/#svn/branches/feedback`
|
||
|
||
=========
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|