re-wrapped text to 70 columns
This commit is contained in:
parent 1cabbf4c39
commit 1f996f3ebc
pep-0371.txt | 314

@@ -14,50 +14,54 @@ Post-History:
Abstract

This PEP proposes the inclusion of the pyProcessing [1] package
into the Python standard library, renamed to "multiprocessing".

The processing package mimics the standard library threading
module and API to provide a process-based approach to "threaded
programming", allowing end-users to dispatch multiple tasks that
effectively side-step the global interpreter lock.

The package also provides server and client functionality
(processing.Manager) for remote sharing and management of
objects and tasks, so that applications may not only leverage
multiple cores on the local machine, but also distribute objects
and tasks across a cluster of networked machines.

While the distributed capabilities of the package are beneficial,
the primary focus of this PEP is the core threading-like API and
capabilities of the package.

Rationale

The current CPython interpreter implements the Global Interpreter
Lock (GIL) and, barring work in Python 3000 or other versions
currently planned [2], the GIL will remain as-is within the
CPython interpreter for the foreseeable future. While the GIL
itself enables clean and easy-to-maintain C code for the
interpreter and extensions base, it is frequently an issue for
those Python programmers who are leveraging multi-core machines.

The GIL itself prevents more than a single thread from running
within the interpreter at any given point in time, effectively
removing Python's ability to take advantage of multi-processor
systems. While I/O-bound applications do not suffer the same
slow-down when using threading, they do suffer some performance
cost due to the GIL.

The pyProcessing package offers a method to side-step the GIL,
allowing applications within CPython to take advantage of
multi-core architectures without asking users to completely change
their programming paradigm (i.e., dropping threaded programming
for another "concurrent" approach such as Twisted).

The Processing package offers CPython users a known API (that of
the threading module), with known semantics and easy scalability.
In the future, the package might not be as relevant should the
CPython interpreter enable "true" threading; however, for some
applications, forking an OS process may sometimes be more
desirable than using lightweight threads, especially on those
platforms where process creation is fast/optimized.

For example, a simple threaded application:

@@ -70,52 +74,56 @@ Rationale

t.start()
t.join()
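
(The import and function definition of this example fall in the
context elided between the hunks above; a minimal reconstruction,
with a hypothetical target function, might look like:)

    from threading import Thread as worker

    def afunc(number):
        # trivial stand-in for real per-task work
        print(number * 3)

    t = worker(target=afunc, args=(4,))
    t.start()
    t.join()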

The pyprocessing package mirrors the API so well that, with a
simple change of the import to:

from processing import Process as worker

The code now executes through the processing.Process class. This
type of compatibility means that, with a minor (in most cases)
change in code, users' applications will be able to leverage all
cores and processors on a given machine for parallel execution.
In many cases the pyprocessing package is even faster than the
normal threading approach for I/O-bound programs. This, of course,
takes into account that the pyprocessing package is in optimized C
code, while the threading module is not.
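
For illustration, the same sketch after the import swap. Note one
practical caveat: on platforms that spawn rather than fork (e.g.
Windows), the process-based version must be guarded by an
if __name__ == '__main__' block:

    from processing import Process as worker  # "multiprocessing" after rename

    def afunc(number):
        print(number * 3)

    if __name__ == '__main__':
        t = worker(target=afunc, args=(4,))
        t.start()
        t.join()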

The "Distributed" Problem

In the discussion on Python-Dev about the inclusion of this
package [3] there was confusion about the intentions of this PEP,
with some mistaking it for an attempt to solve the "Distributed"
problem - frequently comparing the functionality of this package
with other solutions like MPI-based communication [4], CORBA, or
other distributed object approaches [5].

The "distributed" problem is large and varied. Each programmer
working within this domain has either very strong opinions about
their favorite module/method or a highly customized problem for
which no existing solution works.

The acceptance of this package neither precludes nor discourages
programmers working on the "distributed" problem from examining
other solutions for their problem domain. The intent of including
this package is to provide entry-level capabilities for local
concurrency and the basic support to spread that concurrency
across a network of machines - although the two are not tightly
coupled, the pyprocessing package could, in fact, be used in
conjunction with any of the other solutions, including MPI, etc.
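
As a rough sketch of what the network-capable side can look like,
written against the renamed multiprocessing API (the queue name,
port, and authkey below are illustrative, not the package's own
example):

    from multiprocessing.managers import BaseManager
    from queue import Queue

    work_items = Queue()

    class QueueManager(BaseManager):
        pass

    # clients that connect with the same address/authkey call
    # get_queue() to share tasks across machines
    QueueManager.register('get_queue', callable=lambda: work_items)

    if __name__ == '__main__':
        mgr = QueueManager(address=('', 50000), authkey=b'change-me')
        mgr.get_server().serve_forever()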

If necessary, it is possible to completely decouple the local
concurrency abilities of the package from the
network-capable/shared aspects of the package. Without serious
concerns or cause, however, the author of this PEP does not
recommend that approach.

Performance Comparison

As we all know, there are "lies, damned lies, and benchmarks".
These speed comparisons, while aimed at showcasing the performance
of the pyprocessing package, are by no means comprehensive or
applicable to all possible use cases or environments, especially
on those platforms with sluggish process forking timing.

All benchmarks were run using the following:
* 4 Core Intel Xeon CPU @ 3.00GHz

@@ -127,16 +135,17 @@ Performance Comparison

http://jessenoller.com/code/bench-src.tgz

The basic method of execution for these benchmarks is in the
run_benchmarks.py script, which is simply a wrapper to execute a
target function through a single-threaded (linear), multi-threaded
(via threading), and multi-process (via pyprocessing) function for
a static number of iterations with increasing numbers of execution
loops and/or threads.

The run_benchmarks.py script executes each function 100 times,
picking the best run of that 100 iterations via the timeit module.
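
A rough sketch of that structure, using present-day module names
(the actual script at the URL above differs in detail):

    import timeit
    from threading import Thread

    def run_linear(target, iterations):
        # the "non_threaded" leg: call the function serially
        for _ in range(iterations):
            target()

    def run_threaded(target, workers):
        # the "threaded" leg: start N workers, wait for them all
        threads = [Thread(target=target) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def best_of_100(func):
        # best single run out of 100, via the timeit module
        return min(timeit.repeat(func, repeat=100, number=1))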

First, to identify the overhead of the spawning of the workers, we
execute a function which is simply a pass statement (empty):

cmd: python run_benchmarks.py empty_func.py
Importing empty_func

@@ -157,11 +166,12 @@ Performance Comparison

threaded (8 threads) 0.007990 seconds
processes (8 procs) 0.005512 seconds

As you can see, process forking via the pyprocessing package is
faster than the speed of building and then executing the threaded
version of the code.

The second test calculates 50000 Fibonacci numbers inside of each
thread (isolated and shared nothing):

cmd: python run_benchmarks.py fibonacci.py
Importing fibonacci

@@ -182,8 +192,8 @@ Performance Comparison

threaded (8 threads) 1.596824 seconds
processes (8 procs) 0.417899 seconds
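
A plausible shape for the per-worker function (hypothetical; the
actual fibonacci.py ships in the benchmark archive linked above):

    def fib_worker(n=50000):
        # iteratively compute the first n Fibonacci numbers,
        # sharing nothing with other workers
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b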

The third test calculates the sum of all primes below 100000,
again sharing nothing.

cmd: run_benchmarks.py crunch_primes.py
Importing crunch_primes

@@ -204,17 +214,18 @@ Performance Comparison

threaded (8 threads) 5.109192 seconds
processes (8 procs) 1.077939 seconds
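
Again hypothetical, a shared-nothing worker of this shape:

    def sum_primes(limit=100000):
        # naive sieve of Eratosthenes over [2, limit)
        sieve = [True] * limit
        sieve[0:2] = [False, False]
        for i in range(2, int(limit ** 0.5) + 1):
            if sieve[i]:
                sieve[i*i::i] = [False] * len(range(i*i, limit, i))
        return sum(i for i, is_prime in enumerate(sieve) if is_prime)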

The reason why tests two and three focused on pure numeric
crunching is to showcase how the current threading implementation
does hinder non-I/O applications. Obviously, these tests could be
improved to use a queue for coordination of results and chunks of
work, but that is not required to show the performance of the
package and core Processing module.

The next test is an I/O-bound test. This is normally where we see
a steep improvement in the threading module approach versus a
single-threaded approach. In this case, each worker is opening a
descriptor to lorem.txt, randomly seeking within it, and writing
lines to /dev/null:

cmd: python run_benchmarks.py file_io.py
Importing file_io

@@ -235,14 +246,14 @@ Performance Comparison

threaded (8 threads) 2.437204 seconds
processes (8 procs) 0.203438 seconds
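
A hypothetical sketch of such a worker:

    import os
    import random

    def file_io_worker(path='lorem.txt', lines=100):
        # open a descriptor, seek randomly, copy lines to /dev/null
        size = os.path.getsize(path)
        with open(path) as src, open(os.devnull, 'w') as sink:
            for _ in range(lines):
                src.seek(random.randrange(size))
                sink.write(src.readline())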

As you can see, pyprocessing is still faster on this I/O operation
than using multiple threads, and using multiple threads is slower
than the single-threaded execution itself.

Finally, we will run a socket-based test to show network I/O
performance. This function grabs a URL from a server on the LAN
that is a simple error page from tomcat. It gets the page 100
times. The network is otherwise silent, and the connection is 10G:

cmd: python run_benchmarks.py url_get.py
Importing url_get

@@ -263,16 +274,19 @@ Performance Comparison

threaded (8 threads) 0.659298 seconds
processes (8 procs) 0.298625 seconds
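
A hypothetical stand-in for url_get.py (the host name is made up,
and since the page fetched is an error page, the HTTP error body
is read and discarded):

    from urllib.request import urlopen  # urllib2 in the 2.x era
    from urllib.error import HTTPError

    def url_worker(url='http://tomcat-host.lan/missing', count=100):
        for _ in range(count):
            try:
                urlopen(url).read()
            except HTTPError as err:
                err.read()  # HTTPError doubles as a response object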

We finally see threaded performance surpass that of
single-threaded execution, but the pyprocessing package is still
faster when increasing the number of workers. If you stay with
one or two threads/workers, then the timing between threads and
pyprocessing is fairly close.

One item of note, however, is that there is an implicit overhead
within the pyprocessing package's Queue implementation due to
object serialization.

Alec Thomas provided a short example based on the
run_benchmarks.py script to demonstrate this overhead versus the
default Queue implementation:

cmd: run_bench_queue.py
non_threaded (1 iters) 0.010546 seconds

@@ -291,21 +305,23 @@ Performance Comparison

threaded (8 threads) 0.184254 seconds
processes (8 procs) 0.302999 seconds
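
A rough illustration of where the serialization cost comes from
(not Alec's actual script; timings are machine-dependent):

    import time
    from queue import Queue                      # in-process queue
    from multiprocessing import Queue as PQueue  # pickles each item

    def pump(q, n=10000):
        start = time.time()
        for i in range(n):
            q.put(i)
        for i in range(n):
            q.get()
        return time.time() - start

    if __name__ == '__main__':
        print('threading Queue:', pump(Queue()))
        print('process Queue:  ', pump(PQueue()))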

Additional benchmarks can be found in the pyprocessing package's
source distribution's examples/ directory. The examples will be
included in the package's documentation.

Maintenance

Richard M. Oudkerk, the author of the pyprocessing package, has
agreed to maintain the package within Python SVN. Jesse Noller
has volunteered to also help maintain/document and test the
package.

API Naming

The API of the pyprocessing package is designed to closely mimic
that of the threading and Queue modules. It has been proposed that
instead of adding the package as-is, we rename it to be PEP 8
compliant.

Since the aim of the package is to be a drop-in replacement for
the threading module, the authors feel that the current API should
be used.

@@ -314,43 +330,50 @@ API Naming

Timing/Schedule

Some concerns have been raised about the timing/lateness of this
PEP for the 2.6 and 3.0 releases this year; however, it is felt by
both the authors and others that the functionality this package
offers surpasses the risk of inclusion.

However, taking into account the desire not to destabilize
Python-core, some refactoring of pyprocessing's code "into"
Python-core can be withheld until the next 2.x/3.x releases. This
means that the actual risk to Python-core is minimal, and largely
constrained to the actual package itself.

Open Issues

* All existing tests for the package should be converted to
  UnitTest format.
* Existing documentation has to be moved to ReST formatting.
* Verify code coverage percentage of existing test suite.
* Identify any requirements to achieve a 1.0 milestone if
  required.
* Verify current source tree conforms to standard library
  practices.
* Rename top-level package from "pyprocessing" to
  "multiprocessing".
* Confirm no "default" remote connection capabilities; if needed,
  enable the remote security mechanisms by default for those
  classes which offer remote capabilities.
* Some of the API (Queue methods qsize(), task_done() and join())
  either need to be added, or the reason for their exclusion needs
  to be identified and documented clearly.
* Add a "multiprocessing.setExecutable()" method to override the
  default behavior of the package to spawn processes using the
  current executable name rather than the Python interpreter (see
  the sketch after this list). Note that Mark Hammond has
  suggested a factory-style interface for this [7].
* Also note that the default behavior of process spawning does
  not make it compatible with use within IDLE as-is; this will
  be examined as a bug-fix or "setExecutable" enhancement.
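
Sketch of the proposed setExecutable call; note that the spelling
in the package as eventually shipped is set_executable(), and the
path below is hypothetical:

    import multiprocessing

    # direct child processes to a specific interpreter binary
    multiprocessing.set_executable('/opt/python/bin/python')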

Closed Issues

* Reliance on ctypes: The pyprocessing package's reliance on
  ctypes prevents the package from functioning on platforms where
  ctypes is not supported. This is not a restriction of this
  package, but rather of ctypes.

References

@@ -369,8 +392,9 @@ References

http://wiki.python.org/moin/ParallelProcessing

[6] The original run_benchmark.py code was published in Python
    Magazine in December 2007: "Python Threads and the Global
    Interpreter Lock" by Jesse Noller. It has been modified for
    this PEP.
[7] http://groups.google.com/group/python-dev2/msg/54cf06d15cbcbc34