re-wrapped text to 70 columns

David Goodger 2008-06-03 14:19:58 +00:00
parent 1cabbf4c39
commit 1f996f3ebc
1 changed file with 169 additions and 145 deletions


Abstract
This PEP proposes the inclusion of the pyProcessing [1] package into the Python standard library, renamed to "multiprocessing".
The processing package mimics the standard library threading module and API to provide a process-based approach to "threaded programming", allowing end-users to dispatch multiple tasks that effectively side-step the global interpreter lock.
The package also provides server and client functionality (processing.Manager) for remote sharing and management of objects and tasks, so that applications may not only leverage multiple cores on the local machine but also distribute objects and tasks across a cluster of networked machines.
While the distributed capabilities of the package are beneficial, the primary focus of this PEP is the core threading-like API and capabilities of the package.
Rationale
The current CPython interpreter implements the Global Interpreter Lock (GIL) and, barring work in Python 3000 or other versions currently planned [2], the GIL will remain as-is within the CPython interpreter for the foreseeable future. While the GIL itself enables clean and easy-to-maintain C code for the interpreter and its extensions, it is frequently an issue for those Python programmers who are leveraging multi-core machines.
The GIL itself prevents more than a single thread from running within the interpreter at any given point in time, effectively removing Python's ability to take advantage of multi-processor systems. While I/O-bound applications do not suffer the same slow-down when using threading, they do suffer some performance cost due to the GIL.
The pyProcessing package offers a method to side-step the GIL, allowing applications within CPython to take advantage of multi-core architectures without asking users to completely change their programming paradigm (i.e., dropping threaded programming for another "concurrent" approach such as Twisted).
The Processing package offers CPython users a known API (that of the threading module), with known semantics and easy scalability. In the future, the package might not be as relevant should the CPython interpreter enable "true" threading; however, for some applications, forking an OS process may be more desirable than using lightweight threads, especially on those platforms where process creation is fast/optimized.
For example, a simple threaded application:
from threading import Thread as worker

def afunc(number):
    print number * 3

t = worker(target=afunc, args=(4,))
t.start()
t.join()
The pyprocessing package mirrors the API so well that, with a simple change of the import to:
from processing import Process as worker
The code now executes through the processing.Process class. This type of compatibility means that, with a minor (in most cases) change in code, users' applications will be able to leverage all cores and processors on a given machine for parallel execution. In many cases the pyprocessing package is even faster than the normal threading approach for I/O-bound programs. This, of course, takes into account that the pyprocessing package is in optimized C code, while the threading module is not.
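Putting the pieces together, the process-based version of the example above might look like the following sketch. The __main__ guard is an addition here, not part of the original example; it keeps child processes from re-executing the startup code on platforms that spawn rather than fork:

from processing import Process as worker

def afunc(number):
    print number * 3

if __name__ == '__main__':
    # Construct, start, and wait on the worker, exactly as with
    # threading.Thread - only the import has changed.
    t = worker(target=afunc, args=(4,))
    t.start()
    t.join()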
The "Distributed" Problem
In the discussion on Python-Dev about the inclusion of this package [3] there was confusion about the intentions of this PEP, with some conflating it with an attempt to solve the "Distributed" problem - frequently comparing the functionality of this package with other solutions like MPI-based communication [4], CORBA, or other distributed object approaches [5].
The "distributed" problem is large and varied. Each programmer working
within this domain has either very strong opinions about their favorite
module/method or a highly customized problem for which no existing solution
works.
The "distributed" problem is large and varied. Each programmer
working within this domain has either very strong opinions about
their favorite module/method or a highly customized problem for
which no existing solution works.
The acceptance of this package neither precludes nor discourages programmers working on the "distributed" problem from examining other solutions for their problem domain. The intent of including this package is to provide entry-level capabilities for local concurrency and the basic support to spread that concurrency across a network of machines - although the two are not tightly coupled, the pyprocessing package could, in fact, be used in conjunction with any of the other solutions, including MPI, etc.
If necessary, it is possible to completely decouple the local concurrency abilities of the package from its network-capable/shared aspects. Without serious concern or cause, however, the author of this PEP does not recommend that approach.
Performance Comparison
As we all know, there are "lies, damned lies, and benchmarks". These speed comparisons, while aimed at showcasing the performance of the pyprocessing package, are by no means comprehensive or applicable to all possible use cases or environments, especially on those platforms where process forking is sluggish.
All benchmarks were run using the following:
* 4 Core Intel Xeon CPU @ 3.00GHz
http://jessenoller.com/code/bench-src.tgz
The basic method of execution for these benchmarks is in the run_benchmarks.py script, which is simply a wrapper to execute a target function through a single-threaded (linear), multi-threaded (via threading), and multi-process (via pyprocessing) function for a static number of iterations, with increasing numbers of execution loops and/or threads.
The run_benchmarks.py script executes each function 100 times, picking the best run of that 100 iterations via the timeit module.
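That best-of-100 approach amounts to something like the following sketch (the actual script is available at the URL above; the function name and parameters here are illustrative):

import timeit

def best_of(stmt, setup, runs=100):
    # Execute the statement once per run and keep the fastest
    # time, discarding runs slowed by unrelated system activity.
    return min(timeit.repeat(stmt, setup=setup, repeat=runs, number=1))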
First, to identify the overhead of the spawning of the workers, we execute a function which is simply a pass statement (empty):
cmd: python run_benchmarks.py empty_func.py
Importing empty_func
threaded (8 threads) 0.007990 seconds
processes (8 procs) 0.005512 seconds
As you can see, process forking via the pyprocessing package is faster than building and then executing the threaded version of the code.
The second test calculates 50000 Fibonacci numbers inside each thread (isolated and shared-nothing):
cmd: python run_benchmarks.py fibonacci.py
Importing fibonacci
threaded (8 threads) 1.596824 seconds
processes (8 procs) 0.417899 seconds
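The worker function in this test is shaped roughly like the sketch below; this is an illustration of the shared-nothing pattern, not the benchmark's actual code:

def fib_worker(count=50000):
    # Pure CPU-bound work with no locks, queues, or shared state:
    # step through the Fibonacci sequence `count` times.
    a, b = 0, 1
    for _ in range(count):
        a, b = b, a + b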
The third test calculates the sum of all primes below 100000, again sharing nothing.
cmd: run_benchmarks.py crunch_primes.py
Importing crunch_primes
threaded (8 threads) 5.109192 seconds
processes (8 procs) 1.077939 seconds
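An illustrative version of that prime-summing task, assuming a simple sieve (crunch_primes.py may well be implemented differently):

def sum_primes(limit=100000):
    # Sieve of Eratosthenes, then sum the primes that survive.
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return sum(i for i, is_prime in enumerate(sieve) if is_prime)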
The reason why tests two and three focused on pure numeric crunching is to showcase how the current threading implementation does hinder non-I/O applications. Obviously, these tests could be improved to use a queue for coordination of results and chunks of work, but that is not required to show the performance of the package and core Processing module.
The next test is an I/O-bound test. This is normally where we see a steep improvement in the threading module approach versus a single-threaded approach. In this case, each worker is opening a descriptor to lorem.txt, randomly seeking within it, and writing lines to /dev/null:
cmd: python run_benchmarks.py file_io.py
Importing file_io
threaded (8 threads) 2.437204 seconds
processes (8 procs) 0.203438 seconds
As you can see, pyprocessing is still faster on this I/O operation than using multiple threads - and using multiple threads is slower than the single-threaded execution itself.
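Each worker in this test does something along the lines of the sketch below; the number of reads is an assumption, not taken from the benchmark:

import random

def io_worker(reads=1000):
    # Open the source file and the bit bucket, then repeatedly
    # seek to a random offset and copy one line out.
    out = open('/dev/null', 'wb')
    src = open('lorem.txt', 'rb')
    src.seek(0, 2)              # jump to the end of the file
    size = src.tell()           # to learn its length
    for _ in range(reads):
        src.seek(random.randrange(size))
        out.write(src.readline())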
Finally, we will run a socket-based test to show network I/O performance. This function grabs a URL from a server on the LAN that is a simple error page from tomcat, and it gets the page 100 times. The network is otherwise quiet, and the connection is 10G:
cmd: python run_benchmarks.py url_get.py
Importing url_get
threaded (8 threads) 0.659298 seconds
processes (8 procs) 0.298625 seconds
We finally see threaded performance surpass that of single-threaded execution, but the pyprocessing package is still faster as the number of workers increases. If you stay with one or two threads/workers, then the timing between threads and pyprocessing is fairly close.
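The worker for this test amounts to a loop of HTTP GETs, roughly as sketched below; the address is a placeholder for the LAN server, and urllib2 is the Python 2 module (urllib.request in Python 3):

import urllib2

def url_worker(url='http://servername/', times=100):
    # Fetch the same page repeatedly and discard the body.
    for _ in range(times):
        urllib2.urlopen(url).read()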
One item of note, however, is that there is an implicit overhead within the pyprocessing package's Queue implementation due to object serialization.
Alec Thomas provided a short example based on the run_benchmarks.py script to demonstrate this overhead versus the default Queue implementation:
cmd: run_bench_queue.py
non_threaded (1 iters) 0.010546 seconds
threaded (8 threads) 0.184254 seconds
processes (8 procs) 0.302999 seconds
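The shape of that comparison is roughly as follows (a sketch, not Alec Thomas's actual script): each item put on a processing.Queue must be pickled in the child and unpickled in the parent, which is the serialization overhead in question, whereas the standard library Queue.Queue simply hands object references between threads:

from processing import Process, Queue

def producer(q, n=1000):
    # Each put() pickles the item into a pipe to the consumer.
    for i in range(n):
        q.put(i)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    results = [q.get() for _ in range(1000)]  # unpickled on arrival
    p.join()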
Additional benchmarks can be found in the examples/ directory of the pyprocessing package's source distribution. The examples will be included in the package's documentation.
Maintenance
Richard M. Oudkerk, the author of the pyprocessing package, has agreed to maintain the package within Python SVN. Jesse Noller has volunteered to help maintain, document, and test the package.
API Naming
The API of the pyprocessing package is designed to closely mimic that of the threading and Queue modules. It has been proposed that, instead of adding the package as-is, it be renamed to be PEP 8 compliant.
Since the aim of the package is to be a drop-in for the threading
module, the authors feel that the current API should be used.
Timing/Schedule
Some concerns have been raised about the timing/lateness of this PEP for the 2.6 and 3.0 releases this year; however, it is felt by both the authors and others that the functionality this package offers outweighs the risk of inclusion.
However, taking into account the desire not to destabilize Python-core, some refactoring of pyprocessing's code "into" Python-core can be withheld until the next 2.x/3.x releases. This means that the actual risk to Python-core is minimal, and largely constrained to the package itself.
Open Issues
* All existing tests for the package should be converted to UnitTest format.
* Existing documentation has to be moved to ReST formatting.
* Verify code coverage percentage of existing test suite.
* Identify any requirements to achieve a 1.0 milestone if required.
* Verify current source tree conforms to standard library practices.
* Rename top-level package from "pyprocessing" to "multiprocessing".
* Confirm no "default" remote connection capabilities; if needed, enable the remote security mechanisms by default for those classes which offer remote capabilities.
* Some of the API (Queue methods qsize(), task_done() and join()) either need to be added, or the reason for their exclusion needs to be identified and documented clearly.
* Add a "multiprocessing.setExecutable()" method to override the package's default behavior of spawning processes using the current executable name rather than the Python interpreter. Note that Mark Hammond has suggested a factory-style interface for this [7].
* Also note that the default behavior of process spawning does not make it compatible with use within IDLE as-is; this will be examined as a bug-fix or "setExecutable" enhancement.
Closed Issues
* Reliance on ctypes: The pyprocessing package's reliance on ctypes prevents the package from functioning on platforms where ctypes is not supported. This is not a restriction of this package, but rather of ctypes.
References
http://wiki.python.org/moin/ParallelProcessing
[6] The original run_benchmark.py code was published in Python Magazine in December 2007: "Python Threads and the Global Interpreter Lock" by Jesse Noller. It has been modified for this PEP.
[7] http://groups.google.com/group/python-dev2/msg/54cf06d15cbcbc34