changes from PEP authors; corrections

David Goodger 2008-06-03 14:14:50 +00:00
parent 43996147f7
commit 1cabbf4c39
2 changed files with 90 additions and 46 deletions

pep-0000.txt

@@ -96,7 +96,7 @@ Index by Category
 S   364  Transitioning to the Py3K Standard Library   Warsaw
 S   368  Standard image protocol and class             Mastrodomenico
 S   369  Post import hooks                             Heimes
-S   371  Addition of the Processing module             Noller, Oudkerk
+S   371  Addition of the multiprocessing package       Noller, Oudkerk
 S  3134  Exception Chaining and Embedded Tracebacks    Yee
 S  3135  New Super                                     Spealman, Delaney
 S  3138  String representation in Python 3000          Ishimoto
@@ -475,7 +475,7 @@ Numerical Index
 S   368  Standard image protocol and class             Mastrodomenico
 S   369  Post import hooks                             Heimes
 SA  370  Per user site-packages directory              Heimes
-S   371  Addition of the Processing module             Noller, Oudkerk
+S   371  Addition of the multiprocessing package       Noller, Oudkerk
 SR  666  Reject Foolish Indentation                    Creighton
 SR  754  IEEE 754 Floating Point Special Values        Warnes
 P  3000  Python 3000                                   GvR

pep-0371.txt

@@ -1,5 +1,5 @@
PEP: 371
-Title: Addition of the Processing module to standard library
+Title: Addition of the multiprocessing package to the standard library
Version: $Revision: $
Last-Modified: $Date: $
Author: Jesse Noller <jnoller@gmail.com>
@@ -14,22 +14,22 @@ Post-History:
Abstract
-This PEP proposes the inclusion of the pyProcessing [1] module into the
-python standard library.
+This PEP proposes the inclusion of the pyProcessing [1] package into the
+Python standard library, renamed to "multiprocessing".

-The processing module mimics the standard library threading module and API
+The processing package mimics the standard library threading module and API
to provide a process-based approach to "threaded programming" allowing
end-users to dispatch multiple tasks that effectively side-step the global
interpreter lock.
-The module also provides server and client modules to provide remote-
-sharing and management of objects and tasks so that applications may not
-only leverage multiple cores on the local machine, but also distribute
-objects and tasks across a cluster of networked machines.
+The package also provides server and client functionality (processing.Manager)
+to provide remote sharing and management of objects and tasks so that
+applications may not only leverage multiple cores on the local machine,
+but also distribute objects and tasks across a cluster of networked machines.
-While the distributed capabilities of the module are beneficial, the primary
+While the distributed capabilities of the package are beneficial, the primary
focus of this PEP is the core threading-like API and capabilities of the
-module.
+package.
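
A rough sketch of the Manager-based sharing described above, written against
the post-rename name "multiprocessing" (the 2008 spelling was
processing.Manager); a proxied list is shared across process boundaries:

    import multiprocessing

    def work(shared):
        # Mutations travel through a proxy to the manager's server process.
        shared.append("done")

    if __name__ == "__main__":
        with multiprocessing.Manager() as manager:
            shared = manager.list()   # process-safe, proxied list
            p = multiprocessing.Process(target=work, args=(shared,))
            p.start()
            p.join()
            print(list(shared))       # ['done']

The same machinery (multiprocessing.managers.BaseManager) can listen on a
network address, which is the remote-sharing capability referred to above.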
Rationale
@@ -41,20 +41,20 @@ Rationale
Python programmers who are leveraging multi-core machines.
The GIL itself prevents more than a single thread from running within the
-interpreter at any given point in time, effectively removing python's
+interpreter at any given point in time, effectively removing Python's
ability to take advantage of multi-processor systems. While I/O bound
applications do not suffer the same slow-down when using threading, they do
suffer some performance cost due to the GIL.
-The Processing module offers a method to side-step the GIL allowing
+The pyProcessing package offers a method to side-step the GIL allowing
applications within CPython to take advantage of multi-core architectures
without asking users to completely change their programming paradigm (i.e.,
dropping threaded programming for another "concurrent" approach such as
Twisted).
-The Processing module offers CPython users a known API (that of the
+The Processing package offers CPython users a known API (that of the
threading module), with known semantics and easy scalability. In the
-future, the module might not be as relevant should the CPython interpreter
+future, the package might not be as relevant should the CPython interpreter
enable "true" threading; however, for some applications, forking an OS
process may sometimes be more desirable than using lightweight threads,
especially on those platforms where process creation is fast/optimized.
@@ -70,7 +70,7 @@ Rationale
    t.start()
    t.join()
-The pyprocessing module mirrors the API so well, that with a simple change
+The pyprocessing package mirrors the API so well, that with a simple change
of the import to:
    from processing import Process as worker
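
Filled out, the swap reads roughly as follows (a sketch, not the PEP's
verbatim example; "afunc" is an illustrative stand-in for real work):

    from threading import Thread
    from multiprocessing import Process   # "processing" before the rename

    def afunc(number):
        print(number * 3)

    if __name__ == "__main__":
        t = Thread(target=afunc, args=(4,))
        t.start()
        t.join()

        # Identical construction, start, and join semantics:
        p = Process(target=afunc, args=(4,))
        p.start()
        p.join()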
@@ -78,17 +78,17 @@ Rationale
The code now executes through the processing.Process class. This type of
compatibility means that, with a minor (in most cases) change in code,
users' applications will be able to leverage all cores and processors on a
-given machine for parallel execution. In many cases the pyprocessing module
+given machine for parallel execution. In many cases the pyprocessing package
is even faster than the normal threading approach for I/O bound programs.
-This, of course, takes into account that the pyprocessing module is in
+This, of course, takes into account that the pyprocessing package is in
optimized C code, while the threading module is not.
The "Distributed" Problem
-In the discussion on Python-Dev about the inclusion of this module [3] there
+In the discussion on Python-Dev about the inclusion of this package [3] there
was confusion about the intentions of this PEP, conflating it with an attempt
to solve the "Distributed" problem, and frequently comparing the functionality
-of this module with other solutions like MPI-based communication [4], CORBA, or
+of this package with other solutions like MPI-based communication [4], CORBA, or
other distributed object approaches [5].
The "distributed" problem is large and varied. Each programmer working
@@ -96,24 +96,24 @@ The "Distributed" Problem
module/method or a highly customized problem for which no existing solution
works.
-The acceptance of this module does not preclude or discourage
+The acceptance of this package does not preclude or discourage
programmers working on the "distributed" problem from examining other solutions
-for their problem domain. The intent of including this module is to provide
+for their problem domain. The intent of including this package is to provide
entry-level capabilities for local concurrency and the basic support to
spread that concurrency across a network of machines. Although the two are
-not tightly coupled, the pyprocessing module could, in fact, be used in
+not tightly coupled, the pyprocessing package could, in fact, be used in
conjunction with any of the other solutions, including MPI, etc.
If necessary, it is possible to completely decouple the local concurrency
-abilities of the module from the network-capable/shared aspects of the
-module. Without serious concerns or cause, however, the author of this PEP
+abilities of the package from the network-capable/shared aspects of the
+package. Without serious concerns or cause, however, the author of this PEP
does not recommend that approach.
Performance Comparison
As we all know, there are "lies, damned lies, and benchmarks". These speed
comparisons, while aimed at showcasing the performance of the pyprocessing
-module, are by no means comprehensive or applicable to all possible use
+package, are by no means comprehensive or applicable to all possible use
cases or environments, especially on platforms where process forking is slow.
@@ -157,10 +157,10 @@ Performance Comparison
threaded (8 threads) 0.007990 seconds
processes (8 procs) 0.005512 seconds
-As you can see, process forking via the pyprocessing module is faster than
+As you can see, process forking via the pyprocessing package is faster than
the speed of building and then executing the threaded version of the code.
-The second test calculates 50000 fibonacci numbers inside of each thread
+The second test calculates 50000 Fibonacci numbers inside of each thread
(isolated and shared nothing):
cmd: python run_benchmarks.py fibonacci.py
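
The shape of such a CPU-bound test can be sketched as follows (a hedged
reconstruction, not the run_benchmarks.py harness itself); each worker
computes Fibonacci numbers independently, so threads serialize on the GIL
while processes run in parallel:

    import time
    from threading import Thread
    from multiprocessing import Process

    def fib_worker(count=50000):
        # Iteratively compute `count` Fibonacci numbers, sharing nothing.
        a, b = 0, 1
        for _ in range(count):
            a, b = b, a + b

    def run(worker_cls, n=8):
        workers = [worker_cls(target=fib_worker) for _ in range(n)]
        start = time.perf_counter()
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("threaded  (8 threads) %.6f seconds" % run(Thread))
        print("processes (8 procs)   %.6f seconds" % run(Process))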
@@ -209,7 +209,7 @@ Performance Comparison
showcase how the current threading implementation does hinder non-I/O
applications. Obviously, these tests could be improved to use a queue for
coordination of results and chunks of work, but that is not required to show
-the performance of the module.
+the performance of the package and core Processing module.
The next test is an I/O bound test. This is normally where we see a steep
improvement in the threading module approach versus a single-threaded
@@ -264,51 +264,93 @@ Performance Comparison
processes (8 procs) 0.298625 seconds
We finally see threaded performance surpass that of single-threaded
-execution, but the pyprocessing module is still faster when increasing the
+execution, but the pyprocessing package is still faster when increasing the
number of workers. If you stay with one or two threads/workers, then the
timing between threads and pyprocessing is fairly close.
-Additional benchmarks can be found in the pyprocessing module's source
-distribution's examples/ directory.
+One item of note, however, is that there is an implicit overhead within the
+pyprocessing package's Queue implementation due to the object serialization.
+
+Alec Thomas provided a short example based on the run_benchmarks.py script
+to demonstrate this overhead versus the default Queue implementation:
+
+cmd: run_bench_queue.py
+non_threaded (1 iters)  0.010546 seconds
+threaded (1 threads)    0.015164 seconds
+processes (1 procs)     0.066167 seconds
+
+non_threaded (2 iters)  0.020768 seconds
+threaded (2 threads)    0.041635 seconds
+processes (2 procs)     0.084270 seconds
+
+non_threaded (4 iters)  0.041718 seconds
+threaded (4 threads)    0.086394 seconds
+processes (4 procs)     0.144176 seconds
+
+non_threaded (8 iters)  0.083488 seconds
+threaded (8 threads)    0.184254 seconds
+processes (8 procs)     0.302999 seconds
+
+Additional benchmarks can be found in the pyprocessing package's source
+distribution's examples/ directory. The examples will be included in the
+package's documentation.
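
The overhead being measured can be sketched in a single process (a rough
modern equivalent, not Alec Thomas's run_bench_queue.py script): each put()
on a multiprocessing queue pickles the object and pushes it through an OS
pipe, whereas the threading Queue merely passes a reference:

    import time
    from queue import Queue                       # passes references
    from multiprocessing import Queue as MPQueue  # pickles through a pipe

    def roundtrip(q, n=10000):
        start = time.perf_counter()
        for i in range(n):
            q.put(i)
            q.get()   # blocks until the feeder thread delivers the item
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("threading Queue:       %.4f seconds" % roundtrip(Queue()))
        print("multiprocessing Queue: %.4f seconds" % roundtrip(MPQueue()))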
Maintenance
-Richard M. Oudkerk - the author of the pyprocessing module has agreed to
-maintaing the module within Python SVN. Jesse Noller has volunteered to
-also help maintain/document and test the module.
+Richard M. Oudkerk, the author of the pyprocessing package, has agreed to
+maintain the package within Python SVN. Jesse Noller has volunteered to
+also help maintain, document, and test the package.
+API Naming
+
+The API of the pyprocessing package is designed to closely mimic that of
+the threading and Queue modules. It has been proposed that, instead of
+adding the package as-is, we rename it to be PEP 8 compliant.
+
+Since the aim of the package is to be a drop-in replacement for the
+threading module, the authors feel that the current API should be used.
+When the threading and Queue modules are updated to fully reflect
+PEP 8, the pyprocessing/multiprocessing naming can be revised.
Timing/Schedule
Some concerns have been raised about the timing/lateness of this PEP
for the 2.6 and 3.0 releases this year; however, it is felt by both
-the authors and others that the functionality this module offers
+the authors and others that the functionality this package offers
outweighs the risk of inclusion.
-However, taking into account the desire not to destabilize python-core, some
-refactoring of pyprocessing's code "into" python-core can be withheld until
-the next 2.x/3.x releases. This means that the actual risk to python-core
-is minimal, and largely constrained to the actual module itself.
+However, taking into account the desire not to destabilize Python-core, some
+refactoring of pyprocessing's code "into" Python-core can be withheld until
+the next 2.x/3.x releases. This means that the actual risk to Python-core
+is minimal, and largely constrained to the actual package itself.
Open Issues
-* All existing tests for the module should be converted to UnitTest format.
+* All existing tests for the package should be converted to unittest format.
* Existing documentation has to be moved to reST formatting.
* Verify code coverage percentage of existing test suite.
* Identify any requirements for achieving a 1.0 milestone, if one is required.
* Verify current source tree conforms to standard library practices.
-* Rename top-level module from "pyprocessing" to "multiprocessing".
+* Rename top-level package from "pyprocessing" to "multiprocessing".
* Confirm no "default" remote connection capabilities; if needed, enable the
  remote security mechanisms by default for those classes which offer remote
  capabilities.
* Some API methods (Queue's qsize(), task_done(), and join()) either need
  to be added, or the reasons for their exclusion need to be identified and
  documented clearly.
+* Add a "multiprocessing.setExecutable()" method to override the default
+  behavior of the package, which is to spawn processes using the current
+  executable name rather than the Python interpreter. Note that Mark
+  Hammond has suggested a factory-style interface for this [7].
+* Also note that the default behavior of process spawning does not make
+  it compatible with use within IDLE as-is; this will be examined as
+  a bug fix or a "setExecutable" enhancement.
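
For reference, the override proposed in the "setExecutable" item above
corresponds to what is now multiprocessing.set_executable(). A minimal
sketch of the intended use (it only matters when children are spawned
rather than forked):

    import sys
    import multiprocessing

    def child():
        print("spawned by:", sys.executable)

    if __name__ == "__main__":
        # Point the package at a real Python interpreter; relevant when the
        # current process was started by some other executable, e.g. an
        # embedding application or a frozen binary.
        multiprocessing.set_executable(sys.executable)
        p = multiprocessing.Process(target=child)
        p.start()
        p.join()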
Closed Issues
-* Reliance on ctypes: The pyprocessing module's reliance on ctypes prevents
-  the module from functioning on platforms where ctypes is not supported.
-  This is not a restriction of this module, but rather ctypes.
+* Reliance on ctypes: The pyprocessing package's reliance on ctypes prevents
+  the package from functioning on platforms where ctypes is not supported.
+  This is not a restriction of this package, but rather of ctypes.
References
@@ -330,6 +372,8 @@ References
Magazine in December 2007: "Python Threads and the Global Interpreter
Lock" by Jesse Noller. It has been modified for this PEP.
+[7] http://groups.google.com/group/python-dev2/msg/54cf06d15cbcbc34
Copyright
This document has been placed in the public domain.