changes from PEP authors; corrections

This commit is contained in:
David Goodger 2008-06-03 14:14:50 +00:00
parent 43996147f7
commit 1cabbf4c39
2 changed files with 90 additions and 46 deletions


@@ -96,7 +96,7 @@ Index by Category
 S   364  Transitioning to the Py3K Standard Library    Warsaw
 S   368  Standard image protocol and class             Mastrodomenico
 S   369  Post import hooks                             Heimes
-S   371  Addition of the Processing module             Noller, Oudkerk
+S   371  Addition of the multiprocessing package       Noller, Oudkerk
 S  3134  Exception Chaining and Embedded Tracebacks    Yee
 S  3135  New Super                                     Spealman, Delaney
 S  3138  String representation in Python 3000          Ishimoto
@@ -475,7 +475,7 @@ Numerical Index
 S   368  Standard image protocol and class             Mastrodomenico
 S   369  Post import hooks                             Heimes
 SA  370  Per user site-packages directory              Heimes
-S   371  Addition of the Processing module             Noller, Oudkerk
+S   371  Addition of the multiprocessing package       Noller, Oudkerk
 SR  666  Reject Foolish Indentation                    Creighton
 SR  754  IEEE 754 Floating Point Special Values        Warnes
 P  3000  Python 3000                                   GvR


@@ -1,5 +1,5 @@
 PEP: 371
-Title: Addition of the Processing module to standard library
+Title: Addition of the multiprocessing package to the standard library
 Version: $Revision: $
 Last-Modified: $Date: $
 Author: Jesse Noller <jnoller@gmail.com>
@@ -14,22 +14,22 @@ Post-History:
 Abstract

-This PEP proposes the inclusion of the pyProcessing [1] module into the
-python standard library.
+This PEP proposes the inclusion of the pyProcessing [1] package into the
+Python standard library, renamed to "multiprocessing".

-The processing module mimics the standard library threading module and API
+The processing package mimics the standard library threading module and API
 to provide a process-based approach to "threaded programming" allowing
 end-users to dispatch multiple tasks that effectively side-step the global
 interpreter lock.

-The module also provides server and client modules to provide remote-
-sharing and management of objects and tasks so that applications may not
-only leverage multiple cores on the local machine, but also distribute
-objects and tasks across a cluster of networked machines.
+The package also provides server and client functionality (processing.Manager)
+to provide remote sharing and management of objects and tasks so that
+applications may not only leverage multiple cores on the local machine,
+but also distribute objects and tasks across a cluster of networked machines.

-While the distributed capabilities of the module are beneficial, the primary
+While the distributed capabilities of the package are beneficial, the primary
 focus of this PEP is the core threading-like API and capabilities of the
-module.
+package.

 Rationale
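The Manager-based remote sharing mentioned in the abstract can be sketched with the stdlib package the rename eventually produced. This is an illustrative sketch against the modern multiprocessing API, not code from the 2008 pyProcessing release:

```python
# Sketch of Manager-style object sharing: a server process holds the
# object, and worker processes reach it through proxies over IPC.
# Names follow the "multiprocessing" package this PEP proposes.
from multiprocessing import Manager, Process

def work(shared, key):
    # each worker publishes a result into the managed (server-held) dict
    shared[key] = key * key

def demo():
    # Manager() starts a server process; the dict proxy is picklable,
    # so it can be handed to separate OS processes
    with Manager() as mgr:
        shared = mgr.dict()
        procs = [Process(target=work, args=(shared, i)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return dict(shared)

if __name__ == "__main__":
    print(demo())
```

The same proxy mechanism is what lets the package distribute objects across networked machines, since the proxies communicate with the manager over a connection rather than shared memory.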
@@ -41,20 +41,20 @@ Rationale
 Python programmers who are leveraging multi-core machines.

 The GIL itself prevents more than a single thread from running within the
-interpreter at any given point in time, effectively removing python's
+interpreter at any given point in time, effectively removing Python's
 ability to take advantage of multi-processor systems. While I/O bound
 applications do not suffer the same slow-down when using threading, they do
 suffer some performance cost due to the GIL.

-The Processing module offers a method to side-step the GIL allowing
+The pyProcessing package offers a method to side-step the GIL allowing
 applications within CPython to take advantage of multi-core architectures
 without asking users to completely change their programming paradigm (i.e.:
 dropping threaded programming for another "concurrent" approach - Twisted,
 etc).

-The Processing module offers CPython users a known API (that of the
+The Processing package offers CPython users a known API (that of the
 threading module), with known semantics and easy-scalability. In the
-future, the module might not be as relevant should the CPython interpreter
+future, the package might not be as relevant should the CPython interpreter
 enable "true" threading, however for some applications, forking an OS
 process may sometimes be more desirable than using lightweight threads,
 especially on those platforms where process creation is fast/optimized.
@@ -70,7 +70,7 @@ Rationale
     t.start()
     t.join()

-The pyprocessing module mirrors the API so well, that with a simple change
+The pyprocessing package mirrors the API so well, that with a simple change
 of the import to:

     from processing import Process as worker
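The import-swap idea can be demonstrated with code that runs against today's stdlib. This sketch uses the "multiprocessing" name the PEP proposes (the PEP's own example imports Process from "processing"); the worker function and driver are illustrative:

```python
# Sketch of the drop-in substitution: the same driver code runs the
# worker under threads or processes, only the class passed in changes.
import threading
import multiprocessing

def count(n):
    # trivial CPU-bound worker
    total = 0
    for i in range(n):
        total += i
    return total

def run(worker):
    # identical code path for both APIs - only `worker` differs
    jobs = [worker(target=count, args=(100000,)) for _ in range(4)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return jobs

if __name__ == "__main__":
    run(threading.Thread)          # subject to the GIL
    run(multiprocessing.Process)   # side-steps the GIL via separate processes
```

Because Process mirrors Thread's constructor signature (`target`, `args`) and its `start()`/`join()` methods, existing threaded drivers need only the one import change.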
@@ -78,17 +78,17 @@ Rationale
 The code now executes through the processing.Process class. This type of
 compatibility means that, with a minor (in most cases) change in code,
 users' applications will be able to leverage all cores and processors on a
-given machine for parallel execution. In many cases the pyprocessing module
+given machine for parallel execution. In many cases the pyprocessing package
 is even faster than the normal threading approach for I/O bound programs.
-This of course, takes into account that the pyprocessing module is in
+This of course, takes into account that the pyprocessing package is in
 optimized C code, while the threading module is not.

 The "Distributed" Problem

-In the discussion on Python-Dev about the inclusion of this module [3] there
+In the discussion on Python-Dev about the inclusion of this package [3] there
 was confusion about the intentions of this PEP with an attempt to solve the
 "Distributed" problem - frequently comparing the functionality of this
-module with other solutions like MPI-based communication [4], CORBA, or
+package with other solutions like MPI-based communication [4], CORBA, or
 other distributed object approaches [5].

 The "distributed" problem is large and varied. Each programmer working
@@ -96,24 +96,24 @@ The "Distributed" Problem
 module/method or a highly customized problem for which no existing solution
 works.

-The acceptance of this module does not preclude or recommend that
+The acceptance of this package does not preclude or recommend that
 programmers working on the "distributed" problem not examine other solutions
-for their problem domain. The intent of including this module is to provide
+for their problem domain. The intent of including this package is to provide
 entry-level capabilities for local concurrency and the basic support to
 spread that concurrency across a network of machines - although the two are
-not tightly coupled, the pyprocessing module could in fact, be used in
+not tightly coupled, the pyprocessing package could in fact, be used in
 conjunction with any of the other solutions including MPI/etc.

 If necessary - it is possible to completely decouple the local concurrency
-abilities of the module from the network-capable/shared aspects of the
-module. Without serious concerns or cause however, the author of this PEP
+abilities of the package from the network-capable/shared aspects of the
+package. Without serious concerns or cause however, the author of this PEP
 does not recommend that approach.

 Performance Comparison

 As we all know - there are "lies, damned lies, and benchmarks". These speed
 comparisons, while aimed at showcasing the performance of the pyprocessing
-module, are by no means comprehensive or applicable to all possible use
+package, are by no means comprehensive or applicable to all possible use
 cases or environments. Especially for those platforms with sluggish process
 forking timing.
@@ -157,10 +157,10 @@ Performance Comparison
     threaded (8 threads)    0.007990 seconds
     processes (8 procs)     0.005512 seconds

-As you can see, process forking via the pyprocessing module is faster than
+As you can see, process forking via the pyprocessing package is faster than
 the speed of building and then executing the threaded version of the code.

-The second test calculates 50000 fibonacci numbers inside of each thread
+The second test calculates 50000 Fibonacci numbers inside of each thread
 (isolated and shared nothing):

     cmd: python run_benchmarks.py fibonacci.py
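The shape of these benchmarks can be sketched as follows. This is an illustrative harness in the spirit of the output above, not the actual run_benchmarks.py that ships with the pyprocessing distribution:

```python
# Assumed structure of a threads-vs-sequential timing harness producing
# output like "threaded (4 threads)  0.0864 seconds".
import time
import threading

def fib(n):
    # iterative Fibonacci: shared-nothing CPU work per worker
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def non_threaded(iters, n=50000):
    # sequential baseline: run the workload `iters` times in one thread
    start = time.time()
    for _ in range(iters):
        fib(n)
    return time.time() - start

def threaded(num_threads, n=50000):
    # run the same workload once per thread, concurrently
    workers = [threading.Thread(target=fib, args=(n,))
               for _ in range(num_threads)]
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start

if __name__ == "__main__":
    for count in (1, 2, 4, 8):
        print("non_threaded (%d iters)  %f seconds" % (count, non_threaded(count)))
        print("threaded (%d threads)   %f seconds" % (count, threaded(count)))
```

For CPU-bound work like this, the threaded timings track or exceed the sequential ones because the GIL serializes the threads; swapping the threads for processes is what recovers the extra cores.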
@@ -209,7 +209,7 @@ Performance Comparison
 showcase how the current threading implementation does hinder non-I/O
 applications. Obviously, these tests could be improved to use a queue for
 coordination of results and chunks of work but that is not required to show
-the performance of the module.
+the performance of the package and core Processing module.

 The next test is an I/O bound test. This is normally where we see a steep
 improvement in the threading module approach versus a single-threaded
@@ -264,51 +264,93 @@ Performance Comparison
     processes (8 procs)     0.298625 seconds

 We finally see threaded performance surpass that of single-threaded
-execution, but the pyprocessing module is still faster when increasing the
+execution, but the pyprocessing package is still faster when increasing the
 number of workers. If you stay with one or two threads/workers, then the
 timing between threads and pyprocessing is fairly close.

-Additional benchmarks can be found in the pyprocessing module's source
-distribution's examples/ directory.
+One item of note however, is that there is an implicit overhead within the
+pyprocessing package's Queue implementation due to the object serialization.
+
+Alec Thomas provided a short example based on the run_benchmarks.py script
+to demonstrate this overhead versus the default Queue implementation:
+
+    cmd: run_bench_queue.py
+    non_threaded (1 iters)  0.010546 seconds
+    threaded (1 threads)    0.015164 seconds
+    processes (1 procs)     0.066167 seconds
+
+    non_threaded (2 iters)  0.020768 seconds
+    threaded (2 threads)    0.041635 seconds
+    processes (2 procs)     0.084270 seconds
+
+    non_threaded (4 iters)  0.041718 seconds
+    threaded (4 threads)    0.086394 seconds
+    processes (4 procs)     0.144176 seconds
+
+    non_threaded (8 iters)  0.083488 seconds
+    threaded (8 threads)    0.184254 seconds
+    processes (8 procs)     0.302999 seconds
+
+Additional benchmarks can be found in the pyprocessing package's source
+distribution's examples/ directory. The examples will be included in the
+package's documentation.
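The Queue overhead noted above comes from serialization: a process-shared queue must pickle each item and push it through a pipe, while the thread-safe queue simply hands over a reference. A small sketch, using the present-day stdlib names (queue.Queue and multiprocessing.Queue) for illustration:

```python
# Sketch of the source of the Queue overhead: multiprocessing.Queue
# pickles items through a pipe; queue.Queue passes object references.
import multiprocessing
import queue

def roundtrip(q, item):
    # put an item on the queue, then take it back off
    q.put(item)
    return q.get()  # blocks until the item is available

if __name__ == "__main__":
    payload = list(range(1000))
    same = roundtrip(queue.Queue(), payload)
    copy = roundtrip(multiprocessing.Queue(), payload)
    print(same is payload)   # True: the thread queue returns the same object
    print(copy is payload)   # False: the process queue returns a pickled copy
```

The pickle/unpickle roundtrip (plus the pipe write and read) is the per-item cost that shows up in the run_bench_queue.py numbers, and it grows with the size of the objects being queued.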
 Maintenance

-Richard M. Oudkerk - the author of the pyprocessing module has agreed to
-maintaing the module within Python SVN. Jesse Noller has volunteered to
-also help maintain/document and test the module.
+Richard M. Oudkerk - the author of the pyprocessing package - has agreed to
+maintain the package within Python SVN. Jesse Noller has volunteered to
+also help maintain/document and test the package.
+API Naming
+
+The API of the pyprocessing package is designed to closely mimic that of
+the threading and Queue modules. It has been proposed that instead of
+adding the package as-is, we rename it to be PEP 8 compliant instead.
+Since the aim of the package is to be a drop-in replacement for the
+threading module, the authors feel that the current API should be used.
+
+When the threading and Queue modules are updated to fully reflect
+PEP 8, the pyprocessing/multiprocessing naming can be revised.
 Timing/Schedule

 Some concerns have been raised about the timing/lateness of this PEP
 for the 2.6 and 3.0 releases this year, however it is felt by both
-the authors and others that the functionality this module offers
+the authors and others that the functionality this package offers
 surpasses the risk of inclusion.

-However, taking into account the desire not to destabilize python-core, some
-refactoring of pyprocessing's code "into" python-core can be withheld until
-the next 2.x/3.x releases. This means that the actual risk to python-core
-is minimal, and largely constrained to the actual module itself.
+However, taking into account the desire not to destabilize Python-core, some
+refactoring of pyprocessing's code "into" Python-core can be withheld until
+the next 2.x/3.x releases. This means that the actual risk to Python-core
+is minimal, and largely constrained to the actual package itself.
 Open Issues

-* All existing tests for the module should be converted to UnitTest format.
+* All existing tests for the package should be converted to UnitTest format.
 * Existing documentation has to be moved to ReST formatting.
 * Verify code coverage percentage of existing test suite.
 * Identify any requirements to achieve a 1.0 milestone if required.
 * Verify current source tree conforms to standard library practices.
-* Rename top-level module from "pyprocessing" to "multiprocessing".
+* Rename top-level package from "pyprocessing" to "multiprocessing".
 * Confirm no "default" remote connection capabilities, if needed enable the
   remote security mechanisms by default for those classes which offer remote
   capabilities.
 * Some of the API (Queue methods qsize(), task_done() and join()) either
   need to be added, or the reason for their exclusion needs to be identified
   and documented clearly.
+* Add in "multiprocessing.setExecutable()" method to override the default
+  behavior of the package to spawn processes using the current executable
+  name rather than the Python interpreter. Note that Mark Hammond has
+  suggested a factory-style interface for this [7].
+* Also note that the default behavior of process spawning does not make
+  it compatible with use within IDLE as-is; this will be examined as
+  a bug-fix or "setExecutable" enhancement.
 Closed Issues

-* Reliance on ctypes: The pyprocessing module's reliance on ctypes prevents
-  the module from functioning on platforms where ctypes is not supported.
-  This is not a restriction of this module, but rather ctypes.
+* Reliance on ctypes: The pyprocessing package's reliance on ctypes prevents
+  the package from functioning on platforms where ctypes is not supported.
+  This is not a restriction of this package, but rather of ctypes.

 References
@@ -330,6 +372,8 @@ References
 Magazine in December 2008: "Python Threads and the Global Interpreter
 Lock" by Jesse Noller. It has been modified for this PEP.

+[7] http://groups.google.com/group/python-dev2/msg/54cf06d15cbcbc34
+
 Copyright

 This document has been placed in the public domain.