Add PEP 371, Addition of the processing module, by Jesse Noller/Richard Oudkerk.

This commit is contained in:
Georg Brandl 2008-05-28 15:28:08 +00:00
parent 2fbe655f68
commit cf08dcdd09
2 changed files with 349 additions and 0 deletions

@@ -96,6 +96,7 @@ Index by Category
S 364 Transitioning to the Py3K Standard Library Warsaw
S 368 Standard image protocol and class Mastrodomenico
S 369 Post import hooks Heimes
S 371 Addition of the Processing module Noller, Oudkerk
S 3134 Exception Chaining and Embedded Tracebacks Yee
S 3135 New Super Spealman, Delaney
S 3138 String representation in Python 3000 Ishimoto
@@ -473,6 +474,7 @@ Numerical Index
S 368 Standard image protocol and class Mastrodomenico
S 369 Post import hooks Heimes
SA 370 Per user site-packages directory Heimes
S 371 Addition of the Processing module Noller, Oudkerk
SR 666 Reject Foolish Indentation Creighton
SR 754 IEEE 754 Floating Point Special Values Warnes
P 3000 Python 3000 GvR
@@ -607,10 +609,12 @@ Owners
Meyer, Mike mwm@mired.org
Montanaro, Skip skip@pobox.com
Moore, Paul gustav@morpheus.demon.co.uk
Noller, Jesse jnoller@gmail.com
North, Ben ben at redfrontdoor.org
Norwitz, Neal nnorwitz@gmail.com
Oliphant, Travis oliphant@ee.byu.edu
Orendorff, Jason jason.orendorff@gmail.com
Oudkerk, Richard r.m.oudkerk@googlemail.com
Pedroni, Samuele pedronis@python.org
Pelletier, Michel michel@users.sourceforge.net
Peters, Tim tim@zope.com

pep-0371.txt Normal file (345 lines)

@@ -0,0 +1,345 @@
PEP: 371
Title: Addition of the Processing module to standard library
Version: $Revision: $
Last-Modified: $Date: $
Author: Jesse Noller <jnoller@gmail.com>
Richard Oudkerk <r.m.oudkerk@googlemail.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 06-May-2008
Python-Version: 2.6 / 3.0
Post-History:
Abstract
This PEP proposes the inclusion of the pyProcessing [1] module into the
Python standard library.
The processing module mimics the standard library threading module and API,
providing a process-based approach to "threaded programming" that allows
end-users to dispatch multiple tasks which effectively side-step the global
interpreter lock.
The module also provides server and client functionality for remote
sharing and management of objects and tasks, so that applications may not
only leverage multiple cores on the local machine, but also distribute
objects and tasks across a cluster of networked machines.
While the distributed capabilities of the module are beneficial, the primary
focus of this PEP is the core threading-like API and capabilities of the
module.
Rationale
The current CPython interpreter implements the Global Interpreter Lock (GIL)
and, barring work in Python 3000 or other versions currently planned [2], the
GIL will remain as-is within the CPython interpreter for the foreseeable
future. While the GIL itself enables clean and easy-to-maintain C code for
the interpreter and extensions base, it is frequently an issue for those
Python programmers who are leveraging multi-core machines.
The GIL itself prevents more than a single thread from running within the
interpreter at any given point in time, effectively removing Python's
ability to take advantage of multi-processor systems. While I/O-bound
applications do not suffer the same slow-down when using threading, they do
suffer some performance cost due to the GIL.
The Processing module offers a method to side-step the GIL, allowing
applications within CPython to take advantage of multi-core architectures
without asking users to completely change their programming paradigm (i.e.,
dropping threaded programming for another "concurrent" approach such as
Twisted).
The Processing module offers CPython users a known API (that of the
threading module), with known semantics and easy scalability. In the
future, the module might not be as relevant should the CPython interpreter
enable "true" threading; however, for some applications, forking an OS
process may sometimes be more desirable than using lightweight threads,
especially on those platforms where process creation is fast/optimized.
For example, a simple threaded application:
from threading import Thread as worker

def afunc(number):
    print number * 3

t = worker(target=afunc, args=(4,))
t.start()
t.join()
The pyprocessing module mirrors the API so well that, with a simple change
of the import to:
from processing import Process as worker
The code now executes through the processing.Process class. This type of
compatibility means that, with a minor (in most cases) change in code,
users' applications will be able to leverage all cores and processors on a
given machine for parallel execution. In many cases the pyprocessing module
is even faster than the normal threading approach for I/O-bound programs.
This, of course, takes into account that the pyprocessing module is in
optimized C code, while the threading module is not.
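For completeness, the full example under pyprocessing would look like the
following sketch (per the compatibility described above, processing.Process
accepts the same constructor arguments as threading.Thread):

from processing import Process as worker

def afunc(number):
    print number * 3

p = worker(target=afunc, args=(4,))
p.start()
p.join()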
The "Distributed" Problem
In the discussion on Python-Dev about the inclusion of this module [3] there
was confusion about the intentions of this PEP, with some conflating it with
an attempt to solve the "Distributed" problem - frequently comparing the
functionality of this module with other solutions like MPI-based
communication [4], CORBA, or other distributed object approaches [5].
The "distributed" problem is large and varied. Each programmer working
within this domain has either very strong opinions about their favorite
module/method or a highly customized problem for which no existing solution
works.
The acceptance of this module does not preclude programmers working on the
"distributed" problem from examining other solutions for their problem
domain, nor does it recommend against doing so. The intent of including this
module is to provide entry-level capabilities for local concurrency and the
basic support to spread that concurrency across a network of machines -
although the two are not tightly coupled, the pyprocessing module could in
fact be used in conjunction with any of the other solutions, including MPI.
If necessary, it is possible to completely decouple the local concurrency
abilities of the module from its network-capable/shared aspects. Without
serious concerns or cause, however, the author of this PEP does not
recommend that approach.
Performance Comparison
As we all know, there are "lies, damned lies, and benchmarks". These speed
comparisons, while aimed at showcasing the performance of the pyprocessing
module, are by no means comprehensive or applicable to all possible use
cases or environments, especially on platforms with sluggish
process-forking performance.
All benchmarks were run using the following:
* 4 Core Intel Xeon CPU @ 3.00GHz
* 16 GB of RAM
* Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)
* pyProcessing 0.52
All of the code for this can be downloaded from:
http://jessenoller.com/code/bench-src.tgz
The basic method of execution for these benchmarks is the run_benchmarks.py
script, which is simply a wrapper that executes a target function
single-threaded (linear), multi-threaded (via threading), and multi-process
(via pyprocessing) for a static number of iterations, with increasing
numbers of execution loops and/or threads.
The run_benchmarks.py script executes each function 100 times, picking the
best run of those 100 iterations via the timeit module.
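The core of that wrapper amounts to something like the following sketch
(hypothetical; the actual script is in the tarball linked above, and the
function and module names here are illustrative):

import timeit

def best_of_100(stmt, setup):
    # Keep the fastest of 100 single-shot runs, as the timeit
    # documentation recommends for noisy machines.
    timer = timeit.Timer(stmt, setup)
    return min(timer.repeat(repeat=100, number=1))

print best_of_100("target()", "from empty_func import target")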
First, to identify the overhead of spawning the workers, we execute a
function which is simply a pass statement (empty):
cmd: python run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters) 0.000001 seconds
threaded (1 threads) 0.000796 seconds
processes (1 procs) 0.000714 seconds
non_threaded (2 iters) 0.000002 seconds
threaded (2 threads) 0.001963 seconds
processes (2 procs) 0.001466 seconds
non_threaded (4 iters) 0.000002 seconds
threaded (4 threads) 0.003986 seconds
processes (4 procs) 0.002701 seconds
non_threaded (8 iters) 0.000003 seconds
threaded (8 threads) 0.007990 seconds
processes (8 procs) 0.005512 seconds
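For reference, the function being dispatched here is essentially the
following (an illustrative shape, not the file's exact contents):

def afunc():
    pass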
As you can see, process forking via the pyprocessing module is faster than
creating and then executing the threaded version of the code.
The second test calculates 50000 Fibonacci numbers inside of each thread
(isolated, sharing nothing):
cmd: python run_benchmarks.py fibonacci.py
Importing fibonacci
Starting tests ...
non_threaded (1 iters) 0.195548 seconds
threaded (1 threads) 0.197909 seconds
processes (1 procs) 0.201175 seconds
non_threaded (2 iters) 0.397540 seconds
threaded (2 threads) 0.397637 seconds
processes (2 procs) 0.204265 seconds
non_threaded (4 iters) 0.795333 seconds
threaded (4 threads) 0.797262 seconds
processes (4 procs) 0.206990 seconds
non_threaded (8 iters) 1.591680 seconds
threaded (8 threads) 1.596824 seconds
processes (8 procs) 0.417899 seconds
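The per-worker task in fibonacci.py has roughly this shape (a hypothetical
sketch; the function name and structure are illustrative):

def fib_task(count=50000):
    # Compute the first `count` Fibonacci numbers; pure CPU work
    # with no shared state between workers.
    a, b = 0, 1
    for _ in xrange(count):
        a, b = b, a + b
    return a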
The third test calculates the sum of all primes below 100000, again sharing
nothing.
cmd: run_benchmarks.py crunch_primes.py
Importing crunch_primes
Starting tests ...
non_threaded (1 iters) 0.495157 seconds
threaded (1 threads) 0.522320 seconds
processes (1 procs) 0.523757 seconds
non_threaded (2 iters) 1.052048 seconds
threaded (2 threads) 1.154726 seconds
processes (2 procs) 0.524603 seconds
non_threaded (4 iters) 2.104733 seconds
threaded (4 threads) 2.455215 seconds
processes (4 procs) 0.530688 seconds
non_threaded (8 iters) 4.217455 seconds
threaded (8 threads) 5.109192 seconds
processes (8 procs) 1.077939 seconds
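Again, the per-worker task amounts to something like the following (a
hypothetical sketch using a sieve; the real crunch_primes.py is in the
tarball linked above):

def sum_primes(limit=100000):
    # Sieve of Eratosthenes: sum all primes below `limit`.
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for i in xrange(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return sum(i for i in xrange(limit) if sieve[i])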
The reason why tests two and three focus on pure numeric crunching is to
showcase how the current threading implementation hinders non-I/O-bound
applications. Obviously, these tests could be improved to use a queue for
coordination of results and chunks of work, but that is not required to show
the performance of the module.
The next test is an I/O-bound test. This is normally where we see a steep
improvement in the threading module approach versus a single-threaded
approach. In this case, each worker opens a descriptor to lorem.txt,
randomly seeks within it, and writes lines to /dev/null:
cmd: python run_benchmarks.py file_io.py
Importing file_io
Starting tests ...
non_threaded (1 iters) 0.057750 seconds
threaded (1 threads) 0.089992 seconds
processes (1 procs) 0.090817 seconds
non_threaded (2 iters) 0.180256 seconds
threaded (2 threads) 0.329961 seconds
processes (2 procs) 0.096683 seconds
non_threaded (4 iters) 0.370841 seconds
threaded (4 threads) 1.103678 seconds
processes (4 procs) 0.101535 seconds
non_threaded (8 iters) 0.749571 seconds
threaded (8 threads) 2.437204 seconds
processes (8 procs) 0.203438 seconds
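Each worker's task is roughly the following (a hypothetical sketch; the
file name comes from the description above, the rest is illustrative):

import os
import random

def io_task(path='lorem.txt', lines=1000):
    # Seek to random offsets in the source file and copy lines to
    # /dev/null; I/O-bound work with no shared state.
    size = os.path.getsize(path)
    src = open(path, 'rb')
    dst = open('/dev/null', 'wb')
    for _ in xrange(lines):
        src.seek(random.randrange(size))
        dst.write(src.readline())
    src.close()
    dst.close()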
As you can see, pyprocessing is still faster on this I/O operation than
using multiple threads, and using multiple threads is slower than
single-threaded execution itself.
Finally, we will run a socket-based test to show network I/O performance.
This function fetches a URL (a simple error page served by Tomcat) from a
server on the LAN, getting the page 100 times. The network is quiet, and
the connection is 10G:
cmd: python run_benchmarks.py url_get.py
Importing url_get
Starting tests ...
non_threaded (1 iters) 0.124774 seconds
threaded (1 threads) 0.120478 seconds
processes (1 procs) 0.121404 seconds
non_threaded (2 iters) 0.239574 seconds
threaded (2 threads) 0.146138 seconds
processes (2 procs) 0.138366 seconds
non_threaded (4 iters) 0.479159 seconds
threaded (4 threads) 0.200985 seconds
processes (4 procs) 0.188847 seconds
non_threaded (8 iters) 0.960621 seconds
threaded (8 threads) 0.659298 seconds
processes (8 procs) 0.298625 seconds
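The fetch loop is essentially the following (a hypothetical sketch; the
URL is a placeholder for the LAN server described above):

import urllib

def url_task(url='http://192.168.1.2:8080/missing', count=100):
    # Fetch the (error) page `count` times; network-I/O-bound work.
    for _ in xrange(count):
        urllib.urlopen(url).read()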
We finally see threaded performance surpass that of single-threaded
execution, but the pyprocessing module is still faster as the number of
workers increases. If you stay with one or two threads/workers, the timing
between threads and pyprocessing is fairly close.
Additional benchmarks can be found in the examples/ directory of the
pyprocessing module's source distribution.
Maintenance
Richard M. Oudkerk, the author of the pyprocessing module, has agreed to
maintain the module within Python SVN. Jesse Noller has volunteered to help
maintain, document, and test the module.
Timing/Schedule
Some concerns have been raised about the timing/lateness of this PEP
for the 2.6 and 3.0 releases this year; however, it is felt by both
the authors and others that the functionality this module offers
outweighs the risk of inclusion.
However, taking into account the desire not to destabilize python-core, some
refactoring of pyprocessing's code "into" python-core can be deferred until
the next 2.x/3.x releases. This means that the actual risk to python-core
is minimal, and largely constrained to the module itself.
Open Issues
* All existing tests for the module should be converted to unittest format.
* Existing documentation has to be moved to ReST formatting.
* Verify code coverage percentage of existing test suite.
* Identify any requirements to achieve a 1.0 milestone if required.
* Verify current source tree conforms to standard library practices.
* Rename top-level module from "pyprocessing" to "multiprocessing".
* Confirm that there are no "default" remote connection capabilities; if
needed, enable the remote security mechanisms by default for those classes
which offer remote capabilities.
* Some of the API (the Queue methods qsize(), task_done() and join()) either
needs to be added, or the reason for its exclusion needs to be identified
and documented clearly.
Closed Issues
* Reliance on ctypes: the pyprocessing module's reliance on ctypes prevents
the module from functioning on platforms where ctypes is not supported.
This is not a restriction of this module, but rather of ctypes itself.
References
[1] PyProcessing home page
http://pyprocessing.berlios.de/
[2] See Adam Olsen's "safe threading" project
http://code.google.com/p/python-safethread/
[3] See: Addition of "pyprocessing" module to standard lib.
http://mail.python.org/pipermail/python-dev/2008-May/079417.html
[4] http://mpi4py.scipy.org/
[5] See "Cluster Computing"
http://wiki.python.org/moin/ParallelProcessing
[6] The original run_benchmarks.py code was published in Python
Magazine in December 2007: "Python Threads and the Global Interpreter
Lock" by Jesse Noller. It has been modified for this PEP.
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: