diff --git a/pep-3146.txt b/pep-3146.txt new file mode 100644 index 000000000..ab6948015 --- /dev/null +++ b/pep-3146.txt @@ -0,0 +1,1315 @@ +PEP: 3146 +Title: Merging Unladen Swallow into CPython +Version: $Revision$ +Last-Modified: $Date$ +Author: Collin Winter, + Jeffrey Yasskin, + Reid Kleckner +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 1-Jan-2010 +Python-Version: 3.3 +Post-History: + + +Abstract +======== + +This PEP proposes the merger of the Unladen Swallow project [#us]_ into +CPython's source tree. Unladen Swallow is an open-source branch of CPython +focused on performance. Unladen Swallow is source-compatible with valid Python +2.6.4 applications and C extension modules. + +Unladen Swallow adds a just-in-time (JIT) compiler to CPython, allowing for the +compilation of selected Python code to optimized machine code. Beyond classical +static compiler optimizations, Unladen Swallow's JIT compiler takes advantage of +data collected at runtime to make checked assumptions about code behaviour, +allowing the production of faster machine code. + +This PEP proposes to integrate Unladen Swallow into CPython's development tree +in a separate ``py3k-jit`` branch, targeted for eventual merger with the main +``py3k`` branch. While Unladen Swallow is by no means finished or perfect, we +feel that Unladen Swallow has reached sufficient maturity to warrant +incorporation into CPython's roadmap. We have sought to create a stable platform +that the wider CPython development team can build upon, a platform that will +yield increasing performance for years to come. + +This PEP will detail Unladen Swallow's implementation and how it differs from +CPython 2.6.4; the benchmarks used to measure performance; the tools used to +ensure correctness and compatibility; the impact on CPython's current platform +support; and the impact on the CPython core development process.
The PEP +concludes with a proposed merger plan and brief notes on possible directions +for future work. + +We seek the following from the BDFL: + +- Approval for the overall concept of adding a just-in-time compiler to CPython, + following the design laid out below. +- Permission to continue working on the just-in-time compiler in the CPython + source tree. +- Permission to eventually merge the just-in-time compiler into the ``py3k`` + branch once all blocking issues have been addressed. +- A pony. + + +Rationale, Implementation +========================= + +Many companies and individuals would like Python to be faster, to enable its +use in more projects. Google is one such company. + +Unladen Swallow is a Google-sponsored branch of CPython, initiated to improve +the performance of Google's numerous Python libraries, tools and applications. +To make the adoption of Unladen Swallow as easy as possible, the project +initially aimed at four goals: + +- A performance improvement of 5x over the baseline of CPython 2.6.4 for + single-threaded code. +- 100% source compatibility with valid CPython 2.6 applications. +- 100% source compatibility with valid CPython 2.6 C extension modules. +- Design for eventual merger back into CPython. + +We chose 2.6.4 as our baseline because Google uses CPython 2.4 internally, and +jumping directly from CPython 2.4 to CPython 3.x was considered infeasible. + +To achieve the desired performance, Unladen Swallow has implemented a +just-in-time (JIT) compiler [#jit]_ in the tradition of Urs Hoelzle's work on +Self [#urs-self]_, gathering feedback at runtime and using that to inform +compile-time optimizations. This is similar to the approach taken by the current +breed of JavaScript engines [#v8]_, [#squirrelfishextreme]_; most Java virtual +machines [#hotspot]_; Rubinius [#rubinius]_, MacRuby [#macruby]_, and other Ruby +implementations; Psyco [#psyco]_; and others. + +We explicitly reject any suggestion that our ideas are original. 
We have sought +to reuse the published work of other researchers wherever possible. If we have +done any original work, it is by accident. We have tried, as much as possible, +to take good ideas from all corners of the academic and industrial community. A +partial list of the research papers that have informed Unladen Swallow is +available on the Unladen Swallow wiki [#us-relevantpapers]_. + +The key observation about optimizing dynamic languages is that they are only +dynamic in theory; in practice, each individual function or snippet of code is +relatively static, using a stable set of types and child functions. The current +CPython bytecode interpreter assumes the worst about the code it is running, +that at any moment the user might override the ``len()`` function or pass a +never-before-seen type into a function. In practice this never happens, but user +code pays for that support. Unladen Swallow takes advantage of the relatively +static nature of user code to improve performance. + +At a high level, the Unladen Swallow JIT compiler works by translating a +function's CPython bytecode to platform-specific machine code, using data +collected at runtime, as well as classical compiler optimizations, to improve +the quality of the generated machine code. Because we only want to spend +resources compiling Python code that will actually benefit the runtime of the +program, an online heuristic is used to assess how hot a given function is. Once +the hotness value for a function crosses a given threshold, it is selected for +compilation and optimization. Until a function is judged hot, however, it runs +in the standard CPython eval loop, which in Unladen Swallow has been +instrumented to record interesting data about each bytecode executed. This +runtime data is used to reduce the flexibility of the generated machine code, +allowing us to optimize for the common case. For example, we collect data on + +- Whether a branch was taken/not taken. 
If a branch is never taken, we will not + compile it to machine code. +- Types used by operators. If we find that ``a + b`` is only ever adding + integers, the generated machine code for that snippet will not support adding + floats. +- Functions called at each callsite. If we find that a particular ``foo()`` + callsite is always calling the same ``foo`` function, we can optimize the + call or inline it away. + +Refer to [#us-llvm-notes]_ for a complete list of data points gathered and how +they are used. + +However, if by chance the historically-untaken branch is now taken, or some +integer-optimized ``a + b`` snippet receives two strings, we must support this. +We cannot change Python semantics. Each of these sections of optimized machine +code is preceded by a `guard`, which checks whether the simplifying assumptions +we made when optimizing still hold. If the assumptions are still valid, we run +the optimized machine code; if they are not, we fall back to the interpreter +and pick up where we left off. + +We have chosen to reuse a set of existing compiler libraries called LLVM +[#llvm]_ for code generation and code optimization. This has saved our small +team from needing to understand and debug code generation on multiple machine +instruction sets and from needing to implement a large set of classical compiler +optimizations. The project would not have been possible without such code reuse. +We have found LLVM easy to modify and its community receptive to our suggestions +and modifications. + +In somewhat more depth, Unladen Swallow's JIT works by compiling CPython +bytecode to LLVM's own intermediate representation (IR) [#llvm-langref]_, taking +into account any runtime data from the CPython eval loop. We then run a set of +LLVM's built-in optimization passes, producing a smaller, optimized version of +the original LLVM IR.
LLVM then lowers the IR to platform-specific machine code, +performing register allocation, instruction scheduling, and any necessary +relocations. This arrangement of the compilation pipeline allows the LLVM-based +JIT to be easily omitted from a compiled ``python`` binary by passing +``--without-llvm`` to ``./configure``; various use cases for this flag are +discussed later. + +For a complete detailing of how Unladen Swallow works, consult the Unladen +Swallow documentation [#us-projectplan]_, [#us-llvm-notes]_. + +Unladen Swallow has focused on improving the performance of single-threaded, +pure-Python code. We have not made an effort to remove CPython's global +interpreter lock (GIL); we feel this is separate from our work, and due to its +sensitivity, is best done in a mainline development branch. We considered +making GIL-removal a part of Unladen Swallow, but were concerned by the +possibility of introducing subtle bugs when porting our work from CPython 2.6 +to 3.x. + +A JIT compiler is an extremely versatile tool, and we have by no means +exhausted its full potential. We have tried to create a sufficiently flexible +framework that the wider CPython development community can build upon it for +years to come, extracting increased performance in each subsequent release. + + +Performance +=========== + +Benchmarks +---------- + +Unladen Swallow has developed a fairly large suite of benchmarks, ranging from +synthetic microbenchmarks designed to test a single feature up through +whole-application macrobenchmarks. The inspiration for these benchmarks has come +variously from third-party contributors (in the case of the ``html5lib`` +benchmark), Google's own internal workloads (``slowspitfire``, ``pickle``, +``unpickle``), as well as tools and libraries in heavy use throughout the wider +Python community (``django``, ``2to3``, ``spambayes``). 
These benchmarks are run +through a single interface called ``perf.py`` that takes care of collecting +memory usage information, graphing performance, and running statistics on the +benchmark results to ensure significance. + +The full list of available benchmarks is available on the Unladen Swallow wiki +[#us-benchmarks]_, including instructions on downloading and running the +benchmarks for yourself. All our benchmarks are open-source; none are +Google-proprietary. We believe this collection of benchmarks serves as a useful +tool to benchmark any complete Python implementation, and indeed, PyPy is +already using these benchmarks for their own performance testing +[#pypy-bmarks]_, [#us-wider-perf-issue]_. We welcome this, and we seek +additional workloads for the benchmark suite from the Python community. + +We have focused our efforts on collecting macrobenchmarks and benchmarks that +simulate real applications as well as possible, when running a whole application +is not feasible. Along a different axis, our benchmark collection originally +focused on the kinds of workloads seen by Google's Python code (webapps, text +processing), though we have since expanded the collection to include workloads +Google cares nothing about. We have so far shied away from heavily-numerical +workloads, since NumPy [#numpy]_ already does an excellent job on such code and +so improving numerical performance was not an initial high priority for the +team; we have begun to incorporate such benchmarks into the collection +[#us-nbody]_ and have started work on optimizing numerical Python code. + +Beyond these benchmarks, there are also a variety of workloads we are explicitly +not interested in benchmarking. Unladen Swallow is focused on improving the +performance of pure-Python code, so the performance of extension modules like +NumPy is uninteresting since NumPy's core routines are implemented in +C. 
Similarly, workloads that involve a lot of IO like GUIs, databases or +socket-heavy applications would, we feel, fail to accurately measure interpreter +or code generation optimizations. That said, there's certainly room to improve +the performance of C-language extension modules in the standard library, and +as such, we have added benchmarks for the ``cPickle`` and ``re`` modules. + + +Performance vs CPython 2.6.4 +---------------------------- + +The tables below compare the arithmetic mean of multiple benchmark iterations +for CPython 2.6.4 and Unladen Swallow. ``perf.py`` gathers more data than this, +and indeed, arithmetic mean is not the whole story; we reproduce only the mean +for the sake of conciseness. We include the ``t`` score from the Student's +two-tailed T-test [#students-t-test]_ at the 95% confidence level to indicate +the significance of the result. Most benchmarks are run for 100 iterations, +though some longer-running whole-application benchmarks are run for fewer +iterations. + +A description of each of these benchmarks is available on the Unladen Swallow +wiki [#us-benchmarks]_.
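As an illustration of the statistics described above, the following is a minimal sketch of how a mean comparison and ``t`` score could be computed for two sets of benchmark timings. It is not taken from ``perf.py``: the function names ``mean_and_t_score`` and ``describe_change`` are hypothetical, and both the pooled-variance form of Student's t statistic and the sign convention (positive when the second binary is faster, which appears to match the tables in this section) are assumptions.

```python
import math
import statistics


def mean_and_t_score(baseline, changed):
    """Return (mean_baseline, mean_changed, t) for two lists of timings.

    Assumption: pooled-variance (equal-variance) two-sample Student's t.
    The sign is positive when `changed` has the smaller (faster) mean.
    """
    n1, n2 = len(baseline), len(changed)
    m1, m2 = statistics.mean(baseline), statistics.mean(changed)
    v1, v2 = statistics.variance(baseline), statistics.variance(changed)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return m1, m2, (m1 - m2) / std_err


def describe_change(mean_baseline, mean_changed):
    """Format a ratio of means in the style of the "Change" column."""
    if mean_changed < mean_baseline:
        return "%.2fx faster" % (mean_baseline / mean_changed)
    return "%.2fx slower" % (mean_changed / mean_baseline)
```

To judge significance, ``|t|`` would be compared against the two-tailed critical value for ``n1 + n2 - 2`` degrees of freedom; for two runs of 100 iterations at the 95% level that value is roughly 1.97.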
+ +Command: +:: + + ./perf.py -r -b default,apps ../a/python ../b/python + + +32-bit; gcc 4.0.3; Ubuntu Dapper; Intel Core2 Duo 6600 @ 2.4GHz; 2 cores; 4MB L2 cache; 4GB RAM + ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Significance | Timeline | ++==============+===============+======================+==============+===============+============================+ +| 2to3 | 25.13 s | 24.87 s | 1.01x faster | t=8.94 | http://tinyurl.com/yamhrpg | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| django | 1.08 s | 0.80 s | 1.35x faster | t=315.59 | http://tinyurl.com/y9mrn8s | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| html5lib | 14.29 s | 13.20 s | 1.08x faster | t=2.17 | http://tinyurl.com/y8tyslu | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| nbody | 0.51 s | 0.28 s | 1.84x faster | t=78.007 | http://tinyurl.com/y989qhg | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| rietveld | 0.75 s | 0.55 s | 1.37x faster | Insignificant | http://tinyurl.com/ye7mqd3 | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| slowpickle | 0.75 s | 0.55 s | 1.37x faster | t=20.78 | http://tinyurl.com/ybrsfnd | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| slowspitfire | 0.83 s | 0.61 s | 1.36x faster | t=2124.66 | http://tinyurl.com/yfknhaw | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| slowunpickle | 0.33 s | 0.26 s | 1.26x faster | t=15.12 | 
http://tinyurl.com/yzlakoo | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| spambayes | 0.31 s | 0.34 s | 1.10x slower | Insignificant | http://tinyurl.com/yem62ub | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ + + +64-bit; gcc 4.2.4; Ubuntu Hardy; AMD Opteron 8214 HE @ 2.2 GHz; 4 cores; 1MB L2 cache; 8GB RAM + ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Significance | Timeline | ++==============+===============+======================+==============+===============+============================+ +| 2to3 | 31.98 s | 30.41 s | 1.05x faster | t=8.35 | http://tinyurl.com/ybcrl3b | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| django | 1.22 s | 0.94 s | 1.30x faster | t=106.68 | http://tinyurl.com/ybwqll6 | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| html5lib | 18.97 s | 17.79 s | 1.06x faster | t=2.78 | http://tinyurl.com/yzlyqvk | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| nbody | 0.77 s | 0.27 s | 2.86x faster | t=133.49 | http://tinyurl.com/yeyqhbg | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| rietveld | 0.74 s | 0.80 s | 1.08x slower | t=-2.45 | http://tinyurl.com/yzjc6ff | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| slowpickle | 0.91 s | 0.62 s | 1.48x faster | t=28.04 | http://tinyurl.com/yf7en6k | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| 
slowspitfire | 1.01 s | 0.72 s | 1.40x faster | t=98.70 | http://tinyurl.com/yc8pe2o | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| slowunpickle | 0.51 s | 0.34 s | 1.51x faster | t=32.65 | http://tinyurl.com/yjufu4j | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ +| spambayes | 0.43 s | 0.45 s | 1.06x slower | Insignificant | http://tinyurl.com/yztbjfp | ++--------------+---------------+----------------------+--------------+---------------+----------------------------+ + + +Many of these benchmarks take a hit under Unladen Swallow because the current +version blocks execution to compile Python functions down to machine code. This +leads to the behaviour seen in the timeline graphs for the ``html5lib`` and +``rietveld`` benchmarks, for example, and slows down the overall performance of +``2to3``. We have an active development branch to fix this problem +([#us-background-thread]_, [#us-background-thread-issue]_), but working within +the strictures of CPython's current threading system has complicated the process +and required far more care and time than originally anticipated. We view this +issue as critical to final merger into the ``py3k`` branch. + +We have obviously not met our initial goal of a 5x performance improvement. A +`performance retrospective`_ follows, which addresses why we failed to meet our +initial performance goal. We maintain a list of yet-to-be-implemented +performance work [#us-perf-punchlist]_. + + +Memory Usage +------------ + +The following table shows maximum memory usage (in kilobytes) for each of +Unladen Swallow's default benchmarks for both CPython 2.6.4 and Unladen Swallow +r988, as well as a timeline of memory usage across the lifetime of the +benchmark. We include tables for both 32- and 64-bit binaries. 
Memory usage was +measured on Linux 2.6 systems by summing the ``Private_`` sections from the +kernel's ``/proc/$pid/smaps`` pseudo-files [#smaps]_. + +Command: + +:: + + ./perf.py -r --track_memory -b default,apps ../a/python ../b/python + + +32-bit + ++--------------+---------------+----------------------+--------+----------------------------+ +| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Timeline | ++==============+===============+======================+========+============================+ +| 2to3 | 26396 kb | 46896 kb | 1.77x | http://tinyurl.com/yhr2h4z | ++--------------+---------------+----------------------+--------+----------------------------+ +| django | 10028 kb | 27740 kb | 2.76x | http://tinyurl.com/yhan8vs | ++--------------+---------------+----------------------+--------+----------------------------+ +| html5lib | 150028 kb | 173924 kb | 1.15x | http://tinyurl.com/ybt44en | ++--------------+---------------+----------------------+--------+----------------------------+ +| nbody | 3020 kb | 16036 kb | 5.31x | http://tinyurl.com/ya8hltw | ++--------------+---------------+----------------------+--------+----------------------------+ +| rietveld | 15008 kb | 46400 kb | 3.09x | http://tinyurl.com/yhd5dra | ++--------------+---------------+----------------------+--------+----------------------------+ +| slowpickle | 4608 kb | 16656 kb | 3.61x | http://tinyurl.com/ybukyvo | ++--------------+---------------+----------------------+--------+----------------------------+ +| slowspitfire | 85776 kb | 97620 kb | 1.13x | http://tinyurl.com/y9vj35z | ++--------------+---------------+----------------------+--------+----------------------------+ +| slowunpickle | 3448 kb | 13744 kb | 3.98x | http://tinyurl.com/yexh4d5 | ++--------------+---------------+----------------------+--------+----------------------------+ +| spambayes | 7352 kb | 46480 kb | 6.32x | http://tinyurl.com/yem62ub | 
++--------------+---------------+----------------------+--------+----------------------------+ + + +64-bit + ++--------------+---------------+----------------------+--------+----------------------------+ +| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Timeline | ++==============+===============+======================+========+============================+ +| 2to3 | 51596 kb | 82340 kb | 1.59x | http://tinyurl.com/yljg6rs | ++--------------+---------------+----------------------+--------+----------------------------+ +| django | 16020 kb | 38908 kb | 2.43x | http://tinyurl.com/ylqsebh | ++--------------+---------------+----------------------+--------+----------------------------+ +| html5lib | 259232 kb | 324968 kb | 1.25x | http://tinyurl.com/yha6oee | ++--------------+---------------+----------------------+--------+----------------------------+ +| nbody | 4296 kb | 23012 kb | 5.35x | http://tinyurl.com/yztozza | ++--------------+---------------+----------------------+--------+----------------------------+ +| rietveld | 24140 kb | 73960 kb | 3.06x | http://tinyurl.com/ybg2nq7 | ++--------------+---------------+----------------------+--------+----------------------------+ +| slowpickle | 4928 kb | 23300 kb | 4.73x | http://tinyurl.com/yk5tpbr | ++--------------+---------------+----------------------+--------+----------------------------+ +| slowspitfire | 133276 kb | 148676 kb | 1.11x | http://tinyurl.com/y8bz2xe | ++--------------+---------------+----------------------+--------+----------------------------+ +| slowunpickle | 4896 kb | 16948 kb | 3.46x | http://tinyurl.com/ygywwoc | ++--------------+---------------+----------------------+--------+----------------------------+ +| spambayes | 10728 kb | 84992 kb | 7.92x | http://tinyurl.com/yhjban5 | ++--------------+---------------+----------------------+--------+----------------------------+ + + +The increased memory usage comes from a) LLVM code generation, analysis and +optimization libraries; b) 
native code; c) memory usage issues or leaks in +LLVM; d) data structures needed to optimize and generate machine code; e) +as-yet uncategorized other sources. + +While we have made significant progress in reducing memory usage since the +initial naive JIT implementation [#us-memory-issue]_, there is obviously more +to do. We believe that there are still memory savings to be made without +sacrificing performance. We have tended to focus on raw performance, and we +have not yet made a concerted push to reduce memory usage. We view reducing +memory usage as a blocking issue for final merger into the ``py3k`` branch. We +seek guidance from the community on an acceptable level of increased memory +usage. + + +Start-up Time +------------- + +Statically linking LLVM's code generation, analysis and optimization libraries +increases the time needed to start the Python binary. C++ static initializers +used by LLVM also increase start-up time, as does importing the collection of +pre-compiled C runtime routines we want to inline to Python code. + +Results from Unladen Swallow's ``startup`` benchmarks: + +:: + + $ ./perf.py -r -b startup /tmp/cpy-26/bin/python /tmp/unladen/bin/python + + ### normal_startup ### + Min: 0.219186 -> 0.352075: 1.6063x slower + Avg: 0.227228 -> 0.364384: 1.6036x slower + Significant (t=-51.879098, a=0.95) + Stddev: 0.00762 -> 0.02532: 3.3227x larger + Timeline: http://tinyurl.com/yfe8z3r + + ### startup_nosite ### + Min: 0.105949 -> 0.264912: 2.5004x slower + Avg: 0.107574 -> 0.267505: 2.4867x slower + Significant (t=-703.557403, a=0.95) + Stddev: 0.00214 -> 0.00240: 1.1209x larger + Timeline: http://tinyurl.com/yajn8fa + + +Unladen Swallow has made headway toward optimizing startup time, but there is +still more work to do and further optimizations to implement. Improving start-up +time is a high-priority item [#us-issue-startup-time]_ in Unladen Swallow's +merger punchlist. 
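The start-up figures above can be approximated outside of ``perf.py`` with a small script. The following is a rough sketch, not the benchmark's actual implementation: the helper names ``time_startup`` and ``summarize`` are hypothetical, and it assumes that the ``startup_nosite`` case corresponds to launching the interpreter with the standard ``-S`` flag (which skips importing ``site``).

```python
import math
import subprocess
import sys
import time


def time_startup(python=sys.executable, extra_args=(), runs=10):
    """Return wall-clock seconds for `runs` launches of `python -c pass`."""
    cmd = [python] + list(extra_args) + ["-c", "pass"]
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.check_call(cmd)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (min, mean, sample standard deviation) for a list of timings."""
    n = len(timings)
    avg = sum(timings) / n
    var = sum((t - avg) ** 2 for t in timings) / (n - 1) if n > 1 else 0.0
    return min(timings), avg, math.sqrt(var)


if __name__ == "__main__":
    # Labels mirror the benchmark names above; "-S" for nosite is an assumption.
    for label, args in (("normal_startup", ()), ("startup_nosite", ("-S",))):
        lo, avg, sd = summarize(time_startup(extra_args=args))
        print("%s: min=%.6f avg=%.6f stddev=%.6f" % (label, lo, avg, sd))
```

Passing a different ``python`` path to ``time_startup`` for each of the two binaries gives two timing lists that can then be compared.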
+ + +Binary Size +----------- + +Statically linking LLVM's code generation, analysis and optimization libraries +significantly increases the size of the ``python`` binary. + + +32-bit; gcc 4.0.3 + ++-------------+---------------+---------------+----------------------+ +| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 | ++=============+===============+===============+======================+ +| Release | 3.8M | 4.0M | 74M | ++-------------+---------------+---------------+----------------------+ +| Debug | 3.3M | 3.6M | 118M | ++-------------+---------------+---------------+----------------------+ + +64-bit; gcc 4.2.4 + ++-------------+---------------+---------------+----------------------+ +| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 | ++=============+===============+===============+======================+ +| Release | 5.5M | 5.7M | 89M | ++-------------+---------------+---------------+----------------------+ +| Debug | 4.1M | 4.4M | 128M | ++-------------+---------------+---------------+----------------------+ + +The increased binary size is due to statically linking LLVM's code generation, +analysis and optimization libraries into the ``python`` binary. This can be +straightforwardly addressed by modifying LLVM to better support shared linking +and then using that, instead of the current static linking. For the moment, +though, static linking provides an accurate look at the cost of linking against +LLVM. + +Unladen Swallow recently experienced a regression in binary size, going from +19MB in Unladen's 2009Q3 release up to the current 74MB shown in the table +above. Resolution of this issue [#us-binary-size]_ will block final merger into +the ``py3k`` branch. + + +Performance Retrospective +------------------------- + +Our initial goal for Unladen Swallow was a 5x performance improvement over +CPython 2.6. We did not hit that, nor, to put it bluntly, did we even come close.
Why +did the project not hit that goal, and can an LLVM-based JIT ever hit that goal? + +Why did Unladen Swallow not achieve its 5x goal? The primary reason was +that LLVM required more work than we had initially anticipated. Based on the +fact that Apple was shipping products based on LLVM [#llvm-users]_, and +other high-level languages had successfully implemented LLVM-based JITs +([#rubinius]_, [#macruby]_, [#hlvm]_), we had assumed that LLVM's JIT was +relatively free of show-stopper bugs. + +That turned out to be incorrect. We had to turn our attention away from +performance to fix a number of critical bugs in LLVM's JIT infrastructure (for +example, [#llvm-far-call-issue]_, [#llvm-jmm-rev]_) as well as a number of +nice-to-have enhancements that would enable further optimizations along various +axes (for example, [#llvm-globaldce-rev]_, +[#llvm-memleak-rev]_, [#llvm-availext-issue]_). LLVM's static code generation +facilities, tools and optimization passes are stable and stress-tested, but the +just-in-time infrastructure was relatively untested and buggy. We have fixed +this. + +(Our hypothesis is that we hit these problems -- problems other projects had +avoided -- because of the complexity and thoroughness of CPython's standard +library test suite.) + +We also diverted engineering effort away from performance and into support tools +such as gdb and oProfile. gdb did not work well with JIT compilers at all, and +LLVM previously had no integration with oProfile. Having JIT-aware debuggers and +profilers has been very valuable to the project, and we do not regret +channeling our time in these directions. See the `Debugging`_ and `Profiling`_ +sections for more information. + +Can an LLVM-based CPython JIT ever hit the 5x performance target? The benchmark +results for JIT-based JavaScript implementations suggest that 5x is indeed +possible, as do the results PyPy's JIT has delivered for numeric workloads. 
The +experience of Self-92 [#urs-self]_ is also instructive. + +Can LLVM deliver this? We believe that we have only begun to scratch the surface +of what our LLVM-based JIT can deliver. The optimizations we have incorporated +into this system thus far have borne significant fruit (for example, +[#us-specialization-issue]_, [#us-direct-calling-issue]_, +[#us-fast-globals-issue]_). Our experience to date is that the limiting factor +on Unladen Swallow's performance is the engineering cycles needed to implement +the literature. We have found LLVM easy to work with and to modify, and its +built-in optimizations have greatly simplified the task of implementing +Python-level optimizations. + +An overview of further performance opportunities is discussed in the +`Future Work`_ section. + + + +Correctness and Compatibility +============================= + +Unladen Swallow's correctness test suite includes CPython's test suite (under +``Lib/test/``), as well as a number of important third-party applications and +libraries [#tested-apps]_. A full list of these applications and libraries is +reproduced below. Any dependencies needed by these packages, such as +``zope.interface`` [#zope-interface]_, are also tested indirectly as a part of +testing the primary package, thus widening the corpus of tested third-party +Python code. + +- 2to3 +- Cheetah +- cvs2svn +- Django +- Nose +- NumPy +- PyCrypto +- pyOpenSSL +- PyXML +- Setuptools +- SQLAlchemy +- SWIG +- SymPy +- Twisted +- ZODB + +These applications pass all relevant tests when run under Unladen Swallow. Note +that some tests that failed against our baseline of CPython 2.6.4 were disabled, +as were tests that made assumptions about CPython internals such as exact +bytecode numbers or bytecode format. Any package with disabled tests includes +a ``README.unladen`` file that details the changes (for example, +[#us-sqlalchemy-readme]_). 
+ +In addition, Unladen Swallow is tested automatically against an array of +internal Google Python libraries and applications. These include Google's +internal Python bindings for BigTable [#bigtable]_, the Mondrian code review +application [#mondrian]_, and Google's Python standard library, among others. +The changes needed to run these projects under Unladen Swallow have consistently +fallen into one of three categories: + +- Adding CPython 2.6 C API compatibility. Since Google still primarily uses + CPython 2.4 internally, we have needed to convert uses of ``int`` to + ``Py_ssize_t`` and similar API changes. +- Fixing or disabling explicit, incorrect tests of the CPython version number. +- Conditionally disabling code that worked around or depended on bugs in + CPython 2.4 that have since been fixed. + +Testing against this wide range of public and proprietary applications and +libraries has been instrumental in ensuring the correctness of Unladen Swallow. +Testing has exposed bugs that we have duly corrected. Our automated regression +testing regime has given us high confidence in our changes as we have moved +forward. + +In addition to third-party testing, we have added further tests to CPython's +test suite for corner cases of the language or implementation that we felt were +untested or underspecified (for example, [#us-import-tests]_, +[#us-tracing-tests]_). These have been especially important when implementing +optimizations, helping make sure we have not accidentally broken the darker +corners of Python. + +We have also constructed a test suite focused solely on the LLVM-based JIT +compiler and the optimizations implemented for it [#us-test_llvm]_. Because of +the complexity and subtlety inherent in writing an optimizing compiler, we have +attempted to exhaustively enumerate the constructs, scenarios and corner cases +we are compiling and optimizing.
The JIT tests also include tests for things +like the JIT hotness model, making the JIT easier for future CPython developers to +maintain and improve. + +We have recently begun using fuzz testing [#fuzz-testing]_ to stress-test the +compiler. We have used both pyfuzz [#pyfuzz]_ and Fusil [#fusil]_ in the past, +and we recommend they be introduced as an automated part of the CPython testing +process. + +Known Incompatibilities +----------------------- + +The only application or library we know of that does not work with Unladen +Swallow but does work with CPython 2.6.4 is Psyco [#psyco]_. We are aware of some libraries +such as PyGame [#pygame]_ that work well with CPython 2.6.4, but suffer some +degradation due to changes made in Unladen Swallow. We are tracking this issue +[#us-background-thread-issue]_ and are working to resolve these instances of +degradation. + +While Unladen Swallow is source-compatible with CPython 2.6.4, it is not +binary compatible. C extension modules compiled against one will need to be +recompiled to work with the other. + + +Platform Support +================ + +Unladen Swallow is inherently limited by the platform support provided by LLVM, +especially LLVM's JIT compilation system [#llvm-hardware]_. LLVM's JIT has the +best support on x86 and x86-64 systems, and these are the platforms where +Unladen Swallow has received the most testing. We are confident in LLVM/Unladen +Swallow's support for x86 and x86-64 hardware. PPC and ARM support exists, but +is not widely used and may be buggy. + +Unladen Swallow is known to work on the following operating systems: Linux, +Darwin, Windows. Unladen Swallow has received the most testing on Linux and +Darwin, though it still builds and passes its tests on Windows. + +In order to support hardware and software platforms where LLVM's JIT does not +work, Unladen Swallow provides a ``./configure --without-llvm`` option. 
This +flag carves out any part of Unladen Swallow that depends on LLVM, yielding a +Python binary that works and passes its tests, but has no performance +advantages. This configuration is recommended for hardware unsupported by LLVM, +or systems that care more about memory usage than performance. + + +Impact on CPython Development +============================= + +Experimenting with Changes to Python or CPython Bytecode +-------------------------------------------------------- + +Unladen Swallow's JIT compiler operates on CPython bytecode, and as such, it is +immune to Python language changes that only affect the parser. + +We recommend that changes to the CPython bytecode compiler or the semantics of +individual bytecodes be prototyped in the interpreter loop first, then be ported +to the JIT compiler once the semantics are clear. To make this easier, Unladen +Swallow includes a ``--without-llvm`` configure-time option that strips out the +JIT compiler and all associated infrastructure. This leaves the current burden +of experimentation unchanged so that developers can prototype in the current +low-barrier-to-entry interpreter loop. + +Unladen Swallow began implementing its JIT compiler by doing straightforward, +naive translations from bytecode implementations into LLVM API calls. We found +this process to be easily understood, and we recommend the same approach for +CPython. We include several sample changes from the Unladen Swallow repository +here as examples of this style of development: [#us-r359]_, [#us-r376]_, +[#us-r417]_, [#us-r517]_. + + +Debugging +--------- + +The Unladen Swallow team implemented changes to gdb to make it easier to use gdb +to debug JIT-compiled Python code. These changes were released in gdb 7.0 +[#gdb70]_. They make it possible for gdb to identify and unwind past +JIT-generated call stack frames. 
This allows gdb to continue to function as +before for CPython development if one is changing, for example, the ``list`` +type or builtin functions. + +Example backtrace after our changes, where ``baz``, ``bar`` and ``foo`` are +JIT-compiled: + +:: + + Program received signal SIGSEGV, Segmentation fault. + 0x00002aaaabe7d1a8 in baz () + (gdb) bt + #0 0x00002aaaabe7d1a8 in baz () + #1 0x00002aaaabe7d12c in bar () + #2 0x00002aaaabe7d0aa in foo () + #3 0x00002aaaabe7d02c in main () + #4 0x0000000000b870a2 in llvm::JIT::runFunction (this=0x1405b70, F=0x14024e0, ArgValues=...) + at /home/rnk/llvm-gdb/lib/ExecutionEngine/JIT/JIT.cpp:395 + #5 0x0000000000baa4c5 in llvm::ExecutionEngine::runFunctionAsMain + (this=0x1405b70, Fn=0x14024e0, argv=..., envp=0x7fffffffe3c0) + at /home/rnk/llvm-gdb/lib/ExecutionEngine/ExecutionEngine.cpp:377 + #6 0x00000000007ebd52 in main (argc=2, argv=0x7fffffffe3a8, + envp=0x7fffffffe3c0) at /home/rnk/llvm-gdb/tools/lli/lli.cpp:208 + +Previously, the JIT-compiled frames would have caused gdb to unwind incorrectly, +generating lots of obviously-incorrect ``#6 0x00002aaaabe7d0aa in ?? ()``-style +stack frames. + +Highlights: + +- gdb 7.0 is able to correctly parse JIT-compiled stack frames, allowing full + use of gdb on non-JIT-compiled functions, that is, the vast majority of the + CPython codebase. +- Disassembling inside a JIT-compiled stack frame automatically prints the full + list of instructions making up that function. This is an advance over the + state of gdb before our work: developers needed to guess the starting address + of the function and manually disassemble the assembly code. +- Flexible underlying mechanism allows CPython to add more and more information, + and eventually reach parity with C/C++ support in gdb for JIT-compiled machine + code. + +Lowlights: + +- gdb cannot print local variables or tell you what line you're currently + executing inside a JIT-compiled function. 
Nor can it step through + JIT-compiled code, except for one instruction at a time. +- Not yet integrated with Apple's gdb or Microsoft's Visual Studio debuggers. + +The Unladen Swallow team is working with Apple to get these changes +incorporated into their future gdb releases. + + +Profiling +--------- + +Unladen Swallow integrates with oProfile 0.9.4 and newer [#oprofile]_ to support +assembly-level profiling on Linux systems. This means that oProfile will +correctly symbolize JIT-compiled functions in its reports. + +Example report, where the ``#u#``-prefixed symbol names are JIT-compiled Python +functions: + +:: + + $ opreport -l ./python | less + CPU: Core 2, speed 1600 MHz (estimated) + Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 + samples % image name symbol name + 79589 4.2329 python PyString_FromFormatV + 62971 3.3491 python PyEval_EvalCodeEx + 62713 3.3354 python tupledealloc + 57071 3.0353 python _PyEval_CallFunction + 50009 2.6597 24532.jo #u#force_unicode + 47468 2.5246 python PyUnicodeUCS2_Decode + 45829 2.4374 python PyFrame_New + 45173 2.4025 python lookdict_string + 43082 2.2913 python PyType_IsSubtype + 39763 2.1148 24532.jo #u#render5 + 38145 2.0287 python _PyType_Lookup + 37643 2.0020 python PyObject_GC_UnTrack + 37105 1.9734 python frame_dealloc + 36849 1.9598 python PyEval_EvalFrame + 35630 1.8950 24532.jo #u#resolve + 33313 1.7717 python PyObject_IsInstance + 33208 1.7662 python PyDict_GetItem + 33168 1.7640 python PyTuple_New + 30458 1.6199 python PyCFunction_NewEx + +This support is functional, but as-yet unpolished. Unladen Swallow maintains a +punchlist of items we feel are important to improve in our oProfile integration +to make it more useful to core CPython developers [#us-oprofile-punchlist]_. + +Highlights: + +- Symbolization of JITted frames working in oProfile on Linux. 
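As a rough illustration of how the example report above can be post-processed, the following Python sketch picks out the ``#u#``-prefixed rows and totals their samples. The sample text is copied from a few rows of that report; the ``jit_symbols`` helper and its regular expression are hypothetical and not part of Unladen Swallow or oProfile, and the parser assumes the four-column ``opreport -l`` layout shown above with symbol names that contain no spaces.

```python
import re

# A few rows copied from the example report above
# (columns: samples, %, image name, symbol name).
REPORT = """\
samples  %        image name               symbol name
79589     4.2329  python                   PyString_FromFormatV
50009     2.6597  24532.jo                 #u#force_unicode
39763     2.1148  24532.jo                 #u#render5
35630     1.8950  24532.jo                 #u#resolve
33208     1.7662  python                   PyDict_GetItem
"""

# Hypothetical row pattern: integer samples, percentage, image, symbol.
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\S+)\s+(\S+)\s*$")


def jit_symbols(report_text, prefix="#u#"):
    """Return (name, samples, percent) for rows whose symbol has the prefix."""
    rows = []
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header lines do not start with a sample count
        samples, percent, _image, symbol = match.groups()
        if symbol.startswith(prefix):
            rows.append((symbol[len(prefix):], int(samples), float(percent)))
    return rows


jit_rows = jit_symbols(REPORT)
jit_samples = sum(samples for _name, samples, _pct in jit_rows)
print(jit_rows)
print("samples in JIT-compiled Python functions:", jit_samples)
```

A script along these lines only separates JIT-compiled Python functions from the C-level runtime in an existing report; it does not replace ``opreport`` itself.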
+ +Lowlights: + +- No work yet invested in improving symbolization of JIT-compiled frames for + Apple's Shark [#shark]_ or Microsoft's Visual Studio profiling tools. +- Some polishing still desired for oProfile output. + +We recommend using oProfile 0.9.5 (and newer) to work around a now-fixed bug on +x86-64 platforms in oProfile. oProfile 0.9.4 will work fine on 32-bit platforms, +however. + +Given the ease of integrating oProfile with LLVM [#llvm-oprofile-change]_ and +Unladen Swallow [#us-oprofile-change]_, other profiling tools should be easy to +integrate as well, provided they support a similar JIT interface [#oprofile-jit-interface]_. + + +Addition of C++ to CPython +-------------------------- + +In order to use LLVM, Unladen Swallow has introduced C++ into the core CPython +tree and build process. This is an unavoidable part of depending on LLVM; though +LLVM offers a C API [#llvm-c-api]_, it is limited and does not expose the +functionality needed by CPython. Because of this, we have implemented the +internal details of the Unladen Swallow JIT and its supporting infrastructure +in C++. We do not propose converting the entire CPython codebase to C++. + +Highlights: + +- Easy use of LLVM's full, powerful code generation and related APIs. +- Convenient, abstract data structures simplify code. +- C++ is limited to relatively small corners of the CPython codebase. + +Lowlights: + +- Developers must know two related languages, C and C++, to work on the full + range of CPython's internals. +- A C++ style guide will need to be developed and enforced. See `Open Issues`_. + + +Managing LLVM Releases, C++ API Changes +--------------------------------------- + +LLVM is released regularly every six months. This means that LLVM may be +released two or three times during the course of development of a CPython 3.x +release. Each LLVM release brings newer and more powerful optimizations, +improved platform support and more sophisticated code generation. 
+ +LLVM releases usually include incompatible changes to the LLVM C++ API; the +release notes for LLVM 2.6 [#llvm-26-whatsnew]_ include a list of +intentionally-introduced incompatibilities. Unladen Swallow has tracked LLVM +trunk closely over the course of development. Our experience has been +that LLVM API changes are obvious and easily or mechanically remedied. We +include two such changes from the Unladen Swallow tree as references here: +[#us-llvm-r820]_, [#us-llvm-r532]_. + +Due to API incompatibilities, we recommend that an LLVM-based CPython target +compatibility with a single version of LLVM at a time. This will lower the +overhead on the core development team. Pegging to an LLVM version should not be +a problem from a packaging perspective, because pre-built LLVM packages +generally become available via standard system package managers fairly quickly +following an LLVM release, and failing that, llvm.org itself includes binary +releases. + +Pre-built LLVM packages are available from MacPorts [#llvm-macports]_ for +Darwin, and from most major Linux distributions ([#llvm-ubuntu]_, +[#llvm-debian]_, [#llvm-fedora]_). LLVM itself provides additional binaries, +such as for MinGW [#llvm-mingw]_. + +LLVM is currently intended to be statically linked; this means that binary +releases of CPython will include the relevant parts (not all!) of LLVM. This +will increase the binary size, as noted above. + +Unladen Swallow has tasked a full-time engineer with fixing any remaining +critical issues in LLVM before LLVM's 2.7 release. We would like CPython 3.x to +be able to depend on a released version of LLVM, rather than closely tracking +LLVM trunk as Unladen Swallow has done. We believe we will finish this work +before the release of LLVM 2.7, expected in May 2010. + + +Building CPython +---------------- + +In addition to a runtime dependency on LLVM, Unladen Swallow includes a +build-time dependency on Clang [#clang]_, an LLVM-based C/C++ compiler. 
We use +this to compile parts of the C-language Python runtime to LLVM's intermediate +representation; this allows us to perform cross-language inlining, yielding +increased performance. Clang is not required to run Unladen Swallow. Clang +binary packages are available from most major Linux distributions (for example, +[#clang-debian]_). + +We examined the impact of Unladen Swallow on the time needed to build Python, +including configure, full builds and incremental builds after touching a single +C source file. + ++-------------+---------------+---------------+----------------------+ +| ./configure | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 | ++=============+===============+===============+======================+ +| Run 1 | 0m20.795s | 0m16.558s | 0m15.477s | ++-------------+---------------+---------------+----------------------+ +| Run 2 | 0m15.255s | 0m16.349s | 0m15.391s | ++-------------+---------------+---------------+----------------------+ +| Run 3 | 0m15.228s | 0m16.299s | 0m15.528s | ++-------------+---------------+---------------+----------------------+ + ++-------------+---------------+---------------+----------------------+ +| Full make | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 | ++=============+===============+===============+======================+ +| Run 1 | 1m30.776s | 1m22.367s | 1m54.053s | ++-------------+---------------+---------------+----------------------+ +| Run 2 | 1m21.374s | 1m22.064s | 1m49.448s | ++-------------+---------------+---------------+----------------------+ +| Run 3 | 1m22.047s | 1m23.645s | 1m49.305s | ++-------------+---------------+---------------+----------------------+ + +Full builds take a hit due to a) additional ``.cc`` files needed for LLVM +interaction, b) statically linking LLVM into ``libpython``, c) compiling parts +of the Python runtime to LLVM IR to enable cross-language inlining. + +Incremental builds, however, are significantly slower. 
The table below shows +incremental rebuild times after touching ``Objects/listobject.c``. + ++-------------+---------------+---------------+----------------------+ +| Incr make | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 | ++=============+===============+===============+======================+ +| Run 1 | 0m1.854s | 0m1.456s | 0m24.464s | ++-------------+---------------+---------------+----------------------+ +| Run 2 | 0m1.437s | 0m1.442s | 0m24.416s | ++-------------+---------------+---------------+----------------------+ +| Run 3 | 0m1.440s | 0m1.425s | 0m24.352s | ++-------------+---------------+---------------+----------------------+ + +As with full builds, this extra time comes from a) additional ``.cc`` files +needed for LLVM interaction, and b) statically linking LLVM into ``libpython``. + +If ``libpython`` were linked shared against LLVM, this overhead would go down. +Incremental builds of Unladen Swallow also currently (as of r988) suffer from a +known bug in the Unladen Swallow ``Makefile`` [#rebuild-too-much]_ where too +many ``.cc`` files are recompiled. We consider this a blocking issue for full +merger with the ``py3k`` branch. + + +Proposed Merge Plan +=================== + +We propose focusing our efforts on eventual merger with CPython's 3.x line of +development. The BDFL has indicated that 2.7 is to be the final release of +CPython's 2.x line of development [#bdfl-27-final]_, and since 2.7 alpha 1 has +already been released [#cpy-27a1]_, we have missed the window. Python 3 is the +future, and that is where we will target our performance efforts. + +We recommend the following plan for merger of Unladen Swallow into the CPython +source tree: + +- Creation of a branch in the CPython SVN repository to work in, call it + ``py3k-jit`` as a strawman. This will be a branch of the CPython ``py3k`` + branch. +- We will keep this branch closely integrated to ``py3k``. The further we + deviate, the harder our work will be. 
+- Any JIT-related patches will go into the ``py3k-jit`` branch. +- Non-JIT-related patches will go into the ``py3k`` branch (once reviewed and + approved) and be merged back into the ``py3k-jit`` branch. +- Potentially-contentious issues, such as the introduction of new command line + flags or environment variables, will be discussed on python-dev. + + +Because Google uses CPython 2.x internally, Unladen Swallow is based on CPython +2.6. We would need to port our compiler to Python 3; this would be done as +patches are applied to the ``py3k-jit`` branch, so that the branch remains a +consistent implementation of Python 3 at all times. + +We believe this approach will be minimally disruptive to the 3.2 or 3.3 release +process while we iron out any remaining issues blocking final merger into +``py3k``. Unladen Swallow maintains a punchlist of known issues that must be +resolved before final merger [#us-punchlist]_, which includes all problems mentioned in this +PEP; we trust the CPython community will have its own concerns. + +See the `Open Issues`_ section for questions about code review policy for the +``py3k-jit`` branch. + + +Future Work +=========== + +A JIT compiler is an extremely flexible tool, and we have by no means exhausted +its full potential. Unladen Swallow maintains a list of performance +optimizations [#us-perf-punchlist]_ that the team has not yet +had time to fully implement. Examples: + +- Python/Python inlining [#inlining]_. Our compiler currently performs no + inlining between pure-Python functions. Work on this is ongoing + [#us-inlining]_. +- Unboxing [#unboxing]_. Unboxing is critical for numerical performance. PyPy + in particular has demonstrated the value of unboxing to heavily-numeric + workloads. +- Recompilation, adaptation. Unladen Swallow currently only compiles a Python + function once, based on its usage pattern up to that point. 
If the usage + pattern changes, limitations in LLVM [#us-recompile-issue]_ prevent us from + recompiling the function to better serve the new usage pattern. +- JIT-compile regular expressions. Modern JavaScript engines reuse their JIT + compilation infrastructure to boost regex performance [#us-regex-perf]_. + Unladen Swallow has developed benchmarks for Python regular expression + performance ([#us-bm-re-compile]_, [#us-bm-re-v8]_, [#us-bm-re-effbot]_), but + work on regex performance is still at an early stage [#us-regex-issue]_. +- Trace compilation [#traces-waste-of-time]_, [#traces-explicit-pipeline]_. + Based on the results of PyPy and Tracemonkey [#tracemonkey]_, we believe that + a CPython JIT should incorporate trace compilation to some degree. We + initially avoided a purely-tracing JIT compiler in favor of a simpler, + function-at-a-time compiler. However, this function-at-a-time compiler has laid + the groundwork for a future tracing compiler implemented in the same terms. + +This list is by no means exhaustive. There is a vast literature on optimizations +for dynamic languages that could and should be implemented in terms of Unladen +Swallow's LLVM-based JIT compiler [#us-relevantpapers]_. + + +Open Issues +=========== + +- *Code review policy for the ``py3k-jit`` branch.* How does the CPython + community want us to proceed with respect to checkins on the ``py3k-jit`` + branch? Pre-commit reviews? Post-commit reviews? + + Unladen Swallow has enforced pre-commit reviews in our trunk, but we realize + this may lead to long review/checkin cycles in a purely-volunteer + organization. We would like a non-Google-affiliated member of the CPython + development team to review our work for correctness and compatibility, but we + realize this may not be possible for every commit. +- *How to link LLVM.* Should we change LLVM to better support shared linking, + and then use shared linking to link the parts of it we need into CPython? 
+- *Prioritization of remaining issues.* We would like input from the CPython + development team on how to prioritize the remaining issues in the Unladen + Swallow codebase. Some issues like memory usage are obviously critical before + merger with ``py3k``, but others may fall into a "nice to have" category that + could be kept for resolution in a future CPython 3.x release. + +- *Create a C++ style guide.* Should PEP 7 be extended to include C++, or + should a separate C++ style PEP be created? Unladen Swallow maintains its own + style guide [#us-styleguide]_, which may serve as a starting point; the + Unladen Swallow style guide is based on both LLVM's [#llvm-styleguide]_ and + Google's [#google-styleguide]_ C++ style guides. + + +Unladen Swallow Community +========================= + +We would like to thank the community of developers who have contributed to +Unladen Swallow, in particular: James Abbatiello, Joerg Blank, Eric Christopher, +Alex Gaynor, Chris Lattner, Nick Lewycky, Evan Phoenix and Thomas Wouters. + + +Licensing +========= + +All work on Unladen Swallow is licensed to the Python Software Foundation (PSF) +under the terms of the Python Software Foundation License v2 [#psf-lic]_ under +the umbrella of Google's blanket Contributor License Agreement with the PSF. + + +References +========== + +.. [#us] + http://code.google.com/p/unladen-swallow/ + +.. [#llvm] + http://llvm.org/ + +.. [#clang] + http://clang.llvm.org/ + +.. [#tested-apps] + http://code.google.com/p/unladen-swallow/wiki/Testing + +.. [#llvm-hardware] + http://llvm.org/docs/GettingStarted.html#hardware + +.. [#rebuild-too-much] + http://code.google.com/p/unladen-swallow/issues/detail?id=115 + +.. [#llvm-c-api] + http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/ + +.. [#llvm-26-whatsnew] + http://llvm.org/releases/2.6/docs/ReleaseNotes.html#whatsnew + +.. [#us-llvm-r820] + http://code.google.com/p/unladen-swallow/source/detail?r=820 + +.. 
[#us-llvm-r532] + http://code.google.com/p/unladen-swallow/source/detail?r=532 + +.. [#llvm-macports] + http://trac.macports.org/browser/trunk/dports/lang/llvm/Portfile + +.. [#llvm-ubuntu] + http://packages.ubuntu.com/karmic/llvm + +.. [#llvm-debian] + http://packages.debian.org/unstable/devel/llvm + +.. [#clang-debian] + http://packages.debian.org/sid/clang + +.. [#llvm-fedora] + http://koji.fedoraproject.org/koji/buildinfo?buildID=134384 + +.. [#gdb70] + http://www.gnu.org/software/gdb/download/ANNOUNCEMENT + +.. [#oprofile] + http://oprofile.sourceforge.net/news/ + +.. [#us-oprofile-punchlist] + http://code.google.com/p/unladen-swallow/issues/detail?id=63 + +.. [#shark] + http://developer.apple.com/tools/sharkoptimize.html + +.. [#llvm-oprofile-change] + http://llvm.org/viewvc/llvm-project?view=rev&revision=75279 + +.. [#us-oprofile-change] + http://code.google.com/p/unladen-swallow/source/detail?r=986 + +.. [#oprofile-jit-interface] + http://oprofile.sourceforge.net/doc/devel/jit-interface.html + +.. [#llvm-mingw] + http://llvm.org/releases/download.html + +.. [#us-r359] + http://code.google.com/p/unladen-swallow/source/detail?r=359 + +.. [#us-r376] + http://code.google.com/p/unladen-swallow/source/detail?r=376 + +.. [#us-r417] + http://code.google.com/p/unladen-swallow/source/detail?r=417 + +.. [#us-r517] + http://code.google.com/p/unladen-swallow/source/detail?r=517 + +.. [#bdfl-27-final] + http://mail.python.org/pipermail/python-dev/2010-January/095682.html + +.. [#cpy-27a1] + http://www.python.org/dev/peps/pep-0373/ + +.. [#cpy-32] + http://www.python.org/dev/peps/pep-0392/ + +.. [#us-punchlist] + http://code.google.com/p/unladen-swallow/issues/list?q=label:Merger + +.. [#us-binary-size] + http://code.google.com/p/unladen-swallow/issues/detail?id=118 + +.. [#us-issue-startup-time] + http://code.google.com/p/unladen-swallow/issues/detail?id=64 + +.. [#zope-interface] + http://www.zope.org/Products/ZopeInterface + +.. 
[#bigtable] + http://en.wikipedia.org/wiki/BigTable + +.. [#mondrian] + http://www.niallkennedy.com/blog/2006/11/google-mondrian.html + +.. [#us-sqlalchemy-readme] + http://code.google.com/p/unladen-swallow/source/browse/tests/lib/sqlalchemy/README.unladen + +.. [#us-test_llvm] + http://code.google.com/p/unladen-swallow/source/browse/trunk/Lib/test/test_llvm.py + +.. [#fuzz-testing] + http://en.wikipedia.org/wiki/Fuzz_testing + +.. [#pyfuzz] + http://bitbucket.org/ebo/pyfuzz/overview/ + +.. [#fusil] + http://lwn.net/Articles/322826/ + +.. [#us-memory-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=68 + +.. [#us-benchmarks] + http://code.google.com/p/unladen-swallow/wiki/Benchmarks + +.. [#students-t-test] + http://en.wikipedia.org/wiki/Student's_t-test + +.. [#smaps] + http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html + +.. [#us-background-thread] + http://code.google.com/p/unladen-swallow/source/browse/branches/background-thread + +.. [#us-background-thread-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=40 + +.. [#us-import-tests] + http://code.google.com/p/unladen-swallow/source/detail?r=888 + +.. [#us-tracing-tests] + http://code.google.com/p/unladen-swallow/source/diff?spec=svn576&r=576&format=side&path=/trunk/Lib/test/test_trace.py + +.. [#us-perf-punchlist] + http://code.google.com/p/unladen-swallow/issues/list?q=label:Performance + +.. [#jit] + http://en.wikipedia.org/wiki/Just-in-time_compilation + +.. [#urs-self] + http://research.sun.com/self/papers/urs-thesis.html + +.. [#us-projectplan] + http://code.google.com/p/unladen-swallow/wiki/ProjectPlan + +.. [#us-relevantpapers] + http://code.google.com/p/unladen-swallow/wiki/RelevantPapers + +.. [#us-llvm-notes] + http://code.google.com/p/unladen-swallow/source/browse/trunk/Python/llvm_notes.txt + +.. [#psf-lic] + http://www.python.org/psf/license/ + +.. [#v8] + http://code.google.com/p/v8/ + +.. 
[#squirrelfishextreme] + http://webkit.org/blog/214/introducing-squirrelfish-extreme/ + +.. [#rubinius] + http://rubini.us/ + +.. [#parrot-on-llvm] + http://lists.parrot.org/pipermail/parrot-dev/2009-September/002811.html + +.. [#macruby] + http://www.macruby.org/ + +.. [#hotspot] + http://en.wikipedia.org/wiki/HotSpot + +.. [#psyco] + http://psyco.sourceforge.net/ + +.. [#pypy] + http://codespeak.net/pypy/dist/pypy/doc/ + +.. [#inlining] + http://en.wikipedia.org/wiki/Inline_expansion + +.. [#unboxing] + http://en.wikipedia.org/wiki/Object_type_(object-oriented_programming) + +.. [#us-inlining] + http://code.google.com/p/unladen-swallow/issues/detail?id=86 + +.. [#us-styleguide] + http://code.google.com/p/unladen-swallow/wiki/StyleGuide + +.. [#llvm-styleguide] + http://llvm.org/docs/CodingStandards.html + +.. [#google-styleguide] + http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml + +.. [#us-recompile-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=41 + +.. [#us-regex-perf] + http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Regular_Expressions + +.. [#us-bm-re-compile] + http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_compile.py + +.. [#us-bm-re-v8] + http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_v8.py + +.. [#us-bm-re-effbot] + http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_effbot.py + +.. [#us-regex-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=13 + +.. [#pygame] + http://www.pygame.org/ + +.. [#numpy] + http://numpy.scipy.org/ + +.. [#pypy-bmarks] + http://codespeak.net:8099/plotsummary.html + +.. [#llvm-users] + http://llvm.org/Users.html + +.. [#hlvm] + http://www.ffconsultancy.com/ocaml/hlvm/ + +.. [#llvm-far-call-issue] + http://llvm.org/PR5201 + +.. [#llvm-jmm-rev] + http://llvm.org/viewvc/llvm-project?view=rev&revision=76828 + +.. 
[#llvm-memleak-rev] + http://llvm.org/viewvc/llvm-project?rev=91611&view=rev + +.. [#llvm-globaldce-rev] + http://llvm.org/viewvc/llvm-project?rev=85182&view=rev + +.. [#llvm-availext-issue] + http://llvm.org/PR5735 + +.. [#us-specialization-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=73 + +.. [#us-direct-calling-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=88 + +.. [#us-fast-globals-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=67 + +.. [#traces-waste-of-time] + http://www.ics.uci.edu/~franz/Site/pubs-pdf/C44Prepub.pdf + +.. [#traces-explicit-pipeline] + http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-07-12.pdf + +.. [#tracemonkey] + https://wiki.mozilla.org/JavaScript:TraceMonkey + +.. [#llvm-langref] + http://llvm.org/docs/LangRef.html + +.. [#us-wider-perf-issue] + http://code.google.com/p/unladen-swallow/issues/detail?id=120 + +.. [#us-nbody] + http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_nbody.py + + +Copyright +========= + +This document has been placed in the public domain. + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: + + +