PEP: 3146
Title: Merging Unladen Swallow into CPython
Version: $Revision$
Last-Modified: $Date$
Author: Collin Winter <collinwinter@google.com>,
        Jeffrey Yasskin <jyasskin@google.com>,
        Reid Kleckner <rnk@mit.edu>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 1-Jan-2010
Python-Version: 3.3
Post-History:


PEP Withdrawal
==============

With Unladen Swallow going the way of the Norwegian Blue [#us-post-mortem]_
[#dead-parrot]_, this PEP has been deemed to have been withdrawn.


Abstract
========

This PEP proposes the merger of the Unladen Swallow project [#us]_ into
CPython's source tree. Unladen Swallow is an open-source branch of CPython
focused on performance. Unladen Swallow is source-compatible with valid Python
2.6.4 applications and C extension modules.

Unladen Swallow adds a just-in-time (JIT) compiler to CPython, allowing for the
compilation of selected Python code to optimized machine code. Beyond classical
static compiler optimizations, Unladen Swallow's JIT compiler takes advantage of
data collected at runtime to make checked assumptions about code behaviour,
allowing the production of faster machine code.

This PEP proposes to integrate Unladen Swallow into CPython's development tree
in a separate ``py3k-jit`` branch, targeted for eventual merger with the main
``py3k`` branch. While Unladen Swallow is by no means finished or perfect, we
feel that Unladen Swallow has reached sufficient maturity to warrant
incorporation into CPython's roadmap. We have sought to create a stable platform
that the wider CPython development team can build upon, a platform that will
yield increasing performance for years to come.

This PEP will detail Unladen Swallow's implementation and how it differs from
CPython 2.6.4; the benchmarks used to measure performance; the tools used to
ensure correctness and compatibility; the impact on CPython's current platform
support; and the impact on the CPython core development process. The PEP
concludes with a proposed merger plan and brief notes on possible directions
for future work.

We seek the following from the BDFL:

- Approval for the overall concept of adding a just-in-time compiler to CPython,
  following the design laid out below.
- Permission to continue working on the just-in-time compiler in the CPython
  source tree.
- Permission to eventually merge the just-in-time compiler into the ``py3k``
  branch once all blocking issues [#us-punchlist]_ have been addressed.
- A pony.


Rationale, Implementation
=========================

Many companies and individuals would like Python to be faster, to enable its
use in more projects. Google is one such company.

Unladen Swallow is a Google-sponsored branch of CPython, initiated to improve
the performance of Google's numerous Python libraries, tools and applications.
To make the adoption of Unladen Swallow as easy as possible, the project
initially aimed at four goals:

- A performance improvement of 5x over the baseline of CPython 2.6.4 for
  single-threaded code.
- 100% source compatibility with valid CPython 2.6 applications.
- 100% source compatibility with valid CPython 2.6 C extension modules.
- Design for eventual merger back into CPython.

We chose 2.6.4 as our baseline because Google uses CPython 2.4 internally, and
jumping directly from CPython 2.4 to CPython 3.x was considered infeasible.

To achieve the desired performance, Unladen Swallow has implemented a
just-in-time (JIT) compiler [#jit]_ in the tradition of Urs Hoelzle's work on
Self [#urs-self]_, gathering feedback at runtime and using that to inform
compile-time optimizations. This is similar to the approach taken by the current
breed of JavaScript engines [#v8]_, [#squirrelfishextreme]_; most Java virtual
machines [#hotspot]_; Rubinius [#rubinius]_, MacRuby [#macruby]_, and other Ruby
implementations; Psyco [#psyco]_; and others.

We explicitly reject any suggestion that our ideas are original. We have sought
to reuse the published work of other researchers wherever possible. If we have
done any original work, it is by accident. We have tried, as much as possible,
to take good ideas from all corners of the academic and industrial community. A
partial list of the research papers that have informed Unladen Swallow is
available on the Unladen Swallow wiki [#us-relevantpapers]_.

The key observation about optimizing dynamic languages is that they are only
dynamic in theory; in practice, each individual function or snippet of code is
relatively static, using a stable set of types and child functions. The current
CPython bytecode interpreter assumes the worst about the code it is running,
that at any moment the user might override the ``len()`` function or pass a
never-before-seen type into a function. In practice this almost never happens,
but user code pays for that support. Unladen Swallow takes advantage of the
relatively static nature of user code to improve performance.

At a high level, the Unladen Swallow JIT compiler works by translating a
function's CPython bytecode to platform-specific machine code, using data
collected at runtime, as well as classical compiler optimizations, to improve
the quality of the generated machine code. Because we only want to spend
resources compiling Python code that will actually benefit the runtime of the
program, an online heuristic is used to assess how hot a given function is. Once
the hotness value for a function crosses a given threshold, it is selected for
compilation and optimization. Until a function is judged hot, however, it runs
in the standard CPython eval loop, which in Unladen Swallow has been
instrumented to record interesting data about each bytecode executed. This
runtime data is used to reduce the flexibility of the generated machine code,
allowing us to optimize for the common case. For example, we collect data on

- Whether a branch was taken/not taken. If a branch is never taken, we will not
  compile it to machine code.
- Types used by operators. If we find that ``a + b`` is only ever adding
  integers, the generated machine code for that snippet will not support adding
  floats.
- Functions called at each callsite. If we find that a particular ``foo()``
  callsite is always calling the same ``foo`` function, we can optimize the
  call or inline it away.

Refer to [#us-llvm-notes]_ for a complete list of data points gathered and how
they are used.

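The hotness heuristic and feedback collection described above can be pictured
with a small pure-Python sketch. Everything in it is hypothetical: the
``HOT_THRESHOLD`` value, the ``FunctionProfile`` class and its method names are
illustrative assumptions, not Unladen Swallow's actual data structures or
thresholds.

```python
from collections import Counter

# Hypothetical threshold; the real heuristic and its value are not given here.
HOT_THRESHOLD = 3


class FunctionProfile:
    """Illustrative per-function record of hotness and operand-type feedback."""

    def __init__(self, name):
        self.name = name
        self.hotness = 0
        # Maps a site label such as "a + b" to the operand type pairs seen there.
        self.operand_types = {}

    def record_call(self):
        """Bump the hotness counter and report whether the function is now hot."""
        self.hotness += 1
        return self.is_hot()

    def record_binary_op(self, site, left, right):
        """Remember which operand types reached a given operator site."""
        seen = self.operand_types.setdefault(site, Counter())
        seen[(type(left).__name__, type(right).__name__)] += 1

    def is_hot(self):
        return self.hotness >= HOT_THRESHOLD

    def is_monomorphic(self, site):
        """True if exactly one pair of operand types was observed at `site`."""
        return len(self.operand_types.get(site, ())) == 1


# Simulate three interpreted calls that only ever add integers.
profile = FunctionProfile("add_all")
for a, b in [(1, 2), (3, 4), (5, 6)]:
    profile.record_call()
    profile.record_binary_op("a + b", a, b)
```

In this toy model a function would become a compilation candidate once
``profile.is_hot()`` is true, and a site for which ``is_monomorphic`` holds is
one where a compiler could plausibly emit integer-only code.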
However, if by chance the historically-untaken branch is now taken, or some
integer-optimized ``a + b`` snippet receives two strings, we must support this.
We cannot change Python semantics. Each of these sections of optimized machine
code is preceded by a `guard`, which checks whether the simplifying assumptions
we made when optimizing still hold. If the assumptions are still valid, we run
the optimized machine code; if they are not, we revert to the interpreter
and pick up where we left off.

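The guard mechanism can be sketched in plain Python. This is only an analogy
under stated assumptions: the function names are invented for illustration, and
both paths here are ordinary Python rather than machine code and an interpreter
bail-out.

```python
def specialized_add(a, b):
    """Stand-in for machine code compiled under the assumption 'both are ints'."""
    return a + b  # imagine a single native integer add here


def generic_add(a, b):
    """Stand-in for bailing out to the interpreter's fully general path."""
    return a + b  # full Python semantics: __add__, __radd__, and so on


def guarded_add(a, b, stats):
    # Guard: check the assumption made at compile time before using the fast path.
    if type(a) is int and type(b) is int:
        stats["fast"] += 1
        return specialized_add(a, b)
    # Guard failed: fall back so that Python semantics are preserved.
    stats["bailouts"] += 1
    return generic_add(a, b)


stats = {"fast": 0, "bailouts": 0}
results = [guarded_add(1, 2, stats), guarded_add("ab", "cd", stats)]
```

The point of the sketch is that the second call still produces the correct
string concatenation; the failed guard costs performance, never correctness.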
We have chosen to reuse a set of existing compiler libraries called LLVM
[#llvm]_ for code generation and code optimization. This has saved our small
team from needing to understand and debug code generation on multiple machine
instruction sets and from needing to implement a large set of classical compiler
optimizations. The project would not have been possible without such code reuse.
We have found LLVM easy to modify and its community receptive to our suggestions
and modifications.

In somewhat more depth, Unladen Swallow's JIT works by compiling CPython
bytecode to LLVM's own intermediate representation (IR) [#llvm-langref]_, taking
into account any runtime data from the CPython eval loop. We then run a set of
LLVM's built-in optimization passes, producing a smaller, optimized version of
the original LLVM IR. LLVM then lowers the IR to platform-specific machine code,
performing register allocation, instruction scheduling, and any necessary
relocations. This arrangement of the compilation pipeline allows the LLVM-based
JIT to be easily omitted from a compiled ``python`` binary by passing
``--without-llvm`` to ``./configure``; various use cases for this flag are
discussed later.

For a complete detailing of how Unladen Swallow works, consult the Unladen
Swallow documentation [#us-projectplan]_, [#us-llvm-notes]_.

Unladen Swallow has focused on improving the performance of single-threaded,
pure-Python code. We have not made an effort to remove CPython's global
interpreter lock (GIL); we feel this is separate from our work, and due to its
sensitivity, is best done in a mainline development branch. We considered
making GIL-removal a part of Unladen Swallow, but were concerned by the
possibility of introducing subtle bugs when porting our work from CPython 2.6
to 3.x.

A JIT compiler is an extremely versatile tool, and we have by no means
exhausted its full potential. We have tried to create a sufficiently flexible
framework that the wider CPython development community can build upon for
years to come, extracting increased performance in each subsequent release.

Alternatives
------------

There are a number of alternative strategies for improving Python performance
which we considered, but found unsatisfactory.

- *Cython, Shedskin*: Cython [#cython]_ and Shedskin [#shedskin]_ are both
  static compilers for Python. We view these as useful-but-limited workarounds
  for CPython's historically-poor performance. Shedskin does not support the
  full Python standard library [#shedskin-library-limits]_, while Cython
  requires manual Cython-specific annotations for optimum performance.

  Static compilers like these are useful for writing extension modules without
  worrying about reference counting, but because they are static, ahead-of-time
  compilers, they cannot optimize the full range of code under consideration by
  a just-in-time compiler informed by runtime data.
- *IronPython*: IronPython [#ironpython]_ is Python on Microsoft's .NET
  platform. It is not actively tested on Mono [#mono]_, meaning that it is
  essentially Windows-only, making it unsuitable as a general CPython
  replacement.
- *Jython*: Jython [#jython]_ is a complete implementation of Python 2.5, but
  is significantly slower than Unladen Swallow (3-5x on measured benchmarks) and
  has no support for CPython extension modules [#jython-c-ext]_, which would
  make migration of large applications prohibitively expensive.
- *Psyco*: Psyco [#psyco]_ is a specializing JIT compiler for CPython,
  implemented as an extension module. It primarily improves performance for
  numerical code. Pros: exists; makes some code faster. Cons: 32-bit only, with
  no plans for 64-bit support; supports x86 only; very difficult to maintain;
  incompatible with SSE2 optimized code due to alignment issues.
- *PyPy*: PyPy [#pypy]_ has good performance on numerical code, but is slower
  than Unladen Swallow on some workloads. Migration of large applications from
  CPython to PyPy would be prohibitively expensive: PyPy's JIT compiler supports
  only 32-bit x86 code generation; important modules, such as MySQLdb and
  pycrypto, do not build against PyPy; PyPy does not offer an embedding API,
  much less the same API as CPython.
- *PyV8*: PyV8 [#pyv8]_ is an alpha-stage experimental Python-to-JavaScript
  compiler that runs on top of V8. PyV8 does not implement the whole Python
  language, and has no support for CPython extension modules.
- *WPython*: WPython [#wpython]_ is a wordcode-based reimplementation of
  CPython's interpreter loop. While it provides a modest improvement to
  interpreter performance [#wpython-performance]_, it is not an either-or
  substitute for a just-in-time compiler. An interpreter will never be as fast
  as optimized machine code. We view WPython and similar interpreter
  enhancements as complementary to our work, rather than as competitors.


Performance
===========

Benchmarks
----------

Unladen Swallow has developed a fairly large suite of benchmarks, ranging from
synthetic microbenchmarks designed to test a single feature up through
whole-application macrobenchmarks. The inspiration for these benchmarks has come
variously from third-party contributors (in the case of the ``html5lib``
benchmark), Google's own internal workloads (``slowspitfire``, ``pickle``,
``unpickle``), as well as tools and libraries in heavy use throughout the wider
Python community (``django``, ``2to3``, ``spambayes``). These benchmarks are run
through a single interface called ``perf.py`` that takes care of collecting
memory usage information, graphing performance, and running statistics on the
benchmark results to ensure significance.

The full list of benchmarks is available on the Unladen Swallow wiki
[#us-benchmarks]_, including instructions on downloading and running the
benchmarks for yourself. All our benchmarks are open-source; none are
Google-proprietary. We believe this collection of benchmarks serves as a useful
tool to benchmark any complete Python implementation, and indeed, PyPy is
already using these benchmarks for their own performance testing
[#pypy-bmarks]_, [#us-wider-perf-issue]_. We welcome this, and we seek
additional workloads for the benchmark suite from the Python community.

We have focused our efforts on collecting macrobenchmarks and benchmarks that
simulate real applications as well as possible, when running a whole application
is not feasible. Along a different axis, our benchmark collection originally
focused on the kinds of workloads seen by Google's Python code (webapps, text
processing), though we have since expanded the collection to include workloads
Google cares nothing about. We have so far shied away from heavily-numerical
workloads, since NumPy [#numpy]_ already does an excellent job on such code and
so improving numerical performance was not an initial high priority for the
team; we have begun to incorporate such benchmarks into the collection
[#us-nbody]_ and have started work on optimizing numerical Python code.

Beyond these benchmarks, there are also a variety of workloads we are explicitly
not interested in benchmarking. Unladen Swallow is focused on improving the
performance of pure-Python code, so the performance of extension modules like
NumPy is uninteresting since NumPy's core routines are implemented in
C. Similarly, workloads that involve a lot of I/O like GUIs, databases or
socket-heavy applications would, we feel, fail to accurately measure interpreter
or code generation optimizations. That said, there's certainly room to improve
the performance of C-language extension modules in the standard library, and
as such, we have added benchmarks for the ``cPickle`` and ``re`` modules.


Performance vs CPython 2.6.4
----------------------------

The charts below compare the arithmetic mean of multiple benchmark iterations
for CPython 2.6.4 and Unladen Swallow. ``perf.py`` gathers more data than this,
and indeed, arithmetic mean is not the whole story; we reproduce only the mean
for the sake of conciseness. We include the ``t`` score from the Student's
two-tailed T-test [#students-t-test]_ at the 95% confidence level to indicate
the significance of the result. Most benchmarks are run for 100 iterations,
though some longer-running whole-application benchmarks are run for fewer
iterations.

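As a rough illustration of the two numbers reported per benchmark, the sketch
below computes a speed-up ratio from mean run times and a pooled-variance
two-sample Student's t statistic. It is not ``perf.py``'s actual code: the
function names and the sample timings are made up for illustration, and the
pooled-variance form is an assumption about which variant of the test applies.

```python
import math
import statistics


def t_score(baseline, changed):
    """Two-sample Student's t statistic using a pooled variance estimate."""
    n1, n2 = len(baseline), len(changed)
    mean1, mean2 = statistics.mean(baseline), statistics.mean(changed)
    var1, var2 = statistics.variance(baseline), statistics.variance(changed)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # Positive when `changed` has the smaller (faster) mean run time.
    return (mean1 - mean2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def speedup(baseline, changed):
    """Ratio of mean run times; values above 1 mean `changed` is faster."""
    return statistics.mean(baseline) / statistics.mean(changed)


# Made-up timings in seconds, for illustration only.
baseline_times = [1.09, 1.08, 1.07, 1.08]
changed_times = [0.80, 0.81, 0.79, 0.80]
t = t_score(baseline_times, changed_times)
ratio = speedup(baseline_times, changed_times)
```

With this sign convention a faster changed binary yields a positive ``t`` and a
slower one a negative ``t``; whether a given value counts as significant depends
on the degrees of freedom and the chosen confidence level.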
A description of each of these benchmarks is available on the Unladen Swallow
wiki [#us-benchmarks]_.

Command:
::

    ./perf.py -r -b default,apps ../a/python ../b/python


32-bit; gcc 4.0.3; Ubuntu Dapper; Intel Core2 Duo 6600 @ 2.4GHz; 2 cores; 4MB L2 cache; 4GB RAM

+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| Benchmark    | CPython 2.6.4 | Unladen Swallow r988 | Change       | Significance  | Timeline                   |
+==============+===============+======================+==============+===============+============================+
| 2to3         | 25.13 s       | 24.87 s              | 1.01x faster | t=8.94        | http://tinyurl.com/yamhrpg |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| django       | 1.08 s        | 0.80 s               | 1.35x faster | t=315.59      | http://tinyurl.com/y9mrn8s |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| html5lib     | 14.29 s       | 13.20 s              | 1.08x faster | t=2.17        | http://tinyurl.com/y8tyslu |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| nbody        | 0.51 s        | 0.28 s               | 1.84x faster | t=78.007      | http://tinyurl.com/y989qhg |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| rietveld     | 0.75 s        | 0.55 s               | 1.37x faster | Insignificant | http://tinyurl.com/ye7mqd3 |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| slowpickle   | 0.75 s        | 0.55 s               | 1.37x faster | t=20.78       | http://tinyurl.com/ybrsfnd |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| slowspitfire | 0.83 s        | 0.61 s               | 1.36x faster | t=2124.66     | http://tinyurl.com/yfknhaw |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| slowunpickle | 0.33 s        | 0.26 s               | 1.26x faster | t=15.12       | http://tinyurl.com/yzlakoo |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| spambayes    | 0.31 s        | 0.34 s               | 1.10x slower | Insignificant | http://tinyurl.com/yem62ub |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+


64-bit; gcc 4.2.4; Ubuntu Hardy; AMD Opteron 8214 HE @ 2.2 GHz; 4 cores; 1MB L2 cache; 8GB RAM

+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| Benchmark    | CPython 2.6.4 | Unladen Swallow r988 | Change       | Significance  | Timeline                   |
+==============+===============+======================+==============+===============+============================+
| 2to3         | 31.98 s       | 30.41 s              | 1.05x faster | t=8.35        | http://tinyurl.com/ybcrl3b |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| django       | 1.22 s        | 0.94 s               | 1.30x faster | t=106.68      | http://tinyurl.com/ybwqll6 |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| html5lib     | 18.97 s       | 17.79 s              | 1.06x faster | t=2.78        | http://tinyurl.com/yzlyqvk |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| nbody        | 0.77 s        | 0.27 s               | 2.86x faster | t=133.49      | http://tinyurl.com/yeyqhbg |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| rietveld     | 0.74 s        | 0.80 s               | 1.08x slower | t=-2.45       | http://tinyurl.com/yzjc6ff |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| slowpickle   | 0.91 s        | 0.62 s               | 1.48x faster | t=28.04       | http://tinyurl.com/yf7en6k |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| slowspitfire | 1.01 s        | 0.72 s               | 1.40x faster | t=98.70       | http://tinyurl.com/yc8pe2o |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| slowunpickle | 0.51 s        | 0.34 s               | 1.51x faster | t=32.65       | http://tinyurl.com/yjufu4j |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+
| spambayes    | 0.43 s        | 0.45 s               | 1.06x slower | Insignificant | http://tinyurl.com/yztbjfp |
+--------------+---------------+----------------------+--------------+---------------+----------------------------+


Many of these benchmarks take a hit under Unladen Swallow because the current
version blocks execution to compile Python functions down to machine code. This
leads to the behaviour seen in the timeline graphs for the ``html5lib`` and
``rietveld`` benchmarks, for example, and slows down the overall performance of
``2to3``. We have an active development branch to fix this problem
([#us-background-thread]_, [#us-background-thread-issue]_), but working within
the strictures of CPython's current threading system has complicated the process
and required far more care and time than originally anticipated. We view this
issue as critical to final merger into the ``py3k`` branch.

We have obviously not met our initial goal of a 5x performance improvement. A
`performance retrospective`_ follows, which addresses why we failed to meet our
initial performance goal. We maintain a list of yet-to-be-implemented
performance work [#us-perf-punchlist]_.


Memory Usage
------------

The following tables show maximum memory usage (in kilobytes) for each of
Unladen Swallow's default benchmarks for both CPython 2.6.4 and Unladen Swallow
r988, as well as a timeline of memory usage across the lifetime of the
benchmark. We include tables for both 32- and 64-bit binaries. Memory usage was
measured on Linux 2.6 systems by summing the ``Private_`` sections from the
kernel's ``/proc/$pid/smaps`` pseudo-files [#smaps]_.

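The measurement described above can be approximated with a few lines of
Python. This is a minimal sketch, not the code ``perf.py`` uses: the function
names are hypothetical, and the sample text is an abbreviated, made-up excerpt
in the ``smaps`` layout.

```python
def private_kb_from_smaps(smaps_text):
    """Sum every Private_* field (reported in kB) in smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Private_"):
            # Lines look like "Private_Dirty:        12 kB".
            total += int(line.split()[1])
    return total


def private_kb_for_pid(pid):
    """Read /proc/<pid>/smaps (Linux only) and return private memory in kB."""
    with open("/proc/%d/smaps" % pid) as smaps:
        return private_kb_from_smaps(smaps.read())


# Abbreviated, made-up smaps excerpt for illustration.
SAMPLE = """\
00400000-00401000 r-xp 00000000 08:01 12345 /usr/bin/python
Size:                  4 kB
Rss:                   4 kB
Shared_Clean:          0 kB
Private_Clean:         4 kB
Private_Dirty:         0 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                  20 kB
Private_Clean:         0 kB
Private_Dirty:        20 kB
"""
sample_total = private_kb_from_smaps(SAMPLE)
```

Sampling ``private_kb_for_pid`` periodically while a benchmark runs and keeping
the largest value would give a maximum comparable in spirit to the figures
reported in this section.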
Command:

::

    ./perf.py -r --track_memory -b default,apps ../a/python ../b/python


32-bit

+--------------+---------------+----------------------+--------+----------------------------+
| Benchmark    | CPython 2.6.4 | Unladen Swallow r988 | Change | Timeline                   |
+==============+===============+======================+========+============================+
| 2to3         | 26396 kB      | 46896 kB             | 1.77x  | http://tinyurl.com/yhr2h4z |
+--------------+---------------+----------------------+--------+----------------------------+
| django       | 10028 kB      | 27740 kB             | 2.76x  | http://tinyurl.com/yhan8vs |
+--------------+---------------+----------------------+--------+----------------------------+
| html5lib     | 150028 kB     | 173924 kB            | 1.15x  | http://tinyurl.com/ybt44en |
+--------------+---------------+----------------------+--------+----------------------------+
| nbody        | 3020 kB       | 16036 kB             | 5.31x  | http://tinyurl.com/ya8hltw |
+--------------+---------------+----------------------+--------+----------------------------+
| rietveld     | 15008 kB      | 46400 kB             | 3.09x  | http://tinyurl.com/yhd5dra |
+--------------+---------------+----------------------+--------+----------------------------+
| slowpickle   | 4608 kB       | 16656 kB             | 3.61x  | http://tinyurl.com/ybukyvo |
+--------------+---------------+----------------------+--------+----------------------------+
| slowspitfire | 85776 kB      | 97620 kB             | 1.13x  | http://tinyurl.com/y9vj35z |
+--------------+---------------+----------------------+--------+----------------------------+
| slowunpickle | 3448 kB       | 13744 kB             | 3.98x  | http://tinyurl.com/yexh4d5 |
+--------------+---------------+----------------------+--------+----------------------------+
| spambayes    | 7352 kB       | 46480 kB             | 6.32x  | http://tinyurl.com/yem62ub |
+--------------+---------------+----------------------+--------+----------------------------+


64-bit

+--------------+---------------+----------------------+--------+----------------------------+
| Benchmark    | CPython 2.6.4 | Unladen Swallow r988 | Change | Timeline                   |
+==============+===============+======================+========+============================+
| 2to3         | 51596 kB      | 82340 kB             | 1.59x  | http://tinyurl.com/yljg6rs |
+--------------+---------------+----------------------+--------+----------------------------+
| django       | 16020 kB      | 38908 kB             | 2.43x  | http://tinyurl.com/ylqsebh |
+--------------+---------------+----------------------+--------+----------------------------+
| html5lib     | 259232 kB     | 324968 kB            | 1.25x  | http://tinyurl.com/yha6oee |
+--------------+---------------+----------------------+--------+----------------------------+
| nbody        | 4296 kB       | 23012 kB             | 5.35x  | http://tinyurl.com/yztozza |
+--------------+---------------+----------------------+--------+----------------------------+
| rietveld     | 24140 kB      | 73960 kB             | 3.06x  | http://tinyurl.com/ybg2nq7 |
+--------------+---------------+----------------------+--------+----------------------------+
| slowpickle   | 4928 kB       | 23300 kB             | 4.73x  | http://tinyurl.com/yk5tpbr |
+--------------+---------------+----------------------+--------+----------------------------+
| slowspitfire | 133276 kB     | 148676 kB            | 1.11x  | http://tinyurl.com/y8bz2xe |
+--------------+---------------+----------------------+--------+----------------------------+
| slowunpickle | 4896 kB       | 16948 kB             | 3.46x  | http://tinyurl.com/ygywwoc |
+--------------+---------------+----------------------+--------+----------------------------+
| spambayes    | 10728 kB      | 84992 kB             | 7.92x  | http://tinyurl.com/yhjban5 |
+--------------+---------------+----------------------+--------+----------------------------+


The increased memory usage comes from a) LLVM code generation, analysis and
optimization libraries; b) native code; c) memory usage issues or leaks in
LLVM; d) data structures needed to optimize and generate machine code; e)
as-yet uncategorized other sources.

While we have made significant progress in reducing memory usage since the
initial naive JIT implementation [#us-memory-issue]_, there is obviously more
to do. We believe that there are still memory savings to be made without
sacrificing performance. We have tended to focus on raw performance, and we
have not yet made a concerted push to reduce memory usage. We view reducing
memory usage as a blocking issue for final merger into the ``py3k`` branch. We
seek guidance from the community on an acceptable level of increased memory
usage.


Start-up Time
-------------

Statically linking LLVM's code generation, analysis and optimization libraries
increases the time needed to start the Python binary. C++ static initializers
used by LLVM also increase start-up time, as does importing the collection of
pre-compiled C runtime routines we want to inline to Python code.

Results from Unladen Swallow's ``startup`` benchmarks:

::

    $ ./perf.py -r -b startup /tmp/cpy-26/bin/python /tmp/unladen/bin/python

    ### normal_startup ###
    Min: 0.219186 -> 0.352075: 1.6063x slower
    Avg: 0.227228 -> 0.364384: 1.6036x slower
    Significant (t=-51.879098, a=0.95)
    Stddev: 0.00762 -> 0.02532: 3.3227x larger
    Timeline: http://tinyurl.com/yfe8z3r

    ### startup_nosite ###
    Min: 0.105949 -> 0.264912: 2.5004x slower
    Avg: 0.107574 -> 0.267505: 2.4867x slower
    Significant (t=-703.557403, a=0.95)
    Stddev: 0.00214 -> 0.00240: 1.1209x larger
    Timeline: http://tinyurl.com/yajn8fa

    ### bzr_startup ###
    Min: 0.067990 -> 0.097985: 1.4412x slower
    Avg: 0.084322 -> 0.111348: 1.3205x slower
    Significant (t=-37.432534, a=0.95)
    Stddev: 0.00793 -> 0.00643: 1.2330x smaller
    Timeline: http://tinyurl.com/ybdm537

    ### hg_startup ###
    Min: 0.016997 -> 0.024997: 1.4707x slower
    Avg: 0.026990 -> 0.036772: 1.3625x slower
    Significant (t=-53.104502, a=0.95)
    Stddev: 0.00406 -> 0.00417: 1.0273x larger
    Timeline: http://tinyurl.com/ycout8m


``bzr_startup`` and ``hg_startup`` measure how long it takes Bazaar and
Mercurial, respectively, to display their help screens. ``startup_nosite``
runs ``python -S`` many times; usage of the ``-S`` option is rare, but we feel
this gives a good indication of where increased startup time is coming from.

Unladen Swallow has made headway toward optimizing startup time, but there is
still more work to do and further optimizations to implement. Improving start-up
time is a high-priority item [#us-issue-startup-time]_ in Unladen Swallow's
merger punchlist.


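A measurement in the spirit of ``startup_nosite`` can be sketched with the
standard library alone. This is an illustrative assumption about how such a
benchmark might be written, not the code behind the results in this section;
``time_startup`` and ``nosite_command`` are hypothetical helper names.

```python
import subprocess
import sys
import time


def time_startup(command, runs=5):
    """Return (minimum, average) wall-clock seconds needed to run `command`."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.check_call(command)
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)


def nosite_command(python=sys.executable):
    """Command line that starts an interpreter with -S and exits immediately."""
    return [python, "-S", "-c", "pass"]
```

Calling ``time_startup(nosite_command("/path/to/python"))`` once per binary
gives a pair of minimum and average figures to compare; a real benchmark would
use far more runs and a significance test before drawing conclusions.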
Binary Size
|
|
|
|
-----------
|
|
|
|
|
|
|
|
Statically linking LLVM's code generation, analysis and optimization libraries
|
2010-02-08 21:51:26 -05:00
|
|
|
significantly increases the size of the ``python`` binary. The tables below
|
|
|
|
report stripped on-disk binary sizes; the binaries are stripped to better
|
|
|
|
correspond with the configurations used by system package managers. We feel this
|
|
|
|
is the most realistic measure of any change in binary size.
|
2010-01-20 17:08:04 -05:00
|
|
|
|
|
|
|
|
2010-02-08 21:51:26 -05:00
|
|
|
+-------------+---------------+---------------+-----------------------+
|
|
|
|
| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1041 |
|
|
|
|
+=============+===============+===============+=======================+
|
|
|
|
| 32-bit | 1.3M | 1.4M | 12M |
|
|
|
|
+-------------+---------------+---------------+-----------------------+
|
|
|
|
| 64-bit | 1.6M | 1.6M | 12M |
|
|
|
|
+-------------+---------------+---------------+-----------------------+
|
2010-01-20 17:08:04 -05:00
|
|
|
|
|
|
|
|
2010-02-08 21:51:26 -05:00
|
|
|
The increased binary size is caused by statically linking LLVM's code
|
|
|
|
generation, analysis and optimization libraries into the ``python`` binary.
|
|
|
|
This can be straightforwardly addressed by modifying LLVM to better support
|
|
|
|
shared linking and then using that, instead of the current static linking. For
|
|
|
|
the moment, though, static linking provides an accurate look at the cost of
|
|
|
|
linking against LLVM.
|
2010-01-20 17:08:04 -05:00
|
|
|
|
2010-02-08 21:51:26 -05:00
|
|
|
Even when statically linking, we believe there is still headroom to improve
|
|
|
|
on-disk binary size by narrowing Unladen Swallow's dependencies on LLVM. This
|
|
|
|
issue is actively being addressed [#us-binary-size]_.
|
Performance Retrospective
-------------------------

Our initial goal for Unladen Swallow was a 5x performance improvement over
CPython 2.6. We did not hit that goal, nor, to put it bluntly, did we even come
close. Why did the project not hit that goal, and can an LLVM-based JIT ever hit
that goal?

Why did Unladen Swallow not achieve its 5x goal? The primary reason was
that LLVM required more work than we had initially anticipated. Based on the
fact that Apple was shipping products based on LLVM [#llvm-users]_, and
other high-level languages had successfully implemented LLVM-based JITs
([#rubinius]_, [#macruby]_, [#hlvm]_), we had assumed that LLVM's JIT was
relatively free of show-stopper bugs.

That turned out to be incorrect. We had to turn our attention away from
performance to fix a number of critical bugs in LLVM's JIT infrastructure (for
example, [#llvm-far-call-issue]_, [#llvm-jmm-rev]_) as well as a number of
nice-to-have enhancements that would enable further optimizations along various
axes (for example, [#llvm-globaldce-rev]_,
[#llvm-memleak-rev]_, [#llvm-availext-issue]_). LLVM's static code generation
facilities, tools and optimization passes are stable and stress-tested, but the
just-in-time infrastructure was relatively untested and buggy. We have fixed
this.

(Our hypothesis is that we hit these problems -- problems other projects had
avoided -- because of the complexity and thoroughness of CPython's standard
library test suite.)

We also diverted engineering effort away from performance and into support tools
such as gdb and oProfile. gdb did not work well with JIT compilers at all, and
LLVM previously had no integration with oProfile. Having JIT-aware debuggers and
profilers has been very valuable to the project, and we do not regret
channeling our time in these directions. See the `Debugging`_ and `Profiling`_
sections for more information.

Can an LLVM-based CPython JIT ever hit the 5x performance target? The benchmark
results for JIT-based JavaScript implementations suggest that 5x is indeed
possible, as do the results PyPy's JIT has delivered for numeric workloads. The
experience of Self-92 [#urs-self]_ is also instructive.

Can LLVM deliver this? We believe that we have only begun to scratch the surface
of what our LLVM-based JIT can deliver. The optimizations we have incorporated
into this system thus far have borne significant fruit (for example,
[#us-specialization-issue]_, [#us-direct-calling-issue]_,
[#us-fast-globals-issue]_). Our experience to date is that the limiting factor
on Unladen Swallow's performance is the engineering cycles needed to implement
the literature. We have found LLVM easy to work with and to modify, and its
built-in optimizations have greatly simplified the task of implementing
Python-level optimizations.

An overview of further performance opportunities is discussed in the
`Future Work`_ section.
|
Correctness and Compatibility
=============================

Unladen Swallow's correctness test suite includes CPython's test suite (under
``Lib/test/``), as well as a number of important third-party applications and
libraries [#tested-apps]_. A full list of these applications and libraries is
reproduced below. Any dependencies needed by these packages, such as
``zope.interface`` [#zope-interface]_, are also tested indirectly as a part of
testing the primary package, thus widening the corpus of tested third-party
Python code.

- 2to3
- Cheetah
- cvs2svn
- Django
- Nose
- NumPy
- PyCrypto
- pyOpenSSL
- PyXML
- Setuptools
- SQLAlchemy
- SWIG
- SymPy
- Twisted
- ZODB

These applications pass all relevant tests when run under Unladen Swallow. Note
that some tests that failed against our baseline of CPython 2.6.4 were disabled,
as were tests that made assumptions about CPython internals such as exact
bytecode numbers or bytecode format. Any package with disabled tests includes
a ``README.unladen`` file that details the changes (for example,
[#us-sqlalchemy-readme]_).

In addition, Unladen Swallow is tested automatically against an array of
internal Google Python libraries and applications. These include Google's
internal Python bindings for BigTable [#bigtable]_, the Mondrian code review
application [#mondrian]_, and Google's Python standard library, among others.
The changes needed to run these projects under Unladen Swallow have consistently
fallen into one of three camps:

- Adding CPython 2.6 C API compatibility. Since Google still primarily uses
  CPython 2.4 internally, we have needed to convert uses of ``int`` to
  ``Py_ssize_t`` and similar API changes.
- Fixing or disabling explicit, incorrect tests of the CPython version number.
- Conditionally disabling code that worked around or depended on bugs in
  CPython 2.4 that have since been fixed.
|
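As an illustration of the second camp, the sketch below contrasts a fragile
version test with a robust one. It is a hypothetical example written for this
document, not code taken from any of the Google projects mentioned above; the
function names are assumptions.

```python
import sys


def fragile_is_at_least(version_string, wanted):
    """Incorrect: compares version numbers as strings, character by character."""
    return version_string[:len(wanted)] >= wanted


def robust_is_at_least(version_info, wanted):
    """Correct: compares version components numerically as tuples."""
    return tuple(version_info[:len(wanted)]) >= wanted


# The string test misorders two-digit minor versions: "3.1" sorts before "3.6",
# so Python 3.10 appears to be older than Python 3.6.
fragile_result = fragile_is_at_least("3.10.0", "3.6")

# The tuple test compares 10 against 6 numerically and gets the right answer.
robust_result = robust_is_at_least((3, 10, 0), (3, 6))

# In real code the tuple test would be applied to sys.version_info.
running_on_at_least_2_4 = robust_is_at_least(sys.version_info, (2, 4))
```

Tests of this kind tend to break silently when the interpreter's version
changes, which is why they show up when moving code to a new CPython baseline.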
|
Testing against this wide range of public and proprietary applications and
libraries has been instrumental in ensuring the correctness of Unladen Swallow.
Testing has exposed bugs that we have duly corrected. Our automated regression
testing regime has given us high confidence in our changes as we have moved
forward.

In addition to third-party testing, we have added further tests to CPython's
test suite for corner cases of the language or implementation that we felt were
untested or underspecified (for example, [#us-import-tests]_,
[#us-tracing-tests]_). These have been especially important when implementing
optimizations, helping make sure we have not accidentally broken the darker
corners of Python.

We have also constructed a test suite focused solely on the LLVM-based JIT
compiler and the optimizations implemented for it [#us-test_llvm]_. Because of
the complexity and subtlety inherent in writing an optimizing compiler, we have
attempted to exhaustively enumerate the constructs, scenarios and corner cases
we are compiling and optimizing. The JIT tests also include tests for things
like the JIT hotness model, making it easier for future CPython developers to
maintain and improve it.

We have recently begun using fuzz testing [#fuzz-testing]_ to stress-test the
compiler. We have used both pyfuzz [#pyfuzz]_ and Fusil [#fusil]_ in the past,
and we recommend they be introduced as an automated part of the CPython testing
process.
|
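The general idea of fuzzing a compiler can be sketched with the built-in
``compile()`` function. This is a toy illustration only: it does not use or
imitate the pyfuzz or Fusil APIs, and the token vocabulary, function names and
result categories are all assumptions made for the example.

```python
import random

# A small, arbitrary vocabulary of expression tokens (an assumption).
TOKENS = ["x", "y", "1", "+", "*", "(", ")", "[", "]", ",",
          "if", "else", "lambda", ":", "not", "and"]


def random_source(rng, length):
    """Build a random, usually invalid, expression from `length` tokens."""
    return " ".join(rng.choice(TOKENS) for _ in range(length))


def fuzz_compile(iterations, seed=0):
    """Feed random token sequences to compile() and classify the outcomes.

    A SyntaxError is the expected way for the compiler to reject bad input.
    Any other exception is counted separately as a potential compiler bug.
    """
    rng = random.Random(seed)  # seeded so that failures can be reproduced
    results = {"compiled": 0, "rejected": 0, "unexpected": 0}
    for _ in range(iterations):
        source = random_source(rng, rng.randint(1, 8))
        try:
            compile(source, "<fuzz>", "eval")
            results["compiled"] += 1
        except SyntaxError:
            results["rejected"] += 1
        except Exception:
            results["unexpected"] += 1
    return results
```

A real fuzzing campaign would also execute the compiled code and compare
behaviour between the interpreter and the JIT; this sketch stops at compilation.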
|
Known Incompatibilities
-----------------------

Psyco [#psyco]_ is the only application or library we know of that works with
CPython 2.6.4 but does not work with Unladen Swallow. We are aware of some
libraries such as PyGame [#pygame]_ that work well with CPython 2.6.4, but
suffer some degradation due to changes made in Unladen Swallow. We are tracking
this issue [#us-background-thread-issue]_ and are working to resolve these
instances of degradation.

While Unladen Swallow is source-compatible with CPython 2.6.4, it is not
binary compatible. C extension modules compiled against one will need to be
recompiled to work with the other.

The merger of Unladen Swallow should have minimal impact on long-lived
CPython optimization branches like WPython. WPython [#wpython]_ and Unladen
Swallow are largely orthogonal, and there is no technical reason why both
could not be merged into CPython. The changes needed to make WPython
compatible with a JIT-enhanced version of CPython should be minimal
[#us-wpython-compat]_. The same should be true for other CPython optimization
projects (for example, [#asher-rotem]_).

Invasive forks of CPython such as Stackless Python [#stackless]_ are more
challenging to support. Since Stackless is highly unlikely to be merged into
CPython [#stackless-merger]_ and an increased maintenance burden is part and
parcel of any fork, we consider compatibility with Stackless to be relatively
low-priority. JIT-compiled stack frames use the C stack, so Stackless should
be able to treat them the same as it treats calls through extension modules.
If that turns out to be unacceptable, Stackless could either remove the JIT
compiler or improve JIT code generation to better support heap-based stack
frames [#llvm-heap-frames]_, [#llvm-heap-frames-disc]_.
|
Platform Support
================

Unladen Swallow is inherently limited by the platform support provided by LLVM,
especially LLVM's JIT compilation system [#llvm-hardware]_. LLVM's JIT has the
best support on x86 and x86-64 systems, and these are the platforms where
Unladen Swallow has received the most testing. We are confident in LLVM/Unladen
Swallow's support for x86 and x86-64 hardware. PPC and ARM support exists, but
is not widely used and may be buggy (for example, [#llvm-ppc-eager-jit-issue]_,
[#llvm-far-call-issue]_, [#llvm-arm-jit-issue]_).

Unladen Swallow is known to work on the following operating systems: Linux,
Darwin, Windows. Unladen Swallow has received the most testing on Linux and
Darwin, though it still builds and passes its tests on Windows.

In order to support hardware and software platforms where LLVM's JIT does not
work, Unladen Swallow provides a ``./configure --without-llvm`` option. This
flag carves out any part of Unladen Swallow that depends on LLVM, yielding a
Python binary that works and passes its tests, but has no performance
advantages. This configuration is recommended for hardware unsupported by LLVM,
or systems that care more about memory usage than performance.
|
Impact on CPython Development
=============================

Experimenting with Changes to Python or CPython Bytecode
--------------------------------------------------------

Unladen Swallow's JIT compiler operates on CPython bytecode, and as such, it is
immune to Python language changes that affect only the parser.
|
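The distinction between parser-level and bytecode-level changes can be seen
with the built-in ``compile()`` function. The sketch below is an illustration
using a stock CPython, not Unladen Swallow itself; the helper name
``bytecode_of`` is hypothetical.

```python
def bytecode_of(source):
    """Compile `source` as a module and return its raw bytecode string."""
    return compile(source, "<example>", "exec").co_code


# Redundant parentheses change what the parser sees, but not the bytecode.
plain = bytecode_of("x = a + b\n")
parenthesized = bytecode_of("x = (a + b)\n")
same_bytecode = plain == parenthesized

# A different operator changes the bytecode, so a bytecode-level consumer
# such as a JIT compiler would see the difference.
subtraction = bytecode_of("x = a - b\n")
different_bytecode = plain != subtraction
```

Anything that consumes only the bytecode cannot tell the first two programs
apart, which is the sense in which such a consumer is unaffected by
parser-only changes.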
|
We recommend that changes to the CPython bytecode compiler or the semantics of
individual bytecodes be prototyped in the interpreter loop first, then be ported
to the JIT compiler once the semantics are clear. To make this easier, Unladen
Swallow includes a ``--without-llvm`` configure-time option that strips out the
JIT compiler and all associated infrastructure. This leaves the current burden
of experimentation unchanged so that developers can prototype in the current
low-barrier-to-entry interpreter loop.

Unladen Swallow began implementing its JIT compiler by doing straightforward,
naive translations from bytecode implementations into LLVM API calls. We found
this process to be easily understood, and we recommend the same approach for
CPython. We include several sample changes from the Unladen Swallow repository
here as examples of this style of development: [#us-r359]_, [#us-r376]_,
[#us-r417]_, [#us-r517]_.
|
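The shape of such a naive, one-opcode-at-a-time translation can be sketched in
Python with the standard ``dis`` module. This is a toy under stated
assumptions: it emits made-up pseudo-IR text rather than calling the LLVM API,
and ``translate``, ``HANDLERS`` and the ``@interp_helper_*`` names are all
hypothetical. Unladen Swallow's real translator is written in C++.

```python
import dis


def emit_default(instr):
    """Fallback: defer to a (hypothetical) helper mirroring the interpreter."""
    return "call @interp_helper_%s()" % instr.opname.lower()


def emit_return(instr):
    """Hand-written translation for one opcode."""
    return "ret %top_of_stack"


# One handler per opcode; opcodes without a handler use the fallback.
HANDLERS = {
    "RETURN_VALUE": emit_return,
}


def translate(func):
    """Translate each bytecode instruction of `func` independently, in order."""
    lines = []
    for instr in dis.get_instructions(func):
        handler = HANDLERS.get(instr.opname, emit_default)
        lines.append("; offset %d: %s\n%s"
                     % (instr.offset, instr.opname, handler(instr)))
    return lines
```

The point of this structure is that each opcode's translation can be written
and reviewed next to the corresponding case in the interpreter loop, and
specialized handlers can replace the fallback one at a time.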
|
Debugging
---------

The Unladen Swallow team implemented changes to gdb to make it easier to use gdb
to debug JIT-compiled Python code. These changes were released in gdb 7.0
[#gdb70]_. They make it possible for gdb to identify and unwind past
JIT-generated call stack frames. This allows gdb to continue to function as
before for CPython development if one is changing, for example, the ``list``
type or builtin functions.

Example backtrace after our changes, where ``baz``, ``bar`` and ``foo`` are
JIT-compiled:

::

    Program received signal SIGSEGV, Segmentation fault.
    0x00002aaaabe7d1a8 in baz ()
    (gdb) bt
    #0 0x00002aaaabe7d1a8 in baz ()
    #1 0x00002aaaabe7d12c in bar ()
    #2 0x00002aaaabe7d0aa in foo ()
    #3 0x00002aaaabe7d02c in main ()
    #4 0x0000000000b870a2 in llvm::JIT::runFunction (this=0x1405b70, F=0x14024e0, ArgValues=...)
    at /home/rnk/llvm-gdb/lib/ExecutionEngine/JIT/JIT.cpp:395
    #5 0x0000000000baa4c5 in llvm::ExecutionEngine::runFunctionAsMain
    (this=0x1405b70, Fn=0x14024e0, argv=..., envp=0x7fffffffe3c0)
    at /home/rnk/llvm-gdb/lib/ExecutionEngine/ExecutionEngine.cpp:377
    #6 0x00000000007ebd52 in main (argc=2, argv=0x7fffffffe3a8,
    envp=0x7fffffffe3c0) at /home/rnk/llvm-gdb/tools/lli/lli.cpp:208

Previously, the JIT-compiled frames would have caused gdb to unwind incorrectly,
generating lots of obviously-incorrect ``#6 0x00002aaaabe7d0aa in ?? ()``-style
stack frames.

Highlights:

- gdb 7.0 is able to correctly parse JIT-compiled stack frames, allowing full
  use of gdb on non-JIT-compiled functions, that is, the vast majority of the
  CPython codebase.
- Disassembling inside a JIT-compiled stack frame automatically prints the full
  list of instructions making up that function. This is an advance over the
  state of gdb before our work: developers needed to guess the starting address
  of the function and manually disassemble the assembly code.
- Flexible underlying mechanism allows CPython to add more and more information,
  and eventually reach parity with C/C++ support in gdb for JIT-compiled machine
  code.

Lowlights:

- gdb cannot print local variables or tell you what line you're currently
  executing inside a JIT-compiled function. Nor can it step through
  JIT-compiled code, except for one instruction at a time.
- Not yet integrated with Apple's gdb or Microsoft's Visual Studio debuggers.

The Unladen Swallow team is working with Apple to get these changes
incorporated into their future gdb releases.
|
Profiling
---------

Unladen Swallow integrates with oProfile 0.9.4 and newer [#oprofile]_ to support
assembly-level profiling on Linux systems. This means that oProfile will
correctly symbolize JIT-compiled functions in its reports.

Example report, where the ``#u#``-prefixed symbol names are JIT-compiled Python
functions:

::

    $ opreport -l ./python | less
    CPU: Core 2, speed 1600 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    samples  %        image name               symbol name
    79589     4.2329  python                   PyString_FromFormatV
    62971     3.3491  python                   PyEval_EvalCodeEx
    62713     3.3354  python                   tupledealloc
    57071     3.0353  python                   _PyEval_CallFunction
    50009     2.6597  24532.jo                 #u#force_unicode
    47468     2.5246  python                   PyUnicodeUCS2_Decode
    45829     2.4374  python                   PyFrame_New
    45173     2.4025  python                   lookdict_string
    43082     2.2913  python                   PyType_IsSubtype
    39763     2.1148  24532.jo                 #u#render5
    38145     2.0287  python                   _PyType_Lookup
    37643     2.0020  python                   PyObject_GC_UnTrack
    37105     1.9734  python                   frame_dealloc
    36849     1.9598  python                   PyEval_EvalFrame
    35630     1.8950  24532.jo                 #u#resolve
    33313     1.7717  python                   PyObject_IsInstance
    33208     1.7662  python                   PyDict_GetItem
    33168     1.7640  python                   PyTuple_New
    30458     1.6199  python                   PyCFunction_NewEx

This support is functional, but as-yet unpolished. Unladen Swallow maintains a
punchlist of items we feel are important to improve in our oProfile integration
to make it more useful to core CPython developers [#us-oprofile-punchlist]_.
|
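Because JIT-compiled functions carry the ``#u#`` prefix, a report like the one
above is easy to post-process. The following sketch sums the share of samples
attributed to JIT-compiled Python functions. It assumes the four-column layout
shown in the example report; the function name ``jit_share`` is hypothetical,
and the embedded rows are a subset copied from that report.

```python
# A few rows copied from the example report above.
REPORT = """\
79589     4.2329  python                   PyString_FromFormatV
50009     2.6597  24532.jo                 #u#force_unicode
39763     2.1148  24532.jo                 #u#render5
35630     1.8950  24532.jo                 #u#resolve
33313     1.7717  python                   PyObject_IsInstance
"""


def jit_share(report):
    """Return (names, percent) for JIT-compiled functions in an opreport body.

    Each data row is expected to have four whitespace-separated fields:
    samples, percent, image name and symbol name. Other rows are skipped.
    """
    names = []
    total_percent = 0.0
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        _samples, percent, _image, symbol = fields
        if symbol.startswith("#u#"):
            names.append(symbol[len("#u#"):])
            total_percent += float(percent)
    return names, total_percent


jit_names, jit_percent = jit_share(REPORT)
```

For the rows shown in the example report, the three JIT-compiled functions
account for 2.6597 + 2.1148 + 1.8950 = 6.6695 percent of the samples.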
|
Highlights:

- Symbolization of JITted frames working in oProfile on Linux.

Lowlights:

- No work yet invested in improving symbolization of JIT-compiled frames for
  Apple's Shark [#shark]_ or Microsoft's Visual Studio profiling tools.
- Some polishing still desired for oProfile output.

We recommend using oProfile 0.9.5 (and newer) to work around a now-fixed bug on
x86-64 platforms in oProfile. oProfile 0.9.4 will work fine on 32-bit platforms,
however.

Given the ease of integrating oProfile with LLVM [#llvm-oprofile-change]_ and
Unladen Swallow [#us-oprofile-change]_, other profiling tools should be easy as
well, provided they support a similar JIT interface [#oprofile-jit-interface]_.

We have documented the process for using oProfile to profile Unladen Swallow
[#oprofile-workflow]_. This document will be merged into CPython's ``Doc/`` tree
as part of the merge.
|
Addition of C++ to CPython
--------------------------

In order to use LLVM, Unladen Swallow has introduced C++ into the core CPython
tree and build process. This is an unavoidable part of depending on LLVM; though
LLVM offers a C API [#llvm-c-api]_, it is limited and does not expose the
functionality needed by CPython. Because of this, we have implemented the
internal details of the Unladen Swallow JIT and its supporting infrastructure
in C++. We do not propose converting the entire CPython codebase to C++.

Highlights:

- Easy use of LLVM's full, powerful code generation and related APIs.
- Convenient, abstract data structures simplify code.
- C++ is limited to relatively small corners of the CPython codebase.
- C++ can be disabled via ``./configure --without-llvm``, which even omits the
  dependency on ``libstdc++``.

Lowlights:

- Developers must know two related languages, C and C++, to work on the full
  range of CPython's internals.
- A C++ style guide will need to be developed and enforced. PEP 7 will be
  extended [#pep7-cpp]_ to encompass C++ by taking the relevant parts of
  the C++ style guides from Unladen Swallow [#us-styleguide]_, LLVM
  [#llvm-styleguide]_ and Google [#google-styleguide]_.
- Different C++ compilers emit different ABIs; this can cause problems if
  CPython is compiled with one C++ compiler and extension modules are compiled
  with a different C++ compiler.
|
Managing LLVM Releases, C++ API Changes
---------------------------------------

LLVM is released regularly every six months. This means that LLVM may be
released two or three times during the course of development of a CPython 3.x
release. Each LLVM release brings newer and more powerful optimizations,
improved platform support and more sophisticated code generation.

LLVM releases usually include incompatible changes to the LLVM C++ API; the
release notes for LLVM 2.6 [#llvm-26-whatsnew]_ include a list of
intentionally-introduced incompatibilities. Unladen Swallow has tracked LLVM
trunk closely over the course of development. Our experience has been
that LLVM API changes are obvious and easily or mechanically remedied. We
include two such changes from the Unladen Swallow tree as references here:
[#us-llvm-r820]_, [#us-llvm-r532]_.

Due to API incompatibilities, we recommend that an LLVM-based CPython target
compatibility with a single version of LLVM at a time. This will lower the
overhead on the core development team. Pegging to an LLVM version should not be
a problem from a packaging perspective, because pre-built LLVM packages
generally become available via standard system package managers fairly quickly
following an LLVM release, and failing that, llvm.org itself includes binary
releases.

Unladen Swallow has historically included a copy of the LLVM and Clang source
trees in the Unladen Swallow tree; this was done to allow us to closely track
LLVM trunk as we made patches to it. We do not recommend this model of
development for CPython. CPython releases should be based on official LLVM
releases. Pre-built LLVM packages are available from MacPorts [#llvm-macports]_
for Darwin, and from most major Linux distributions ([#llvm-ubuntu]_,
[#llvm-debian]_, [#llvm-fedora]_). LLVM itself provides additional binaries,
such as for MinGW [#llvm-mingw]_.

LLVM is currently intended to be statically linked; this means that binary
releases of CPython will include the relevant parts (not all!) of LLVM. This
will increase the binary size, as noted above. To simplify downstream package
management, we will modify LLVM to better support shared linking. This issue
will block final merger [#us-shared-link-issue]_.

Unladen Swallow has tasked a full-time engineer with fixing any remaining
critical issues in LLVM before LLVM's 2.7 release. We consider it essential that
CPython 3.x be able to depend on a released version of LLVM, rather than closely
tracking LLVM trunk as Unladen Swallow has done. We believe we will finish this
work [#us-llvm-punchlist]_ before the release of LLVM 2.7, expected in May 2010.
|
Building CPython
----------------

In addition to a runtime dependency on LLVM, Unladen Swallow includes a
build-time dependency on Clang [#clang]_, an LLVM-based C/C++ compiler. We use
this to compile parts of the C-language Python runtime to LLVM's intermediate
representation; this allows us to perform cross-language inlining, yielding
increased performance. Clang is not required to run Unladen Swallow. Clang
binary packages are available from most major Linux distributions (for example,
[#clang-debian]_).

We examined the impact of Unladen Swallow on the time needed to build Python,
including configure, full builds and incremental builds after touching a single
C source file.

+-------------+---------------+---------------+----------------------+
| ./configure | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
+=============+===============+===============+======================+
| Run 1       | 0m20.795s     | 0m16.558s     | 0m15.477s            |
+-------------+---------------+---------------+----------------------+
| Run 2       | 0m15.255s     | 0m16.349s     | 0m15.391s            |
+-------------+---------------+---------------+----------------------+
| Run 3       | 0m15.228s     | 0m16.299s     | 0m15.528s            |
+-------------+---------------+---------------+----------------------+

+-------------+---------------+---------------+----------------------+
| Full make   | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
+=============+===============+===============+======================+
| Run 1       | 1m30.776s     | 1m22.367s     | 1m54.053s            |
+-------------+---------------+---------------+----------------------+
| Run 2       | 1m21.374s     | 1m22.064s     | 1m49.448s            |
+-------------+---------------+---------------+----------------------+
| Run 3       | 1m22.047s     | 1m23.645s     | 1m49.305s            |
+-------------+---------------+---------------+----------------------+

Full builds take a hit due to a) additional ``.cc`` files needed for LLVM
interaction, b) statically linking LLVM into ``libpython``, c) compiling parts
of the Python runtime to LLVM IR to enable cross-language inlining.

Incremental builds are also somewhat slower than mainline CPython. The table
below shows incremental rebuild times after touching ``Objects/listobject.c``.

+-------------+---------------+---------------+-----------------------+
| Incr make   | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1024 |
+=============+===============+===============+=======================+
| Run 1       | 0m1.854s      | 0m1.456s      | 0m6.680s              |
+-------------+---------------+---------------+-----------------------+
| Run 2       | 0m1.437s      | 0m1.442s      | 0m5.310s              |
+-------------+---------------+---------------+-----------------------+
| Run 3       | 0m1.440s      | 0m1.425s      | 0m7.639s              |
+-------------+---------------+---------------+-----------------------+

As with full builds, this extra time comes from statically linking LLVM
into ``libpython``. If ``libpython`` were linked shared against LLVM, this
overhead would go down.
|
Proposed Merge Plan
===================

We propose focusing our efforts on eventual merger with CPython's 3.x line of
development. The BDFL has indicated that 2.7 is to be the final release of
CPython's 2.x line of development [#bdfl-27-final]_, and since 2.7 alpha 1 has
already been released [#cpy-27a1]_, we have missed the window. Python 3 is the
future, and that is where we will target our performance efforts.

We recommend the following plan for merger of Unladen Swallow into the CPython
source tree:

- Creation of a branch in the CPython SVN repository to work in, call it
  ``py3k-jit`` as a strawman. This will be a branch of the CPython ``py3k``
  branch.
- We will keep this branch closely integrated to ``py3k``. The further we
  deviate, the harder our work will be.
- Any JIT-related patches will go into the ``py3k-jit`` branch.
- Non-JIT-related patches will go into the ``py3k`` branch (once reviewed and
  approved) and be merged back into the ``py3k-jit`` branch.
- Potentially-contentious issues, such as the introduction of new command line
  flags or environment variables, will be discussed on python-dev.

Because Google uses CPython 2.x internally, Unladen Swallow is based on CPython
2.6. We would need to port our compiler to Python 3; this would be done as
patches are applied to the ``py3k-jit`` branch, so that the branch remains a
consistent implementation of Python 3 at all times.

We believe this approach will be minimally disruptive to the 3.2 or 3.3 release
process while we iron out any remaining issues blocking final merger into
``py3k``. Unladen Swallow maintains a punchlist of known issues needed before
final merger [#us-punchlist]_, which includes all problems mentioned in this
PEP; we trust the CPython community will have its own concerns. This punchlist
is not static; other issues may emerge in the future that will block final
merger into the ``py3k`` branch.

Changes will be committed directly to the ``py3k-jit`` branch, with only large,
tricky or controversial changes sent for pre-commit code review.
|
Contingency Plans
-----------------

There is a chance that we will not be able to reduce memory usage or startup
time to a level satisfactory to the CPython community. Our primary contingency
plan for this situation is to shift from an online just-in-time compilation
strategy to an offline ahead-of-time strategy using an instrumented CPython
interpreter loop to obtain feedback. This is the same model used by gcc's
feedback-directed optimizations (``-fprofile-generate``) [#gcc-fdo]_ and
Microsoft Visual Studio's profile-guided optimizations [#msvc-pgo]_; we will
refer to this as "feedback-directed optimization" here, or FDO.
|
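The feedback-gathering half of such a scheme can be approximated in pure
Python with ``sys.setprofile``. This is only an analogy for an instrumented
interpreter loop, not a description of how Unladen Swallow or gcc collect
feedback; the function names and the threshold parameter are assumptions.

```python
import sys
from collections import Counter


def collect_call_counts(workload):
    """Run `workload` under a profile hook and count Python-level calls.

    This plays the role of the training run in a feedback-directed build:
    the counts would be saved and consulted later, offline.
    """
    counts = Counter()

    def profiler(frame, event, arg):
        # "call" fires on entry to a Python function; C calls are reported
        # as "c_call" and are ignored here.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(profiler)
    try:
        workload()
    finally:
        sys.setprofile(None)  # always remove the hook, even on error
    return counts


def hot_functions(counts, threshold):
    """Return the sorted names of functions called at least `threshold` times."""
    return sorted(name for name, n in counts.items() if n >= threshold)
```

An ahead-of-time compiler would then spend its effort only on the functions
reported as hot, which is why the quality of the training workload matters so
much for this strategy.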
|
We believe that an FDO compiler for Python would be inferior to a JIT compiler.
FDO requires a high-quality, representative benchmark suite, which is a relative
rarity in both open- and closed-source development. A JIT compiler can
dynamically find and optimize the hot spots in any application -- benchmark
suite or no -- allowing it to adapt to changes in application bottlenecks
without human intervention.

If an ahead-of-time FDO compiler is required, it should be able to leverage a
large percentage of the code and infrastructure already developed for Unladen
Swallow's JIT compiler. Indeed, these two compilation strategies could exist
side-by-side.
|
|
|
|
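As a rough illustration of the offline model, the following is a minimal
sketch of the two FDO phases in pure Python. It assumes that the standard
``sys.setprofile`` hook can stand in for an instrumented interpreter loop. The
function names, the JSON profile format and the call-count threshold are all
hypothetical; this is not how Unladen Swallow gathers feedback.

.. code-block:: python

   import json
   import sys
   from collections import Counter


   def collect_profile(workload):
       """Training run: count calls to each Python function in ``workload``."""
       counts = Counter()

       def profiler(frame, event, arg):
           # "call" fires on entry to every Python-level function.
           if event == "call":
               code = frame.f_code
               counts["%s:%s" % (code.co_filename, code.co_name)] += 1

       sys.setprofile(profiler)
       try:
           workload()
       finally:
           sys.setprofile(None)
       return counts


   def save_profile(counts, path):
       """Persist the gathered feedback so a later, offline step can read it."""
       with open(path, "w") as f:
           json.dump(counts, f)


   def hot_functions(path, threshold):
       """Offline step: choose the functions worth compiling ahead of time."""
       with open(path) as f:
           counts = json.load(f)
       return sorted(name for name, n in counts.items() if n >= threshold)


   # Hypothetical "benchmark suite" used as the training workload.
   def double(x):
       return x * 2


   def example_workload():
       total = 0
       for i in range(1000):
           total += double(i)
       return total

Running ``collect_profile(example_workload)``, saving the result, and then
calling ``hot_functions(path, threshold=500)`` selects only ``double``. The
selection is only as good as the training workload, which is the weakness of
FDO described above. Call counts are used here only because they are easy to
collect.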
|
|
|
|
|
2010-01-20 17:08:04 -05:00
|
|
|
Future Work
===========

A JIT compiler is an extremely flexible tool, and we have by no means exhausted
its full potential. Unladen Swallow maintains a list of performance
optimizations [#us-perf-punchlist]_ that the team has not yet had time to fully
implement. Examples:

- Python/Python inlining [#inlining]_. Our compiler currently performs no
  inlining between pure-Python functions. Work on this is ongoing
  [#us-inlining]_.
- Unboxing [#unboxing]_. Unboxing is critical for numerical performance. PyPy
  in particular has demonstrated the value of unboxing to heavily numeric
  workloads.
- Recompilation, adaptation. Unladen Swallow currently only compiles a Python
  function once, based on its usage pattern up to that point. If the usage
  pattern changes, limitations in LLVM [#us-recompile-issue]_ prevent us from
  recompiling the function to better serve the new usage pattern.
- JIT-compile regular expressions. Modern JavaScript engines reuse their JIT
  compilation infrastructure to boost regex performance [#us-regex-perf]_.
  Unladen Swallow has developed benchmarks for Python regular expression
  performance ([#us-bm-re-compile]_, [#us-bm-re-v8]_, [#us-bm-re-effbot]_), but
  work on regex performance is still at an early stage [#us-regex-issue]_.
- Trace compilation [#traces-waste-of-time]_, [#traces-explicit-pipeline]_.
  Based on the results of PyPy and TraceMonkey [#tracemonkey]_, we believe that
  a CPython JIT should incorporate trace compilation to some degree. We
  initially avoided a purely tracing JIT compiler in favor of a simpler,
  function-at-a-time compiler. However, this function-at-a-time compiler has
  laid the groundwork for a future tracing compiler implemented in the same
  terms.
- Profile generation/reuse. The runtime data gathered by the JIT could be
  persisted to disk and reused by subsequent JIT compilations, or by external
  tools such as Cython [#cython]_ or a feedback-enhanced code coverage tool.
|
2010-01-20 17:08:04 -05:00
|
|
|
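The "recompilation, adaptation" item can be pictured with a toy model written
in pure Python. The sketch below is an assumption-laden illustration, not
Unladen Swallow's implementation: the class ``AdaptiveFunction``, its
parameters and the hand-written "specialized" functions are all hypothetical,
and ordinary Python callables stand in for machine code. It profiles argument
types, installs a fast path protected by a guard, and returns to profiling
once the guard has failed too often.

.. code-block:: python

   class AdaptiveFunction:
       """Toy model of guarded specialization with re-profiling."""

       def __init__(self, generic, specializations, warmup=3,
                    max_guard_failures=2):
           self.generic = generic                  # always-correct slow path
           self.specializations = specializations  # {type: fast callable}
           self.warmup = warmup
           self.max_guard_failures = max_guard_failures
           self.compilations = 0
           self._reset()

       def _reset(self):
           self.seen = []          # argument types observed while profiling
           self.fast_type = None   # type assumed by the current fast path
           self.guard_failures = 0

       def __call__(self, arg):
           if self.fast_type is not None:
               if type(arg) is self.fast_type:    # guard: assumption holds
                   return self.specializations[self.fast_type](arg)
               self.guard_failures += 1           # bail to the generic path
               if self.guard_failures >= self.max_guard_failures:
                   self._reset()                  # usage changed: re-profile
               return self.generic(arg)
           # Still profiling: record the type and run the generic code.
           self.seen.append(type(arg))
           if len(self.seen) >= self.warmup:
               first = self.seen[0]
               uniform = all(t is first for t in self.seen)
               if uniform and first in self.specializations:
                   self.fast_type = first
                   self.compilations += 1
               else:
                   self.seen = []                 # mixed types: keep profiling
           return self.generic(arg)


   def generic_double(x):
       return x + x


   def int_double(x):
       return x << 1


   def str_double(s):
       return s * 2

In this model, dropping the call to ``_reset()`` in the guard-failure branch
gives the compile-once behaviour described above: every later call with a
different type pays for the failed guard and the generic path, and the
function never gets a second chance at specialization.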
|
|
|
|
This list is by no means exhaustive. There is a vast literature on optimizations
for dynamic languages that could and should be implemented in terms of Unladen
Swallow's LLVM-based JIT compiler [#us-relevantpapers]_.
|
|
|
|
|
|
|
|
|
|
|
|
Unladen Swallow Community
=========================

We would like to thank the community of developers who have contributed to
Unladen Swallow, in particular: James Abbatiello, Joerg Blank, Eric Christopher,
Alex Gaynor, Chris Lattner, Nick Lewycky, Evan Phoenix and Thomas Wouters.
|
|
|
|
|
|
|
|
|
|
|
|
Licensing
=========

All work on Unladen Swallow is licensed to the Python Software Foundation (PSF)
under the terms of the Python Software Foundation License v2 [#psf-lic]_ under
the umbrella of Google's blanket Contributor License Agreement with the PSF.

LLVM is licensed [#llvm-lic]_ under the University of Illinois/NCSA Open Source
License [#ui-lic]_, a liberal, OSI-approved license. The University of Illinois
at Urbana-Champaign is the sole copyright holder for LLVM.
|
|
|
|
|
2010-01-20 17:08:04 -05:00
|
|
|
|
|
|
|
References
==========

.. [#us-post-mortem]
   http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html

.. [#dead-parrot]
   http://en.wikipedia.org/wiki/Dead_Parrot_sketch

.. [#us]
   http://code.google.com/p/unladen-swallow/

.. [#llvm]
   http://llvm.org/

.. [#clang]
   http://clang.llvm.org/

.. [#tested-apps]
   http://code.google.com/p/unladen-swallow/wiki/Testing

.. [#llvm-hardware]
   http://llvm.org/docs/GettingStarted.html#hardware

.. [#llvm-c-api]
   http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/

.. [#llvm-26-whatsnew]
   http://llvm.org/releases/2.6/docs/ReleaseNotes.html#whatsnew

.. [#us-llvm-r820]
   http://code.google.com/p/unladen-swallow/source/detail?r=820

.. [#us-llvm-r532]
   http://code.google.com/p/unladen-swallow/source/detail?r=532

.. [#llvm-macports]
   http://trac.macports.org/browser/trunk/dports/lang/llvm/Portfile

.. [#llvm-ubuntu]
   http://packages.ubuntu.com/karmic/llvm

.. [#llvm-debian]
   http://packages.debian.org/unstable/devel/llvm

.. [#clang-debian]
   http://packages.debian.org/sid/clang

.. [#llvm-fedora]
   http://koji.fedoraproject.org/koji/buildinfo?buildID=134384

.. [#gdb70]
   http://www.gnu.org/software/gdb/download/ANNOUNCEMENT

.. [#oprofile]
   http://oprofile.sourceforge.net/news/

.. [#us-oprofile-punchlist]
   http://code.google.com/p/unladen-swallow/issues/detail?id=63

.. [#shark]
   http://developer.apple.com/tools/sharkoptimize.html

.. [#llvm-oprofile-change]
   http://llvm.org/viewvc/llvm-project?view=rev&revision=75279

.. [#us-oprofile-change]
   http://code.google.com/p/unladen-swallow/source/detail?r=986

.. [#oprofile-jit-interface]
   http://oprofile.sourceforge.net/doc/devel/jit-interface.html

.. [#oprofile-workflow]
   http://code.google.com/p/unladen-swallow/wiki/UsingOProfile

.. [#llvm-mingw]
   http://llvm.org/releases/download.html

.. [#us-r359]
   http://code.google.com/p/unladen-swallow/source/detail?r=359

.. [#us-r376]
   http://code.google.com/p/unladen-swallow/source/detail?r=376

.. [#us-r417]
   http://code.google.com/p/unladen-swallow/source/detail?r=417

.. [#us-r517]
   http://code.google.com/p/unladen-swallow/source/detail?r=517

.. [#bdfl-27-final]
   http://mail.python.org/pipermail/python-dev/2010-January/095682.html

.. [#cpy-27a1]
   http://www.python.org/dev/peps/pep-0373/

.. [#cpy-32]
   http://www.python.org/dev/peps/pep-0392/

.. [#us-punchlist]
   http://code.google.com/p/unladen-swallow/issues/list?q=label:Merger

.. [#us-binary-size]
   http://code.google.com/p/unladen-swallow/issues/detail?id=118

.. [#us-issue-startup-time]
   http://code.google.com/p/unladen-swallow/issues/detail?id=64

.. [#zope-interface]
   http://www.zope.org/Products/ZopeInterface

.. [#bigtable]
   http://en.wikipedia.org/wiki/BigTable

.. [#mondrian]
   http://www.niallkennedy.com/blog/2006/11/google-mondrian.html

.. [#us-sqlalchemy-readme]
   http://code.google.com/p/unladen-swallow/source/browse/tests/lib/sqlalchemy/README.unladen

.. [#us-test_llvm]
   http://code.google.com/p/unladen-swallow/source/browse/trunk/Lib/test/test_llvm.py

.. [#fuzz-testing]
   http://en.wikipedia.org/wiki/Fuzz_testing

.. [#pyfuzz]
   http://bitbucket.org/ebo/pyfuzz/overview/

.. [#fusil]
   http://lwn.net/Articles/322826/

.. [#us-memory-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=68

.. [#us-benchmarks]
   http://code.google.com/p/unladen-swallow/wiki/Benchmarks

.. [#students-t-test]
   http://en.wikipedia.org/wiki/Student's_t-test

.. [#smaps]
   http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html

.. [#us-background-thread]
   http://code.google.com/p/unladen-swallow/source/browse/branches/background-thread

.. [#us-background-thread-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=40

.. [#us-import-tests]
   http://code.google.com/p/unladen-swallow/source/detail?r=888

.. [#us-tracing-tests]
   http://code.google.com/p/unladen-swallow/source/diff?spec=svn576&r=576&format=side&path=/trunk/Lib/test/test_trace.py

.. [#us-perf-punchlist]
   http://code.google.com/p/unladen-swallow/issues/list?q=label:Performance

.. [#jit]
   http://en.wikipedia.org/wiki/Just-in-time_compilation

.. [#urs-self]
   http://research.sun.com/self/papers/urs-thesis.html

.. [#us-projectplan]
   http://code.google.com/p/unladen-swallow/wiki/ProjectPlan

.. [#us-relevantpapers]
   http://code.google.com/p/unladen-swallow/wiki/RelevantPapers

.. [#us-llvm-notes]
   http://code.google.com/p/unladen-swallow/source/browse/trunk/Python/llvm_notes.txt

.. [#psf-lic]
   http://www.python.org/psf/license/

.. [#llvm-lic]
   http://llvm.org/docs/DeveloperPolicy.html#clp

.. [#ui-lic]
   http://www.opensource.org/licenses/UoI-NCSA.php

.. [#v8]
   http://code.google.com/p/v8/

.. [#squirrelfishextreme]
   http://webkit.org/blog/214/introducing-squirrelfish-extreme/

.. [#rubinius]
   http://rubini.us/

.. [#parrot-on-llvm]
   http://lists.parrot.org/pipermail/parrot-dev/2009-September/002811.html

.. [#macruby]
   http://www.macruby.org/

.. [#hotspot]
   http://en.wikipedia.org/wiki/HotSpot

.. [#psyco]
   http://psyco.sourceforge.net/

.. [#pypy]
   http://codespeak.net/pypy/dist/pypy/doc/

.. [#inlining]
   http://en.wikipedia.org/wiki/Inline_expansion

.. [#unboxing]
   http://en.wikipedia.org/wiki/Object_type_(object-oriented_programming%29

.. [#us-inlining]
   http://code.google.com/p/unladen-swallow/issues/detail?id=86

.. [#us-styleguide]
   http://code.google.com/p/unladen-swallow/wiki/StyleGuide

.. [#llvm-styleguide]
   http://llvm.org/docs/CodingStandards.html

.. [#google-styleguide]
   http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

.. [#us-recompile-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=41

.. [#us-regex-perf]
   http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Regular_Expressions

.. [#us-bm-re-compile]
   http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_compile.py

.. [#us-bm-re-v8]
   http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_v8.py

.. [#us-bm-re-effbot]
   http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_effbot.py

.. [#us-regex-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=13

.. [#pygame]
   http://www.pygame.org/

.. [#numpy]
   http://numpy.scipy.org/

.. [#pypy-bmarks]
   http://codespeak.net:8099/plotsummary.html

.. [#llvm-users]
   http://llvm.org/Users.html

.. [#hlvm]
   http://www.ffconsultancy.com/ocaml/hlvm/

.. [#llvm-far-call-issue]
   http://llvm.org/PR5201

.. [#llvm-jmm-rev]
   http://llvm.org/viewvc/llvm-project?view=rev&revision=76828

.. [#llvm-memleak-rev]
   http://llvm.org/viewvc/llvm-project?rev=91611&view=rev

.. [#llvm-globaldce-rev]
   http://llvm.org/viewvc/llvm-project?rev=85182&view=rev

.. [#llvm-availext-issue]
   http://llvm.org/PR5735

.. [#us-specialization-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=73

.. [#us-direct-calling-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=88

.. [#us-fast-globals-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=67

.. [#traces-waste-of-time]
   http://www.ics.uci.edu/~franz/Site/pubs-pdf/C44Prepub.pdf

.. [#traces-explicit-pipeline]
   http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-07-12.pdf

.. [#tracemonkey]
   https://wiki.mozilla.org/JavaScript:TraceMonkey

.. [#llvm-langref]
   http://llvm.org/docs/LangRef.html

.. [#us-wider-perf-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=120

.. [#us-nbody]
   http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_nbody.py

.. [#us-shared-link-issue]
   http://code.google.com/p/unladen-swallow/issues/detail?id=130

.. [#us-llvm-punchlist]
   http://code.google.com/p/unladen-swallow/issues/detail?id=131

.. [#llvm-ppc-eager-jit-issue]
   http://llvm.org/PR4816

.. [#llvm-arm-jit-issue]
   http://llvm.org/PR6065

.. [#cython]
   http://www.cython.org/

.. [#shedskin]
   http://shed-skin.blogspot.com/

.. [#shedskin-library-limits]
   http://shedskin.googlecode.com/files/shedskin-tutorial-0.3.html

.. [#wpython]
   http://code.google.com/p/wpython/

.. [#wpython-performance]
   http://www.mail-archive.com/python-dev@python.org/msg45143.html

.. [#ironpython]
   http://ironpython.net/

.. [#mono]
   http://www.mono-project.com/

.. [#jython]
   http://www.jython.org/

.. [#jython-c-ext]
   http://wiki.python.org/jython/JythonFaq/GeneralInfo

.. [#pyv8]
   http://code.google.com/p/pyv8/

.. [#gcc-fdo]
   http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

.. [#msvc-pgo]
   http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx

.. [#us-wpython-compat]
   http://www.mail-archive.com/python-dev@python.org/msg44962.html

.. [#asher-rotem]
   http://portal.acm.org/citation.cfm?id=1534530.1534550

.. [#stackless]
   http://www.stackless.com/

.. [#stackless-merger]
   http://mail.python.org/pipermail/python-dev/2004-June/045165.html

.. [#llvm-heap-frames]
   http://www.nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt

.. [#llvm-heap-frames-disc]
   http://old.nabble.com/LLVM-and-coroutines-microthreads-td23080883.html

.. [#pep7-cpp]
   http://www.mail-archive.com/python-dev@python.org/msg45544.html
|
|
|
|
|
2010-01-20 17:08:04 -05:00
|
|
|
|
|
|
|
Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: