Commit updated version of PEP 3146.

Collin Winter 2010-02-09 02:51:26 +00:00
parent ae27a4e100
commit ba7a7e9df3
1 changed files with 211 additions and 62 deletions


@ -165,6 +165,50 @@ exhausted its full potential. We have tried to create a sufficiently flexible
framework that the wider CPython development community can build upon it for
years to come, extracting increased performance in each subsequent release.
Alternatives
------------
There are a number of alternative strategies for improving Python performance
which we considered but found unsatisfactory.
- *Cython, Shedskin*: Cython [#cython]_ and Shedskin [#shedskin]_ are both
static compilers for Python. We view these as useful-but-limited workarounds
for CPython's historically poor performance. Shedskin does not support the
full Python standard library [#shedskin-library-limits]_, while Cython
requires manual Cython-specific annotations for optimum performance.
Static compilers like these are useful for writing extension modules without
worrying about reference counting, but because they are static, ahead-of-time
compilers, they cannot optimize the full range of code under consideration by
a just-in-time compiler informed by runtime data.
- *IronPython*: IronPython [#ironpython]_ is Python on Microsoft's .NET
platform. It is not actively tested on Mono [#mono]_, meaning that it is
essentially Windows-only, making it unsuitable as a general CPython
replacement.
- *Jython*: Jython [#jython]_ is a complete implementation of Python 2.5, but
is significantly slower than Unladen Swallow (3-5x on measured benchmarks) and
has no support for CPython extension modules [#jython-c-ext]_, which would
make migration of large applications prohibitively expensive.
- *Psyco*: Psyco [#psyco]_ is a specializing JIT compiler for CPython,
implemented as an extension module. It primarily improves performance for
numerical code. Pros: exists; makes some code faster. Cons: 32-bit only, with
no plans for 64-bit support; supports x86 only; very difficult to maintain;
incompatible with SSE2-optimized code due to alignment issues.
- *PyPy*: PyPy [#pypy]_ has good performance on numerical code, but is slower
than Unladen Swallow on non-numerical workloads. PyPy supports only 32-bit
x86 code generation. It has poor support for CPython extension modules,
making migration of large applications prohibitively expensive.
- *PyV8*: PyV8 [#pyv8]_ is an alpha-stage experimental Python-to-JavaScript
compiler that runs on top of V8. PyV8 does not implement the whole Python
language, and has no support for CPython extension modules.
- *WPython*: WPython [#wpython]_ is a wordcode-based reimplementation of
CPython's interpreter loop. While it provides a modest improvement to
interpreter performance [#wpython-performance]_, it does not remove the need for
a just-in-time compiler: an interpreter will never be as fast as optimized
machine code. We view WPython and similar interpreter
enhancements as complementary to our work, rather than as competitors.
Performance
===========
@ -411,6 +455,25 @@ Results from Unladen Swallow's ``startup`` benchmarks:
Stddev: 0.00214 -> 0.00240: 1.1209x larger
Timeline: http://tinyurl.com/yajn8fa
### bzr_startup ###
Min: 0.067990 -> 0.097985: 1.4412x slower
Avg: 0.084322 -> 0.111348: 1.3205x slower
Significant (t=-37.432534, a=0.95)
Stddev: 0.00793 -> 0.00643: 1.2330x smaller
Timeline: http://tinyurl.com/ybdm537
### hg_startup ###
Min: 0.016997 -> 0.024997: 1.4707x slower
Avg: 0.026990 -> 0.036772: 1.3625x slower
Significant (t=-53.104502, a=0.95)
Stddev: 0.00406 -> 0.00417: 1.0273x larger
Timeline: http://tinyurl.com/ycout8m
``bzr_startup`` and ``hg_startup`` measure how long it takes Bazaar and
Mercurial, respectively, to display their help screens. ``startup_nosite``
runs ``python -S`` many times; usage of the ``-S`` option is rare, but we feel
this gives a good indication of where increased startup time is coming from.
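Each comparison block above reports min and mean ratios, a significance test and a standard-deviation ratio. The sketch below shows one way to compute such a summary from two lists of timings. It assumes a pooled-variance Student's t-test and is not taken from Unladen Swallow's benchmark runner. A significance decision at a=0.95 would also require comparing |t| against the critical value of the t distribution, which the sketch omits. The sample timings are hypothetical.

```python
import math
from statistics import mean, stdev


def compare_timings(base, changed):
    """Summarize two timing samples (in seconds) in the style of the report above.

    Assumes a pooled-variance Student's t-test. Each sample needs at least
    two measurements for stdev() to be defined.
    """
    n1, n2 = len(base), len(changed)
    s1, s2 = stdev(base), stdev(changed)
    # Pooled sample variance across both runs.
    pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (mean(base) - mean(changed)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return {
        # Ratios above 1.0 mean the changed build took longer ("Nx slower").
        "min_ratio": min(changed) / min(base),
        "avg_ratio": mean(changed) / mean(base),
        # Negative t means the changed build is slower on average.
        "t": t,
        "stddev_ratio": s2 / s1,
    }


# Hypothetical samples, not measured data.
before = [0.070, 0.075, 0.072, 0.071]
after = [0.098, 0.101, 0.099, 0.100]
summary = compare_timings(before, after)
print(f"Min ratio: {summary['min_ratio']:.4f}x, t = {summary['t']:.2f}")
```

The same ratio arithmetic reproduces the first ``bzr_startup`` line: 0.097985 / 0.067990 is about 1.4412.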
Unladen Swallow has made headway toward optimizing startup time, but there is
still more work to do and further optimizations to implement. Improving start-up
@ -422,40 +485,31 @@ Binary Size
-----------
Statically linking LLVM's code generation, analysis and optimization libraries
significantly increases the size of the ``python`` binary. The tables below
report stripped on-disk binary sizes; the binaries are stripped to better
correspond with the configurations used by system package managers. We feel this
is the most realistic measure of any change in binary size.
+-------------+---------------+---------------+-----------------------+
| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1041 |
+=============+===============+===============+=======================+
| 32-bit      | 1.3M          | 1.4M          | 12M                   |
+-------------+---------------+---------------+-----------------------+
| 64-bit      | 1.6M          | 1.6M          | 12M                   |
+-------------+---------------+---------------+-----------------------+
The increased binary size is caused by statically linking LLVM's code
generation, analysis and optimization libraries into the ``python`` binary.
This can be straightforwardly addressed by modifying LLVM to better support
shared linking and then using that, instead of the current static linking. For
the moment, though, static linking provides an accurate look at the cost of
linking against LLVM.
Even when statically linking, we believe there is still headroom to improve
on-disk binary size by narrowing Unladen Swallow's dependencies on LLVM. This
issue is actively being addressed [#us-binary-size]_.
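To put the stripped sizes in proportion, the sketch below converts the human-readable sizes in the 32-bit/64-bit table into bytes and computes the growth factor relative to CPython 3.1.1. Treating ``M`` as 1024**2 bytes is an assumption, since the measurement tool is not stated. The ratio does not depend on that choice.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a human-readable size such as '1.3M' or '12M' to bytes.

    Assumes binary (1024-based) units.
    """
    text = text.strip()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


# Stripped binary sizes from the table: (CPython 3.1.1, Unladen Swallow r1041).
sizes = {"32-bit": ("1.4M", "12M"), "64-bit": ("1.6M", "12M")}
for arch, (cpython, unladen) in sizes.items():
    growth = parse_size(unladen) / parse_size(cpython)
    print(f"{arch}: {growth:.1f}x larger than CPython 3.1.1")
```

On these figures, Unladen Swallow r1041 is roughly 8.6x (32-bit) and 7.5x (64-bit) the size of CPython 3.1.1.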
Performance Retrospective
@ -610,7 +664,8 @@ especially LLVM's JIT compilation system [#llvm-hardware]_. LLVM's JIT has the
best support on x86 and x86-64 systems, and these are the platforms where
Unladen Swallow has received the most testing. We are confident in LLVM/Unladen
Swallow's support for x86 and x86-64 hardware. PPC and ARM support exists, but
is not widely used and may be buggy (for example, [#llvm-ppc-eager-jit-issue]_,
[#llvm-far-call-issue]_, [#llvm-arm-jit-issue]_).
Unladen Swallow is known to work on the following operating systems: Linux,
Darwin, Windows. Unladen Swallow has received the most testing on Linux and
@ -631,7 +686,7 @@ Experimenting with Changes to Python or CPython Bytecode
--------------------------------------------------------
Unladen Swallow's JIT compiler operates on CPython bytecode, and as such, it is
immune to Python language changes that affect only the parser.
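The sketch below illustrates why parser-only changes never reach the JIT. It compiles pairs of snippets that differ only in surface syntax and checks that they produce the same bytecode. It uses the ``compile`` built-in of a modern CPython 3 interpreter, not the CPython 2.6 bytecode that Unladen Swallow actually consumed, so the exact instructions differ. The helper name ``same_bytecode`` is hypothetical.

```python
def same_bytecode(src_a, src_b):
    """Return True if two snippets compile to the same bytecode, constants
    and names (filenames and source positions are ignored)."""
    code_a = compile(src_a, "<a>", "exec")
    code_b = compile(src_b, "<b>", "exec")
    return (code_a.co_code == code_b.co_code
            and code_a.co_consts == code_b.co_consts
            and code_a.co_names == code_b.co_names)


# Redundant parentheses exist only in the source text. The parser discards
# them, so the bytecode that a bytecode-level JIT would consume is identical.
print(same_bytecode("x = (1 + 2)", "x = 1+2"))
print(same_bytecode("a, b = 1, 2", "(a, b) = (1, 2)"))
# A change in meaning, by contrast, does show up in the code object.
print(same_bytecode("x = 1", "x = 2"))
```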
We recommend that changes to the CPython bytecode compiler or the semantics of
individual bytecodes be prototyped in the interpreter loop first, then be ported
@ -765,6 +820,10 @@ Given the ease of integrating oProfile with LLVM [#llvm-oprofile-change]_ and
Unladen Swallow [#us-oprofile-change]_, integrating other profiling tools
should be similarly easy, provided they support a similar JIT interface
[#oprofile-jit-interface]_.
We have documented the process for using oProfile to profile Unladen Swallow
[#oprofile-workflow]_. This document will be merged into CPython's ``Doc/``
tree as part of the merge.
Addition of C++ to CPython
--------------------------
@ -781,12 +840,17 @@ Highlights:
- Easy use of LLVM's full, powerful code generation and related APIs.
- Convenient, abstract data structures simplify code.
- C++ is limited to relatively small corners of the CPython codebase.
- C++ can be disabled via ``./configure --without-llvm``, which even omits the
dependency on ``libstdc++``.
Lowlights:
- Developers must know two related languages, C and C++, to work on the full
range of CPython's internals.
- A C++ style guide will need to be developed and enforced. See `Open Issues`_.
- Different C++ compilers emit different ABIs; this can cause problems if
CPython is compiled with one C++ compiler and extension modules are compiled
with a different C++ compiler.
Managing LLVM Releases, C++ API Changes
@ -813,20 +877,26 @@ generally become available via standard system package managers fairly quickly
following an LLVM release, and failing that, llvm.org itself includes binary
releases.
Unladen Swallow has historically included a copy of the LLVM and Clang source
trees in the Unladen Swallow tree; this was done to allow us to closely track
LLVM trunk as we made patches to it. We do not recommend this model of
development for CPython. CPython releases should be based on official LLVM
releases. Pre-built LLVM packages are available from MacPorts [#llvm-macports]_
for Darwin, and from most major Linux distributions ([#llvm-ubuntu]_,
[#llvm-debian]_, [#llvm-fedora]_). LLVM itself provides additional binaries,
such as for MinGW [#llvm-mingw]_.
LLVM is currently intended to be statically linked; this means that binary
releases of CPython will include the relevant parts (not all!) of LLVM. This
will increase the binary size, as noted above. To simplify downstream package
management, we will modify LLVM to better support shared linking. This issue
will block final merger [#us-shared-link-issue]_.
Unladen Swallow has tasked a full-time engineer with fixing any remaining
critical issues in LLVM before LLVM's 2.7 release. We consider it essential that
CPython 3.x be able to depend on a released version of LLVM, rather than closely
tracking LLVM trunk as Unladen Swallow has done. We believe we will finish this
work [#us-llvm-punchlist]_ before the release of LLVM 2.7, expected in May 2010.
Building CPython
@ -868,27 +938,22 @@ Full builds take a hit due to a) additional ``.cc`` files needed for LLVM
interaction, b) statically linking LLVM into ``libpython``, c) compiling parts
of the Python runtime to LLVM IR to enable cross-language inlining.
Incremental builds are also somewhat slower than mainline CPython. The table
below shows incremental rebuild times after touching ``Objects/listobject.c``.
+-------------+---------------+---------------+-----------------------+
| Incr make   | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1024 |
+=============+===============+===============+=======================+
| Run 1       | 0m1.854s      | 0m1.456s      | 0m6.680s              |
+-------------+---------------+---------------+-----------------------+
| Run 2       | 0m1.437s      | 0m1.442s      | 0m5.310s              |
+-------------+---------------+---------------+-----------------------+
| Run 3       | 0m1.440s      | 0m1.425s      | 0m7.639s              |
+-------------+---------------+---------------+-----------------------+
As with full builds, this extra time comes from statically linking LLVM
into ``libpython``. If ``libpython`` were dynamically linked against a shared
LLVM library, this overhead would go down.
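Averaging the three runs makes the incremental-build overhead easier to compare. The sketch below parses the ``time``-style durations from the incremental rebuild table and reports each column's mean relative to CPython 3.1.1. The helper names are hypothetical.

```python
import re
from statistics import mean


def parse_time(text):
    """Parse a `time`-style duration such as '0m6.680s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Incremental rebuild runs after touching Objects/listobject.c (from the table).
runs = {
    "CPython 2.6.4": ["0m1.854s", "0m1.437s", "0m1.440s"],
    "CPython 3.1.1": ["0m1.456s", "0m1.442s", "0m1.425s"],
    "Unladen Swallow r1024": ["0m6.680s", "0m5.310s", "0m7.639s"],
}
averages = {name: mean(parse_time(t) for t in times) for name, times in runs.items()}
baseline = averages["CPython 3.1.1"]
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}s ({avg / baseline:.1f}x CPython 3.1.1)")
```

With these figures, Unladen Swallow r1024's mean incremental rebuild takes roughly 4.5 times as long as CPython 3.1.1's.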
Proposed Merge Plan
@ -930,6 +995,31 @@ See the `Open Issues`_ section for questions about code review policy for the
``py3k-jit`` branch.
Contingency Plans
-----------------
There is a chance that we will not be able to reduce memory usage or startup
time to a level satisfactory to the CPython community. Our primary contingency
plan for this situation is to shift from an online just-in-time compilation
strategy to an offline ahead-of-time strategy using an instrumented CPython
interpreter loop to obtain feedback. This is the same model used by gcc's
feedback-directed optimizations (``-fprofile-generate``) [#gcc-fdo]_ and
Microsoft Visual Studio's profile-guided optimizations [#msvc-pgo]_; we will
refer to this as "feedback-directed optimization" here, or FDO.
We believe that an FDO compiler for Python would be inferior to a JIT compiler.
FDO requires a high-quality, representative benchmark suite, which is a relative
rarity in both open- and closed-source development. A JIT compiler can
dynamically find and optimize the hot spots in any application, with or without
a benchmark suite, allowing it to adapt to changes in application bottlenecks
without human intervention.
If an ahead-of-time FDO compiler is required, it should be able to leverage a
large percentage of the code and infrastructure already developed for Unladen
Swallow's JIT compiler. Indeed, these two compilation strategies could exist
side-by-side.
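An FDO workflow has two phases: an instrumented training run that records feedback, and an offline compile step that consults it. The sketch below covers only the first phase. It uses ``sys.setprofile`` as a stand-in for an instrumented interpreter loop and counts calls per Python function. The threshold-based ``hot_functions`` policy, the helper names and the workload are hypothetical, and the offline compiler itself is out of scope.

```python
import sys
from collections import Counter


def collect_profile(func, *args, **kwargs):
    """Run func once under a profiling hook and count calls per Python function.

    Stands in for the instrumented interpreter loop of an FDO training run.
    """
    counts = Counter()

    def hook(frame, event, arg):
        # Count Python-level "call" events; C calls arrive as "c_call" and are ignored.
        if event == "call":
            code = frame.f_code
            counts[(code.co_filename, code.co_name, code.co_firstlineno)] += 1

    sys.setprofile(hook)
    try:
        func(*args, **kwargs)
    finally:
        # Note: this also clears any profiler that was installed beforehand.
        sys.setprofile(None)
    return counts


def hot_functions(counts, threshold):
    """Hypothetical offline policy: pick functions called at least `threshold` times."""
    return sorted(key for key, calls in counts.items() if calls >= threshold)


# Hypothetical training workload.
def square(x):
    return x * x


def workload():
    total = 0
    for i in range(100):
        total += square(i)
    return total


profile = collect_profile(workload)
for filename, name, line in hot_functions(profile, threshold=50):
    print(f"compile ahead of time: {name} ({filename}:{line})")
```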
Future Work
===========
@ -959,6 +1049,9 @@ had time to fully implement. Examples:
initially avoided a purely-tracing JIT compiler in favor of a simpler,
function-at-a-time compiler. However, this function-at-a-time compiler has laid
the groundwork for a future tracing compiler implemented in the same terms.
- Profile generation/reuse. The runtime data gathered by the JIT could be
persisted to disk and reused by subsequent JIT compilations, or by external
tools such as Cython [#cython]_ or a feedback-enhanced code coverage tool.
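Persisting runtime data mainly requires a stable on-disk format and a way to accumulate counts across runs. The sketch below stores ``(file, function, line) -> call count`` data as JSON records and merges each new run into the saved profile. The file format, the function names and the counts in the usage example are hypothetical, since Unladen Swallow does not yet persist its feedback.

```python
import json
import os
import tempfile
from collections import Counter


def save_profile(counts, path):
    """Write (file, function, line) -> call-count data as a list of JSON records."""
    records = [
        {"file": f, "name": n, "line": line, "calls": calls}
        for (f, n, line), calls in sorted(counts.items())
    ]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


def load_profile(path):
    """Read a saved profile back; a missing file is treated as an empty profile."""
    if not os.path.exists(path):
        return Counter()
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    return Counter({(r["file"], r["name"], r["line"]): r["calls"] for r in records})


def merge_into(path, new_counts):
    """Add one run's counts to the profile on disk, so data accumulates across runs."""
    merged = load_profile(path) + new_counts
    save_profile(merged, path)
    return merged


# Usage with made-up counts from two runs.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "profile.json")
    merge_into(path, Counter({("app.py", "parse", 10): 40}))
    merged = merge_into(path, Counter({("app.py", "parse", 10): 25,
                                       ("app.py", "emit", 30): 5}))
print(merged[("app.py", "parse", 10)])  # 65
```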
This list is by no means exhaustive. There is a vast literature on optimizations
for dynamic languages that could and should be implemented in terms of Unladen
@ -977,8 +1070,6 @@ Open Issues
organization. We would like a non-Google-affiliated member of the CPython
development team to review our work for correctness and compatibility, but we
realize this may not be possible for every commit.
- *Prioritization of remaining issues.* We would like input from the CPython
development team on how to prioritize the remaining issues in the Unladen
Swallow codebase. Some issues like memory usage are obviously critical before
@ -1007,6 +1098,10 @@ All work on Unladen Swallow is licensed to the Python Software Foundation (PSF)
under the terms of the Python Software Foundation License v2 [#psf-lic]_ under
the umbrella of Google's blanket Contributor License Agreement with the PSF.
LLVM is licensed [#llvm-lic]_ under the University of Illinois/NCSA Open Source
License [#ui-lic]_, a liberal, OSI-approved license. The University of Illinois
at Urbana-Champaign is the sole copyright holder for LLVM.
References
==========
@ -1026,9 +1121,6 @@ References
.. [#llvm-hardware]
http://llvm.org/docs/GettingStarted.html#hardware
.. [#rebuild-too-much]
http://code.google.com/p/unladen-swallow/issues/detail?id=115
.. [#llvm-c-api]
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/
@ -1077,6 +1169,9 @@ References
.. [#oprofile-jit-interface]
http://oprofile.sourceforge.net/doc/devel/jit-interface.html
.. [#oprofile-workflow]
http://code.google.com/p/unladen-swallow/wiki/UsingOProfile
.. [#llvm-mingw]
http://llvm.org/releases/download.html
@ -1179,6 +1274,12 @@ References
.. [#psf-lic]
http://www.python.org/psf/license/
.. [#llvm-lic]
http://llvm.org/docs/DeveloperPolicy.html#clp
.. [#ui-lic]
http://www.opensource.org/licenses/UoI-NCSA.php
.. [#v8]
http://code.google.com/p/v8/
@ -1296,6 +1397,54 @@ References
.. [#us-nbody]
http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_nbody.py
.. [#us-shared-link-issue]
http://code.google.com/p/unladen-swallow/issues/detail?id=130
.. [#us-llvm-punchlist]
http://code.google.com/p/unladen-swallow/issues/detail?id=131
.. [#llvm-ppc-eager-jit-issue]
http://llvm.org/PR4816
.. [#llvm-arm-jit-issue]
http://llvm.org/PR6065
.. [#cython]
http://www.cython.org/
.. [#shedskin]
http://shed-skin.blogspot.com/
.. [#shedskin-library-limits]
http://shedskin.googlecode.com/files/shedskin-tutorial-0.3.html
.. [#wpython]
http://code.google.com/p/wpython/
.. [#wpython-performance]
http://www.mail-archive.com/python-dev@python.org/msg45143.html
.. [#ironpython]
http://ironpython.net/
.. [#mono]
http://www.mono-project.com/
.. [#jython]
http://www.jython.org/
.. [#jython-c-ext]
http://wiki.python.org/jython/JythonFaq/GeneralInfo
.. [#pyv8]
http://code.google.com/p/pyv8/
.. [#gcc-fdo]
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
.. [#msvc-pgo]
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
Copyright
=========