Commit updated version of PEP 3146.
parent
ae27a4e100
commit
ba7a7e9df3
273
pep-3146.txt
|
@ -165,6 +165,50 @@ exhausted its full potential. We have tried to create a sufficiently flexible
|
|||
framework that the wider CPython development community can build upon it for
|
||||
years to come, extracting increased performance in each subsequent release.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
There are a number of alternative strategies for improving Python performance
|
||||
which we considered, but found unsatisfactory.
|
||||
|
||||
- *Cython, Shedskin*: Cython [#cython]_ and Shedskin [#shedskin]_ are both
|
||||
static compilers for Python. We view these as useful-but-limited workarounds
|
||||
for CPython's historically-poor performance. Shedskin does not support the
|
||||
full Python standard library [#shedskin-library-limits]_, while Cython
|
||||
requires manual Cython-specific annotations for optimum performance.
|
||||
|
||||
Static compilers like these are useful for writing extension modules without
|
||||
worrying about reference counting, but because they are static, ahead-of-time
|
||||
compilers, they cannot optimize the full range of code under consideration by
|
||||
a just-in-time compiler informed by runtime data.
|
||||
- *IronPython*: IronPython [#ironpython]_ is Python on Microsoft's .Net
|
||||
platform. It is not actively tested on Mono [#mono]_, meaning that it is
|
||||
essentially Windows-only, making it unsuitable as a general CPython
|
||||
replacement.
|
||||
- *Jython*: Jython [#jython]_ is a complete implementation of Python 2.5, but
|
||||
is significantly slower than Unladen Swallow (3-5x on measured benchmarks) and
|
||||
has no support for CPython extension modules [#jython-c-ext]_, which would
|
||||
make migration of large applications prohibitively expensive.
|
||||
- *Psyco*: Psyco [#psyco]_ is a specializing JIT compiler for CPython,
|
||||
implemented as an extension module. It primarily improves performance for
|
||||
numerical code. Pros: exists; makes some code faster. Cons: 32-bit only, with
|
||||
no plans for 64-bit support; supports x86 only; very difficult to maintain;
|
||||
incompatible with SSE2 optimized code due to alignment issues.
|
||||
- *PyPy*: PyPy [#pypy]_ has good performance on numerical code, but is slower
|
||||
than Unladen Swallow on non-numerical workloads. PyPy only supports 32-bit
|
||||
x86 code generation. It has poor support for CPython extension modules,
|
||||
making migration for large applications prohibitively expensive.
|
||||
- *PyV8*: PyV8 [#pyv8]_ is an alpha-stage experimental Python-to-JavaScript
|
||||
compiler that runs on top of V8. PyV8 does not implement the whole Python
|
||||
language, and has no support for CPython extension modules.
|
||||
- *WPython*: WPython [#wpython]_ is a wordcode-based reimplementation of
|
||||
CPython's interpreter loop. While it provides a modest improvement to
|
||||
interpreter performance [#wpython-performance]_, it is not an either-or
|
||||
substitute for a just-in-time compiler. An interpreter will never be as fast
|
||||
as optimized machine code. We view WPython and similar interpreter
|
||||
enhancements as complementary to our work, rather than as competitors.
|
||||
|
||||
|
||||
|
||||
Performance
|
||||
===========
|
||||
|
@ -411,6 +455,25 @@ Results from Unladen Swallow's ``startup`` benchmarks:
|
|||
Stddev: 0.00214 -> 0.00240: 1.1209x larger
|
||||
Timeline: http://tinyurl.com/yajn8fa
|
||||
|
||||
### bzr_startup ###
|
||||
Min: 0.067990 -> 0.097985: 1.4412x slower
|
||||
Avg: 0.084322 -> 0.111348: 1.3205x slower
|
||||
Significant (t=-37.432534, a=0.95)
|
||||
Stddev: 0.00793 -> 0.00643: 1.2330x smaller
|
||||
Timeline: http://tinyurl.com/ybdm537
|
||||
|
||||
### hg_startup ###
|
||||
Min: 0.016997 -> 0.024997: 1.4707x slower
|
||||
Avg: 0.026990 -> 0.036772: 1.3625x slower
|
||||
Significant (t=-53.104502, a=0.95)
|
||||
Stddev: 0.00406 -> 0.00417: 1.0273x larger
|
||||
Timeline: http://tinyurl.com/ycout8m
|
||||
|
||||
|
||||
``bzr_startup`` and ``hg_startup`` measure how long it takes Bazaar and
|
||||
Mercurial, respectively, to display their help screens. ``startup_nosite``
|
||||
runs ``python -S`` many times; usage of the ``-S`` option is rare, but we feel
|
||||
this gives a good indication of where increased startup time is coming from.
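
The ``N.NNNNx slower`` figures in the listings above are the ratio of the second timing to the first. A minimal sketch of that arithmetic follows, assuming a helper named ``slowdown`` that is purely illustrative and not part of the Unladen Swallow benchmark harness. The harness presumably works from unrounded samples, so the last printed digit can differ slightly from the listings.

```python
def slowdown(before, after):
    """Return how many times slower `after` is than `before` (both in seconds)."""
    if before <= 0:
        raise ValueError("baseline timing must be positive")
    return after / before


# Minimum timings copied from the bzr_startup and hg_startup listings above.
bzr_min = slowdown(0.067990, 0.097985)
hg_min = slowdown(0.016997, 0.024997)

print("bzr_startup min: %.4fx slower" % bzr_min)
print("hg_startup min: %.4fx slower" % hg_min)
```

A ratio above 1.0 means the second configuration is slower; a ratio below 1.0 would be reported as a speedup instead.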
|
||||
|
||||
Unladen Swallow has made headway toward optimizing startup time, but there is
|
||||
still more work to do and further optimizations to implement. Improving start-up
|
||||
|
@ -422,40 +485,31 @@ Binary Size
|
|||
-----------
|
||||
|
||||
Statically linking LLVM's code generation, analysis and optimization libraries
|
||||
significantly increases the size of the ``python`` binary.
|
||||
significantly increases the size of the ``python`` binary. The tables below
|
||||
report stripped on-disk binary sizes; the binaries are stripped to better
|
||||
correspond with the configurations used by system package managers. We feel this
|
||||
is the most realistic measure of any change in binary size.
|
||||
|
||||
|
||||
32-bit; gcc 4.0.3
|
||||
+-------------+---------------+---------------+-----------------------+
| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1041 |
+=============+===============+===============+=======================+
| 32-bit      | 1.3M          | 1.4M          | 12M                   |
+-------------+---------------+---------------+-----------------------+
| 64-bit      | 1.6M          | 1.6M          | 12M                   |
+-------------+---------------+---------------+-----------------------+
|
||||
|
||||
+-------------+---------------+---------------+----------------------+
| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
+=============+===============+===============+======================+
| Release     | 3.8M          | 4.0M          | 74M                  |
+-------------+---------------+---------------+----------------------+
| Debug       | 3.3M          | 3.6M          | 118M                 |
+-------------+---------------+---------------+----------------------+
|
||||
|
||||
64-bit; gcc 4.2.4
|
||||
The increased binary size is caused by statically linking LLVM's code
|
||||
generation, analysis and optimization libraries into the ``python`` binary.
|
||||
This can be straightforwardly addressed by modifying LLVM to better support
|
||||
shared linking and then using that, instead of the current static linking. For
|
||||
the moment, though, static linking provides an accurate look at the cost of
|
||||
linking against LLVM.
|
||||
|
||||
+-------------+---------------+---------------+----------------------+
| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
+=============+===============+===============+======================+
| Release     | 5.5M          | 5.7M          | 89M                  |
+-------------+---------------+---------------+----------------------+
| Debug       | 4.1M          | 4.4M          | 128M                 |
+-------------+---------------+---------------+----------------------+
|
||||
|
||||
The increased binary size is due to statically linking LLVM's code generation,
|
||||
analysis and optimization libraries into the ``python`` binary. This can be
|
||||
straightforwardly addressed by modifying LLVM to better support shared linking
|
||||
and then using that, instead of the current static linking. For the moment,
|
||||
though, static linking provides an accurate look at the cost of linking against
|
||||
LLVM.
|
||||
|
||||
Unladen Swallow recently experienced a regression in binary size, going from
|
||||
19MB in Unladen's 2009Q3 release up to the current 74MB shown in the table
|
||||
above. Resolution of this issue [#us-binary-size]_ will block final merger into
|
||||
the ``py3k`` branch.
|
||||
Even when statically linking, we believe there is still headroom to improve
|
||||
on-disk binary size by narrowing Unladen Swallow's dependencies on LLVM. This
|
||||
issue is actively being addressed [#us-binary-size]_.
|
||||
|
||||
|
||||
Performance Retrospective
|
||||
|
@ -610,7 +664,8 @@ especially LLVM's JIT compilation system [#llvm-hardware]_. LLVM's JIT has the
|
|||
best support on x86 and x86-64 systems, and these are the platforms where
|
||||
Unladen Swallow has received the most testing. We are confident in LLVM/Unladen
|
||||
Swallow's support for x86 and x86-64 hardware. PPC and ARM support exists, but
|
||||
is not widely used and may be buggy.
|
||||
is not widely used and may be buggy (for example, [#llvm-ppc-eager-jit-issue]_,
|
||||
[#llvm-far-call-issue]_, [#llvm-arm-jit-issue]_).
|
||||
|
||||
Unladen Swallow is known to work on the following operating systems: Linux,
|
||||
Darwin, Windows. Unladen Swallow has received the most testing on Linux and
|
||||
|
@ -631,7 +686,7 @@ Experimenting with Changes to Python or CPython Bytecode
|
|||
--------------------------------------------------------
|
||||
|
||||
Unladen Swallow's JIT compiler operates on CPython bytecode, and as such, it is
|
||||
immune to Python languages changes that only affect the parser.
|
||||
immune to Python language changes that affect only the parser.
|
||||
|
||||
We recommend that changes to the CPython bytecode compiler or the semantics of
|
||||
individual bytecodes be prototyped in the interpreter loop first, then be ported
|
||||
|
@ -765,6 +820,10 @@ Given the ease of integrating oProfile with LLVM [#llvm-oprofile-change]_ and
|
|||
Unladen Swallow [#us-oprofile-change]_, other profiling tools should be easy as
|
||||
well, provided they support a similar JIT interface [#oprofile-jit-interface]_.
|
||||
|
||||
We have documented the process for using oProfile to profile Unladen Swallow
|
||||
[#oprofile-workflow]_. This document will be added to CPython's ``Doc/`` tree
|
||||
in the merge.
|
||||
|
||||
|
||||
Addition of C++ to CPython
|
||||
--------------------------
|
||||
|
@ -781,12 +840,17 @@ Highlights:
|
|||
- Easy use of LLVM's full, powerful code generation and related APIs.
|
||||
- Convenient, abstract data structures simplify code.
|
||||
- C++ is limited to relatively small corners of the CPython codebase.
|
||||
- C++ can be disabled via ``./configure --without-llvm``, which even omits the
|
||||
dependency on ``libstdc++``.
|
||||
|
||||
Lowlights:
|
||||
|
||||
- Developers must know two related languages, C and C++, to work on the full
|
||||
range of CPython's internals.
|
||||
- A C++ style guide will need to be developed and enforced. See `Open Issues`_.
|
||||
- Different C++ compilers emit different ABIs; this can cause problems if
|
||||
CPython is compiled with one C++ compiler and extension modules are compiled
|
||||
with a different C++ compiler.
|
||||
|
||||
|
||||
Managing LLVM Releases, C++ API Changes
|
||||
|
@ -813,20 +877,26 @@ generally become available via standard system package managers fairly quickly
|
|||
following an LLVM release, and failing that, llvm.org itself includes binary
|
||||
releases.
|
||||
|
||||
Pre-built LLVM packages are available from MacPorts [#llvm-macports]_ for
|
||||
Darwin, and from most major Linux distributions ([#llvm-ubuntu]_,
|
||||
Unladen Swallow has historically included a copy of the LLVM and Clang source
|
||||
trees in the Unladen Swallow tree; this was done to allow us to closely track
|
||||
LLVM trunk as we made patches to it. We do not recommend this model of
|
||||
development for CPython. CPython releases should be based on official LLVM
|
||||
releases. Pre-built LLVM packages are available from MacPorts [#llvm-macports]_
|
||||
for Darwin, and from most major Linux distributions ([#llvm-ubuntu]_,
|
||||
[#llvm-debian]_, [#llvm-fedora]_). LLVM itself provides additional binaries,
|
||||
such as for MinGW [#llvm-mingw]_.
|
||||
|
||||
LLVM is currently intended to be statically linked; this means that binary
|
||||
releases of CPython will include the relevant parts (not all!) of LLVM. This
|
||||
will increase the binary size, as noted above.
|
||||
will increase the binary size, as noted above. To simplify downstream package
|
||||
management, we will modify LLVM to better support shared linking. This issue
|
||||
will block final merger [#us-shared-link-issue]_.
|
||||
|
||||
Unladen Swallow has tasked a full-time engineer with fixing any remaining
|
||||
critical issues in LLVM before LLVM's 2.7 release. We would like CPython 3.x to
|
||||
be able to depend on a released version of LLVM, rather than closely tracking
|
||||
LLVM trunk as Unladen Swallow has done. We believe we will finish this work
|
||||
before the release of LLVM 2.7, expected in May 2010.
|
||||
critical issues in LLVM before LLVM's 2.7 release. We consider it essential that
|
||||
CPython 3.x be able to depend on a released version of LLVM, rather than closely
|
||||
tracking LLVM trunk as Unladen Swallow has done. We believe we will finish this
|
||||
work [#us-llvm-punchlist]_ before the release of LLVM 2.7, expected in May 2010.
|
||||
|
||||
|
||||
Building CPython
|
||||
|
@ -868,27 +938,22 @@ Full builds take a hit due to a) additional ``.cc`` files needed for LLVM
|
|||
interaction, b) statically linking LLVM into ``libpython``, c) compiling parts
|
||||
of the Python runtime to LLVM IR to enable cross-language inlining.
|
||||
|
||||
Incremental builds, however, are significantly slower. The table below shows
|
||||
incremental rebuild times after touching ``Objects/listobject.c``.
|
||||
Incremental builds are also somewhat slower than mainline CPython. The table
|
||||
below shows incremental rebuild times after touching ``Objects/listobject.c``.
|
||||
|
||||
+-------------+---------------+---------------+----------------------+
| Incr make   | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
+=============+===============+===============+======================+
| Run 1       | 0m1.854s      | 0m1.456s      | 0m24.464s            |
+-------------+---------------+---------------+----------------------+
| Run 2       | 0m1.437s      | 0m1.442s      | 0m24.416s            |
+-------------+---------------+---------------+----------------------+
| Run 3       | 0m1.440s      | 0m1.425s      | 0m24.352s            |
+-------------+---------------+---------------+----------------------+
|
||||
+-------------+---------------+---------------+-----------------------+
| Incr make   | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1024 |
+=============+===============+===============+=======================+
| Run 1       | 0m1.854s      | 0m1.456s      | 0m6.680s              |
+-------------+---------------+---------------+-----------------------+
| Run 2       | 0m1.437s      | 0m1.442s      | 0m5.310s              |
+-------------+---------------+---------------+-----------------------+
| Run 3       | 0m1.440s      | 0m1.425s      | 0m7.639s              |
+-------------+---------------+---------------+-----------------------+
|
||||
|
||||
As with full builds, this extra time comes from a) additional ``.cc`` files
|
||||
needed for LLVM interaction, and b) statically linking LLVM into ``libpython``.
|
||||
|
||||
If ``libpython`` were linked shared against LLVM, this overhead would go down.
|
||||
Incremental builds of Unladen Swallow also currently (as of r988) suffer from a
|
||||
known bug in the Unladen Swallow ``Makefile`` [#rebuild-too-much]_ where too
|
||||
many ``.cc`` files are recompiled. We consider this a blocking issue for full
|
||||
merger with the ``py3k`` branch.
|
||||
As with full builds, this extra time comes from statically linking LLVM
|
||||
into ``libpython``. If ``libpython`` were linked shared against LLVM, this
|
||||
overhead would go down.
|
||||
|
||||
|
||||
Proposed Merge Plan
|
||||
|
@ -930,6 +995,31 @@ See the `Open Issues`_ section for questions about code review policy for the
|
|||
``py3k-jit`` branch.
|
||||
|
||||
|
||||
Contingency Plans
|
||||
-----------------
|
||||
|
||||
There is a chance that we will not be able to reduce memory usage or startup
|
||||
time to a level satisfactory to the CPython community. Our primary contingency
|
||||
plan for this situation is to shift from an online just-in-time compilation
|
||||
strategy to an offline ahead-of-time strategy using an instrumented CPython
|
||||
interpreter loop to obtain feedback. This is the same model used by gcc's
|
||||
feedback-directed optimizations (``-fprofile-generate``) [#gcc-fdo]_ and
|
||||
Microsoft Visual Studio's profile-guided optimizations [#msvc-pgo]_; we will
|
||||
refer to this as "feedback-directed optimization" here, or FDO.
|
||||
|
||||
We believe that an FDO compiler for Python would be inferior to a JIT compiler.
|
||||
FDO requires a high-quality, representative benchmark suite, which is a relative
|
||||
rarity in both open- and closed-source development. A JIT compiler can
|
||||
dynamically find and optimize the hot spots in any application -- benchmark
|
||||
suite or no -- allowing it to adapt to changes in application bottlenecks
|
||||
without human intervention.
|
||||
|
||||
If an ahead-of-time FDO compiler is required, it should be able to leverage a
|
||||
large percentage of the code and infrastructure already developed for Unladen
|
||||
Swallow's JIT compiler. Indeed, these two compilation strategies could exist
|
||||
side-by-side.
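
The feedback-gathering half of such an FDO workflow can be illustrated in pure Python. The sketch below is not Unladen Swallow's instrumentation: it assumes a representative workload is available as a callable, uses the standard ``sys.setprofile`` hook to count Python-level calls, and treats any function called at least ``threshold`` times as hot. The names ``collect_call_counts``, ``hot_functions``, ``helper`` and ``workload`` are hypothetical.

```python
import sys
from collections import Counter


def collect_call_counts(func, *args, **kwargs):
    """Run func under a profiling hook and count Python-level calls by name."""
    counts = Counter()

    def hook(frame, event, arg):
        # "call" fires when a Python function is entered; C calls are ignored.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    previous = sys.getprofile()
    sys.setprofile(hook)
    try:
        func(*args, **kwargs)
    finally:
        # Always restore whatever hook was installed before.
        sys.setprofile(previous)
    return counts


def hot_functions(counts, threshold):
    """Return the names called at least `threshold` times, sorted."""
    return sorted(name for name, n in counts.items() if n >= threshold)


def helper(x):
    return x * 2


def workload():
    # Stand-in for a representative benchmark run.
    total = 0
    for i in range(1000):
        total += helper(i)
    return total


counts = collect_call_counts(workload)
hot = hot_functions(counts, threshold=1000)
print(hot)
```

An offline compiler would consume this kind of data only after the run has finished, which is why the quality of the workload matters far more for FDO than for a JIT.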
|
||||
|
||||
|
||||
Future Work
|
||||
===========
|
||||
|
||||
|
@ -959,6 +1049,9 @@ had time to fully implement. Examples:
|
|||
initially avoided a purely-tracing JIT compiler in favor of a simpler,
|
||||
function-at-a-time compiler. However, this function-at-a-time compiler has laid
|
||||
the groundwork for a future tracing compiler implemented in the same terms.
|
||||
- Profile generation/reuse. The runtime data gathered by the JIT could be
|
||||
persisted to disk and reused by subsequent JIT compilations, or by external
|
||||
tools such as Cython [#cython]_ or a feedback-enhanced code coverage tool.
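
As a rough illustration of the profile generation/reuse idea, the sketch below persists per-function counts to disk and merges them across runs. It assumes a simple JSON file of ``{name: count}`` pairs; that file format and the ``load_profile`` and ``save_profile`` helpers are hypothetical and do not describe any format used by Unladen Swallow or Cython.

```python
import json
import os
import tempfile
from collections import Counter


def load_profile(path):
    """Load previously saved counts, or an empty Counter if no file exists."""
    if not os.path.exists(path):
        return Counter()
    with open(path) as f:
        return Counter(json.load(f))


def save_profile(path, counts):
    """Merge `counts` into the on-disk profile, write it back, and return it."""
    merged = load_profile(path)
    merged.update(counts)  # Counter.update adds counts instead of replacing them
    with open(path, "w") as f:
        json.dump(merged, f, sort_keys=True)
    return merged


# Two simulated runs writing to the same profile file (made-up numbers).
profile_path = os.path.join(tempfile.mkdtemp(), "profile.json")
save_profile(profile_path, {"parse": 120, "render": 3})
merged = save_profile(profile_path, {"parse": 80, "emit": 5})
print(dict(merged))
```

Accumulating counts across runs lets a later compilation, or an external tool, start from data gathered earlier rather than from an empty profile.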
|
||||
|
||||
This list is by no means exhaustive. There is a vast literature on optimizations
|
||||
for dynamic languages that could and should be implemented in terms of Unladen
|
||||
|
@ -977,8 +1070,6 @@ Open Issues
|
|||
organization. We would like a non-Google-affiliated member of the CPython
|
||||
development team to review our work for correctness and compatibility, but we
|
||||
realize this may not be possible for every commit.
|
||||
- *How to link LLVM.* Should we change LLVM to better support shared linking,
|
||||
and then use shared linking to link the parts of it we need into CPython?
|
||||
- *Prioritization of remaining issues.* We would like input from the CPython
|
||||
development team on how to prioritize the remaining issues in the Unladen
|
||||
Swallow codebase. Some issues like memory usage are obviously critical before
|
||||
|
@ -1007,6 +1098,10 @@ All work on Unladen Swallow is licensed to the Python Software Foundation (PSF)
|
|||
under the terms of the Python Software Foundation License v2 [#psf-lic]_ under
|
||||
the umbrella of Google's blanket Contributor License Agreement with the PSF.
|
||||
|
||||
LLVM is licensed [#llvm-lic]_ under the University of Illinois/NCSA Open Source
|
||||
License [#ui-lic]_, a liberal, OSI-approved license. The University of Illinois
|
||||
Urbana-Champaign is the sole copyright holder for LLVM.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
@ -1026,9 +1121,6 @@ References
|
|||
.. [#llvm-hardware]
|
||||
http://llvm.org/docs/GettingStarted.html#hardware
|
||||
|
||||
.. [#rebuild-too-much]
|
||||
http://code.google.com/p/unladen-swallow/issues/detail?id=115
|
||||
|
||||
.. [#llvm-c-api]
|
||||
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/
|
||||
|
||||
|
@ -1077,6 +1169,9 @@ References
|
|||
.. [#oprofile-jit-interface]
|
||||
http://oprofile.sourceforge.net/doc/devel/jit-interface.html
|
||||
|
||||
.. [#oprofile-workflow]
|
||||
http://code.google.com/p/unladen-swallow/wiki/UsingOProfile
|
||||
|
||||
.. [#llvm-mingw]
|
||||
http://llvm.org/releases/download.html
|
||||
|
||||
|
@ -1179,6 +1274,12 @@ References
|
|||
.. [#psf-lic]
|
||||
http://www.python.org/psf/license/
|
||||
|
||||
.. [#llvm-lic]
|
||||
http://llvm.org/docs/DeveloperPolicy.html#clp
|
||||
|
||||
.. [#ui-lic]
|
||||
http://www.opensource.org/licenses/UoI-NCSA.php
|
||||
|
||||
.. [#v8]
|
||||
http://code.google.com/p/v8/
|
||||
|
||||
|
@ -1296,6 +1397,54 @@ References
|
|||
.. [#us-nbody]
|
||||
http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_nbody.py
|
||||
|
||||
.. [#us-shared-link-issue]
|
||||
http://code.google.com/p/unladen-swallow/issues/detail?id=130
|
||||
|
||||
.. [#us-llvm-punchlist]
|
||||
http://code.google.com/p/unladen-swallow/issues/detail?id=131
|
||||
|
||||
.. [#llvm-ppc-eager-jit-issue]
|
||||
http://llvm.org/PR4816
|
||||
|
||||
.. [#llvm-arm-jit-issue]
|
||||
http://llvm.org/PR6065
|
||||
|
||||
.. [#cython]
|
||||
http://www.cython.org/
|
||||
|
||||
.. [#shedskin]
|
||||
http://shed-skin.blogspot.com/
|
||||
|
||||
.. [#shedskin-library-limits]
|
||||
http://shedskin.googlecode.com/files/shedskin-tutorial-0.3.html
|
||||
|
||||
.. [#wpython]
|
||||
http://code.google.com/p/wpython/
|
||||
|
||||
.. [#wpython-performance]
|
||||
http://www.mail-archive.com/python-dev@python.org/msg45143.html
|
||||
|
||||
.. [#ironpython]
|
||||
http://ironpython.net/
|
||||
|
||||
.. [#mono]
|
||||
http://www.mono-project.com/
|
||||
|
||||
.. [#jython]
|
||||
http://www.jython.org/
|
||||
|
||||
.. [#jython-c-ext]
|
||||
http://wiki.python.org/jython/JythonFaq/GeneralInfo
|
||||
|
||||
.. [#pyv8]
|
||||
http://code.google.com/p/pyv8/
|
||||
|
||||
.. [#gcc-fdo]
|
||||
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
|
||||
|
||||
.. [#msvc-pgo]
|
||||
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
|