Thoroughly clean up what impressions I have written so far.

This commit is contained in:
Brett Cannon 2009-01-24 23:06:31 +00:00
parent e9886ce329
commit 974c644da5
1 changed files with 102 additions and 86 deletions

@@ -1344,120 +1344,136 @@ merging are absent, and branches are handled as CVS modules, which
is likely to shock a veteran CVS user.
Impressions
===========
Tests/Impressions
=================
As I (Brett Cannon) am left with the task of making the final
decision of which/any DVCS to go with and not my co-authors, I felt
it only fair to write down my impressions as I evaluate the various
tools so as to be as transparent as possible.
it only fair to write down what tests I ran and my impressions as I
evaluate the various tools so as to be as transparent as possible.
To begin, I measured the checking out of code as if I was a non-core
developer. This is important as this is the first impression
developers have when they decide they wish to contribute a patch to
Python. Timings were done using the ``time`` command in zsh and
Barrier to Entry
----------------
The amount of time and effort it takes to get a checkout of Python's
repository is critical. If the difficulty or time is too great then a
person wishing to contribute to Python may very well give up. That
cannot be allowed to happen.
I measured the checking out of code as if I was a non-core
developer. Timings were done using the ``time`` command in zsh and
space was calculated with ``du -c -h``.
======= ================ ==============
DVCS    Time             Space
------- ---------------- --------------
svn     1:04             139 M
bzr     2:29:24 or 8:46  275 M or 596 M
hg      2:30             171 M
git     2:54             134 M
svn     1:04             139 M
bzr 1   2:29:24          275 M
bzr 2   8:46             596 M
hg      2:30             171 M
git     2:54             134 M
======= ================ ==============
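The measurements in the table above were taken with ``time`` in zsh and ``du -c -h``. As a rough illustration only, a similar measurement could be scripted as below. The helper names ``time_command`` and ``tree_size_bytes`` are hypothetical and are not part of any of the tools; the size helper sums apparent file sizes, which is an approximation and will not match ``du`` exactly, since ``du`` reports disk blocks.

```python
import os
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv to completion and return the elapsed wall-clock seconds.

    Hypothetical helper; stands in for running ``time <command>`` in a shell.
    """
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def tree_size_bytes(root):
    """Sum the apparent sizes of all regular files under root.

    Assumption: this only approximates ``du``, which counts disk blocks.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# Stand-in command so the sketch runs without any DVCS installed.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed:.2f} s")
```

To time a real checkout, the stand-in command would be replaced by the tool's clone or branch invocation, and ``tree_size_bytes`` would be pointed at the resulting directory.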
The svn measurements are not exactly a 1:1 comparison to the DVCSs.
For one, svn does not download the entire revision history, and thus
(should) have the least amount to download. And two, because various
calculation steps are left up to the server the entire process of
checking out code (should) be faster.
.. note::
   The *bzr 1* entry is for following the instructions in the
   `One-Off Checkout`_ scenario, pulling from Launchpad_ in
   mid-January.

   The *bzr 2* entry is based on following the instructions
   for the `experimental Bazaar branches
   <http://www.python.org/dev/bazaar/>`_ and pulling from
   http://code.python.org/python/trunk/.
But the svn measurements should be considered as what developers are
used to. Thus they act as a reference point for what people tend to
expect in terms of performance.
When comparing these numbers to svn, it is important to realize that
it is not a 1:1 comparison. Svn does not pull down the entire revision
history like all of the DVCSs do. That means svn can perform an
initial checkout much faster than the DVCS purely based on the fact
that it has less information to worry about.
Looking at bzr, I have listed two numbers. The first values are for
running ``bzr branch`` as outlined in the `One-Off Checkout`_
scenario. When the
timings came back in hours (I used Launchpad as code.python.org is
not running the newest version of bzr and I wanted to use its latest
networking protocol), I decided to try using the steps outlined when
the experimental bzr branches were first created. That second
approach is what the second set of values for bzr represent.
While both the hg and git numbers are perfectly acceptable, the bzr
numbers are not necessarily so. The raw ``bzr branch`` approach is
entirely unacceptable, as no one wants to wait over two hours to write
a potentially one-line change to some code for the benefit of Python.
Assuming 8:46 is a reasonable amount of time (I believe it generally
is, but it is teetering on not), the 596 M space requirement
could be an issue for some. While we typically view disk space as
cheap, for some people it might be an issue (e.g. the person who did
the schedule for PyCon 2008 did it over a connection so bad that
Google Spreadsheets didn't work for him and he had to submit the
schedule in another form than the one originally used). Once again I
think the space usage is acceptable, but it is close to being too
much.
Performance of basic information functionality
----------------------------------------------
To see if bzr's performance would be acceptable once at least the
branch was downloaded, I decided to see how long it would take to
get the change log for a file. I chose the README file as it sees
regular changes for every release and has a revision history going
back to 1993 and thus would have a fair number of revisions.
It should be mentioned that while git had the nicest output thanks to
its color terminal output, it also took a while to find the
``--no-pager`` flag in order to get just a stream of text instead of
having the output sent to the pager.
To see how the tools did for performing a command that required
querying the history, the log for the ``README`` file was timed.
Overall the numbers were all acceptable:
==== =====
DVCS Time
---- -----
bzr  4.5 s
hg   1.1 s
git  1.5 s
==== =====
* bzr: 4.5 seconds
* hg: 1.1 seconds
* git: 1.5 seconds
One thing of note during this test was that with git it took me longer
than with the other three tools to figure out how to get the log
without it using a pager. While the pager is a nice touch in general,
working out how to keep it from turning on automatically took some
time (it turns out the main ``git`` command has a ``--no-pager`` flag
to disable use of the pager).
While bzr is about 3x slower than its nearest neighbor, it
must be kept in mind that the total time is still
acceptable, regardless of the multiplier.
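The multipliers behind that comparison can be worked out directly from the log timings reported above. The following is only a small arithmetic sketch; the variable names are illustrative.

```python
# Log timings in seconds, copied from the figures reported above.
log_seconds = {"bzr": 4.5, "hg": 1.1, "git": 1.5}

# Express every timing as a multiple of the fastest tool.
fastest = min(log_seconds.values())
slowdown = {tool: secs / fastest for tool, secs in log_seconds.items()}

for tool, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{tool}: {log_seconds[tool]} s ({factor:.1f}x the fastest)")

# bzr relative to the next slowest tool, git.
bzr_vs_git = log_seconds["bzr"] / log_seconds["git"]
```

On these figures bzr takes 3 times as long as git and roughly 4 times as long as hg, yet the absolute difference is only a few seconds.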
Because a DVCS keeps its revision history on disk, it also means
that typically they can be zipped up for direct downloading. At
least in bzr's case that would solve the performance issue for
initial checkout if the zip file could be generated constantly. But
that didn't address the cost of pulling in new revisions when a
checkout has gone stale. To measure this I decided I would check out
the repositories back about 700 revisions which represented the
amount of change made since the beginning of the month and time how
long they took to update.
Figuring out what command to use from built-in help
----------------------------------------------------
For this to happen I first had to remember the URLs for the
repositories. Instead of simply looking in this PEP, though, I
decided to try to figure it out from the command-line help for each
tool or simply guessing. Bzr worked out great with ``bzr info``. Git
took a little poking around, but I figured out ``git remote show
origin`` told me what I needed. For hg, though, I couldn't figure it
out short of running ``hg pull`` and noting the status information
during the pull (turns out ``hg paths`` is what I was looking for).
I ended up trying to find out what the command was to see what URL the
repository was cloned from. To do this I used nothing more than the
help provided by the tool itself or its man pages.
With the repository locations known I then had to perform a checkout
to a certain revision. Turns out that git will not clone a
repository to only a specific revision, although from personal
experience git's pull facility is very fast. Bzr was able to perform
its update in just over 39 seconds. Hg did its update in just over
17 seconds. Much like the log test, while the multiplier of slowness
seems high, in real life terms all DVCSs performed within reason.
Bzr was the easiest: ``bzr info``. Running ``bzr help`` didn't show
what I wanted, but mentioned ``bzr help commands``. That list had the
command with a description that made sense.
In my mind this means that bzr is only an acceptable candidate as
long as a fairly up-to-date archive of Python's key branches is
made available for people to download to avoid bzr's very slow remote
branching.
Git was the second easiest. The command ``git help`` didn't show much
and did not have a way of listing all commands. That is when I viewed
the man page. Reading through the various commands I discovered ``git
remote``. The command itself spit out nothing more than ``origin``.
Trying ``git remote origin`` said it was an error and printed out the
command usage. That is when I noticed ``git remote show``. Running
``git remote show origin`` gave me the information I wanted.
For hg, I never found the information I wanted on my own. It turns out
I wanted ``hg paths``, but that was not obvious from the description
of "show definition of symbolic path names" as printed by ``hg help``.
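For quick reference, the three commands found above can be collected into a small lookup. This is a sketch only: ``ORIGIN_COMMANDS``, ``origin_command`` and ``show_origin`` are hypothetical names, and ``show_origin`` assumes the relevant tool is installed and that ``checkout_dir`` is an existing checkout made with it.

```python
import subprocess

# Commands that report where a checkout came from, as found in the text above.
ORIGIN_COMMANDS = {
    "bzr": ["bzr", "info"],
    "git": ["git", "remote", "show", "origin"],
    "hg": ["hg", "paths"],
}


def origin_command(tool):
    """Return the argument list that shows the source URL for the given tool."""
    try:
        return list(ORIGIN_COMMANDS[tool])
    except KeyError:
        raise ValueError(f"no known command for {tool!r}") from None


def show_origin(tool, checkout_dir):
    """Run the lookup command inside checkout_dir and return its output.

    Assumes the tool is installed and checkout_dir is a checkout made by it.
    """
    result = subprocess.run(origin_command(tool), cwd=checkout_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout


print(" ".join(origin_command("hg")))
```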
Updating a checkout
---------------------
To see how long it takes to update an outdated repository I timed both
updating a repository 700 commits behind and 50 commits behind (three
weeks stale and 1 week stale, respectively).
==== =========== ==========
DVCS 700 commits 50 commits
---- ----------- ----------
bzr  39 s        7 s
hg   17 s        3 s
git  N/A         4 s
==== =========== ==========
.. note::
   Git lacks a value for the *700 commits* scenario as it does
   not seem to allow checking out a repository at a specific
   revision.
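One way to read the update timings is as a cost per commit pulled. The sketch below does only that arithmetic on the figures from the table; the names are illustrative, and ``None`` stands for the N/A entry.

```python
# Update timings in seconds from the table above; None marks the N/A entry.
update_seconds = {
    "bzr": {700: 39, 50: 7},
    "hg": {700: 17, 50: 3},
    "git": {700: None, 50: 4},
}


def seconds_per_commit(timings):
    """Map each commit count to the average seconds per commit, skipping N/A."""
    return {commits: secs / commits
            for commits, secs in timings.items()
            if secs is not None}


per_commit = {tool: seconds_per_commit(timings)
              for tool, timings in update_seconds.items()}

for tool, costs in per_commit.items():
    for commits, cost in sorted(costs.items()):
        print(f"{tool}: {commits} commits at {cost:.3f} s per commit")
```

For both bzr and hg the per-commit figure is lower for the 700-commit update than for the 50-commit one, which may indicate a fixed overhead per update, although two data points are too few to say so with confidence.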
Git deserves special mention for its output from ``git pull``. It
not only lists the delta change information for each file but also
color-codes the information.
XXX ... usage on top of svn, filling in `Coordinated Development of a
New Feature`_ scenario
XXX ... to be continued
Chosen DVCS
===========
XXX
::
    import random
    print(random.choice(['svn', 'bzr', 'hg', 'git']))
Transition Plan