Thoroughly clean up what impressions I have written so far.
parent e9886ce329
commit 974c644da5
188
pep-0374.txt
|
@@ -1344,120 +1344,136 @@ merging are absent, and branches are handled as CVS modules, which
|
|||
is likely to shock a veteran CVS user.
|
||||
|
||||
|
||||
Impressions
|
||||
===========
|
||||
Tests/Impressions
|
||||
=================
|
||||
|
||||
As I (Brett Cannon) am left with the task of making the final
|
||||
decision of which/any DVCS to go with and not my co-authors, I felt
|
||||
it only fair to write down my impressions as I evaluate the various
|
||||
tools so as to be as transparent as possible.
|
||||
it only fair to write down what tests I ran and my impressions as I
|
||||
evaluate the various tools so as to be as transparent as possible.
|
||||
|
||||
To begin, I measured the checking out of code as if I was a non-core
|
||||
developer. This is important as this is the first impression
|
||||
developers have when they decide they wish to contribute a patch to
|
||||
Python. Timings were done using the ``time`` command in zsh and
|
||||
|
||||
Barrier to Entry
|
||||
----------------
|
||||
|
||||
The amount of time and effort it takes to get a checkout of Python's
|
||||
repository is critical. If the difficulty or time is too great then a
|
||||
person wishing to contribute to Python may very well give up. That
|
||||
cannot be allowed to happen.
|
||||
|
||||
I measured the checking out of code as if I was a non-core
|
||||
developer. Timings were done using the ``time`` command in zsh and
|
||||
space was calculated with ``du -c -h``.
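
The measurements above were taken by hand with ``time`` and ``du -c -h``.
As a rough illustration only, the same two numbers could be collected from
Python along the following lines. The helper names ``time_command`` and
``disk_usage`` are hypothetical and are not part of this PEP; note also that
summing file sizes reports apparent size, which can differ from the block
usage that ``du`` reports.

```python
import os
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def disk_usage(path):
    """Sum the apparent size, in bytes, of every regular file under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a link target is not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total
```

A checkout could then be measured with something like
``time_command(["hg", "clone", url, "trunk"])`` followed by
``disk_usage("trunk")``, where ``url`` and ``trunk`` are placeholders for the
repository location and the destination directory.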
|
||||
|
||||
======= ================ ==============
|
||||
DVCS    Time             Space
|
||||
------- ---------------- --------------
|
||||
svn     1:04             139 M
|
||||
bzr     2:29:24 or 8:46  275 M or 596 M
|
||||
hg      2:30             171 M
|
||||
git     2:54             134 M
|
||||
svn     1:04             139 M
|
||||
bzr 1   2:29:24          275 M
|
||||
bzr 2   8:46             596 M
|
||||
hg      2:30             171 M
|
||||
git     2:54             134 M
|
||||
======= ================ ==============
|
||||
|
||||
The svn measurements are not exactly a 1:1 comparison to the DVCSs.
|
||||
For one, svn does not download the entire revision history, and thus
|
||||
(should) have the least amount to download. And two, because various
|
||||
calculation steps are left up to the server the entire process of
|
||||
checking out code (should) be faster.
|
||||
.. note::
|
||||
The *bzr 1* entry is for
|
||||
following the instructions in the `One-Off Checkout`_ scenario
|
||||
and pulling from Launchpad_ in mid-January.
|
||||
The *bzr 2* entry is based on following the instructions
|
||||
for the `experimental Bazaar branches
|
||||
<http://www.python.org/dev/bazaar/>`_ and pulling from
|
||||
http://code.python.org/python/trunk/.
|
||||
|
||||
But the svn measurements should be considered as what developers are
|
||||
used to. Thus they act as a reference point for what people tend to
|
||||
expect in terms of performance.
|
||||
When comparing these numbers to svn, it is important to realize that
|
||||
it is not a 1:1 comparison. Svn does not pull down the entire revision
|
||||
history like all of the DVCSs do. That means svn can perform an
|
||||
initial checkout much faster than the DVCS purely based on the fact
|
||||
that it has less information to worry about.
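
To put the checkout times on a common footing, they can be converted to
seconds and expressed relative to svn. The sketch below does only that
arithmetic, using the times quoted in the table in this section. The names
``to_seconds``, ``CHECKOUT`` and ``relative_to`` are illustrative, and the
three-part value is assumed to be in ``h:mm:ss`` form.

```python
def to_seconds(clock):
    """Convert 'm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Checkout times copied from the table in this section.
CHECKOUT = {
    "svn": "1:04",
    "bzr 1": "2:29:24",
    "bzr 2": "8:46",
    "hg": "2:30",
    "git": "2:54",
}


def relative_to(baseline, timings=CHECKOUT):
    """Return each tool's checkout time as a multiple of the baseline's."""
    base = to_seconds(timings[baseline])
    return {tool: to_seconds(t) / base for tool, t in timings.items()}


for tool, factor in relative_to("svn").items():
    print(f"{tool:6} {factor:6.1f}x")
```

On these figures hg and git land between two and three times the svn time,
while the two bzr approaches come out at roughly 8x and 140x.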
|
||||
|
||||
Looking at bzr, I have listed two numbers. The first values are for
|
||||
running ``bzr branch`` as outlined in the `One-Off Checkout`_
|
||||
scenario. When the
|
||||
timings came back in hours (I used Launchpad as code.python.org is
|
||||
not running the newest version of bzr and I wanted to use its latest
|
||||
networking protocol), I decided to try using the steps outlined when
|
||||
the experimental bzr branches were first created. That second
|
||||
approach is what the second set of values for bzr represent.
|
||||
|
||||
While both the hg and git numbers are perfectly acceptable, the bzr
|
||||
numbers are not necessarily so. The raw ``bzr branch`` approach is entirely
|
||||
not acceptable as no one wants to wait over two hours to write a
|
||||
potentially one-line change to some code for the benefit of Python.
|
||||
Assuming 8:46 is a reasonable amount of time (I believe it generally
|
||||
is, but it is borderline), the 596 M space requirement
|
||||
could be an issue for some. While we typically view disk space as
|
||||
cheap, for some people it might be an issue (e.g. the person who did
|
||||
the schedule for PyCon 2008 did it over a connection so bad that
|
||||
Google Spreadsheets didn't work for him and he had to submit the
|
||||
schedule in another form than the one originally used). Once again I
|
||||
think the space usage is acceptable, but it is close to being too
|
||||
much.
|
||||
Performance of basic information functionality
|
||||
----------------------------------------------
|
||||
|
||||
To see if bzr's performance would be acceptable once at least the
|
||||
branch was downloaded, I decided to see how long it would take to
|
||||
get the change log for a file. I chose the README file as it sees
|
||||
regular changes for every release and has a revision history going
|
||||
back to 1993 and thus would have a fair number of revisions.
|
||||
It should be mentioned that while git had the nicest output thanks to
|
||||
its color terminal output, it also took a while to find the
|
||||
``--no-pager`` flag in order to get just a stream of text instead of
|
||||
having the output sent to the pager.
|
||||
To see how the tools did for performing a command that required
|
||||
querying the history, the log for the ``README`` file was timed.
|
||||
|
||||
Overall the numbers were all acceptable:
|
||||
==== =====
|
||||
DVCS Time
|
||||
---- -----
|
||||
bzr  4.5 s
|
||||
hg   1.1 s
|
||||
git  1.5 s
|
||||
==== =====
|
||||
|
||||
* bzr: 4.5 seconds
|
||||
* hg: 1.1 seconds
|
||||
* git: 1.5 seconds
|
||||
One thing of note during this test was that, with git, it took me longer than
|
||||
with the other three tools to figure out how to get the log without it using a
|
||||
pager. While the pager use is a nice touch in general, working out how to keep
|
||||
it from turning on automatically took some time (turns out the main ``git``
|
||||
command has a ``--no-pager`` flag to disable use of the pager).
|
||||
|
||||
While bzr is roughly 3x slower than its nearest neighbor, it
|
||||
must be kept in mind that the total performance time is still
|
||||
acceptable, regardless of the multiplier.
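
The multiplier can be checked directly from the log timings quoted in this
section. This is only a small arithmetic sketch; ``LOG_SECONDS`` and
``slowdown`` are names introduced here for illustration.

```python
# Time to print the log of README, in seconds, as quoted in this section.
LOG_SECONDS = {"bzr": 4.5, "hg": 1.1, "git": 1.5}


def slowdown(timings):
    """Return each tool's time as a multiple of the fastest tool's time."""
    fastest = min(timings.values())
    return {tool: t / fastest for tool, t in timings.items()}


# bzr against its nearest neighbor (git), and against the fastest tool (hg).
print(LOG_SECONDS["bzr"] / LOG_SECONDS["git"])
print(round(slowdown(LOG_SECONDS)["bzr"], 1))
```

Either way the absolute gap is only a few seconds, which is the point the
paragraph above makes.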
|
||||
|
||||
Because a DVCS keeps its revision history on disk, it also means
|
||||
that typically they can be zipped up for direct downloading. At
|
||||
least in bzr's case that would solve the performance issue for
|
||||
initial checkout if the zip file could be generated constantly. But
|
||||
that didn't address the cost of pulling in new revisions when a
|
||||
checkout has gone stale. To measure this I decided I would check out
|
||||
the repositories back about 700 revisions which represented the
|
||||
amount of change made since the beginning of the month and time how
|
||||
long they took to update.
|
||||
Figuring out what command to use from built-in help
|
||||
----------------------------------------------------
|
||||
|
||||
For this to happen I first had to remember the URLs for the
|
||||
repositories. Instead of simply looking in this PEP, though, I
|
||||
decided to try to figure it out from the command-line help for each
|
||||
tool or simply guessing. Bzr worked out great with ``bzr info``. Git
|
||||
took a little poking around, but I figured out ``git remote show
|
||||
origin`` told me what I needed. For hg, though, I couldn't figure it
|
||||
out short of running ``hg pull`` and noting the status information
|
||||
during the pull (turns out ``hg paths`` is what I was looking for).
|
||||
I ended up trying to find out what the command was to see what URL the
|
||||
repository was cloned from. To do this I used nothing more than the
|
||||
help provided by the tool itself or its man pages.
|
||||
|
||||
With the repository locations known I then had to perform a checkout
|
||||
to a certain revision. Turns out that git will not clone a
|
||||
repository to only a specific revision, although from personal
|
||||
experience git's pull facility is very fast. Bzr was able to perform
|
||||
its update in just over 39 seconds. Hg did its update in just over
|
||||
17 seconds. Much like the log test, while the multiplier of slowness
|
||||
seems high, in real life terms all DVCSs performed within reason.
|
||||
Bzr was the easiest: ``bzr info``. Running ``bzr help`` didn't show
|
||||
what I wanted, but mentioned ``bzr help commands``. That list had the
|
||||
command with a description that made sense.
|
||||
|
||||
In my mind this means that bzr is only an acceptable candidate as
|
||||
long as a fairly up-to-date archive of Python's key branches is
|
||||
made available for people to download to avoid bzr's very slow remote
|
||||
branching.
|
||||
Git was the second easiest. The command ``git help`` didn't show much
|
||||
and did not have a way of listing all commands. That is when I viewed
|
||||
the man page. Reading through the various commands I discovered ``git
|
||||
remote``. The command itself spit out nothing more than ``origin``.
|
||||
Trying ``git remote origin`` said it was an error and printed out the
|
||||
command usage. That is when I noticed ``git remote show``. Running
|
||||
``git remote show origin`` gave me the information I wanted.
|
||||
|
||||
For hg, I never found the information I wanted on my own. It turns out
|
||||
I wanted ``hg paths``, but that was not obvious from the description
|
||||
of "show definition of symbolic path names" as printed by ``hg help``.
|
||||
|
||||
|
||||
Updating a checkout
|
||||
---------------------
|
||||
|
||||
To see how long it takes to update an outdated repository I timed both
|
||||
updating a repository 700 commits behind and 50 commits behind (three
|
||||
weeks stale and 1 week stale, respectively).
|
||||
|
||||
==== =========== ==========
|
||||
DVCS 700 commits 50 commits
|
||||
---- ----------- ----------
|
||||
bzr  39 s        7 s
|
||||
hg   17 s        3 s
|
||||
git  N/A         4 s
|
||||
==== =========== ==========
|
||||
|
||||
.. note::
|
||||
Git lacks a value for the *700 commits* scenario as it does
|
||||
not seem to allow checking out a repository at a specific
|
||||
revision.
|
||||
|
||||
Git deserves special mention for its output from ``git pull``. It
|
||||
not only lists the delta change information for each file but also
|
||||
color-codes the information.
|
||||
|
||||
|
||||
XXX ... usage on top of svn, filling in `Coordinated Development of a
|
||||
New Feature`_ scenario
|
||||
|
||||
XXX ... to be continued
|
||||
|
||||
|
||||
Chosen DVCS
|
||||
===========
|
||||
|
||||
XXX
|
||||
::
|
||||
|
||||
import random
|
||||
print(random.choice(['svn', 'bzr', 'hg', 'git']))
|
||||
|
||||
|
||||
Transition Plan
|
||||