PEP: 374
Title: Migrating from svn to a distributed VCS
Version: $Revision: 65628 $
Last-Modified: $Date: 2008-08-10 06:59:20 -0700 (Sun, 10 Aug 2008) $
Author: Brett Cannon <brett@python.org>,
        Stephen J. Turnbull <stephen@xemacs.org>,
        Alexandre Vassalotti <alexandre@peadrop.com>,
        Barry Warsaw <barry@python.org>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 07-Nov-2008
Post-History: 07-Nov-2008
              22-Jan-2009

.. warning::
    This PEP is in the draft stages and is still under active
    development.

Rationale
=========

Python has been using a centralized version control system (VCS;
first CVS, now Subversion) for years to great effect. Having a master
copy of the official version of Python provides people with a single
place to always get the official Python source code. It has also
allowed for the storage of the history of the language, mostly for
help with development, but also for posterity. And of course the V in
VCS is very helpful when developing.

But a centralized version control system has its drawbacks. First and
foremost, in order to have the benefits of version control with
Python in a seamless fashion, one must be a "core developer" (i.e.
someone with commit privileges on the master copy of Python). People
who are not core developers but who wish to work with Python's
revision tree, e.g. anyone writing a patch for Python or creating a
custom version, do not have direct tool support for revisions. This
can be quite a limitation, since these non-core developers cannot
easily do basic tasks such as reverting changes to a previously
saved state, creating branches, publishing one's changes with full
revision history, etc.  For non-core developers, the last safe tree
state is one the Python developers happen to set, and this prevents
safe development. This second-class citizenship is a hindrance to
people who wish to contribute to Python with a patch of any
complexity and want a way to incrementally save their progress to
make their development lives easier.

There is also the issue of having to be online to be able to commit
one's work. Because centralized VCSs keep a central copy that stores
all revisions, one must have Internet access in order for one's
revisions to be stored; no Net, no commit. This can be annoying if
you happen to be traveling without Internet access. There is also
the situation of someone who wishes to contribute to Python but has
a slow or expensive Internet connection, for whom committing is
time-consuming and it might work out better to send everything in a
single step.

Another drawback of a centralized VCS shows up in a common use case:
a developer revising patches in response to review comments.  This
is more difficult with a centralized model because there's no place
to contain intermediate work.  It's either all checked in or none of
it is checked in.  In a centralized VCS, it's also very difficult to
track changes to the trunk as they are committed while you're working
on your feature or bug fix branch.  This increases the risk that such
branches will grow stale, or that merging them into the trunk will
generate too many conflicts to be easily resolved.

Lastly, there is the issue of maintenance of Python. At any one time
there is at least one major version of Python under development (at
the time of this writing there are two). For each major version of
Python under development there is at least the maintenance version
of the last minor version and the in-development minor version (e.g.
with 2.6 just released, that means that both 2.6 and 2.7 are being
worked on). Once a release is done, the code bases branch: changes
made in one version do not (but could) belong in the other version.
Right now there is no natural support for this branch in time in
centralized VCSs; you must use tools that simulate the branching.
Tracking merges is similarly painful for developers, as revisions
often need to be merged between four active branches (e.g. 2.6
maintenance, 3.0 maintenance, 2.7 development, 3.1 development).  In
this case, VCSs such as Subversion only handle this through arcane
third-party tools.

Distributed VCSs (DVCSs) solve all of these problems. While one can
keep a master copy of a revision tree, anyone is free to copy that
tree for their own use. This gives everyone the power to commit
changes to their copy, online or offline. It also more naturally
ties into the idea of branching in the history of a revision tree
for maintenance and the development of new features bound for
Python.  DVCSs also provide a great many additional features that
centralized VCSs don't or can't provide.

This PEP explores the issue of replacing Python's use of Subversion
with one of the DVCSs considered below, in order to gain the benefits
outlined above.

Terminology
===========

Agreeing on a common terminology is surprisingly difficult,
primarily because each VCS uses these terms when describing subtly
different tasks, objects, and concepts.  Where possible, we try to
provide a generic definition of the concepts, but you should consult
the individual system's glossaries for details.  Here are some basic
references for terminology, taken from the standard web-based
documentation of each VCS:

* Subversion : http://svnbook.red-bean.com/en/1.5/svn.basic.html
* Bazaar : http://bazaar-vcs.org/BzrGlossary
* Mercurial : http://www.selenic.com/mercurial/wiki/index.cgi/UnderstandingMercurial
* git : http://book.git-scm.com/1_the_git_object_model.html

branch
    A line of development; a collection of revisions, ordered by
    time.

checkout/working copy/working tree
    A tree of code the developer can edit, linked to a branch.

index
    A "staging area" where a revision is built (unique to git).

repository
    A collection of revisions, organized into branches.

clone
    A complete copy of a branch or repository.

commit
    To record a revision in a repository.

merge
    Applying all the changes and history from one branch/repository
    to another.

pull
    To update a checkout/clone from the original branch/repository,
    which can be remote or local.

push/publish
    To copy a revision, and all revisions it depends on, from one
    repository to another.

cherry-pick
    To merge one or more specific revisions from one branch to
    another, possibly in a different repository, possibly without its
    dependent revisions.

rebase
    To "detach" a branch and move it to a new branch point; that is,
    to replay the branch's commits on top of a new base (typically
    the current tip of the branch it was started from) instead of
    where they happened in time.

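
To make the *rebase* entry concrete, here is a minimal sketch using
git, one of the contenders discussed below.  It assumes git and a
POSIX shell are available and works in a throwaway directory created
by ``mktemp -d``; the branch and file names (``feature``, ``a.txt``,
``b.txt``) are hypothetical.  After the rebase, the commit made on
``feature`` sits directly on top of the newest ``master`` commit
instead of on the older commit it was originally based on.

```shell
#!/bin/sh
# Minimal illustration of "rebase" with git in a scratch repository.
# All branch and file names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # don't depend on init.defaultBranch
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com

echo base > a.txt
git add a.txt
git commit -q -m 'Base commit.'

# Start a feature branch at the base commit and commit on it.
git checkout -q -b feature
echo feature > b.txt
git add b.txt
git commit -q -m 'Feature work.'

# Meanwhile, the mainline moves on.
git checkout -q master
echo mainline >> a.txt
git commit -q -a -m 'Mainline work.'

# Replay the feature commit on top of the new master tip.
git checkout -q feature
git rebase -q master
```

Because rebasing rewrites the rebased commits (they get new commit
ids), it is normally reserved for work that has not been published
yet.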
Typical Workflow
================

At the moment, the typical workflow for a Python core developer is:

* Edit code in a checkout until it is stable enough to commit/push.
* Commit to the master repository.

It is a rather simple workflow, but it has drawbacks. For one,
because any operation that involves the repository takes time due to
the network, commits tend not to be as atomic as they could be.
There is also no cheap way to create a new checkout other than a
recursive copy of an existing checkout directory.

A DVCS would lead to a workflow more like this:

* Branch off of a local clone of the master repository.
* Edit code, committing in atomic pieces.
* Merge the branch into the mainline, and
* Push all commits to the master repository.

While there are more possible steps, the workflow is much more
independent of the master repository than is currently possible. By
being able to commit locally at the speed of your disk, a core
developer is able to do atomic commits much more frequently,
minimizing commits that do multiple things to the code. Also, by
using a branch, the changes are isolated (if desired) from other
changes being made by other developers.  Because branches are cheap,
it is easy to create and maintain many smaller branches that address
one specific issue, e.g. one bug or one new feature.  More
sophisticated features of DVCSs allow the developer to more easily
track long-running development branches as the official mainline
progresses.

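
The DVCS workflow above can be sketched with git as follows.  This is
an illustration under stated assumptions, not a prescribed procedure:
a bare repository created in a temporary directory stands in for the
master repository (in practice it would live on a server such as
code.python.org), and all branch and file names are hypothetical.

```shell
#!/bin/sh
# Sketch of the DVCS workflow with git.  A local bare repository stands
# in for the master repository; all names here are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"
git init -q --bare master.git            # stand-in for the master repository

# Local clone of the master repository, seeded with one commit.
git clone -q master.git python
cd python
git symbolic-ref HEAD refs/heads/master
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com
echo 'Python' > README
git add README
git commit -q -m 'Initial import.'
git push -q origin master

# 1. Branch off of the local clone.
git checkout -q -b fix-readme
# 2. Edit code, committing in atomic pieces (no network needed).
echo 'First fix.' >> README
git commit -q -a -m 'First atomic change.'
echo 'Second fix.' >> README
git commit -q -a -m 'Second atomic change.'
# 3. Merge the branch into the mainline.
git checkout -q master
git merge -q fix-readme
# 4. Push all commits to the master repository.
git push -q origin master
```

Apart from cloning and pushing, every step works purely on the local
clone, which is what makes frequent atomic commits cheap.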
Contenders
==========

========== ========== ======= =================================== ==========================================
Name       Short Name Version 2.x Trunk Mirror                    3.x Trunk Mirror
---------- ---------- ------- ----------------------------------- ------------------------------------------
Bazaar_    bzr        1.10    http://code.python.org/python/trunk http://code.python.org/python/3.0
Mercurial_ hg         1.1     http://code.python.org/hg/trunk/    http://code.python.org/hg/branches/py3k/
git_       N/A        1.6.0.6 git://code.python.org/python/trunk  git://code.python.org/python/branches/py3k
========== ========== ======= =================================== ==========================================

.. _Bazaar: http://bazaar-vcs.org/
.. _Mercurial: http://www.selenic.com/mercurial/
.. _git: http://www.git-scm.com/

This PEP does not consider darcs, arch, or monotone. The main
problem with these DVCSs is that they are simply not popular enough
to be worth supporting, given that they do not provide any very
compelling features that the other DVCSs lack.  Arch and darcs also
have significant performance problems which seem unlikely to be
addressed in the near future.

Usage Scenarios
===============

Probably the best way to help decide on whether/which DVCS should
replace Subversion is to see what it takes to perform some
real-world usage scenarios that developers (core and non-core) have
to work with. Each usage scenario outlines what it is, a bullet list
of what the basic steps are (which can vary slightly per VCS), and
how to perform the usage scenario in the various VCSs
(including Subversion).

Each VCS had a single author in charge of writing implementations
for each scenario (unless otherwise noted).

========= ===
Name      VCS
--------- ---
Brett     svn
Barry     bzr
Alexandre hg
Stephen   git
========= ===

Initial Setup
-------------

Some DVCSs work better if you do some initial setup up front.
This section covers what can be done before any of the usage
scenarios are run in order to take better advantage of the tools.

All of the DVCSs support configuring your project identification.
Unlike the centralized systems, they use your email address to
identify your commits.  (Access control is generally done by
mechanisms external to the DVCS, such as ssh or console login.)
This identity may be associated with a full name.

All of the DVCSs will query the system to get some approximation to
this information, but that may not be what you want.  They also
support setting this information on a per-user basis and on a
per-project basis.  Convenience commands to set these attributes
vary, but all allow direct editing of configuration files.

Some VCSs support end-of-line (EOL) conversions on checkout/checkin.

svn
'''

None required, but it is recommended you follow the
`guidelines <http://www.python.org/dev/faq/#what-configuration-settings-should-i-use>`_
in the dev FAQ.

bzr
'''

No setup is required, but for much quicker and more space-efficient
local branching, you should create a shared repository to hold all
your Python branches.  A shared repository is really just a parent
directory containing a .bzr directory.  When bzr commits a revision,
it searches from the local directory on up the file system for a .bzr
directory to hold the revision.  By sharing revisions across multiple
branches, you cut down on the amount of disk space used.  Do this::

  cd ~/projects
  bzr init-repo python
  cd python

Now, all your Python branches should be created inside of
``~/projects/python``.

There are also some settings you can put in your
``~/.bazaar/bazaar.conf`` and ``~/.bazaar/locations.conf`` files to
set up defaults for interacting with Python code.  None of them are
required, although some are recommended.  E.g. I would suggest gpg
signing all commits, but that might be too high a barrier for
developers.  Also, you can set up default push locations depending
on where you want to push branches by default.  If you have write
access to the master branches, that push location could be
code.python.org.  Otherwise, it might be a free Bazaar code hosting
service such as Launchpad.  If Bazaar is chosen, we should decide
what the policies and recommendations are.

At a minimum, I would set up your email address::

  bzr whoami "Firstname Lastname <email.address@example.com>"

hg
''

Minimally, you should set your user name. To do so, create the file
``.hgrc`` in your home directory and add the following::

  [ui]
  username = Firstname Lastname <email.address@example.com>

If you are using Windows, enable the win32text extension to use
Unix-style newlines by adding to your configuration::

  [extensions]
  win32text =

These options can also be set locally to a given repository by
customizing ``<repo>/.hg/hgrc``, instead of ``~/.hgrc``.

git
'''

None needed.  However, git supports a number of features that can
smooth your work, with a little preparation.  git supports setting
defaults at the workspace, user, and system levels.  The system
level is out of scope of this PEP.  The user configuration file is
``$HOME/.gitconfig`` on Unix-like systems, and the workspace
configuration file is ``$REPOSITORY/.git/config``.

You can use the ``git-config`` tool to set preferences for user.name and
user.email either globally (for your system login account) or
locally (to a given git working copy), or you can edit the
configuration files (which have the same format as shown in the
Mercurial section above)::

  # my full name doesn't change
  # note "--global" flag means per user
  # (system-wide configuration is set with "--system")
  git config --global user.name 'Firstname Lastname'
  # but use my Pythonic email address
  cd /path/to/python/repository
  git config user.email email.address@python.example.com

If you are using Windows, you probably want to set the core.autocrlf
and core.safecrlf preferences to true using ``git-config``::

  # check out files with CRLF line endings rather than Unix-style LF only,
  # and check them back in with the reverse transformation
  git config --global core.autocrlf true
  # scream if a transformation would be ambiguous
  # (eg, a working file contains both naked LF and CRLF)
  git config --global core.safecrlf true

Although the repository will usually contain a .gitignore file
specifying file names that rarely if ever should be registered in the
VCS, you may have personal conventions (e.g., always editing log
messages in a temporary file named ".msg") that you may wish to
specify::

  # tell git where my personal ignores are
  git config --global core.excludesfile ~/.gitignore
  # I use .msg for my long commit logs, and Emacs makes backups in
  # files ending with ~
  # these are globs, not regular expressions
  echo '*~' >> ~/.gitignore
  echo '.msg' >> ~/.gitignore

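
One way to check that personal ignore globs behave as intended is to
look at ``git status`` in a scratch repository.  The sketch below is a
hypothetical check, not part of the recommended setup: it sets
``core.excludesfile`` for the scratch repository only (rather than
with ``--global``), so your real ``~/.gitconfig`` is left untouched,
and the file names are made up.

```shell
#!/bin/sh
# Check that personal ignore globs take effect, in a scratch repository.
# Repository-local core.excludesfile stands in for the --global setting
# so this does not modify ~/.gitconfig.  File names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# The same globs as above: editor backups and the .msg log-message file.
printf '%s\n' '*~' '.msg' > .git/my-ignores
git config core.excludesfile "$repo/.git/my-ignores"

# Create some untracked files, two of which match the globs.
touch .msg README README~ notes.txt
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```

Only ``README`` and ``notes.txt`` should be reported as untracked.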
If you use multiple branches, you can save a lot of space by putting
all objects in a common object store.  This can be done physically,
by making them branches in a single repository. It can alternatively
be done logically, with the environment variables
``GIT_OBJECT_DIRECTORY`` (a single directory where new repository
objects will be written) and ``GIT_ALTERNATE_OBJECT_DIRECTORIES``
(a colon-separated path -- on Windows, semicolon-separated -- of
directories containing read-only object stores to search).  Note that
when making a local clone, git will hard-link objects rather than
creating copies if the OS supports that, which also saves space in
the child repository.  Here's a complicated example::

  # clone the trunk and py3k repositories
  cd /path/to/myrepos
  git clone git://code.python.org/python/trunk
  git clone git://code.python.org/python/branches/py3k
  # set up environment for my personal copies of the trunk and py3k
  # they will read the objects in the pristine clones, but never write
  # anything there
  export GIT_ALTERNATE_OBJECT_DIRECTORIES=/path/to/myrepos/trunk/.git/objects:/path/to/myrepos/py3k/.git/objects
  git clone trunk trunk-sandbox
  # set up environment for my personal copy of py3k
  # read/write: if a file introduced in py3k is imported to trunk
  # verbatim, the trunk sandbox will use the object created in the
  # py3k sandbox
  export GIT_OBJECT_DIRECTORY=/path/to/myrepos/trunk-sandbox/.git/objects
  git clone py3k py3k-sandbox

If you want more complexity, git clone has a plethora of options to
optimize space.

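
As one example of those clone options, ``git clone --shared`` makes a
local clone that borrows objects from the source repository by
listing the source's object directory in
``.git/objects/info/alternates``, a persistent, per-repository
counterpart of ``GIT_ALTERNATE_OBJECT_DIRECTORIES``.  The sketch below
assumes git and a POSIX shell; the ``trunk`` and ``trunk-sandbox``
directories are created from scratch and are hypothetical stand-ins
for the clones above.

```shell
#!/bin/sh
# Share one object store between a pristine clone and a sandbox using
# "git clone --shared".  Directory names are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"
git init -q trunk
cd trunk
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com
echo 'Python' > README
git add README
git commit -q -m 'Initial import.'
cd ..

# Neither copy nor hard-link objects; record the source object
# directory as an alternate instead.
git clone -q --shared trunk trunk-sandbox
cat trunk-sandbox/.git/objects/info/alternates
```

The git documentation warns that such a clone depends on its source:
if objects are pruned or garbage-collected away in ``trunk``, the
sandbox can become corrupt.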
One-Off Checkout
----------------

As a non-core developer, I want to create and publish a one-off patch
that fixes a bug, so that a core developer can review it for
inclusion in the mainline.

* Checkout/branch/clone trunk.
* Edit some code.
* Generate a patch (based on what is best supported by the VCS, e.g.
  branch history).
* Receive reviewer comments and address the issues.
* Generate a second patch for the core developer to commit.

svn
'''

::

  svn checkout http://svn.python.org/projects/python/trunk
  cd trunk
  # Edit some code.
  echo "The cake is a lie!" > README
  # Since svn lacks support for local commits, we fake it with patches.
  svn diff > commit-1.diff
  svn diff > patch-1.diff
  # Upload patch-1.diff to bugs.python.org.
  # Receive reviewer comments.
  # Edit some code.
  echo "The cake is real!" > README
  # Since svn lacks support for local commits, we fake it with patches.
  svn diff > commit-2.diff
  svn diff > patch-2.diff
  # Upload patch-2.diff to bugs.python.org.

bzr
'''

::

  bzr branch http://code.python.org/python/trunk
  cd trunk
  # Edit some code.
  bzr commit -m 'Stuff I did'
  bzr send -o bundle
  # Upload bundle to bugs.python.org
  # Receive reviewer comments
  # Edit some code
  bzr commit -m 'Respond to reviewer comments'
  bzr send -o bundle
  # Upload updated bundle to bugs.python.org

hg
''

::

  hg clone http://code.python.org/hg/trunk
  cd trunk
  # Edit some code.
  hg commit -m "Stuff I did"
  # Create a patch containing the last commit. Use hg export REV: to
  # export all changes from revision REV (inclusive).
  hg export tip > stuff-i-did.patch
  # Upload patch to bugs.python.org
  # Receive reviewer comments
  # Edit some code
  hg commit -m "Address reviewer comments."
  hg export tip > additional-fixes.patch
  # Upload patch to bugs.python.org

git
'''

The patches could be created with
``git diff master > stuff-i-did.patch``, too, but
``git format-patch`` and ``git am`` know some tricks
(empty files, renames, etc.) that ordinary patch can't handle.  git
grabs "Stuff I did" out of the commit message to create the file
name 0001-Stuff-I-did.patch.  See Patch Review below for a
description of the git-format-patch format::

  # Get the mainline code.
  git clone git://code.python.org/python/trunk
  cd trunk
  # Make a personal branch to keep the trunk ("master" branch) clean.
  git checkout -b stuff
  # Edit some code.
  git commit -a -m 'Stuff I did.'
  # Create patch for my changes (i.e., relative to master).
  git format-patch master
  git tag stuff-v1
  # Upload 0001-Stuff-I-did.patch to bugs.python.org.
  # Time passes ... receive reviewer comments.
  # Edit more code.
  git commit -a -m 'Address reviewer comments.'
  # Make an add-on patch to apply on top of the original.
  git format-patch stuff-v1
  # Upload 0001-Address-reviewer-comments.patch to bugs.python.org.

Backing Out Changes
-------------------

As a core developer, I want to undo a change that was not ready for
inclusion in the mainline.

* Back out the unwanted change.
* Push patch to server.

svn
'''

::

  # Assume the change to revert is in revision 40
  svn merge -r40:39 .
  # Resolve conflicts, if any.
  svn commit -m "Reverted revision 40"

bzr
'''

::

  # Assume the change to revert is in revision 40
  bzr merge -r 40..39
  # Resolve conflicts, if any.
  bzr commit -m "Reverted revision 40"

Note that if the change you want to revert is the last one that was
made, you can just use ``bzr uncommit``.

hg
''

::

  # Assume the change to revert is in revision 9150dd9c6d30
  hg backout --merge -r 9150dd9c6d30
  # Resolve conflicts, if any.
  hg commit -m "Reverted changeset 9150dd9c6d30"
  hg push

Note that you can use ``hg rollback`` and ``hg strip`` to revert
changes you committed in your local repository but did not yet push
to other repositories.

git
'''

::

  # Assume the change to revert is the grandparent of a revision tagged "newhotness".
  git revert newhotness~2
  # If there are conflicts, resolve them and commit the result, e.g.:
  #   git commit -a -m "Reverted newhotness~2."
  # Otherwise, git revert prompts for a log message and commits automatically.
  git push

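
Here is a self-contained sketch of what ``git revert newhotness~2``
does, run in a throwaway repository.  The history, file contents, and
messages are invented for illustration, and ``--no-edit`` is used only
so the sketch runs without opening an editor for the log message.  The
later commits are kept, and a new commit undoing the grandparent of
``newhotness`` is added on top.

```shell
#!/bin/sh
# Scratch-repository sketch of "git revert newhotness~2".  Only the tag
# name comes from the example above; everything else is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com
echo 'The cake is real.' > README
git add README
git commit -q -m 'Initial import.'
echo 'The cake is a lie!' > README
git commit -q -a -m 'Unwanted change.'      # will be newhotness~2
echo 'one' > a.txt
git add a.txt
git commit -q -m 'Later change 1.'          # will be newhotness~1
echo 'two' > b.txt
git add b.txt
git commit -q -m 'Later change 2.'          # will be newhotness
git tag newhotness

# Record a new commit that undoes newhotness~2, keeping later work.
git revert --no-edit newhotness~2
```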
Patch Review
------------

As a core developer, I want to review patches submitted by other
people, so that I can make sure that only approved changes are added
to Python.

Core developers have to review patches as submitted by other people.
This requires applying the patch, testing it, and then tossing away
the changes. The assumption can be made that a core developer already
has a checkout/branch/clone of the trunk.

* Branch off of trunk.
* Apply the patch, without any comments, as generated by the patch
  submitter.
* Push patch to server.
* Delete now-useless branch.

svn
'''

Subversion does not fit into this development style very well, as
there is no such thing as a "branch" as defined in this PEP.
Instead, a developer either needs to create another checkout for
testing a patch or create a branch on the server. Up to this point,
core developers have not taken the "branch on the server" approach to
dealing with individual patches. For this scenario the assumption
will be that the developer creates a local checkout of the trunk to
work with::

  cp -r trunk issue0000
  cd issue0000
  patch -p0 < __patch__
  # Review patch.
  svn commit -m "Some patch."
  cd ..
  rm -r issue0000

bzr
'''

::

  bzr branch trunk issueNNNN
  cd issueNNNN
  # Download `patch` bundle from Roundup
  bzr merge patch
  # Review patch
  bzr commit -m'Patch NNNN by So N. So' --fixes python:NNNN
  bzr push bzr+ssh://trunk
  cd ..
  rm -rf issueNNNN

Alternatively, since you're probably going to commit these changes to
the trunk, you could just do a checkout.  That would give you a local
working tree while the branch (i.e. all revisions) would continue to
live on the server.  This is similar to the svn model and might allow
you to more quickly review the patch.  There's no need for the push
in this case::

  bzr checkout trunk issueNNNN
  cd issueNNNN
  # Download `patch` bundle from Roundup
  bzr merge patch
  # Review patch
  bzr commit -m'Patch NNNN by So N. So' --fixes python:NNNN
  cd ..
  rm -rf issueNNNN

hg
''

::

  hg clone trunk issue0000
  cd issue0000
  # If the patch was generated using hg export, the user name of the
  # submitter is automatically recorded. Otherwise,
  # use hg import --no-commit submitted.diff and commit with
  # hg commit -u "Firstname Lastname <email.address@example.com>"
  hg import submitted.diff
  # Review patch.
  hg push ssh://alexandre@code.python.org/hg/trunk/

git
'''

We assume a patch created by git-format-patch.  This is a Unix mbox
file containing one or more patches, each formatted as an RFC 2822
message.  git-am interprets each message as a commit as follows.  The
author of the patch is taken from the From: header, the date from the
Date: header.  The commit log is created by concatenating the content
of the subject line, a blank line, and the message body up to the
start of the patch::

  cd trunk
  # Create a branch in case we don't like the patch.
  git checkout -b patch-review
  # Download patch from bugs.python.org to submitted.patch.
  git am < submitted.patch
  # Review and approve patch.
  # Merge into master and push.
  git checkout master
  git merge patch-review
  git push
  # Delete the now-useless review branch.
  git branch -d patch-review

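
The following sketch exercises the ``git format-patch``/``git am``
round trip described above in two scratch repositories, to show that
the author recorded by ``git am`` comes from the patch's From: header
while the reviewer becomes the committer.  All names, e-mail
addresses, and paths are hypothetical; ``-1 --stdout`` simply writes
the most recent commit as an mbox to a single file.

```shell
#!/bin/sh
# format-patch / am round trip between two scratch repositories.
# Names, addresses, and paths are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"
git init -q trunk
cd trunk
git config user.name 'Core Developer'
git config user.email core@example.com
echo 'Python' > README
git add README
git commit -q -m 'Initial import.'
cd ..

# The submitter clones the trunk, commits a change, and exports it as
# an mbox-formatted patch.
git clone -q trunk contrib
cd contrib
git config user.name 'Patch Submitter'
git config user.email submitter@example.com
echo 'The cake is a lie!' >> README
git commit -q -a -m 'Stuff I did.'
git format-patch -1 --stdout > ../submitted.patch
cd ../trunk

# The reviewer applies the patch on a throwaway review branch.
git checkout -q -b patch-review
git am -q < ../submitted.patch
```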
Backport
--------

As a core developer, I want to apply a patch to 2.6, 2.7 and 3.0, so
that I can fix a problem in all three versions.

Python always has at least the trunk and the last major release to
potentially backport patches to. Currently, though, the situation is
even more complicated than that, as we also have to port changes
forward. This scenario assumes one needs to apply a patch to
2.6, 2.7, and 3.0, but not necessarily in that order (which is why
there is no list of required steps for this scenario). It is assumed
a developer has a checkout/clone of all three versions. There is also
a revision that needs to be prevented from ever being merged into
another branch.

svn
'''

::

  # Assume patch applied to 2.7 in revision 0000.
  cd release26-maint
  svnmerge merge -r 0000
  # Resolve merge conflicts and make sure patch works.
  svn commit -F svnmerge-commit-message.txt
  cd ../py3k
  svnmerge merge -r 0000
  # Same as for 2.6, except Misc/NEWS changes are reverted.
  svn revert Misc/NEWS
  svn commit -F svnmerge-commit-message.txt
  # Block revision 0001 from being merged from 2.7 into 3.0.
  svnmerge block -r 0001
  svn ci -F svnmerge-commit-message.txt

bzr
'''

Bazaar is pretty straightforward here, since it supports cherry
picking revisions manually.  In the example below, we could have
given a revision id instead of a revision number, but that's usually
not necessary.  Martin Pool suggests "We'd generally recommend doing
the fix first in the oldest supported branch, and then merging it
forward to the later releases."

::

  # Assume patch applied to 2.7 in revision 0000
  cd release26-maint
  bzr merge ../trunk -c 0000
  # Resolve conflicts and make sure patch works
  bzr commit -m'Backport patch NNNN'
  bzr push bzr+ssh://release26-maint
  cd ../py3k
  bzr merge ../trunk -c 0000
  # Same as for 2.6 except Misc/NEWS changes are reverted
  bzr revert Misc/NEWS
  bzr commit -m'Forward port patch NNNN'
  bzr push bzr+ssh://py3k

hg
''

Mercurial, like other DVCSs, does not support the current workflow
used by Python core developers to backport patches very well. Right
now, bug fixes are first applied to the development mainline
(i.e., trunk), then back-ported to the maintenance branches and
forward-ported, as necessary, to the py3k branch. This workflow
requires the ability to cherry-pick individual changes. Mercurial's
transplant extension provides this ability. Here is an example of
the scenario using this workflow::

  cd release26-maint
  # Assume patch applied to 2.7 in revision 0000
  hg transplant -s ../trunk 0000
  # Resolve conflicts, if any.
  cd ../py3k
  hg pull ../trunk
  hg merge
  # During an uncommitted merge, hg revert needs an explicit revision.
  hg revert -r . Misc/NEWS
  hg commit -m "Merged trunk"
  hg push

In the above example, transplant acts much like the current svnmerge
command. When transplant is invoked without a revision, the command
launches an interactive loop useful for transplanting multiple
changes. Another useful feature is the --filter option, which can be
used to modify changesets programmatically (e.g., it could be used
for removing changes to Misc/NEWS automatically).

As an alternative to the traditional workflow, we could avoid
transplanting changesets by committing bug fixes to the oldest
supported release, then merging these fixes upward to the more
recent branches.

::

  cd release25-maint
  hg import fix_some_bug.diff
  # Review patch and run test suite. Revert if failure.
  hg push
  cd ../release26-maint
  hg pull ../release25-maint
  hg merge
  # Resolve conflicts, if any. Then, review patch and run test suite.
  hg commit -m "Merged patches from release25-maint."
  hg push
  cd ../trunk
  hg pull ../release26-maint
  hg merge
  # Resolve conflicts, if any, then review.
  hg commit -m "Merged patches from release26-maint."
  hg push

Although this approach makes the history non-linear and slightly
more difficult to follow, it encourages fixing bugs across all
supported releases. Furthermore, it scales better when there are
many changes to backport, because we do not need to seek the
specific revision IDs to merge.

git
'''

In git I would have an "integration" workspace which contains all of
the relevant master repository branches.  git cherry-pick doesn't
work across repositories; you need to have the branches in the same
repository.

::

  # Assume patch applied to 2.7 in revision release27~3 (three commits before the tip).
  cd integration
  git checkout release26
  # The "-x" option automatically notes which commit is being
  # cherry-picked in the log.
  git cherry-pick -x release27~3
  # If there are conflicts, resolve them, and commit those changes.
  # git commit -a -m "Resolve conflicts."
  # Run test suite.  If fixes are necessary, record as a separate commit.
  # git commit -a -m "Fix code causing test failures."
  git checkout master
  git cherry-pick -x release27~3
  # Do any conflict resolution and test failure fixups.
  # Revert Misc/NEWS changes.
  git checkout HEAD^ -- Misc/NEWS
  # This creates a new commit on top of the cherry-pick.  An alternative workflow
  # would use the -n (aka --no-commit) flag to git-cherry-pick, and then commit
  # here with an appropriate log message.
  git commit -m 'Revert cherry-picked Misc/NEWS changes.' Misc/NEWS
  # Push both ports (assuming the master repository is the "origin" remote).
  git push origin release26 master

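
To see what the ``-x`` option records, here is a minimal sketch in a
throwaway repository with hypothetical ``release26`` and
``release27`` branches.  Cherry-picking the fix onto ``release26``
creates a new commit whose log message ends with a
``(cherry picked from commit ...)`` line naming the original commit.

```shell
#!/bin/sh
# Scratch-repository sketch of "git cherry-pick -x".  Branch, file, and
# message names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com
echo 'Python' > README
git add README
git commit -q -m 'Initial import.'
git branch release26

# The fix lands on release27 first.
git checkout -q -b release27
echo 'fixed' > bugfix.txt
git add bugfix.txt
git commit -q -m 'Fix a bug.'
fix=$(git rev-parse HEAD)

# Port it to release26; -x appends the original commit id to the log.
git checkout -q release26
git cherry-pick -x "$fix"
```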
If you are regularly merging (rather than cherry-picking) from a
given branch, then you can block a given commit from being
accidentally merged in the future by merging, then reverting it.
This does not prevent a cherry-pick from pulling in the unwanted
patch, and this technique requires blocking everything that you don't
want merged.  I'm not sure if this differs from svn on this point.

::

  cd trunk
  # Merge in the alpha tested code.
  git merge experimental-branch
  # We don't want the 3rd-to-last commit from the experimental-branch,
  # and we don't want it to ever be merged.
  # The notation "^N" means Nth parent of the current commit.  Thus HEAD^2^1^1
  # means the first parent of the first parent of the second parent of HEAD.
  git revert HEAD^2^1^1
  # Propagate the merge and the prohibition to the public repository.
  git push

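
The merge-then-revert technique and the ``HEAD^2^1^1`` notation can be
checked in a scratch repository.  In this sketch, which assumes git
and a POSIX shell and uses hypothetical branch and file names,
``--no-ff`` forces a real merge commit so that ``HEAD^2`` is the tip
of ``experimental-branch``; each experimental commit touches its own
file so that reverting the oldest one does not conflict.

```shell
#!/bin/sh
# Block a commit by merging and then reverting it.  Names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com
echo 'Python' > README
git add README
git commit -q -m 'Initial import.'

# Three commits on the experimental branch, each in its own file.
git checkout -q -b experimental-branch
for n in 1 2 3; do
    echo "$n" > "exp$n.txt"
    git add "exp$n.txt"
    git commit -q -m "Experiment $n."
done

# A real merge commit, so HEAD^2 is the tip of experimental-branch.
git checkout -q master
git merge -q --no-ff --no-edit -m 'Merge experimental-branch.' experimental-branch

# HEAD^2 = "Experiment 3.", ^1 = "Experiment 2.", ^1 = "Experiment 1."
unwanted=$(git rev-parse HEAD^2^1^1)
git revert --no-edit "$unwanted"
```

Because ``Experiment 1.`` is now part of ``master``'s history, later
merges of ``experimental-branch`` will not bring its change back,
although, as noted above, a cherry-pick still could.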
Coordinated Development of a New Feature
----------------------------------------

Sometimes core developers end up working on a major feature with
several developers.  As a core developer, I want to be able to
publish feature branches to a common public location so that I can
collaborate with other developers.

This requires creating a branch on a server that other developers
can access.  All of the DVCSs support creating new repositories on
hosts where the developer is already able to commit, with
appropriate configuration of the repository host. This is
similar in concept to the existing sandbox in svn, although details
of repository initialization may differ.

For non-core developers, there are various more-or-less
public-access repository-hosting services.  Bazaar has Launchpad_,
Mercurial has `bitbucket.org`_, and git has GitHub_.  All also have
easy-to-use CGI interfaces for developers who maintain their own
servers.

.. _Launchpad: http://www.launchpad.net/
.. _bitbucket.org: http://www.bitbucket.org/
.. _GitHub: http://www.github.com/

* Branch trunk.
* Pull from branch on the server.
* Pull from trunk.
* Push merge to trunk.

svn
'''

::

  # Create branch (a URL-to-URL copy commits immediately, so it needs a log message).
  svn copy -m "Create NewHotness branch." svn+ssh://pythondev@svn.python.org/python/trunk svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
  svn checkout svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
  cd NewHotness
  svnmerge init
  svn commit -m "Initialize svnmerge."
  # Pull in changes from other developers.
  svn update
  # Pull in trunk and merge to the branch.
  svnmerge merge
  svn commit -F svnmerge-commit-message.txt

bzr
'''

::

  XXX To be done by Brett as a test of knowledge and online documentation/community.

hg
''

::

  XXX To be done by Brett as a test of knowledge and online documentation/community.

git
'''

::

  XXX To be done by Brett as a test of knowledge and online documentation/community.

Separation of Issue Dependencies
--------------------------------

Sometimes, while working on an issue, it becomes apparent that the
problem being worked on is actually a compound issue of various
smaller issues. Being able to take the current work and then begin
working on a separate issue is very helpful to separate out issues
into individual units of work instead of compounding them into a
single, large unit.

* Create a branch A (e.g. urllib has a bug).
* Edit some code.
* Create a new branch B that branch A depends on (e.g. the urllib
  bug exposes a socket bug).
* Edit some code in branch B.
* Commit branch B.
* Edit some code in branch A.
* Commit branch A.
* Clean up.

svn
'''
To make up for svn's lack of cheap branching, it has a changelist
option to associate a file with a single changelist. This is not as
powerful as being able to associate at the commit level. There is
also no way to express dependencies between changelists.
::
cp -r trunk issue0000
cd issue0000
# Edit some code.
echo "The cake is a lie!" > README
svn changelist A README
# Edit some other code.
echo "I own Python!" > LICENSE
svn changelist B LICENSE
svn ci -m "Tell it how it is." --changelist B
# Edit changelist A some more.
svn ci -m "Speak the truth." --changelist A
cd ..
rm -rf issue0000
bzr
'''
Here's an approach that uses bzr shelf (now a standard part of bzr)
to squirrel away some changes temporarily while you take a detour to
fix the socket bugs.
::
bzr branch bzr+svn://trunk bug-0000
cd bug-0000
# Edit some code.  Dang, we need to fix the socket module.
bzr shelve --all
# Edit some code.
bzr commit -m "Socket module fixes"
# Detour over, now resume fixing urllib
bzr unshelve
# Edit some code
Another approach one might take uses the loom plugin.  Looms can
greatly simplify working on dependent branches because they
automatically take care of the stacking dependencies for you.
Imagine looms as a stack of dependent branches (called "threads" in
loom parlance), with easy ways to move up and down the stack of
threads, merge changes up the stack to descendant threads, create
diffs between threads, etc.  Occasionally, you may need or want to
export your loom threads into separate branches, either for review
or commit.  Higher threads incorporate all the changes in the lower
threads, automatically.
::
bzr branch bzr+svn://trunk bug-0000
cd bug-0000
bzr loomify trunk
bzr create-thread fix-urllib
# Edit some code.  Dang, we need to fix the socket module.
bzr commit -m "Checkpointing my work so far"
bzr down-thread
bzr create-thread fix-socket
# Edit some code
bzr commit -m "Socket module fixes"
bzr up-thread
# Manually resolve conflicts if necessary
bzr commit -m 'Merge in socket fixes'
# Edit me some more code
bzr commit -m "Now that socket is fixed, complete the urllib fixes"
bzr record done
For bonus points, let's say someone else fixes the socket module in
exactly the same way you did.  Perhaps this person even grabbed your
fix-socket thread and applied just that to the trunk.  You'd like to
be able to merge their changes into your loom and delete your
now-redundant fix-socket thread.
::
bzr down-thread trunk
# Get all new revisions to the trunk.  If you've done things
# correctly, this must succeed without conflict.
bzr pull
bzr up-thread
# See?  The fix-socket thread is now identical to the trunk
bzr commit -m 'Merge in trunk changes'
bzr diff -r thread: | wc -l  # returns 0
bzr combine-thread
bzr up-thread
# Resolve any conflicts
bzr commit -m 'Merge trunk'
# Now our top-thread has an up-to-date trunk and just the urllib fix.
hg
''
::
hg clone trunk issue0000
cd issue0000
# Edit some code (e.g. urllib).
cd ..
hg clone trunk fix-socket-bug
cd fix-socket-bug
# Edit some other code (e.g. socket).
hg commit
cd ../issue0000
hg pull ../fix-socket-bug
hg merge
# Edit some more code.
hg commit
cd ../trunk
hg pull ../issue0000
hg merge
hg push
cd ..
rm -rf issue0000 fix-socket-bug
git
'''
::
cd trunk
# Actually, I wouldn't tag here in most cases; it's easy enough to get the
# appropriate revision to rewind to via git show-branch.
git tag checkpoint
git checkout -b bug-0000
# Edit some code, commit some changes.
git commit -a -m "Fixed urllib bug, part 1."
# Dang, we need to fix something lower level now.
# This is independent of urllib, so create a new branch at master.
git checkout -b fix-socket master
# Edit some code, commit some changes.
git commit -a -m "Completed fix of socket."
# Can't test urllib unless the socket fix is present.
# So we rebase on top of fix-socket (which is where we happen to be).
# git-rebase is interactive, so we resolve conflicts as we go along.
git rebase fix-socket bug-0000
# Edit me some more code, commit some more fixes to bug-0000.
git commit -a -m "Complete urllib fixes."
# Merge in the fixes.
git checkout master
git merge bug-0000
# And push them to the public repository.
git push
# Bonus points: someone else fixes socket in the exact same way
# you just did, and landed that in the trunk.
# Merge their changes in and delete your now redundant branch.
# Note that we find this out because the git push fails with
# "not a fast forward."
git pull git://code.python.org/public/trunk master
# Gag me, we got conflicts.
# Call the doctor, who says "you've got duplicate patchiosis".
# The second opinion is that it really is exactly what I had in fix-socket.
# OK, abandon my work, and clean up the bloody wreck of
# conflicts with the same mop:
git reset --hard checkpoint
git pull git://code.python.org/public/trunk master
git rebase --onto master fix-socket bug-0000
# If there were any conflicts, we fixed them during rebase.  But
# there shouldn't be any,
# since we assumed the socket bug is independent of the urllib bug.
git checkout master
git merge bug-0000
git push
# Clean up.  We don't delete bug-0000 because the merge obsoleted it already.
git tag -d checkpoint
git branch -d fix-socket
# Now our HEAD has an up-to-date trunk and just the urllib fix.
Doing a Python Release
----------------------
How does PEP 101 change when using a DVCS?
bzr
'''
It will change, but not substantially so.  When doing the
maintenance branch, we'll just push to the new location instead of
doing an svn cp.  Tags are totally different, since in svn they are
directory copies, but in bzr (and I'm guessing hg), they are just
symbolic names for revisions on a particular branch.  The release.py
script will have to change to use bzr commands instead.  It's
possible that, because a DVCS (in particular, bzr) handles cherry
picking and merging well enough, we'll be able to create the maint
branches sooner.  It would be a useful exercise to try to do a
release off the bzr/hg mirrors.
hg
''
Clearly, details specific to Subversion in PEP 101 and in the
release script will need to be updated. In particular, the release
tagging and maintenance branch creation processes will have to be
modified to use Mercurial's features; this will simplify and
streamline certain aspects of the release process. For example,
tagging and re-tagging a release will become a trivial operation
since a tag, in Mercurial, is simply a symbolic name for a given
revision.
git
'''
It will change, but not substantially so.  When doing the
maintenance branch, we'll just git push to the new location instead
of doing an svn cp.  Tags are totally different, since in svn they
are directory copies, but in git they are just symbolic names for
revisions, as are branches.  (The difference between a tag and a
branch is that tags refer to a particular commit, and will never
change unless you use git tag -f to force them to move.  The
checked-out branch, on the other hand, is automatically updated by
git commit.)  The release.py script will have to change to use git
commands instead.  With git I would create a (local) maintenance
branch as soon as the release engineer is chosen.  Then I'd "git
pull" until I didn't like a patch, when it would be "git pull; git
revert ugly-patch", until it started to look like the sensible thing
is to fork off, and start doing "git cherry-pick" on the good
patches.
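The two-phase maintenance-branch workflow described above (first follow trunk and revert unwanted patches, then stop following it and cherry-pick) can be sketched in a throwaway repository. This is a minimal illustration under stated assumptions, not the release script: the branch name ``maint``, the tag ``v-release``, the file contents and the ``commit_file`` helper are all hypothetical, and a local ``git merge --ff-only`` of trunk stands in for the ``git pull`` step.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Release Manager"
git config user.email "rm@example.com"

# Hypothetical helper: create <name>.txt and commit it on the current branch.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file base
trunk=$(git rev-parse --abbrev-ref HEAD)

# Phase 1: a local maintenance branch that simply tracks trunk.
git branch maint
commit_file good1
commit_file ugly
ugly=$(git rev-parse HEAD)
git checkout -q maint
git merge -q --ff-only "$trunk"          # stands in for "git pull"
git revert --no-edit "$ugly" >/dev/null  # back out the patch we dislike

# Phase 2: stop following trunk and cherry-pick only the good patches.
git checkout -q "$trunk"
commit_file good2
good2=$(git rev-parse HEAD)
git checkout -q maint
git cherry-pick "$good2" >/dev/null

# A tag is just a name for a commit; "git tag -f" would move it later.
git tag v-release
```

After the switch to cherry-picking, each further fix costs one ``git cherry-pick``, and re-tagging a release is a single ``git tag -f``.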
Platform/Tool Support
=====================
Operating Systems
-----------------
==== ======================================= ============================================= =============================
DVCS Windows                                 OS X                                          UNIX
---- --------------------------------------- --------------------------------------------- -----------------------------
bzr  yes (installer) w/ tortoise             yes (installer, fink or MacPorts)             yes (various package formats)
hg   yes (third-party installer) w/ tortoise yes (third-party installer, fink or MacPorts) yes (various package formats)
git  yes (third-party installer)             yes (third-party installer, fink or MacPorts) yes (.deb or .rpm)
==== ======================================= ============================================= =============================
As the above table shows, all three DVCSs are available on all three
major OS platforms. But what it also shows is that Bazaar is the
only DVCS that directly supports Windows with a binary installer
while Mercurial and git require you to rely on a third party for
binaries. Both bzr and hg have a tortoise version while git does not.
Bazaar also has the benefit of being written in pure Python, making
a Python VM the bare minimum requirement to work. It does have Pyrex
extensions which are optional, but highly recommended for
performance reasons. Mercurial requires the compilation of an
extension module and git is pure C and thus also requires a compiler.
CRLF -> LF Support
------------------
bzr
My understanding is that support for this is being worked on as
I type, landing in a version RSN.  I will try to dig up details.
hg
Supported via the win32text extension.
git
I can't say from personal experience, but it looks like there's
pretty good support via the core.autocrlf and core.safecrlf
configuration attributes.
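To illustrate the two git settings mentioned above, the sketch below creates a throwaway repository, enables ``core.autocrlf`` and ``core.safecrlf``, and stages a hypothetical ``hello.txt`` written with Windows line endings. It assumes the documented behavior that ``core.autocrlf=true`` converts CRLF to LF when a new text file is added, so the staged copy contains a bare LF.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Store LF in the repository but check files out with CRLF, as a
# Windows developer would want.
git config core.autocrlf true
# Make "git add" fail if the conversion could not be undone exactly.
git config core.safecrlf true

# A file written with Windows line endings...
printf 'hello\r\n' > hello.txt
git add hello.txt
# ...is staged with a bare LF (":hello.txt" names the index copy).
git cat-file -p :hello.txt
```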
Case-insensitive filesystem support
-----------------------------------
bzr
Should be OK.  I share branches between Linux and OS X all the
time.  I've done case changes (e.g. bzr mv Mailman mailman) and
as long as I did it on Linux (obviously), when I pulled in the
changes on OS X everything was hunky dory.
hg
Mercurial uses a case-safe repository mechanism and detects
case-folding collisions.
git
Since OS X preserves case, you can do case changes there too.
git does not have a problem with renames in either direction.
However, case-insensitive filesystem support is usually taken
to mean complaining, on case-sensitive file systems, about names
that would collide on case-insensitive ones.  git does not do that.
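Since git does not warn about such collisions itself, a rough check can be run by hand. The sketch below is an illustration, not a git feature: it stages two hypothetical entries, ``README`` and ``readme``, directly in the index and then lists names that differ only in case. It assumes a ``uniq`` that supports ``-i`` (a GNU/BSD extension rather than POSIX).

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Two hypothetical paths that differ only in case, written straight
# into the index so the demo works even on a case-insensitive disk.
blob=$(printf 'text\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" README
git update-index --add --cacheinfo 100644 "$blob" readme

# Sort with case folded, then print one name from each group of
# names that are equal when case is ignored.
collisions=$(git ls-files | sort -f | uniq -d -i)
echo "$collisions"
```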
Tools
-----
In terms of code review tools such as `Review Board`_ and Rietveld_,
the former supports all three while the latter supports hg and git but
not bzr.  Bazaar does not yet have an online review board, but it
has several ways to manage email-based reviews and trunk merging.
There's `Bundle Buggy`_, `Patch Queue Manager`_ (PQM), and
`Launchpad's code reviews <https://launchpad.net/+tour/code-review>`_.
.. _Review Board: http://www.review-board.org/
.. _Rietveld: http://code.google.com/p/rietveld/
.. _Bundle Buggy: http://code.aaronbentley.com/bundlebuggy/
.. _Patch Queue Manager: http://bazaar-vcs.org/PatchQueueManager
All three have some web site online that provides basic hosting
support for people who want to put a repository online. Bazaar has
Launchpad, Mercurial has bitbucket.org, and git has GitHub. Google
Code also has instructions on how to use git with the service, both
to hold a repository and to act as a read-only mirror.
All three also `appear to be supported
<http://buildbot.net/repos/release/docs/buildbot.html#How-Different-VC-Systems-Specify-Sources>`_
by Buildbot_.
.. _Buildbot: http://buildbot.net
Usage On Top Of Subversion
==========================
==== ============
DVCS svn support
---- ------------
bzr  bzr-svn_ (third-party)
hg   `multiple third-parties <http://www.selenic.com/mercurial/wiki/index.cgi/WorkingWithSubversion>`__
git  git-svn_
==== ============
.. _bzr-svn: http://bazaar-vcs.org/BzrForeignBranches/Subversion
.. _git-svn: http://www.kernel.org/pub/software/scm/git/docs/git-svn.html
All three DVCSs have svn support, although git is the only one to
come with that support out-of-the-box.
Server Support
==============
==== ==================
DVCS Web page interface
---- ------------------
bzr  loggerhead_
hg   hgweb_
git  gitweb_
==== ==================
.. _loggerhead: https://launchpad.net/loggerhead
.. _hgweb: http://www.selenic.com/mercurial/wiki/index.cgi/HgWebDirStepByStep
.. _gitweb: http://git.or.cz/gitwiki/Gitweb
All three DVCSs support various hooks on the client and server side
for e.g. pre/post-commit verifications.
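As one concrete instance of a client-side hook, here is a minimal git ``pre-commit`` hook that rejects commits introducing whitespace errors such as trailing spaces. The policy (``git diff --cached --check``) is only an example chosen for illustration, and the repository and file names are hypothetical; Mercurial configures its hooks through the ``[hooks]`` section of its configuration file instead.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Hook Demo"
git config user.email "hook@example.com"

# A clean first commit, made before the hook exists.
printf 'x = 1\n' > ok.py
git add ok.py
git commit -q -m "Clean start"

# git runs .git/hooks/pre-commit before recording a commit and aborts
# the commit if the hook exits with a non-zero status.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Exit non-zero if the staged changes add whitespace errors.
exec git diff --cached --check
EOF
chmod +x .git/hooks/pre-commit

# Trailing whitespace: the hook should refuse this commit.
printf 'y = 2   \n' > spam.py
git add spam.py
if git commit -q -m "Add spam.py" >/dev/null 2>&1; then
    bad_result=accepted
else
    bad_result=rejected
fi

# The same file without trailing spaces passes the hook.
printf 'y = 2\n' > spam.py
git add spam.py
if git commit -q -m "Add spam.py" >/dev/null 2>&1; then
    good_result=accepted
else
    good_result=rejected
fi
echo "$bad_result $good_result"
```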
Development
===========
All three projects are under active development to some degree. Both
Git and Bazaar seem to release on a monthly schedule. Mercurial, on
the other hand, seems to release roughly once a quarter.
For the two Python-based DVCSs, the amount of time until a release
that is compatible with Python 2.6 can also be a sign of how active
the development is. Bazaar was compatible with 2.6 as of version 1.8
which was released two weeks after 2.6 came out. Mercurial, on the
other hand, took two months, becoming compatible with its 1.1 release.
Special Features
================
bzr
---
Martin Pool adds: "bzr has a stable Python scripting interface, with
a distinction between public and private interfaces and a
deprecation window for APIs that are changing.  Some plugins are
listed in https://edge.launchpad.net/bazaar and
http://bazaar-vcs.org/Documentation".
hg
--
Alexander Solovyov comments:
Mercurial has easy to use extensive API with hooks for main events
and ability to extend commands. Also there is the mq (mercurial
queues) extension, distributed with Mercurial, which simplifies
work with patches.
git
---
git has a cvsserver mode, i.e., you can check out a tree from git
using CVS.  You can even commit to the tree, but features like
merging are absent, and branches are handled as CVS modules, which
is likely to shock a veteran CVS user.
Impressions
===========
As I (Brett Cannon) am left with the task of making the final
decision of which/any DVCS to go with and not my co-authors, I felt
it only fair to write down my impressions as I evaluate the various
tools so as to be as transparent as possible.
To begin, I measured the checking out of code as if I was a non-core
developer. This is important as this is the first impression
developers have when they decide they wish to contribute a patch to
Python. Timings were done using the ``time`` command in zsh and
space was calculated with ``du -c -h``.
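For anyone wanting to repeat these measurements, the sketch below wraps a checkout command in the same spirit: it reports the elapsed wall-clock time and the ``du -c -h`` total for the resulting directory. The ``measure`` function is hypothetical, and it uses ``date +%s`` (one-second resolution) instead of zsh's ``time`` so that it runs in a plain POSIX shell; the demo copies a local directory rather than contacting a server.

```shell
set -e
# Hypothetical helper: run a checkout command, then report the elapsed
# wall-clock time and the total disk usage of the directory it created.
measure() {
    dir=$1
    shift
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "time: $((end - start)) s"
    # -c appends a grand-total line, -h prints human-readable sizes.
    du -c -h "$dir" | tail -n 1
}

# Demo on a local copy instead of a real checkout.
src=$(mktemp -d)
printf 'data\n' > "$src/file"
dst=$(mktemp -d)/copy
report=$(measure "$dst" cp -r "$src" "$dst")
echo "$report"
```

With a real repository the call would look like ``measure trunk git clone git://code.python.org/public/trunk trunk``, using the URL from the git example earlier in this PEP.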
======= ================ ==============
DVCS    Time             Disk space
------- ---------------- --------------
svn     1:04             139 M
bzr     2:29:24 or 8:46  275 M or 596 M
hg      2:30             171 M
git     2:54             134 M
======= ================ ==============
The svn measurements are not exactly a 1:1 comparison to the DVCSs.
For one, svn does not download the entire revision history, and thus
should have the least amount to download. And two, because various
calculation steps are left up to the server, the entire process of
checking out code should be faster.
But the svn measurements should be considered as what developers are
used to. Thus they act as a reference point for what people tend to
expect in terms of performance.
Looking at bzr, I have listed two numbers. The first values are for
running ``bzr branch`` as outlined in the `One-Off Checkout`_
scenario. When the
timings came back in hours (I used Launchpad as code.python.org is
not running the newest version of bzr and I wanted to use its latest
networking protocol), I decided to try using the steps outlined when
the experimental bzr branches were first created. That second
approach is what the second set of values for bzr represent.
While both the hg and git numbers are perfectly acceptable, the bzr
numbers are not necessarily so. The raw ``bzr branch`` approach is not
acceptable at all, as no one wants to wait over two hours to write a
potentially one-line change to some code for the benefit of Python.
Assuming 8:46 is a reasonable amount of time (I believe it in
general is, but it is teetering on not), the 596 M space requirement
could be an issue for some. While we typically view disk space as
cheap, for some people it might be an issue (e.g. the person who did
the schedule for PyCon 2008 did it over a connection so bad that
Google Spreadsheets didn't work for him and he had to submit the
schedule in a different form than the one originally used). Once again I
think the space usage is acceptable, but it is close to being too
much.
To see if bzr's performance would be acceptable once at least the
branch was downloaded, I decided to see how long it would take to
get the change log for a file. I chose the README file as it sees
regular changes for every release and has a revision history going
back to 1993 and thus would have a fair number of revisions.
It should be mentioned that while git had the nicest output thanks to
its color terminal output, it also took a while to find the
``--no-pager`` flag in order to get just a stream of text instead of
having the output sent to the pager.
Overall the numbers were all acceptable:
* bzr: 4.5 seconds
* hg: 1.1 seconds
* git: 1.5 seconds
While bzr is over 3x slower than its nearest neighbor, it must be
kept in mind that its total time is still acceptable, regardless of
the multiplier.
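The git side of the log test can be sketched as follows, in a throwaway repository whose three-commit ``README`` history is made up. ``--no-pager`` is the flag mentioned above; prefixing the command with ``time`` in bash or zsh gives the timing. The corresponding commands for the other tools are ``hg log README`` and ``bzr log README``.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Log Demo"
git config user.email "log@example.com"

# Made-up history: three commits that touch README.
for n in 1 2 3; do
    echo "release $n" >> README
    git add README
    git commit -q -m "Update README for release $n"
done

# --no-pager sends the log straight to stdout instead of a pager;
# "-- README" restricts it to commits that touched that file.
git --no-pager log --oneline -- README
count=$(git --no-pager log --oneline -- README | wc -l | tr -d ' ')
```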
Because a DVCS keeps its revision history on disk, a repository can
typically be zipped up for direct downloading. At least in bzr's case
that would solve the performance issue for initial checkout if the zip
file could be regenerated continually. But that doesn't address the
cost of pulling in new revisions when a checkout has gone stale. To
measure this I decided to check out each repository as of about 700
revisions back, which represented the amount of change made since the
beginning of the month, and time how long each took to update.
For this to happen I first had to remember the URLs for the
repositories. Instead of simply looking in this PEP, though, I
decided to try to figure it out from the command-line help for each
tool or simply guessing. Bzr worked out great with ``bzr info``. Git
took a little poking around, but I figured out ``git remote show
origin`` told me what I needed. For hg, though, I couldn't figure it
out short of running ``hg pull`` and noting the status information
during the pull (turns out ``hg paths`` is what I was looking for).
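For reference, the git side of this lookup is sketched below against a local, hypothetical upstream repository. ``git remote show -n origin`` prints the remote's URLs without contacting it, and ``git config --get remote.origin.url`` returns just the URL, which is handier in scripts; ``bzr info`` and ``hg paths`` play the same role for the other two tools.

```shell
set -e
# A local, hypothetical upstream stands in for the real server.
upstream=$(mktemp -d)
git init -q "$upstream"
work=$(mktemp -d)/clone
git clone -q "$upstream" "$work" 2>/dev/null
cd "$work"

# Show the remote's fetch/push URLs; -n avoids contacting the remote.
git remote show -n origin
# Only the URL, convenient for scripts.
url=$(git config --get remote.origin.url)
echo "$url"
```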
With the repository locations known I then had to perform a checkout
to a certain revision. Turns out that git will not clone a
repository to only a specific revision, although from personal
experience git's pull facility is very fast. Bzr was able to perform
its update in just over 39 seconds. Hg did its update in just over
17 seconds. Much like the log test, while the multiplier of slowness
seems high, in real-life terms all DVCSs performed within reason.
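Given the lack of a clone-to-revision option for git noted above, one way to approximate the staleness test with it is to clone early and let upstream move ahead before pulling. The sketch below does this locally with a made-up history of ten commits (standing in for the roughly 700 real revisions); ``add_commits`` is a hypothetical helper, and ``--ff-only`` makes sure the update is a plain fast-forward.

```shell
set -e
upstream=$(mktemp -d)
cd "$upstream"
git init -q
git config user.name "Upstream"
git config user.email "upstream@example.com"

# Hypothetical helper: make made-up trunk commits numbered $1 to $2.
add_commits() {
    n=$1
    while [ "$n" -le "$2" ]; do
        echo "$n" > counter
        git add counter
        git commit -q -m "Change $n"
        n=$((n + 1))
    done
}

add_commits 1 5
# Take a clone now; it goes stale as trunk keeps moving.
work=$(mktemp -d)/clone
git clone -q "$upstream" "$work"
add_commits 6 10

# Time how long the stale clone takes to catch up.
cd "$work"
start=$(date +%s)
git pull -q --ff-only
end=$(date +%s)
echo "update took $((end - start)) s"
```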
In my mind this means that bzr is only an acceptable candidate as
long as a fairly up-to-date archive of Python's key branches is
made available for people to download, to avoid bzr's very slow remote
branching.
XXX ... to be continued
Chosen DVCS
===========
XXX
Transition Plan
===============
XXX