PEP: 385
Title: Migrating from svn to Mercurial
Version: $Revision$
Last-Modified: $Date$
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 25-May-2009

.. warning::
   This PEP is in the draft stages.


Motivation
==========

After having decided to switch to the Mercurial DVCS, the actual migration
still has to be performed. In the case of an important piece of
infrastructure like the version control system for a large, distributed
project like Python, this is a significant effort. This PEP attempts to
describe the steps that must be taken, as a basis for further discussion. It's
somewhat similar to `PEP 347`_, which discussed the migration to SVN.

To make the most of hg, I (Dirkjan) would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common in
Mercurial. This way, tools written for Mercurial can be optimally used. In
order to do this, I want to use the `hgsubversion`_ software to do an initial
conversion. This hg extension is focused on providing high-quality conversion
from Subversion to Mercurial for use in two-way correspondence, meaning it
doesn't throw away as much available metadata as other solutions.

Such a conversion also seems like a good time to reconsider the contents of
the repository and determine if some things are still valuable. In this spirit,
the following sections also propose discarding some of the older metadata.

.. _PEP 347: http://www.python.org/dev/peps/pep-0347/
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/


Timeline
========

TBD; needs fully working hgsubversion and consensus on this document.


Transition plan
===============

Branch strategy
---------------

Mercurial has two basic ways of using branches: cloned branches, where each
branch is kept in a separate repository, and named branches, where each
revision keeps metadata to note which branch it belongs to. The former makes
it easier to distinguish branches, at the expense of requiring more disk space
on the client. The latter makes it a little easier to switch between branches,
but often has somewhat unintuitive results for people (though this has been
getting better in recent versions of Mercurial).

I'm still a bit on the fence about whether Python should adopt cloned
branches or named branches. Since it usually makes more sense to tag releases
on the maintenance branch, for example, mainline history would not contain
release tags if we used cloned branches. Also, Mercurial 1.2 and 1.3 have the
necessary tools to make named branches less painful (because they can be
properly closed and closed heads are no longer considered in relevant cases).

A disadvantage of named branches might be that clones will be a good bit
larger (since they essentially contain all other branches as well). This can
be mitigated by keeping non-release (feature) branches in separate clones.
Also note that it's still possible to clone a single named branch from a
combined clone, by specifying the branch as in
``hg clone http://hg.python.org/main/#2.6-maint``.
Keeping the py3k history in a separate clone probably also makes sense.

XXX To do: size comparison for selected separation scenarios.

Converting branches
-------------------

There are quite a lot of branches in SVN's branches directory. I propose to
clean this up a bit, by employing the following strategy:

* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless someone
  indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless someone
  indicates the branch can be deprecated

Converting tags
---------------

The SVN tags directory contains a lot of old stuff. Some of these tags are
not, in fact, full tags, but contain only a subset of the repository. I think
we should keep all release tags, and consider other tags for inclusion based
on requests from the developer community. I'd like to consider unifying the
release tag naming scheme to make some things more consistent, if people feel
that won't create too many problems. For example, Mercurial itself just uses
'1.2.1' as a tag, where CPython would currently use r121.
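
To illustrate the kind of renaming this would involve, here is a minimal
sketch of converting existing release tag names to a dotted form. It assumes
that legacy tags consist of ``r`` followed by single-digit version components
and an optional pre-release suffix (for example ``r262`` or ``r31rc1``). The
function name and the exact pattern are hypothetical and would need to be
checked against the actual contents of the tags directory. Tags that do not
match are reported as ``None`` so they can be reviewed by hand.

.. code-block:: python

   import re

   # Assumed shape of a legacy release tag: 'r', two or three single-digit
   # version components, and an optional pre-release suffix (a1, b2, rc1...).
   _LEGACY_TAG = re.compile(r"^r(\d)(\d)(\d)?((?:a|b|c|rc)\d+)?$")


   def convert_release_tag(svn_tag):
       """Return the dotted form of a legacy tag, or None if it doesn't match."""
       match = _LEGACY_TAG.match(svn_tag)
       if match is None:
           return None
       major, minor, micro, suffix = match.groups()
       # Join only the components that are present (micro is optional).
       version = ".".join(part for part in (major, minor, micro) if part)
       return version + (suffix or "")
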

Author map
----------

In order to provide user names the way they are common in hg (in the 'First Last
<user@example.org>' format), we need an author map to map cvs and svn user
names to real names and their email addresses. I have a complete version of such
a map in my `migration tools repository`_. Some of the email addresses in it
are bound to be out of date, so it would be nice to have as many people as
possible review it. The current version also still seems to contain some
encoding problems.

.. _migration tools repository: http://hg.xavamedia.nl/cpython/pymigr/
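
The lookup an author map performs can be sketched as follows. This is only an
illustration: it assumes a plain-text map with one ``username = First Last
<email>`` entry per line and ``#`` comments, which may not match the format of
the map in the migration tools repository. The function names and the example
entry are hypothetical. Unknown user names are passed through unchanged so
that missing entries remain visible in the converted history.

.. code-block:: python

   def parse_author_map(text):
       """Parse 'username = First Last <email>' lines into a dict."""
       authors = {}
       for line in text.splitlines():
           line = line.strip()
           if not line or line.startswith("#"):
               continue  # skip blank lines and comments
           username, sep, identity = line.partition("=")
           if not sep:
               continue  # not a mapping line; ignore it
           authors[username.strip()] = identity.strip()
       return authors


   def map_author(username, authors):
       """Return the mapped identity, or the bare user name if unmapped."""
       return authors.get(username, username)
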

Generating .hgignore
--------------------

The ``.hgignore`` file can be used in Mercurial repositories to help ignore
files that are not eligible for version control. It does this by employing
several possible forms of pattern matching. The current Python repository
already includes a rudimentary ``.hgignore`` file to help with using the hg
mirrors.

It might be useful to have the ``.hgignore`` be generated automatically from
``svn:ignore`` properties. This would make sure all historic revisions also
have useful ignore information (though one could argue ignoring isn't really
relevant to just checking out an old revision).

Revlog reordering
-----------------

As an optional optimization technique, we should consider trying a reordering
pass on the revlogs (internal Mercurial files) resulting from the conversion.
In some cases this results in dramatic decreases in on-disk repository size.

Other repositories
------------------

Richard Tew has indicated that he'd like the Stackless repository to also be
converted. What other projects in the svn.python.org repository should be
converted? Do we want to convert the peps repository? distutils? others?


Infrastructure
==============

hg-ssh
------

Developers should access the repositories through ssh, similar to the current
setup. Public keys can be used to grant people access to a shared hg@ account.
A hgwebdir instance should also be set up for easy browsing and read-only
access. If we're using ssh, developers should trivially be able to start new
clones (for longer-term features that profit from a separate branch).

Hooks
-----

A number of hooks are currently in use. The hg equivalents for these should be
developed and deployed. The following hooks are being used:

* check whitespace: a hook to reject commits in case the whitespace doesn't
  match the rules for the Python codebase. Should be straightforward to
  re-implement from the current version. We can also offer a whitespace hook
  for use with client-side repositories that people can use; it could warn
  about whitespace issues, strip trailing whitespace from changed lines, or
  both. Open issue: do we check only the tip after each push, or do we check
  every commit in a changegroup?

* commit mails: we can leverage the notify extension for this

* buildbots: both the regular and the community build masters must be notified.
  Fortunately buildbot includes support for hg. I've also implemented this for
  Mercurial itself, so I don't expect problems here.

* check contributors: in the current setup, all changesets bear the username of
  committers, who must have signed the contributor agreement. In a DVCS, the
  committers are not necessarily the same people who push, so we can't rely on
  the identity of the person pushing to check that the committer is a
  contributor. We could instead use a hook that checks each changeset's
  committer against a list of registered contributors, if we keep such a list.
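
As an illustration of the whitespace check in the list above, the core of
such a hook could look like the sketch below. The two rules it checks
(no trailing whitespace, no tabs in indentation) are assumptions made for the
example; the rules enforced by the existing hook are not restated here and
would have to be taken from the current implementation. The function names are
hypothetical, and the code that wires them into Mercurial's hook mechanism is
deliberately left out. The same functions could be applied either to the tip
only or to every changeset in a changegroup, whichever way the open issue is
decided.

.. code-block:: python

   def find_whitespace_problems(source):
       """Return (line number, message) pairs for the assumed rule violations."""
       problems = []
       for lineno, line in enumerate(source.splitlines(), 1):
           if line != line.rstrip():
               problems.append((lineno, "trailing whitespace"))
           # The indentation is whatever precedes the first non-blank character.
           indent = line[:len(line) - len(line.lstrip())]
           if "\t" in indent:
               problems.append((lineno, "tab in indentation"))
       return problems


   def strip_trailing_whitespace(source):
       """Client-side variant: fix trailing whitespace instead of rejecting."""
       fixed = "\n".join(line.rstrip() for line in source.splitlines())
       # Preserve the final newline if the input had one.
       return fixed + "\n" if source.endswith("\n") else fixed
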

hgwebdir
--------

A more or less stock hgwebdir installation should be set up. We might want to
come up with a style to match the Python website. It may also be useful to
build a quick extension to augment the URL rev parser so that it can also take
``r[0-9]+`` args and come up with the matching hg revision.
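
The lookup such an extension would perform can be sketched as below. This is
not hgwebdir code: it assumes that a mapping from Subversion revision numbers
to hg changeset hashes is available as a plain dictionary (``svn_to_hg``, a
hypothetical name), and it leaves out how that mapping would be obtained from
the conversion and how the function would be hooked into the URL parser.

.. code-block:: python

   import re

   # Matches arguments such as 'r71600', as described in the text above.
   _SVN_REV = re.compile(r"^r([0-9]+)$")


   def resolve_revision(arg, svn_to_hg):
       """Translate 'rNNN' into an hg revision; pass anything else through."""
       match = _SVN_REV.match(arg)
       if match is None:
           return arg  # not an svn-style argument; let hg parse it as usual
       # Returns None when the svn revision has no known hg counterpart.
       return svn_to_hg.get(int(match.group(1)))
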


After migration
===============

Where to get code
-----------------

It needs to be decided where the hg repositories will live. I'd like to
propose to keep the hgwebdir instance at hg.python.org. This is an accepted
standard for many organizations, and an easy parallel to svn.python.org.
The 2.7 (trunk) repo might live at http://hg.python.org/main/, for example,
with py3k at http://hg.python.org/py3k/. For write access, developers will
have to use ssh, which could be ssh://hg@hg.python.org/main/. A demo
installation will be set up with a preliminary conversion so people can
experiment and review; it can live at http://hg.python.org/example/.

code.python.org was also proposed as the hostname. Personally, I think that
using the VCS name in the hostname is good because it prevents confusion: it
should be clear that you can't use svn or bzr with hg.python.org.

hgwebdir can already provide tarballs for every changeset. I think this
obviates the need for daily snapshots; we can just point users to
``tip.tar.gz`` instead, meaning they will get the latest. If desired, we could
even use buildbot results to point to the last good changeset.

Python-specific documentation
-----------------------------

hg comes with good built-in documentation (available through ``hg help``) and
a `wiki`_ that's full of useful information and recipes. In addition to that,
the `parts of the developer FAQ`_ concerning version control will gain a
section on using hg for Python development. Some of the text will be dependent
on the outcome of debate about this PEP (for example, the branching strategy).

.. _wiki: http://www.selenic.com/mercurial/wiki/
.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control

Think first, commit later?
--------------------------

In recent history, old versions of Python have been maintained by a select
group of people backporting patches from trunk to release branches. While
this may not scale so well as the development pace grows, it also runs into
some problems with the current crop of distributed versioning tools. These
tools (I believe similar problems would exist for any of git, bzr, or hg,
though some may cope better than others) are based on the idea of a Directed
Acyclic Graph (or DAG), meaning they keep track of relations between
changesets.

Mercurial itself has a stable branch which is a *strict* subset of the
unstable branch. This means that generally all fixes for the stable branch
get committed against the tip of the stable branch, then they get merged into
the unstable branch (which already contains the parent of the new cset). This
provides a largely frictionless environment for moving changes from stable to
unstable branches. Mistakes, where a change that should go on stable goes on
unstable first, do happen, but they're usually easy to fix. That can be done by
copying the change over to the stable branch, then trivial-merging with
unstable (meaning the merge in fact ignores the parent from the stable
branch).

This strategy means a little more work for regular committers, because they
have to think about whether their change should go on stable or unstable; they
may even have to ask someone else (the RM) before committing. But it also
relieves a dedicated group of committers of regular backporting duty, in
addition to making it easier to work with the tool.

Now would be a good time to consider changing strategies in this regard,
although it would be relatively easy to switch to such a model later on.

The future of Subversion
------------------------

What happens to the Subversion repositories after the migration? Since the svn
server contains a bunch of repositories, not just the CPython one, it will
probably live on for a bit, as not every project may want to migrate and some
projects may take longer to migrate than others. To prevent people from
staying behind, we may want to remove migrated projects from the repository.

Build identification
--------------------

Python currently provides the ``sys.subversion`` tuple to allow Python code to
find out exactly what version of Python it's running against. Its value
currently looks something like this:

* ``('CPython', 'tags/r262', '71600')``
* ``('CPython', 'trunk', '73128M')``

Another value is returned from ``Py_GetBuildInfo()`` in the C API, and
available to Python code as part of ``sys.version``:

* ``'r262:71600, Jun 2 2009, 09:58:33'``
* ``'trunk:73128M, Jun 2 2009, 01:24:14'``

I propose that the revision identifier will be the short version of hg's
revision hash, for example 'dd3ebf81af43', augmented with '+' (instead of 'M')
if the working directory from which it was built was modified. This mirrors
the output of the ``hg id`` command, which is intended for this kind of usage.

For the tag/branch identifier, I propose that hg will check for tags on the
currently checked out revision, use the tag if there is one ('tip' doesn't
count), and use the branch name otherwise. ``sys.subversion`` becomes:

* ``('CPython', '2.6.2', 'dd3ebf81af43')``
* ``('CPython', 'default', 'af694c6a888c+')``

and the build info string becomes:

* ``'2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33'``
* ``'default:af694c6a888c+, Jun 2 2009, 01:24:14'``

This reflects that the default branch in hg is called 'default' instead of
Subversion's 'trunk', and reflects the proposed new tag format.
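
The rules proposed in this section can be summarized in a short sketch. It
only shows how the identifiers would be derived from information that hg
already has about the working directory. The function names and parameters
are hypothetical, and the code that would query hg at build time is not shown.

.. code-block:: python

   def build_identification(revision_hash, branch, tags=(), modified=False):
       """Return the proposed replacement for the sys.subversion tuple."""
       # Short form of the hash (12 hex digits), with '+' for a modified
       # working directory.
       revision = revision_hash[:12] + ("+" if modified else "")
       # 'tip' doesn't count as a tag; fall back to the branch name.
       real_tags = [tag for tag in tags if tag != "tip"]
       label = real_tags[0] if real_tags else branch
       return ("CPython", label, revision)


   def build_info(identification, build_date, build_time):
       """Return the proposed build info string for the given identification."""
       _, label, revision = identification
       return "%s:%s, %s, %s" % (label, revision, build_date, build_time)
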