PEP: 385
Title: Migrating from svn to Mercurial
Version: $Revision$
Last-Modified: $Date$
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 25-May-2009

.. warning::

    This PEP is in the draft stages.


Motivation
==========

After having decided to switch to the Mercurial DVCS, the actual migration
still has to be performed. In the case of an important piece of
infrastructure like the version control system for a large, distributed
project like Python, this is a significant effort. This PEP attempts to
describe the steps that must be taken, as a basis for further discussion.
It's somewhat similar to `PEP 347`_, which discussed the migration to SVN.

To make the most of hg, I (Dirkjan) would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common in
Mercurial. This way, tools written for Mercurial can be used to full effect.
In order to do this, I want to use the `hgsubversion`_ software to do an
initial conversion. This hg extension is focused on providing high-quality
conversion from Subversion to Mercurial for use in two-way correspondence,
meaning it doesn't throw away as much available metadata as other solutions.

Such a conversion also seems like a good time to reconsider the contents of
the repository and determine if some things are still valuable. In this spirit,
the following sections also propose discarding some of the older metadata.

.. _PEP 347: http://www.python.org/dev/peps/pep-0347/
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/


Transition plan
===============

Branch strategy
---------------

Mercurial has two basic ways of using branches: cloned branches, where each
branch is kept in a separate repository, and named branches, where each revision
keeps metadata noting which branch it belongs to. The former makes it easier
to distinguish branches, at the expense of requiring more disk space on the
client. The latter makes it a little easier to switch between branches, but
often has somewhat unintuitive results for people (though this has been
getting better in recent versions of Mercurial).

I'm still a bit on the fence about whether Python should adopt cloned
branches or named branches. Since it usually makes more sense to tag releases
on the maintenance branch, for example, mainline history would not contain
release tags if we used cloned branches. Also, Mercurial 1.2 and 1.3 have the
necessary tools to make named branches less painful (because they can be
properly closed and closed heads are no longer considered in relevant cases).

A disadvantage of named branches might be that clones will be a good bit
larger (since they essentially contain all other branches as well). Perhaps it
would be a good idea to distinguish between feature branches (which would be
done through a separate clone) and release branches (which would be named).

Converting branches
-------------------

There are quite a lot of branches in SVN's branches directory. I propose to
clean this up a bit, by employing the following strategy:

* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless someone
  indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless someone
  indicates the branch can be deprecated
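
The three rules above can be sketched as a small Python function. This is only an illustration: the function name, the sample branch names, and the reading of "18 months" as 18 * 30 days are assumptions, not part of the proposal.

```python
from datetime import date, timedelta

# Assumption: "18 months" is approximated as 18 * 30 days.
STALE_AFTER = timedelta(days=18 * 30)


def keep_branch(name, last_touched, today, release_branches,
                still_wanted=frozenset(), deprecated=frozenset()):
    """Return True if the branch should be converted, False if discarded."""
    # Rule 1: release (maintenance) branches are always kept.
    if name in release_branches:
        return True
    # Rule 2: stale branches are dropped unless someone asked to keep them.
    if today - last_touched > STALE_AFTER:
        return name in still_wanted
    # Rule 3: recently touched branches are kept unless marked deprecated.
    return name not in deprecated


# Hypothetical data, for illustration only.
today = date(2009, 6, 4)
release = {"release26-maint"}
print(keep_branch("old-feature", date(2007, 1, 1), today, release))
print(keep_branch("new-feature", date(2009, 1, 1), today, release))
```

The `still_wanted` and `deprecated` sets stand in for the feedback that would be collected from developers before the conversion.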

Converting tags
---------------

The SVN tags directory contains a lot of old stuff. Some of these tags are
not, in fact, full tags, but contain only a smaller subset of the repository.
I think we should keep all release tags, and consider other tags for inclusion
based on requests from the developer community. I'd like to consider unifying
the release tag naming scheme to make some things more consistent, if people
feel that won't create too many problems.

Author map
----------

In order to provide user names the way they are common in hg (in the 'First Last
<user@example.org>' format), we need an author map to map cvs and svn user
names to real names and their email addresses. I have a complete version of such
a map in my `migration tools repository`_. The email addresses in it might be
out of date; that's bound to happen, although it would be nice to try and
have as many people as possible review it for addresses that are out of date.
The current version also still seems to contain some encoding problems.

.. _migration tools repository: http://hg.xavamedia.nl/cpython/pymigr/
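
To illustrate what such a map does, here is a minimal Python sketch. It assumes a plain-text format with one ``svnuser = First Last <email>`` entry per line and ``#`` comments; the actual file in the migration tools repository may differ, and the user names below are made up.

```python
def parse_author_map(text):
    """Parse 'svnuser = First Last <email>' lines into a dict."""
    authors = {}
    for line in text.splitlines():
        # Assumption: '#' starts a comment and never occurs in an entry.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        user, sep, full = line.partition("=")
        if not sep:
            raise ValueError("malformed author map line: %r" % line)
        authors[user.strip()] = full.strip()
    return authors


def hg_author(svn_user, authors):
    """Return the hg-style author, or the bare user name if unmapped."""
    return authors.get(svn_user, svn_user)


# Hypothetical entries, for illustration only.
sample = """
# svn user = real name and address
jdoe = Jane Doe <jane@example.org>
rroe = Richard Roe <rroe@example.org>
"""
authors = parse_author_map(sample)
print(hg_author("jdoe", authors))
print(hg_author("unknown", authors))
```

Falling back to the bare user name is one possible choice; a real conversion might instead refuse to continue when a user is missing from the map.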

Generating .hgignore
--------------------

The .hgignore file can be used in Mercurial repositories to help ignore files
that are not eligible for version control. It does this by employing several
possible forms of pattern matching. The current Python repository already
includes a rudimentary .hgignore file to help with using the hg mirrors.

It might be useful to have the .hgignore be generated automatically from
svn:ignore properties. This would make sure all historic revisions also have
useful ignore information (though one could argue ignoring isn't really
relevant to just checking out an old revision).
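
A rough Python sketch of how the svn:ignore properties of a single revision could be turned into .hgignore lines follows. It assumes the properties have already been collected into a dict mapping each directory to its property value, and it emits rooted regexp patterns because svn:ignore applies only to the directory it is set on. The helper names are hypothetical, and only the ``*`` and ``?`` wildcards are handled.

```python
import re


def glob_to_regex(pattern):
    """Translate a simple svn:ignore glob into a regex fragment."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")   # '*' does not cross directory boundaries
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def hgignore_lines(svn_ignores):
    """Build .hgignore lines from a {directory: svn:ignore value} dict."""
    lines = ["syntax: regexp"]
    for directory in sorted(svn_ignores):
        # Escape each path component separately and rejoin with '/'.
        components = [re.escape(p) for p in directory.split("/") if p]
        prefix = "/".join(components)
        if prefix:
            prefix += "/"
        for pattern in svn_ignores[directory].splitlines():
            pattern = pattern.strip()
            if pattern:
                lines.append("^%s%s$" % (prefix, glob_to_regex(pattern)))
    return lines


# Hypothetical property values, for illustration only.
ignores = {"": "python\nbuild\n", "Lib": "*.pyc\n"}
print("\n".join(hgignore_lines(ignores)))
```

Running this once per converted revision would give every historic revision its own .hgignore, at the cost of extra changes to that file in history.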

Revlog reordering
-----------------

As an optional optimization technique, we should consider trying a reordering
pass on the revlogs (internal Mercurial files) resulting from the conversion.
In some cases this results in dramatic decreases in on-disk repository size.

Other repositories
------------------

Richard Tew has indicated that he'd like the Stackless repository to also be
converted. What other projects in the svn.python.org repository should be
converted? Do we want to convert the peps repository? distutils? others?


Infrastructure
==============

hg-ssh
------

Developers should access the repositories through ssh, similar to the current
setup. Public keys can be used to grant people access to a shared hg@ account.
An hgwebdir instance should also be set up for easy browsing and read-only
access. If we're using ssh, developers should trivially be able to start new
clones (for longer-term features that profit from a separate branch).
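
As an illustration of the shared-account setup, the sketch below generates ``authorized_keys`` entries that restrict each public key to the ``hg-ssh`` script shipped in Mercurial's contrib directory. The helper name, the repository paths, and the key are placeholders; the exact options to use would be up to the server administrators.

```python
def authorized_keys_line(public_key, repos):
    """Build one authorized_keys entry limited to hg-ssh on the given repos."""
    # hg-ssh takes the list of repositories the key may access.
    command = "hg-ssh %s" % " ".join(repos)
    # Standard sshd options that disable forwarding for this key.
    options = ('command="%s",no-port-forwarding,'
               'no-X11-forwarding,no-agent-forwarding' % command)
    return "%s %s" % (options, public_key.strip())


# Placeholder key and repository paths, for illustration only.
key = "ssh-rsa AAAAB3Nza... dev@example.org\n"
line = authorized_keys_line(key, ["repos/cpython", "repos/peps"])
print(line)
```

One such line per developer key in the hg@ account's ``authorized_keys`` file would grant access without giving anyone a shell.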

Hooks
-----

A number of hooks are currently in use. The hg equivalents for these should be
developed and deployed. The following hooks are being used:

* check whitespace: a hook to reject commits in case the whitespace doesn't
  match the rules for the Python codebase. Should be straightforward to
  re-implement from the current version. We can also offer a whitespace hook
  for use with client-side repositories that people can use; it could warn
  about whitespace issues and/or strip trailing whitespace from changed
  lines. Open issue: do we check only the tip after each push, or do we check
  every commit in a changegroup?

* commit mails: we can leverage the notify extension for this

* buildbots: both the regular and the community build masters must be notified.
  Fortunately buildbot includes support for hg. I've also implemented this for
  Mercurial itself, so I don't expect problems here.

* check contributors: in the current setup, all changesets bear the username of
  committers, who must have signed the contributor agreement. In a DVCS, the
  committers are not necessarily the same people who push, so the identity of
  the pusher no longer tells us whether the committer is a contributor. A hook
  could check each changeset's committer against a list of registered
  contributors, if we keep such a list.
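
The core of a whitespace check could look roughly like the following Python sketch. It is not the current svn hook and does not use the Mercurial hook API; the two rules it applies (no tabs in indentation, no trailing whitespace) are assumed here as a stand-in for the real rules of the Python codebase, and the function names are hypothetical.

```python
def whitespace_problems(filename, text):
    """Return (filename, line number, message) tuples for bad whitespace."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), 1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((filename, lineno, "tab in indentation"))
        if line != line.rstrip():
            problems.append((filename, lineno, "trailing whitespace"))
    return problems


def strip_trailing_whitespace(text):
    """Client-side variant: fix trailing whitespace instead of rejecting."""
    return "".join(line.rstrip() + "\n" for line in text.splitlines())


# Hypothetical file content, for illustration only.
source = "def f():\n\treturn 1\nx = 2  \n"
for problem in whitespace_problems("example.py", source):
    print("%s:%d: %s" % problem)
```

A server-side hook would reject the push when the list is non-empty. Whether it is fed only the files at the tip or the files of every changeset in the changegroup is exactly the open issue mentioned in the list.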

hgwebdir
--------

A more or less stock hgwebdir installation should be set up. We might want to
come up with a style to match the Python website. It may also be useful to
build a quick extension to augment the URL rev parser so that it can also take
r[0-9]+ args and come up with the matching hg revision.
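
The lookup such an extension would perform can be sketched in a few lines of Python. How the mapping from svn revision numbers to hg changesets is stored is left open here; the dict, the function name, and the changeset IDs below are hypothetical.

```python
import re

# Matches svn-style revision arguments such as "r12345".
SVN_REV = re.compile(r"^r([0-9]+)$")


def resolve_rev(arg, svn_to_hg):
    """Map an 'rNNN' argument to an hg changeset ID, else pass it through."""
    match = SVN_REV.match(arg)
    if match is None:
        return arg  # not an svn revision; leave it to the normal rev parser
    svn_rev = int(match.group(1))
    if svn_rev not in svn_to_hg:
        raise LookupError("no hg changeset for svn revision %d" % svn_rev)
    return svn_to_hg[svn_rev]


# Made-up mapping, for illustration only.
mapping = {72000: "0123456789ab", 72001: "ba9876543210"}
print(resolve_rev("r72000", mapping))
print(resolve_rev("tip", mapping))
```

Passing non-matching arguments through unchanged keeps ordinary hg identifiers such as ``tip`` or a changeset hash working as before.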