PEP: 385
Title: Migrating from svn to Mercurial
Version: $Revision: 72563 $
Last-Modified: $Date: 2009-05-11 14:50:03 +0200 (Mon, 11 May 2009) $
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 25-May-2009

.. warning::

    This PEP is in the draft stages.

Motivation
==========

Now that the decision to switch to the Mercurial DVCS has been made, the actual
migration still has to be performed. For a piece of infrastructure as important
as the version control system of a large, distributed project like Python, this
is a significant effort. This PEP attempts to describe the steps that must be
taken, as a basis for further discussion. It's somewhat similar to `PEP 347`_,
which discussed the migration to SVN.

To make the most of hg, I (Dirkjan) would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common in
Mercurial. This way, tools written for Mercurial can be used to full effect. In
order to do this, I want to use the `hgsubversion`_ software to do an initial
conversion. This hg extension is focused on providing high-quality conversion
from Subversion to Mercurial for use in two-way correspondence, which means it
discards less of the available metadata than other solutions.

Such a conversion also seems like a good time to reconsider the contents of
the repository and determine if some things are still valuable. In this spirit,
the following sections also propose discarding some of the older metadata.

.. _PEP 347: http://www.python.org/dev/peps/pep-0347/
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/


Transition plan
===============

Branch strategy
---------------

Mercurial has two basic ways of using branches: cloned branches, where each
branch is kept in a separate directory, and named branches, where each revision
carries metadata noting which branch it belongs to. The former makes it easier
to distinguish branches, at the expense of requiring more disk space on the
client. The latter makes it a little easier to switch between branches, but
often has somewhat unintuitive results for people (though this has been
getting better in recent versions of Mercurial).

I'm still a bit on the fence about whether Python should adopt cloned
branches or named branches. Since it usually makes more sense to tag releases
on the maintenance branch, for example, mainline history would not contain
release tags if we used cloned branches. Also, Mercurial 1.2 and 1.3 have the
necessary tools to make named branches less painful (because they can be
properly closed and closed heads are no longer considered in relevant cases).
A disadvantage of named branches might be that clones will be a good bit
larger (since each clone essentially contains all other branches as well).
Perhaps it would be a good idea to distinguish between feature branches (which
would be done through a separate clone) and release branches (which would be
named).

Converting branches
-------------------

There are quite a lot of branches in SVN's branches directory. I propose to
clean this up a bit, by employing the following strategy:

* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless someone
  indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless someone
  indicates the branch can be deprecated
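
The strategy above could be sketched as a small triage function. This is only
an illustration: the function name, the ``overrides`` argument (standing in for
requests collected from developers), and the approximation of 18 months as
540 days are all assumptions, not part of the proposal.

```python
from datetime import datetime, timedelta

# Roughly 18 months; the exact cutoff arithmetic is an assumption.
STALE_AFTER = timedelta(days=18 * 30)


def triage_branch(name, last_commit, now, release_branches, overrides=None):
    """Return 'keep' or 'discard' for one SVN branch.

    overrides is a hypothetical {branch name: decision} mapping that records
    explicit requests (interest in a stale branch, or deprecation of an
    active one).
    """
    overrides = overrides or {}
    if name in release_branches:
        return "keep"  # release (maintenance) branches are always kept
    if name in overrides:
        return overrides[name]  # an explicit request wins over the age rule
    if now - last_commit > STALE_AFTER:
        return "discard"
    return "keep"
```

For example, a non-release branch last touched in January 2007 would be
discarded when evaluated in May 2009, unless ``overrides`` asks to keep it.
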

Converting tags
---------------

The SVN tags directory contains a lot of old tags. Some of these are not, in
fact, full tags, but contain only a subset of the repository. I think we
should keep all release tags, and consider other tags for inclusion based
on requests from the developer community. I'd like to consider unifying the
release tag naming scheme to make some things more consistent, if people feel
that won't create too many problems.

Author map
----------

In order to provide user names in the format that is common in hg ('First Last
<user@example.org>'), we need an author map that maps cvs and svn user names to
real names and email addresses. I have a complete version of such a map in my
`migration tools repository`_. Some of the email addresses in it are bound to be
out of date, so it would be nice to have as many people as possible review it.
The current version also still seems to contain some encoding problems.

.. _migration tools repository: http://hg.xavamedia.nl/cpython/pymigr/
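
To illustrate what a mechanical review of the map could look like, here is a
minimal sketch of a checker. It assumes one entry per line in the form
``svnuser = First Last <user@example.org>``; the function name and the sample
entries are hypothetical and not taken from the actual map.

```python
import re

# Assumed entry format: "svnuser = First Last <user@example.org>"
ENTRY = re.compile(
    r"^(?P<login>[^=\s]+)\s*=\s*"
    r"(?P<name>[^<>]+?)\s*"
    r"<(?P<email>[^<>\s@]+@[^<>\s@]+)>$"
)


def parse_author_map(text):
    """Return (mapping, problems) for the given author map text.

    mapping is {login: 'First Last <email>'}; problems lists the
    (line number, raw line) pairs that do not match the assumed format.
    """
    mapping, problems = {}, []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match is None:
            problems.append((lineno, raw))
            continue
        mapping[match.group("login")] = "%s <%s>" % (
            match.group("name"), match.group("email"))
    return mapping, problems
```

A check like this only catches malformed entries. Whether an address is still
current has to be confirmed by the people involved.
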

Generating .hgignore
--------------------

The .hgignore file can be used in Mercurial repositories to help ignore files
that are not eligible for version control. It does this by employing several
forms of pattern matching. The current Python repository already includes a
rudimentary .hgignore file to help with using the hg mirrors.

It might be useful to have the .hgignore be generated automatically from
svn:ignore properties. This would make sure all historic revisions also have
useful ignore information (though one could argue ignoring isn't really
relevant to just checking out an old revision).
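
A rough sketch of such a generator follows. It assumes the svn:ignore values
have already been collected into a mapping from repository-relative directory
to property text; the function name and that input format are hypothetical.
The result is only an approximation, because glob patterns in .hgignore are
not rooted, while svn:ignore applies to a single directory.

```python
def hgignore_from_svn_ignore(ignore_props):
    """Build .hgignore text from a {directory: svn:ignore value} mapping.

    Directories are repository-relative, with "" standing for the root.
    """
    lines = ["syntax: glob"]
    for directory in sorted(ignore_props):
        for pattern in ignore_props[directory].splitlines():
            pattern = pattern.strip()
            if not pattern:
                continue  # svn:ignore values often contain blank lines
            if directory:
                # Prefix the directory, since svn:ignore is per directory.
                lines.append("%s/%s" % (directory, pattern))
            else:
                lines.append(pattern)
    return "\n".join(lines) + "\n"
```

Running this once per converted revision would be needed to give historic
revisions their own ignore information.
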

Revlog reordering
-----------------

As an optional optimization technique, we should consider trying a reordering
pass on the revlogs (internal Mercurial files) resulting from the conversion.
In some cases this results in dramatic decreases in on-disk repository size.

Other repositories
------------------

Richard Tew has indicated that he'd like the Stackless repository to also be
converted. What other projects in the svn.python.org repository should be
converted? Do we want to convert the peps repository? distutils? others?

Infrastructure
==============

hg-ssh
------

Developers should access the repositories through ssh, similar to the current
setup. Public keys can be used to grant people access to a shared hg@ account.
An hgwebdir instance should also be set up for easy browsing and read-only
access. If we're using ssh, developers should trivially be able to start new
clones (for longer-term features that benefit from a separate branch).

Hooks
-----

A number of hooks are currently in use. The hg equivalents for these should be
developed and deployed. The following hooks are being used:

* check whitespace: a hook to reject commits in case the whitespace doesn't
  match the rules for the Python codebase. Should be straightforward to
  re-implement from the current version. Open issue: do we check only the tip
  after each push, or do we check every commit in a changegroup?
* commit mails: we can leverage the notify extension for this
* buildbots: both the regular and the community build masters must be notified.
  Fortunately buildbot includes support for hg. I've also implemented this for
  Mercurial itself, so I don't expect problems here.
* check contributors: in the current setup, all changesets bear the username of
  committers, who must have signed the contributor agreement. In a DVCS, the
  committers are not necessarily the same people who push, so we can't
  establish contributor status from the pushing account alone. If we keep a
  list of registered contributors, we could use a hook to check the committer
  of each changeset against it.
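
The two checks that need custom logic could be sketched as pure functions,
leaving out the Mercurial hook wiring. The whitespace rules used here (no tabs,
no trailing whitespace) are an assumption, since the exact rules are not
spelled out above; all function names and sample data are hypothetical.

```python
def whitespace_problems(text):
    """Return (line number, reason) pairs for lines breaking the assumed rules."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), 1):
        if "\t" in line:
            problems.append((lineno, "tab character"))
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
    return problems


def unregistered_committers(changeset_users, registered):
    """Return the committers in a changegroup who are not registered.

    changeset_users holds the user string of every changeset in the push,
    so each committer is checked rather than only the person pushing.
    """
    return sorted(set(changeset_users) - set(registered))
```

A hook built on these would reject the push when either function returns a
non-empty result. Whether ``whitespace_problems`` runs on the tip only or on
every changeset is the open issue noted above.
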

hgwebdir
--------

A more or less stock hgwebdir installation should be set up. We might want to
come up with a style to match the Python website. It may also be useful to
build a quick extension to augment the URL rev parser so that it also accepts
arguments of the form ``r[0-9]+`` (SVN revision numbers) and resolves them to
the matching hg revision.
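
The lookup itself could be as simple as the sketch below. It assumes a table
mapping SVN revision numbers to hg changeset IDs is produced during the
conversion; that table, the function name, and the sample values are
hypothetical, and the integration with hgwebdir is not shown.

```python
import re

SVN_REV = re.compile(r"^r([0-9]+)$")


def resolve_rev(arg, svn_to_hg):
    """Map an 'r12345' style argument to an hg changeset ID.

    svn_to_hg is a hypothetical {svn revision number: hg changeset ID}
    table. Arguments that are not SVN-style, or that have no known
    mapping, are returned unchanged for the stock parser to handle.
    """
    match = SVN_REV.match(arg)
    if match is None:
        return arg
    return svn_to_hg.get(int(match.group(1)), arg)
```

Falling back to the original argument keeps existing hg revision identifiers
such as ``tip`` working as before.
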