PEP: 385
Title: Migrating from svn to Mercurial
Version: $Revision$
Last-Modified: $Date$
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 25-May-2009

.. warning::
    This PEP is in the draft stages.
|
|
|
|
|
|
Motivation
==========

After having decided to switch to the Mercurial DVCS, the actual migration
still has to be performed. In the case of an important piece of
infrastructure like the version control system for a large, distributed
project like Python, this is a significant effort. This PEP is an attempt
to describe the steps that must be taken, as a basis for further discussion.
It's somewhat similar to `PEP 347`_, which discussed the migration to SVN.

To make the most of hg, I (Dirkjan) would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common in
Mercurial. This way, tools written for Mercurial can be optimally used. In
order to do this, I want to use the `hgsubversion`_ software to do an initial
conversion. This hg extension is focused on providing high-quality conversion
from Subversion to Mercurial for use in two-way correspondence, meaning it
doesn't throw away as much available metadata as other solutions.

Such a conversion also seems like a good time to reconsider the contents of
the repository and determine if some things are still valuable. In this spirit,
the following sections also propose discarding some of the older metadata.

.. _PEP 347: http://www.python.org/dev/peps/pep-0347/
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/
|
|
|
|
|
|
Timeline
========

TBD; needs fully working hgsubversion and consensus on this document.
|
|
|
|
|
|
Transition plan
===============

Branch strategy
---------------

Mercurial has two basic ways of using branches: cloned branches, where each
branch is kept in a separate repository, and named branches, where each revision
keeps metadata noting which branch it belongs to. The former makes it easier
to distinguish branches, at the expense of requiring more disk space on the
client. The latter makes it a little easier to switch between branches, but
often has somewhat unintuitive results for people (though this has been
getting better in recent versions of Mercurial).

The current proposal is to use named branches for release branches and adopt
cloned branches for feature branches, with one exception to this rule: the 3.x
branches will be kept in separate clones from the 2.x branches. I think this
provides an optimal hybrid approach for Python's uses of branching.

Differences between named branches and cloned branches:

* Tags in a different (maintenance) clone aren't available in the local clone
* Clones with named branches will be larger, since they contain more data

(The Mercurial book discourages the use of named branches, but it is, in this
respect, somewhat outdated. Named branches have gotten much easier to use
since that comment was written, due to improvements in hg.)
|
|
|
|
Converting branches
-------------------

There are quite a lot of branches in SVN's branches directory. I propose to
clean this up a bit, by following this basic strategy:

* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless someone
  indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless someone
  indicates the branch can be deprecated
|
|
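As a rough illustration only, the retention rule above could be sketched in
Python as follows. The function name, the ``overrides`` argument, the 548-day
approximation of 18 months and the non-release branch names are all assumptions
made for this sketch; they are not part of the migration tooling.

```python
from datetime import date, timedelta

# Roughly 18 months; the exact cutoff is an assumption of this sketch.
STALE_AFTER = timedelta(days=548)


def classify_branch(name, last_touched, today, release_branches, overrides=None):
    """Return 'keep' or 'strip' for one svn branch.

    `overrides` maps branch names to a decision somebody asked for
    explicitly; such a request always wins over the age-based default.
    """
    overrides = overrides or {}
    if name in overrides:
        return overrides[name]
    if name in release_branches:
        return "keep"
    if today - last_touched > STALE_AFTER:
        return "strip"
    return "keep"


if __name__ == "__main__":
    today = date(2009, 6, 1)
    release = {"release26-maint"}
    # "old-experiment" and "recent-work" are hypothetical branch names.
    for branch, touched in [
        ("release26-maint", date(2007, 1, 1)),
        ("old-experiment", date(2007, 1, 1)),
        ("recent-work", date(2009, 3, 1)),
    ]:
        print(branch, classify_branch(branch, touched, today, release))
```

Keeping the explicit requests in a separate mapping means the age-based default
never has to be edited when somebody speaks up for (or against) a branch.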
|
|
There's a `branch map`_ available that shows info about each branch:

* keep-clone means we'll keep that branch in a separate clone
* keep-named means we'll keep that branch as a named branch in one of the clones
* strip means we won't keep that branch
* streamed-merge means that it got merged by committing several new revisions
  to the other branch
* merged-r* means the branch got merged in the named revision
* merges? means I haven't checked/found out yet whether that branch was ever
  merged
* ? means that your input would be even more helpful than for the other items
* some items have no action yet, feel free to treat that as just '?'

.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
|
|
|
|
Converting tags
---------------

The SVN tags directory contains a lot of old stuff. Some of these are not, in
fact, full tags, but contain only a subset of the repository. I think
we should keep all release tags, and consider other tags for inclusion based
on requests from the developer community. I'd like to consider unifying the
release tag naming scheme to make some things more consistent, if people feel
that won't create too many problems. The current proposal is to bring old
release tags in line with the current practice of release tag naming.
|
|
|
|
Author map
----------

In order to provide user names the way they are common in hg (in the 'First Last
<user@example.org>' format), we need an author map to map cvs and svn user
names to real names and their email addresses. I have a complete version of such
a map in my `migration tools repository`_. Some of the email addresses in it
might be out of date; that's bound to happen, although it would be nice to
have as many people as possible review it for stale addresses.
The current version also still seems to contain some encoding problems.
|
|
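To make the idea concrete, here is a minimal sketch of reading such a map and
applying it. It assumes one ``svnuser = First Last <user@example.org>`` entry
per line with ``#`` comments; that file format, the function names and the
sample entry are assumptions for illustration and may not match the actual
file in the migration tools repository.

```python
import re

# Assumed line format: "svnuser = First Last <user@example.org>".
AUTHOR_LINE = re.compile(
    r"^(?P<user>[^=#\s]+)\s*=\s*(?P<name>.+?)\s*<(?P<email>[^<>@\s]+@[^<>\s]+)>\s*$"
)


def parse_author_map(text):
    """Parse the author map text into {svn_user: 'First Last <email>'}."""
    authors = {}
    for number, raw in enumerate(text.split("\n"), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = AUTHOR_LINE.match(line)
        if match is None:
            raise ValueError(
                "line %d is not 'user = Name <email>': %r" % (number, raw)
            )
        authors[match.group("user")] = "%s <%s>" % (
            match.group("name"),
            match.group("email"),
        )
    return authors


def map_author(authors, svn_user):
    """Return the hg-style author, or the bare svn user name if unmapped."""
    return authors.get(svn_user, svn_user)


if __name__ == "__main__":
    # "jdoe" is a made-up entry, not a real committer.
    sample = "# svn user = real name <email>\njdoe = Jane Doe <jane@example.org>\n"
    authors = parse_author_map(sample)
    print(map_author(authors, "jdoe"))
    print(map_author(authors, "someone-else"))
```

Failing loudly on a malformed line, rather than skipping it, makes it harder
for a typo in the map to silently leave a committer unmapped.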
|
|
.. _migration tools repository: http://hg.xavamedia.nl/cpython/pymigr/
|
|
|
|
Generating .hgignore
--------------------

The .hgignore file can be used in Mercurial repositories to help ignore files
that are not eligible for version control. It does this by employing several
possible forms of pattern matching. The current Python repository already
includes a rudimentary .hgignore file to help with using the hg mirrors.

We'll just use that existing file. Generating full history of the file
was debated but deemed impractical (because it's relatively hard with fairly
little gain, since ignoring is less important for older revisions).
|
|
|
|
Revlog reordering
-----------------

As an optional optimization technique, I have performed a reordering pass on
the revlogs (internal Mercurial files) resulting from the conversion. In some
cases this results in dramatic decreases in on-disk repository size. This
especially makes sense for the manifest (where it really helps out quite a lot)
and oft-edited files like NEWS.txt (with an admittedly smaller effect).
|
|
|
|
Other repositories
------------------

Richard Tew has indicated that he'd like the Stackless repository to also be
converted. What other projects in the svn.python.org repository should be
converted? Do we want to convert the peps repository? distutils? others?

There's now an initial stab at converting the Jython repository. The current
tip of hgsubversion unfortunately fails at some point. Pending investigation.

Other repositories that would like to be converted to Mercurial can announce
themselves to me after the main Python migration is done, and I'll take care
of their needs.
|
|
|
|
|
|
Infrastructure
==============

hg-ssh
------

Developers should access the repositories through ssh, similar to the current
setup. Public keys can be used to grant people access to a shared hg@ account.
An hgwebdir instance should also be set up for easy browsing and read-only
access. If we're using ssh, developers should trivially be able to start new
clones (for longer-term features that profit from a separate branch).
|
|
|
|
Hooks
-----

A number of hooks are currently in use. The hg equivalents for these should be
developed and deployed. The following hooks are being used:

* check whitespace: a hook to reject commits in case the whitespace doesn't
  match the rules for the Python codebase. Should be straightforward to
  re-implement from the current version. We can also offer a whitespace hook
  for use with client-side repositories that people can use; it could warn
  about whitespace issues, truncate trailing whitespace from changed lines,
  or both. Open issue: do we check only the tip after each push, or do we
  check every commit in a changegroup?

* commit mails: we can leverage the notify extension for this. Emails will
  include diffs for each changeset committed against the repository.

* buildbots: both the regular and the community build masters must be notified.
  Fortunately buildbot includes support for hg. I've also implemented this for
  Mercurial itself, so I don't expect problems here.

* check contributors: in the current setup, all changesets bear the username of
  committers, who must have signed the contributor agreement. We might want to
  use a hook to check if the committer is a contributor if we keep a list of
  registered contributors. Then, the hook might warn users that push a group
  of revisions containing changesets from unknown contributors.
|
|
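The core of a whitespace check could look roughly like the sketch below. It
only covers one rule (trailing whitespace on added lines) and works on unified
diff text; how the diff is obtained from Mercurial, and how the result is wired
into a server-side or client-side hook, is deliberately left out. The function
name and the sample file path are assumptions made for this illustration.

```python
import re

# Unified diff hunk header, e.g. "@@ -10,7 +10,8 @@"; omitted counts default to 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def trailing_whitespace_errors(diff_text):
    """Return (filename, line_number, text) for added lines ending in whitespace."""
    errors = []
    filename = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_lineno = 0           # line number on the new side of the diff
    # Split on "\n" only: str.splitlines() would also split on form feeds,
    # which do occur in Python source files.
    for line in diff_text.split("\n"):
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                content = line[1:]
                if content != content.rstrip():
                    errors.append((filename, new_lineno, content))
                new_lineno += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line, present on both sides.
                new_lineno += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            filename = path[2:] if path.startswith("b/") else path
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_lineno = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return errors


if __name__ == "__main__":
    sample = (
        "--- a/Lib/example.py\n"
        "+++ b/Lib/example.py\n"
        "@@ -1,2 +1,3 @@\n"
        " import os\n"
        "+x = 1  \n"
        " y = 2\n"
    )
    for name, lineno, text in trailing_whitespace_errors(sample):
        print("%s:%d: trailing whitespace: %r" % (name, lineno, text))
```

Because the function looks at one diff at a time, it works the same way
whether the hook ends up checking only the tip or every changeset in a
changegroup; that open issue only changes which diffs are fed in.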
|
|
End-of-line conversions
-----------------------

There has been some discussion about the lack of end-of-line conversion support
in Mercurial. While Mercurial comes with a win32text extension that provides
some basic support for converting end-of-line data on a file-name pattern
basis, the lack of exclusion (for specifying broad rules with exceptions) and
the use of hgrc files (which can't be versioned) make it less than ideal.

I think the primary line of defense for prevention of inappropriate newlines
should be hooks on the server side which basically reject any changegroup
or changeset introducing such data. The use of the win32text extension (which
can hopefully be improved/extended to support the usage scenarios mentioned
above) and/or a commit-time hook could be the first line of defense.
|
|
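One way to picture "broad rules with exceptions" is an ordered rule table in
which the first matching pattern wins. The sketch below is not how win32text
works; the rule table, the policy names and the file paths are hypothetical and
only illustrate the kind of check a server-side hook could apply to each
changed file.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table. The first matching pattern wins, so the
# exceptions are listed before the broad "*" rule.
RULES = [
    ("PCbuild/*.vcproj", "crlf"),
    ("*.bat", "crlf"),
    ("*.png", "binary"),
    ("*", "lf"),
]


def policy_for(path, rules=RULES):
    """Return the end-of-line policy ('lf', 'crlf' or 'binary') for a path."""
    for pattern, policy in rules:
        if fnmatchcase(path, pattern):
            return policy
    return "binary"  # no rule matched: do not check the file


def violates_policy(path, data, rules=RULES):
    """Return True if the bytes in `data` break the policy for `path`."""
    policy = policy_for(path, rules)
    if policy == "binary":
        return False
    if policy == "lf":
        return b"\r" in data
    # policy == "crlf": after removing proper CR LF pairs,
    # no stray CR or LF may remain.
    rest = data.replace(b"\r\n", b"")
    return b"\r" in rest or b"\n" in rest


if __name__ == "__main__":
    print(violates_policy("Lib/os.py", b"import sys\r\n"))
    print(violates_policy("PCbuild/pythoncore.vcproj", b"<xml>\r\n"))
```

Keeping such a table in a versioned file inside the repository would also
avoid the problem that hgrc files can't be versioned.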
|
|
hgwebdir
--------

A more or less stock hgwebdir installation should be set up. We might want to
come up with a style to match the Python website. It may also be useful to
build a quick extension to augment the URL rev parser so that it can also take
r[0-9]+ args and come up with the matching hg revision.
|
|
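The lookup at the heart of such an extension might be as small as the sketch
below. It assumes a mapping from svn revision numbers to hg changeset ids is
available from the conversion; the mapping contents, the function name and the
way this would hook into hgwebdir's URL handling are assumptions, and no
hgweb API is used here.

```python
import re

SVN_REV = re.compile(r"r([0-9]+)")


def resolve_rev(arg, svn_to_hg):
    """Translate 'r12345' into an hg changeset id; pass anything else through.

    `svn_to_hg` maps svn revision numbers (int) to hg changeset ids (str).
    """
    match = SVN_REV.fullmatch(arg)
    if match is None:
        return arg  # already an hg revision, tag or branch name
    svn_rev = int(match.group(1))
    try:
        return svn_to_hg[svn_rev]
    except KeyError:
        raise LookupError("no hg changeset recorded for svn revision r%d" % svn_rev)


if __name__ == "__main__":
    # Illustrative pairing only; not taken from a real conversion.
    mapping = {71600: "dd3ebf81af43"}
    print(resolve_rev("r71600", mapping))
    print(resolve_rev("tip", mapping))
```

Passing unrecognised arguments through unchanged keeps ordinary hg revision
identifiers, tags and branch names working exactly as before.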
|
|
roundup
-------

We'll come up with an auto-linking plugin for roundup, which can match a
changeset identifier (possibly with a branch prefix), and link it to the
appropriate revision in the hgwebdir instance. In addition, the extension
described in the hgwebdir section will make sure that old links to svn
revisions continue to work (by pointing to the hg changeset that reflects the
svn revision).
|
|
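The matching part of such a plugin could be sketched as below. This is not
roundup plugin code: the ``prefix:`` syntax, the idea that the prefix names a
repository under the hgwebdir root, the default repository name and the
function name are all assumptions for illustration. The input text is assumed
to be HTML-escaped already.

```python
import re

# A 12-digit short changeset id, optionally prefixed with "name:".
CHANGESET = re.compile(r"\b(?:(?P<repo>[A-Za-z0-9_.]+):)?(?P<node>[0-9a-f]{12})\b")


def link_changesets(text, base_url="http://hg.python.org", default_repo="main"):
    """Wrap changeset identifiers found in `text` in links to hgwebdir."""

    def replace(match):
        repo = match.group("repo") or default_repo
        url = "%s/%s/rev/%s" % (base_url, repo, match.group("node"))
        return '<a href="%s">%s</a>' % (url, match.group(0))

    return CHANGESET.sub(replace, text)


if __name__ == "__main__":
    print(link_changesets("Fixed in dd3ebf81af43."))
    print(link_changesets("Ported as py3k:af694c6a888c."))
```

A real plugin would probably also want to check that the identifier exists
before linking it, since any 12-character run of hex digits matches here.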
|
|
|
|
After migration
===============

Where to get code
-----------------

It needs to be decided where the hg repositories will live. I propose
keeping the hgwebdir instance at hg.python.org. This is an accepted
standard for many organizations, and an easy parallel to svn.python.org.
The 2.7 (trunk) repo might live at http://hg.python.org/main/, for example,
with py3k at http://hg.python.org/py3k/. For write access, developers will
have to use ssh, which could be ssh://hg@hg.python.org/main/. A demo
installation will be set up with a preliminary conversion so people can
experiment and review; it can live at http://hg.python.org/example/.

code.python.org was also proposed as the hostname. Personally, I think that
using the VCS name in the hostname is good because it prevents confusion: it
should be clear that you can't use svn or bzr for hg.python.org.

hgwebdir can already provide tarballs for every changeset. I think this
obviates the need for daily snapshots; we can just point users to tip.tar.gz
instead, meaning they will get the latest. If desired, we could even use
buildbot results to point to the last good changeset.
|
|
|
|
Python-specific documentation
-----------------------------

hg comes with good built-in documentation (available through hg help) and a
`wiki`_ that's full of useful information and recipes. In addition to that,
the `parts of the developer FAQ`_ concerning version control will gain a
section on using hg for Python development. Some of the text will be dependent
on the outcome of debate about this PEP (for example, the branching strategy).

.. _wiki: http://www.selenic.com/mercurial/wiki/
.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control
|
|
|
|
Proposed workflow
-----------------

I propose two workflows for the migration of patches between several branches.

For migration within 2.x or 3.x branches, I propose that a patch always gets
committed first to the oldest branch where it applies. Then, the resulting
changeset can be merged using hg merge to all newer branches within that
series (2.x or 3.x). If it does not apply as-is to the newer branch, hg revert
can be used to easily revert to the new-branch-native head, patch in some
alternative version of the patch (or none, if it's not applicable), then commit
the merge. The premise here is that all changesets from an older branch within
the series are eventually merged to all newer branches within the series.

The upshot is that this provides for the most painless merging procedure. The
downside is that in the general case, people have to think about the oldest
branch to which the patch should be applied before actually applying it.

For migration between 2.x and 3.x branches (which should all be in the same
direction, though I'm not sure what direction is most appropriate here),
changesets should be transplanted rather than merged. The transplant
extension, import/export and bundle/unbundle work equally well here.

Choosing this approach allows 3.x not to carry all of the 2.x
history-since-it-was-branched, meaning the clone is not as big and the merges
not as complicated.
|
|
|
|
The future of Subversion
------------------------

What happens to the Subversion repositories after the migration? Since the svn
server contains a bunch of repositories, not just the CPython one, it will
probably live on for a bit, since not every project may want to migrate and
some projects may take longer to do so. To prevent people from staying
behind, we may want to remove migrated projects from the repository.
|
|
|
|
Build identification
--------------------

Python currently provides the sys.subversion tuple to allow Python code to
find out exactly what version of Python it's running against. Its current
value looks something like this:

* ('CPython', 'tags/r262', '71600')
* ('CPython', 'trunk', '73128M')

Another value is returned from Py_GetBuildInfo() in the C API, and available
to Python code as part of sys.version:

* 'r262:71600, Jun 2 2009, 09:58:33'
* 'trunk:73128M, Jun 2 2009, 01:24:14'

I propose that the revision identifier will be the short version of hg's
revision hash, for example 'dd3ebf81af43', augmented with '+' (instead of 'M')
if the working directory from which it was built was modified. This mirrors
the output of the hg id command, which is intended for this kind of usage. The
sys.subversion value will also be renamed to sys.mercurial to reflect the
change in VCS.

For the tag/branch identifier, I propose that hg will check for tags on the
currently checked out revision, use the tag if there is one ('tip' doesn't
count), and use the branch name otherwise. The tuple currently exposed as
sys.subversion becomes

* ('CPython', '2.6.2', 'dd3ebf81af43')
* ('CPython', 'default', 'af694c6a888c+')

and the build info string becomes

* '2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33'
* 'default:af694c6a888c+, Jun 2 2009, 01:24:14'

This reflects that the default branch in hg is called 'default' instead of
Subversion's 'trunk', and reflects the proposed new tag format.
|
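The naming rules proposed in this section can be summarised in a few lines of
Python. This is only a sketch of the scheme, not the build machinery: the
function names are invented, the inputs (tags, branch, full hash, modified
flag) are assumed to have been obtained from hg beforehand, and the full
hashes in the usage example are the short hashes from the text padded with
zeros.

```python
def build_identifier(tags, branch, node, modified):
    """Return (tag_or_branch, revision) following the proposed scheme.

    `tags` are the tags on the checked-out revision, `node` is the full
    changeset hash, and `modified` says whether the working directory
    had local changes at build time.
    """
    real_tags = [tag for tag in tags if tag != "tip"]  # 'tip' doesn't count
    name = real_tags[0] if real_tags else branch
    revision = node[:12] + ("+" if modified else "")   # short hash, '+' if dirty
    return name, revision


def version_tuple(tags, branch, node, modified):
    """Return the proposed replacement for the sys.subversion tuple."""
    name, revision = build_identifier(tags, branch, node, modified)
    return ("CPython", name, revision)


def build_info(tags, branch, node, modified, build_date, build_time):
    """Return the proposed build info string."""
    name, revision = build_identifier(tags, branch, node, modified)
    return "%s:%s, %s, %s" % (name, revision, build_date, build_time)


if __name__ == "__main__":
    tagged = "dd3ebf81af43" + "0" * 28    # padded for illustration
    untagged = "af694c6a888c" + "0" * 28  # padded for illustration
    print(version_tuple(["2.6.2"], "default", tagged, False))
    print(version_tuple(["tip"], "default", untagged, True))
    print(build_info(["2.6.2"], "default", tagged, False, "Jun 2 2009", "09:58:33"))
```

If a revision carries more than one tag, this sketch simply takes the first
one; the text above does not say which tag should win in that case.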