PEP: 385
Title: Migrating from svn to Mercurial
Version: $Revision$
Last-Modified: $Date$
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 25-May-2009


Motivation
==========

After having decided to switch to the Mercurial DVCS, the actual
migration still has to be performed. In the case of an important
piece of infrastructure like the version control system for a large,
distributed project like Python, this is a significant effort. This
PEP is an attempt to describe the steps that must be taken, as a basis
for further discussion. It's somewhat similar to `PEP 347`_, which
discussed the migration to SVN.

To make the most of hg, I (Dirkjan) would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common
in Mercurial. This way, tools written for Mercurial can be optimally
used. In order to do this, I want to use the `hgsubversion`_ software
to do an initial conversion. This hg extension is focused on
providing high-quality conversion from Subversion to Mercurial for use
in two-way correspondence, meaning it retains more of the available
metadata than other solutions do.

Such a conversion also seems like a good time to reconsider the
contents of the repository and determine if some things are still
valuable. In this spirit, the following sections also propose
discarding some of the older metadata.

.. _PEP 347: http://www.python.org/dev/peps/pep-0347/
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/


Timeline
========

The current schedule for conversion milestones:

- 2010-11-20: availability of a test repo at hg.python.org

  Test commits will be allowed (and encouraged) from all committers to
  the Subversion repository. The test repository and all test commits
  will be removed once the final conversion is done. The server-side
  hooks will be installed for the test repository, in order to test
  buildbot, diff-email and whitespace checking integration.

- 2010-12-12: final conversion (tentative)

  Commits to the Subversion branches now maintained in Mercurial will
  be blocked. Developers should refrain from pushing to the Mercurial
  repositories until all infrastructure has been verified to work
  after the switchover to the new repository.


Todo list
=========

The current list of issues to resolve at various steps in the
conversion is kept `in the pymigr repo`_.

.. _in the pymigr repo: http://hg.python.org/pymigr/file/tip/todo.txt


Transition plan
===============

Branch strategy
---------------

Mercurial has two basic ways of using branches: cloned branches, where
each branch is kept in a separate repository, and named branches,
where each revision keeps metadata recording which branch it belongs
to. The former makes it easier to distinguish branches, at the expense
of requiring more disk space on the client. The latter makes it a
little easier to switch between branches, but often has somewhat
unintuitive results for people (though this has been getting better
in recent versions of Mercurial).

The current proposal is to use named branches for release branches and
adopt cloned branches for feature branches, with one exception to this
rule: the 3.x branches will be kept in separate clones from the 2.x
branches. I think this provides an optimal hybrid approach for
Python's uses of branching.

Differences between named branches and cloned branches:

* Tags in a different (maintenance) clone aren't available in the
  local clone
* Clones with named branches will be larger, since they contain more
  data

(The Mercurial book discourages the use of named branches, but it is,
in this respect, somewhat outdated. Named branches have gotten much
easier to use since that comment was written, due to improvements in
hg.)

Converting branches
-------------------

There are quite a lot of branches in SVN's branches directory. I
propose to clean this up a bit, by following this basic strategy:

* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless
  someone indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless
  someone indicates the branch can be deprecated

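The triage rule above is simple enough to state as a function. This is a minimal sketch under assumptions, not part of the actual migration tools: the helper name ``triage_branch``, its keyword flags, and the approximation of 18 months as 548 days are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: "18 months" is approximated as 548 days in this sketch.
STALE_AFTER = timedelta(days=548)


def triage_branch(last_touched: date, today: date, *,
                  is_release: bool = False,
                  interest_reported: bool = False,
                  deprecated: bool = False) -> str:
    """Return 'keep' or 'strip' for one SVN branch (hypothetical helper)."""
    if is_release:
        # All release (maintenance) branches are kept unconditionally.
        return "keep"
    if today - last_touched > STALE_AFTER:
        # Untouched for 18 months: discard unless someone still wants it.
        return "keep" if interest_reported else "strip"
    # Touched recently: keep unless someone says it can be deprecated.
    return "strip" if deprecated else "keep"
```

For example, a non-release branch last touched in January 2007 comes out as ``'strip'`` when evaluated in June 2009, unless ``interest_reported=True`` is passed.
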
There's a `branch map`_ available that shows info about each branch:

* keep-clone means we'll keep that branch in a separate clone
* keep-named means we'll keep that branch as a named branch in one of
  the clones
* strip means we won't keep that branch
* streamed-merge means that it got merged by committing several new
  revisions to the other branch
* merged-r* means the branch got merged in the named revision
* merges? means I haven't checked/found out yet whether that branch
  was ever merged
* ? means that your input would be even more helpful than for the
  other items
* some items have no action yet; feel free to treat those as '?'

.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt

Converting tags
---------------

The SVN tags directory contains a lot of old stuff. Some of these are
not, in fact, full tags, but contain only a smaller subset of the
repository. All release tags will be kept; other tags will be
included based on requests from the developer community. I'd like to
consider unifying the release tag naming scheme to make some things
more consistent, if people feel that won't create too many problems.
The current proposal is to bring old release tags in line with the
current practice of release tag naming.

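As an illustration of what unifying the release tag names could involve, the sketch below rewrites old SVN-style release tags such as ``r262`` into the dotted form used in the build identification examples later in this PEP (``2.6.2``). The regular expression, the helper name ``normalize_release_tag`` and the assumption that each version component is a single digit are hypothetical; they hold for the examples here but not necessarily for every tag in the repository, so anything that does not match is returned unchanged for manual review.

```python
import re

# Hypothetical pattern for old SVN release tags: 'r' + single-digit
# major, minor and (optional) micro version, plus an optional
# pre-release suffix such as 'a1', 'b2', 'c1' or 'rc1'.
OLD_TAG = re.compile(r"^r(\d)(\d)(\d)?((?:a|b|c|rc)\d+)?$")


def normalize_release_tag(tag: str) -> str:
    """Map e.g. 'r262' -> '2.6.2'; leave unrecognized tags untouched."""
    m = OLD_TAG.match(tag)
    if m is None:
        return tag  # not an old-style release tag: review by hand
    major, minor, micro, suffix = m.groups()
    version = f"{major}.{minor}"
    if micro is not None:
        version += f".{micro}"
    return version + (suffix or "")
```

With this sketch, ``r262`` becomes ``2.6.2`` and ``r27a1`` becomes ``2.7a1``, while a tag like ``r2610`` does not match and is left for a human to rename.
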
Author map
----------

In order to provide user names in the form that is common in hg (the
'First Last <user@example.org>' format), we need an author map to map
cvs and svn user names to real names and their email addresses. I
have a complete version of such a map in my `migration tools
repository`_. The email addresses in it might be out of date; that's
bound to happen, although it would be nice to try and have as many
people as possible review it for addresses that are out of date. The
current version also still seems to contain some encoding problems.

.. _migration tools repository: http://hg.python.org/pymigr/

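The author map is essentially a lookup table from short SVN/CVS user names to the ``First Last <user@example.org>`` form. The sketch below assumes a simple line-based ``svnuser = First Last <email>`` file format (the actual map in the migration tools repository may use a different layout), validates each entry, and falls back to the raw user name when no entry exists. ``load_author_map`` and ``map_author`` are hypothetical helpers.

```python
import re

# Hypothetical file format: one 'svnuser = First Last <email>' per line,
# with blank lines and '#' comments ignored.
FULL_AUTHOR = re.compile(r"^[^<>]+ <[^<>@\s]+@[^<>\s]+>$")


def load_author_map(text: str) -> dict[str, str]:
    """Parse author-map lines into {svn_user: 'First Last <email>'}."""
    authors = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        user, sep, full = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: missing '='")
        user, full = user.strip(), full.strip()
        if not FULL_AUTHOR.match(full):
            raise ValueError(f"line {lineno}: malformed author {full!r}")
        authors[user] = full
    return authors


def map_author(authors: dict[str, str], svn_user: str) -> str:
    # Unknown users keep their bare SVN name so no information is lost.
    return authors.get(svn_user, svn_user)
```

Loading the file with ``encoding="utf-8"`` would also turn some of the encoding problems mentioned above into explicit decode errors instead of silently mangled names.
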
Generating .hgignore
--------------------

The .hgignore file can be used in Mercurial repositories to help
ignore files that are not eligible for version control. It does this
by employing several possible forms of pattern matching (glob patterns
and regular expressions). The current Python repository already
includes a rudimentary .hgignore file to help with using the hg
mirrors.

We'll just use that existing file. Generating a full history of the
file was debated but deemed impractical: it's relatively hard, with
fairly little gain, since ignoring is less important for older
revisions.

Revlog reordering
-----------------

As an optional optimization technique, I have performed a reordering
pass on the revlogs (internal Mercurial files) resulting from the
conversion. In some cases this results in dramatic decreases in
on-disk repository size. This especially makes sense for the manifest
(where it really helps out quite a lot) and oft-edited files like
Misc/NEWS (with an admittedly smaller effect).

Other repositories
------------------

Richard Tew has indicated that he'd like the Stackless repository to
also be converted. What other projects in the svn.python.org
repository should be converted? Do we want to convert the peps
repository? distutils? others?

There's now an initial stab at converting the Jython repository. The
current tip of hgsubversion unfortunately fails at some point; this is
pending investigation.

Other repositories that would like to be converted to Mercurial can
announce themselves to me after the main Python migration is done, and
I'll take care of their needs.


Infrastructure
==============

hg-ssh
------

Developers should access the repositories through ssh, similar to the
current setup. Public keys can be used to grant people access to a
shared hg@ account. An hgwebdir instance will also be set up for easy
browsing and read-only access. If we're using ssh, developers should
trivially be able to start new clones (for longer-term features that
profit from development in a separate repository).

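With a shared ``hg@`` account, each developer's public key is typically added to that account's ``~/.ssh/authorized_keys`` with a forced command, so the key can only run Mercurial against an allowed set of repositories; Mercurial ships a ``contrib/hg-ssh`` script for this purpose. The sketch below generates such lines. The repository paths, the helper name ``authorized_keys_line`` and the exact set of OpenSSH restriction options are assumptions for illustration, not the deployed configuration.

```python
import shlex

# Restrictions commonly combined with a forced command in OpenSSH's
# authorized_keys; the exact set used on the server is an assumption.
RESTRICTIONS = ("no-port-forwarding", "no-X11-forwarding",
                "no-agent-forwarding", "no-pty")


def authorized_keys_line(public_key: str, repos: list[str]) -> str:
    """Build one entry forcing hg-ssh for this key (hypothetical helper)."""
    if not repos:
        raise ValueError("at least one repository path is required")
    for repo in repos:
        # A double quote would terminate the command="..." option early.
        if '"' in repo:
            raise ValueError(f"unsupported character in {repo!r}")
    command = "hg-ssh " + " ".join(shlex.quote(r) for r in repos)
    options = [f'command="{command}"', *RESTRICTIONS]
    return ",".join(options) + " " + public_key.strip()
```

Giving a developer access to a new clone then only means regenerating their line with the extra repository path.
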
Hooks
-----

A number of hooks are currently in use. The hg equivalents for these
should be developed and deployed. The following hooks are being used:

* check whitespace: a hook to reject commits in case the whitespace
  doesn't match the rules for the Python codebase. In a changegroup,
  only the tip is checked (this allows cleanup commits for changes
  pulled from third-party repos). We can also offer a client-side
  whitespace hook that people can install in their own clones; it
  could warn about whitespace issues and/or truncate trailing
  whitespace from changed lines.

* commit mails: Emails will include diffs for each changeset committed
  against the repository.

* buildbots: both the regular and the community build masters must be
  notified.

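The core of the whitespace check is independent of Mercurial. A minimal sketch, assuming two rules that may differ from the real hook in the hooks repository: no line may end in whitespace, and ``.py`` files may not contain tabs. ``check_tip`` takes the files of the tip changeset as a ``{path: text}`` mapping, which is a hypothetical interface; reading that snapshot from the repository and reporting failure back to Mercurial are omitted. ``strip_changed_lines`` illustrates the client-side variant that truncates trailing whitespace on changed lines only.

```python
def whitespace_problems(path: str, data: str) -> list[str]:
    """Return human-readable whitespace complaints for one file."""
    problems = []
    check_tabs = path.endswith(".py")  # hypothetical rule: no tabs in .py
    for lineno, line in enumerate(data.splitlines(), start=1):
        if line != line.rstrip():
            problems.append(f"{path}:{lineno}: trailing whitespace")
        if check_tabs and "\t" in line:
            problems.append(f"{path}:{lineno}: tab character")
    return problems


def check_tip(files: dict[str, str]) -> list[str]:
    """Check every file of the tip snapshot; an empty list means accept."""
    problems = []
    for path in sorted(files):
        problems.extend(whitespace_problems(path, files[path]))
    return problems


def strip_changed_lines(data: str, changed: set[int]) -> str:
    """Client-side variant: drop trailing blanks/tabs on changed lines."""
    lines = data.split("\n")
    for i in changed:  # 1-based line numbers
        if 1 <= i <= len(lines):
            # Only spaces and tabs, so CRLF line endings stay intact.
            lines[i - 1] = lines[i - 1].rstrip(" \t")
    return "\n".join(lines)
```

Because only the tip snapshot is inspected, a changegroup whose last changeset cleans up whitespace introduced by earlier, pulled changesets passes the check, which matches the behavior described above.
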
The `hooks repository`_ contains ports of these server-side hooks to
Mercurial. One additional hook could be beneficial:

* check contributors: in the current setup, all changesets bear the
  username of committers, who must have signed the contributor
  agreement. If we keep a list of registered contributors, a hook
  could check whether each committer is on it. The hook might then
  warn users who push a group of revisions containing changesets from
  unknown contributors.

.. _hooks repository: http://hg.python.org/hooks/

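The contributor check described above boils down to comparing the email address in each incoming changeset's user field with the list of people who have signed the contributor agreement. The sketch below assumes such a list exists as a set of email addresses and uses the standard library's ``email.utils.parseaddr`` to pull the address out of a ``First Last <user@example.org>`` user string; as proposed, it only produces warnings rather than rejecting the push. The function names and data shapes are hypothetical.

```python
from email.utils import parseaddr


def unknown_contributors(users: list[str], contributors: set[str]) -> list[str]:
    """Return user strings whose address is not a registered contributor."""
    known = {addr.lower() for addr in contributors}
    unknown = []
    for user in users:
        _, addr = parseaddr(user)  # 'Name <a@b>' -> ('Name', 'a@b')
        if addr.lower() not in known and user not in unknown:
            unknown.append(user)
    return unknown


def warn_unknown(users: list[str], contributors: set[str]) -> list[str]:
    """Format one warning per unknown committer (hypothetical helper)."""
    return [f"warning: {u} is not a registered contributor"
            for u in unknown_contributors(users, contributors)]
```

Comparing addresses case-insensitively is a design choice of this sketch; it avoids spurious warnings when the same person capitalizes their address differently.
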
End-of-line conversions
-----------------------

Discussion about the lack of end-of-line conversion support in
Mercurial, which was provided initially by the win32text extension,
led to the development of the new eol extension that supports
versioned management of line-ending conventions on a file-by-file
basis, akin to Subversion's ``svn:eol-style`` properties. This
information is kept in a versioned file called ``.hgeol``, and such a
file has already been checked into the Subversion repository.

A hook on the server side that rejects any changegroup or changeset
introducing inconsistent newline data can still be implemented, if
deemed necessary.

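A server-side newline check mainly needs to decide whether a file mixes line-ending conventions. The sketch below covers only that decision; choosing which files to check (for instance from the patterns in ``.hgeol``) and running it from a hook are left out. The function names are hypothetical, treating a lone ``\r`` as a third convention is an assumption, and so is skipping files that contain NUL bytes as binary.

```python
def line_endings(data: bytes) -> set[str]:
    """Return the line-ending styles present: any of 'CRLF', 'LF', 'CR'."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # '\n' not preceded by '\r'
    bare_cr = data.count(b"\r") - crlf  # '\r' not followed by '\n'
    counts = {"CRLF": crlf, "LF": bare_lf, "CR": bare_cr}
    return {style for style, n in counts.items() if n}


def is_inconsistent(data: bytes) -> bool:
    """True if a text file mixes more than one line-ending convention."""
    if b"\0" in data:
        # Assumption: NUL bytes mark a binary file, which is never checked.
        return False
    return len(line_endings(data)) > 1
```

A hook built on this would reject a changegroup as soon as ``is_inconsistent`` is true for any checked file in it.
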
hgwebdir
--------

A more or less stock hgwebdir installation should be set up. We might
want to come up with a style to match the Python website.

A `small WSGI application`_ has been written that can look up
Subversion revisions and redirect to the appropriate hgweb page for
the given changeset, regardless of which repository the converted
revision ended up in (since one big Subversion repository is converted
into several Mercurial repositories). It can also look up Mercurial
changesets by their hexadecimal ID.

.. _small WSGI application: http://hg.python.org/pymigr/file/tip/hglookup.py

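The lookup application mentioned above essentially maps an identifier to a repository and changeset and answers with an HTTP redirect. The sketch below is not ``hglookup.py`` itself: it uses in-memory dictionaries from SVN revision numbers to ``(repository, node)`` pairs and from node IDs to repositories, and assumes hgweb changeset URLs of the form ``/<repo>/rev/<node>``. The base URL, data structures and function names are assumptions.

```python
import re
from typing import Optional

BASE = "http://hg.python.org"  # assumed hgweb root URL

# Hypothetical lookup tables that the conversion would have to produce.
SVN_REVS: dict[int, tuple[str, str]] = {}  # svn revision -> (repo, node)
HG_NODES: dict[str, str] = {}              # full 40-digit hex node -> repo


def resolve(ident: str) -> Optional[tuple[str, str]]:
    """Map 'r71600', '71600' or a hex node prefix to (repo, node)."""
    m = re.fullmatch(r"r?(\d+)", ident)
    if m and int(m.group(1)) in SVN_REVS:
        # Purely numeric IDs are tried as SVN revisions first.
        return SVN_REVS[int(m.group(1))]
    if re.fullmatch(r"[0-9a-f]{6,40}", ident):
        hits = [(repo, node) for node, repo in HG_NODES.items()
                if node.startswith(ident)]
        if len(hits) == 1:  # ambiguous prefixes are treated as unknown
            return hits[0]
    return None


def lookup_app(environ, start_response):
    """WSGI application: redirect /<identifier> to its hgweb page."""
    ident = environ.get("PATH_INFO", "").strip("/")
    hit = resolve(ident)
    if hit is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown revision\n"]
    repo, node = hit
    location = f"{BASE}/{repo}/rev/{node[:12]}"
    start_response("302 Found", [("Location", location),
                                 ("Content-Type", "text/plain")])
    return [b""]
```

For local experiments, the application could be served with ``wsgiref.simple_server.make_server("", 8000, lookup_app).serve_forever()``.
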
roundup
-------

By pointing Roundup to the URL of the lookup script mentioned above,
links to SVN revisions will continue to work, and links to Mercurial
changesets can be created as well, without having to specify both the
repository *and* the changeset ID.


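On the Roundup side, this means turning revision references in issue text into links that go through the lookup script, so the tracker never needs to know which repository a changeset lives in. A minimal sketch, assuming the lookup script is reachable at a URL like ``http://hg.python.org/lookup/<id>`` and that references are written as ``r12345`` or as 12 to 40 digit lowercase hex changeset IDs; both the URL and the pattern are hypothetical, and Roundup's own link-rendering machinery is not shown.

```python
import html
import re

LOOKUP = "http://hg.python.org/lookup/"  # hypothetical lookup-script URL

# 'r' + digits for SVN revisions, or a 12-40 digit hex Mercurial node.
REF = re.compile(r"\b(r\d+|[0-9a-f]{12,40})\b")


def link_revisions(text: str) -> str:
    """Escape text for HTML and wrap revision references in links."""
    def repl(m: re.Match) -> str:
        ident = m.group(1)
        return f'<a href="{LOOKUP}{ident}">{ident}</a>'
    return REF.sub(repl, html.escape(text))
```

Escaping before linking keeps user-supplied text from injecting markup, while the revision identifiers themselves contain no characters that ``html.escape`` would change.
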
After migration
===============

Where to get code
-----------------

After migration, the hgwebdir will live at hg.python.org. This is a
common convention for many organizations, and an easy parallel to
svn.python.org. The 3.x repo might live at
http://hg.python.org/main/, for example, with the 2.x repo at
http://hg.python.org/2.x/. For write access, developers will have to
use ssh, which could be ssh://hg@hg.python.org/main/. A demo
installation will be set up with a preliminary conversion so people
can experiment and review; it can live at
http://hg.python.org/example/.

code.python.org was also proposed as the hostname. Personally, I
think that using the VCS name in the hostname is good because it
prevents confusion: it should be clear that you can't use svn or bzr
for hg.python.org.

hgwebdir can already provide tarballs for every changeset. I think
this obviates the need for daily snapshots; we can just point users to
``tip.tar.gz`` instead, meaning they will get the latest. If desired,
we could even use buildbot results to point to the last good
changeset.

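Because hgweb serves archives for any changeset, a "latest snapshot" link is just an archive URL, and a "last good" link only needs the newest changeset that passed on the buildbots. A sketch under assumptions: archives are enabled in hgweb and exposed as ``/<repo>/archive/<rev>.tar.gz``, and buildbot results are available as a list of ``(node, passed)`` pairs ordered oldest to newest. The helper names and that data shape are hypothetical.

```python
from typing import Optional

BASE = "http://hg.python.org"  # assumed hgweb root URL


def archive_url(repo: str, rev: str = "tip", fmt: str = "tar.gz") -> str:
    """URL of the archive hgweb generates for one changeset."""
    if fmt not in ("tar.gz", "tar.bz2", "zip"):
        raise ValueError(f"unsupported archive format: {fmt}")
    return f"{BASE}/{repo}/archive/{rev}.{fmt}"


def last_good(results: list[tuple[str, bool]]) -> Optional[str]:
    """Newest changeset whose builds all passed (results oldest first)."""
    for node, passed in reversed(results):
        if passed:
            return node
    return None
```

Combining the two gives the "last good" snapshot link; when no changeset has passed yet, a page could fall back to the plain ``tip`` archive.
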
Python-specific documentation
-----------------------------

hg comes with good built-in documentation (available through
``hg help``) and a `wiki`_ that's full of useful information and
recipes. In addition to that, the `parts of the developer FAQ`_
concerning version control will gain a section on using hg for Python
development. Some of the text will be dependent on the outcome of
debate about this PEP (for example, the branching strategy).

.. _wiki: http://www.selenic.com/mercurial/wiki/
.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control

Brett Cannon will overhaul the developer FAQ, including any updates
needed with respect to Mercurial.

Proposed workflow
-----------------

I propose two workflows for the migration of patches between several
branches.

For migration within 2.x or 3.x branches, I propose that a patch
always be committed first to the oldest branch where it applies. Then,
the resulting changeset can be merged using ``hg merge`` to all newer
branches within that series (2.x or 3.x). If it does not apply as-is
to the newer branch, ``hg revert`` can be used to easily revert to the
new-branch-native head, patch in some alternative version of the patch
(or none, if it's not applicable), then commit the merge. The premise
here is that all changesets from an older branch within the series are
eventually merged to all newer branches within the series.

The upshot is that this provides the most painless merging procedure.
It does mean that, in the general case, people have to think about
the oldest branch to which the patch should be applied before actually
applying it. Usually, that is one of only two branches: the latest
maintenance branch and the trunk, except for security fixes applicable
to older branches in security-fix-only mode.

For merging bug fixes from the 3.x to the 2.7 maintenance branch (2.6
and 2.5 are in security-fix-only mode and their maintenance will
continue in the Subversion repository), changesets should be
transplanted rather than merged. The transplant extension,
import/export and bundle/unbundle work equally well here.

Choosing this approach allows 3.x not to carry all of the 2.x
history-since-it-was-branched, meaning the clone is not as big and the
merges not as complicated.

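The branch-selection part of this workflow can be expressed as a small planning function. The sketch below does not run any hg commands; it only decides, for a fix that applies to a given set of branches, which branch receives the original commit, which newer branches in the same series get it via ``hg merge``, and which branches in the other series need a transplant (or export/import). The branch names and their ordering in ``SERIES`` are assumptions loosely based on the branches mentioned in this section, and ``plan_fix`` is a hypothetical helper; security-fix-only branches that stay in Subversion are left out.

```python
from typing import NamedTuple

# Assumed branch layout, oldest first within each series.
SERIES = {
    "2.x": ["2.7"],
    "3.x": ["3.1", "default"],
}


class Plan(NamedTuple):
    commit_to: str            # oldest applicable branch in the series
    merge_into: list[str]     # newer branches reached via hg merge
    transplant_to: list[str]  # applicable branches in the other series


def plan_fix(applies_to: set[str], series: str = "3.x") -> Plan:
    """Decide where a fix is committed, merged and transplanted."""
    branches = SERIES[series]
    targets = [b for b in branches if b in applies_to]
    if not targets:
        raise ValueError(f"fix does not apply to any {series} branch")
    oldest = targets[0]
    # Every newer branch in the series receives the merge, even where the
    # merge is resolved with hg revert because the change does not apply.
    newer = branches[branches.index(oldest) + 1:]
    others = [
        b
        for name, members in SERIES.items() if name != series
        for b in members if b in applies_to
    ]
    return Plan(oldest, newer, others)
```

For a fix that applies everywhere, ``plan_fix({"2.7", "3.1", "default"})`` yields a commit on ``3.1``, a merge into ``default`` and a transplant to ``2.7``.
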
The future of Subversion
------------------------

What happens to the Subversion repositories after the migration?
Since the svn server contains a bunch of repositories, not just the
CPython one, it will probably live on for a while, since not every
project may want to migrate and others may take longer to do so. To
keep people from continuing to work on migrated projects in
Subversion, we may want to move those projects from the repository to
a new, read-only repository with a new name.

Build identification
--------------------

Python currently provides the ``sys.subversion`` tuple to allow Python
code to find out exactly what version of Python it's running against.
The current version looks something like this:

* ('CPython', 'tags/r262', '71600')
* ('CPython', 'trunk', '73128M')

Another value is returned from ``Py_GetBuildInfo()`` in the C API, and
is available to Python code as part of ``sys.version``:

* 'r262:71600, Jun 2 2009, 09:58:33'
* 'trunk:73128M, Jun 2 2009, 01:24:14'

I propose that the revision identifier will be the short version of
hg's revision hash, for example 'dd3ebf81af43', augmented with '+'
(instead of 'M') if the working directory from which it was built was
modified. This mirrors the output of the ``hg id`` command, which is
intended for this kind of usage. The ``sys.subversion`` value will
also be renamed to ``sys.mercurial`` to reflect the change in VCS.

For the tag/branch identifier, I propose that hg will check for tags
on the currently checked out revision, use the tag if there is one
('tip' doesn't count), and use the branch name otherwise. The tuple
then becomes

* ('CPython', '2.6.2', 'dd3ebf81af43')
* ('CPython', 'default', 'af694c6a888c+')

and the build info string becomes

* '2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33'
* 'default:af694c6a888c+, Jun 2 2009, 01:24:14'

This reflects that the default branch in hg is called 'default'
instead of Subversion's 'trunk', and reflects the proposed new tag
format.


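The proposed rule for building the identifier can be written down directly. The sketch below takes the pieces that ``hg id`` can report (short hash, the tags on the revision, the branch name, and whether the working directory is modified) as plain arguments instead of running Mercurial, so ``build_id`` and ``build_info`` are hypothetical helpers; the build date string is passed in unchanged, and picking the first non-``tip`` tag when several exist is an assumption.

```python
def build_id(short_hash: str, tags: list[str], branch: str,
             modified: bool) -> tuple[str, str, str]:
    """Return the proposed (implementation, tag-or-branch, revision)."""
    # Use a real tag on the checked-out revision; 'tip' doesn't count.
    real_tags = [t for t in tags if t != "tip"]
    label = real_tags[0] if real_tags else branch
    # '+' replaces Subversion's 'M' marker for a modified working dir.
    revision = short_hash + ("+" if modified else "")
    return ("CPython", label, revision)


def build_info(ident: tuple[str, str, str], build_date: str) -> str:
    """Format the Py_GetBuildInfo()-style string (hypothetical helper)."""
    _, label, revision = ident
    return f"{label}:{revision}, {build_date}"
```

With the example values from this section, ``build_info(build_id("af694c6a888c", ["tip"], "default", True), "Jun 2 2009, 01:24:14")`` gives ``'default:af694c6a888c+, Jun 2 2009, 01:24:14'``.
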
Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: