488 lines
19 KiB
ReStructuredText
488 lines
19 KiB
ReStructuredText
PEP: 385
|
||
Title: Migrating from Subversion to Mercurial
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>,
|
||
Antoine Pitrou <solipsis@pitrou.net>,
|
||
Georg Brandl <georg@python.org>
|
||
Status: Final
|
||
Type: Process
|
||
Content-Type: text/x-rst
|
||
Created: 25-May-2009
|
||
|
||
|
||
Motivation
|
||
==========
|
||
|
||
After having decided to switch to the Mercurial DVCS, the actual
|
||
migration still has to be performed. In the case of an important
|
||
piece of infrastructure like the version control system for a large,
|
||
distributed project like Python, this is a significant effort. This
|
||
PEP is an attempt to describe the steps that must be taken for further
|
||
discussion. It's somewhat similar to :pep:`347`, which discussed the
|
||
migration to SVN.
|
||
|
||
To make the most of hg, we would like to make a high-fidelity
|
||
conversion, such that (a) as much of the svn metadata as possible is
|
||
retained, and (b) all metadata is converted to formats that are common
|
||
in Mercurial. This way, tools written for Mercurial can be optimally
|
||
used. In order to do this, we want to use the `hgsubversion`_
|
||
software to do an initial conversion. This hg extension is focused on
|
||
providing high-quality conversion from Subversion to Mercurial for use
|
||
in two-way correspondence, meaning it doesn't throw away as much
|
||
available metadata as other solutions.
|
||
|
||
Such a conversion also seems like a good time to reconsider the
|
||
contents of the repository and determine if some things are still
|
||
valuable. In this spirit, the following sections also propose
|
||
discarding some of the older metadata.
|
||
|
||
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/
|
||
|
||
|
||
Timeline
|
||
========
|
||
|
||
The current schedule for conversion milestones:
|
||
|
||
- 2011-02-24: availability of a test repo at hg.python.org
|
||
|
||
Test commits will be allowed (and encouraged) from all committers to
|
||
the Subversion repository. The test repository and all test commits
|
||
will be removed once the final conversion is done. The server-side
|
||
hooks will be installed for the test repository, in order to test
|
||
buildbot, diff-email and whitespace checking integration.
|
||
|
||
- 2011-03-05: final conversion (tentative)
|
||
|
||
Commits to the Subversion branches now maintained in Mercurial will
|
||
be blocked. Developers should refrain from pushing to the Mercurial
|
||
repositories until all infrastructure is ensured to work after their
|
||
switch over to the new repository.
|
||
|
||
|
||
Transition plan
|
||
===============
|
||
|
||
Branch strategy
|
||
---------------
|
||
|
||
Mercurial has two basic ways of using branches: cloned branches, where
|
||
each branch is kept in a separate repository, and named branches,
|
||
where each revision keeps metadata to note on which branch it belongs.
|
||
The former makes it easier to distinguish branches, at the expense of
|
||
requiring more disk space on the client. The latter makes it a little
|
||
easier to switch between branches, but all branch names are a
|
||
persistent part of history. [1]_
|
||
|
||
Differences between named branches and cloned branches:
|
||
|
||
* Tags in a different (maintenance) clone aren't available in the
|
||
local clone
|
||
* Clones with named branches will be larger, since they contain more
|
||
data
|
||
|
||
We propose to use named branches for release branches and adopt cloned
|
||
branches for feature branches.
|
||
|
||
|
||
History management
|
||
------------------
|
||
|
||
In order to minimize the loss of information due to the conversion, we
|
||
propose to provide several repositories as a conversion result:
|
||
|
||
* A repository trimmed to the mainline trunk (and py3k), as well as
|
||
past and present maintenance branches -- this is called the
|
||
"working" repo and is where development continues. This repository has
|
||
all the history needed for development work, including annotating
|
||
source files with changes back up to 1990 and other common history-digging
|
||
operations.
|
||
|
||
The ``default`` branch in that repo is what is known as ``py3k`` in
|
||
Subversion, while the Subversion trunk lives on with the branch name
|
||
``legacy-trunk``; however in Mercurial this branch will be closed.
|
||
Release branches are named after their major.minor version, e.g. ``3.2``.
|
||
|
||
* A repository with the full, unedited conversion of the Subversion
|
||
repository (actually, its /python subdirectory) -- this is called
|
||
the "historic" or "archive" repo and will be offered as a read-only
|
||
resource. [2]_
|
||
|
||
* One more repository per active feature branch; "active" means that
|
||
at least one core developer asks for the branch to be provided. Each
|
||
such repository will contain both the feature branch and all ancestor
|
||
changesets from mainline (coming from ``trunk`` and/or ``py3k`` in SVN).
|
||
|
||
Since all branches are present in the historic repo, they can later be
|
||
extracted as separate repositories at any time should it prove to be
|
||
necessary.
|
||
|
||
The final revision map between SVN revision numbers, Mercurial changesets
|
||
and SVN branch names will be made available in a file stored in the ``Misc``
|
||
directory. Its format is as following::
|
||
|
||
[...]
|
||
88483 e65daae6cf4499a0863cb7645109a4798c28d83e issue10276-snowleopard
|
||
88484 835cb57abffeceaff0d85c2a3aa0625458dd3e31 py3k
|
||
88485 d880f9d8492f597a030772c7485a34aadb6c4ece release32-maint
|
||
88486 0c431b8c22f5dbeb591414c154acb7890c1809df py3k
|
||
88487 82cda1f21396bbd10db8083ea20146d296cb630b release32-maint
|
||
88488 8174d00d07972d6f109ed57efca8273a4d59302c release27-maint
|
||
[...]
|
||
|
||
|
||
Converting tags
|
||
---------------
|
||
|
||
The SVN tags directory contains a lot of old stuff. Some of these are
|
||
not, in fact, full tags, but contain only a smaller subset of the
|
||
repository. All release tags will be kept; other tags will be
|
||
included based on requests from the developer community. We propose
|
||
to make the tag naming scheme consistent, in this style: ``v3.2.1a2``.
|
||
|
||
|
||
Author map
|
||
----------
|
||
|
||
In order to provide user names the way they are common in hg (in the
|
||
'First Last <user@example.org>' format), we need an author map to map
|
||
cvs and svn user names to real names and their email addresses. We
|
||
have a complete version of such a map in the migration tools
|
||
repository (not publicly accessible to avoid leaking addresses to
|
||
harvesters). The email addresses in it might be out of date; that's
|
||
bound to happen, although it would be nice to try and have as many
|
||
people as possible review it for addresses that are out of date. The
|
||
current version also still seems to contain some encoding problems.
|
||
|
||
|
||
Generating .hgignore
|
||
--------------------
|
||
|
||
The .hgignore file can be used in Mercurial repositories to help
|
||
ignore files that are not eligible for version control. It does this
|
||
by employing several possible forms of pattern matching. The current
|
||
Python repository already includes a rudimentary .hgignore file to
|
||
help with using the hg mirrors.
|
||
|
||
Since the current Python repository already includes a .hgignore file
|
||
(for use with hg mirrors), we'll just use that. Generating full
|
||
history of the file was debated but deemed impractical (because it's
|
||
relatively hard with fairly little gain, since ignoring is less
|
||
important for older revisions).
|
||
|
||
|
||
Repository size
|
||
---------------
|
||
|
||
A bare conversion result of the current Python repository weighs 1.9
|
||
GB; although this is smaller than the Subversion repository (2.7 GB)
|
||
it is not feasible.
|
||
|
||
The size becomes more manageable by the trimming applied to the
|
||
working repository, and by a process called "revlog reordering" that
|
||
optimizes the layout of internal Mercurial storage very efficiently.
|
||
|
||
After all optimizations done, the size of the working repository is
|
||
around 180 MB on disk. The amount of data transferred over the
|
||
network when cloning is estimated to be around 80 MB.
|
||
|
||
|
||
Other repositories
|
||
------------------
|
||
|
||
There are a number of other projects hosted in svn.python.org's
|
||
"projects" repository. The "peps" directory will be converted along
|
||
with the main Python one. Richard Tew has indicated that he'd like the
|
||
Stackless repository to also be converted. What other projects in the
|
||
svn.python.org repository should be converted?
|
||
|
||
There's now an initial stab at converting the Jython repository. The
|
||
current tip of hgsubversion unfortunately fails at some point.
|
||
Pending investigation.
|
||
|
||
Other repositories that would like to converted to Mercurial can
|
||
announce themselves to me after the main Python migration is done, and
|
||
I'll take care of their needs.
|
||
|
||
|
||
Infrastructure
|
||
==============
|
||
|
||
hg-ssh
|
||
------
|
||
|
||
Developers should access the repositories through ssh, similar to the
|
||
current setup. Public keys can be used to grant people access to a
|
||
shared hg@ account. A hgwebdir instance also has been set up at
|
||
``hg.python.org`` for easy browsing and read-only access. It is
|
||
configured so that developers can trivially start new clones (for
|
||
longer-term features that profit from development in a separate
|
||
repository).
|
||
|
||
Also, direct creation of public repositories is allowed for core developers,
|
||
although it is not yet decided which naming scheme will be enforced::
|
||
|
||
$ hg init ssh://hg@hg.python.org/sandbox/mywork
|
||
repo created, public URL is http://hg.python.org/sandbox/mywork
|
||
|
||
|
||
Hooks
|
||
-----
|
||
|
||
A number of hooks is currently in use. The hg equivalents for these
|
||
should be developed and deployed. The following hooks are being used:
|
||
|
||
* check whitespace: a hook to reject commits in case the whitespace
|
||
doesn't match the rules for the Python codebase. In a changegroup,
|
||
only the tip is checked (this allows cleanup commits for changes
|
||
pulled from third-party repos). We can also offer a whitespace hook
|
||
for use with client-side repositories that people can use; it could
|
||
either warn about whitespace issues and/or truncate trailing
|
||
whitespace from changed lines.
|
||
|
||
* push mails: Emails will include diffs for each changeset pushed
|
||
to the public repository, including the username which pushed the
|
||
changesets (this is not necessarily the same as the author recorded
|
||
in the changesets).
|
||
|
||
* buildbots: the python.org build master will be notified of each changeset
|
||
pushed to the ``cpython`` repository, and will trigger an appropriate build
|
||
on every build slave for the branch in which the changeset occurs.
|
||
|
||
The `hooks repository`_ contains ports of these server-side hooks to
|
||
Mercurial, as well as a couple additional ones:
|
||
|
||
* check branch heads: a hook to reject pushes which create a new head on
|
||
an existing branch. The pusher then has to merge the excess heads
|
||
and try pushing again.
|
||
|
||
* check branches: a hook to reject all changesets not on an allowed named
|
||
branch. This hook's whitelist will have to be updated when we want to
|
||
create new maintenance branches.
|
||
|
||
* check line endings: a hook, based on the `eol extension`_, to reject all
|
||
changesets committing files with the wrong line endings. The commits then
|
||
have to be stripped and redone, possibly with the `eol extension`_ enabled
|
||
on the comitter's computer.
|
||
|
||
One additional hook could be beneficial:
|
||
|
||
* check contributors: in the current setup, all changesets bear the
|
||
username of committers, who must have signed the contributor
|
||
agreement. We might want to use a hook to check if the committer is
|
||
a contributor if we keep a list of registered contributors. Then,
|
||
the hook might warn users that push a group of revisions containing
|
||
changesets from unknown contributors.
|
||
|
||
.. _hooks repository: http://hg.python.org/hooks/
|
||
|
||
|
||
End-of-line conversions
|
||
-----------------------
|
||
|
||
Discussion about the lack of end-of-line conversion support in
|
||
Mercurial, which was provided initially by the `win32text extension`_,
|
||
led to the development of the new `eol extension`_ that supports a
|
||
versioned management of line-ending conventions on a file-by-file
|
||
basis, akin to Subversion's ``svn:eol-style`` properties. This
|
||
information is kept in a versioned file called ``.hgeol``, and such a
|
||
file has already been checked into the Subversion repository.
|
||
|
||
A hook also exists on the server side to reject any changeset
|
||
introducing inconsistent newline data (see above).
|
||
|
||
.. _eol extension: http://mercurial.selenic.com/wiki/EolExtension
|
||
.. _win32text extension: http://mercurial.selenic.com/wiki/Win32TextExtension
|
||
|
||
|
||
hgwebdir
|
||
--------
|
||
|
||
A more or less stock hgwebdir installation should be set up. We might
|
||
want to come up with a style to match the Python website.
|
||
|
||
A small WSGI application has been written that can look up
|
||
Subversion revisions and redirect to the appropriate hgweb page for
|
||
the given changeset, regardless in which repository the converted
|
||
revision ended up (since one big Subversion repository is converted
|
||
into several Mercurial repositories). It can also look up Mercurial
|
||
changesets by their hexadecimal ID.
|
||
|
||
|
||
roundup
|
||
-------
|
||
|
||
By pointing Roundup to the URL of the lookup script mentioned above,
|
||
links to SVN revisions will continue to work, and links to Mercurial
|
||
changesets can be created as well, without having to give repository
|
||
*and* changeset ID.
|
||
|
||
|
||
After migration
|
||
===============
|
||
|
||
Where to get code
|
||
-----------------
|
||
|
||
After migration, the hgwebdir will live at hg.python.org. This is an
|
||
accepted standard for many organizations, and an easy parallel to
|
||
svn.python.org. The working repo might live at
|
||
http://hg.python.org/cpython/, for example, with the archive repo at
|
||
http://hg.python.org/cpython-archive/. For write access, developers
|
||
will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.
|
||
|
||
code.python.org was also proposed as the hostname. We think that
|
||
using the VCS name in the hostname is good because it prevents
|
||
confusion: it should be clear that you can't use svn or bzr for
|
||
hg.python.org.
|
||
|
||
hgwebdir can already provide tarballs for every changeset. This
|
||
obviates the need for daily snapshots; we can just point users to
|
||
tip.tar.gz instead, meaning they will get the latest. If desired, we
|
||
could even use buildbot results to point to the last good changeset.
|
||
|
||
|
||
Python-specific documentation
|
||
-----------------------------
|
||
|
||
hg comes with good built-in documentation (available through hg help)
|
||
and a `wiki`_ that's full of useful information and recipes, not to
|
||
mention a popular `book`_ (readable online).
|
||
|
||
In addition to that, the recently overhauled `Python Developer's
|
||
Guide`_ already has a branch with instructions for Mercurial instead
|
||
of Subversion; an online `build of this branch`_ is also available.
|
||
|
||
.. _Python Developer's Guide: http://docs.python.org/devguide/
|
||
.. _build of this branch: http://potrou.net/hgdevguide/
|
||
.. _wiki: http://mercurial.selenic.com/wiki/
|
||
.. _book: http://hgbook.red-bean.com/
|
||
|
||
Proposed workflow
|
||
-----------------
|
||
|
||
We propose two workflows for the migration of patches between several
|
||
branches.
|
||
|
||
For migration within 2.x or 3.x branches, we propose a patch always
|
||
gets committed to the oldest branch where it applies first. Then, the
|
||
resulting changeset can be merged using hg merge to all newer branches
|
||
within that series (2.x or 3.x). If it does not apply as-is to the
|
||
newer branch, hg revert can be used to easily revert to the
|
||
new-branch-native head, patch in some alternative version of the patch
|
||
(or none, if it's not applicable), then commit the merge. The premise
|
||
here is that all changesets from an older branch within the series are
|
||
eventually merged to all newer branches within the series.
|
||
|
||
The upshot is that this provides for the most painless merging
|
||
procedure. This means that in the general case, people have to think
|
||
about the oldest branch to which the patch should be applied before
|
||
actually applying it. Usually, that is one of only two branches: the
|
||
latest maintenance branch and the trunk, except for security fixes
|
||
applicable to older branches in security-fix-only mode.
|
||
|
||
For merging bug fixes from the 3.x to the 2.7 maintenance branch (2.6
|
||
and 2.5 are in security-fix-only mode and their maintenance will
|
||
continue in the Subversion repository), changesets should be
|
||
transplanted (not merged) in some other way. The transplant
|
||
extension, import/export and bundle/unbundle work equally well here.
|
||
|
||
Choosing this approach allows 3.x not to carry all of the 2.x
|
||
history-since-it-was-branched, meaning the clone is not as big and the
|
||
merges not as complicated.
|
||
|
||
|
||
The future of Subversion
|
||
------------------------
|
||
|
||
What happens to the Subversion repositories after the migration?
|
||
Since the svn server contains a bunch of repositories, not just the
|
||
CPython one, it will probably live on for a bit as not every project
|
||
may want to migrate or it takes longer for other projects to migrate.
|
||
To prevent people from staying behind, we may want to move migrated
|
||
projects from the repository to a new, read-only repository with a new
|
||
name.
|
||
|
||
|
||
Build identification
|
||
--------------------
|
||
|
||
Python currently provides the sys.subversion tuple to allow Python
|
||
code to find out exactly what version of Python it's running against.
|
||
The current version looks something like this:
|
||
|
||
* ('CPython', 'tags/r262', '71600')
|
||
* ('CPython', 'trunk', '73128M')
|
||
|
||
Another value is returned from Py_GetBuildInfo() in the C API, and
|
||
available to Python code as part of sys.version:
|
||
|
||
* 'r262:71600, Jun 2 2009, 09:58:33'
|
||
* 'trunk:73128M, Jun 2 2009, 01:24:14'
|
||
|
||
I propose that the revision identifier will be the short version of
|
||
hg's revision hash, for example 'dd3ebf81af43', augmented with '+'
|
||
(instead of 'M') if the working directory from which it was built was
|
||
modified. This mirrors the output of the hg id command, which is
|
||
intended for this kind of usage. The sys.subversion value will also
|
||
be renamed to sys.mercurial to reflect the change in VCS.
|
||
|
||
For the tag/branch identifier, I propose that hg will check for tags
|
||
on the currently checked out revision, use the tag if there is one
|
||
('tip' doesn't count), and uses the branch name otherwise.
|
||
sys.subversion becomes
|
||
|
||
* ('CPython', 'v2.6.2', 'dd3ebf81af43')
|
||
* ('CPython', 'default', 'af694c6a888c+')
|
||
|
||
and the build info string becomes
|
||
|
||
* 'v2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33'
|
||
* 'default:af694c6a888c+, Jun 2 2009, 01:24:14'
|
||
|
||
This reflects that the default branch in hg is called 'default'
|
||
instead of Subversion's 'trunk', and reflects the proposed new tag
|
||
format.
|
||
|
||
Mercurial also allows to find out the latest tag and the number of
|
||
changesets separating the current changeset from that tag, allowing for
|
||
a descriptive version string::
|
||
|
||
$ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
|
||
v3.2+37-4b5d0d260e72
|
||
$ hg up 2.7
|
||
3316 files updated, 0 files merged, 379 files removed, 0 files unresolved
|
||
$ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
|
||
v2.7.1+216-9619d21d8198
|
||
|
||
|
||
Footnotes
|
||
=========
|
||
|
||
.. [1] The Mercurial book discourages the use of named branches, but
|
||
it is, in this respect, somewhat outdated. Named branches have
|
||
gotten much easier to use since that comment was written, due to
|
||
improvements in hg.
|
||
|
||
.. [2] Since the initial working repo is a subset of the archive repo,
|
||
it would also be feasible to pull changes from the working repo
|
||
into the archive repo periodically.
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|