PEP: 374
Title: Migrating from svn to Mercurial
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon <brett@python.org>,
        Dirkjan Ochtman <dirkjan@ochtman.nl>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 07-Nov-2008
Post-History: 07-Nov-2008,
              22-Jan-2009


.. warning::

    This PEP is in the draft stages and is still under active
    development in terms of the transition plan even though Hg is the
    chosen DVCS.

Motivation
==========

Python has been using a centralized version control system (VCS;
first CVS, now Subversion) for years to great effect. Having a master
copy of the official version of Python provides people with a single
place to always get the official Python source code. It has also
allowed for the storage of the history of the language, mostly for
help with development, but also for posterity. And of course the V in
VCS is very helpful when developing.

But a centralized version control system has its drawbacks. First and
foremost, in order to have the benefits of version control with
Python in a seamless fashion, one must be a "core developer" (i.e.
someone with commit privileges on the master copy of Python). People
who are not core developers but who wish to work with Python's
revision tree, e.g. anyone writing a patch for Python or creating a
custom version, do not have direct tool support for revisions. This
can be quite a limitation, since these non-core developers cannot
easily do basic tasks such as reverting changes to a previously
saved state, creating branches, publishing one's changes with full
revision history, etc. For non-core developers, the last safe tree
state is one the Python developers happen to set, and this prevents
safe development. This second-class citizenship is a hindrance to
people who wish to contribute to Python with a patch of any
complexity and want a way to incrementally save their progress to
make their development lives easier.

There is also the issue of having to be online to be able to commit
one's work. Because centralized VCSs keep a central copy that stores
all revisions, one must have Internet access in order for one's
revisions to be stored; no Net, no commit. This can be annoying if
you happen to be traveling and lack Internet access. There is also the
situation of someone wishing to contribute to Python but having a
bad Internet connection, where committing is time-consuming and
expensive and it might work out better to do it in a single step.

Another drawback of a centralized VCS concerns a common use case: a
developer revising patches in response to review comments. This is
more difficult with a centralized model because there's no place to
contain intermediate work. It's either all checked in or none of it
is checked in. With a centralized VCS, it's also very difficult to
track changes to the trunk as they are committed while you're working
on your feature or bug fix branch. This increases the risk that such
branches will grow stale and outdated, or that merging them into the
trunk will generate too many conflicts to be easily resolved.

Lastly, there is the issue of maintenance of Python. At any one time
there is at least one major version of Python under development (at
the time of this writing there are two). For each major version of
Python under development there is at least the maintenance version
of the last minor version and the in-development minor version (e.g.
with 2.6 just released, that means that both 2.6 and 2.7 are being
worked on). Once a release is done, a branch is created between the
code bases, since changes in one version may or may not belong in
the other version. At present there is no natural support for this
branching in time in centralized VCSs; you must use tools that
simulate the branching. Tracking merges is similarly painful for
developers, as revisions often need to be merged between four active
branches (e.g. 2.6 maintenance, 3.0 maintenance, 2.7 development,
3.1 development). In this case, VCSs such as Subversion only handle
this through arcane third-party tools.

Distributed VCSs (DVCSs) solve all of these problems. While one can
keep a master copy of a revision tree, anyone is free to copy that
tree for their own use. This gives everyone the power to commit
changes to their copy, online or offline. It also more naturally
ties into the idea of branching in the history of a revision tree
for maintenance and the development of new features bound for
Python. DVCSs also provide a great many additional features that
centralized VCSs don't or can't provide.

This PEP explores the possibility of changing Python's use of Subversion
to any of the currently popular DVCSs, in order to gain
the benefits outlined above. This PEP does not guarantee that a switch
to a DVCS will occur at the conclusion of this PEP. It is quite
possible that no clear winner will be found and that svn will continue
to be used. If this happens, this PEP will be revisited and revised in
the future as the state of DVCSs evolves.

Choice of DVCS
==============

This PEP included a thorough investigation of three DVCSs as options for
migration, with substantial work from Barry Warsaw, Alexandre Vassalotti and
Stephen Turnbull. That comparison has been moved to `DvcsComparison`_, and
this PEP now includes more information on the migration to Mercurial.

.. _DvcsComparison: http://wiki.python.org/moin/DvcsComparison

At PyCon 2009, a `decision
<http://mail.python.org/pipermail/python-dev/2009-March/087931.html>`_
was made to go with Mercurial.

The choice to go with Mercurial was made for four important reasons:

* According to a small survey, Python developers are more interested in
  using Mercurial than in Bazaar or Git.

* Mercurial is written in Python, which is congruent with the python-dev
  tendency to 'eat their own dogfood'.

* Mercurial is significantly faster than bzr (it's slower than git, though
  by a much smaller difference).

* Mercurial is easier to learn for SVN users than bzr.

Although all of these points can be debated, in the end a pronouncement from
the BDFL was made to go with hg as the chosen DVCS for the Python project.

Transition Plan
===============

Introduction
------------

To make the most of hg, I (Dirkjan) want to make a high-fidelity conversion,
such that (a) as much of the svn metadata as possible is retained, and (b) all
metadata is converted to formats that are common in Mercurial. This way, tools
written for Mercurial can be optimally used. In order to do this, I want to use
the `hgsubversion <http://bitbucket.org/durin42/hgsubversion>`_ software to do
an initial conversion. This hg extension is focused on providing high-quality
conversion from Subversion to Mercurial for use in two-way correspondence,
meaning it doesn't throw away as much available metadata as other solutions.

Such a conversion also seems like a good time to reconsider the contents of
the repository and determine if some things are still valuable. In this spirit,
in the following sections I propose discarding some of the older metadata.

Branch strategy
---------------

Mercurial has two basic ways of using branches: cloned branches, where each
branch is kept in a separate directory, and named branches, where each revision
keeps metadata recording which branch it belongs to. The former makes it easier
to distinguish branches, at the expense of requiring more disk space on the
client. The latter makes it a little easier to switch between branches, but
often has somewhat unintuitive results for people (though this has been
getting better in recent versions of Mercurial).

For Python, I think it would work well to have cloned branches and keep most
things separate. This is predicated on the assumption that most people work on
just one (or maybe two) branches at a time. Branches can be exposed separately,
though I would advocate merging old (and tagged!) branches into mainline so
that people can easily revert to older releases. At what age of a release this
should be done can be debated (a natural point might be when the branch becomes
unsupported, e.g. 2.4 at the release of 2.6).

Converting branches
-------------------

There are quite a lot of branches in SVN's branches directory. I propose to
clean this up a bit, by employing the following strategy:

* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless someone
  indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless someone
  indicates the branch can be deprecated
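
The branch rules above are mechanical enough to script. The following is a
minimal Python sketch, not part of any existing migration tool: it assumes the
last-commit date and a release/non-release flag for each branch have already
been extracted from SVN, approximates "18 months" as 540 days, and uses
made-up branch names.

```python
from datetime import date, timedelta

# Assumption: "18 months" is approximated as 18 * 30 days; the PEP does not
# define the cutoff more precisely.
STALE_AFTER = timedelta(days=18 * 30)


def triage_branches(branches, today, keep_requests=(), drop_requests=()):
    """Split SVN branches into (keep, discard) lists following the rules above.

    branches maps a branch name to (is_release_branch, last_commit_date).
    keep_requests names stale branches someone still wants; drop_requests
    names recently touched branches someone marked as deprecated.
    """
    keep, discard = [], []
    for name, (is_release, last_commit) in sorted(branches.items()):
        if is_release:
            # Rule 1: release (maintenance) branches are always kept.
            keep.append(name)
        elif today - last_commit > STALE_AFTER:
            # Rule 2: stale branches go, unless someone expressed interest.
            (keep if name in keep_requests else discard).append(name)
        else:
            # Rule 3: active branches stay, unless someone deprecated them.
            (discard if name in drop_requests else keep).append(name)
    return keep, discard


# Hypothetical branch data, for illustration only.
branches = {
    "release26-maint": (True, date(2009, 5, 1)),
    "old-experiment": (False, date(2006, 1, 1)),
    "wanted-experiment": (False, date(2006, 1, 1)),
    "new-feature": (False, date(2009, 4, 1)),
    "dead-feature": (False, date(2009, 4, 1)),
}
keep, discard = triage_branches(
    branches,
    today=date(2009, 5, 11),
    keep_requests={"wanted-experiment"},
    drop_requests={"dead-feature"},
)
print("keep:", keep)
print("discard:", discard)
```

Keeping the "unless someone indicates" exceptions as explicit inputs means the
community feedback can be collected first and the triage rerun as often as
needed before the final conversion.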

Converting tags
---------------

The SVN tags directory contains a lot of old stuff. Some of these are not, in
fact, full tags, but contain only a smaller subset of the repository. I think
we should keep all release tags, and consider other tags for inclusion based
on requests from the developer community. I'd like to consider unifying the
release tag naming scheme to make some things more consistent, if people feel
that won't create too many problems.
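
If the release tag naming is unified, the renaming could be done by a small
script. This sketch assumes, hypothetically, that SVN release tags follow an
``r26`` / ``r261`` / ``r30rc1`` pattern and that the unified target scheme is
``v2.6`` / ``v2.6.1`` / ``v3.0rc1``; neither scheme has been decided. Tags
that don't look like releases are kept only if someone requested them.

```python
import re

# Assumption: release tags look like "r26", "r252" or "r30rc1": one major digit,
# one minor digit, optional micro digits, optional a/b/c/rc pre-release suffix.
RELEASE_TAG = re.compile(r"r(\d)(\d)(\d*)((?:a|b|c|rc)\d+)?")


def normalize_release_tag(tag):
    """Return a hypothetical unified name such as 'v2.6.1', or None for non-release tags."""
    match = RELEASE_TAG.fullmatch(tag)
    if match is None:
        return None
    major, minor, micro, pre = match.groups()
    version = f"{major}.{minor}"
    if micro:
        version += f".{micro}"
    return "v" + version + (pre or "")


def triage_tags(tags, requested=()):
    """Rename release tags; keep other tags only if the community asked for them."""
    renamed, kept, dropped = {}, [], []
    for tag in sorted(tags):
        new_name = normalize_release_tag(tag)
        if new_name is not None:
            renamed[tag] = new_name
        elif tag in requested:
            kept.append(tag)
        else:
            dropped.append(tag)
    return renamed, kept, dropped


renamed, kept, dropped = triage_tags(
    ["r26", "r261", "r30rc1", "partial-snapshot", "requested-tag"],
    requested={"requested-tag"},
)
print(renamed)
print(kept, dropped)
```

Because the pattern reads the first two digits as major and minor version, a
name such as ``r2611`` would be interpreted as 2.6.11; the real tag list
should be reviewed by hand before any renaming is applied.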

Author map
----------

In order to provide user names in the form that is common in hg (the
``First Last <user@example.org>`` format), we need an author map to map CVS and
SVN user names to real names and their email addresses. I have a complete
version of such a map in my `migration tools repository`_. The email addresses
in it might be out of date; that's bound to happen, although it would be nice
to have as many people as possible review it for addresses that are out of
date. The current version also still seems to contain some encoding problems.

.. _migration tools repository: http://hg.xavamedia.nl/cpython/pymigr/
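
Reading such a map is straightforward. The sketch below assumes a
``svnuser = First Last <email>`` line format with ``#`` comments (the layout
Mercurial's convert extension documents for its author map files); it is not
the format of the map in the migration tools repository unless that file
happens to match. Lines whose right-hand side doesn't look like
``First Last <user@example.org>`` are collected for manual review, which may
also help spot the encoding problems mentioned above. The text is assumed to
have been decoded (e.g. as UTF-8) by the caller.

```python
import re

# Destination format expected in hg: "First Last <user@example.org>".
HG_USER = re.compile(r"[^<>]+ <[^<>@\s]+@[^<>\s]+>")


def parse_author_map(text):
    """Parse 'svnuser = First Last <email>' lines into a dict.

    Blank lines and lines starting with '#' are skipped. Malformed lines are
    returned as (line number, raw line) pairs so they can be fixed by hand.
    """
    mapping, problems = {}, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        source, sep, dest = line.partition("=")
        source, dest = source.strip(), dest.strip()
        if not sep or not source or not HG_USER.fullmatch(dest):
            problems.append((lineno, raw))
            continue
        mapping[source] = dest
    return mapping, problems


def map_author(mapping, svn_user):
    """Return the hg-style author, falling back to the raw svn user name."""
    return mapping.get(svn_user, svn_user)


# Hypothetical entries, for illustration only.
sample = "# hypothetical entries\n\njdoe = John Doe <jdoe@example.org>\nbroken = No Email Here\n"
mapping, problems = parse_author_map(sample)
print(mapping)
print(problems)
```

Falling back to the raw svn user name keeps the conversion going for unmapped
committers; the ``problems`` list is the part worth circulating for review.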

Generating .hgignore
--------------------

The .hgignore file can be used in Mercurial repositories to help ignore files
that are not eligible for version control. It does this by supporting several
pattern syntaxes, such as shell-style globs and regular expressions. The current
Python repository already includes a rudimentary .hgignore file to help with
using the hg mirrors.

It might be useful to have the .hgignore be generated automatically from
svn:ignore properties. This would make sure all historic revisions also have
useful ignore information (though one could argue ignoring isn't really
relevant to just checking out an old revision).
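
A minimal sketch of such a generator, assuming the svn:ignore values have
already been read into a ``{directory: property value}`` dict. svn:ignore
patterns apply only to entries directly inside their directory, whereas
.hgignore patterns are unrooted by default, so the sketch emits anchored
``regexp`` patterns (a leading ``^`` roots a pattern in .hgignore). It only
translates the ``*`` and ``?`` wildcards; other glob features such as
``[...]`` character classes are escaped literally, which is a simplification.

```python
import re


def glob_to_regex(pattern):
    """Translate an svn:ignore glob into a regex matching a single path component."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("[^/]*")  # never cross a directory separator
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def svn_ignores_to_hgignore(props):
    """Build .hgignore text from {directory: svn:ignore value}.

    The empty string (or "/") stands for the repository root.
    """
    lines = ["syntax: regexp"]
    for directory in sorted(props):
        base = directory.strip("/")
        prefix = re.escape(base + "/") if base else ""
        for pattern in props[directory].splitlines():
            pattern = pattern.strip()
            if pattern:
                # Anchor at both ends so the pattern only applies in this directory.
                lines.append("^" + prefix + glob_to_regex(pattern) + "$")
    return "\n".join(lines) + "\n"


# Hypothetical svn:ignore values, for illustration only.
hgignore = svn_ignores_to_hgignore({"": "build\n*.o", "Lib": "*.pyc\n"})
print(hgignore, end="")
```

To cover historic revisions as well, the same function could be run on the
svn:ignore properties of each converted revision rather than only on the
current ones.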

Revlog reordering
-----------------

As an optional optimization technique, we should consider trying a reordering
pass on the revlogs (internal Mercurial files) resulting from the conversion.
In some cases this results in dramatic decreases in on-disk repository size.

Other repositories
------------------

Richard Tew has indicated that he'd like the Stackless repository to also be
converted. What other projects in the svn.python.org repository should be
converted? Do we want to convert the peps repository? distutils? others?

Infrastructure
==============

hg-ssh
------

Developers should access the repositories through ssh, similar to the current
setup. Public keys can be used to grant people access to a shared hg@ account.
An hgwebdir instance should also be set up for easy browsing and read-only
access. Some facility for sandboxes/incubator repositories could be discussed.
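
One way to set up the shared account is Mercurial's ``hg-ssh`` script
(shipped in Mercurial's ``contrib`` directory), used as a forced command in
the account's ``~/.ssh/authorized_keys``. The following sketch generates such
entries; the developer name, key and repository names are hypothetical, and
the extra ssh options are a suggestion rather than a requirement.

```python
def authorized_keys_line(public_key, repos, hg_ssh="hg-ssh"):
    """Build one ~/.ssh/authorized_keys entry for the shared hg@ account.

    The forced command means this key can only run hg-ssh, and hg-ssh only
    serves the repositories named on its command line.
    """
    for repo in repos:
        # Keep the quoted command="..." option unambiguous.
        if not repo or any(ch in repo for ch in ' "\t\n'):
            raise ValueError(f"unsupported repository path: {repo!r}")
    command = " ".join([hg_ssh, *repos])
    options = [
        f'command="{command}"',
        "no-port-forwarding",
        "no-X11-forwarding",
        "no-agent-forwarding",
    ]
    return ",".join(options) + " " + public_key.strip()


def authorized_keys(developers):
    """developers maps a name to (public key, repositories they may access)."""
    lines = []
    for name, (key, repos) in sorted(developers.items()):
        lines.append(f"# {name}")  # sshd ignores lines starting with '#'
        lines.append(authorized_keys_line(key, repos))
    return "\n".join(lines) + "\n"


keys_file = authorized_keys({
    "Jane Developer": ("ssh-rsa AAAAexamplekey jane@example.org", ["cpython", "peps"]),
})
print(keys_file, end="")
```

Listing repositories per key would also give a simple mechanism for the
sandbox/incubator idea: a developer's key can be granted extra repositories
without giving them access to everything.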

Hooks
-----

A number of hooks are currently in use. The hg equivalents for these should be
developed and deployed. The following hooks are being used:

* check whitespace: a hook to reject commits in case the whitespace doesn't
  match the rules for the Python codebase. This should be straightforward to
  re-implement from the current version. Open issue: do we check only the tip
  after each push, or do we check every commit in a changegroup?

* commit mails: we can leverage the notify extension for this

* buildbots: both the regular and the community build masters must be notified.
  Fortunately buildbot includes support for hg. I've also implemented this for
  Mercurial itself, so I don't expect problems here.

* check contributors: in the current setup, all changesets bear the username of
  committers, who must have signed the contributor agreement. In a DVCS, the
  committers are not necessarily the same people who push, so knowing who
  pushed does not tell us whether each committer is a contributor. We could use
  a hook to check if the committer is a contributor if we keep a list of
  registered contributors.
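
The core logic of the whitespace and contributor hooks can be kept
independent of Mercurial's hook API, which makes it easy to test. In the
sketch below, the whitespace rules (trailing whitespace and tabs in ``.py``
files) are placeholders for whatever the current svn hook enforces, and the
changeset data is assumed to have been extracted from the repository by the
hook itself (for example in a ``pretxnchangegroup`` hook, which can reject a
push); that wiring is omitted. The ``every_commit`` flag illustrates the open
issue above, and the contributor check compares committer email addresses
against a hypothetical list of registered contributors.

```python
import re


def whitespace_problems(path, text):
    """Return (line number, message) pairs for one file's new contents.

    Hypothetical rules: only .py files are checked, for trailing
    whitespace and tab characters.
    """
    problems = []
    if not path.endswith(".py"):
        return problems
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        if "\t" in line:
            problems.append((lineno, "tab character"))
    return problems


def check_changegroup(changesets, every_commit=True):
    """Check pushed changesets and return human-readable failure messages.

    changesets is a list of (node, {path: new text}) pairs in push order,
    covering the files each changeset touched. With every_commit=False only
    the last changeset of the push is checked.
    """
    to_check = changesets if every_commit else changesets[-1:]
    failures = []
    for node, files in to_check:
        for path, text in sorted(files.items()):
            for lineno, message in whitespace_problems(path, text):
                failures.append(f"{node}: {path}:{lineno}: {message}")
    return failures


EMAIL = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def unregistered_committers(users, contributors):
    """Return committer strings whose email is not on the contributor list."""
    known = {address.lower() for address in contributors}
    unknown = []
    for user in users:
        match = EMAIL.search(user)
        if match is None or match.group(1).lower() not in known:
            unknown.append(user)
    return unknown


# Hypothetical push: the first changeset adds trailing whitespace, the second is clean.
push = [
    ("abc123", {"Lib/a.py": "x = 1 \n"}),
    ("def456", {"Lib/b.py": "y = 2\n"}),
]
print(check_changegroup(push))                      # every changeset
print(check_changegroup(push, every_commit=False))  # last changeset only
print(unregistered_committers(
    ["Jane Doe <jane@example.org>", "Anon <anon@example.org>"],
    {"jane@example.org"},
))
```

Checking every changeset is the stricter option: if only the last changeset of
a push is checked, a bad intermediate commit that a later commit fixes still
ends up in the permanent history.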

hgwebdir
--------

A more or less stock hgwebdir installation should be set up. We might want to
come up with a style to match the Python website. It may also be useful to
build a quick extension to augment the URL rev parser so that it can also take
``r[0-9]+`` args and come up with the matching hg revision.
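
The lookup such an extension needs can be sketched independently of hgweb.
This sketch assumes a dict mapping svn revision numbers to hg changeset ids is
available (for instance, built from the conversion metadata); how that mapping
is obtained and how the function would be hooked into hgweb's URL handling are
left open. Arguments that don't look like ``r`` followed by digits are passed
through unchanged so hg's own revision parsing still applies.

```python
import re

SVN_REV = re.compile(r"r([0-9]+)")


def resolve_rev(spec, svn_to_hg):
    """Translate an 'r12345'-style argument into an hg changeset id.

    svn_to_hg is assumed to map svn revision numbers (int) to hg node ids.
    """
    match = SVN_REV.fullmatch(spec)
    if match is None:
        return spec  # not svn-style; let hg interpret it
    rev = int(match.group(1))
    try:
        return svn_to_hg[rev]
    except KeyError:
        raise LookupError(f"no hg changeset recorded for svn revision r{rev}") from None


# Hypothetical mapping; real node ids are 40-character hex strings.
svn_to_hg = {69000: "0123456789abcdef0123456789abcdef01234567"}
print(resolve_rev("r69000", svn_to_hg))
print(resolve_rev("tip", svn_to_hg))
```

One consequence to keep in mind: if any tags or branches are named like
``r26``, this syntax would make them ambiguous with svn revision numbers, so
the extension would need a rule for which interpretation wins.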