Overhaul PEP 385 with newest conversion strategy.

Georg Brandl 2011-02-25 18:30:02 +00:00
parent 8ac0ce5b74
commit bb40b9f9cb
1 changed file with 139 additions and 91 deletions

@ -1,8 +1,10 @@
PEP: 385
Title: Migrating from svn to Mercurial
Title: Migrating from Subversion to Mercurial
Version: $Revision$
Last-Modified: $Date$
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>,
Antoine Pitrou <solipsis@pitrou.net>,
Georg Brandl <georg@python.org>
Status: Active
Type: Process
Content-Type: text/x-rst
@ -20,12 +22,12 @@ PEP is an attempt to describe the steps that must be taken for further
discussion. It's somewhat similar to `PEP 347`_, which discussed the
migration to SVN.
To make the most of hg, I (Dirkjan) would like to make a high-fidelity
To make the most of hg, we would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common
in Mercurial. This way, tools written for Mercurial can be optimally
used. In order to do this, I want to use the `hgsubversion`_ software
to do an initial conversion. This hg extension is focused on
used. In order to do this, we want to use the `hgsubversion`_
software to do an initial conversion. This hg extension is focused on
providing high-quality conversion from Subversion to Mercurial for use
in two-way correspondence, meaning it doesn't throw away as much
available metadata as other solutions.
@ -44,7 +46,7 @@ Timeline
The current schedule for conversion milestones:
- 2010-11-20: availability of a test repo at hg.python.org
- 2011-02-24: availability of a test repo at hg.python.org
Test commits will be allowed (and encouraged) from all committers to
the Subversion repository. The test repository and all test commits
@ -52,7 +54,7 @@ The current schedule for conversion milestones:
hooks will be installed for the test repository, in order to test
buildbot, diff-email and whitespace checking integration.
- 2010-12-12: final conversion (tentative)
- 2011-03-09: final conversion (tentative)
Commits to the Subversion branches now maintained in Mercurial will
be blocked. Developers should refrain from pushing to the Mercurial
@ -80,15 +82,8 @@ each branch is kept in a separate repository, and named branches,
where each revision keeps metadata to note on which branch it belongs.
The former makes it easier to distinguish branches, at the expense of
requiring more disk space on the client. The latter makes it a little
easier to switch between branches, but often has somewhat unintuitive
results for people (though this has been getting better in recent
versions of Mercurial).
The current proposal is to use named branches for release branches and
adopt cloned branches for feature branches, with one exception to this
rule: the 3.x branches will be kept in separate clones from the 2.x
branches. I think this provides an optimal hybrid approach for
Python's uses of branching.
easier to switch between branches, but all branch names are a
persistent part of history. [1]_
Differences between named branches and cloned branches:
@ -97,39 +92,70 @@ Differences between named branches and cloned branches:
* Clones with named branches will be larger, since they contain more
data
(The Mercurial book discourages the use of named branches, but it is,
in this respect, somewhat outdated. Named branches have gotten much
easier to use since that comment was written, due to improvements in
hg.)
We propose to use named branches for release branches and adopt cloned
branches for feature branches.
Converting branches
-------------------
.. with one exception to this rule: the 3.x branches will be kept in
.. separate clones from the 2.x branches. I think this provides an
.. optimal hybrid approach for Python's uses of branching.
There are quite a lot of branches in SVN's branches directory. I
propose to clean this up a bit, by following this basic strategy:
* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless
someone indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless
someone indicates the branch can be deprecated
History management
------------------
There's a `branch map`_ available that shows info about each branch:
In order to minimize the loss of information due to the conversion, we
propose to provide several repositories as a conversion result:
* keep-clone means we'll keep that branch in a separate clone
* keep-named means we'll keep that branch as a named branch in one of
the clones
* strip means we won't keep that branch
* streamed-merge means that it got merged by committing several new
revisions to the other branch
* merged-r* means the branch got merged in the named revision
* merges? means I haven't checked/found out yet whether that branch
was ever merged
* ? means that your input would be even more helpful than for the
other items
* some items have no action yet, feel free to treat that as just '?'
* A repository with the full, unedited conversion of the Subversion
repository (actually, its /python subdirectory) -- this is called
the "historic" or "archive" repo and will be offered as a read-only
resource. [2]_
* A repository trimmed to the mainline trunk (and py3k), as well as
past and present maintenance branches -- this is called the
"working" repo and is where development continues.
The ``default`` branch in that repo is what is known as ``py3k`` in
Subversion, while the Subversion trunk lives on with the branch name
``trunk``; however in Mercurial this branch will be closed. Release
branches are named after their major.minor version, e.g. ``3.2``.
* One more repository per active feature branch; "active" means that
at least one core developer asks for the branch to be provided.
All other branches are still present in the historic repo, and can
be extracted as separate repositories at any time should it prove to
be necessary.
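The branch naming rules for the working repo can be sketched as a small mapping function. This is an illustration only: the Subversion maintenance branch names of the form ``releaseXY-maint`` and the helper name ``hg_branch_name`` are assumptions, not part of the conversion tooling.

```python
import re

# Assumed Subversion naming for maintenance branches, e.g. "release32-maint".
_MAINT_RE = re.compile(r"^release(\d)(\d)-maint$")


def hg_branch_name(svn_branch):
    """Map a Subversion branch name to its Mercurial branch name.

    Returns None for branches that are not part of the working repo.
    """
    if svn_branch == "py3k":
        return "default"   # py3k becomes the default branch
    if svn_branch == "trunk":
        return "trunk"     # kept under its own name, but closed in Mercurial
    match = _MAINT_RE.match(svn_branch)
    if match:
        major, minor = match.groups()
        return "%s.%s" % (major, minor)   # release branches use major.minor
    return None
```

Branches for which the function returns ``None`` would remain available only in the historic repo, or in a separate feature-branch repository if a developer asks for one.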
.. Converting branches
.. -------------------
.. There are quite a lot of branches in SVN's branches directory. We
.. propose to clean this up a bit, by following this basic strategy:
.. * Keep all release (maintenance) branches
.. * Discard branches that haven't been touched in 18 months, unless
.. someone indicates there's still interest in such a branch
.. * Keep branches that have been touched in the last 18 months, unless
.. someone indicates the branch can be deprecated
.. There's a `branch map`_ available that shows info about each branch:
.. * keep-clone means we'll keep that branch in a separate clone
.. * keep-named means we'll keep that branch as a named branch in one of
.. the clones
.. * strip means we won't keep that branch
.. * streamed-merge means that it got merged by committing several new
.. revisions to the other branch
.. * merged-r* means the branch got merged in the named revision
.. * merges? means we haven't checked/found out yet whether that branch
.. was ever merged
.. * ? means that your input would be even more helpful than for the
.. other items
.. * some items have no action yet, feel free to treat that as just '?'
.. .. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
Converting tags
---------------
@ -137,18 +163,16 @@ Converting tags
The SVN tags directory contains a lot of old stuff. Some of these are
not, in fact, full tags, but contain only a smaller subset of the
repository. All release tags will be kept; other tags will be
included based on requests from the developer community. I'd like to
consider unifying the release tag naming scheme to make some things
more consistent, if people feel that won't create too many problems.
The current proposal is to bring old release tags in line with the
current practice of release tag naming.
included based on requests from the developer community. We propose
to make the tag naming scheme consistent, in this style: ``v3.2.1a2``.
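As a rough sketch of what renaming to the ``v3.2.1a2`` style could look like, the function below rewrites old tag names. It assumes that old release tags have the form ``r321a2`` (single-digit major, minor and optional micro version, followed by an optional pre-release suffix); both that pattern and the name ``unified_tag`` are assumptions made for illustration.

```python
import re

# Assumption: old release tags look like "r321a2" (single-digit major,
# minor and optional micro, then an optional pre-release suffix).
_OLD_TAG_RE = re.compile(r"^r(\d)(\d)(\d)?((?:a|b|c|rc)\d+)?$")


def unified_tag(old_tag):
    """Convert an old-style release tag to the proposed vX.Y[.Z][suffix] style."""
    match = _OLD_TAG_RE.match(old_tag)
    if match is None:
        raise ValueError("not a recognized release tag: %r" % old_tag)
    major, minor, micro, suffix = match.groups()
    version = "%s.%s" % (major, minor)
    if micro is not None:
        version += "." + micro
    return "v" + version + (suffix or "")
```

Tags that do not match the assumed pattern raise ``ValueError`` so that they can be reviewed by hand rather than renamed silently.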
Author map
----------
In order to provide user names the way they are common in hg (in the
'First Last <user@example.org>' format), we need an author map to map
cvs and svn user names to real names and their email addresses. I
cvs and svn user names to real names and their email addresses. We
have a complete version of such a map in the `migration tools
repository`_. The email addresses in it might be out of date; that's
bound to happen, although it would be nice to try and have as many
@ -157,6 +181,7 @@ current version also still seems to contain some encoding problems.
.. _migration tools repository: http://hg.python.org/pymigr/
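A minimal sketch of how such an author map could be read and applied, assuming a plain-text file with one ``username = First Last <email>`` entry per line and ``#`` comments. The actual file in the migration tools repository may be laid out differently, and the sample entries below are hypothetical.

```python
def parse_author_map(text):
    """Parse ``username = First Last <email>`` lines into a dict."""
    authors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        username, sep, identity = line.partition("=")
        if not sep:
            raise ValueError("malformed author map line: %r" % line)
        authors[username.strip()] = identity.strip()
    return authors


def map_author(authors, svn_user):
    """Return the Mercurial-style author, falling back to the bare user name."""
    return authors.get(svn_user, svn_user)


# Hypothetical entries, for illustration only.
SAMPLE = """
# svn user = hg author
jdoe = Jane Doe <jdoe@example.org>
rroe = Richard Roe <rroe@example.org>
"""
```

Falling back to the bare user name keeps the conversion going when an entry is missing; a stricter variant could raise an error instead, so that gaps in the map are noticed before the final conversion.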
Generating .hgignore
--------------------
@ -172,23 +197,31 @@ history of the file was debated but deemed impractical (because it's
relatively hard with fairly little gain, since ignoring is less
important for older revisions).
Revlog reordering
-----------------
As an optional optimization technique, I have performed a reordering
pass on the revlogs (internal Mercurial files) resulting from the
conversion. In some cases this results in dramatic decreases in
on-disk repository size. This especially makes sense for the manifest
(where it really helps out quite a lot) and oft-edited files like
Misc/NEWS (with an admittedly smaller effect).
Repository size
---------------
A bare conversion result of the current Python repository weighs 1.9
GB; although this is smaller than the Subversion repository (2.7 GB)
it is still too large to be practical.
The size becomes more manageable by the trimming applied to the
working repository, and by a process called "revlog reordering" that
optimizes the layout of internal Mercurial storage very efficiently.
After all optimizations are applied, the size of the working repository is
around 180 MB on disk. The amount of data transferred over the
network when cloning is estimated to be around 80 MB.
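To put these figures in proportion, the short calculation below compares them. It uses only the sizes quoted above and assumes 1 GB = 1000 MB for the conversion.

```python
# Sizes as quoted above, in megabytes (assuming 1 GB = 1000 MB).
SVN_REPO_MB = 2700
BARE_CONVERSION_MB = 1900
WORKING_REPO_MB = 180
CLONE_TRANSFER_MB = 80


def percent_of(part, whole):
    """Return ``part`` as a percentage of ``whole``, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


print(percent_of(BARE_CONVERSION_MB, SVN_REPO_MB))      # bare conversion vs. Subversion
print(percent_of(WORKING_REPO_MB, BARE_CONVERSION_MB))  # working repo vs. bare conversion
print(percent_of(CLONE_TRANSFER_MB, WORKING_REPO_MB))   # network transfer vs. on-disk size
```

By these numbers the working repository is under a tenth of the size of the bare conversion.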
Other repositories
------------------
Richard Tew has indicated that he'd like the Stackless repository to
also be converted. What other projects in the svn.python.org
repository should be converted? Do we want to convert the peps
repository? distutils? others?
There are a number of other projects hosted in svn.python.org's
"projects" repository. The "peps" directory will be converted along
with the main Python one. Richard Tew has indicated that he'd like the
Stackless repository to also be converted. What other projects in the
svn.python.org repository should be converted?
There's now an initial stab at converting the Jython repository. The
current tip of hgsubversion unfortunately fails at some point.
@ -207,10 +240,11 @@ hg-ssh
Developers should access the repositories through ssh, similar to the
current setup. Public keys can be used to grant people access to a
shared hg@ account. A hgwebdir instance will also be set up for easy
browsing and read-only access. If we're using ssh, developers should
trivially be able to start new clones (for longer-term features that
profit from development in a separate repository).
shared hg@ account. A hgwebdir instance has also been set up at
``hg.python.org`` for easy browsing and read-only access. It is
configured so that developers can trivially start new clones (for
longer-term features that profit from development in a separate
repository).
Hooks
-----
@ -244,6 +278,7 @@ Mercurial. One additional hook could be beneficial:
.. _hooks repository: http://hg.python.org/hooks/
End-of-line conversions
-----------------------
@ -259,6 +294,7 @@ A hook on the server side that turns down any changegroup or changeset
introducing inconsistent newline data can still be implemented, if
deemed necessary.
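The core check that such a hook would need can be sketched independently of Mercurial. The functions below only inspect file contents and are not wired into Mercurial's hook machinery; their names are hypothetical.

```python
def newline_styles(data):
    """Return the set of newline styles found in ``data`` (a bytes object)."""
    styles = set()
    if b"\r\n" in data:
        styles.add("CRLF")
    # Remove CRLF pairs first so their halves are not counted again.
    rest = data.replace(b"\r\n", b"")
    if b"\n" in rest:
        styles.add("LF")
    if b"\r" in rest:
        styles.add("CR")
    return styles


def has_inconsistent_newlines(data):
    """True if ``data`` mixes more than one newline style."""
    return len(newline_styles(data)) > 1
```

A server-side hook could run a check like this over each text file touched by an incoming changeset and reject the push if any file mixes styles.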
hgwebdir
--------
@ -274,6 +310,7 @@ changesets by their hexadecimal ID.
.. _small WSGI application: http://hg.python.org/pymigr/file/tip/hglookup.py
roundup
-------
@ -291,47 +328,43 @@ Where to get code
After migration, the hgwebdir will live at hg.python.org. This is an
accepted standard for many organizations, and an easy parallel to
svn.python.org. The 3.x repo might live at
http://hg.python.org/main/, for example, with the 2.x repo at
http://hg.python.org/2.x/. For write access, developers will have to
use ssh, which could be ssh://hg@hg.python.org/main/. A demo
installation will be set up with a preliminary conversion so people
can experiment and review; it can live at
http://hg.python.org/example/.
svn.python.org. The working repo might live at
http://hg.python.org/cpython/, for example, with the archive repo at
http://hg.python.org/cpython-archive/. For write access, developers
will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.
code.python.org was also proposed as the hostname. Personally, I
think that using the VCS name in the hostname is good because it
prevents confusion: it should be clear that you can't use svn or bzr
for hg.python.org.
code.python.org was also proposed as the hostname. We think that
using the VCS name in the hostname is good because it prevents
confusion: it should be clear that you can't use svn or bzr for
hg.python.org.
hgwebdir can already provide tarballs for every changeset. I think
this obviates the need for daily snapshots; we can just point users to
hgwebdir can already provide tarballs for every changeset. This
obviates the need for daily snapshots; we can just point users to
tip.tar.gz instead, meaning they will get the latest. If desired, we
could even use buildbot results to point to the last good changeset.
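As an illustration, a snapshot link can be built from a repository URL and a changeset identifier. This sketch assumes hgweb's usual ``archive/<rev>.tar.gz`` URL layout and that archive downloads are enabled on the server; the repository location is the one proposed above and may change.

```python
def snapshot_url(repo_url, rev="tip", kind="tar.gz"):
    """Build the URL of an hgweb archive for one changeset."""
    return "%s/archive/%s.%s" % (repo_url.rstrip("/"), rev, kind)


# The repository location is the one proposed above and may change.
print(snapshot_url("http://hg.python.org/cpython/"))
```

Passing a changeset ID as ``rev`` instead of ``tip`` would give a link to a specific changeset, for example the last one that passed on the buildbots.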
Python-specific documentation
-----------------------------
hg comes with good built-in documentation (available through hg help)
and a `wiki`_ that's full of useful information and recipes. In
addition to that, the `parts of the developer FAQ`_ concerning version
control will gain a section on using hg for Python development. Some
of the text will be dependent on the outcome of debate about this PEP
(for example, the branching strategy).
and a `wiki`_ that's full of useful information and recipes.
.. _wiki: http://www.selenic.com/mercurial/wiki/
.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control
In addition to that, the recently overhauled `Python Developer's
Guide`_ already has a branch with instructions for Mercurial instead
of Subversion; an online `build of this branch`_ is also available.
.. _Python Developer's Guide: http://docs.python.org/devguide/
.. _build of this branch: http://potrou.net/hgdevguide/
Brett Cannon will overhaul the developer FAQ, including any updates
needed with respect to Mercurial.
Proposed workflow
-----------------
I propose two workflows for the migration of patches between several
We propose two workflows for the migration of patches between several
branches.
For migration within 2.x or 3.x branches, I propose a patch always
For migration within 2.x or 3.x branches, we propose that a patch always
be committed first to the oldest branch where it applies. Then, the
resulting changeset can be merged using hg merge to all newer branches
within that series (2.x or 3.x). If it does not apply as-is to the
@ -358,6 +391,7 @@ Choosing this approach allows 3.x not to carry all of the 2.x
history-since-it-was-branched, meaning the clone is not as big and the
merges not as complicated.
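The within-series rule from this section (commit to the oldest applicable branch, then merge forward) can be sketched as a function that lists the merge steps. It assumes that merges are chained from each branch into the next newer one; the branch list and the function name are hypothetical.

```python
def forward_merges(branches, oldest_applicable):
    """List (source, target) merge steps for a fix committed on ``oldest_applicable``.

    ``branches`` must be ordered from oldest to newest within one series.
    """
    start = branches.index(oldest_applicable)  # raises ValueError if unknown
    affected = branches[start:]
    # Each branch is merged into the next newer one, in order.
    return list(zip(affected, affected[1:]))


# Hypothetical 3.x series; "default" stands for the development branch.
SERIES_3X = ["3.1", "3.2", "default"]
print(forward_merges(SERIES_3X, "3.1"))
```

A fix that first applies to the newest branch needs no merges, so the function returns an empty list in that case.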
The future of Subversion
------------------------
@ -366,8 +400,9 @@ Since the svn server contains a bunch of repositories, not just the
CPython one, it will probably live on for a while, since some projects
may not want to migrate and others may take longer to do so.
To prevent people from staying behind, we may want to move migrated
projects from the repository to a new, read-only repository with a
new name.
projects from the repository to a new, read-only repository with a new
name.
Build identification
--------------------
@ -410,6 +445,19 @@ instead of Subversion's 'trunk', and reflects the proposed new tag
format.
Footnotes
=========
.. [1] The Mercurial book discourages the use of named branches, but
it is, in this respect, somewhat outdated. Named branches have
gotten much easier to use since that comment was written, due to
improvements in hg.
.. [2] Since the initial working repo is a subset of the archive repo,
it would also be feasible to pull changes from the working repo
into the archive repo periodically.
Copyright
=========