Overhaul PEP 385 with newest conversion strategy.

parent 8ac0ce5b74
commit bb40b9f9cb

230 pep-0385.txt
@@ -1,8 +1,10 @@
 PEP: 385
-Title: Migrating from svn to Mercurial
+Title: Migrating from Subversion to Mercurial
 Version: $Revision$
 Last-Modified: $Date$
-Author: Dirkjan Ochtman <dirkjan@ochtman.nl>
+Author: Dirkjan Ochtman <dirkjan@ochtman.nl>,
+        Antoine Pitrou <solipsis@pitrou.net>,
+        Georg Brandl <georg@python.org>
 Status: Active
 Type: Process
 Content-Type: text/x-rst
@@ -20,12 +22,12 @@ PEP is an attempt to describe the steps that must be taken for further
 discussion. It's somewhat similar to `PEP 347`_, which discussed the
 migration to SVN.

-To make the most of hg, I (Dirkjan) would like to make a high-fidelity
+To make the most of hg, we would like to make a high-fidelity
 conversion, such that (a) as much of the svn metadata as possible is
 retained, and (b) all metadata is converted to formats that are common
 in Mercurial. This way, tools written for Mercurial can be optimally
-used. In order to do this, I want to use the `hgsubversion`_ software
-to do an initial conversion. This hg extension is focused on
+used. In order to do this, we want to use the `hgsubversion`_
+software to do an initial conversion. This hg extension is focused on
 providing high-quality conversion from Subversion to Mercurial for use
 in two-way correspondence, meaning it doesn't throw away as much
 available metadata as other solutions.
@@ -44,7 +46,7 @@ Timeline

 The current schedule for conversion milestones:

-- 2010-11-20: availability of a test repo at hg.python.org
+- 2011-02-24: availability of a test repo at hg.python.org

 Test commits will be allowed (and encouraged) from all committers to
 the Subversion repository. The test repository and all test commits
@@ -52,7 +54,7 @@ The current schedule for conversion milestones:
 hooks will be installed for the test repository, in order to test
 buildbot, diff-email and whitespace checking integration.

-- 2010-12-12: final conversion (tentative)
+- 2010-03-09: final conversion (tentative)

 Commits to the Subversion branches now maintained in Mercurial will
 be blocked. Developers should refrain from pushing to the Mercurial
@@ -80,15 +82,8 @@ each branch is kept in a separate repository, and named branches,
 where each revision keeps metadata to note on which branch it belongs.
 The former makes it easier to distinguish branches, at the expense of
 requiring more disk space on the client. The latter makes it a little
-easier to switch between branches, but often has somewhat unintuitive
-results for people (though this has been getting better in recent
-versions of Mercurial).
-
-The current proposal is to use named branches for release branches and
-adopt cloned branches for feature branches, with one exception to this
-rule: the 3.x branches will be kept in separate clones from the 2.x
-branches. I think this provides an optimal hybrid approach for
-Python's uses of branching.
+easier to switch between branches, but all branch names are a
+persistent part of history. [1]_

 Differences between named branches and cloned branches:

@@ -97,39 +92,70 @@ Differences between named branches and cloned branches:
 * Clones with named branches will be larger, since they contain more
   data

-(The Mercurial book discourages the use of named branches, but it is,
-in this respect, somewhat outdated. Named branches have gotten much
-easier to use since that comment was written, due to improvements in
-hg.)
+We propose to use named branches for release branches and adopt cloned
+branches for feature branches.

-Converting branches
--------------------
+.. with one exception to this rule: the 3.x branches will be kept in
+.. separate clones from the 2.x branches. I think this provides an
+.. optimal hybrid approach for Python's uses of branching.

-There are quite a lot of branches in SVN's branches directory. I
-propose to clean this up a bit, by following this basic strategy:

-* Keep all release (maintenance) branches
-* Discard branches that haven't been touched in 18 months, unless
-  somone indicates there's still interest in such a branch
-* Keep branches that have been touched in the last 18 months, unless
-  someone indicates the branch can be deprecated
+History management
+------------------

-There's a `branch map`_ available that shows info about each branch:
+In order to minimize the loss of information due to the conversion, we
+propose to provide several repositories as a conversion result:

-* keep-clone means we'll keep that branch in a separate clone
-* keep-named means we'll keep that branch as a named branch in one of
-  the clones
-* strip means we won't keep that branch
-* streamed-merge means that it got merged by committing several new
-  revisions to the other branch
-* merged-r* means the branch got merged in the named revision
-* merges? means I haven't checked/found out yet whether that branch
-  was ever merged
-* ? means that your input would be even more helpful than for the
-  other items
-* some items have no action yet, feel free to treat that as just '?'
+* A repository with the full, unedited conversion of the Subversion
+  repository (actually, its /python subdirectory) -- this is called
+  the "historic" or "archive" repo and will be offered as a read-only
+  resource. [2]_
+
+* A repository trimmed to the mainline trunk (and py3k), as well as
+  past and present maintenance branches -- this is called the
+  "working" repo and is where development continues.
+
+The ``default`` branch in that repo is what is known as ``py3k`` in
+Subversion, while the Subversion trunk lives on with the branch name
+``trunk``; however in Mercurial this branch will be closed. Release
+branches are named after their major.minor version, e.g. ``3.2``.
+
+* One more repository per active feature branch; "active" means that
+  at least one core developer asks for the branch to be provided.
+
+All other branches are still present in the historic repo, and can
+be extracted as separate repositories at any time should it prove to
+be necessary.
+
+.. Converting branches
+.. -------------------
+
+.. There are quite a lot of branches in SVN's branches directory. We
+.. propose to clean this up a bit, by following this basic strategy:
+
+.. * Keep all release (maintenance) branches
+.. * Discard branches that haven't been touched in 18 months, unless
+.. someone indicates there's still interest in such a branch
+.. * Keep branches that have been touched in the last 18 months, unless
+.. someone indicates the branch can be deprecated
+
+.. There's a `branch map`_ available that shows info about each branch:
+
+.. * keep-clone means we'll keep that branch in a separate clone
+.. * keep-named means we'll keep that branch as a named branch in one of
+.. the clones
+.. * strip means we won't keep that branch
+.. * streamed-merge means that it got merged by committing several new
+.. revisions to the other branch
+.. * merged-r* means the branch got merged in the named revision
+.. * merges? means we haven't checked/found out yet whether that branch
+.. was ever merged
+.. * ? means that your input would be even more helpful than for the
+.. other items
+.. * some items have no action yet, feel free to treat that as just '?'
+
+.. .. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt

-.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt

 Converting tags
 ---------------
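The branch naming in the new "History management" text can be written as a small mapping function. Subversion's ``py3k`` becomes ``default``, the Subversion trunk keeps the name ``trunk``, and maintenance branches are named after their major.minor version. This is a minimal sketch and not part of the conversion tooling. The ``releaseXY-maint`` pattern for Subversion maintenance branch names and both function names are assumptions made for illustration. Feature branches get no named branch here because the text keeps them in separate clones.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed naming pattern for Subversion maintenance branches, e.g. "release32-maint".
MAINT_BRANCH = re.compile(r"^release(\d)(\d+)-maint$")


def hg_branch_name(svn_branch: str) -> Optional[str]:
    """Return the named branch for an svn branch, or None if it gets no named branch."""
    if svn_branch == "py3k":
        return "default"  # py3k becomes the default branch
    if svn_branch == "trunk":
        return "trunk"  # kept under its own name, closed later in Mercurial
    match = MAINT_BRANCH.match(svn_branch)
    if match:
        major, minor = match.groups()
        return f"{major}.{minor}"  # e.g. release32-maint -> 3.2
    return None  # feature branches: separate clones, not named branches


def named_branch_plan(svn_branches: Iterable[str]) -> Dict[str, str]:
    """Map each svn branch that becomes a named branch to its new name."""
    plan = {}
    for name in svn_branches:
        new_name = hg_branch_name(name)
        if new_name is not None:
            plan[name] = new_name
    return plan


print(named_branch_plan(["trunk", "py3k", "release27-maint", "release32-maint", "some-feature"]))
```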
@@ -137,18 +163,16 @@ Converting tags
 The SVN tags directory contains a lot of old stuff. Some of these are
 not, in fact, full tags, but contain only a smaller subset of the
 repository. All release tags will be kept; other tags will be
-included based on requests from the developer community. I'd like to
-consider unifying the release tag naming scheme to make some things
-more consistent, if people feel that won't create too many problems.
-The current proposal is to bring old release tags in line with the
-current practice of release tag naming.
+included based on requests from the developer community. We propose
+to make the tag naming scheme consistent, in this style: ``v3.2.1a2``.

+
 Author map
 ----------

 In order to provide user names the way they are common in hg (in the
 'First Last <user@example.org>' format), we need an author map to map
-cvs and svn user names to real names and their email addresses. I
+cvs and svn user names to real names and their email addresses. We
 have a complete version of such a map in my `migration tools
 repository`_. The email addresses in it might be out of date; that's
 bound to happen, although it would be nice to try and have as many
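The new tag text gives only one example of the consistent scheme, ``v3.2.1a2``. If that example generalizes as ``v<major>.<minor>.<micro>`` plus an optional pre-release suffix, a tag name can be derived from ``sys.version_info``-style fields as below. The ``rc`` suffix for release candidates and always spelling out the micro version are assumptions; the text does not specify either. ``Release`` and ``tag_name`` are hypothetical helpers.

```python
from typing import NamedTuple

# sys.version_info-style release levels mapped to tag suffixes ("rc" is an assumption).
LEVEL_SUFFIX = {"alpha": "a", "beta": "b", "candidate": "rc", "final": ""}


class Release(NamedTuple):
    major: int
    minor: int
    micro: int
    releaselevel: str = "final"
    serial: int = 0


def tag_name(release: Release) -> str:
    """Build a tag in the proposed ``v3.2.1a2`` style."""
    base = f"v{release.major}.{release.minor}.{release.micro}"
    suffix = LEVEL_SUFFIX[release.releaselevel]
    if suffix:
        return f"{base}{suffix}{release.serial}"
    return base  # final releases carry no suffix


print(tag_name(Release(3, 2, 1, "alpha", 2)))  # v3.2.1a2
print(tag_name(Release(2, 7, 1)))              # v2.7.1
```

Because ``Release`` has the same five fields as ``sys.version_info``, ``tag_name(Release(*sys.version_info))`` would produce the tag for the running interpreter.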
@@ -157,6 +181,7 @@ current version also still seems to contain some encoding problems.

 .. _migration tools repository: http://hg.python.org/pymigr/

+
 Generating .hgignore
 --------------------

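The author map in the "Author map" section converts cvs and svn user names into the 'First Last <user@example.org>' form. The actual file in the migration tools repository is not shown in this commit. This sketch therefore assumes a plain-text format with one ``svnuser = First Last <user@example.org>`` entry per line, stored as UTF-8. The functions and the sample entry are hypothetical. Strict decoding and per-line validation are one way to surface the encoding problems and malformed entries the text mentions before they reach converted history.

```python
import re
from typing import Dict

# 'First Last <user@example.org>': a display name, a space, then an address in angle brackets.
HG_USER = re.compile(r"^[^<>]+ <[^<>@\s]+@[^<>\s]+>$")


def load_author_map(data: bytes) -> Dict[str, str]:
    """Parse 'svnuser = First Last <email>' lines (file format assumed, see lead-in)."""
    # Strict decoding raises UnicodeDecodeError instead of silently mangling names.
    text = data.decode("utf-8")
    authors = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        svn_name, sep, hg_user = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'svnuser = First Last <email>'")
        svn_name, hg_user = svn_name.strip(), hg_user.strip()
        if not HG_USER.match(hg_user):
            raise ValueError(f"line {lineno}: {hg_user!r} is not 'First Last <email>'")
        authors[svn_name] = hg_user
    return authors


def map_author(authors: Dict[str, str], svn_name: str) -> str:
    """Return the Mercurial-style user name, or the raw svn name if it is unmapped."""
    return authors.get(svn_name, svn_name)


sample = "# hypothetical entry\njdoe = Jane Doe <jane@example.org>\n".encode("utf-8")
authors = load_author_map(sample)
print(map_author(authors, "jdoe"))    # Jane Doe <jane@example.org>
print(map_author(authors, "nobody"))  # nobody
```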
@@ -172,23 +197,31 @@ history of the file was debated but deemed impractical (because it's
 relatively hard with fairly little gain, since ignoring is less
 important for older revisions).

-Revlog reordering
------------------

-As an optional optimization technique, I have performed a reordering
-pass on the revlogs (internal Mercurial files) resulting from the
-conversion. In some cases this results in dramatic decreases in
-on-disk repository size. This especially makes sense for the manifest
-(where it really helps out quite a lot) and oft-edited files like
-Misc/NEWS (with an admittedly smaller effect).
+Repository size
+---------------
+
+A bare conversion result of the current Python repository weighs 1.9
+GB; although this is smaller than the Subversion repository (2.7 GB)
+it is not feasible.
+
+The size becomes more manageable by the trimming applied to the
+working repository, and by a process called "revlog reordering" that
+optimizes the layout of internal Mercurial storage very efficiently.
+
+After all optimizations done, the size of the working repository is
+around 180 MB on disk. The amount of data transferred over the
+network when cloning is estimated to be around 80 MB.

+
 Other repositories
 ------------------

-Richard Tew has indicated that he'd like the Stackless repository to
-also be converted. What other projects in the svn.python.org
-repository should be converted? Do we want to convert the peps
-repository? distutils? others?
+There are a number of other projects hosted in svn.python.org's
+"projects" repository. The "peps" directory will be converted along
+with the main Python one. Richard Tew has indicated that he'd like the
+Stackless repository to also be converted. What other projects in the
+svn.python.org repository should be converted?

 There's now an initial stab at converting the Jython repository. The
 current tip of hgsubversion unfortunately fails at some point.
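Both the removed "Revlog reordering" text and the new "Repository size" text credit reordering with large size savings. The toy model below shows why storage order can matter. It is not Mercurial's revlog format. As a simplification, it assumes every revision is stored as a delta against the revision stored immediately before it, and it approximates delta size by the number of lines not shared with that predecessor. When revisions from two unrelated branches alternate in storage order, each delta is nearly a full copy. Grouping each branch's revisions together makes most deltas tiny. All names are hypothetical.

```python
import difflib


def delta_cost(previous, current):
    """Toy delta size: lines of `current` not shared with `previous`."""
    matcher = difflib.SequenceMatcher(a=previous, b=current, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    return len(current) - shared


def storage_cost(revisions):
    """Total toy size when each revision is a delta against the one stored before it."""
    total = len(revisions[0])  # the first revision is stored in full
    for previous, current in zip(revisions, revisions[1:]):
        total += delta_cost(previous, current)
    return total


def make_branch(prefix, count, length=50):
    """Successive revisions of one file on one branch, each adding a line."""
    base = [f"{prefix} line {i}" for i in range(length)]
    return [base + [f"{prefix} change {n}" for n in range(k)] for k in range(1, count + 1)]


branch_a = make_branch("a", 5)
branch_b = make_branch("b", 5)

# Storage order as commits arrived: a1, b1, a2, b2, ...
interleaved = [rev for pair in zip(branch_a, branch_b) for rev in pair]
# Reordered so each branch's revisions are adjacent: a1..a5, then b1..b5
grouped = branch_a + branch_b

print("interleaved:", storage_cost(interleaved))  # interleaved: 530
print("grouped:", storage_cost(grouped))          # grouped: 110
```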
@@ -207,10 +240,11 @@ hg-ssh

 Developers should access the repositories through ssh, similar to the
 current setup. Public keys can be used to grant people access to a
-shared hg@ account. A hgwebdir instance will also be set up for easy
-browsing and read-only access. If we're using ssh, developers should
-trivially be able to start new clones (for longer-term features that
-profit from development in a separate repository).
+shared hg@ account. A hgwebdir instance also has been set up at
+``hg.python.org`` for easy browsing and read-only access. It is
+configured so that developers can trivially start new clones (for
+longer-term features that profit from development in a separate
+repository).

 Hooks
 -----
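The hg-ssh text grants developers access to a shared hg@ account through public keys. The usage notes shipped with Mercurial's hg-ssh script describe one ``authorized_keys`` entry per key with a forced ``command="hg-ssh <repos>"``, usually combined with options that disable port, X11 and agent forwarding. With that entry, the key can run Mercurial against the listed repositories and nothing else. The helper below only builds such a line. The key material and repository list are placeholders, and this is not the configuration actually used on hg.python.org.

```python
from typing import Sequence

# Restrictions commonly paired with hg-ssh so the key cannot be used for anything else.
RESTRICTIONS = "no-port-forwarding,no-X11-forwarding,no-agent-forwarding"


def authorized_keys_line(public_key: str, repos: Sequence[str]) -> str:
    """Build one authorized_keys entry limiting `public_key` to hg-ssh on `repos`."""
    if not repos:
        raise ValueError("at least one repository path is required")
    for repo in repos:
        # Keep paths simple: whitespace or quotes would break the command="..." option.
        if not repo or any(ch.isspace() or ch in "\"'" for ch in repo):
            raise ValueError(f"unsupported repository path: {repo!r}")
    command = "hg-ssh " + " ".join(repos)
    return f'command="{command}",{RESTRICTIONS} {public_key.strip()}'


# Placeholder key; a real entry would contain the developer's full public key.
print(authorized_keys_line("ssh-rsa AAAAB3NzaC1yc2E... jane@example.org", ["cpython"]))
```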
@@ -244,6 +278,7 @@ Mercurial. One additional hook could be beneficial:

 .. _hooks repository: http://hg.python.org/hooks/

+
 End-of-line conversions
 -----------------------

@@ -259,6 +294,7 @@ A hook on the server side that turns down any changegroup or changeset
 introducing inconsistent newline data can still be implemented, if
 deemed necessary.

+
 hgwebdir
 --------

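The End-of-line text leaves room for a server-side hook that turns down changesets introducing inconsistent newline data. The sketch below shows the check such a hook might run on each added or modified file. It assumes "inconsistent" means one text file mixing more than one newline convention, and it uses a crude NUL-byte test to skip binary files. Both are assumptions, and wiring the check into an actual Mercurial hook is not shown. The function names are hypothetical.

```python
from typing import Set


def newline_styles(data: bytes) -> Set[str]:
    """Return the newline conventions present in `data`: any of "CRLF", "LF", "CR"."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf  # LFs not preceded by CR
    lone_cr = data.count(b"\r") - crlf  # CRs not followed by LF
    styles = set()
    if crlf:
        styles.add("CRLF")
    if lone_lf:
        styles.add("LF")
    if lone_cr:
        styles.add("CR")
    return styles


def looks_binary(data: bytes) -> bool:
    """Crude heuristic (an assumption): NUL bytes mean binary, so skip the check."""
    return b"\0" in data


def has_inconsistent_newlines(data: bytes) -> bool:
    """True if a text file mixes more than one newline convention."""
    return not looks_binary(data) and len(newline_styles(data)) > 1


print(has_inconsistent_newlines(b"one\ntwo\n"))    # False
print(has_inconsistent_newlines(b"one\r\ntwo\n"))  # True
```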
@@ -274,6 +310,7 @@ changesets by their hexadecimal ID.

 .. _small WSGI application: http://hg.python.org/pymigr/file/tip/hglookup.py

+
 roundup
 -------

@@ -291,47 +328,43 @@ Where to get code

 After migration, the hgwebdir will live at hg.python.org. This is an
 accepted standard for many organizations, and an easy parallel to
-svn.python.org. The 3.x repo might live at
-http://hg.python.org/main/, for example, with the 2.x repo at
-http://hg.python.org/2.x/. For write access, developers will have to
-use ssh, which could be ssh://hg@hg.python.org/main/. A demo
-installation will be set up with a preliminary conversion so people
-can experiment and review; it can live at
-http://hg.python.org/example/.
+svn.python.org. The working repo might live at
+http://hg.python.org/cpython/, for example, with the archive repo at
+http://hg.python.org/cpython-archive/. For write access, developers
+will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.

-code.python.org was also proposed as the hostname. Personally, I
-think that using the VCS name in the hostname is good because it
-prevents confusion: it should be clear that you can't use svn or bzr
-for hg.python.org.
+code.python.org was also proposed as the hostname. We think that
+using the VCS name in the hostname is good because it prevents
+confusion: it should be clear that you can't use svn or bzr for
+hg.python.org.

-hgwebdir can already provide tarballs for every changeset. I think
-this obviates the need for daily snapshots; we can just point users to
+hgwebdir can already provide tarballs for every changeset. This
+obviates the need for daily snapshots; we can just point users to
 tip.tar.gz instead, meaning they will get the latest. If desired, we
 could even use buildbot results to point to the last good changeset.

+
 Python-specific documentation
 -----------------------------

 hg comes with good built-in documentation (available through hg help)
-and a `wiki`_ that's full of useful information and recipes. In
-addition to that, the `parts of the developer FAQ`_ concerning version
-control will gain a section on using hg for Python development. Some
-of the text will be dependent on the outcome of debate about this PEP
-(for example, the branching strategy).
+and a `wiki`_ that's full of useful information and recipes.

 .. _wiki: http://www.selenic.com/mercurial/wiki/
-.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control
+In addition to that, the recently overhauled `Python Developer's
+Guide`_ already has a branch with instructions for Mercurial instead
+of Subversion; an online `build of this branch`_ is also available.
+
+.. _Python Developer's Guide: http://docs.python.org/devguide/
+.. _build of this branch: http://potrou.net/hgdevguide/
-
-The developer FAQ will be overhauled by Brett Cannon, which will
-include any updates needed with respect to Mercurial.

 Proposed workflow
 -----------------

-I propose two workflows for the migration of patches between several
+We propose two workflows for the migration of patches between several
 branches.

-For migration within 2.x or 3.x branches, I propose a patch always
+For migration within 2.x or 3.x branches, we propose a patch always
 gets committed to the oldest branch where it applies first. Then, the
 resulting changeset can be merged using hg merge to all newer branches
 within that series (2.x or 3.x). If it does not apply as-is to the
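The new "Where to get code" text proposes http://hg.python.org/cpython/ for read-only access, ssh://hg@hg.python.org/cpython/ for pushing, and tip.tar.gz snapshots served by hgwebdir. The helper below derives those URLs from a repository name. The ``archive/<revision>.tar.gz`` path is an assumption about hgweb's usual archive layout, and archives must be enabled on the server. To point users at the last changeset that passed on the buildbots, one could pass that changeset's ID instead of ``tip``. ``repo_urls`` is a hypothetical name.

```python
from urllib.parse import urljoin

HTTP_ROOT = "http://hg.python.org/"   # read-only hgwebdir (proposed in the text)
SSH_ROOT = "ssh://hg@hg.python.org/"  # write access over ssh (proposed in the text)


def repo_urls(name: str, revision: str = "tip") -> dict:
    """Clone, push and snapshot URLs for a repository such as "cpython"."""
    base = urljoin(HTTP_ROOT, name + "/")
    return {
        "clone": base,
        "push": f"{SSH_ROOT}{name}/",
        # Assumed hgweb archive layout: <repo>/archive/<revision>.tar.gz
        "snapshot": urljoin(base, f"archive/{revision}.tar.gz"),
    }


for kind, url in repo_urls("cpython").items():
    print(f"{kind:9} {url}")
```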
@@ -358,6 +391,7 @@ Choosing this approach allows 3.x not to carry all of the 2.x
 history-since-it-was-branched, meaning the clone is not as big and the
 merges not as complicated.

+
 The future of Subversion
 ------------------------

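The "Proposed workflow" text commits a fix to the oldest branch of a series where it applies, then uses hg merge to carry it to every newer branch in that series. The sketch below only lists the resulting command sequence. The branch names follow the new naming scheme (major.minor release branches plus ``default``), but the concrete list is hypothetical, and conflict resolution is left to the developer.

```python
import shlex
from typing import List, Sequence


def forward_port_plan(series: Sequence[str], oldest: str, message: str) -> List[str]:
    """hg commands that commit on `oldest` and merge the change into each newer branch.

    `series` lists one series' branches from oldest to newest, e.g.
    ["3.1", "3.2", "default"] (hypothetical names).
    """
    if oldest not in series:
        raise ValueError(f"{oldest!r} is not in {list(series)}")
    # Edit the working copy after updating to `oldest`, then commit there first.
    commands = [f"hg update {oldest}", f"hg commit -m {shlex.quote(message)}"]
    previous = oldest
    for branch in series[series.index(oldest) + 1:]:
        commands.append(f"hg update {branch}")
        commands.append(f"hg merge {previous}")  # resolve conflicts here if needed
        commands.append(f"hg commit -m {shlex.quote('Merge from ' + previous)}")
        previous = branch
    return commands


for command in forward_port_plan(["3.1", "3.2", "default"], "3.2", "Fix a typo"):
    print(command)
```

Merging only from older to newer branches means each fix is committed once and then inherited, so ``hg log`` on a newer branch shows where a change originated.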
@@ -366,8 +400,9 @@ Since the svn server contains a bunch of repositories, not just the
 CPython one, it will probably live on for a bit as not every project
 may want to migrate or it takes longer for other projects to migrate.
 To prevent people from staying behind, we may want to move migrated
-projects from the repository to a new, read-only repository with a
-new name.
+projects from the repository to a new, read-only repository with a new
+name.

+
 Build identification
 --------------------
@@ -410,6 +445,19 @@ instead of Subversion's 'trunk', and reflects the proposed new tag
 format.


+Footnotes
+=========
+
+.. [1] The Mercurial book discourages the use of named branches, but
+   it is, in this respect, somewhat outdated. Named branches have
+   gotten much easier to use since that comment was written, due to
+   improvements in hg.
+
+.. [2] Since the initial working repo is a subset of the archive repo,
+   it would also be feasible to pull changes from the working repo
+   into the archive repo periodically.
+
+

 Copyright
 =========
