From bb40b9f9cbef7644fc0cd5369949d927d59714c8 Mon Sep 17 00:00:00 2001
From: Georg Brandl
Date: Fri, 25 Feb 2011 18:30:02 +0000
Subject: [PATCH] Overhaul PEP 385 with newest conversion strategy.

---
 pep-0385.txt | 230 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 139 insertions(+), 91 deletions(-)

diff --git a/pep-0385.txt b/pep-0385.txt
index 233a0e53d..8683904ab 100644
--- a/pep-0385.txt
+++ b/pep-0385.txt
@@ -1,8 +1,10 @@
 PEP: 385
-Title: Migrating from svn to Mercurial
+Title: Migrating from Subversion to Mercurial
 Version: $Revision$
 Last-Modified: $Date$
-Author: Dirkjan Ochtman
+Author: Dirkjan Ochtman ,
+        Antoine Pitrou ,
+        Georg Brandl
 Status: Active
 Type: Process
 Content-Type: text/x-rst
@@ -20,12 +22,12 @@ PEP is an attempt to describe the steps that must be taken
 for further discussion. It's somewhat similar to `PEP 347`_, which
 discussed the migration to SVN.
 
-To make the most of hg, I (Dirkjan) would like to make a high-fidelity
+To make the most of hg, we would like to make a high-fidelity
 conversion, such that (a) as much of the svn metadata as possible is
 retained, and (b) all metadata is converted to formats that are common
 in Mercurial. This way, tools written for Mercurial can be optimally
-used. In order to do this, I want to use the `hgsubversion`_ software
-to do an initial conversion. This hg extension is focused on
+used. In order to do this, we want to use the `hgsubversion`_
+software to do an initial conversion. This hg extension is focused on
 providing high-quality conversion from Subversion to Mercurial for use
 in two-way correspondence, meaning it doesn't throw away as much
 available metadata as other solutions.
@@ -44,7 +46,7 @@ Timeline
 
 The current schedule for conversion milestones:
 
-- 2010-11-20: availability of a test repo at hg.python.org
+- 2011-02-24: availability of a test repo at hg.python.org
 
   Test commits will be allowed (and encouraged) from all committers to
   the Subversion repository. The test repository and all test commits
@@ -52,7 +54,7 @@ The current schedule for conversion milestones:
   hooks will be installed for the test repository, in order to test
   buildbot, diff-email and whitespace checking integration.
 
-- 2010-12-12: final conversion (tentative)
+- 2011-03-09: final conversion (tentative)
 
   Commits to the Subversion branches now maintained in Mercurial will be
   blocked. Developers should refrain from pushing to the Mercurial
@@ -80,15 +82,8 @@ each branch is kept in a separate repository, and named branches, where
 each revision keeps metadata to note on which branch it belongs. The
 former makes it easier to distinguish branches, at the expense of
 requiring more disk space on the client. The latter makes it a little
-easier to switch between branches, but often has somewhat unintuitive
-results for people (though this has been getting better in recent
-versions of Mercurial).
-
-The current proposal is to use named branches for release branches and
-adopt cloned branches for feature branches, with one exception to this
-rule: the 3.x branches will be kept in separate clones from the 2.x
-branches. I think this provides an optimal hybrid approach for
-Python's uses of branching.
+easier to switch between branches, but all branch names are a
+persistent part of history. [1]_
 
 Differences between named branches and cloned branches:
 
@@ -97,39 +92,70 @@ Differences between named branches and cloned branches:
 * Clones with named branches will be larger, since they contain more
   data
 
-(The Mercurial book discourages the use of named branches, but it is,
-in this respect, somewhat outdated. Named branches have gotten much
-easier to use since that comment was written, due to improvements in
-hg.)
+We propose to use named branches for release branches and adopt cloned
+branches for feature branches.
 
-Converting branches
---------------------
+.. with one exception to this rule: the 3.x branches will be kept in
+.. separate clones from the 2.x branches. I think this provides an
+.. optimal hybrid approach for Python's uses of branching.
 
-There are quite a lot of branches in SVN's branches directory. I
-propose to clean this up a bit, by following this basic strategy:
 
-* Keep all release (maintenance) branches
-* Discard branches that haven't been touched in 18 months, unless
-  somone indicates there's still interest in such a branch
-* Keep branches that have been touched in the last 18 months, unless
-  someone indicates the branch can be deprecated
+History management
+------------------
 
-There's a `branch map`_ available that shows info about each branch:
+In order to minimize the loss of information due to the conversion, we
+propose to provide several repositories as a conversion result:
 
-* keep-clone means we'll keep that branch in a separate clone
-* keep-named means we'll keep that branch as a named branch in one of
-  the clones
-* strip means we won't keep that branch
-* streamed-merge means that it got merged by committing several new
-  revisions to the other branch
-* merged-r* means the branch got merged in the named revision
-* merges? means I haven't checked/found out yet whether that branch
-  was ever merged
-* ? means that your input would be even more helpful than for the
-  other items
-* some items have no action yet, feel free to treat that as just '?'
+* A repository with the full, unedited conversion of the Subversion
+  repository (actually, its /python subdirectory) -- this is called
+  the "historic" or "archive" repo and will be offered as a read-only
+  resource. [2]_
+
+* A repository trimmed to the mainline trunk (and py3k), as well as
+  past and present maintenance branches -- this is called the
+  "working" repo and is where development continues.
+
+  The ``default`` branch in that repo is what is known as ``py3k`` in
+  Subversion, while the Subversion trunk lives on with the branch name
+  ``trunk``; however in Mercurial this branch will be closed. Release
+  branches are named after their major.minor version, e.g. ``3.2``.
+
+* One more repository per active feature branch; "active" means that
+  at least one core developer asks for the branch to be provided.
+
+  All other branches are still present in the historic repo, and can
+  be extracted as separate repositories at any time should it prove to
+  be necessary.
+
+.. Converting branches
+.. -------------------
+
+.. There are quite a lot of branches in SVN's branches directory. We
+.. propose to clean this up a bit, by following this basic strategy:
+
+.. * Keep all release (maintenance) branches
+.. * Discard branches that haven't been touched in 18 months, unless
+..   someone indicates there's still interest in such a branch
+.. * Keep branches that have been touched in the last 18 months, unless
+..   someone indicates the branch can be deprecated
+
+.. There's a `branch map`_ available that shows info about each branch:
+
+.. * keep-clone means we'll keep that branch in a separate clone
+.. * keep-named means we'll keep that branch as a named branch in one of
+..   the clones
+.. * strip means we won't keep that branch
+.. * streamed-merge means that it got merged by committing several new
+..   revisions to the other branch
+.. * merged-r* means the branch got merged in the named revision
+.. * merges? means we haven't checked/found out yet whether that branch
+..   was ever merged
+.. * ? means that your input would be even more helpful than for the
+..   other items
+.. * some items have no action yet, feel free to treat that as just '?'
+
+.. .. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
 
-.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
 
 Converting tags
 ---------------
@@ -137,18 +163,16 @@ Converting tags
 The SVN tags directory contains a lot of old stuff. Some of these are
 not, in fact, full tags, but contain only a smaller subset of the
 repository. All release tags will be kept; other tags will be
-included based on requests from the developer community. I'd like to
-consider unifying the release tag naming scheme to make some things
-more consistent, if people feel that won't create too many problems.
-The current proposal is to bring old release tags in line with the
-current practice of release tag naming.
+included based on requests from the developer community. We propose
+to make the tag naming scheme consistent, in this style: ``v3.2.1a2``.
+
 
 Author map
 ----------
 
 In order to provide user names the way they are common in hg (in the
 'First Last ' format), we need an author map to map
-cvs and svn user names to real names and their email addresses. I
+cvs and svn user names to real names and their email addresses. We
 have a complete version of such a map in my `migration tools
 repository`_. The email addresses in it might be out of date; that's
 bound to happen, although it would be nice to try and have as many
@@ -157,6 +181,7 @@ current version also still seems to contain some encoding problems.
 
 .. _migration tools repository: http://hg.python.org/pymigr/
 
+
 Generating .hgignore
 --------------------
 
@@ -172,23 +197,31 @@ history of the file was debated but deemed impractical (because it's
 relatively hard with fairly little gain, since ignoring is less
 important for older revisions).
 
-Revlog reordering
------------------
-
-As an optional optimization technique, I have performed a reordering
-pass on the revlogs (internal Mercurial files) resulting from the
-conversion. In some cases this results in dramatic decreases in
-on-disk repository size. This especially makes sense for the manifest
-(where it really helps out quite a lot) and oft-edited files like
-Misc/NEWS (with an admittedly smaller effect).
+
+Repository size
+---------------
+
+A bare conversion result of the current Python repository weighs 1.9
+GB; although this is smaller than the Subversion repository (2.7 GB)
+it is still too large to be practical.
+
+The size becomes more manageable by the trimming applied to the
+working repository, and by a process called "revlog reordering" that
+optimizes the layout of internal Mercurial storage very efficiently.
+
+With all optimizations applied, the size of the working repository is
+around 180 MB on disk. The amount of data transferred over the
+network when cloning is estimated to be around 80 MB.
+
 
 Other repositories
 ------------------
 
-Richard Tew has indicated that he'd like the Stackless repository to
-also be converted. What other projects in the svn.python.org
-repository should be converted? Do we want to convert the peps
-repository? distutils? others?
+There are a number of other projects hosted in svn.python.org's
+"projects" repository. The "peps" directory will be converted along
+with the main Python one. Richard Tew has indicated that he'd like the
+Stackless repository to also be converted. What other projects in the
+svn.python.org repository should be converted?
 
 There's now an initial stab at converting the Jython repository. The
 current tip of hgsubversion unfortunately fails at some point.
@@ -207,10 +240,11 @@ hg-ssh
 
 Developers should access the repositories through ssh, similar to the
 current setup. Public keys can be used to grant people access to a
-shared hg@ account. A hgwebdir instance will also be set up for easy
-browsing and read-only access. If we're using ssh, developers should
-trivially be able to start new clones (for longer-term features that
-profit from development in a separate repository).
+shared hg@ account. A hgwebdir instance has also been set up at
+``hg.python.org`` for easy browsing and read-only access. It is
+configured so that developers can trivially start new clones (for
+longer-term features that profit from development in a separate
+repository).
 
 Hooks
 -----
@@ -244,6 +278,7 @@ Mercurial. One additional hook could be beneficial:
 
 .. _hooks repository: http://hg.python.org/hooks/
 
+
 End-of-line conversions
 -----------------------
 
@@ -259,6 +294,7 @@ A hook on the server side that turns down any changegroup or
 changeset introducing inconsistent newline data can still be
 implemented, if deemed necessary.
 
+
 hgwebdir
 --------
 
@@ -274,6 +310,7 @@ changesets by their hexadecimal ID.
 
 .. _small WSGI application: http://hg.python.org/pymigr/file/tip/hglookup.py
 
+
 roundup
 -------
 
@@ -291,47 +328,43 @@ Where to get code
 
 After migration, the hgwebdir will live at hg.python.org. This is an
 accepted standard for many organizations, and an easy parallel to
-svn.python.org. The 3.x repo might live at
-http://hg.python.org/main/, for example, with the 2.x repo at
-http://hg.python.org/2.x/. For write access, developers will have to
-use ssh, which could be ssh://hg@hg.python.org/main/. A demo
-installation will be set up with a preliminary conversion so people
-can experiment and review; it can live at
-http://hg.python.org/example/.
+svn.python.org. The working repo might live at
+http://hg.python.org/cpython/, for example, with the archive repo at
+http://hg.python.org/cpython-archive/. For write access, developers
+will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.
 
-code.python.org was also proposed as the hostname. Personally, I
-think that using the VCS name in the hostname is good because it
-prevents confusion: it should be clear that you can't use svn or bzr
-for hg.python.org.
+code.python.org was also proposed as the hostname. We think that
+using the VCS name in the hostname is good because it prevents
+confusion: it should be clear that you can't use svn or bzr for
+hg.python.org.
 
-hgwebdir can already provide tarballs for every changeset. I think
-this obviates the need for daily snapshots; we can just point users to
+hgwebdir can already provide tarballs for every changeset. This
+obviates the need for daily snapshots; we can just point users to
 tip.tar.gz instead, meaning they will get the latest. If desired, we
 could even use buildbot results to point to the last good changeset.
 
+
 Python-specific documentation
 -----------------------------
 
 hg comes with good built-in documentation (available through hg help)
-and a `wiki`_ that's full of useful information and recipes. In
-addition to that, the `parts of the developer FAQ`_ concerning version
-control will gain a section on using hg for Python development. Some
-of the text will be dependent on the outcome of debate about this PEP
-(for example, the branching strategy).
+and a `wiki`_ that's full of useful information and recipes.
 
-.. _wiki: http://www.selenic.com/mercurial/wiki/
-.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control
+In addition to that, the recently overhauled `Python Developer's
+Guide`_ already has a branch with instructions for Mercurial instead
+of Subversion; an online `build of this branch`_ is also available.
+
+.. _Python Developer's Guide: http://docs.python.org/devguide/
+.. _build of this branch: http://potrou.net/hgdevguide/
 
-The developer FAQ will be overhauled by Brett Cannon, which will
-include any updates needed with respect to Mercurial.
 
 Proposed workflow
 -----------------
 
-I propose two workflows for the migration of patches between several
+We propose two workflows for the migration of patches between several
 branches.
 
-For migration within 2.x or 3.x branches, I propose a patch always
+For migration within 2.x or 3.x branches, we propose a patch always
 gets committed to the oldest branch where it applies first. Then, the
 resulting changeset can be merged using hg merge to all newer branches
 within that series (2.x or 3.x). If it does not apply as-is to the
@@ -358,6 +391,7 @@ Choosing this approach allows 3.x not to carry all of the 2.x
 history-since-it-was-branched, meaning the clone is not as big and the
 merges not as complicated.
 
+
 The future of Subversion
 ------------------------
 
@@ -366,8 +400,9 @@ Since the svn server contains a bunch of repositories, not just the
 CPython one, it will probably live on for a bit as not every project
 may want to migrate or it takes longer for other projects to migrate.
 To prevent people from staying behind, we may want to move migrated
-projects from the repository to a new, read-only repository with a
-new name.
+projects from the repository to a new, read-only repository with a new
+name.
+
 
 Build identification
 --------------------
@@ -410,6 +445,19 @@ instead of Subversion's 'trunk', and reflects the proposed new tag
 format.
 
 
+Footnotes
+=========
+
+.. [1] The Mercurial book discourages the use of named branches, but
+       it is, in this respect, somewhat outdated. Named branches have
+       gotten much easier to use since that comment was written, due to
+       improvements in hg.
+
+.. [2] Since the initial working repo is a subset of the archive repo,
+       it would also be feasible to pull changes from the working repo
+       into the archive repo periodically.
+
+
 Copyright
 =========
 