diff --git a/pep-0374.txt b/pep-0374.txt
index 5f766c1d1..40948771d 100644
--- a/pep-0374.txt
+++ b/pep-0374.txt
@@ -1,11 +1,9 @@
 PEP: 374
-Title: Migrating from svn to a distributed VCS
+Title: Migrating from svn to Mercurial
 Version: $Revision$
 Last-Modified: $Date$
 Author: Brett Cannon ,
-        Stephen J. Turnbull ,
-        Alexandre Vassalotti ,
-        Barry Warsaw
+        Dirkjan Ochtman
 Status: Active
 Type: Process
 Content-Type: text/x-rst
@@ -19,8 +17,8 @@ Post-History: 07-Nov-2008
 chosen DVCS.
 
 
-Rationale
-=========
+Motivation
+==========
 
 Python has been using a centralized version control system (VCS; first
 CVS, now Subversion) for years to great effect. Having a master
@@ -101,1371 +99,177 @@ to be used. If this happens, this PEP will be revisited and revised in
 the future as the state of DVCSs evolves.
 
 
-Terminology
-===========
-
-Agreeing on a common terminology is surprisingly difficult,
-primarily because each VCS uses these terms when describing subtly
-different tasks, objects, and concepts. Where possible, we try to
-provide a generic definition of the concepts, but you should consult
-the individual system's glossaries for details. Here are some basic
-references for terminology, from some of the standard web-based
-references on each VCS. You can also refer to glossaries for each
-DVCS:
-
-* Subversion : http://svnbook.red-bean.com/en/1.5/svn.basic.html
-* Bazaar : http://bazaar-vcs.org/BzrGlossary
-* Mercurial : http://www.selenic.com/mercurial/wiki/index.cgi/UnderstandingMercurial
-* git : http://book.git-scm.com/1_the_git_object_model.html
-
-
-branch
-    A line of development; a collection of revisions, ordered by
-    time.
-
-checkout/working copy/working tree
-    A tree of code the developer can edit, linked to a branch.
-
-index
-    A "staging area" where a revision is built (unique to git).
-
-repository
-    A collection of revisions, organized into branches.
-
-clone
-    A complete copy of a branch or repository.
-
-commit
-    To record a revision in a repository.
- -merge - Applying all the changes and history from one branch/repository - to another. - -pull - To update a checkout/clone from the original branch/repository, - which can be remote or local - -push/publish - To copy a revision, and all revisions it depends on, from a one - repository to another. - -cherry-pick - To merge one or more specific revisions from one branch to - another, possibly in a different repository, possibly without its - dependent revisions. - -rebase - To "detach" a branch, and move it to a new branch point; move - commits to the beginning of a branch instead of where they - happened in time. - - -Typical Workflow -================ - -At the moment, the typical workflow for a Python core developer is: - - -* Edit code in a checkout until it is stable enough to commit/push. -* Commit to the master repository. - -It is a rather simple workflow, but it has drawbacks. For one, -because any work that involves the repository takes time thanks to -the network, commits/pushes tend to not necessarily be as atomic as -possible. There is also the drawback of there not being a -necessarily cheap way to create new checkouts beyond a recursive -copy of the checkout directory. - -A DVCS would lead to a workflow more like this: - -* Branch off of a local clone of the master repository. -* Edit code, committing in atomic pieces. -* Merge the branch into the mainline, and -* Push all commits to the master repository. - -While there are more possible steps, the workflow is much more -independent of the master repository than is currently possible. By -being able to commit locally at the speed of your disk, a core -developer is able to do atomic commits much more frequently, -minimizing having commits that do multiple things to the code. Also -by using a branch, the changes are isolated (if desired) from other -changes being made by other developers. 
Because branches are cheap, -it is easy to create and maintain many smaller branches that address -one specific issue, e.g. one bug or one new feature. More -sophisticated features of DVCSs allow the developer to more easily -track long running development branches as the official mainline -progresses. - - -Contenders -========== - -========== ========== ======= =================================== ========================================== -Name Short Name Version 2.x Trunk Mirror 3.x Trunk Mirror ----------- ---------- ------- ----------------------------------- ------------------------------------------ -Bazaar_ bzr 1.12 http://code.python.org/python/trunk http://code.python.org/python/3.0 -Mercurial_ hg 1.2.0 http://code.python.org/hg/trunk/ http://code.python.org/hg/branches/py3k/ -git_ N/A 1.6.1 git://code.python.org/python/trunk git://code.python.org/python/branches/py3k -========== ========== ======= =================================== ========================================== - -.. _Bazaar: http://bazaar-vcs.org/ -.. _Mercurial: http://www.selenic.com/mercurial/ -.. _git: http://www.git-scm.com/ - -This PEP does not consider darcs, arch, or monotone. The main -problem with these DVCSs is that they are simply not popular enough -to bother supporting when they do not provide some very compelling -features that the other DVCSs provide. Arch and darcs also have -significant performance problems which seem unlikely to be addressed -in the near future. - - -Interoperability -================ - -For those who have already decided which DVCSs they want to use, and -are willing to maintain local mirrors themselves, all three DVCSs -support interchange via the git "fast-import" changeset format. git -does so natively, of course, and native support for Bazaar is under -active development, and getting good early reviews as of mid-February -2009. 
Mercurial has idiosyncratic support for importing via its *hg -convert* command, and `third-party fast-import support`_ is available -for exporting. Also, the Tailor_ tool supports automatic maintenance -of mirrors based on an official repository in any of the candidate -formats with a local mirror in any format. - -.. _third-party fast-import support: http://repo.or.cz/r/fast-export.git/.git/description -.. _Tailor: http://progetti.arstecnica.it/tailor/ - - -Usage Scenarios -=============== - -Probably the best way to help decide on whether/which DVCS should -replace Subversion is to see what it takes to perform some -real-world usage scenarios that developers (core and non-core) have -to work with. Each usage scenario outlines what it is, a bullet list -of what the basic steps are (which can vary slightly per VCS), and -how to perform the usage scenario in the various VCSs -(including Subversion). - -Each VCS had a single author in charge of writing implementations -for each scenario (unless otherwise noted). - -========= === -Name VCS ---------- --- -Brett svn -Barry bzr -Alexandre hg -Stephen git -========= === - - -Initial Setup -------------- - -Some DVCSs have some perks if you do some initial setup upfront. -This section covers what can be done before any of the usage -scenarios are run in order to take better advantage of the tools. - -All of the DVCSs support configuring your project identification. -Unlike the centralized systems, they use your email address to -identify your commits. (Access control is generally done by -mechanisms external to the DVCS, such as ssh or console login). -This identity may be associated with a full name. - -All of the DVCSs will query the system to get some approximation to -this information, but that may not be what you want. They also -support setting this information on a per-user basis, and on a per- -project basis. Convenience commands to set these attributes vary, -but all allow direct editing of configuration files. 
- -Some VCSs support end-of-line (EOL) conversions on checkout/checkin. - - -svn -''' - -None required, but it is recommended you follow the -`guidelines `_ -in the dev FAQ. - - -bzr -''' - -No setup is required, but for much quicker and space-efficient local -branching, you should create a shared repository to hold all your -Python branches. A shared repository is really just a parent -directory containing a .bzr directory. When bzr commits a revision, -it searches from the local directory on up the file system for a .bzr -directory to hold the revision. By sharing revisions across multiple -branches, you cut down on the amount of disk space used. Do this:: - - cd ~/projects - bzr init-repo python - cd python - -Now, all your Python branches should be created inside of -``~/projects/python``. - -There are also some settings you can put in your -``~/.bzr/bazaar.conf`` -and ``~/.bzr/locations.conf`` file to set up defaults for interacting -with Python code. None of them are required, although some are -recommended. E.g. I would suggest gpg signing all commits, but that -might be too high a barrier for developers. Also, you can set up -default push locations depending on where you want to push branches -by default. If you have write access to the master branches, that -push location could be code.python.org. Otherwise, it might be a -free Bazaar code hosting service such as Launchpad. If Bazaar is -chosen, we should decide what the policies and recommendations are. - -At a minimum, I would set up your email address:: - - bzr whoami "Firstname Lastname " - -As with hg and git below, there are ways to set your email address (or really, -just about any parameter) on a -per-repository basis. You do this with settings in your -``$HOME/.bazaar/locations.conf`` file, which has an ini-style format as does -the other DVCSs. See the Bazaar documentation for details, -which mostly aren't relevant for this discussion. - - -hg -'' - -Minimally, you should set your user name. 
To do so, create the file -``.hgrc`` in your home directory and add the following:: - - [ui] - username = Firstname Lastname - -If you are using Windows and your tools do not support Unix-style newlines, -you can enable automatic newline translation by adding to your configuration:: - - [extensions] - win32text = - -These options can also be set locally to a given repository by -customizing ``/.hg/hgrc``, instead of ``~/.hgrc``. - - -git -''' - -None needed. However, git supports a number of features that can -smooth your work, with a little preparation. git supports setting -defaults at the workspace, user, and system levels. The system -level is out of scope of this PEP. The user configuration file is -``$HOME/.gitconfig`` on Unix-like systems, and the workspace -configuration file is ``$REPOSITORY/.git/config``. - -You can use the ``git-config`` tool to set preferences for user.name and -user.email either globally (for your system login account) or -locally (to a given git working copy), or you can edit the -configuration files (which have the same format as shown in the -Mercurial section above).:: - - # my full name doesn't change - # note "--global" flag means per user - # (system-wide configuration is set with "--system") - git config --global user.name 'Firstname Lastname' - # but use my Pythonic email address - cd /path/to/python/repository - git config user.email email.address@python.example.com - -If you are using Windows, you probably want to set the core.autocrlf -and core.safecrlf preferences to true using ``git-config``.:: - - # check out files with CRLF line endings rather than Unix-style LF only - git config --global core.autocrlf true - # scream if a transformation would be ambiguous - # (eg, a working file contains both naked LF and CRLF) - # and check them back in with the reverse transformation - git config --global core.safecrlf true - -Although the repository will usually contain a .gitignore file -specifying file names that rarely if ever 
should be registered in the -VCS, you may have personal conventions (e.g., always editing log -messages in a temporary file named ".msg") that you may wish to -specify.:: - - # tell git where my personal ignores are - git config --global core.excludesfile ~/.gitignore - # I use .msg for my long commit logs, and Emacs makes backups in - # files ending with ~ - # these are globs, not regular expressions - echo '*~' >> ~/.gitignore - echo '.msg' >> ~/.gitignore - -If you use multiple branches, as with the other VCSes, you can save a -lot of space by putting all objects in a common object store. This -also can save download time, if the origins of the branches were in -different repositories, because objects are shared across branches in -your repository even if they were not present in the upstream -repositories. git is very space- and time-efficient and applies a -number of optimizations automatically, so this configuration is -optional. (Examples are omitted.) - - -One-Off Checkout ----------------- - -As a non-core developer, I want to create and publish a one-off patch -that fixes a bug, so that a core developer can review it for -inclusion in the mainline. - -* Checkout/branch/clone trunk. -* Edit some code. -* Generate a patch (based on what is best supported by the VCS, e.g. - branch history). -* Receive reviewer comments and address the issues. -* Generate a second patch for the core developer to commit. - - -svn -''' -:: - - svn checkout http://svn.python.org/projects/python/trunk - cd trunk - # Edit some code. - echo "The cake is a lie!" > README - # Since svn lacks support for local commits, we fake it with patches. - svn diff >> commit-1.diff - svn diff >> patch-1.diff - # Upload the patch-1 to bugs.python.org. - # Receive reviewer comments. - # Edit some code. - echo "The cake is real!" > README - # Since svn lacks support for local commits, we fake it with patches. 
- svn diff >> commit-2.diff - svn diff >> patch-2.diff - # Upload patch-2 to bugs.python.org - - -bzr -''' -:: - - bzr branch http://code.python.org/python/trunk - cd trunk - # Edit some code. - bzr commit -m 'Stuff I did' - bzr send -o bundle - # Upload bundle to bugs.python.org - # Receive reviewer comments - # Edit some code - bzr commit -m 'Respond to reviewer comments' - bzr send -o bundle - # Upload updated bundle to bugs.python.org - -The ``bundle`` file is like a super-patch. It can be read by ``patch(1)`` but -it contains additional metadata so that it can be fed to ``bzr merge`` to -produce a fully usable branch completely with history. See `Patch Review`_ -section below. - - -hg -'' -:: - - hg clone http://code.python.org/hg/trunk - cd trunk - # Edit some code. - hg commit -m "Stuff I did" - hg outgoing -p > fixes.patch - # Upload patch to bugs.python.org - # Receive reviewer comments - # Edit some code - hg commit -m "Address reviewer comments." - hg outgoing -p > additional-fixes.patch - # Upload patch to bugs.python.org - -While ``hg outgoing`` does not have the flag for it, most Mercurial -commands support git's extended patch format through a ``--git`` -command. This can be set in one's ``.hgrc`` file so that all commands -that generate a patch use the extended format. - - -git -''' - -The patches could be created with -``git diff master > stuff-i-did.patch``, too, but -``git format-patch | git am`` knows some tricks -(empty files, renames, etc) that ordinary patch can't handle. git -grabs "Stuff I did" out of the the commit message to create the file -name 0001-Stuff-I-did.patch. See Patch Review below for a -description of the git-format-patch format. -:: - - # Get the mainline code. - git clone git://code.python.org/python/trunk - cd trunk - # Edit some code. - git commit -a -m 'Stuff I did.' - # Create patch for my changes (i.e, relative to master). 
- git format-patch master - git tag stuff-v1 - # Upload 0001-Stuff-I-did.patch to bugs.python.org. - # Time passes ... receive reviewer comments. - # Edit more code. - git commit -a -m 'Address reviewer comments.' - # Make an add-on patch to apply on top of the original. - git format-patch stuff-v1 - # Upload 0001-Address-reviewer-comments.patch to bugs.python.org. - - -Backing Out Changes -------------------- - -As a core developer, I want to undo a change that was not ready for -inclusion in the mainline. - -* Back out the unwanted change. -* Push patch to server. - - -svn -''' -:: - - # Assume the change to revert is in revision 40 - svn merge -c -40 . - # Resolve conflicts, if any. - svn commit -m "Reverted revision 40" - - -bzr -''' -:: - - # Assume the change to revert is in revision 40 - bzr merge -r 40..39 - # Resolve conflicts, if any. - bzr commit -m "Reverted revision 40" - -Note that if the change you want revert is the last one that was -made, you can just use ``bzr uncommit``. - - -hg -'' -:: - - # Assume the change to revert is in revision 9150dd9c6d30 - hg backout --merge -r 9150dd9c6d30 - # Resolve conflicts, if any. - hg commit -m "Reverted changeset 9150dd9c6d30" - hg push - -Note, you can use "hg rollback" and "hg strip" to revert changes you committed -in your local repository, but did not yet push to other repositories. - -git -''' -:: - - # Assume the change to revert is the grandfather of a revision tagged "newhotness". - git revert newhotness~2 - # Resolve conflicts if any. If there are no conflicts, the commit - # will be done automatically by "git revert", which prompts for a log. - git commit -m "Reverted changeset 9150dd9c6d30." - git push - - -Patch Review ------------- - -As a core developer, I want to review patches submitted by other -people, so that I can make sure that only approved changes are added -to Python. - -Core developers have to review patches as submitted by other people. 
-This requires applying the patch, testing it, and then tossing away -the changes. The assumption can be made that a core developer already -has a checkout/branch/clone of the trunk. - -* Branch off of trunk. -* Apply patch w/o any comments as generated by the patch submitter. -* Push patch to server. -* Delete now-useless branch. - - -svn -''' - -Subversion does not exactly fit into this development style very well -as there are no such thing as a "branch" as has been defined in this -PEP. Instead a developer either needs to create another checkout for -testing a patch or create a branch on the server. Up to this point, -core developers have not taken the "branch on the server" approach to -dealing with individual patches. For this scenario the assumption -will be the developer creates a local checkout of the trunk to work -with.:: - - cp -r trunk issue0000 - cd issue0000 - patch -p0 < __patch__ - # Review patch. - svn commit -m "Some patch." - cd .. - rm -r issue0000 - -Another option is to only have a single checkout running at any one -time and use ``svn diff`` along with ``svn revert -R`` to store away -independent changes you may have made. - - -bzr -''' -:: - - bzr branch trunk issueNNNN - # Download `patch` bundle from Roundup - bzr merge patch - # Review patch - bzr commit -m'Patch NNN by So N. So' --fixes python:NNNN - bzr push bzr+ssh://me@code.python.org/trunk - rm -rf ../issueNNNN - -Alternatively, since you're probably going to commit these changes to -the trunk, you could just do a checkout. That would give you a local -working tree while the branch (i.e. all revisions) would continue to -live on the server. This is similar to the svn model and might allow -you to more quickly review the patch. There's no need for the push -in this case.:: - - bzr checkout trunk issueNNNN - # Download `patch` bundle from Roundup - bzr merge patch - # Review patch - bzr commit -m'Patch NNNN by So N. 
So' --fixes python:NNNN - rm -rf ../issueNNNN - - -hg -'' -:: - - hg clone trunk issue0000 - cd issue0000 - # If the patch was generated using hg export, the user name of the - # submitter is automatically recorded. Otherwise, - # use hg import --no-commit submitted.diff and commit with - # hg commit -u "Firstname Lastname " - hg import submitted.diff - # Review patch. - hg push ssh://alexandre@code.python.org/hg/trunk/ - - -git -''' -We assume a patch created by git-format-patch. This is a Unix mbox -file containing one or more patches, each formatted as an RFC 2822 -message. git-am interprets each message as a commit as follows. The -author of the patch is taken from the From: header, the date from the -Date header. The commit log is created by concatenating the content -of the subject line, a blank line, and the message body up to the -start of the patch.:: - - cd trunk - # Create a branch in case we don't like the patch. - # This checkout takes zero time, since the workspace is left in - # the same state as the master branch. - git checkout -b patch-review - # Download patch from bugs.python.org to submitted.patch. - git am < submitted.patch - # Review and approve patch. - # Merge into master and push. - git checkout master - git merge patch-review - git push - - -Backport --------- - -As a core developer, I want to apply a patch to 2.6, 2.7, 3.0, and 3.1 -so that I can fix a problem in all three versions. - -Thanks to always having the cutting-edge and the latest release -version under development, Python currently has four branches being -worked on simultaneously. That makes it important for a change to -propagate easily through various branches. - -svn -''' - -Because of Python's use of svnmerge, changes start with the trunk -(2.7) and then get merged to the release version of 2.6. To get the -change into the 3.x series, the change is merged into 3.1, fixed up, -and then merged into 3.0 (2.7 -> 2.6; 2.7 -> 3.1 -> 3.0). 
- -This is in contrast to a port-forward strategy where the patch would -have been added to 2.6 and then pulled forward into newer versions -(2.6 -> 2.7 -> 3.0 -> 3.1). - -:: - - # Assume patch applied to 2.7 in revision 0000. - cd release26-maint - svnmerge merge -r 0000 - # Resolve merge conflicts and make sure patch works. - svn commit -F svnmerge-commit-message.txt # revision 0001. - cd ../py3k - svnmerge merge -r 0000 - # Same as for 2.6, except Misc/NEWS changes are reverted. - svn revert Misc/NEWS - svn commit -F svnmerge-commit-message.txt # revision 0002. - cd ../release30-maint - svnmerge merge -r 0002 - svn commit -F svnmerge-commit-message.txt # revision 0003. - - -bzr -''' - -Bazaar is pretty straightforward here, since it supports cherry -picking revisions manually. In the example below, we could have -given a revision id instead of a revision number, but that's usually -not necessary. Martin Pool suggests "We'd generally recommend doing -the fix first in the oldest supported branch, and then merging it -forward to the later releases.":: - - # Assume patch applied to 2.7 in revision 0000 - cd release26-maint - bzr merge ../trunk -c 0000 - # Resolve conflicts and make sure patch works - bzr commit -m 'Back port patch NNNN' - bzr push bzr+ssh://me@code.python.org/trunk - cd ../py3k - bzr merge ../trunk -r 0000 - # Same as for 2.6 except Misc/NEWS changes are reverted - bzr revert Misc/NEWS - bzr commit -m 'Forward port patch NNNN' - bzr push bzr+ssh://me@code.python.org/py3k - - -hg -'' - -Mercurial, like other DVCS, does not well support the current -workflow used by Python core developers to backport patches. Right -now, bug fixes are first applied to the development mainline -(i.e., trunk), then back-ported to the maintenance branches and -forward-ported, as necessary, to the py3k branch. This workflow -requires the ability to cherry-pick individual changes. Mercurial's -transplant extension provides this ability. 
Here is an example of -the scenario using this workflow:: - - cd release26-maint - # Assume patch applied to 2.7 in revision 0000 - hg transplant -s ../trunk 0000 - # Resolve conflicts, if any. - cd ../py3k - hg pull ../trunk - hg merge - hg revert Misc/NEWS - hg commit -m "Merged trunk" - hg push - -In the above example, transplant acts much like the current svnmerge -command. When transplant is invoked without the revision, the command -launches an interactive loop useful for transplanting multiple -changes. Another useful feature is the --filter option which can be -used to modify changesets programmatically (e.g., it could be used -for removing changes to Misc/NEWS automatically). - -Alternatively to the traditional workflow, we could avoid -transplanting changesets by committing bug fixes to the oldest -supported release, then merge these fixes upward to the more recent -branches. -:: - - cd release25-maint - hg import fix_some_bug.diff - # Review patch and run test suite. Revert if failure. - hg push - cd ../release26-maint - hg pull ../release25-maint - hg merge - # Resolve conflicts, if any. Then, review patch and run test suite. - hg commit -m "Merged patches from release25-maint." - hg push - cd ../trunk - hg pull ../release26-maint - hg merge - # Resolve conflicts, if any, then review. - hg commit -m "Merged patches from release26-maint." - hg push - -Although this approach makes the history non-linear and slightly -more difficult to follow, it encourages fixing bugs across all -supported releases. Furthermore, it scales better when there is many -changes to backport, because we do not need to seek the specific -revision IDs to merge. - - -git -''' - -In git I would have a workspace which contains all of -the relevant master repository branches. git cherry-pick doesn't -work across repositories; you need to have the branches in the same -repository. -:: - - # Assume patch applied to 2.7 in revision release27~3 (4th patch back from tip). 
- cd integration - git checkout release26 - git cherry-pick release27~3 - # If there are conflicts, resolve them, and commit those changes. - # git commit -a -m "Resolve conflicts." - # Run test suite. If fixes are necessary, record as a separate commit. - # git commit -a -m "Fix code causing test failures." - git checkout master - git cherry-pick release27~3 - # Do any conflict resolution and test failure fixups. - # Revert Misc/NEWS changes. - git checkout HEAD^ -- Misc/NEWS - git commit -m 'Revert cherry-picked Misc/NEWS changes.' Misc/NEWS - # Push both ports. - git push release26 master - -If you are regularly merging (rather than cherry-picking) from a -given branch, then you can block a given commit from being -accidentally merged in the future by merging, then reverting it. -This does not prevent a cherry-pick from pulling in the unwanted -patch, and this technique requires blocking everything that you don't -want merged. I'm not sure if this differs from svn on this point. -:: - - cd trunk - # Merge in the alpha tested code. - git merge experimental-branch - # We don't want the 3rd-to-last commit from the experimental-branch, - # and we don't want it to ever be merged. - # The notation "^N" means Nth parent of the current commit. Thus HEAD^2^1^1 - # means the first parent of the first parent of the second parent of HEAD. - git revert HEAD^2^1^1 - # Propagate the merge and the prohibition to the public repository. - git push - - -Coordinated Development of a New Feature ----------------------------------------- - -Sometimes core developers end up working on a major feature with -several developers. As a core developer, I want to be able to -publish feature branches to a common public location so that I can -collaborate with other developers. - -This requires creating a branch on a server that other developers -can access. 
All of the DVCSs support creating new repositories on -hosts where the developer is already able to commit, with -appropriate configuration of the repository host. This is -similar in concept to the existing sandbox in svn, although details -of repository initialization may differ. - -For non-core developers, there are various more-or-less public-access -repository-hosting services. -Bazaar has -Launchpad_, -Mercurial has -`bitbucket.org`_, -and git has -GitHub_. -All also have easy-to-use -CGI interfaces for developers who maintain their own servers. - - -.. _Launchpad: http://www.launchpad.net/ -.. _bitbucket.org: http://www.bitbucket.org/ -.. _GitHub: http://www.github.com/ - -* Branch trunk. -* Pull from branch on the server. -* Pull from trunk. -* Push merge to trunk. - - -svn -''' -:: - - # Create branch. - svn copy svn+ssh://pythondev@svn.python.org/python/trunk svn+ssh://pythondev@svn.python.org/python/branches/NewHotness - svn checkout svn+ssh://pythondev@svn.python.org/python/branches/NewHotness - cd NewHotness - svnmerge init - svn commit -m "Initialize svnmerge." - # Pull in changes from other developers. - svn update - # Pull in trunk and merge to the branch. - svnmerge merge - svn commit -F svnmerge-commit-message.txt - - -bzr -''' -:: - - XXX To be done by Brett as a test of knowledge and online documentation/community. - - -hg -'' -:: - - XXX To be done by Brett as a test of knowledge and online documentation/community. - - -git -''' -:: - - XXX To be done by Brett as a test of knowledge and online documentation/community. - - -Separation of Issue Dependencies --------------------------------- - -Sometimes, while working on an issue, it becomes apparent that the -problem being worked on is actually a compound issue of various -smaller issues. Being able to take the current work and then begin -working on a separate issue is very helpful to separate out issues -into individual units of work instead of compounding them into a -single, large unit. 
- -* Create a branch A (e.g. urllib has a bug). -* Edit some code. -* Create a new branch B that branch A depends on (e.g. the urllib - bug exposes a socket bug). -* Edit some code in branch B. -* Commit branch B. -* Edit some code in branch A. -* Commit branch A. -* Clean up. - - -svn -''' - -To make up for svn's lack of cheap branching, it has a changelist -option to associate a file with a single changelist. This is not as -powerful as being able to associate at the commit level. There is -also no way to express dependencies between changelists. -:: - - cp -r trunk issue0000 - cd issue0000 - # Edit some code. - echo "The cake is a lie!" > README - svn changelist A README - # Edit some other code. - echo "I own Python!" . LICENSE - svn changelist B LICENSE - svn ci -m "Tell it how it is." --changelist B - # Edit changelist A some more. - svn ci -m "Speak the truth." --changelist A - cd .. - rm -rf issue0000 - - -bzr -''' -Here's an approach that uses bzr shelf (now a standard part of bzr) -to squirrel away some changes temporarily while you take a detour to -fix the socket bugs. -:: - - bzr branch trunk bug-0000 - cd bug-0000 - # Edit some code. Dang, we need to fix the socket module. - bzr shelve --all - # Edit some code. - bzr commit -m "Socket module fixes" - # Detour over, now resume fixing urllib - bzr unshelve - # Edit some code - -Another approach uses the loom plugin. Looms can -greatly simplify working on dependent branches because they -automatically take care of the stacking dependencies for you. -Imagine looms as a stack of dependent branches (called "threads" in -loom parlance), with easy ways to move up and down the stack of -threads, merge changes up the stack to descendant threads, create -diffs between threads, etc. Occasionally, you may need or want to -export your loom threads into separate branches, either for review -or commit. Higher threads incorporate all the changes in the lower -threads, automatically. 
-:: - - bzr branch trunk bug-0000 - cd bug-0000 - bzr loomify --base trunk - bzr create-thread fix-urllib - # Edit some code. Dang, we need to fix the socket module first. - bzr commit -m "Checkpointing my work so far" - bzr down-thread - bzr create-thread fix-socket - # Edit some code - bzr commit -m "Socket module fixes" - bzr up-thread - # Manually resolve conflicts if necessary - bzr commit -m 'Merge in socket fixes' - # Edit me some more code - bzr commit -m "Now that socket is fixed, complete the urllib fixes" - bzr record done - -For bonus points, let's say someone else fixes the socket module in -exactly the same way you just did. Perhaps this person even grabbed your -fix-socket thread and applied just that to the trunk. You'd like to -be able to merge their changes into your loom and delete your -now-redundant fix-socket thread. -:: - - bzr down-thread trunk - # Get all new revisions to the trunk. If you've done things - # correctly, this will succeed without conflict. - bzr pull - bzr up-thread - # See? The fix-socket thread is now identical to the trunk - bzr commit -m 'Merge in trunk changes' - bzr diff -r thread: | wc -l # returns 0 - bzr combine-thread - bzr up-thread - # Resolve any conflicts - bzr commit -m 'Merge trunk' - # Now our top-thread has an up-to-date trunk and just the urllib fix. - - -hg -'' - -One approach is to use the shelve extension; this extension is not included -with Mercurial, but it is easy to install. With shelve, you can select changes -to put temporarily aside. -:: - - hg clone trunk issue0000 - cd issue0000 - # Edit some code (e.g. urllib). - hg shelve - # Select changes to put aside - # Edit some other code (e.g. socket). - hg commit - hg unshelve - # Complete initial fix. - hg commit - cd ../trunk - hg pull ../issue0000 - hg merge - hg commit - rm -rf ../issue0000 - -Several other way to approach this scenario with Mercurial. Alexander Solovyov -presented a few `alternative approaches`_ on Mercurial's mailing list. - -.. 
_alternative approaches: http://selenic.com/pipermail/mercurial/2009-January/023710.html - -git -''' -:: - - cd trunk - # Edit some code in urllib. - # Discover a bug in socket, want to fix that first. - # So save away our current work. - git stash - # Edit some code, commit some changes. - git commit -a -m "Completed fix of socket." - # Restore the in-progress work on urllib. - git stash apply - # Edit me some more code, commit some more fixes. - git commit -a -m "Complete urllib fixes." - # And push both patches to the public repository. - git push - -Bonus points: suppose you took your time, and someone else fixes -socket in the same way you just did, and landed that in the trunk. In -that case, your push will fail because your branch is not up-to-date. -If the fix was a one-liner, there's a very good chance that it's -*exactly* the same, character for character. git would notice that, -and you are done; git will silently merge them. - -Suppose we're not so lucky:: - - # Update your branch. - git pull git://code.python.org/public/trunk master - - # git has fetched all the necessary data, but reports that the - # merge failed. We discover the nearly-duplicated patch. - # Neither our version of the master branch nor the workspace has - # been touched. Revert our socket patch and pull again: - git revert HEAD^ - git pull git://code.python.org/public/trunk master - -Like Bazaar and Mercurial, git has extensions to manage stacks of -patches. You can use the original Quilt by Andrew Morton, or there is -StGit ("stacked git") which integrates patch-tracking for large sets -of patches into the VCS in a way similar to Mercurial Queues or Bazaar -looms. - - -Doing a Python Release ----------------------- - -How does PEP 101 change when using a DVCS? - - -bzr -''' - -It will change, but not substantially so. When doing the -maintenance branch, we'll just push to the new location instead of -doing an svn cp. 
Tags are totally different, since in svn they are -directory copies, but in bzr (and I'm guessing hg), they are just -symbolic names for revisions on a particular branch. The release.py -script will have to change to use bzr commands instead. It's -possible that because DVCS (in particular, bzr) does cherry picking -and merging well enough that we'll be able to create the maint -branches sooner. It would be a useful exercise to try to do a -release off the bzr/hg mirrors. - - -hg -'' - -Clearly, details specific to Subversion in PEP 101 and in the -release script will need to be updated. In particular, release -tagging and maintenance branches creation process will have to be -modified to use Mercurial's features; this will simplify and -streamline certain aspects of the release process. For example, -tagging and re-tagging a release will become a trivial operation -since a tag, in Mercurial, is simply a symbolic name for a given -revision. - - -git -''' - -It will change, but not substantially so. When doing the -maintenance branch, we'll just git push to the new location instead -of doing an svn cp. Tags are totally different, since in svn they -are directory copies, but in git they are just symbolic names for -revisions, as are branches. (The difference between a tag and a -branch is that tags refer to a particular commit, and will never -change unless you use git tag -f to force them to move. The -checked-out branch, on the other hand, is automatically updated by -git commit.) The release.py script will have to change to use git -commands instead. With git I would create a (local) maintenance -branch as soon as the release engineer is chosen. Then I'd "git -pull" until I didn't like a patch, when it would be "git pull; git -revert ugly-patch", until it started to look like the sensible thing -is to fork off, and start doing "git cherry-pick" on the good -patches. 
- - -Platform/Tool Support -===================== - -Operating Systems ------------------ -==== ======================================= ============================================= ============================= -DVCS Windows OS X UNIX ----- --------------------------------------- --------------------------------------------- ----------------------------- -bzr yes (installer) w/ tortoise yes (installer, fink or MacPorts) yes (various package formats) -hg yes (third-party installer) w/ tortoise yes (third-party installer, fink or MacPorts) yes (various package formats) -git yes (third-party installer) yes (third-party installer, fink or MacPorts) yes (.deb or .rpm) -==== ======================================= ============================================= ============================= - -As the above table shows, all three DVCSs are available on all three -major OS platforms. But what it also shows is that Bazaar is the -only DVCS that directly supports Windows with a binary installer -while Mercurial and git require you to rely on a third-party for -binaries. Both bzr and hg have a tortoise version while git does not. - -Bazaar and Mercurial also has the benefit of being available in pure -Python with optional extensions available for performance. - - -CRLF -> LF Support ------------------- - -bzr - My understanding is that support for this is being worked on as - I type, landing in a version RSN. I will try to dig up details. - -hg - Supported via the win32text extension. - -git - I can't say from personal experience, but it looks like there's - pretty good support via the core.autocrlf and core.safecrlf - configuration attributes. - - -Case-insensitive filesystem support ------------------------------------ - -bzr - Should be OK. I share branches between Linux and OS X all the - time. I've done case changes (e.g. ``bzr mv Mailman mailman``) and - as long as I did it on Linux (obviously), when I pulled in the - changes on OS X everything was hunky dory. 
- -hg - Mercurial uses a case safe repository mechanism and detects case - folding collisions. - -git - Since OS X preserves case, you can do case changes there too. - git does not have a problem with renames in either direction. - However, case-insensitive filesystem support is usually taken - to mean complaining about collisions on case-sensitive files - systems. git does not do that. - - -Tools ------ - -In terms of code review tools such as `Review Board`_ and Rietveld_, -the former supports all three while the latter supports hg and git but -not bzr. Bazaar does not yet have an online review board, but it -has several ways to manage email based reviews and trunk merging. -There's `Bundle Buggy`_, `Patch Queue Manager`_ (PQM), and -`Launchpad's code reviews `_. - -.. _Review Board: http://www.review-board.org/ -.. _Rietveld: http://code.google.com/p/rietveld/ - -.. _Bundle Buggy: http://code.aaronbentley.com/bundlebuggy/ -.. _Patch Queue Manager: http://bazaar-vcs.org/PatchQueueManager - -All three have some web site online that provides basic hosting -support for people who want to put a repository online. Bazaar has -Launchpad, Mercurial has bitbucket.org, and git has GitHub. Google -Code also has instructions on how to use git with the service, both -to hold a repository and how to act as a read-only mirror. - -All three also `appear to be supported -`_ -by Buildbot_. - -.. _Buildbot: http://buildbot.net - - -Usage On Top Of Subversion -========================== - -==== ============ -DVCS svn support ----- ------------ -bzr bzr-svn_ (third-party) -hg `multiple third-parties `__ -git git-svn_ -==== ============ - -.. _bzr-svn: http://bazaar-vcs.org/BzrForeignBranches/Subversion -.. _git-svn: http://www.kernel.org/pub/software/scm/git/docs/git-svn.html - -All three DVCSs have svn support, although git is the only one to -come with that support out-of-the-box. 
- - -Server Support +Choice of DVCS ============== -==== ================== -DVCS Web page interface ----- ------------------ -bzr loggerhead_ -hg hgweb_ -git gitweb_ -==== ================== +This PEP included a thorough investigation of three DVCSs as options for +migration, with substantial work from Barry Warsaw, Alexandre Vassalotti and +Stephen Turnbull. That comparison has been moved to `DvcsComparison`_, and +this PEP now includes more information on the migration to Mercurial. -.. _loggerhead: https://launchpad.net/loggerhead -.. _hgweb: http://www.selenic.com/mercurial/wiki/index.cgi/HgWebDirStepByStep -.. _gitweb: http://git.or.cz/gitwiki/Gitweb +.. _DvcsComparison: http://wiki.python.org/moin/DvcsComparison -All three DVCSs support various hooks on the client and server side -for e.g. pre/post-commit verifications. - - -Development -=========== - -All three projects are under active development. Git seems to be on a -monthly release schedule. Bazaar is on a time-released monthly -schedule. Mercurial is on a 4-month, timed release schedule. - - -Special Features -================ - -bzr ---- - -Martin Pool adds: "bzr has a stable Python scripting interface, with -a distinction between public and private interfaces and a -deprecation window for APIs that are changing. Some plugins are -listed in https://edge.launchpad.net/bazaar and -http://bazaar-vcs.org/Documentation". - - -hg --- - -Alexander Solovyov comments: - - Mercurial has easy to use extensive API with hooks for main events - and ability to extend commands. Also there is the mq (mercurial - queues) extension, distributed with Mercurial, which simplifies - work with patches. - - -git ---- - -git has a cvsserver mode, ie, you can check out a tree from git -using CVS. You can even commit to the tree, but features like -merging are absent, and branches are handled as CVS modules, which -is likely to shock a veteran CVS user. 
- - -Tests/Impressions -================= - -As I (Brett Cannon) am left with the task of of making the final -decision of which/any DVCS to go with and not my co-authors, I felt -it only fair to write down what tests I ran and my impressions as I -evaluate the various tools so as to be as transparent as possible. - - -Barrier to Entry ----------------- - -The amount of time and effort it takes to get a checkout of Python's -repository is critical. If the difficulty or time is too great then a -person wishing to contribute to Python may very well give up. That -cannot be allowed to happen. - -I measured the checking out of the 2.x trunk as if I was a non-core -developer. Timings were done using the ``time`` command in zsh and -space was calculated with ``du -c -h``. - -======= ================ ========= ===== -DVCS San Francisco Vancouver Space -------- ---------------- --------- ----- -svn 1:04 2:59 139 M -bzr 10:45 16:04 276 M -hg 2:30 5:24 171 M -git 2:54 5:28 134 M -======= ================ ========= ===== - -When comparing these numbers to svn, it is important to realize that -it is not a 1:1 comparison. Svn does not pull down the entire revision -history like all of the DVCSs do. That means svn can perform an -initial checkout much faster than the DVCS purely based on the fact -that it has less information to download for the network. - - -Performance of basic information functionality ----------------------------------------------- - -To see how the tools did for performing a command that required -querying the history, the log for the ``README`` file was timed. - -==== ===== -DVCS Time ----- ----- -bzr 4.5 s -hg 1.1 s -git 1.5 s -==== ===== - -One thing of note during this test was that git took longer than the -other three tools to figure out how to get the log without it using a -pager. 
While the pager use is a nice touch in general, not having it -automatically turn on took some time (turns out the main ``git`` -command has a ``--no-pager`` flag to disable use of the pager). - - -Figuring out what command to use from built-in help ----------------------------------------------------- - -I ended up trying to find out what the command was to see what URL the -repository was cloned from. To do this I used nothing more than the -help provided by the tool itself or its man pages. - -Bzr was the easiest: ``bzr info``. Running ``bzr help`` didn't show -what I wanted, but mentioned ``bzr help commands``. That list had the -command with a description that made sense. - -Git was the second easiest. The command ``git help`` didn't show much -and did not have a way of listing all commands. That is when I viewed -the man page. Reading through the various commands I discovered ``git -remote``. The command itself spit out nothing more than ``origin``. -Trying ``git remote origin`` said it was an error and printed out the -command usage. That is when I noticed ``git remote show``. Running -``git remote show origin`` gave me the information I wanted. - -For hg, I never found the information I wanted on my own. It turns out -I wanted ``hg paths``, but that was not obvious from the description -of "show definition of symbolic path names" as printed by ``hg help`` -(it should be noted that reporting this in the PEP did lead to the -Mercurial developers to clarify the wording to make the use of the -``hg paths`` command clearer). - - -Updating a checkout ---------------------- - -To see how long it takes to update an outdated repository I timed both -updating a repository 700 commits behind and 50 commits behind (three -weeks stale and 1 week stale, respectively). - -==== =========== ========== -DVCS 700 commits 50 commits ----- ----------- ---------- -bzr 39 s 7 s -hg 17 s 3 s -git N/A 4 s -==== =========== ========== - -.. 
note:: - Git lacks a value for the *700 commits* scenario as it does - not seem to allow checking out a repository at a specific - revision. - -Git deserves special mention for its output from ``git pull``. It -not only lists the delta change information for each file but also -color-codes the information. - - -XXX ... usage on top of svn, filling in `Coordinated Development of a -New Feature`_ scenario - - - -Chosen DVCS -=========== - -The `decision +At PyCon 2009, a `decision `_ -was made at PyCon 2009 to go with Mercurial_. +was made to go with Mercurial. -XXX details as to why +The choice to go with Mercurial was made for four important reasons: + +* According to a small survey, Python developers are more interested in + using Mercurial than in Bazaar or Git. + +* Mercurial is written in Python, which is congruent with the python-dev + tendency to 'eat their own dogfood'. + +* Mercurial is significantly faster than bzr (it's slower than git, though + by a much smaller difference). + +* Mercurial is easier to learn for SVN users than bzr. + +Although all of these points can be debated, in the end a pronouncement from +the BDFL was made to go with hg as the chosen DVCS for the Python project. Transition Plan =============== -XXX + +Introduction +------------ + +To make the most of hg, I (Dirkjan) want to make a high-fidelity conversion, +such that (a) as much of the svn metadata as possible is retained, and (b) all +metadata is converted to formats that are common in Mercurial. This way, tools +written for Mercurial can be optimally used. In order to do this, I want to use +the `hgsubversion `_ software to do +an initial conversion. This hg extension is focused on providing high-quality +conversion from Subversion to Mercurial for use in two-way correspondence, +meaning it doesn't throw away as much available metadata as other solutions.
+ +Such a conversion also seems like a good time to reconsider the contents of +the repository and determine if some things are still valuable. In this spirit, +in the following sections I propose discarding some of the older metadata. + +Branch strategy +--------------- + +Mercurial has two basic ways of using branches: cloned branches, where each +branch is kept in a separate directory, and named branches, where each revision +keeps metadata to note on which branch it belongs. The former makes it easier +to distinguish branches, at the expense of requiring more disk space on the +client. The latter makes it a little easier to switch between branches, but +often has somewhat unintuitive results for people (though this has been +getting better in recent versions of Mercurial). + +For Python, I think it would work well to have cloned branches and keep most +things separate. This is predicated on the assumption that most people work on +just one (or maybe two) branches at a time. Branches can be exposed separately, +though I would advocate merging old (and tagged!) branches into mainline so +that people can easily revert to older releases. At what age of a release this +should be done can be debated (a natural point might be when the branch gets +unsupported, e.g. 2.4 at the release of 2.6). + +Converting branches +------------------- + +There are quite a lot of branches in SVN's branches directory. I propose to +clean this up a bit, by employing the following strategy: + +* Keep all release (maintenance) branches +* Discard branches that haven't been touched in 18 months, unless someone + indicates there's still interest in such a branch +* Keep branches that have been touched in the last 18 months, unless someone + indicates the branch can be deprecated + +Converting tags +--------------- + +The SVN tags directory contains a lot of old stuff. Some of these are not, in +fact, full tags, but contain only a smaller subset of the repository.
I think +we should keep all release tags, and consider other tags for inclusion based +on requests from the developer community. I'd like to consider unifying the +release tag naming scheme to make some things more consistent, if people feel +that won't create too many problems. + +Author map +---------- + +In order to provide user names the way they are common in hg (in the 'First Last +' format), we need an author map to map cvs and svn user +names to real names and their email addresses. I have a complete version of such +a map in my `migration tools repository`_. The email addresses in it might be +out of date; that's bound to happen, although it would be nice to try and +have as many people as possible review it for addresses that are out of date. +The current version also still seems to contain some encoding problems. + +.. _migration tools repository: http://hg.xavamedia.nl/cpython/pymigr/ + +Generating .hgignore +-------------------- + +The .hgignore file can be used in Mercurial repositories to help ignore files +that are not eligible for version control. It does this by employing several +possible forms of pattern matching. The current Python repository already +includes a rudimentary .hgignore file to help with using the hg mirrors. + +It might be useful to have the .hgignore be generated automatically from +svn:ignore properties. This would make sure all historic revisions also have +useful ignore information (though one could argue ignoring isn't really +relevant to just checking out an old revision). + +Revlog reordering +----------------- + +As an optional optimization technique, we should consider trying a reordering +pass on the revlogs (internal Mercurial files) resulting from the conversion. +In some cases this results in dramatic decreases in on-disk repository size. + +Other repositories +------------------ + +Richard Tew has indicated that he'd like the Stackless repository to also be +converted. 
What other projects in the svn.python.org repository should be +converted? Do we want to convert the peps repository? distutils? others? + + +Infrastructure +============== + +hg-ssh +------ + +Developers should access the repositories through ssh, similar to the current +setup. Public keys can be used to grant people access to a shared hg@ account. +A hgwebdir instance should also be set up for easy browsing and read-only +access. Some facility for sandboxes/incubator repositories could be discussed. + +Hooks +----- + +A number of hooks are currently in use. The hg equivalents for these should be +developed and deployed. The following hooks are being used: + +* check whitespace: a hook to reject commits in case the whitespace doesn't + match the rules for the Python codebase. Should be straightforward to + re-implement from the current version. Open issue: do we check only the tip + after each push, or do we check every commit in a changegroup? + +* commit mails: we can leverage the notify extension for this + +* buildbots: both the regular and the community build masters must be notified. + Fortunately buildbot includes support for hg. I've also implemented this for + Mercurial itself, so I don't expect problems here. + +* check contributors: in the current setup, all changesets bear the username of + committers, who must have signed the contributor agreement. In a DVCS, the + committers are not necessarily the same people who push, and so we can't + check if the committer is a contributor. We could use a hook to check if the + committer is a contributor if we keep a list of registered contributors. + +hgwebdir +-------- + +A more or less stock hgwebdir installation should be set up. We might want to +come up with a style to match the Python website. It may also be useful to +build a quick extension to augment the URL rev parser so that it can also take +r[0-9]+ args and come up with the matching hg revision.
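
The ``r[0-9]+`` lookup suggested for hgwebdir could be sketched roughly as follows. This is only an illustration of the mapping step, not an hgweb extension: the ``SVN_TO_HG`` table, its contents, and the ``resolve_rev`` name are all hypothetical, and a real implementation would presumably read the svn-to-hg correspondence from whatever metadata the conversion tool records.

```python
import re

# Hypothetical lookup table: svn revision number -> hg changeset hash.
# The entries are made up; a real table would be built from the
# metadata produced by the conversion.
SVN_TO_HG = {
    68000: "a1b2c3d4e5f6",
    68001: "0f9e8d7c6b5a",
}

# Matches only arguments of the exact form r[0-9]+, e.g. "r68000".
_SVN_REV = re.compile(r"r([0-9]+)")


def resolve_rev(arg, svn_to_hg=SVN_TO_HG):
    """Return an hg revision identifier for a URL revision argument.

    Arguments of the form r[0-9]+ are treated as svn revisions and
    looked up in the table; anything else is returned unchanged so
    that hg can interpret it as usual (hash, tag, branch name).
    """
    match = _SVN_REV.fullmatch(arg)
    if match is None:
        return arg
    svn_rev = int(match.group(1))
    if svn_rev not in svn_to_hg:
        raise LookupError("no hg changeset recorded for svn r%d" % svn_rev)
    return svn_to_hg[svn_rev]
```

Using ``fullmatch`` keeps names that merely start with ``r`` (a tag such as ``release26``, for instance) out of the svn lookup, so existing hg identifiers continue to resolve as before.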