Update PEP 385 with current status.

Georg Brandl 2010-11-19 17:12:24 +00:00
parent 5e2069ee4c
commit d055d3039d
1 changed files with 266 additions and 185 deletions


@ -8,32 +8,32 @@ Type: Process
Content-Type: text/x-rst
Created: 25-May-2009
.. warning::
   This PEP is in the draft stages.
Motivation
==========
After having decided to switch to the Mercurial DVCS, the actual migration
still has to be performed. In the case of an important piece of
infrastructure like the version control system for a large, distributed
project like Python, this is a significant effort. This PEP is an attempt
to describe the steps that must be taken for further discussion. It's
somewhat similar to `PEP 347`_, which discussed the migration to SVN.
After having decided to switch to the Mercurial DVCS, the actual
migration still has to be performed. In the case of an important
piece of infrastructure like the version control system for a large,
distributed project like Python, this is a significant effort. This
PEP is an attempt to describe the steps that must be taken for further
discussion. It's somewhat similar to `PEP 347`_, which discussed the
migration to SVN.
To make the most of hg, I (Dirkjan) would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common in
Mercurial. This way, tools written for Mercurial can be optimally used. In
order to do this, I want to use the `hgsubversion`_ software to do an initial
conversion. This hg extension is focused on providing high-quality conversion
from Subversion to Mercurial for use in two-way correspondence, meaning it
doesn't throw away as much available metadata as other solutions.
retained, and (b) all metadata is converted to formats that are common
in Mercurial. This way, tools written for Mercurial can be optimally
used. In order to do this, I want to use the `hgsubversion`_ software
to do an initial conversion. This hg extension is focused on
providing high-quality conversion from Subversion to Mercurial for use
in two-way correspondence, meaning it doesn't throw away as much
available metadata as other solutions.
Such a conversion also seems like a good time to reconsider the contents of
the repository and determine if some things are still valuable. In this spirit,
the following sections also propose discarding some of the older metadata.
Such a conversion also seems like a good time to reconsider the
contents of the repository and determine if some things are still
valuable. In this spirit, the following sections also propose
discarding some of the older metadata.
.. _PEP 347: http://www.python.org/dev/peps/pep-0347/
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/
@ -42,7 +42,31 @@ the following sections also propose discarding some of the older metadata.
Timeline
========
TBD; needs fully working hgsubversion and consensus on this document.
The current schedule for conversion milestones:
- 2010-11-20: availability of a test repo at hg.python.org
Test commits will be allowed (and encouraged) from all committers to
the Subversion repository. The test repository and all test commits
will be removed once the final conversion is done. The server-side
hooks will be installed for the test repository, in order to test
buildbot, diff-email and whitespace checking integration.
- 2010-12-12: final conversion (tentative)
Commits to the Subversion branches now maintained in Mercurial will
be blocked. Developers should refrain from pushing to the Mercurial
repositories until all infrastructure is ensured to work after their
switch over to the new repository.
Todo list
=========
The current list of issues to resolve at various steps in the
conversion is kept `in the pymigr repo`_.
.. _in the pymigr repo: http://hg.python.org/pymigr/file/tip/todo.txt
Transition plan
@ -51,51 +75,58 @@ Transition plan
Branch strategy
---------------
Mercurial has two basic ways of using branches: cloned branches, where each
branch is kept in a separate repository, and named branches, where each revision
keeps metadata to note on which branch it belongs. The former makes it easier
to distinguish branches, at the expense of requiring more disk space on the
client. The latter makes it a little easier to switch between branches, but
often has somewhat unintuitive results for people (though this has been
getting better in recent versions of Mercurial).
Mercurial has two basic ways of using branches: cloned branches, where
each branch is kept in a separate repository, and named branches,
where each revision keeps metadata to note on which branch it belongs.
The former makes it easier to distinguish branches, at the expense of
requiring more disk space on the client. The latter makes it a little
easier to switch between branches, but often has somewhat unintuitive
results for people (though this has been getting better in recent
versions of Mercurial).
The current proposal is to use named branches for release branches and adopt
cloned branches for feature branches, with one exception to this rule: the 3.x
branches will be kept in separate clones from the 2.x branches. I think this
provides an optimal hybrid approach for Python's uses of branching.
The current proposal is to use named branches for release branches and
adopt cloned branches for feature branches, with one exception to this
rule: the 3.x branches will be kept in separate clones from the 2.x
branches. I think this provides an optimal hybrid approach for
Python's uses of branching.
Differences between named branches and cloned branches:
* Tags in a different (maintenance) clone aren't available in the local clone
* Clones with named branches will be larger, since they contain more data
* Tags in a different (maintenance) clone aren't available in the
local clone
* Clones with named branches will be larger, since they contain more
data
(The Mercurial book discourages the use of named branches, but it is, in this
respect, somewhat outdated. Named branches have gotten much easier to use
since that comment was written, due to improvements in hg.)
(The Mercurial book discourages the use of named branches, but it is,
in this respect, somewhat outdated. Named branches have gotten much
easier to use since that comment was written, due to improvements in
hg.)
Converting branches
-------------------
There are quite a lot of branches in SVN's branches directory. I propose to
clean this up a bit, by following this basic strategy:
There are quite a lot of branches in SVN's branches directory. I
propose to clean this up a bit, by following this basic strategy:
* Keep all release (maintenance) branches
* Discard branches that haven't been touched in 18 months, unless someone
indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless someone
indicates the branch can be deprecated
* Discard branches that haven't been touched in 18 months, unless
someone indicates there's still interest in such a branch
* Keep branches that have been touched in the last 18 months, unless
someone indicates the branch can be deprecated
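The default outcome of this strategy, before anyone speaks up for or against a branch, can be sketched as follows. This is only an illustration: the function name, the ``kind`` values and the 540-day approximation of "18 months" are assumptions, not part of the conversion tooling.

```python
from datetime import date, timedelta

# Assumption: "18 months" approximated as 18 * 30 days.
STALE_AFTER = timedelta(days=18 * 30)


def default_action(kind, last_touched, today):
    """Return 'keep' or 'discard' for a branch, before any objections.

    kind is 'release' for release (maintenance) branches and anything
    else for other branches (hypothetical labels for this sketch).
    """
    if kind == "release":
        # Release branches are always kept.
        return "keep"
    if today - last_touched > STALE_AFTER:
        # Untouched for more than 18 months: discard by default.
        return "discard"
    return "keep"
```

In both directions the result is only a default; a developer can still ask for a stale branch to be kept or a recent one to be deprecated.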
There's a `branch map`_ available that shows info about each branch:
* keep-clone means we'll keep that branch in a separate clone
* keep-named means we'll keep that branch as a named branch in one of the clones
* keep-named means we'll keep that branch as a named branch in one of
the clones
* strip means we won't keep that branch
* streamed-merge means that it got merged by committing several new revisions
to the other branch
* streamed-merge means that it got merged by committing several new
revisions to the other branch
* merged-r* means the branch got merged in the named revision
* merges? means I haven't checked/found out yet whether that branch was ever
merged
* ? means that your input would be even more helpful than for the other items
* merges? means I haven't checked/found out yet whether that branch
was ever merged
* ? means that your input would be even more helpful than for the
other items
* some items have no action yet, feel free to treat that as just '?'
.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
@ -103,62 +134,69 @@ There's a `branch map`_ available that shows info about each branch:
Converting tags
---------------
The SVN tags directory contains a lot of old stuff. Some of these are not, in
fact, full tags, but contain only a smaller subset of the repository. I think
we should keep all release tags, and consider other tags for inclusion based
on requests from the developer community. I'd like to consider unifying the
release tag naming scheme to make some things more consistent, if people feel
that won't create too many problems. The current proposal is to bring old
release tags in line with the current practice of release tag naming.
The SVN tags directory contains a lot of old stuff. Some of these are
not, in fact, full tags, but contain only a smaller subset of the
repository. All release tags will be kept; other tags will be
included based on requests from the developer community. I'd like to
consider unifying the release tag naming scheme to make some things
more consistent, if people feel that won't create too many problems.
The current proposal is to bring old release tags in line with the
current practice of release tag naming.
Author map
----------
In order to provide user names the way they are common in hg (in the 'First Last
<user@example.org>' format), we need an author map to map cvs and svn user
names to real names and their email addresses. I have a complete version of such
a map in my `migration tools repository`_. The email addresses in it might be
out of date; that's bound to happen, although it would be nice to try and
have as many people as possible review it for addresses that are out of date.
The current version also still seems to contain some encoding problems.
In order to provide user names the way they are common in hg (in the
'First Last <user@example.org>' format), we need an author map to map
cvs and svn user names to real names and their email addresses. I
have a complete version of such a map in my `migration tools
repository`_. The email addresses in it might be out of date; that's
bound to happen, although it would be nice to try and have as many
people as possible review it for addresses that are out of date. The
current version also still seems to contain some encoding problems.
.. _migration tools repository: http://hg.xavamedia.nl/cpython/pymigr/
.. _migration tools repository: http://hg.python.org/pymigr/
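As a rough illustration of what such a map does, the sketch below parses lines of the form ``svnuser = First Last <user@example.org>`` and looks up a committer. The file format, the function names and the sample entry are assumptions made for this example; the actual map in the migration tools repository may differ.

```python
def parse_author_map(text):
    """Parse 'username = First Last <email>' lines into a dict."""
    authors = {}
    for raw in text.splitlines():
        # Drop comments (assumed to start with '#') and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        user, sep, full = line.partition("=")
        if not sep:
            raise ValueError("malformed author map line: %r" % raw)
        authors[user.strip()] = full.strip()
    return authors


def hg_author(authors, svn_user):
    """Return the hg-style author, falling back to the bare user name."""
    return authors.get(svn_user, svn_user)


# Hypothetical sample entry, not taken from the real map.
SAMPLE = """
# svn name = real name and address
jdoe = Jane Doe <jane@example.org>
"""
```

Falling back to the bare user name keeps the conversion going when an entry is missing, at the cost of leaving that author in the old format.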
Generating .hgignore
--------------------
The .hgignore file can be used in Mercurial repositories to help ignore files
that are not eligible for version control. It does this by employing several
possible forms of pattern matching. The current Python repository already
includes a rudimentary .hgignore file to help with using the hg mirrors.
The .hgignore file can be used in Mercurial repositories to help
ignore files that are not eligible for version control. It does this
by employing several possible forms of pattern matching. The current
Python repository already includes a rudimentary .hgignore file to
help with using the hg mirrors.
Since the current Python repository already includes a .hgignore file (for use
with hg mirrors), we'll just use that. Generating full history of the file
was debated but deemed impractical (because it's relatively hard with fairly
little gain, since ignoring is less important for older revisions).
Since the current Python repository already includes a .hgignore file
(for use with hg mirrors), we'll just use that. Generating full
history of the file was debated but deemed impractical (because it's
relatively hard with fairly little gain, since ignoring is less
important for older revisions).
Revlog reordering
-----------------
As an optional optimization technique, I have performed a reordering pass on
the revlogs (internal Mercurial files) resulting from the conversion. In some
cases this results in dramatic decreases in on-disk repository size. This
especially makes sense for the manifest (where it really helps out quite a lot)
and oft-edited files like NEWS.txt (with an admittedly smaller effect).
As an optional optimization technique, I have performed a reordering
pass on the revlogs (internal Mercurial files) resulting from the
conversion. In some cases this results in dramatic decreases in
on-disk repository size. This especially makes sense for the manifest
(where it really helps out quite a lot) and oft-edited files like
Misc/NEWS (with an admittedly smaller effect).
Other repositories
------------------
Richard Tew has indicated that he'd like the Stackless repository to also be
converted. What other projects in the svn.python.org repository should be
converted? Do we want to convert the peps repository? distutils? others?
Richard Tew has indicated that he'd like the Stackless repository to
also be converted. What other projects in the svn.python.org
repository should be converted? Do we want to convert the peps
repository? distutils? others?
There's now an initial stab at converting the Jython repository. The current
tip of hgsubversion unfortunately fails at some point. Pending investigation.
There's now an initial stab at converting the Jython repository. The
current tip of hgsubversion unfortunately fails at some point.
Pending investigation.
Other repositories that would like to be converted to Mercurial can announce
themselves to me after the main Python migration is done, and I'll take care
of their needs.
Other repositories that would like to be converted to Mercurial can
announce themselves to me after the main Python migration is done, and
I'll take care of their needs.
Infrastructure
@ -167,70 +205,82 @@ Infrastructure
hg-ssh
------
Developers should access the repositories through ssh, similar to the current
setup. Public keys can be used to grant people access to a shared hg@ account.
A hgwebdir instance should also be set up for easy browsing and read-only
access. If we're using ssh, developers should trivially be able to start new
clones (for longer-term features that profit from a separate branch).
Developers should access the repositories through ssh, similar to the
current setup. Public keys can be used to grant people access to a
shared hg@ account. A hgwebdir instance will also be set up for easy
browsing and read-only access. If we're using ssh, developers should
trivially be able to start new clones (for longer-term features that
profit from development in a separate repository).
Hooks
-----
A number of hooks are currently in use. The hg equivalents for these should be
developed and deployed. The following hooks are being used:
A number of hooks are currently in use. The hg equivalents for these
should be developed and deployed. The following hooks are being used:
* check whitespace: a hook to reject commits in case the whitespace doesn't
match the rules for the Python codebase. Should be straightforward to
re-implement from the current version. We can also offer a whitespace hook
for use with client-side repositories; it could warn about whitespace
issues, truncate trailing whitespace from changed lines, or both.
Open issue: do we check only the tip after each push, or do we check
every commit in a changegroup?
* check whitespace: a hook to reject commits in case the whitespace
doesn't match the rules for the Python codebase. In a changegroup,
only the tip is checked (this allows cleanup commits for changes
pulled from third-party repos). We can also offer a whitespace hook
for use with client-side repositories; it could warn about
whitespace issues, truncate trailing whitespace from changed lines,
or both.
* commit mails: we can leverage the notify extension for this. Emails will
include diffs for each changeset committed against the repository.
* commit mails: Emails will include diffs for each changeset committed
against the repository.
* buildbots: both the regular and the community build masters must be notified.
Fortunately buildbot includes support for hg. I've also implemented this for
Mercurial itself, so I don't expect problems here.
* buildbots: both the regular and the community build masters must be
notified.
* check contributors: in the current setup, all changesets bear the username of
committers, who must have signed the contributor agreement. We might want to
use a hook to check if the committer is a contributor if we keep a list of
registered contributors. Then, the hook might warn users that push a group
of revisions containing changesets from unknown contributors.
The `hooks repository`_ contains ports of these server-side hooks to
Mercurial. One additional hook could be beneficial:
* check contributors: in the current setup, all changesets bear the
username of committers, who must have signed the contributor
agreement. We might want to use a hook to check if the committer is
a contributor if we keep a list of registered contributors. Then,
the hook might warn users that push a group of revisions containing
changesets from unknown contributors.
.. _hooks repository: http://hg.python.org/hooks/
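To give an idea of what the whitespace check involves, here is a minimal sketch that reports trailing whitespace and tabs in indentation for one file's contents. It is not the hook from the hooks repository: the exact rules enforced there are not spelled out in this document, so the two rules and the function names below are assumptions.

```python
def whitespace_problems(text):
    """Return (line number, reason) pairs for whitespace issues in text."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        # The indentation is everything before the first non-blank character.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
    return problems


def tip_is_clean(tip_files):
    """Check only the files as they look at the tip of a changegroup.

    tip_files maps file names to their contents at the tip
    (a hypothetical input shape for this sketch).
    """
    return all(not whitespace_problems(text) for text in tip_files.values())
```

Because only the tip is inspected, a push whose intermediate changesets contain whitespace problems is accepted as long as a later changeset in the same push cleans them up.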
End-of-line conversions
-----------------------
There has been some discussion about the lack of end-of-line conversion support
in Mercurial. While Mercurial comes with a win32text extension that provides
some basic support for converting end-of-line data on a file-name pattern
basis, the lack of exclusion (for specifying broad rules with exceptions) and
the use of hgrc files (which can't be versioned) make it less than ideal.
Discussion about the lack of end-of-line conversion support in
Mercurial, which was provided initially by the win32text extension,
led to the development of the new eol extension that supports a
versioned management of line-ending conventions on a file-by-file
basis, akin to Subversion's ``svn:eol-style`` properties. This
information is kept in a versioned file called ``.hgeol``, and such a
file has already been checked into the Subversion repository.
I think the primary line of defense for prevention of inappropriate newlines
should be hooks on the server side which basically turn down any changegroup
or changeset introducing such data. The use of the win32text extension (which
can hopefully be improved/extended to support the usage scenarios mentioned
above) and/or a commit-time hook could be the first line of defense.
A hook on the server side that turns down any changegroup or changeset
introducing inconsistent newline data can still be implemented, if
deemed necessary.
hgwebdir
--------
A more or less stock hgwebdir installation should be set up. We might want to
come up with a style to match the Python website. It may also be useful to
build a quick extension to augment the URL rev parser so that it can also take
r[0-9]+ args and come up with the matching hg revision.
A more or less stock hgwebdir installation should be set up. We might
want to come up with a style to match the Python website.
A `small WSGI application`_ has been written that can look up
Subversion revisions and redirect to the appropriate hgweb page for
the given changeset, regardless of which repository the converted
revision ended up in (since one big Subversion repository is converted
into several Mercurial repositories). It can also look up Mercurial
changesets by their hexadecimal ID.
.. _small WSGI application: http://hg.python.org/pymigr/file/tip/hglookup.py
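The idea behind such a lookup application can be sketched as below. This is not the code of hglookup.py: the tables, the repository names, the ``r`` prefix convention and the pairing of revision numbers with changeset IDs are all hypothetical stand-ins for data that the real conversion would produce.

```python
import re

BASE_URL = "http://hg.python.org"

# Hypothetical data: SVN revision -> (repository, hg changeset ID).
SVN_TO_HG = {71600: ("2.x", "dd3ebf81af43")}
# Hypothetical data: hg changeset ID -> repository.
HG_REPOS = {"af694c6a888c": "main"}


def lookup(identifier):
    """Return the hgweb URL for 'r<number>' or a 12-digit hex ID, or None."""
    svn = re.fullmatch(r"r([0-9]+)", identifier)
    if svn:
        entry = SVN_TO_HG.get(int(svn.group(1)))
        if entry is None:
            return None
        repo, node = entry
        return "%s/%s/rev/%s" % (BASE_URL, repo, node)
    if re.fullmatch(r"[0-9a-f]{12}", identifier):
        repo = HG_REPOS.get(identifier)
        if repo is not None:
            return "%s/%s/rev/%s" % (BASE_URL, repo, identifier)
    return None


def application(environ, start_response):
    """WSGI entry point: redirect /<identifier> to the matching hgweb page."""
    identifier = environ.get("PATH_INFO", "").strip("/")
    url = lookup(identifier)
    if url is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown revision\n"]
    start_response("302 Found", [("Location", url)])
    return [b""]
```

Keeping the repository name in the table is what lets a single URL scheme cover several Mercurial repositories.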
roundup
-------
We'll come up with an auto-linking plugin for roundup, which can match a
changeset identifier (possibly with a branch prefix), and link it to the
appropriate revision in the hgwebdir instance. Second, the script above (in
the hgwebdir section) will make sure that old links to revisions continue
to work (by pointing to the hg changeset that reflects the svn revision).
By pointing Roundup to the URL of the lookup script mentioned above,
links to SVN revisions will continue to work, and links to Mercurial
changesets can be created as well, without having to give repository
*and* changeset ID.
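How a tracker could turn revision mentions into links through the lookup script can be sketched like this. It is not Roundup's actual configuration or API: the lookup URL, the pattern and the function name are assumptions made for illustration.

```python
import re

# Hypothetical location of the lookup script.
LOOKUP_URL = "http://hg.python.org/lookup/"

# 'r' followed by digits (SVN revision) or a 12-digit hex ID (hg changeset).
_REVISION = re.compile(r"\b(r[0-9]+|[0-9a-f]{12})\b")


def linkify(text):
    """Wrap revision mentions in HTML links to the lookup script."""
    def replace(match):
        rev = match.group(1)
        return '<a href="%s%s">%s</a>' % (LOOKUP_URL, rev, rev)
    return _REVISION.sub(replace, text)
```

Since the link carries only the identifier, the lookup script is left to decide which repository holds the changeset.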
After migration
@ -239,32 +289,35 @@ After migration
Where to get code
-----------------
It needs to be decided where the hg repositories will live. I'd like to
propose to keep the hgwebdir instance at hg.python.org. This is an accepted
standard for many organizations, and an easy parallel to svn.python.org.
The 2.7 (trunk) repo might live at http://hg.python.org/main/, for example,
with py3k at http://hg.python.org/py3k/. For write access, developers will
have to use ssh, which could be ssh://hg@hg.python.org/main/. A demo
installation will be set up with a preliminary conversion so people can
experiment and review; it can live at http://hg.python.org/example/.
After migration, the hgwebdir will live at hg.python.org. This is an
accepted standard for many organizations, and an easy parallel to
svn.python.org. The 3.x repo might live at
http://hg.python.org/main/, for example, with the 2.x repo at
http://hg.python.org/2.x/. For write access, developers will have to
use ssh, which could be ssh://hg@hg.python.org/main/. A demo
installation will be set up with a preliminary conversion so people
can experiment and review; it can live at
http://hg.python.org/example/.
code.python.org was also proposed as the hostname. Personally, I think that
using the VCS name in the hostname is good because it prevents confusion: it
should be clear that you can't use svn or bzr for hg.python.org.
code.python.org was also proposed as the hostname. Personally, I
think that using the VCS name in the hostname is good because it
prevents confusion: it should be clear that you can't use svn or bzr
for hg.python.org.
hgwebdir can already provide tarballs for every changeset. I think this
obviates the need for daily snapshots; we can just point users to tip.tar.gz
instead, meaning they will get the latest. If desired, we could even use
buildbot results to point to the last good changeset.
hgwebdir can already provide tarballs for every changeset. I think
this obviates the need for daily snapshots; we can just point users to
tip.tar.gz instead, meaning they will get the latest. If desired, we
could even use buildbot results to point to the last good changeset.
Python-specific documentation
-----------------------------
hg comes with good built-in documentation (available through hg help) and a
`wiki`_ that's full of useful information and recipes. In addition to that,
the `parts of the developer FAQ`_ concerning version control will gain a
section on using hg for Python development. Some of the text will be dependent
on the outcome of debate about this PEP (for example, the branching strategy).
hg comes with good built-in documentation (available through hg help)
and a `wiki`_ that's full of useful information and recipes. In
addition to that, the `parts of the developer FAQ`_ concerning version
control will gain a section on using hg for Python development. Some
of the text will be dependent on the outcome of debate about this PEP
(for example, the branching strategy).
.. _wiki: http://www.selenic.com/mercurial/wiki/
.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control
@ -272,64 +325,74 @@ on the outcome of debate about this PEP (for example, the branching strategy).
Proposed workflow
-----------------
I propose two workflows for the migration of patches between several branches.
I propose two workflows for the migration of patches between several
branches.
For migration within 2.x or 3.x branches, I propose a patch always gets
committed to the oldest branch where it applies first. Then, the resulting
changeset can be merged using hg merge to all newer branches within that
series (2.x or 3.x). If it does not apply as-is to the newer branch, hg revert
can be used to easily revert to the new-branch-native head, patch in some
alternative version of the patch (or none, if it's not applicable), then commit
the merge. The premise here is that all changesets from an older branch within
the series are eventually merged to all newer branches within the series.
For migration within 2.x or 3.x branches, I propose a patch always
gets committed to the oldest branch where it applies first. Then, the
resulting changeset can be merged using hg merge to all newer branches
within that series (2.x or 3.x). If it does not apply as-is to the
newer branch, hg revert can be used to easily revert to the
new-branch-native head, patch in some alternative version of the patch
(or none, if it's not applicable), then commit the merge. The premise
here is that all changesets from an older branch within the series are
eventually merged to all newer branches within the series.
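The merge order described here can be made concrete with a small sketch that lists the hg commands to run after a fix has been committed on the oldest applicable branch. The branch names in the usage note are hypothetical, and the function only builds the command strings; it does not run hg.

```python
def merge_plan(series, oldest):
    """List the hg commands that carry a fix forward through a series.

    series is ordered from oldest to newest branch; oldest is the branch
    on which the fix was committed first.
    """
    start = series.index(oldest)
    commands = []
    # Merge each branch into the next newer one, oldest first.
    for older, newer in zip(series[start:], series[start + 1:]):
        commands.append("hg update %s" % newer)
        commands.append("hg merge %s" % older)
        commands.append('hg commit -m "Merge from %s"' % older)
    return commands
```

For a series ``["3.1", "default"]`` and a fix committed on ``3.1``, the plan is to update to ``default``, merge ``3.1`` and commit the merge. A fix committed on the newest branch needs no merges.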
The upshot is that this provides for the most painless merging procedure. The
downside is that in the general case, people have to think about the oldest
branch to which the patch should be applied before actually applying it.
The upshot is that this provides for the most painless merging
procedure. This means that in the general case, people have to think
about the oldest branch to which the patch should be applied before
actually applying it. Usually, that is one of only two branches: the
latest maintenance branch and the trunk, except for security fixes
applicable to older branches in security-fix-only mode.
For migration between 2.x and 3.x branches (which should all be in the same
direction, though I'm not sure what direction is most appropriate here),
changesets should be transplanted (not merged) in some other way. The
transplant extension, import/export and bundle/unbundle work equally well here.
For merging bug fixes from the 3.x to the 2.7 maintenance branch (2.6
and 2.5 are in security-fix-only mode and their maintenance will
continue in the Subversion repository), changesets should be
transplanted (not merged) in some other way. The transplant
extension, import/export and bundle/unbundle work equally well here.
Choosing this approach allows 3.x not to carry all of the 2.x history-since-it-
was-branched, meaning the clone is not as big and the merges not as complicated.
Choosing this approach allows 3.x not to carry all of the 2.x
history-since-it-was-branched, meaning the clone is not as big and the
merges not as complicated.
The future of Subversion
------------------------
What happens to the Subversion repositories after the migration? Since the svn
server contains a bunch of repositories, not just the CPython one, it will
probably live on for a bit, since some projects may not want to migrate and
others may take longer to do so. To prevent people from staying
behind, we may want to remove migrated projects from the repository.
What happens to the Subversion repositories after the migration?
Since the svn server contains a bunch of repositories, not just the
CPython one, it will probably live on for a bit, since some projects
may not want to migrate and others may take longer to do so.
To prevent people from staying behind, we may want to move migrated
projects from the repository to a new, read-only repository with a
new name.
Build identification
--------------------
Python currently provides the sys.subversion tuple to allow Python code to
find out exactly what version of Python it's running against. The current
version looks something like this:
Python currently provides the sys.subversion tuple to allow Python
code to find out exactly what version of Python it's running against.
The current version looks something like this:
* ('CPython', 'tags/r262', '71600')
* ('CPython', 'trunk', '73128M')
Another value is returned from Py_GetBuildInfo() in the C API, and available
to Python code as part of sys.version:
Another value is returned from Py_GetBuildInfo() in the C API, and
available to Python code as part of sys.version:
* 'r262:71600, Jun 2 2009, 09:58:33'
* 'trunk:73128M, Jun 2 2009, 01:24:14'
I propose that the revision identifier will be the short version of hg's
revision hash, for example 'dd3ebf81af43', augmented with '+' (instead of 'M')
if the working directory from which it was built was modified. This mirrors
the output of the hg id command, which is intended for this kind of usage. The
sys.subversion value will also be renamed to sys.mercurial to reflect the
change in VCS.
I propose that the revision identifier will be the short version of
hg's revision hash, for example 'dd3ebf81af43', augmented with '+'
(instead of 'M') if the working directory from which it was built was
modified. This mirrors the output of the hg id command, which is
intended for this kind of usage. The sys.subversion value will also
be renamed to sys.mercurial to reflect the change in VCS.
For the tag/branch identifier, I propose that hg will check for tags on the
currently checked out revision, use the tag if there is one ('tip' doesn't
count), and use the branch name otherwise. sys.subversion becomes
For the tag/branch identifier, I propose that hg will check for tags
on the currently checked out revision, use the tag if there is one
('tip' doesn't count), and use the branch name otherwise.
sys.subversion becomes
* ('CPython', '2.6.2', 'dd3ebf81af43')
* ('CPython', 'default', 'af694c6a888c+')
@ -339,5 +402,23 @@ and the build info string becomes
* '2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33'
* 'default:af694c6a888c+, Jun 2 2009, 01:24:14'
This reflects that the default branch in hg is called 'default' instead of
Subversion's 'trunk', and reflects the proposed new tag format.
This reflects that the default branch in hg is called 'default'
instead of Subversion's 'trunk', and reflects the proposed new tag
format.
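The proposed rules can be summarized in a short sketch. The function names are made up for this example, and the real build would obtain the tags, branch and hash from hg rather than receive them as arguments.

```python
def identify(tags, branch, short_hash, modified):
    """Return (tag or branch name, revision identifier) for a checkout."""
    # 'tip' doesn't count as a tag.
    real_tags = [tag for tag in tags if tag != "tip"]
    name = real_tags[0] if real_tags else branch
    # '+' marks a modified working directory, like 'M' did for svn.
    revision = short_hash + ("+" if modified else "")
    return name, revision


def version_tuple(tags, branch, short_hash, modified):
    """Build the tuple proposed as the replacement for sys.subversion."""
    name, revision = identify(tags, branch, short_hash, modified)
    return ("CPython", name, revision)


def build_info(tags, branch, short_hash, modified, build_date):
    """Build the string proposed for Py_GetBuildInfo()."""
    name, revision = identify(tags, branch, short_hash, modified)
    return "%s:%s, %s" % (name, revision, build_date)
```

Called with the values from the examples above, ``version_tuple(["2.6.2"], "default", "dd3ebf81af43", False)`` gives the tagged form and ``version_tuple(["tip"], "default", "af694c6a888c", True)`` gives the modified-checkout form.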
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: