Switch to proposing a full migration to Git, Github, and Phabricator

Donald Stufft 2015-02-01 10:26:15 -05:00
parent 00926f7bd4
commit 50d36874be
1 changed files with 272 additions and 178 deletions


@ -1,5 +1,5 @@
PEP: 481 PEP: 481
Title: Migrate Some Supporting Repositories to Git and Github Title: Migrate CPython to Git, Github, and Phabricator
Version: $Revision$ Version: $Revision$
Last-Modified: $Date$ Last-Modified: $Date$
Author: Donald Stufft <donald@stufft.io> Author: Donald Stufft <donald@stufft.io>
@ -13,156 +13,305 @@ Post-History: 29-Nov-2014
Abstract Abstract
======== ========
This PEP proposes migrating to Git and Github for certain supporting This PEP proposes migrating the repository hosting of CPython and the
repositories (such as the repository for Python Enhancement Proposals) in a way supporting repositories to Git and Github. It also proposes adding Phabricator
that is more accessible to new contributors, and easier to manage for core as an alternative to Github Pull Requests to handle reviewing changes. This
developers. This is offered as an alternative to PEP 474 which aims to achieve particular PEP is offered as an alternative to PEP 474 and PEP 462 which aim
the same overall benefits but while continuing to use the Mercurial DVCS and to achieve the same overall benefits but restrict themselves to tools that support
without relying on a commercial entity. Mercurial and are completely Open Source.
In particular this PEP proposes changes to the following repositories:
* https://hg.python.org/devguide/
* https://hg.python.org/devinabox/
* https://hg.python.org/peps/
This PEP does not propose any changes to the core development workflow for
CPython itself.
Rationale Rationale
========= =========
As PEP 474 mentions, there are currently a number of repositories hosted on CPython is an open source project which relies on a number of volunteers
hg.python.org which are not directly used for the development of CPython but donating their time. As an open source project it relies on attracting new
instead are supporting or ancillary repositories. These supporting repositories volunteers as well as retaining existing ones in order to continue to have
do not typically have complex workflows or often branches at all other than the a healthy amount of manpower available. In addition to increasing the amount of
primary integration branch. This simplicity makes them very good targets for manpower that is available to the project, it also needs to allow for effective
the "Pull Request" workflow that is commonly found on sites like Github. use of what manpower *is* available.
However whereas PEP 474 proposes to continue to use Mercurial and restricts The current toolchain of the CPython project is a custom and unique combination
itself to only solutions which are OSS and self-hosted, this PEP expands the of tools which mandates a workflow that is similar to one found in a lot of
scope of that to include migrating to Git and using Github. older projects, but which is becoming less and less popular as time goes on.
The existing method of contributing to these repositories generally includes The one-off nature of the CPython toolchain and workflow means that any new
generating a patch and either uploading them to bugs.python.org or emailing contributor is going to need to spend time learning the tools and workflow before
them to peps@python.org. This process is unfriendly towards non-committer they can start contributing to CPython. Once a new contributor goes through
contributors as well as cumbersome for committers seeking to accept the patches the process of learning the CPython workflow they also are unlikely to be able
sent by users. In contrast, the Pull Request workflow style enables non to take that knowledge and apply it to future projects they wish to contribute
technical contributors, especially those who do not know their way around the to. This acts as a barrier to contribution which will scare off potential new
DVCS of choice, to contribute using the web based editor. On the committer contributors.
side, the Pull Requests enable them to tell, before merging, whether or not
a particular Pull Request will break anything. It also enables them to do a In addition the tooling that CPython uses is under-maintained, antiquated,
simple "push button" merge which does not require them to check out the and it lacks important features that enable committers to more effectively use
changes locally. Another such feature that is useful in particular for docs, their time when reviewing and approving changes. The fact that it is
is the ability to view a "prose" diff. This Github-specific feature enables under-maintained means that bugs are likely to last for longer, if they ever
a committer to view a diff of the rendered output which will hide things like get fixed, as well as it's more likely to go down for extended periods of time.
reformatting a paragraph and show you what the actual "meat" of the change The fact that it is antiquated means that it doesn't effectively harness the
actually is. capabilities of the modern web platform. Finally the fact that it lacks several
important features, such as pre-testing of commits and an automatic merge
tool, means that committers have to do needless busy work to commit even the
simplest of changes.
Why Git? Version Control System
-------- ----------------------
Looking at the variety of DVCS which are available today, it becomes fairly The first decision that needs to be made is the VCS of the primary server side
clear that git has the largest mindshare. The Open Hub (previously Ohloh) repository. Currently the CPython repository, as well as a number of supporting
statistics [#openhub-stats]_ show that currently 37% of the repositories repositories, uses Mercurial. When evaluating the VCS we must consider the
indexed by Open Hub are using git which is second only to SVN (which has 48%), capabilities of the VCS itself as well as the network effect and mindshare of
while Mercurial has just 2% of the indexed repositories (beating only bazaar the community around that VCS.
which has 1%). In addition to the Open Hub statistics, a look at the top 100
projects on PyPI (ordered by total download counts) shows that within the
Python space itself, the majority of projects use git.
=== ========= ========== ====== === ==== There are really only two real options for this, Mercurial and Git. Between the
Git Mercurial Subversion Bazaar CVS None two of them the technical capabilities are largely equivalent. For this reason
=== ========= ========== ====== === ==== this PEP will largely ignore the technical arguments about the VCS system and
62 22 7 4 1 1 will instead focus on the social aspects.
=== ========= ========== ====== === ====
It is not possible to get exact numbers for the number of projects or people
which are using a particular VCS, however we can infer this by looking at
several sources of information for what VCS projects are using.
The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that 37% of
the repositories indexed by The Open Hub are using Git (second only to SVN
which has 48%) while Mercurial has just 2% (beating only bazaar which has 1%).
This has Git being just over 18 times as popular as Mercurial on The Open Hub.
Another source of information on the popularity of the different VCSs is PyPI
itself. This source is more targeted at the Python community itself since it
represents projects developed for Python. Unfortunately PyPI does not have a
standard location for representing this information, so this requires manual
processing. If we limit our search to the top 100 projects on PyPI (ordered
by download counts) we can see that 62% of them use Git while 22% of them use
Mercurial while 13% use something else. This has Git being just under 3 times
as popular as Mercurial for the top 100 projects on PyPI.
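The two ratios quoted above follow directly from the percentages given in the text. A minimal sketch that re-derives them is shown here; the dictionary and function names are illustrative only and are not part of the PEP.

```python
# Share of repositories/projects per VCS, as quoted in the text above.
open_hub = {"git": 37, "svn": 48, "mercurial": 2, "bazaar": 1}
pypi_top_100 = {"git": 62, "mercurial": 22, "other": 13}


def popularity_ratio(shares, a="git", b="mercurial"):
    """Return how many times more common VCS ``a`` is than VCS ``b``."""
    return shares[a] / shares[b]


print(popularity_ratio(open_hub))                 # 18.5
print(round(popularity_ratio(pypi_top_100), 2))   # 2.82
```

These values match the "just over 18 times" and "just under 3 times" figures stated in the text.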
Obviously from these numbers Git is by far the more popular DVCS for open
source projects and choosing the more popular VCS has a number of positive
benefits.
For new contributors it increases the likelihood that they will have already
learned the basics of Git as part of working with another project or if they
are just now learning Git, that they'll be able to take that knowledge and
apply it to other projects. Additionally a larger community means more people
writing how to guides, answering questions, and writing articles about Git
which makes it easier for a new user to find answers and information about
the tool they are trying to learn.
Another benefit is that by nature of having a larger community, there will be
more tooling written *around* it. This increases options for everything from
GUI clients, helper scripts, repository hosting, etc.
Choosing a DVCS which has the larger mindshare will make it more likely that any Repository Hosting
particular person who has experience with DVCS at all will be able to ------------------
meaningfully contribute without having to learn a new tool.
In addition to simply making it more likely that any individual will already This PEP proposes allowing GitHub Pull Requests to be submitted, however GitHub
know how to use git, the number of projects and people using it means that the does not have a way to submit Pull Requests against a repository that is not
resources for learning the tool are likely to be more fully fleshed out. hosted on GitHub. This PEP also proposes that in addition to GitHub Pull
When you run into problems, the likelihood that someone else had that problem Requests Phabricator's Differential app can also be used to submit proposed
and posted a question and received an answer is also far higher. changes and Phabricator *does* allow submitting changes against a repository
that is not hosted on Phabricator.
Thirdly, by using a more popular tool you also increase your options for For this reason this PEP proposes using GitHub as the canonical location of
tooling *around* the DVCS itself. Looking at the various options for hosting the repository with a read-only mirror located in Phabricator. If at some point
repositories, it's extremely rare to find a hosting solution (whether OSS or in the future GitHub is no longer desired, then repository hosting can easily
commercial) that supports Mercurial but does not support Git. On the flip side, be moved to solely in Phabricator and the ability to accept GitHub Pull
there are a number of tools which support Git but do not support Mercurial. Requests dropped.
Therefore the popularity of git increases the flexibility of our options going
into the future for what toolchain these projects use.
Also, by moving to the more popular DVCS, we increase the likelihood that the In addition to hosting the repositories on Github, a read only copy of all
knowledge that the person has learned in contributing to these support repositories will also be mirrored onto the PSF Infrastructure.
repositories will transfer to projects outside of the immediate CPython project
such as to the larger Python community which is primarily using Git hosted on
Github.
In previous years there was concern about how well supported git was on Windows
in comparison to Mercurial. However, git has grown to support Windows as a
first class citizen. In addition to that, for Windows users who are not well
acquainted with the Windows command line, there are GUI options as well.
Why Github? Code Review
----------- -----------
There are a number of software projects or web services which offer Currently CPython uses a custom fork of Rietveld which has been modified to
functionality similar to that of Github. These range from commercial web not run on Google App Engine which is really only able to be maintained
services such as Bitbucket to self-hosted OSS solutions such as Kallithea or currently by one person. In addition it is missing out on features that are
Gitlab. This PEP proposes that we move these repositories to Github. present in many modern code review tools.
There are two primary reasons for selecting Github: Popularity and This PEP proposes allowing both Github Pull Requests and Phabricator changes
Quality/Polish. to propose changes and review code. It suggests both so that contributors can
select which tool best enables them to submit changes, and reviewers can focus
on reviewing changes in the tooling they like best.
Github is currently the most popular hosted repository hosting according to
Alexa, where it currently has a global rank of 121. Much like for Git itself,
by choosing the most popular tool we gain benefits in increasing the likelihood
that a new contributor will have already experienced the toolchain, the quality
and availability of the help, more and better tooling being built around it, and
the knowledge transfer to other projects. A look again at the top 100 projects
by download counts on PyPI shows the following hosting locations:
====== ========= =========== ========= =========== ========== GitHub Pull Requests
GitHub BitBucket Google Code Launchpad SourceForge Other/Self ~~~~~~~~~~~~~~~~~~~~
====== ========= =========== ========= =========== ==========
62 18 6 4 3 7
====== ========= =========== ========= =========== ==========
In addition to all of those reasons, Github also has the benefit that while GitHub is a very popular code hosting site and is increasingly becoming the
many of the options have similar features when you look at them in a feature primary place people look to contribute to a project. Enabling users to
matrix the Github version of each of those features tend to work better and be contribute through GitHub is enabling contributors to contribute using tooling
far more polished. This is hard to quantify objectively however it is a fairly that they are likely already familiar with and if they are not they are likely
common sentiment if you go around and ask people who are using these services to be able to apply to another project.
often.
Finally, a reason to choose a web service at all over something that is GitHub Pull Requests have a fairly major advantage over the older "submit a
self-hosted is to be able to more efficiently use volunteer time and donated patch to a bug tracker" model. It allows developers to work completely within
resources. Every additional service hosted on the PSF infrastructure by the their VCS using standard VCS tooling so it does not require creating a patch
PSF infrastructure team further spreads out the amount of time that the file and figuring out what the right location is to upload it to. This lowers
volunteers on that team have to spend and uses some chunk of resources that the barrier for sending a change to be reviewed.
could potentially be used for something where there is no free or affordable
hosted solution available.
One concern that people do have with using a hosted service is that there is a On the reviewing side, GitHub Pull Requests are far easier to review, they have
lack of control and that at some point in the future the service may no longer nice syntax highlighted diffs which can operate in either unified or side by
be suitable. It is the opinion of this PEP that Github does not currently and side views. They allow expanding the context on a diff up to and including the
has not in the past engaged in any attempts to lock people into their platform entire file. Finally they allow commenting inline and on the pull request as
and that if at some point in the future Github is no longer suitable for one a whole and they present that in a nice unified way which will also hide
reason or another, then at that point we can look at migrating away from Github comments which no longer apply. Github also provides a "rendered diff" view
onto a different solution. In other words, we'll cross that bridge if and when which enables easily viewing a diff of rendered markup (such as rst) instead
we come to it. of needing to review the diff of the raw markup.
The Pull Request work flow also makes it trivial to enable the ability to
pre-test a change before actually merging it. Any particular pull request can
have any number of different types of "commit statuses" applied to it, marking
the commit (and thus the pull request) as either in a pending, successful,
errored, or failure state. This makes it easy to see inline if the pull request
is passing all of the tests, if the contributor has signed a CLA, etc.
Actually merging a Github Pull Request is quite simple: a core reviewer simply
needs to press the "Merge" button once all of the status checks on the Pull
Request are green (successful).
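As a rough illustration of how commit statuses gate a merge, the sketch below reduces the statuses attached to a commit to a single decision. The four state names are those listed in the text; the check names, the function names, and the aggregation rule (any error or failure blocks, any pending check waits) are assumptions made for illustration and are not GitHub's actual implementation.

```python
# The four commit status states named in the text.
STATES = {"pending", "success", "error", "failure"}


def combined_state(statuses):
    """Collapse per-check states (e.g. tests, CLA) into one overall state.

    ``statuses`` maps a hypothetical check name to one of STATES.
    """
    unknown = set(statuses.values()) - STATES
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    values = list(statuses.values())
    # Any errored or failed check blocks the pull request.
    if any(v in ("error", "failure") for v in values):
        return "failure"
    # No checks yet, or at least one still running: keep waiting.
    if not values or "pending" in values:
        return "pending"
    return "success"


def can_merge(statuses):
    """A pull request is mergeable only when every check succeeded."""
    return combined_state(statuses) == "success"


print(can_merge({"tests": "success", "cla": "success"}))  # True
print(can_merge({"tests": "success", "cla": "pending"}))  # False
```

Treating "no statuses yet" as pending is a conservative choice in this sketch: a change is never considered mergeable before at least one check has reported success.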
GitHub also has a good workflow for submitting pull requests to a project
completely through their web interface. This would enable the Python
documentation to have "Edit on GitHub" buttons on every page and people who
discover things like typos, inaccuracies, or just want to make improvements to
the docs they are currently writing can simply hit that button and get an in
browser editor that will let them make changes and submit a pull request all
from the comfort of their browser.
Phabricator
~~~~~~~~~~~
In addition to GitHub Pull Requests this PEP also proposes setting up a
Phabricator instance and pointing it at the GitHub hosted repositories. This
will allow utilizing the Phabricator review applications of Differential and
Audit.
Differential functions similarly to GitHub pull requests except that it
requires installing the ``arc`` command line tool to upload patches to
Phabricator.
Whether to enable Phabricator for any particular repository can be chosen on
a case by case basis, this PEP only proposes that it must be enabled for the
CPython repository, however for smaller repositories such as the PEP repository
it may not be worth the effort.
Criticism
=========
X is not written in Python
--------------------------
One feature that the current tooling (Mercurial, Rietveld) has is that all of
the pieces are written primarily in Python. It is this PEP's
belief that we should focus on the *best* tools for the job and not the *best*
tools that happen to be written in Python. Volunteer time is a precious
resource to any open source project and we can best respect and utilize that
time by focusing on the benefits and downsides of the tools themselves rather
than what language their authors happened to write them in.
One concern is the ability to modify tools to work for us, however one of
the goals here is to *not* modify software to work for us and instead adapt
ourselves to a more standard workflow. This standardization pays off in the
ability to re-use tools out of the box freeing up developer time to actually
work on Python itself as well as enabling knowledge sharing between projects.
However if we do need to modify the tooling, Git itself is largely written in
C the same as CPython itself is. It can also have commands written for it using
any language, including Python. Phabricator is written in PHP which is a fairly
common language in the web world and fairly easy to pick up. GitHub itself is
largely written in Ruby but given that it's not Open Source there is no ability
to modify it so its implementation language is completely meaningless.
GitHub is not Free/Open Source
------------------------------
GitHub is a big part of this proposal and someone who tends more to ideology
rather than practicality may be opposed to this PEP on those grounds alone. It
is this PEP's belief that while using entirely Free/Open Source software is an
attractive idea and a noble goal, valuing the time of the contributors by
giving them good tooling that is well maintained and that they either already
know or if they learn it they can apply to other projects is a more important
concern than treating whether something is Free/Open Source as a hard
requirement.
However, history has shown us that sometimes benevolent proprietary companies
can stop being benevolent. This is hedged against in a few ways:
* We are not utilizing the GitHub Issue Tracker, both because it is not
powerful enough for CPython but also because for the primary CPython
repository the ability to take our issues and put them somewhere else if we
ever need to leave GitHub relies on GitHub continuing to allow API access.
* We are utilizing the GitHub Pull Request workflow, however all of those
changes live inside of Git. So a mirror of the GitHub repositories can easily
contain all of those Pull Requests. We would potentially lose any comments if
GitHub suddenly turned "evil", but the changes themselves would still exist.
* We are utilizing the GitHub repository hosting feature, however since this is
just Git, moving away from GitHub is as simple as pushing the repository to
a different location. Data portability for the repository itself is extremely
high.
* We are also utilizing Phabricator to provide an alternative for people who
do not wish to use GitHub. This also acts as a fallback option which will
already be in place if we ever need to stop using GitHub.
Relying on GitHub comes with a number of benefits beyond just the benefits of
the platform itself. Since it is a commercially backed venture it has a full
time staff responsible for maintaining its services. This includes making sure
they stay up, making sure they stay patched for various security
vulnerabilities, and further improving the software and infrastructure as time
goes on.
Mercurial is better than Git
----------------------------
Whether Mercurial or Git is better on a technical level is a highly subjective
opinion. This PEP does not state whether the mechanics of Git or Mercurial are
better and instead focuses on the network effect that is available for either
option. Since this PEP proposes switching to Git this leaves the people who
prefer Mercurial out, however those users can easily continue to work with
Mercurial by using the hg-git [#hg-git]_ extension for Mercurial which will
let it work with a repository which is Git on the server side.
CPython Workflow is too Complicated
-----------------------------------
One sentiment that came out of previous discussions was that the multi branch
model of CPython was too complicated for Github Pull Requests. It is the belief
of this PEP that this statement is not accurate.
Currently any particular change requires manually creating a patch for 2.7 and
3.x which won't change at all in this regard.
If someone submits a fix for the current stable branch (currently 3.4) the
GitHub Pull Request workflow can be used to create, in the browser, a Pull
Request to merge the current stable branch into the master branch (assuming
there are no merge conflicts). If there is a merge conflict that would need to
be handled locally. This provides an improvement over the current situation
where the merge must always happen locally.
Finally, if someone submits a fix for the current development branch, then
this currently has to be manually applied to the stable branch if it is desired
to include it there as well. This must also happen locally in the new
workflow, however for minor changes it could easily be accomplished in the
GitHub web editor.
Looking at this, I do not believe that *any* system can hide the complexities
involved in maintaining several long running branches. The only thing that the
tooling can do is make it as easy as possible to submit changes.
Example: Scientific Python Example: Scientific Python
-------------------------- ==========================
One of the key ideas behind the move to both git and Github is that a feature One of the key ideas behind the move to both git and Github is that a feature
of a DVCS, the repository hosting, and the workflow used is the social network of a DVCS, the repository hosting, and the workflow used is the social network
@ -190,66 +339,11 @@ and casual contributors to get past and it makes the time spent learning that
workflow less reusable with other projects. workflow less reusable with other projects.
Migration
=========
Through the use of hg-git [#hg-git]_ we can easily convert a Mercurial
repository to a Git repository by simply pushing the Mercurial repository to
the Git repository. People who wish to continue to use Mercurial locally can
then use hg-git going into the future using the new Github URL. However they
will need to re-clone their repositories as using Git as the server seems to
trigger a one time change of the changeset ids.
As none of the selected repositories have any tags, branches, or bookmarks
other than the ``default`` branch the migration will simply map the ``default``
branch in Mercurial to the ``master`` branch in git.
In addition, since none of the selected projects have any great need of a
complex bug tracker, they will also migrate their issue handling to using the
GitHub issues.
In addition to the migration of the repository hosting itself there are a
number of locations for each particular repository which will require updating.
The bulk of these will simply be changing commands from the hg equivalent to
the git equivalent.
In particular this will include:
* Updating www.python.org to generate PEPs using a git clone and link to
Github.
* Updating docs.python.org to pull from Github instead of hg.python.org for the
devguide.
* Enabling the ability to send an email to python-checkins@python.org for each
push.
* Enabling the ability to send an IRC message to #python-dev on Freenode for
each push.
* Migrate any issues for these projects to their respective bug tracker on
Github.
* Use hg-git to provide a read-only mirror on hg.python.org which will enable
read-only uses of the hg.python.org instances of the specified repositories
to remain the same.
This will restore these repositories to similar functionality as they currently
have. In addition to this the migration will also include enabling testing for
each pull request using Travis CI [#travisci]_ where possible to ensure that
a new PR does not break the ability to render the documentation or PEPs.
User Access
===========
Moving to Github would involve adding an additional user account that will need
to be managed, however it also offers finer grained control, allowing the
ability to grant someone access to only one particular repository instead of
the coarser grained ACLs available on hg.python.org.
References References
========== ==========
.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_ .. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_
.. [#hg-git] `hg-git <https://hg-git.github.io/>`_ .. [#hg-git] `Hg-Git mercurial plugin <https://hg-git.github.io/>`_
.. [#travisci] `Travis CI <https://travis-ci.org/>`_
Copyright Copyright