PEP: 481
Title: Migrate CPython to Git, Github, and Phabricator
Version: $Revision$
Last-Modified: $Date$
Author: Donald Stufft <donald@stufft.io>
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 29-Nov-2014
Post-History: 29-Nov-2014


Abstract
========

This PEP proposes migrating the repository hosting of CPython and the
supporting repositories to Git and GitHub. It also proposes adding Phabricator
as an alternative to GitHub Pull Requests for reviewing changes. This PEP is
offered as an alternative to PEP 474 and PEP 462, which aim to achieve the
same overall benefits but restrict themselves to tools that support Mercurial
and are completely Open Source.


Rationale
=========

CPython is an open source project which relies on a number of volunteers
donating their time. As an open source project it relies on attracting new
volunteers as well as retaining existing ones in order to continue to have
a healthy amount of manpower available. In addition to increasing the amount of
manpower that is available to the project, it also needs to allow for effective
use of what manpower *is* available.

The current toolchain of the CPython project is a custom and unique combination
of tools which mandates a workflow that is similar to one found in a lot of
older projects, but which is becoming less and less popular as time goes on.

The one-off nature of the CPython toolchain and workflow means that any new
contributor is going to need to spend time learning the tools and workflow
before they can start contributing to CPython. Once a new contributor has gone
through the process of learning the CPython workflow, they are also unlikely
to be able to take that knowledge and apply it to future projects they wish to
contribute to. This acts as a barrier to contribution which will scare off
potential new contributors.

In addition, the tooling that CPython uses is under-maintained and antiquated,
and it lacks important features that would enable committers to use their time
more effectively when reviewing and approving changes. Because it is
under-maintained, bugs are likely to last longer, if they ever get fixed, and
the tooling is more likely to go down for extended periods of time. Because it
is antiquated, it doesn't effectively harness the capabilities of the modern
web platform. Finally, because it lacks several important features, such as
pre-testing of commits and an automatic merge tool, committers have to do
needless busy work to commit even the simplest of changes.


Version Control System
----------------------

The first decision that needs to be made is the VCS of the primary server side
repository. Currently the CPython repository, as well as a number of supporting
repositories, uses Mercurial. When evaluating the VCS we must consider the
capabilities of the VCS itself as well as the network effect and mindshare of
the community around that VCS.

There are really only two options for this, Mercurial and Git. Between the
two of them the technical capabilities are largely equivalent. For this reason
this PEP will largely ignore the technical arguments about the VCS and will
instead focus on the social aspects.

It is not possible to get exact numbers for how many projects or people are
using a particular VCS; however, we can estimate this by looking at several
sources of information on which VCS projects are using.

The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that 37% of
the repositories indexed by The Open Hub are using Git (second only to SVN
which has 48%) while Mercurial has just 2% (beating only Bazaar which has 1%).
This has Git being just over 18 times as popular as Mercurial on The Open Hub.

Another source of information on the popularity of the different VCSs is PyPI
itself. This source is more targeted at the Python community since it
represents projects developed for Python. Unfortunately PyPI does not have a
standard location for representing this information, so this requires manual
processing. If we limit our search to the top 100 projects on PyPI (ordered
by download counts) we can see that 62% of them use Git, 22% use Mercurial,
and 13% use something else. This has Git being just under 3 times as popular
as Mercurial for the top 100 projects on PyPI.

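As a quick check of the two ratios quoted above, the following minimal sketch
recomputes them from the stated percentages. The names ``OPEN_HUB``,
``PYPI_TOP_100``, and ``popularity_ratio`` are illustrative only and do not
belong to any tool discussed in this PEP.

.. code-block:: python

    # Percentages as quoted in this PEP; all names here are illustrative.
    OPEN_HUB = {"git": 37, "mercurial": 2}
    PYPI_TOP_100 = {"git": 62, "mercurial": 22}


    def popularity_ratio(shares, vcs="git", baseline="mercurial"):
        """Return how many times more popular ``vcs`` is than ``baseline``."""
        return shares[vcs] / shares[baseline]


    # 37 / 2 = 18.5, i.e. "just over 18 times".
    print(f"Open Hub: {popularity_ratio(OPEN_HUB):.1f}x")
    # 62 / 22 is roughly 2.8, i.e. "just under 3 times".
    print(f"PyPI top 100: {popularity_ratio(PYPI_TOP_100):.1f}x")
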
These numbers show that Git is by far the more popular DVCS for open source
projects, and choosing the more popular VCS has a number of benefits.

For new contributors it increases the likelihood that they will have already
learned the basics of Git as part of working with another project or, if they
are just now learning Git, that they'll be able to take that knowledge and
apply it to other projects. Additionally, a larger community means more people
writing how-to guides, answering questions, and writing articles about Git,
which makes it easier for a new user to find answers and information about
the tool they are trying to learn.

Another benefit is that by nature of having a larger community, there will be
more tooling written *around* it. This increases options for everything from
GUI clients and helper scripts to repository hosting.


Repository Hosting
------------------

This PEP proposes allowing GitHub Pull Requests to be submitted; however,
GitHub does not have a way to submit Pull Requests against a repository that
is not hosted on GitHub. This PEP also proposes that, in addition to GitHub
Pull Requests, Phabricator's Differential app can be used to submit proposed
changes, and Phabricator *does* allow submitting changes against a repository
that is not hosted on Phabricator.

For this reason this PEP proposes using GitHub as the canonical location of
the repository with a read-only mirror located in Phabricator. If at some point
in the future GitHub is no longer desired, then repository hosting can easily
be moved solely to Phabricator and the ability to accept GitHub Pull Requests
dropped.

In addition to hosting the repositories on GitHub, a read-only copy of all
repositories will also be mirrored onto the PSF Infrastructure.


Code Review
-----------

Currently CPython uses a custom fork of Rietveld which has been modified to
run outside of Google App Engine and which, at present, can realistically only
be maintained by one person. In addition, it is missing features that are
present in many modern code review tools.

This PEP proposes allowing both GitHub Pull Requests and Phabricator changes
to be used to propose changes and review code. It suggests both so that
contributors can select which tool best enables them to submit changes, and
reviewers can focus on reviewing changes in the tooling they like best.


GitHub Pull Requests
~~~~~~~~~~~~~~~~~~~~

GitHub is a very popular code hosting site and is increasingly becoming the
primary place people look to contribute to a project. Enabling users to
contribute through GitHub lets contributors use tooling that they are likely
already familiar with and, if they are not, that they are likely to be able to
apply to other projects.

GitHub Pull Requests have a fairly major advantage over the older "submit a
patch to a bug tracker" model. They allow developers to work completely within
their VCS using standard VCS tooling, so contributing does not require creating
a patch file and figuring out what the right location is to upload it to. This
lowers the barrier for sending a change to be reviewed.

On the reviewing side, GitHub Pull Requests are far easier to review. They have
nice syntax highlighted diffs which can operate in either unified or side by
side views. They allow expanding the context on a diff up to and including the
entire file. Finally, they allow commenting inline and on the pull request as
a whole, and they present that in a nice unified way which will also hide
comments which no longer apply. GitHub also provides a "rendered diff" view
which enables easily viewing a diff of rendered markup (such as rst) instead
of needing to review the diff of the raw markup.

The Pull Request workflow also makes it trivial to pre-test a change before
actually merging it. Any particular pull request can have any number of
different types of "commit statuses" applied to it, marking the commit (and
thus the pull request) as being in a pending, successful, errored, or failed
state. This makes it easy to see inline whether the pull request is passing
all of the tests, whether the contributor has signed a CLA, etc.

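A minimal sketch of how such statuses could gate a merge, assuming one latest
status per check. The ``State`` enum, the ``combined_state`` and ``can_merge``
helpers, and the check names are hypothetical. The precedence used here (any
failure or error wins, then pending, otherwise success) is an assumption of
this example rather than a description of GitHub's implementation.

.. code-block:: python

    from enum import Enum


    class State(Enum):
        """The four commit status states named in this PEP."""

        PENDING = "pending"
        SUCCESS = "success"
        ERROR = "error"
        FAILURE = "failure"


    def combined_state(statuses):
        """Collapse the latest status of every check into a single state.

        ``statuses`` maps a (hypothetical) check name to its latest State.
        """
        states = set(statuses.values())
        if State.FAILURE in states or State.ERROR in states:
            return State.FAILURE
        # No statuses reported yet is treated the same as still pending.
        if not states or State.PENDING in states:
            return State.PENDING
        return State.SUCCESS


    def can_merge(statuses):
        """A change is mergeable only when every check is successful."""
        return combined_state(statuses) is State.SUCCESS


    checks = {"tests": State.SUCCESS, "cla-signed": State.PENDING}
    print(can_merge(checks))  # False: the CLA check is still pending
    checks["cla-signed"] = State.SUCCESS
    print(can_merge(checks))  # True: every check is successful
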
Actually merging a GitHub Pull Request is quite simple: a core reviewer simply
needs to press the "Merge" button once all of the status checks on the Pull
Request are green (successful).

GitHub also has a good workflow for submitting pull requests to a project
completely through their web interface. This would enable the Python
documentation to have "Edit on GitHub" buttons on every page, so that people
who discover things like typos or inaccuracies, or who just want to make
improvements to the docs they are currently reading, can simply hit that
button and get an in-browser editor that will let them make changes and submit
a pull request all from the comfort of their browser.


Phabricator
~~~~~~~~~~~

In addition to GitHub Pull Requests this PEP also proposes setting up a
Phabricator instance and pointing it at the GitHub hosted repositories. This
will allow utilizing the Phabricator review applications of Differential and
Audit.

Differential functions similarly to GitHub Pull Requests except that it
requires installing the ``arc`` command line tool to upload patches to
Phabricator.

Whether to enable Phabricator for any particular repository can be chosen on
a case by case basis. This PEP only proposes that it must be enabled for the
CPython repository; for smaller repositories such as the PEP repository it may
not be worth the effort.


Criticism
=========

X is not written in Python
--------------------------

One feature of the current tooling (Mercurial, Rietveld) is that all of its
pieces are primarily written in Python. It is this PEP's belief that we should
focus on the *best* tools for the job and not the *best* tools that happen to
be written in Python. Volunteer time is a precious resource to any open source
project and we can best respect and utilize that time by focusing on the
benefits and downsides of the tools themselves rather than what language their
authors happened to write them in.

One concern is the ability to modify tools to work for us; however, one of
the goals here is to *not* modify software to work for us and instead adapt
ourselves to a more standard workflow. This standardization pays off in the
ability to re-use tools out of the box, freeing up developer time to actually
work on Python itself as well as enabling knowledge sharing between projects.

However, if we do need to modify the tooling, Git itself is largely written in
C, the same as CPython. It can also have commands written for it using any
language, including Python. Phabricator is written in PHP, which is a fairly
common language in the web world and fairly easy to pick up. GitHub itself is
largely written in Ruby, but given that it's not Open Source there is no
ability to modify it, so its implementation language is irrelevant.


GitHub is not Free/Open Source
------------------------------

GitHub is a big part of this proposal and someone who tends more to ideology
than practicality may be opposed to this PEP on those grounds alone. It is
this PEP's belief that while using entirely Free/Open Source software is an
attractive idea and a noble goal, valuing the time of the contributors by
giving them good tooling that is well maintained, and that they either already
know or can apply to other projects once they learn it, is a more important
concern than treating Free/Open Source status as a hard requirement.

However, history has shown us that sometimes benevolent proprietary companies
can stop being benevolent. This is hedged against in a few ways:

* We are not utilizing the GitHub Issue Tracker, both because it is not
  powerful enough for CPython and because, for the primary CPython repository,
  the ability to take our issues and put them somewhere else if we ever need
  to leave GitHub would rely on GitHub continuing to allow API access.

* We are utilizing the GitHub Pull Request workflow; however, all of those
  changes live inside of Git, so a mirror of the GitHub repositories can easily
  contain all of those Pull Requests. We would potentially lose any comments if
  GitHub suddenly turned "evil", but the changes themselves would still exist.

* We are utilizing the GitHub repository hosting feature; however, since this
  is just Git, moving away from GitHub is as simple as pushing the repository
  to a different location. Data portability for the repository itself is
  extremely high.

* We are also utilizing Phabricator to provide an alternative for people who
  do not wish to use GitHub. This also acts as a fallback option which will
  already be in place if we ever need to stop using GitHub.

Relying on GitHub comes with a number of benefits beyond just the benefits of
the platform itself. Since it is a commercially backed venture it has a full
time staff responsible for maintaining its services. This includes making sure
they stay up, making sure they stay patched for various security
vulnerabilities, and further improving the software and infrastructure as time
goes on.


Mercurial is better than Git
----------------------------

Whether Mercurial or Git is better on a technical level is a highly subjective
opinion. This PEP does not state whether the mechanics of Git or Mercurial are
better and instead focuses on the network effect that is available for either
option. Since this PEP proposes switching to Git this leaves the people who
prefer Mercurial out; however, those users can easily continue to work with
Mercurial by using the hg-git [#hg-git]_ extension for Mercurial, which will
let it work with a repository which is Git on the server side.


CPython Workflow is too Complicated
-----------------------------------

One sentiment that came out of previous discussions was that the multi branch
model of CPython was too complicated for GitHub Pull Requests. It is the belief
of this PEP that this statement is not accurate.

Currently any particular change requires manually creating a patch for 2.7 and
3.x, which won't change at all in this regard.

If someone submits a fix for the current stable branch (currently 3.4) the
GitHub Pull Request workflow can be used to create, in the browser, a Pull
Request to merge the current stable branch into the master branch (assuming
there are no merge conflicts). If there is a merge conflict, that would need to
be handled locally. This provides an improvement over the current situation
where the merge must always happen locally.

Finally, if someone submits a fix for the current development branch, then it
currently has to be manually applied to the stable branch if it is desired to
include it there as well. This must also happen locally in the new workflow;
however, for minor changes it could easily be accomplished in the GitHub web
editor.

Looking at this, I do not believe that *any* system can hide the complexities
involved in maintaining several long running branches. The only thing that the
tooling can do is make it as easy as possible to submit changes.


Example: Scientific Python
==========================

One of the key ideas behind the move to both Git and GitHub is that a feature
of a DVCS, the repository hosting, and the workflow used is the social network
and size of the community using said tools. We can see this is true by looking
at an example from a sub-community of the Python community: the Scientific
Python community. They have already migrated most of the key pieces of the
SciPy stack onto GitHub using the Pull Request based workflow. This process
started with IPython, and as more projects moved over it became a natural
default for new projects in the community.

They claim to have seen a great benefit from this move, in that it enables
casual contributors to easily move between different projects within their
sub-community without having to learn a special, bespoke workflow and a
different toolchain for each project. They've found that when people can use
their limited time on actually contributing instead of learning the different
tools and workflows, not only do they contribute more to one project, but
they also expand out and contribute to other projects. This move has also
been credited with the increased tendency for members of that community to go
so far as publishing their research and educational materials on GitHub as
well.

This example showcases the real power behind moving to a highly popular
toolchain and workflow, as each variance introduces yet another hurdle for new
and casual contributors to get past and it makes the time spent learning that
workflow less reusable with other projects.


References
==========

.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_

.. [#hg-git] `Hg-Git mercurial plugin <https://hg-git.github.io/>`_


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: