2014-11-29 18:12:06 -05:00
|
|
|
PEP: 481
|
2015-02-01 10:26:15 -05:00
|
|
|
Title: Migrate CPython to Git, Github, and Phabricator
|
2014-11-29 18:12:06 -05:00
|
|
|
Author: Donald Stufft <donald@stufft.io>
|
2016-05-04 07:12:23 -04:00
|
|
|
Status: Withdrawn
|
2014-11-29 18:12:06 -05:00
|
|
|
Type: Process
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
Created: 29-Nov-2014
|
|
|
|
Post-History: 29-Nov-2014
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
========
|
|
|
|
|
2016-05-04 07:15:53 -04:00
|
|
|
.. note:: This PEP has been withdrawn, if you're looking for the PEP
|
2022-01-21 06:03:51 -05:00
|
|
|
documenting the move to Github, please refer to :pep:`512`.
|
2016-05-04 07:15:53 -04:00
|
|
|
|
2015-02-01 10:26:15 -05:00
|
|
|
This PEP proposes migrating the repository hosting of CPython and the
|
|
|
|
supporting repositories to Git and Github. It also proposes adding Phabricator
|
|
|
|
as an alternative to Github Pull Requests to handle reviewing changes. This
|
2022-01-21 06:03:51 -05:00
|
|
|
particular PEP is offered as an alternative to :pep:`474` and :pep:`462` which aims
|
2015-02-01 10:26:15 -05:00
|
|
|
to achieve the same overall benefits but restricts itself to tools that support
|
|
|
|
Mercurial and are completely Open Source.
|
2014-11-29 18:12:06 -05:00
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
=========
|
|
|
|
|
2015-02-01 10:26:15 -05:00
|
|
|
CPython is an open source project which relies on a number of volunteers
|
|
|
|
donating their time. As an open source project it relies on attracting new
|
|
|
|
volunteers as well as retaining existing ones in order to continue to have
|
|
|
|
a healthy amount of manpower available. In addition to increasing the amount of
|
|
|
|
manpower that is available to the project, it also needs to allow for effective
|
|
|
|
use of what manpower *is* available.
|
|
|
|
|
|
|
|
The current toolchain of the CPython project is a custom and unique combination
|
|
|
|
of tools which mandates a workflow that is similar to one found in a lot of
|
|
|
|
older projects, but which is becoming less and less popular as time goes on.
|
|
|
|
|
|
|
|
The one-off nature of the CPython toolchain and workflow means that any new
|
|
|
|
contributor is going to need spend time learning the tools and workflow before
|
|
|
|
they can start contributing to CPython. Once a new contributor goes through
|
|
|
|
the process of learning the CPython workflow they also are unlikely to be able
|
|
|
|
to take that knowledge and apply it to future projects they wish to contribute
|
|
|
|
to. This acts as a barrier to contribution which will scare off potential new
|
|
|
|
contributors.
|
|
|
|
|
|
|
|
In addition the tooling that CPython uses is under-maintained, antiquated,
|
|
|
|
and it lacks important features that enable committers to more effectively use
|
|
|
|
their time when reviewing and approving changes. The fact that it is
|
|
|
|
under-maintained means that bugs are likely to last for longer, if they ever
|
|
|
|
get fixed, as well as it's more likely to go down for extended periods of time.
|
|
|
|
The fact that it is antiquated means that it doesn't effectively harness the
|
|
|
|
capabilities of the modern web platform. Finally the fact that it lacks several
|
|
|
|
important features such as a lack of pre-testing commits and the lack of an
|
|
|
|
automatic merge tool means that committers have to do needless busy work to
|
|
|
|
commit even the simplest of changes.
|
|
|
|
|
|
|
|
|
|
|
|
Version Control System
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
The first decision that needs to be made is the VCS of the primary server side
|
|
|
|
repository. Currently the CPython repository, as well as a number of supporting
|
|
|
|
repositories, uses Mercurial. When evaluating the VCS we must consider the
|
|
|
|
capabilities of the VCS itself as well as the network effect and mindshare of
|
|
|
|
the community around that VCS.
|
|
|
|
|
|
|
|
There are really only two real options for this, Mercurial and Git. Between the
|
2016-07-11 11:14:08 -04:00
|
|
|
two of them the technical capabilities are largely equivalent. For this reason
|
2015-02-01 10:26:15 -05:00
|
|
|
this PEP will largely ignore the technical arguments about the VCS system and
|
|
|
|
will instead focus on the social aspects.
|
|
|
|
|
|
|
|
It is not possible to get exact numbers for the number of projects or people
|
|
|
|
which are using a particular VCS, however we can infer this by looking at
|
|
|
|
several sources of information for what VCS projects are using.
|
|
|
|
|
|
|
|
The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that 37% of
|
|
|
|
the repositories indexed by The Open Hub are using Git (second only to SVN
|
|
|
|
which has 48%) while Mercurial has just 2% (beating only bazaar which has 1%).
|
|
|
|
This has Git being just over 18 times as popular as Mercurial on The Open Hub.
|
|
|
|
|
|
|
|
Another source of information on the popular of the difference VCSs is PyPI
|
|
|
|
itself. This source is more targeted at the Python community itself since it
|
|
|
|
represents projects developed for Python. Unfortunately PyPI does not have a
|
|
|
|
standard location for representing this information, so this requires manual
|
|
|
|
processing. If we limit our search to the top 100 projects on PyPI (ordered
|
|
|
|
by download counts) we can see that 62% of them use Git while 22% of them use
|
|
|
|
Mercurial while 13% use something else. This has Git being just under 3 times
|
|
|
|
as popular as Mercurial for the top 100 projects on PyPI.
|
|
|
|
|
|
|
|
Obviously from these numbers Git is by far the more popular DVCS for open
|
|
|
|
source projects and choosing the more popular VCS has a number of positive
|
|
|
|
benefits.
|
|
|
|
|
|
|
|
For new contributors it increases the likelihood that they will have already
|
|
|
|
learned the basics of Git as part of working with another project or if they
|
|
|
|
are just now learning Git, that they'll be able to take that knowledge and
|
|
|
|
apply it to other projects. Additionally a larger community means more people
|
|
|
|
writing how to guides, answering questions, and writing articles about Git
|
|
|
|
which makes it easier for a new user to find answers and information about
|
|
|
|
the tool they are trying to learn.
|
|
|
|
|
|
|
|
Another benefit is that by nature of having a larger community, there will be
|
|
|
|
more tooling written *around* it. This increases options for everything from
|
|
|
|
GUI clients, helper scripts, repository hosting, etc.
|
|
|
|
|
|
|
|
|
|
|
|
Repository Hosting
|
|
|
|
------------------
|
|
|
|
|
|
|
|
This PEP proposes allowing GitHub Pull Requests to be submitted, however GitHub
|
|
|
|
does not have a way to submit Pull Requests against a repository that is not
|
|
|
|
hosted on GitHub. This PEP also proposes that in addition to GitHub Pull
|
|
|
|
Requests Phabricator's Differential app can also be used to submit proposed
|
|
|
|
changes and Phabricator *does* allow submitting changes against a repository
|
|
|
|
that is not hosted on Phabricator.
|
|
|
|
|
|
|
|
For this reason this PEP proposes using GitHub as the canonical location of
|
|
|
|
the repository with a read-only mirror located in Phabricator. If at some point
|
|
|
|
in the future GitHub is no longer desired, then repository hosting can easily
|
|
|
|
be moved to solely in Phabricator and the ability to accept GitHub Pull
|
|
|
|
Requests dropped.
|
|
|
|
|
|
|
|
In addition to hosting the repositories on Github, a read only copy of all
|
|
|
|
repositories will also be mirrored onto the PSF Infrastructure.
|
|
|
|
|
|
|
|
|
|
|
|
Code Review
|
2014-11-29 18:12:06 -05:00
|
|
|
-----------
|
|
|
|
|
2015-02-01 10:26:15 -05:00
|
|
|
Currently CPython uses a custom fork of Rietveld which has been modified to
|
|
|
|
not run on Google App Engine which is really only able to be maintained
|
|
|
|
currently by one person. In addition it is missing out on features that are
|
|
|
|
present in many modern code review tools.
|
|
|
|
|
|
|
|
This PEP proposes allowing both Github Pull Requests and Phabricator changes
|
|
|
|
to propose changes and review code. It suggests both so that contributors can
|
|
|
|
select which tool best enables them to submit changes, and reviewers can focus
|
|
|
|
on reviewing changes in the tooling they like best.
|
|
|
|
|
|
|
|
|
|
|
|
GitHub Pull Requests
|
|
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
GitHub is a very popular code hosting site and is increasingly becoming the
|
|
|
|
primary place people look to contribute to a project. Enabling users to
|
|
|
|
contribute through GitHub is enabling contributors to contribute using tooling
|
|
|
|
that they are likely already familiar with and if they are not they are likely
|
|
|
|
to be able to apply to another project.
|
|
|
|
|
|
|
|
GitHub Pull Requests have a fairly major advantage over the older "submit a
|
|
|
|
patch to a bug tracker" model. It allows developers to work completely within
|
|
|
|
their VCS using standard VCS tooling so it does not require creating a patch
|
|
|
|
file and figuring out what the right location is to upload it to. This lowers
|
|
|
|
the barrier for sending a change to be reviewed.
|
|
|
|
|
|
|
|
On the reviewing side, GitHub Pull Requests are far easier to review, they have
|
|
|
|
nice syntax highlighted diffs which can operate in either unified or side by
|
|
|
|
side views. They allow expanding the context on a diff up to and including the
|
|
|
|
entire file. Finally they allow commenting inline and on the pull request as
|
|
|
|
a whole and they present that in a nice unified way which will also hide
|
|
|
|
comments which no longer apply. Github also provides a "rendered diff" view
|
|
|
|
which enables easily viewing a diff of rendered markup (such as rst) instead
|
|
|
|
of needing to review the diff of the raw markup.
|
|
|
|
|
|
|
|
The Pull Request work flow also makes it trivial to enable the ability to
|
|
|
|
pre-test a change before actually merging it. Any particular pull request can
|
|
|
|
have any number of different types of "commit statuses" applied to it, marking
|
|
|
|
the commit (and thus the pull request) as either in a pending, successful,
|
|
|
|
errored, or failure state. This makes it easy to see inline if the pull request
|
|
|
|
is passing all of the tests, if the contributor has signed a CLA, etc.
|
|
|
|
|
|
|
|
Actually merging a Github Pull Request is quite simple, a core reviewer simply
|
|
|
|
needs to press the "Merge" button once the status of all the checks on the
|
|
|
|
Pull Request are green for successful.
|
|
|
|
|
|
|
|
GitHub also has a good workflow for submitting pull requests to a project
|
|
|
|
completely through their web interface. This would enable the Python
|
|
|
|
documentation to have "Edit on GitHub" buttons on every page and people who
|
|
|
|
discover things like typos, inaccuracies, or just want to make improvements to
|
|
|
|
the docs they are currently writing can simply hit that button and get an in
|
|
|
|
browser editor that will let them make changes and submit a pull request all
|
|
|
|
from the comfort of their browser.
|
|
|
|
|
|
|
|
|
|
|
|
Phabricator
|
|
|
|
~~~~~~~~~~~
|
|
|
|
|
|
|
|
In addition to GitHub Pull Requests this PEP also proposes setting up a
|
|
|
|
Phabricator instance and pointing it at the GitHub hosted repositories. This
|
|
|
|
will allow utilizing the Phabricator review applications of Differential and
|
|
|
|
Audit.
|
|
|
|
|
|
|
|
Differential functions similarly to GitHub pull requests except that they
|
|
|
|
require installing the ``arc`` command line tool to upload patches to
|
|
|
|
Phabricator.
|
|
|
|
|
|
|
|
Whether to enable Phabricator for any particular repository can be chosen on
|
2021-02-03 09:06:23 -05:00
|
|
|
a case-by-case basis, this PEP only proposes that it must be enabled for the
|
2015-02-01 10:26:15 -05:00
|
|
|
CPython repository, however for smaller repositories such as the PEP repository
|
|
|
|
it may not be worth the effort.
|
|
|
|
|
|
|
|
|
|
|
|
Criticism
|
|
|
|
=========
|
|
|
|
|
|
|
|
X is not written in Python
|
|
|
|
--------------------------
|
|
|
|
|
|
|
|
One feature that the current tooling (Mercurial, Rietveld) has is that the
|
|
|
|
primary language for all of the pieces are written in Python. It is this PEPs
|
|
|
|
belief that we should focus on the *best* tools for the job and not the *best*
|
|
|
|
tools that happen to be written in Python. Volunteer time is a precious
|
|
|
|
resource to any open source project and we can best respect and utilize that
|
|
|
|
time by focusing on the benefits and downsides of the tools themselves rather
|
|
|
|
than what language their authors happened to write them in.
|
|
|
|
|
|
|
|
One concern is the ability to modify tools to work for us, however one of
|
|
|
|
the Goals here is to *not* modify software to work for us and instead adapt
|
|
|
|
ourselves to a more standard workflow. This standardization pays off in the
|
|
|
|
ability to re-use tools out of the box freeing up developer time to actually
|
|
|
|
work on Python itself as well as enabling knowledge sharing between projects.
|
|
|
|
|
2016-07-11 11:14:08 -04:00
|
|
|
However, if we do need to modify the tooling, Git itself is largely written in
|
2015-02-01 10:26:15 -05:00
|
|
|
C the same as CPython itself is. It can also have commands written for it using
|
|
|
|
any language, including Python. Phabricator is written in PHP which is a fairly
|
|
|
|
common language in the web world and fairly easy to pick up. GitHub itself is
|
|
|
|
largely written in Ruby but given that it's not Open Source there is no ability
|
|
|
|
to modify it so it's implementation language is completely meaningless.
|
|
|
|
|
|
|
|
|
|
|
|
GitHub is not Free/Open Source
|
|
|
|
------------------------------
|
|
|
|
|
|
|
|
GitHub is a big part of this proposal and someone who tends more to ideology
|
|
|
|
rather than practicality may be opposed to this PEP on that grounds alone. It
|
|
|
|
is this PEPs belief that while using entirely Free/Open Source software is an
|
|
|
|
attractive idea and a noble goal, that valuing the time of the contributors by
|
|
|
|
giving them good tooling that is well maintained and that they either already
|
|
|
|
know or if they learn it they can apply to other projects is a more important
|
|
|
|
concern than treating whether something is Free/Open Source is a hard
|
|
|
|
requirement.
|
|
|
|
|
|
|
|
However, history has shown us that sometimes benevolent proprietary companies
|
|
|
|
can stop being benevolent. This is hedged against in a few ways:
|
|
|
|
|
|
|
|
* We are not utilizing the GitHub Issue Tracker, both because it is not
|
|
|
|
powerful enough for CPython but also because for the primary CPython
|
|
|
|
repository the ability to take our issues and put them somewhere else if we
|
|
|
|
ever need to leave GitHub relies on GitHub continuing to allow API access.
|
|
|
|
|
|
|
|
* We are utilizing the GitHub Pull Request workflow, however all of those
|
|
|
|
changes live inside of Git. So a mirror of the GitHub repositories can easily
|
|
|
|
contain all of those Pull Requests. We would potentially lose any comments if
|
|
|
|
GitHub suddenly turned "evil", but the changes themselves would still exist.
|
|
|
|
|
|
|
|
* We are utilizing the GitHub repository hosting feature, however since this is
|
|
|
|
just git moving away from GitHub is as simple as pushing the repository to
|
|
|
|
a different location. Data portability for the repository itself is extremely
|
|
|
|
high.
|
|
|
|
|
|
|
|
* We are also utilizing Phabricator to provide an alternative for people who
|
|
|
|
do not wish to use GitHub. This also acts as a fallback option which will
|
|
|
|
already be in place if we ever need to stop using GitHub.
|
|
|
|
|
|
|
|
Relying on GitHub comes with a number of benefits beyond just the benefits of
|
2016-07-11 11:14:08 -04:00
|
|
|
the platform itself. Since it is a commercially backed venture it has a full-time
|
|
|
|
staff responsible for maintaining its services. This includes making sure
|
2015-02-01 10:26:15 -05:00
|
|
|
they stay up, making sure they stay patched for various security
|
|
|
|
vulnerabilities, and further improving the software and infrastructure as time
|
|
|
|
goes on.
|
|
|
|
|
|
|
|
|
|
|
|
Mercurial is better than Git
|
|
|
|
----------------------------
|
|
|
|
|
|
|
|
Whether Mercurial or Git is better on a technical level is a highly subjective
|
|
|
|
opinion. This PEP does not state whether the mechanics of Git or Mercurial is
|
|
|
|
better and instead focuses on the network effect that is available for either
|
|
|
|
option. Since this PEP proposes switching to Git this leaves the people who
|
|
|
|
prefer Mercurial out, however those users can easily continue to work with
|
|
|
|
Mercurial by using the hg-git [#hg-git]_ extension for Mercurial which will
|
|
|
|
let it work with a repository which is Git on the serverside.
|
|
|
|
|
|
|
|
|
|
|
|
CPython Workflow is too Complicated
|
|
|
|
-----------------------------------
|
|
|
|
|
|
|
|
One sentiment that came out of previous discussions was that the multi branch
|
|
|
|
model of CPython was too complicated for Github Pull Requests. It is the belief
|
|
|
|
of this PEP that statement is not accurate.
|
|
|
|
|
|
|
|
Currently any particular change requires manually creating a patch for 2.7 and
|
|
|
|
3.x which won't change at all in this regards.
|
|
|
|
|
|
|
|
If someone submits a fix for the current stable branch (currently 3.4) the
|
|
|
|
GitHub Pull Request workflow can be used to create, in the browser, a Pull
|
|
|
|
Request to merge the current stable branch into the master branch (assuming
|
|
|
|
there is no merge conflicts). If there is a merge conflict that would need to
|
|
|
|
be handled locally. This provides an improvement over the current situation
|
|
|
|
where the merge must always happen locally.
|
|
|
|
|
|
|
|
Finally if someone submits a fix for the current development branch currently
|
|
|
|
then this has to be manually applied to the stable branch if it desired to
|
|
|
|
include it there as well. This must also happen locally as well in the new
|
|
|
|
workflow, however for minor changes it could easily be accomplished in the
|
|
|
|
GitHub web editor.
|
|
|
|
|
|
|
|
Looking at this, I do not believe that *any* system can hide the complexities
|
|
|
|
involved in maintaining several long running branches. The only thing that the
|
|
|
|
tooling can do is make it as easy as possible to submit changes.
|
2014-11-29 18:12:06 -05:00
|
|
|
|
|
|
|
|
|
|
|
Example: Scientific Python
|
2015-02-01 10:26:15 -05:00
|
|
|
==========================
|
2014-11-29 18:12:06 -05:00
|
|
|
|
|
|
|
One of the key ideas behind the move to both git and Github is that a feature
|
|
|
|
of a DVCS, the repository hosting, and the workflow used is the social network
|
|
|
|
and size of the community using said tools. We can see this is true by looking
|
|
|
|
at an example from a sub-community of the Python community: The Scientific
|
|
|
|
Python community. They have already migrated most of the key pieces of the
|
2016-07-11 11:14:08 -04:00
|
|
|
SciPy stack onto Github using the Pull Request-based workflow. This process
|
2014-11-29 19:21:23 -05:00
|
|
|
started with IPython, and as more projects moved over it became a natural
|
|
|
|
default for new projects in the community.
|
2014-11-29 18:12:06 -05:00
|
|
|
|
2014-11-29 19:21:23 -05:00
|
|
|
They claim to have seen a great benefit from this move, in that it enables
|
|
|
|
casual contributors to easily move between different projects within their
|
2014-11-29 18:12:06 -05:00
|
|
|
sub-community without having to learn a special, bespoke workflow and a
|
|
|
|
different toolchain for each project. They've found that when people can use
|
|
|
|
their limited time on actually contributing instead of learning the different
|
2014-11-30 18:17:40 -05:00
|
|
|
tools and workflows, not only do they contribute more to one project, but
|
2014-11-29 19:21:23 -05:00
|
|
|
that they also expand out and contribute to other projects. This move has also
|
|
|
|
been attributed to the increased tendency for members of that community to go
|
|
|
|
so far as publishing their research and educational materials on Github as
|
|
|
|
well.
|
2014-11-29 18:12:06 -05:00
|
|
|
|
2014-11-29 19:21:23 -05:00
|
|
|
This example showcases the real power behind moving to a highly popular
|
|
|
|
toolchain and workflow, as each variance introduces yet another hurdle for new
|
|
|
|
and casual contributors to get past and it makes the time spent learning that
|
|
|
|
workflow less reusable with other projects.
|
2014-11-29 18:12:06 -05:00
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
==========
|
|
|
|
|
2022-10-06 15:14:32 -04:00
|
|
|
.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_
|
|
|
|
.. [#hg-git] `Hg-Git mercurial plugin <https://hg-git.github.io/>`_
|
2014-11-29 18:12:06 -05:00
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
=========
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|