PEP: 507
Title: Migrate CPython to Git and GitLab
Version: $Revision$
Last-Modified: $Date$
Author: Barry Warsaw <barry@python.org>
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 2015-09-30
Post-History:


Abstract
========

This PEP proposes migrating the repository hosting of CPython and the
supporting repositories to Git. Further, it proposes adopting a
hosted GitLab instance as the primary way of handling merge requests,
code reviews, and code hosting. It is similar in intent to PEP 481
but proposes an open source alternative to GitHub and omits the
proposal to run Phabricator. As with PEP 481, this particular PEP is
offered as an alternative to PEP 474 and PEP 462.


Rationale
=========

CPython is an open source project which relies on a number of
volunteers donating their time. As with any healthy, vibrant open
source project, it relies on attracting new volunteers as well as
retaining existing developers. Given that volunteer time is the most
scarce resource, providing a process that maximizes the efficiency of
contributors and reduces the friction for contributions is of vital
importance for the long-term health of the project.

The current tool chain of the CPython project is a custom and unique
combination of tools. This has two critical implications:

* The unique nature of the tool chain means that contributors must
  remember or relearn the process, workflow, and tools whenever they
  contribute to CPython, without the advantage of leveraging the
  long-term memory and familiarity they retain by working with other
  projects in the FLOSS ecosystem. The knowledge they gain in working
  with CPython is unlikely to be applicable to other projects.

* The burden on the Python/PSF infrastructure team is much greater,
  because it must continue to maintain custom tools, improve them over
  time, fix bugs, address security issues, and more generally adapt to
  new standards in online software development with global
  collaboration.

These limitations act as a barrier to contribution both for highly
engaged contributors (e.g. core Python developers) and especially for
more casual "drive-by" contributors, who care more about getting their
bug fixed than learning a new suite of tools and workflows.

By proposing the adoption of both a different version control system
and a modern, well-maintained hosting solution, this PEP addresses
these limitations. It aims to enable a modern, well-understood
process that will carry CPython development for many years.


Version Control System
----------------------

Currently the CPython and supporting repositories use Mercurial. As a
modern distributed version control system, it has served us well since
the migration from Subversion. However, when evaluating the VCS we
must consider the capabilities of the VCS itself as well as the
network effect and mindshare of the community around that VCS.

There are really only two options for this: Mercurial and Git. The
technical capabilities of the two systems are largely equivalent, so
this PEP instead focuses on their social aspects.

It is not possible to get exact numbers for the projects or people
using a particular VCS; however, we can infer them by looking at
several sources of information about which VCS projects are using.

The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that
37% of the repositories indexed by The Open Hub are using Git (second
only to Subversion which has 48%) while Mercurial has just 2%, beating
only Bazaar which has 1%. This makes Git just over 18 times as
popular as Mercurial on The Open Hub.

Another source of information on VCS popularity is PyPI itself. This
source is more targeted at the Python community since it represents
projects developed for Python. Unfortunately PyPI does not have a
standard location for representing this information, so this requires
manual processing. If we limit our search to the top 100 projects on
PyPI (ordered by download counts) we can see that 62% of them use Git,
while 22% of them use Mercurial, and 13% use something else. This
makes Git just under 3 times as popular as Mercurial for the top 100
projects on PyPI.

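
The ratios quoted above follow directly from the percentages. As a
quick check, the short Python sketch below recomputes them. The
dictionary and function names are illustrative only, and the figures
are simply those quoted in this PEP, not fresh measurements.

```python
# Percentages as quoted in this PEP (assumed inputs, not re-measured).
open_hub = {"Subversion": 48, "Git": 37, "Mercurial": 2, "Bazaar": 1}
pypi_top_100 = {"Git": 62, "Mercurial": 22, "Other": 13}


def popularity_ratio(shares, numerator="Git", denominator="Mercurial"):
    """Return how many times more common one VCS is than another."""
    return shares[numerator] / shares[denominator]


open_hub_ratio = popularity_ratio(open_hub)   # 37 / 2
pypi_ratio = popularity_ratio(pypi_top_100)   # 62 / 22

print(f"Open Hub: Git is {open_hub_ratio:.1f}x as common as Mercurial")
print(f"PyPI top 100: Git is {pypi_ratio:.1f}x as common as Mercurial")
```

This prints ratios of 18.5 and 2.8, matching "just over 18 times" and
"just under 3 times" respectively.
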
These numbers back up the anecdotal evidence for Git as the far more
popular DVCS for open source projects. Choosing the more popular VCS
has a number of benefits.

For new contributors it increases the likelihood that they will have already
learned the basics of Git as part of working with another project or, if they
are just now learning Git, that they'll be able to take that knowledge and
apply it to other projects. Additionally, a larger community means more people
writing how-to guides, answering questions, and writing articles about Git,
which makes it easier for a new user to find answers and information about the
tool they are trying to learn and use. Given its popularity, there may also
be more auxiliary tooling written *around* Git. This increases the options for
everything from GUI clients and helper scripts to repository hosting.

Further, the adoption of Git as the proposed back-end repository
format doesn't prohibit the use of Mercurial by fans of that VCS!
Mercurial users have the hg-git plugin [#hg-git]_, which allows them
to push and pull from a Git server using the Mercurial front-end.
It's a well-maintained and highly functional plugin that seems to be
well-liked by Mercurial users.


Repository Hosting
------------------

Where and how the official repositories for CPython are hosted is in
some ways determined by the choice of VCS. With Git there are several
options. In fact, once the repository is hosted in Git, branches can
be mirrored in many locations, within many free, open, and proprietary
code hosting sites.

It's still important for CPython to adopt a single, official
repository, with a web front-end that allows for many convenient and
common interactions entirely through the web, without always requiring
local VCS manipulations. These interactions include, at a minimum,
code review with inline comments, branch diffing, CI integration, and
auto-merging.

This PEP proposes to adopt a GitLab [#GitLab]_ instance, run within
the python.org domain, accessible to and ultimately controlled by the
PSF and the Python infrastructure team, but donated, hosted, and
primarily maintained by GitLab, Inc.

Why GitLab? Because it is a fully functional Git hosting system that
sports modern web interactions, software workflows, and CI
integration. GitLab's Community Edition (CE) is open source software,
and thus is closely aligned with the principles of the CPython
community.


Code Review
-----------

Currently CPython uses a custom fork of Rietveld, modified to not run
on Google App Engine, which is effectively maintained by only one
person. It is missing common features present in many modern code
review tools.

This PEP proposes to utilize GitLab's built-in merge requests and
online code review features to facilitate reviews of all proposed
changes.


GitLab merge requests
---------------------

The normal workflow for a GitLab hosted project is to submit a *merge request*
asking that a feature or bug fix branch be merged into a target branch,
usually one or more of the stable maintenance branches or the next-version
master branch for new features. GitLab's merge requests are similar in form
and function to GitHub's pull requests, so anybody who is already familiar
with the latter should be able to immediately utilize the former.

Once submitted, a conversation about the change can be had between the
submitter and reviewer. This includes both general comments and inline
comments attached to a particular line of the diff between the source and
target branches. Projects can also be configured to automatically run
continuous integration on the submitted branch, the results of which are
readily visible from the merge request page. Thus both the reviewer and
submitter can immediately see the results of the tests, making it much easier
to only land branches with passing tests. Each new push to the source branch
(e.g. to respond to a commenter's feedback or to fix a failing test) results
in a new run of the CI, so that the state of the request always reflects the
latest commit.

Merge requests have a fairly major advantage over the older "submit a patch to
a bug tracker" model. They allow developers to work completely within the VCS
using standard VCS tooling, without requiring the creation of a patch file or
figuring out the right location to upload the patch to. This lowers the
barrier for sending a change to be reviewed.

Merge requests are also far easier to review. For example, they provide
syntax-highlighted diffs which can be shown in either unified or side-by-side
views. They allow commenting inline and on the merge request as a whole, and
they present those comments in a unified way that also hides comments which no
longer apply. Comments can be hidden and revealed.

Actually merging a merge request is quite simple if the source branch applies
cleanly to the target branch. A core reviewer simply needs to press the
"Merge" button for GitLab to automatically perform the merge. The source
branch can be optionally rebased, and once the merge is completed, the source
branch can be automatically deleted.

GitLab also has a good workflow for submitting merge requests to a project
completely through its web interface. This would enable the Python
documentation to have "Edit on GitLab" buttons on every page. People who
discover typos or inaccuracies, or who just want to make improvements to the
docs they are currently reading, can simply hit that button and get an
in-browser editor that will let them make changes and submit a merge request,
all from the comfort of their browser.


Criticism
=========

X is not written in Python
--------------------------

One feature that the current tooling (Mercurial, Rietveld) has is that all of
the pieces are primarily written in Python. This PEP focuses more on the
*best* tools for the job and not necessarily on the *best* tools that happen
to be written in Python. Volunteer time is the most precious resource for any
open source project and we can best respect and utilize that time by focusing
on the benefits and downsides of the tools themselves rather than what
language their authors happened to write them in.

One concern is the ability to modify tools to work for us; however, one of the
goals here is to *not* modify software to work for us and instead adapt
ourselves to a more standardized workflow. This standardization pays off in
the ability to re-use tools out of the box, freeing up developer time to
actually work on Python itself as well as enabling knowledge sharing between
projects.

However, if we do need to modify the tooling, Git itself is largely written in
C, the same as CPython. It can also have commands written for it using any
language, including Python. GitLab itself is largely written in Ruby and,
since it is open source software, we would have the ability to submit merge
requests to the upstream Community Edition, albeit in a language potentially
unfamiliar to most Python programmers.


Mercurial is better than Git
----------------------------

Whether Mercurial or Git is better on a technical level is highly subjective.
This PEP does not state whether the mechanics of Git or Mercurial are better,
and instead focuses on the network effect that is available for either option.
While this PEP proposes switching to Git, Mercurial users are not left
completely out of the loop. By using the hg-git extension for Mercurial,
working with server-side Git repositories is fairly easy and straightforward.


CPython Workflow is too Complicated
-----------------------------------

One sentiment that came out of previous discussions was that the multi-branch
model of CPython was too complicated for GitLab style merge requests. This
PEP disagrees with that sentiment.

Currently, any particular change requires manually creating a patch for 2.7
and 3.x, which won't change at all in this regard.

If someone submits a fix for the current stable branch (e.g. 3.5), the merge
request workflow can be used to create a request to merge the current stable
branch into the master branch, assuming there are no merge conflicts. As
always, merge conflicts must be manually and locally resolved. Because
developers also have the *option* of performing the merge locally, this
provides an improvement over the current situation where the merge *must*
always happen locally.

For fixes in the current development branch that must also be applied to
stable release branches, it is possible in many situations to locally
cherry-pick and apply the change to other branches, with merge requests
submitted for each stable branch. It is also possible to just cherry-pick and
complete the merge locally. These are all accomplished with standard Git
commands and techniques, with the advantage that all such changes can go
through the review and CI test workflows, even for merges to stable branches.
Minor changes may be easily accomplished in the GitLab web editor.

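
The cherry-pick flow described above might look like the following
sketch. It only builds the Git command lines rather than running them.
The remote name ``origin``, the ``backport-...`` branch naming scheme,
and the helper name ``backport_commands`` are assumptions made for
illustration and are not prescribed by this PEP.

```python
def backport_commands(commit, stable_branches, remote="origin"):
    """Build the Git commands to backport one commit to each stable branch.

    Hypothetical helper: the branch naming scheme and the remote name are
    illustrative assumptions, not part of the PEP.
    """
    commands = []
    for branch in stable_branches:
        topic = f"backport-{commit[:7]}-{branch}"
        commands += [
            ["git", "fetch", remote],
            # Start a topic branch from the tip of the stable branch.
            ["git", "checkout", "-b", topic, f"{remote}/{branch}"],
            # -x records the original commit id in the new commit message.
            ["git", "cherry-pick", "-x", commit],
            # Publish the topic branch so a merge request can be opened.
            ["git", "push", remote, topic],
        ]
    return commands


for argv in backport_commands("0123abc4567", ["3.5", "3.4"]):
    print(" ".join(argv))
```

Each pushed topic branch would then be the source of one merge request
against the corresponding stable branch, so that review and CI apply to
the backport as well.
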
No system can hide all the complexities involved in maintaining several
long-lived branches. The only thing that the tooling can do is make it as
easy as possible to submit and commit changes.


Open issues
===========

* What level of hosted support will GitLab offer? The PEP author has been in
  contact with the GitLab CEO, with positive interest on their part. The
  details of the hosting offer would have to be discussed.

* What happens to Roundup and do we switch to the GitLab issue tracker?
  Currently, this PEP is *not* suggesting we move from Roundup to GitLab
  issues. We have way too much invested in Roundup right now, and migrating
  the data would be a huge effort. GitLab does support webhooks, so we will
  probably want to use webhooks to integrate merges and other events with
  updates to Roundup (e.g. to include pointers to commits, close issues,
  etc. similar to what is currently done).

* What happens to wiki.python.org? Nothing! While GitLab does support wikis
  in repositories, there's no reason for us to migrate our Moin wikis.

* What happens to the existing GitHub mirrors? We'd probably want to
  regenerate them once the official upstream branches are natively hosted in
  Git. This may change commit ids, but after that, it should be easy to
  mirror the official Git branches and repositories far and wide.

* Where would the GitLab instance live? Physically, with whatever hosting
  provider GitLab chooses. We would point gitlab.python.org (or
  git.python.org?) to this host.


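
The webhook-based Roundup integration mentioned in the open issues
above could start from something like the sketch below. It is not a
design: the payload fields used here (``object_kind`` and
``object_attributes`` with ``state``, ``title``, ``description``, and
``url``) reflect the general shape of GitLab merge request events and
should be checked against GitLab's documentation, while the
``Issue #NNNN`` convention, the example URL, and the function name are
purely hypothetical. How the resulting messages would be posted to
Roundup is left open.

```python
import json
import re

# Assumed convention for referring to Roundup issues, e.g. "Issue #12345".
ISSUE_RE = re.compile(r"issue\s*#(\d+)", re.IGNORECASE)


def roundup_updates(payload_text):
    """Map a merge request webhook payload to (issue, message) pairs."""
    event = json.loads(payload_text)
    if event.get("object_kind") != "merge_request":
        return []  # Ignore pushes, comments, and other event kinds.
    attrs = event.get("object_attributes", {})
    text = f"{attrs.get('title') or ''}\n{attrs.get('description') or ''}"
    # Collect each referenced issue once, in ascending order.
    issues = sorted({int(number) for number in ISSUE_RE.findall(text)})
    message = f"Merge request {attrs.get('state')}: {attrs.get('url')}"
    return [(issue, message) for issue in issues]


# Hypothetical payload, trimmed to the fields this sketch reads.
example = json.dumps({
    "object_kind": "merge_request",
    "object_attributes": {
        "state": "merged",
        "title": "Issue #12345: fix a typo in the tutorial",
        "description": "See also issue #678.",
        "url": "https://gitlab.example.org/python/cpython/merge_requests/1",
    },
})
for issue, message in roundup_updates(example):
    print(issue, message)
```
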
References
==========

.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_
.. [#hg-git] `Hg-Git Mercurial plugin <https://hg-git.github.io/>`_
.. [#GitLab] https://about.gitlab.com/


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: