diff --git a/pep-0507.txt b/pep-0507.txt
new file mode 100644
index 000000000..959b9f812
--- /dev/null
+++ b/pep-0507.txt
@@ -0,0 +1,303 @@
+PEP: 507
+Title: Migrate CPython to Git and GitLab
+Version: $Revision$
+Last-Modified: $Date$
+Author: Barry Warsaw
+Status: Draft
+Type: Process
+Content-Type: text/x-rst
+Created: 2015-09-30
+Post-History:
+
+
+Abstract
+========
+
+This PEP proposes migrating the repository hosting of CPython and the
+supporting repositories to Git. Further, it proposes adopting a
+hosted GitLab instance as the primary way of handling merge requests,
+code reviews, and code hosting. It is similar in intent to PEP 481
+but proposes an open source alternative to GitHub and omits the
+proposal to run Phabricator. As with PEP 481, this particular PEP is
+offered as an alternative to PEP 474 and PEP 462.
+
+
+Rationale
+=========
+
+CPython is an open source project which relies on a number of
+volunteers donating their time. As with any healthy, vibrant open
+source project, it relies on attracting new volunteers as well as
+retaining existing developers. Given that volunteer time is the most
+scarce resource, providing a process that maximizes the efficiency of
+contributors and reduces the friction for contributions is of vital
+importance for the long-term health of the project.
+
+The current tool chain of the CPython project is a custom and unique
+combination of tools. This has two critical implications:
+
+* The unique nature of the tool chain means that contributors must
+  remember or relearn the process, workflow, and tools whenever they
+  contribute to CPython, without the advantage of leveraging long-term
+  memory and familiarity they retain by working with other projects in
+  the FLOSS ecosystem. The knowledge they gain in working with
+  CPython is unlikely to be applicable to other projects.
+
+* The burden on the Python/PSF infrastructure team is much greater, as
+  it must continue to maintain custom tools, improve them over time,
+  fix bugs, address security issues, and more generally adapt to new
+  standards in online software development with global collaboration.
+
+These limitations act as a barrier to contribution both for highly
+engaged contributors (e.g. core Python developers) and especially for
+more casual "drive-by" contributors, who care more about getting their
+bug fix merged than learning a new suite of tools and workflows.
+
+By proposing the adoption of both a different version control system
+and a modern, well-maintained hosting solution, this PEP addresses
+these limitations. It aims to enable a modern, well-understood
+process that will carry CPython development for many years.
+
+
+Version Control System
+----------------------
+
+Currently the CPython and supporting repositories use Mercurial. As a
+modern distributed version control system, it has served us well since
+the migration from Subversion. However, when evaluating the VCS we
+must consider the capabilities of the VCS itself as well as the
+network effect and mindshare of the community around that VCS.
+
+There are really only two options for this: Mercurial and Git.
+The technical capabilities of the two systems are largely equivalent,
+so this PEP instead focuses on their social aspects.
+
+It is not possible to get exact numbers for the number of projects or
+people using a particular VCS; however, we can infer this by
+looking at several sources of information for what VCS projects are
+using.
+
+The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that
+37% of the repositories indexed by The Open Hub are using Git (second
+only to Subversion which has 48%) while Mercurial has just 2%, beating
+only Bazaar which has 1%. This makes Git just over 18 times as
+popular as Mercurial on The Open Hub.
+
+Another source of information on VCS popularity is PyPI itself. This
+source is more targeted at the Python community itself since it
+represents projects developed for Python. Unfortunately PyPI does not
+have a standard location for representing this information, so this
+requires manual processing. If we limit our search to the top 100
+projects on PyPI (ordered by download counts) we can see that 62% of
+them use Git, while 22% of them use Mercurial, and 13% use something
+else. This makes Git just under 3 times as popular as Mercurial
+for the top 100 projects on PyPI.
+
+These numbers back up the anecdotal evidence for Git as the far more
+popular DVCS for open source projects. Choosing the more popular VCS
+has a number of benefits.
+
+For new contributors it increases the likelihood that they will have already
+learned the basics of Git as part of working with another project or, if they
+are just now learning Git, that they'll be able to take that knowledge and
+apply it to other projects. Additionally, a larger community means more people
+writing how-to guides, answering questions, and writing articles about Git,
+which makes it easier for a new user to find answers and information about the
+tool they are trying to learn and use. Given its popularity, there may also
+be more auxiliary tooling written *around* Git. This increases options for
+everything from GUI clients and helper scripts to repository hosting.
+
+Further, the adoption of Git as the proposed back-end repository
+format doesn't prohibit the use of Mercurial by fans of that VCS!
+Mercurial users have the hg-git [#hg-git]_ plugin which allows them to push
+and pull from a Git server using the Mercurial front-end. It's a
+well-maintained and highly functional plugin that seems to be
+well-liked by Mercurial users.
+
+
+Repository Hosting
+------------------
+
+Where and how the official repositories for CPython are hosted is in
+some ways determined by the choice of VCS. With Git there are several
+options. In fact, once the repository is hosted in Git, branches can
+be mirrored in many locations, within many free, open, and proprietary
+code hosting sites.
+
+It's still important for CPython to adopt a single, official
+repository, with a web front-end that allows for many convenient and
+common interactions entirely through the web, without always requiring
+local VCS manipulations. These interactions include, as a minimum,
+code review with inline comments, branch diffing, CI integration, and
+auto-merging.
+
+This PEP proposes to adopt a GitLab [#GitLab]_ instance, run within the
+python.org domain, accessible to and with ultimate control from the
+PSF and the Python infrastructure team, but donated, hosted, and
+primarily maintained by GitLab, Inc.
+
+Why GitLab? Because it is a fully functional Git hosting system that
+sports modern web interactions, software workflows, and CI
+integration. GitLab's Community Edition (CE) is open source software,
+and thus is closely aligned with the principles of the CPython
+community.
+
+
+Code Review
+-----------
+
+Currently CPython uses a custom fork of Rietveld, modified to not run
+on Google App Engine, which is currently only really maintained by
+one person. It is missing common features present in many modern code
+review tools.
+
+This PEP proposes to utilize GitLab's built-in merge requests and
+online code review features to facilitate reviews of all proposed
+changes.
+
+
+GitLab merge requests
+---------------------
+
+The normal workflow for a GitLab-hosted project is to submit a *merge request*
+asking that a feature or bug fix branch be merged into a target branch,
+usually one or more of the stable maintenance branches or the next-version
+master branch for new features. GitLab's merge requests are similar in form
+and function to GitHub's pull requests, so anybody who is already familiar
+with the latter should be able to immediately utilize the former.
+
+Once submitted, a conversation about the change can be had between the
+submitter and reviewer. This includes both general comments and inline
+comments attached to a particular line of the diff between the source and
+target branches. Projects can also be configured to automatically run
+continuous integration on the submitted branch, the results of which are
+readily visible from the merge request page. Thus both the reviewer and
+submitter can immediately see the results of the tests, making it much easier
+to only land branches with passing tests. Each new push to the source branch
+(e.g. to respond to a commenter's feedback or to fix a failing test) results
+in a new run of the CI, so that the state of the request always reflects the
+latest commit.
+
+Merge requests have a fairly major advantage over the older "submit a patch to
+a bug tracker" model. They allow developers to work completely within the VCS
+using standard VCS tooling, without requiring the creation of a patch file or
+figuring out the right location to upload the patch to. This lowers the
+barrier for sending a change to be reviewed.
+
+Merge requests are far easier to review. For example, they provide nice
+syntax-highlighted diffs which can operate in either unified or side-by-side
+views. They allow commenting inline and on the merge request as a whole and
+they present that in a nice unified way which will also hide comments which no
+longer apply. Comments can be hidden and revealed.
+
+Actually merging a merge request is quite simple if the source branch applies
+cleanly to the target branch. A core reviewer simply needs to press the
+"Merge" button for GitLab to automatically perform the merge. The source
+branch can be optionally rebased, and once the merge is completed, the source
+branch can be automatically deleted.
+
+GitLab also has a good workflow for submitting merge requests to a project
+completely through its web interface. This would enable the Python
+documentation to have "Edit on GitLab" buttons on every page. People who
+discover typos or inaccuracies, or who just want to make improvements to the
+docs they are currently reading, can simply hit that button and get an
+in-browser editor that will let them make changes and submit a merge
+request all from the comfort of their browser.
+
+
+Criticism
+=========
+
+X is not written in Python
+--------------------------
+
+One feature that the current tooling (Mercurial, Rietveld) has is that all
+of the pieces are written primarily in Python. This PEP
+focuses more on the *best* tools for the job and not necessarily on the *best*
+tools that happen to be written in Python. Volunteer time is the most
+precious resource for any open source project and we can best respect and
+utilize that time by focusing on the benefits and downsides of the tools
+themselves rather than what language their authors happened to write them in.
+
+One concern is the ability to modify tools to work for us; however, one of the
+goals here is to *not* modify software to work for us and instead adapt
+ourselves to a more standardized workflow. This standardization pays off in
+the ability to re-use tools out of the box, freeing up developer time to
+actually work on Python itself as well as enabling knowledge sharing between
+projects.
+
+However, if we do need to modify the tooling, Git itself is largely written in
+C, the same as CPython itself. It can also have commands written for it using
+any language, including Python. GitLab itself is largely written in Ruby and
+since it is open source software, we would have the ability to submit merge
+requests to the upstream Community Edition, albeit in a language potentially
+unfamiliar to most Python programmers.
+
+
+Mercurial is better than Git
+----------------------------
+
+Whether Mercurial or Git is better on a technical level is a highly subjective
+opinion. This PEP does not state whether the mechanics of Git or Mercurial
+are better, and instead focuses on the network effect that is available for
+either option. While this PEP proposes switching to Git, Mercurial users are
+not left completely out of the loop. By using the hg-git extension for
+Mercurial, working with server-side Git repositories is fairly easy and
+straightforward.
+
+
+CPython Workflow is too Complicated
+-----------------------------------
+
+One sentiment that came out of previous discussions was that the multi-branch
+model of CPython was too complicated for GitLab-style merge requests. This
+PEP disagrees with that sentiment.
+
+Currently any particular change requires manually creating a patch for 2.7 and
+3.x, which won't change at all in this regard.
+
+If someone submits a fix for the current stable branch (e.g. 3.5) the merge
+request workflow can be used to create a request to merge the current stable
+branch into the master branch, assuming there are no merge conflicts. As
+always, merge conflicts must be manually and locally resolved. Because
+developers also have the *option* of performing the merge locally, this
+provides an improvement over the current situation where the merge *must*
+always happen locally.
+
+For fixes in the current development branch that must also be applied to
+stable release branches, it is possible in many situations to locally cherry
+pick and apply the change to other branches, with merge requests submitted for
+each stable branch. It is also possible to just cherry pick and complete the
+merge locally. These are all accomplished with standard Git commands and
+techniques, with the advantage that all such changes can go through the review
+and CI test workflows, even for merges to stable branches. Minor changes may
+be easily accomplished in the GitLab web editor.
+
+No system can hide all the complexities involved in maintaining several
+long-lived branches. The only thing that the tooling can do is make it as
+easy as possible to submit and commit changes.
+
+
+References
+==========
+
+.. [#openhub-stats] `Open Hub Statistics `
+.. [#hg-git] `Hg-Git mercurial plugin `
+.. [#GitLab] `https://about.gitlab.com/`
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: