PEP: 462 Title: Core development workflow automation for CPython Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Deferred Type: Process Content-Type: text/x-rst Requires: 474 Created: 23-Jan-2014 Post-History: 25-Jan-2014, 27-Jan-2014 Abstract ======== This PEP proposes investing in automation of several of the tedious, time consuming activities that are currently required for the core development team to incorporate changes into CPython. This proposal is intended to allow core developers to make more effective use of the time they have available to contribute to CPython, which should also result in an improved experience for other contributors that are reliant on the core team to get their changes incorporated. PEP Deferral ============ This PEP is currently deferred pending updates to redesign it around the proposal in PEP 474 to create a Kallithea-based forge.python.org service. Rationale for changes to the core development workflow ====================================================== The current core developer workflow to merge a new feature into CPython on a POSIX system "works" as follows: #. If applying a change submitted to bugs.python.org by another user, first check they have signed the PSF Contributor Licensing Agreement. If not, request that they sign one before continuing with merging the change. #. Apply the change locally to a current checkout of the main CPython repository (the change will typically have been discussed and reviewed as a patch on bugs.python.org first, but this step is not currently considered mandatory for changes originating directly from core developers). #. Run the test suite locally, at least ``make test`` or ``./python -m test`` (depending on system specs, this takes a few minutes in the default configuration, but substantially longer if all optional resources, like external network access, are enabled). #. Run ``make patchcheck`` to fix any whitespace issues and as a reminder of other changes that may be needed (such as updating Misc/ACKS or adding an entry to Misc/NEWS) #. Commit the change and push it to the main repository. If hg indicates this would create a new head in the remote repository, run ``hg pull --rebase`` (or an equivalent). Theoretically, you should rerun the tests at this point, but it's *very* tempting to skip that step. #. After pushing, monitor the `stable buildbots `__ for any new failures introduced by your change. In particular, developers on POSIX systems will often break the Windows buildbots, and vice-versa. Less commonly, developers on Linux or Mac OS X may break other POSIX systems. The steps required on Windows are similar, but the exact commands used will be different. Rather than being simpler, the workflow for a bug fix is *more* complicated than that for a new feature! New features have the advantage of only being applied to the ``default`` branch, while bug fixes also need to be considered for inclusion in maintenance branches. * If a bug fix is applicable to Python 2.7, then it is also separately applied to the 2.7 branch, which is maintained as an independent head in Mercurial * If a bug fix is applicable to the current 3.x maintenance release, then it is first applied to the maintenance branch and then merged forward to the default branch. Both branches are pushed to hg.python.org at the same time. Documentation patches are simpler than functional patches, but not hugely so - the main benefit is only needing to check the docs build successfully rather than running the test suite. I would estimate that even when everything goes smoothly, it would still take me at least 20-30 minutes to commit a bug fix patch that applies cleanly. Given that it should be possible to automate several of these tasks, I do not believe our current practices are making effective use of scarce core developer resources. There are many, many frustrations involved with this current workflow, and they lead directly to some undesirable development practices. * Much of this overhead is incurred on a per-patch applied basis. This encourages large commits, rather than small isolated changes. The time required to commit a 500 line feature is essentially the same as that needed to commit a 1 line bug fix - the additional time needed for the larger change appears in any preceding review rather than as part of the commit process. * The additional overhead of working on applying bug fixes creates an additional incentive to work on new features instead, and new features are already *inherently* more interesting to work on - they don't need workflow difficulties giving them a helping hand! * Getting a preceding review on bugs.python.org is *additional* work, creating an incentive to commit changes directly, increasing the reliance on post-review on the python-checkins mailing list. * Patches on the tracker that are complete, correct and ready to merge may still languish for extended periods awaiting a core developer with the time to devote to getting it merged. * The risk of push races (especially when pushing a merged bug fix) creates a temptation to skip doing full local test runs (especially after a push race has already been encountered once), increasing the chance of breaking the buildbots. * The buildbots are sometimes red for extended periods, introducing errors into local test runs, and also meaning that they sometimes fail to serve as a reliable indicator of whether or not a patch has introduced cross platform issues. * Post-conference development sprints are a nightmare, as they collapse into a mire of push races. It's tempting to just leave patches on the tracker until after the sprint is over and then try to clean them up afterwards. There are also many, many opportunities for core developers to make mistakes that inconvenience others, both in managing the Mercurial branches and in breaking the buildbots without being in a position to fix them promptly. This both makes the existing core development team cautious in granting new developers commit access, as well as making those new developers cautious about actually making use of their increased level of access. There are also some incidental annoyances (like keeping the NEWS file up to date) that will also be necessarily addressed as part of this proposal. One of the most critical resources of a volunteer-driven open source project is the emotional energy of its contributors. The current approach to change incorporation doesn't score well on that front for anyone: * For core developers, the branch wrangling for bug fixes is delicate and easy to get wrong. Conflicts on the NEWS file and push races when attempting to upload changes add to the irritation of something most of us aren't being paid to spend time on. The time we spend actually getting a change merged is time we're not spending coding additional changes, writing or updating documentation or reviewing contributions from others. * Red buildbots make life difficult for other developers (since a local test failure may *not* be due to anything that developer did), release managers (since they may need to enlist assistance cleaning up test failures prior to a release) and for the developers themselves (since it creates significant pressure to fix any failures we inadvertently introduce right *now*, rather than at a more convenient time, as well as potentially making ``hg bisect`` more difficult to use if ``hg annotate`` isn't sufficient to identify the source of a new failure). * For other contributors, a core developer spending time actually getting changes merged is a developer that isn't reviewing and discussing patches on the issue tracker or otherwise helping others to contribute effectively. It is especially frustrating for contributors that are accustomed to the simplicity of a developer just being able to hit "Merge" on a pull request that has already been automatically tested in the project's CI system (which is a common workflow on sites like GitHub and BitBucket), or where the post-review part of the merge process is fully automated (as is the case for OpenStack). Current Tools ============= The following tools are currently used to manage various parts of the CPython core development workflow. * Mercurial (hg.python.org) for version control * Roundup (bugs.python.org) for issue tracking * Rietveld (also hosted on bugs.python.org) for code review * Buildbot (buildbot.python.org) for automated testing This proposal does *not* currently suggest replacing any of these tools, although implementing it effectively may require modifications to some or all of them. It does however suggest the addition of new tools in order to automate additional parts of the workflow, as well as a critical review of these tools to see which, if any, may be candidates for replacement. Proposal ======== The essence of this proposal is that CPython aim to adopt a "core reviewer" development model, similar to that used by the OpenStack project. The workflow problems experienced by the CPython core development team are not unique. The OpenStack infrastructure team have come up with a well designed automated workflow that is designed to ensure: * once a patch has been reviewed, further developer involvement is needed only if the automated tests fail prior to merging * patches never get merged without being tested relative to the current state of the branch * the main development branch always stays green. Patches that do not pass the automated tests do not get merged If a core developer wants to tweak a patch prior to merging, they download it from the review tool, modify and *upload it back to the review tool* rather than pushing it directly to the source code repository. The core of this workflow is implemented using a tool called Zuul_, a Python web service created specifically for the OpenStack project, but deliberately designed with a plugin based trigger and action system to make it easier to adapt to alternate code review systems, issue trackers and CI systems. James Blair of the OpenStack infrastructure team provided an `excellent overview of Zuul `__ at linux.conf.au 2014. While Zuul handles several workflows for OpenStack, the specific one of interest for this PEP is the "merge gating" workflow. For this workflow, Zuul is configured to monitor the Gerrit code review system for patches which have been marked as "Approved". Once it sees such a patch, Zuul takes it, and combines it into a queue of "candidate merges". It then creates a pipeline of test runs that execute in parallel in Jenkins (in order to allow more than 24 commits a day when a full test run takes the better part of an hour), and are merged as they pass (and as all the candidate merges ahead of them in the queue pass). If a patch fails the tests, Zuul takes it out of the queue, cancels any test runs after that patch in the queue, and rebuilds the queue without the failing patch. If a developer looks at a test which failed on merge and determines that it was due to an intermittent failure, then they can resubmit the patch for another attempt at merging. To adapt this process to CPython, it should be feasible to have Zuul monitor Rietveld for approved patches (which would require a feature addition in Rietveld), submit them to Buildbot for testing on the stable buildbots, and then merge the changes appropriately in Mercurial. This idea poses a few technical challenges, which have their own section below. For CPython, I don't believe we will need to take advantage of Zuul's ability to execute tests in parallel (certainly not in the initial iteration - if we get to a point where serial testing of patches by the merge gating system is our primary bottleneck rather than having the people we need in order to be able to review and approve patches, then that will be a very good day). However, the merge queue itself is a very powerful concept that should directly address several of the issues described in the Rationale above. .. _Zuul: http://ci.openstack.org/zuul/ .. _Elastic recheck: http://status.openstack.org/elastic-recheck/ Deferred Proposals ================== The OpenStack team also use Zuul to coordinate several other activities: * Running preliminary "check" tests against patches posted to Gerrit. * Creation of updated release artefacts and republishing documentation when changes are merged * The `Elastic recheck`_ feature that uses ElasticSearch in conjunction with a spam filter to monitor test output and suggest the specific intermittent failure that may have caused a test to fail, rather than requiring users to search logs manually While these are possibilities worth exploring in the future (and one of the possible benefits I see to seeking closer coordination with the OpenStack Infrastructure team), I don't see them as offering quite the same kind of fundamental workflow improvement that merge gating appears to provide. However, if we find we are having too many problems with intermittent test failures in the gate, then introducing the "Elastic recheck" feature may need to be considered as part of the initial deployment. Suggested Variants ================== Terry Reedy has suggested doing an initial filter which specifically looks for approved documentation-only patches (~700 of the 4000+ open CPython issues are pure documentation updates). This approach would avoid several of the issues related to flaky tests and cross-platform testing, while still allowing the rest of the automation flows to be worked out (such as how to push a patch into the merge queue). The one downside to this approach is that Zuul wouldn't have complete control of the merge process as it usually expects, so there would potentially be additional coordination needed around that. It may be worth keeping this approach as a fallback option if the initial deployment proves to have more trouble with test reliability than is anticipated. It would also be possible to tweak the merge gating criteria such that it doesn't run the test suite if it detects that the patch hasn't modified any files outside the "Docs" tree, and instead only checks that the documentation builds without errors. As yet another alternative, it may be reasonable to move some parts of the documentation (such as the tutorial) out of the main source repository and manage them using the simpler pull request based model. Perceived Benefits ================== The benefits of this proposal accrue most directly to the core development team. First and foremost, it means that once we mark a patch as "Approved" in the updated code review system, *we're usually done*. The extra 20-30 minutes (or more) of actually applying the patch, running the tests and merging it into Mercurial would all be orchestrated by Zuul. Push races would also be a thing of the past - if lots of core developers are approving patches at a sprint, then that just means the queue gets deeper in Zuul, rather than developers getting frustrated trying to merge changes and failing. Test failures would still happen, but they would result in the affected patch being removed from the merge queue, rather than breaking the code in the main repository. With the bulk of the time investment moved to the review process, this also encourages "development for reviewability" - smaller, easier to review patches, since the overhead of running the tests multiple times will be incurred by Zuul rather than by the core developers. However, removing this time sink from the core development team should also improve the experience of CPython development for other contributors, as it eliminates several of the opportunities for patches to get "dropped on the floor", as well as increasing the time core developers are likely to have available for reviewing contributed patches. Another example of benefits to other contributors is that when a sprint aimed primarily at new contributors is running with just a single core developer present (such as the sprints at PyCon AU for the last few years), the merge queue would allow that developer to focus more of their time on reviewing patches and helping the other contributors at the sprint, since accepting a patch for inclusion would now be a single click in the Rietveld UI, rather than the relatively time consuming process that it is currently. Even when multiple core developers are present, it is better to enable them to spend their time and effort on interacting with the other sprint participants than it is on things that are sufficiently mechanical that a computer can (and should) handle them. With most of the ways to make a mistake when committing a change automated out of existence, there are also substantially fewer new things to learn when a contributor is nominated to become a core developer. This should have a dual benefit, both in making the existing core developers more comfortable with granting that additional level of responsibility, and in making new contributors more comfortable with exercising it. Finally, a more stable default branch in CPython makes it easier for other Python projects to conduct continuous integration directly against the main repo, rather than having to wait until we get into the release candidate phase. At the moment, setting up such a system isn't particularly attractive, as it would need to include an additional mechanism to wait until CPython's own Buildbot fleet had indicate that the build was in a usable state. Technical Challenges ==================== Adapting Zuul from the OpenStack infrastructure to the CPython infrastructure will at least require the development of additional Zuul trigger and action plugins, and may require additional development in some of our existing tools. Rietveld/Roundup vs Gerrit -------------------------- Rietveld does not currently include a voting/approval feature that is equivalent to Gerrit's. For CPython, we wouldn't need anything as sophisticated as Gerrit's voting system - a simple core-developer-only "Approved" marker to trigger action from Zuul should suffice. The core-developer-or-not flag is available in Roundup, as is the flag indicating whether or not the uploader of a patch has signed a PSF Contributor Licensing Agreement, which may require further additions to the existing integration between the two tools. Rietveld may also require some changes to allow the uploader of a patch to indicate which branch it is intended for. We would likely also want to improve the existing patch handling, in particular looking at how the Roundup/Reitveld integration handles cases where it can't figure out a suitable base version to use when generating the review (if Rietveld gains the ability to nominate a particular target repository and branch for a patch, then this may be relatively easy to resolve). Some of the existing Zuul triggers work by monitoring for particular comments (in particular, recheck/reverify comments to ask Zuul to try merging a change again if it was previously rejected due to an unrelated intermittent failure). We will likely also want similar explicit triggers for Rietveld. The current Zuul plugins for Gerrit work by monitoring the Gerrit activity stream for particular events. If Rietveld has no equivalent, we will need to add something suitable for the events we would like to trigger on. There would also be development effort needed to create a Zuul plugin that monitors Rietveld activity rather than Gerrit. Mercurial vs Gerrit/git ----------------------- Gerrit uses git as the actual storage mechanism for patches, and automatically handles merging of approved patches. By contrast, Rietveld works directly on patches, and is completely decoupled from any specific version control system. Zuul is also directly integrated with git for patch manipulation - as far as I am aware, this part of the design isn't pluggable. However, at PyCon US 2014, the Mercurial core developers at the sprints expressed some interest in collaborating with the core development team and the Zuul developers on enabling the use of Zuul with Mercurial in addition to git. Buildbot vs Jenkins ------------------- Zuul's interaction with the CI system is also pluggable, using Gearman as the `preferred interface `__. Accordingly, adapting the CI jobs to run in Buildbot rather than Jenkins should just be a matter of writing a Gearman client that can process the requests from Zuul and pass them on to the Buildbot master. Zuul uses the pure Python `gear client library `__ to communicate with Gearman, and this library should also be useful to handle the Buildbot side of things. Note that, in the initial iteration, I am proposing that we *do not* attempt to pipeline test execution. This means Zuul would be running in a very simple mode where only the patch at the head of the merge queue is being tested on the Buildbot fleet, rather than potentially testing several patches in parallel. I am picturing something equivalent to requesting a forced build from the Buildbot master, and then waiting for the result to come back before moving on to the second patch in the queue. If we ultimately decide that this is not sufficient, and we need to start using the CI pipelining features of Zuul, then we may need to look at moving the test execution to dynamically provisioned cloud images, rather than relying on volunteer maintained statically provisioned systems as we do currently. The OpenStack CI infrastructure team are exploring the idea of replacing their current use of Jenkins masters with a simpler pure Python test runner, so if we find that we can't get Buildbot to effectively support the pipelined testing model, we'd likely participate in that effort rather than setting up a Jenkins instance for CPython. In this case, the main technical risk would be a matter of ensuring we support testing on platforms other than Linux (as our stable buildbots currently cover Windows, Mac OS X, FreeBSD and OpenIndiana in addition to a couple of different Linux variants). In such a scenario, the Buildbot fleet would still have a place in doing "check" runs against the master repository (either periodically or for every commit), even if it did not play a part in the merge gating process. More unusual configurations (such as building without threads, or without SSL/TLS support) would likely still be handled that way rather than being included in the gate criteria (at least initially, anyway). Handling of maintenance branches -------------------------------- The OpenStack project largely leaves the question of maintenance branches to downstream vendors, rather than handling it directly. This means there are questions to be answered regarding how we adapt Zuul to handle our maintenance branches. Python 2.7 can be handled easily enough by treating it as a separate patch queue. This would just require a change in Rietveld to indicate which branch was the intended target of the patch. The Python 3.x maintenance branches are potentially more complicated. My current recommendation is to simply stop using Mercurial merges to manage them, and instead treat them as independent heads, similar to the Python 2.7 branch. Patches that apply cleanly to both the active maintenance branch and to default would then just be submitted to both queues, while other changes might involve submitting separate patches for the active maintenance branch and for default. This approach also has the benefit of adjusting cleanly to the intermittent periods where we have two active Python 3 maintenance branches. This does suggest some user interface ideas for the branch nomination interface for a patch: * default to "default" on issues that are marked as "enhancement" * default to "3.x+" (where 3.x is the oldest branch in regular maintenance) on any other issues * also offer the ability to select specific branches in addition to or instead of the default selection Handling of security branches ----------------------------- For simplicity's sake, I would suggest leaving the handling of security-fix only branches alone: the release managers for those branches would continue to backport specific changes manually. Handling of NEWS file updates ----------------------------- Our current approach to handling NEWS file updates regularly results in spurious conflicts when merging bug fixes forward from an active maintenance branch to a later branch. `Issue #18967* `__ discusses some possible improvements in that area, which would be beneficial regardless of whether or not we adopt Zuul as a workflow automation tool. Stability of "stable" Buildbot slaves ------------------------------------- Instability of the nominally stable buildbots has a substantially larger impact under this proposal. We would need to ensure we're happy with each of those systems gating merges to the development branches, or else move then to "unstable" status. Intermittent test failures -------------------------- Some tests, especially timing tests, exhibit intermittent failures on the existing Buildbot fleet. In particular, test systems running as VMs may sometimes exhibit timing failures when the VM host is under higher than normal load. The OpenStack CI infrastructure includes a number of additional features to help deal with intermittent failures, the most basic of which is simply allowing developers to request that merging a patch be tried again when the original failure appears to be due to a known intermittent failure (whether that intermittent failure is in OpenStack itself or just in a flaky test). The more sophisticated `Elastic recheck`_ feature may be worth considering, especially since the output of the CPython test suite is substantially simpler than that from OpenStack's more complex multi-service testing, and hence likely even more amenable to automated analysis. Enhancing Mercurial/Rietveld/Roundup integration ------------------------------------------------ One useful part of the OpenStack workflow is the "git review" plugin, which makes it relatively easy to push a branch from a local git clone up to Gerrit for review. It seems that it should be possible to create a plugin that similarly integrates Mercurial queues with Rietveld and Roundup, allowing a draft patch to be uploaded as easily as running a command like "hg qpost" with a suitable .hgqpost configuration file checked in to the source repo. (There's an existing `hg review `__, plugin hence the suggestion of ``hg qpost`` as an alternate command) It would also be good to work directly with the Mercurial folks to come up with a tailored CPython-specific tutorial for using Mercurial queues and other extensions to work effectively with the CPython repository structure. We have some of that already in the developer guide, but I've come to believe that we may want to start getting more opinionated as to which extensions we recommend using, especially for users that have previously learned ``git`` and need to know which extensions to enable to gain a similar level of flexibility in their local workflow from Mercurial. Social Challenges ================= The primary social challenge here is getting the core development team to change their practices. However, the tedious-but-necessary steps that are automated by the proposal should create a strong incentive for the existing developers to go along with the idea. I believe three specific features may be needed to assure existing developers that there are no downsides to the automation of this workflow: * Only requiring approval from a single core developer to incorporate a patch. This could be revisited in the future, but we should preserve the status quo for the initial rollout. * Explicitly stating that core developers remain free to approve their own patches, except during the release candidate phase of a release. This could be revisited in the future, but we should preserve the status quo for the initial rollout. * Ensuring that at least release managers have a "merge it now" capability that allows them to force a particular patch to the head of the merge queue. Using a separate clone for release preparation may be sufficient for this purpose. Longer term, automatic merge gating may also allow for more automated preparation of release artefacts as well. Practical Challenges ==================== The PSF runs its own directly and indirectly sponsored workflow infrastructure primarily due to past experience with unacceptably poor performance and inflexibility of infrastructure provided for free to the general public. CPython development was originally hosted on SourceForge, with source control moved to self hosting when SF was both slow to offer Subversion support and suffering from CVS performance issues (see PEP 347), while issue tracking later moved to the open source Roundup issue tracker on dedicated sponsored hosting (from Upfront Systems), due to a combination of both SF performance issues and general usability issues with the SF tracker at the time (the outcome and process for the new tracker selection were captured on the `python.org wiki `__ rather than in a PEP). Accordingly, proposals that involve setting ourselves up for "SourceForge usability and reliability issues, round two" aren't likely to gain any traction with either the CPython core development team or with the PSF Infrastructure committee. This proposal respects that history by recommending only tools that are available for self-hosting as sponsored or PSF funded infrastructure, and are also open source Python projects that can be customised to meet the needs of the CPython core development team. However, for this proposal to be a success (if it is accepted), we need to understand how we are going to carry out the necessary configuration, customisation, integration and deployment work. The last attempt at adding a new piece to the CPython support infrastructure (speed.python.org) has unfortunately foundered due to the lack of time to drive the project from the core developers and PSF board members involved, and the difficulties of trying to bring someone else up to speed to lead the activity (the hardware donated to that project by HP is currently in use to support PyPy instead, but the situation highlights some of the challenges of relying on volunteer labour with many other higher priority demands on their time to steer projects to completion). Even ultimately successful past projects, such as the source control migrations from CVS to Subversion and from Subversion to Mercurial, the issue tracker migration from SourceForge to Roundup, the code review integration between Roundup and Rietveld and the introduction of the Buildbot continuous integration fleet, have taken an extended period of time as volunteers worked their way through the many technical and social challenges involved. Accordingly, one possible outcome of this proposal may be a recommendation to the PSF to investigate how to sustain direct investment in ongoing paid development on CPython workflow tools, similar to the ongoing funded development that supports the continuous integration infrastructure for OpenStack. Some possible approaches include: * the PSF funding part-time or contract based development on CPython workflow tools, either on an ad hoc basic through the existing grants program, or on a more permanent basis, collaborating with the CPython core development team to determine the scope of the desired improvements. * discussing a possible partnership with the OpenStack Foundation to collaborate on shared tool development that ultimately benefits both organisations (for example, the OpenStack infrastructure team aren't especially happy with the maintainability challenges posed by Gerrit, so improvements to Rietveld to make it a viable candidate for replacing Gerrit may be something they would be interested in). * PSF (and OpenStack) sponsor members allocating part-time or full-time staff to work on improving the CPython workflow tools, similar to the way such staff are allocated to improving OpenStack workflow tools. Note that this model of directing paid development efforts at improving the tools that support the contributions of volunteers is also one of the known ways to incorporate funded development into a primarily volunteer driven project without creating resentment between unpaid and paid contributors: it's harder to resent people that are being paid specifically to make the tools, workflow and general experience more pleasant for the unpaid contributors. Open Questions ============== Pretty much everything in the PEP. Do we want to adopt merge gating and Zuul? Is Rietveld the right place to hook Zuul into our current workflows? How do we want to address the various technical challenges? Assuming we do want to do it (or something like it), how is the work going to get done? Do we try to get it done solely as a volunteer effort? Do we put together a grant proposal for the PSF board to consider (assuming we can find people willing and available to do the work)? Do we approach the OpenStack Foundation for assistance, since we're a key dependency of OpenStack itself, Zuul is a creation of the OpenStack infrastructure team, and the available development resources for OpenStack currently dwarf those for CPython? Next Steps ========== Unfortunately, we ran out of time at the PyCon 2014 language summit to really discuss these issues. However, the `core-workflow mailing list `__ has now been set up to discuss workflow issues in general. Acknowledgements ================ Thanks to Jesse Noller, Alex Gaynor and James Blair for providing valuable feedback on a preliminary draft of this proposal, and to James and Monty Taylor for additional technical feedback following publication of the initial draft. Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: