Draft PEP for merge workflow automation

2014-01-25 15:27:37 +10:00 · 2014-01-25 15:27:37 +10:00 · c5701f72b1
parent 7432fc0ea7
commit c5701f72b1
1 changed files with 507 additions and 0 deletions
--- a/pep-0462.txt
+++ b/pep-0462.txt
@ -0,0 +1,507 @@
+PEP: 462
+Title: Core development workflow automation for CPython
+Version: $Revision$
+Last-Modified: $Date$
+Author: Nick Coghlan <ncoghlan@gmail.com>
+Status: Draft
+Type: Process
+Content-Type: text/x-rst
+Created: 23-Jan-2014
+
+
+Abstract
+========
+
+This PEP proposes investing in automation of several of the tedious, time
+consuming activities that are currently required for the core developerment
+team to incorporate changes into CPython. This proposal is intended to
+allow core developers to make more effective use of the time they have
+available to contribute to CPython, which should also result in an improved
+experience for other contributors that are reliant on the core team to get
+their changes incorporated.
+
+
+Rationale
+=========
+
+The current core developer workflow to merge a new feature into CPython
+on a POSIX system "works" as follows:
+
+#. If applying a change submitted to bugs.python.org by another user, first
+   check they have signed the PSF Contributor Licensing Agreement. If not,
+   request that they sign one before continuing with merging the change.
+#. Apply the change locally to a current checkout of the main CPython
+   repository (the change will typically have been discussed and reviewed
+   as a patch on bugs.python.org first, but this step is not currently
+   considered mandatory for changes originating directly from core
+   developers).
+#. Run the test suite locally, at least ``make test`` or
+   ``./python -m test`` (depending on system specs, this takes a few
+   minutes in the default configuration, but substantially longer if all
+   optional resources, like external network access, are enabled).
+#. Run ``make patchcheck`` to fix any whitespace issues and as a reminder
+   of other changes that may be needed (such as updating Misc/ACKS or
+   adding an entry to Misc/NEWS)
+#. Commit the change and push it to the main repository. If hg indicates
+   this would create a new head in the remote repository, run
+   ``hg pull --rebase`` (or an equivalent). Theoretically, you should
+   rerun the tests at this point, but it's *very* tempting to skip that
+   step.
+#. After pushing, monitor the `stable buildbots
+   <http://buildbot.python.org/all/waterfall?category=3.x.stable>`__
+   for any new failures introduced by your change. In particular, developers
+   on POSIX systems will often break the Windows buildbots, and vice-versa.
+   Less commonly, developers on Linux or Mac OS X may break other POSIX
+   systems.
+
+The steps required on Windows are similar, but the exact commands used
+will be different.
+
+Rather than being simpler, the workflow for a bug fix is *more* complicated
+than that for a new feature! New features have the advantage of only being
+applied to the ``default`` branch, while bug fixes also need to be considered
+for inclusion in maintenance branches.
+
+* If a bug fix is applicable to Python 2.7, then it is also separately
+  applied to the 2.7 branch, which is maintained as an independent head
+  in Mercurial
+* If a bug fix is applicable to the current 3.x maintenance release, then
+  it is first applied to the maintenance branch and then merged forward
+  to the default branch. Both branches are pushed to hg.python.org at the
+  same time.
+
+Documentation patches are simpler than functional patches, but not
+hugely so - the main benefit is only needing to check the docs build
+successfully rather than running the test suite.
+
+I would estimate that even when everything goes smoothly, it would still
+take me at least 20-30 minutes to commit a bug fix patch that applies
+cleanly. Given that it should be possible to automate several of these
+tasks, I do not believe our current practices are making effective use
+of scarce core developer resources.
+
+There are many, many frustrations involved with this current workflow, and
+they lead directly to some undesirable development practices.
+
+* Much of this overhead is incurred on a per-patch applied basis. This
+  encourages large commits, rather than small isolated changes. The time
+  required to commit a 500 line feature is essentially the same as that
+  needed to commit a 1 line bug fix - the additional time needed for the
+  larger change appears in any preceding review rather than as part of the
+  commit process.
+* The additional overhead of working on applying bug fixes creates an
+  additional incentive to work on new features instead, and new features
+  are already *inherently* more interesting to work on - they don't need
+  workflow difficulties giving them a helping hand!
+* Getting a preceding review on bugs.python.org is *additional* work,
+  creating an incentive to commit changes directly, increasing the reliance
+  on post-review on the python-checkins mailing list.
+* Patches on the tracker that are complete, correct and ready to merge may
+  still languish for extended periods awaiting a core developer with the
+  time to devote to getting it merged.
+* The risk of push races (especially when pushing a merged bug fix) creates
+  a temptation to skip doing full local test runs (especially after a push
+  race has already been encountered once), increasing the chance of
+  breaking the buildbots.
+* The buildbots are sometimes red for extended periods, introducing errors
+  into local test runs, and also meaning that they sometimes fail to serve
+  as a reliable indicator of whether or not a patch has introduced cross
+  platform issues.
+* Post-conference development sprints are a nightmare, as they collapse
+  into a mire of push races. It's tempting to just leave patches on the
+  tracker until after the sprint is over and then try to clean them up
+  afterwards.
+
+There are also many, many opportunities for core developers to make
+mistakes that inconvience others, both in managing the Mercurial branches
+and in breaking the buildbots without being in a position to fix them
+promptly. This both makes the existing core development team cautious in
+granting new developers commit access, as well as making those new
+developers cautious about actually making use of their increased level of
+access.
+
+There are also some incidental annoyances (like keeping the NEWS file up to
+date) that will also be necessarily addressed as part of this proposal.
+
+One of the most critical resources of a volunteer-driven open source project
+is the emotional energy of its contributors. The current approach to change
+incorporation doesn't score well on that front for anyone:
+
+* For core developers, the branch wrangling for bug fixes is delicate and
+  easy to get wrong. Conflicts on the NEWS file and push races when
+  attempting to upload changes add to the irritation of something most of
+  us aren't being paid to spend time on. The time we spend actually getting
+  a change merged is time we're not spending coding additional changes,
+  writing or updating documentation or reviewing contributions from others.
+* Red buildbots make life difficult for other developers (since a local
+  test failure may *not* be due to anything that developer did), release
+  managers (since they may need to enlist assistance cleaning up test
+  failures prior to a release) and for the developers themselves (since
+  it creates significant pressure to fix any failures we inadvertently
+  introduce right *now*, rather than at a more convenient time).
+* For new contributors, a core developer spending time actually getting
+  changes merged is a developer that isn't reviewing and discussing patches
+  on the issue tracker or otherwise helping others to contribute effectively.
+  It is especially frustrating for contributors that are accustomed to the
+  simplicity of a developer just being able to hit "Merge" on a pull
+  request that has already been automatically tested in the project's CI
+  system (which is a common workflow on sites like GitHub and BitBucket), or
+  where the post-review part of the merge process is fully automated (as is
+  the case for OpenStack).
+
+
+Current Tools
+=============
+
+The following tools are currently used to manage various parts of the
+CPython core development workflow.
+
+* Mercurial (hg.python.org) for version control
+* Roundup (bugs.python.org) for issue tracking
+* Rietveld (also hosted on bugs.python.org) for code review
+* Buildbot (buildbot.python.org) for automated testing
+
+This proposal does *not* suggest replacing any of these tools, although
+implementing it effectively may require modifications to some or all of
+them.
+
+It does however suggest the addition of new tools in order to automate
+additional parts of the workflow.
+
+
+Proposal
+========
+
+The essence of this proposal is that CPython aim to adopt a "core reviewer"
+development model, similar to that used by the OpenStack project.
+
+These problems experienced by CPython are not unique. The OpenStack
+infrastructure team have come up with a well designed automated workflow
+that is designed to ensure:
+
+* once a patch has been reviewed, further developer involvement is needed
+  only if the automated tests fail prior to merging
+* patches never get merged without being tested relative to the current
+  state of the branch
+* the main development branch always stays green. Patches that do not pass
+  the automated tests do not get merged.
+
+The core of this workflow is implemented using a tool called Zuul_, a
+Python web service created specifically for the OpenStack project, but
+deliberately designed with a plugin based trigger and action system to make
+it easier to adapt to alternate code review systems, issue trackers and
+CI systems. James Blair of the OpenStack infrastructure team provided
+an `excellent overview of Zuul
+<https://www.youtube.com/watch?v=sLD9LHc1QFM>`__ at linux.conf.au 2014.
+
+While Zuul handles several workflows for OpenStack, the specific one of
+interest for this PEP is the "merge gating" workflow.
+
+For this workflow, Zuul is configured to monitor the Gerrit code review
+system for patches which have been marked as "+2 Approved". Once it sees
+such a patch, Zuul takes it, and combines it into a queue of "candidate
+merges". It then creates a pipeline of test runs that execute in parallel in
+Jenkins (in order to allow more than 24 commits a day when a full test run
+takes the better part of an hour), and are merged as they pass (and as all
+the candidate merges ahead of them in the queue pass). If a patch fails the
+tests, Zuul takes it of the queue, cancels any test runs after that patch in
+the queue, and rebuilds the queue without the failing patch.
+
+To adapt this process to CPython, it should be feasible to have Zuul monitor
+Rietveld for approved patches (which would require a feature addition in
+Rietveld), submit them to Buildbot for testing on the stable buildbots, and
+then merge the changes appropriately in Mercurial. This idea poses a few
+technical challenges, which have their own section below.
+
+For CPython, I don't believe we will need to take advantage of Zuul's
+ability to execute tests in parallel (certainly not in the initial
+iteration - if we get to a point where serial testing of patches by the
+merge gating system is our primary bottleneck rather than having the
+people we need in order to be able to review and approve patches, then
+that will be a very good day).
+
+However, the merge queue itself is a very powerful concept that should
+directly address several of the issues described above.
+
+.. _Zuul: http://ci.openstack.org/zuul/
+
+
+Deferred Proposals
+==================
+
+The OpenStack team also use Zuul to coordinate several other activities:
+
+* Running preliminary "check" tests against patches posted to Gerrit.
+* Creation of updated release artefacts and republishing documentation when
+  changes are merged
+* Using ElasticSearch in conjunction with a spam filter to monitor test
+  output and suggest the specific intermittent failure that may have
+  caused a test to fail, rather than requiring users to search logs manually
+
+While these are possibilities worth exploring in the future (and one of the
+possible benefits I see to seeking closer coordination with the OpenStack
+Infrastructure team), I don't see them as offering quite the same kind of
+fundamental workflow improvement that merge gating appears to provide.
+
+
+Perceived Benefits
+==================
+
+The benefits of this proposal accrue most directly to the core development
+team. First and foremost, it means that once we mark a patch as "Approved"
+in the updated code review system, *we're usually done*. The extra 20-30
+minutes (or more) of actually applying the patch, running the tests and
+merging it into Mercurial would all be orchestrated by Zuul. Push races
+would also be a thing of the past - if lots of core developers are
+approving patches at a sprint, then that just means the queue gets
+deeper in Zuul, rather than developers getting frustrated trying to
+merge changes and failing. Test failures would still happen, but they
+would result in the affected patch being removed from the merge queue,
+rather than breaking the code in the main repository.
+
+With the bulk of the time investment moved to the review process, this
+also encourages "development for reviewability" - smaller, easier to review
+patches, since the overhead of running the tests five times rather than once
+will be incurred by Zuul rather than by the core developers.
+
+However, removing this time sink from the core development team should also
+improve the experience of CPython development for other contributors, as it
+removes several of the opportunities for patches to get "dropped on the
+floor", as well as increasing the time core developers are likely to have
+available for reviewing contributed patches.
+
+Another example of benefits to other contributors is that when a sprint
+aimed primarily at new contributors is running with just a single core
+developer present (such as the sprints at PyCon AU for the last
+few years), the merge queue would allow that developer to focus more of
+their time on reviewing patches and helping the other contributors at the
+sprint, since accepting a patch for inclusion would now be a single click
+in the Rietveld UI, rather than the relatively time consuming process that
+it is currently. Even when multiple core developers are present, it is
+better to enable them to spend their time and effort on interacting with
+the other sprint participants than it is on things that are sufficiently
+mechanical that a computer can (and should) handle them.
+
+
+Technical Challenges
+====================
+
+Adapting Zuul from the OpenStack infrastructure to the CPython
+infrastructure will at least require the development of additional
+Zuul trigger and action plugins, and may require additional development
+in some of our existing tools.
+
+
+Rietveld vs Gerrit
+------------------
+
+Rietveld does not currently include a voting/approval feature that is
+equivalent to Gerrit's. For CPython, we wouldn't need anything as
+sophisticated as Gerrit's voting system - a simple core-developer-only
+"Approved" marker to trigger action from Zuul should suffice. The
+core-developer-or-not flag is available in Roundup, which may require
+further additions to the existing integration between the two tools.
+
+Rietveld may also require some changes to allow the uploader of a patch
+to indicate which branch it is intended for.
+
+There would also be an additional Zuul trigger plugin needed to monitor
+Rietveld activity rather than Gerrit.
+
+
+Mercurial vs Gerrit/git
+-----------------------
+
+Gerrit uses git as the actual storage mechanism for patches, and
+automatically handles merging of approved patches. By contrast, Rietveld
+works directly on patches, and is completely decoupled from any specific
+version control system.
+
+Zuul is also directly integrated with git for patch manipulation - as far
+as I am aware, this part of the design isn't pluggable.
+
+Rather than trying to adapt Zuul to work directly with Mercurial, it will
+likely be more practical to let Zuul continue to use git internally, and
+then use the hg-git Mercurial plugin to pull the output from Zuul into the
+master repo at hg.python.org. (While there are various plugins that are
+designed to let git push to Mercurial repos, the influence of GitHub is
+such that the hg-git plugin appears to be the most mature of the available
+options for hg-git interoperability).
+
+
+Buildbot vs Jenkins
+-------------------
+
+As far as I am aware, Zuul's interaction with the CI system is also
+pluggable, so this should only require creating a Buildbot plugin to use
+instead of the Jenkins one.
+
+Note that, in the initial iteration, I am proposing that we *do not*
+attempt to pipeline test execution. This means Zuul would be running in
+a very simple mode where only the patch at the head of the merge queue
+is being tested on the Buildbot fleet, rather than potentially testing
+several patches in parallel. I am picturing something equivalent to
+requesting a forced build from the Buildbot master, and then waiting for
+the result to come back before moving on to the second patch in the queue.
+
+If we ultimately decide that this is not sufficient, and we need to start
+using the CI pipelining features of Zuul, then we may need to look at using
+Jenkins test runs to control the gating process. Due to the differences
+in the client/server architectures between Jenkins and Buildbot, the
+initial feedback from the OpenStack infrastructure team is that it is likely
+to be difficult to adapt the way Zuul controls the CI pipelining process in
+Jenkins to control Buildbot instead.
+
+If that latter step occurs, it would likely make sense to look at moving the
+test execution to dynamically provisioned cloud images, rather than relying
+on volunteer maintained statically provisioned systems as we do currently.
+
+In this case, the main technical risk would become Zuul's handling of testing
+on platforms other than Linux (our stable buildbots currently cover Windows,
+Mac OS X, FreeBSD and OpenIndiana in addition to a couple of different Linux
+variants).
+
+In such a scenario, the Buildbot fleet would still have a place in doing
+"check" runs against the master repository (either periodically or for
+every commit), even if it did not play a part in the merge gating process.
+More unusual configurations (such as building without threads, or without
+SSL/TLS support) would likely still be handled that way rather than being
+included in the gate criteria.
+
+
+Handling of maintenance branches
+--------------------------------
+
+The OpenStack project largely leaves the question of maintenance branches
+to downstream vendors, rather than handling it directly. This means there
+are questions to be answered regarding how we adapt Zuul to handle our
+maintenance branches.
+
+Python 2.7 can be handled easily enough by treating it as a separate patch
+queue. This would just require a change in Rietveld to indicate which
+branch was the intended target of the patch.
+
+The Python 3.x maintenance branches are potentially more complicated. One
+option would be to simply stop using Mercurial merges to manage them, and
+instead treat them as independent heads, similar to the Python 2.7 branch.
+Patches that applied cleanly to both the active maintenance branch and to
+default would then just be submitted to both queues, while other changes
+might involve submitting separate patches for the active maintenance branch
+and for default. This approach also has the benefit of adjusting cleanly to
+the intermittent periods where we have two active Python 3 maintenance
+branches.
+
+
+Handling of security branches
+-----------------------------
+
+For simplicity's sake, I would suggest leaving the handling of
+security-fix only branches alone: the release managers for those branches
+backport specific changes manually.
+
+
+Handling of NEWS file updates
+-----------------------------
+
+Our current approaching to handling NEWS file updates regularly results in
+spurious conflicts when merging bug fixes forward from an active maintenance
+branch to a later branch.
+
+`Issue #18967* <http://bugs.python.org/issue18967>`__ discusses some
+possible improvements in that area, which would be beneficial regardless
+of whether or not we adopt Zuul as a workflow automation tool.
+
+
+Stability of "stable" Buildbot slaves
+-------------------------------------
+
+Instability of the nominally stable buildbots has a substantially larger
+impact under this proposal. We would need to ensure we're happy with each
+of those systems gating merges to the development branches, or else move
+then to "unstable" status.
+
+
+Intermittent test failures
+--------------------------
+
+Some tests, especially timing tests, exhibit intermittent failures on the
+existing Buildbot fleet. In particular, test systems running as VMs may
+sometimes exhibit timing failures
+
+
+Social Challenges
+=================
+
+The primary social challenge here is getting the core development team to
+change their practices. However, the tedious-but-necessary steps that are
+automated by the proposal should create a strong incentive for the
+existing developers to go along with the idea.
+
+I believe two specific features may be needed to assure existing
+developers that there are no downsides to the automation of this workflow:
+
+* Only requiring approval from a single core developer to incorporate a
+  patch. This could be revisited in the future, but we should preserve the
+  status quo for the initial rollout.
+
+* Explicitly stating that core developers remain free to approve their own
+  patches, except during the release candidate phase of a release. This
+  could be revisited in the future, but we should preserve the status quo
+  for the initial rollout.
+
+* Ensuring that at least release managers have a "merge it now" capability
+  that allows them to force a particular patch to the head of the merge
+  queue. Using a separate clone for release preparation may be sufficient
+  for this purpose. Longer term, automatic merge gating may also allow for
+  more automated preparation of release artefacts as well.
+
+
+Open Questions
+==============
+
+Pretty much everything in the PEP. Do we want to do this? Is Rietveld the
+right place to hook Zuul into our current workflows? How do we want to
+address the various technical challenges?
+
+Assuming we do want to do it (or something like it), how is the work going
+to get done? Do we try to get it done solely as a volunteer effort? Do we
+put together a grant proposal for the PSF board to consider (assuming we can
+find people willing and available to do the work)?
+
+Do we approach the OpenStack Foundation for assistance, since
+we're a key dependency of OpenStack itself, Zuul is a creation of the
+OpenStack infrastructure team, and the available development resources for
+OpenStack currently dwarf those for CPython?
+
+
+Next Steps
+==========
+
+The topic of CPython workflow automation is on the agenda for the Language
+Summit at PyCon US 2014 in Montreal, and we will be inviting additional
+participants (specifically Mercurial and Zuul developers) to be involved
+in the discussions (Guido van Rossum is the creator of Rietveld, and these
+workflow changes are not expected to require any significant changes in
+Roundup or Buildbot).
+
+
+Acknowledgements
+================
+
+Thanks to Jesse Noller, Alex Gaynor and James Blair for providing valuable
+feedback on a preliminary draft of this proposal.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: