Rollup jobs should be cleaned up before indices are deleted (#38930) (#39144)

Rollup jobs should be stopped + deleted before the indices are removed.
Otherwise an active rollup job can issue a bulk request just before the
test ends and the cleanup code deletes all indices.  The in-flight bulk
request will then stall + error because the index no longer exists,
but this process might take longer than the StopRollup timeout.

As a result the test fails, and it often fails several other tests too,
since the job is still active (e.g. other tests cannot create the same-named
job, or fail to stop the job in their cleanup because it's still stalled).

This tends to knock over several tests before the bulk finally times
out and the job shuts down.

Instead, we simply need to stop jobs first.  In-flight bulks will resolve
quickly, and we can carry on with deleting indices after the jobs are
confirmed inactive.
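
The ordering described above can be sketched in isolation as follows. This
is only an illustration under stated assumptions: the Cluster interface, its
method names, and the 5 second timeout are hypothetical stand-ins and are not
the real ESRestTestCase API. The fake cluster records calls so the order
can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class CleanupOrderSketch {

    /** Hypothetical stand-in for a test cluster; not the real ESRestTestCase API. */
    interface Cluster {
        void stopAndDeleteRollupJobs();
        boolean hasActiveRollupTasks();
        void deleteAllIndices();
    }

    /** Polls the condition until it holds or the (assumed) timeout elapses. */
    static void awaitTrue(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                throw new IllegalStateException("rollup tasks still active after timeout");
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    /** Stop jobs first, wait until they are inactive, and only then delete indices. */
    static void wipeCluster(Cluster cluster) {
        cluster.stopAndDeleteRollupJobs();
        awaitTrue(() -> !cluster.hasActiveRollupTasks(), 5_000);
        cluster.deleteAllIndices();
    }

    /** Runs the cleanup against a fake cluster and returns the recorded call order. */
    static List<String> recordCleanupOrder() {
        List<String> calls = new ArrayList<>();
        Cluster fake = new Cluster() {
            private int polls = 0;

            @Override
            public void stopAndDeleteRollupJobs() {
                calls.add("stop-jobs");
            }

            @Override
            public boolean hasActiveRollupTasks() {
                calls.add("poll");
                // Pretend the in-flight bulk needs two polls to drain.
                return ++polls < 3;
            }

            @Override
            public void deleteAllIndices() {
                calls.add("delete-indices");
            }
        };
        wipeCluster(fake);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(recordCleanupOrder());
    }
}
```

The point of the sketch is only that index deletion is never reached while
the fake still reports an active task.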

stop-job.asciidoc tended to trigger this issue because it executed
an async stop API and then exited, which set up the above situation. It
can and did happen with other tests though.  As an extra precaution,
the doc test was modified to substitute in wait_for_completion
to help head off these issues too.
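
As a rough illustration of what that doc-test substitution does to the
request line, the snippet below applies the same s/_stop/.../ rewrite with a
plain Java regex replacement. The class and method names are hypothetical,
and the assumption that the docs test harness behaves like a simple regex
substitution is mine, not something this commit states.

```java
public class StopSubstitutionSketch {

    /**
     * Applies the rewrite s/_stop/_stop?wait_for_completion=true&timeout=10s/
     * to a console request line (assumed to behave like a regex substitution).
     */
    static String applySubstitution(String requestLine) {
        return requestLine.replaceFirst(
            "_stop",
            "_stop?wait_for_completion=true&timeout=10s");
    }

    public static void main(String[] args) {
        // The request line as it appears in the doc snippet.
        System.out.println(applySubstitution("POST _rollup/job/sensor/_stop"));
    }
}
```

With the rewritten request the stop call blocks until the job has stopped
or the timeout passes, so the doc test does not exit while the job is still
winding down.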
Tal Levy 2019-02-20 11:12:01 -08:00 committed by GitHub
parent af451459a5
commit cb7e3708bc
2 changed files with 10 additions and 5 deletions

@@ -56,6 +56,7 @@ POST _rollup/job/sensor/_stop
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:sensor_started_rollup_job]
+// TEST[s/_stop/_stop?wait_for_completion=true&timeout=10s/]
 Which will return the response:

@@ -458,6 +458,15 @@ public abstract class ESRestTestCase extends ESTestCase {
     }
     private void wipeCluster() throws Exception {
+        // Cleanup rollup before deleting indices. A rollup job might have bulks in-flight,
+        // so we need to fully shut them down first otherwise a job might stall waiting
+        // for a bulk to finish against a non-existing index (and then fail tests)
+        if (hasXPack && false == preserveRollupJobsUponCompletion()) {
+            wipeRollupJobs();
+            waitForPendingRollupTasks();
+        }
         if (preserveIndicesUponCompletion() == false) {
             // wipe indices
             try {
@@ -505,11 +514,6 @@ public abstract class ESRestTestCase extends ESTestCase {
             wipeClusterSettings();
         }
-        if (hasXPack && false == preserveRollupJobsUponCompletion()) {
-            wipeRollupJobs();
-            waitForPendingRollupTasks();
-        }
         if (hasXPack && false == preserveILMPoliciesUponCompletion()) {
             deleteAllPolicies();
         }