[ML] Speed up persistent task rechecks in ML failover tests (#43291)

The ML failover tests sometimes need to wait for jobs to be
assigned to new nodes following a node failure.  They wait
10 seconds for this to happen.  However, if the node that
failed was the master node and a new master was elected, then
these 10 seconds might not be long enough, as a refresh of the
memory stats will delay job assignment.  Once the memory
refresh completes, the persistent task will be assigned when
the next cluster state update occurs or after the periodic
recheck interval, which defaults to 30 seconds.  Rather than
increase the length of the wait for assignment to 31 seconds,
this change decreases the periodic recheck interval to 1
second.
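
The timing argument above can be sketched as a small standalone
model.  This is only an illustration, not code from the change:
the class and method names are hypothetical, the 2 second memory
refresh is an assumed figure, and the model assumes that no
cluster state update arrives first and that rechecks fire at
fixed multiples of the interval.

```java
// Hypothetical model of the timing described in the commit message.
// It is not part of Elasticsearch; all names are illustrative.
class RecheckTiming {

    /**
     * Time (ms) of the first periodic recheck at or after the moment the task
     * becomes assignable. Assumption: rechecks fire at interval, 2*interval, ...
     * and no cluster state update triggers an earlier assignment.
     */
    static long nextRecheckMillis(long readyAtMillis, long intervalMillis) {
        long ticks = (readyAtMillis + intervalMillis - 1) / intervalMillis; // ceiling division
        return Math.max(1, ticks) * intervalMillis;
    }

    /** True if that recheck happens before the test stops waiting. */
    static boolean assignedWithinWait(long readyAtMillis, long intervalMillis, long waitMillis) {
        return nextRecheckMillis(readyAtMillis, intervalMillis) <= waitMillis;
    }

    public static void main(String[] args) {
        long waitMillis = 10_000;   // the tests wait 10 seconds for assignment
        long readyAtMillis = 2_000; // assumed: memory refresh finishes after 2 seconds

        // Default 30 second recheck interval versus the 1 second interval set by this change.
        System.out.println(assignedWithinWait(readyAtMillis, 30_000, waitMillis));
        System.out.println(assignedWithinWait(readyAtMillis, 1_000, waitMillis));
    }
}
```

Under these assumptions the default interval misses the 10 second
wait while the 1 second interval fits comfortably, which is why the
interval is shortened rather than the wait lengthened.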

Fixes #43289
David Roberts 2019-06-18 09:09:54 +01:00
parent 74813360ab
commit da97325790
1 changed files with 12 additions and 0 deletions

@@ -23,6 +23,7 @@ import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.reindex.ReindexPlugin;
import org.elasticsearch.indices.recovery.RecoveryState;
import org.elasticsearch.license.LicenseService;
import org.elasticsearch.persistent.PersistentTasksClusterService;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.test.ESIntegTestCase;
import org.elasticsearch.test.MockHttpTransport;
@@ -350,6 +351,17 @@ public abstract class BaseMlIntegTestCase extends ESIntegTestCase {
    }

    protected String awaitJobOpenedAndAssigned(String jobId, String queryNode) throws Exception {
        PersistentTasksClusterService persistentTasksClusterService =
            internalCluster().getInstance(PersistentTasksClusterService.class, internalCluster().getMasterName());
        // Speed up rechecks to a rate that is quicker than what settings would allow.
        // The check would work eventually without doing this, but the assertBusy() below
        // would need to wait 30 seconds, which would make the test run very slowly.
        // The 1 second refresh puts a greater burden on the master node to recheck
        // persistent tasks, but it will cope in these tests as it's not doing much
        // else.
        persistentTasksClusterService.setRecheckInterval(TimeValue.timeValueSeconds(1));
        AtomicReference<String> jobNode = new AtomicReference<>();
        assertBusy(() -> {
            GetJobsStatsAction.Response statsResponse =