[ML] Speed up persistent task rechecks in ML failover tests (#43291)
The ML failover tests sometimes need to wait for jobs to be assigned to new nodes following a node failure, and they wait 10 seconds for this to happen. However, if the failed node was the master and a new master had to be elected, 10 seconds might not be long enough, because a refresh of the memory stats delays job assignment. Once the memory refresh completes, the persistent task is assigned either when the next cluster state update occurs or after the periodic recheck interval, which defaults to 30 seconds. Rather than increase the wait for assignment to 31 seconds, this change decreases the periodic recheck interval to 1 second.

Fixes #43289
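The timing argument above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Elasticsearch code: the class `RecheckTimingSketch` and its methods are hypothetical, rechecks are assumed to fire at exact multiples of the interval measured from the start of the wait, and the possibility of an earlier cluster state update triggering assignment is deliberately ignored (so it models the worst case).

```java
import java.time.Duration;

/** Hypothetical model of periodic persistent task rechecks; not Elasticsearch code. */
public final class RecheckTimingSketch {

    /**
     * Time, measured from the start of the wait, of the first periodic recheck
     * that fires at or after the task becomes assignable. Rechecks are assumed
     * to fire at interval, 2 * interval, 3 * interval, and so on.
     */
    public static Duration assignmentTime(Duration assignableAt, Duration recheckInterval) {
        long intervalMs = recheckInterval.toMillis();
        long readyMs = assignableAt.toMillis();
        // Round up to the next multiple of the interval (at least one interval).
        long ticks = Math.max(1, (readyMs + intervalMs - 1) / intervalMs);
        return Duration.ofMillis(ticks * intervalMs);
    }

    /** True if the modelled assignment happens no later than the test's timeout. */
    public static boolean assignedWithin(Duration timeout, Duration assignableAt, Duration recheckInterval) {
        return assignmentTime(assignableAt, recheckInterval).compareTo(timeout) <= 0;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(10); // wait used by the failover tests
        Duration ready = Duration.ofSeconds(2);    // assumed duration of the memory refresh
        // Default 30 second recheck interval: assignment lands after the timeout.
        System.out.println(assignedWithin(timeout, ready, Duration.ofSeconds(30)));
        // 1 second recheck interval: assignment lands well within the timeout.
        System.out.println(assignedWithin(timeout, ready, Duration.ofSeconds(1)));
    }
}
```

Under these assumptions, a 30 second interval can miss the 10 second wait whenever no cluster state update arrives first, while a 1 second interval adds at most about one second on top of the memory refresh.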
parent 74813360ab
commit da97325790
@@ -23,6 +23,7 @@ import org.elasticsearch.common.unit.TimeValue;
 import org.elasticsearch.index.reindex.ReindexPlugin;
 import org.elasticsearch.indices.recovery.RecoveryState;
 import org.elasticsearch.license.LicenseService;
+import org.elasticsearch.persistent.PersistentTasksClusterService;
 import org.elasticsearch.plugins.Plugin;
 import org.elasticsearch.test.ESIntegTestCase;
 import org.elasticsearch.test.MockHttpTransport;
@@ -350,6 +351,17 @@ public abstract class BaseMlIntegTestCase extends ESIntegTestCase {
     }
 
     protected String awaitJobOpenedAndAssigned(String jobId, String queryNode) throws Exception {
+
+        PersistentTasksClusterService persistentTasksClusterService =
+            internalCluster().getInstance(PersistentTasksClusterService.class, internalCluster().getMasterName());
+        // Speed up rechecks to a rate that is quicker than what settings would allow.
+        // The check would work eventually without doing this, but the assertBusy() below
+        // would need to wait 30 seconds, which would make the test run very slowly.
+        // The 1 second refresh puts a greater burden on the master node to recheck
+        // persistent tasks, but it will cope in these tests as it's not doing much
+        // else.
+        persistentTasksClusterService.setRecheckInterval(TimeValue.timeValueSeconds(1));
+
         AtomicReference<String> jobNode = new AtomicReference<>();
         assertBusy(() -> {
             GetJobsStatsAction.Response statsResponse =