[ML] Fix rare ML daily maintenance test race condition (#64043)

Depending on thread scheduling, the ML daily maintenance tests could run one more iteration than expected, causing rare failures. Fixes #64036
2025-02-08 05:58:44 +00:00 · 2020-10-22 12:39:20 +01:00 · cb0c538b35
commit cb0c538b35
parent fa8ad3abde
1 changed file with 10 additions and 2 deletions
--- a/x-pack/plugin/ml/src/test/java/org/elasticsearch/xpack/ml/MlDailyMaintenanceServiceTests.java
+++ b/x-pack/plugin/ml/src/test/java/org/elasticsearch/xpack/ml/MlDailyMaintenanceServiceTests.java
@@ -230,8 +230,16 @@ public class MlDailyMaintenanceServiceTests extends ESTestCase {

    private MlDailyMaintenanceService createService(CountDownLatch latch, Client client) {
        return new MlDailyMaintenanceService(Settings.EMPTY, threadPool, client, clusterService, mlAssignmentNotifier, () -> {
-                latch.countDown();
-                return TimeValue.timeValueMillis(100);
+                // We need to be careful that an unexpected iteration doesn't get squeezed in by the maintenance threadpool in
+                // between the latch getting counted down to zero and the main test thread stopping the maintenance service.
+                // This could happen if the main test thread happens to be waiting for a CPU for the whole 100ms after the
+                // latch counts down to zero.
+                if (latch.getCount() > 0) {
+                    latch.countDown();
+                    return TimeValue.timeValueMillis(100);
+                } else {
+                    return TimeValue.timeValueHours(1);
+                }
            });
    }
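
The guard in the hunk above can be tried in isolation. The following is a minimal sketch, not the test's actual code: the class name `GuardedDelaySketch` and the helper `guardedDelay` are hypothetical, `java.time.Duration` stands in for Elasticsearch's `TimeValue`, and the supplier is called directly rather than by a maintenance threadpool. It assumes the supplier is invoked by one task at a time, since the `getCount()` check followed by `countDown()` is not atomic.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

public class GuardedDelaySketch {

    // Hypothetical stand-in for the delay supplier that the test passes to the service.
    // While the latch is still open, count it down and ask for a short delay.
    // Once the latch has reached zero, ask for a long delay so that no further
    // iteration can run before the caller gets a chance to stop the service.
    static Supplier<Duration> guardedDelay(CountDownLatch latch) {
        return () -> {
            if (latch.getCount() > 0) {
                latch.countDown();
                return Duration.ofMillis(100);
            } else {
                return Duration.ofHours(1);
            }
        };
    }

    public static void main(String[] args) {
        // Expect exactly two iterations, as a test with a latch of 2 would.
        CountDownLatch latch = new CountDownLatch(2);
        Supplier<Duration> delay = guardedDelay(latch);

        System.out.println(delay.get().toMillis()); // first iteration: 100
        System.out.println(delay.get().toMillis()); // second iteration: 100
        System.out.println(delay.get().toMillis()); // latch is at zero: 3600000
        System.out.println(latch.getCount());       // 0
    }
}
```

With the unguarded version, the third call would also return 100 ms, so a slow main test thread could allow an extra iteration to be scheduled before the service is stopped.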