Increase retries for Kinesis sharding integration tests. (#12255)

This fixes intermittent, spurious failures in the Kinesis sharding integration tests, caused by Kinesis taking longer than the code expected to start a sharding operation. The changed method is part of the integration test suite and is used only by the test cases we've seen to be flaky. Prior to this change, the tests expected a sharding operation to start within 9 seconds (30 retries * 300ms delay/retry). This change bumps the number of retries to 100, giving Kinesis 30 seconds to start the sharding. This PR also makes a small, clarifying change to the condition used to determine whether sharding has started. Instead of checking whether the number of shards has increased (which, due to a Kinesis implementation detail, was technically correct even when the test reduces the number of shards), we now just check whether the shard count has changed.
2022-02-14 23:33:13 -08:00 · 2022-02-14 23:33:13 -08:00 · 47153cd7bd
parent 033989eb1d
commit 47153cd7bd
1 changed file with 6 additions and 4 deletions
--- a/integration-tests/src/main/java/org/apache/druid/testing/utils/KinesisAdminClient.java
+++ b/integration-tests/src/main/java/org/apache/druid/testing/utils/KinesisAdminClient.java
@@ -129,13 +129,15 @@ public class KinesisAdminClient implements StreamAdminClient
      ITRetryUtil.retryUntil(
          () -> {
            int updatedShardCount = getStreamPartitionCount(streamName);
-            // Stream should be in active or updating state AND
-            // the number of shards must have increased irrespective of the value of newShardCount
+
+            // Retry until Kinesis records the operation is either in progress (UPDATING) or completed (ACTIVE)
+            // and the shard count has changed.
+
            return verifyStreamStatus(streamName, StreamStatus.ACTIVE, StreamStatus.UPDATING)
-                   && updatedShardCount > originalShardCount;
+                   && updatedShardCount != originalShardCount;
          }, true,
          300, // higher value to avoid exceeding kinesis TPS limit
-          30,
+          100,
          "Kinesis stream resharding to start (or finished)"
      );
    }
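
The timing budget and the relaxed condition described in the commit message can be illustrated with a small, self-contained sketch. This is not the Druid `ITRetryUtil` implementation: `RetryBudgetSketch`, `maxWaitMillis`, `shardCountChanged`, and `retryUntilTrue` are hypothetical names introduced only for illustration, and the helper here returns `false` when retries run out, which may differ from what the real utility does.

```java
import java.util.concurrent.Callable;

public class RetryBudgetSketch
{
  // Upper bound on time spent sleeping between attempts (ignores the time each poll itself takes).
  public static long maxWaitMillis(int retryCount, long delayMillis)
  {
    return retryCount * delayMillis;
  }

  // The relaxed condition: the shard count differs from the original in either direction.
  public static boolean shardCountChanged(int originalShardCount, int updatedShardCount)
  {
    return updatedShardCount != originalShardCount;
  }

  // Hypothetical stand-in for a retry helper: poll until the condition holds or retries are exhausted.
  public static boolean retryUntilTrue(Callable<Boolean> condition, long delayMillis, int retryCount)
      throws Exception
  {
    for (int attempt = 0; attempt < retryCount; attempt++) {
      if (condition.call()) {
        return true;
      }
      Thread.sleep(delayMillis);
    }
    return false;
  }

  public static void main(String[] args) throws Exception
  {
    System.out.println(maxWaitMillis(30, 300));   // 9000 ms: the old budget
    System.out.println(maxWaitMillis(100, 300));  // 30000 ms: the new budget

    // A shard reduction (4 -> 2) satisfies the new check, as does an increase (2 -> 4).
    System.out.println(shardCountChanged(4, 2));  // true
    System.out.println(shardCountChanged(2, 4));  // true

    // Simulated poll that succeeds on the third attempt, using a 1 ms delay to keep the demo fast.
    int[] polls = {0};
    System.out.println(retryUntilTrue(() -> ++polls[0] >= 3, 1, 10)); // true
  }
}
```

Keeping the delay at 300ms and raising only the retry count extends the total wait without polling Kinesis any more frequently, which is consistent with the inline comment about staying under the Kinesis TPS limit.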