Increase retries for Kinesis sharding integration tests. (#12255)

This fixes intermittent, spurious failures that we've observed in
the Kinesis sharding integration tests due to Kinesis taking
longer than the code expected to start a sharding operation. The
changed method is part of the integration test suite and is used
only by the test cases that we've observed to be flaky.

Prior to this change, the tests expected a sharding operation to
start within 9 seconds (30 retries * 300ms delay/retry). This change
bumps the number of retries to 100, giving Kinesis 30 seconds
(100 retries * 300ms delay/retry) to start the sharding.
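
The retry arithmetic above can be sketched as a fixed-delay polling loop. This is only an illustration: `RetryBudget`, `worstCaseWaitMillis`, and `pollUntil` are hypothetical names, not the `ITRetryUtil` implementation, and the sketch assumes the loop sleeps once after every failed attempt.

```java
import java.util.function.BooleanSupplier;

final class RetryBudget {
    // Worst-case total wait for a fixed-delay retry loop, in milliseconds.
    static long worstCaseWaitMillis(int retries, long delayMillis) {
        return retries * delayMillis;
    }

    // Hypothetical stand-in for a retry helper: polls the condition up to
    // `retries` times, sleeping `delayMillis` after each failed attempt.
    static boolean pollUntil(BooleanSupplier condition, long delayMillis, int retries) {
        for (int attempt = 0; attempt < retries; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseWaitMillis(30, 300) + " ms");  // prints "9000 ms"
        System.out.println(worstCaseWaitMillis(100, 300) + " ms"); // prints "30000 ms"
    }
}
```

Keeping the 300ms delay and raising only the retry count lengthens the budget without polling Kinesis any more often per second.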

This PR also makes a small, clarifying change to the condition
used to determine if sharding has started. Instead of checking
whether the number of shards has increased (which, due to a Kinesis
implementation detail, was technically correct even when the test
reduces the number of shards), we now just check whether the shard
count has changed.
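
As a rough illustration of the two conditions, the sketch below compares them on plain integer counts. `ReshardCheck`, `startedOld`, and `startedNew` are hypothetical names, and the decreasing case is an assumed input: it shows only that the new check does not depend on the direction of the change.

```java
final class ReshardCheck {
    // Old condition: the shard count must have increased.
    static boolean startedOld(int originalShardCount, int updatedShardCount) {
        return updatedShardCount > originalShardCount;
    }

    // New condition: the shard count must simply differ from the original.
    static boolean startedNew(int originalShardCount, int updatedShardCount) {
        return updatedShardCount != originalShardCount;
    }

    public static void main(String[] args) {
        System.out.println(startedOld(4, 6) + " " + startedNew(4, 6)); // prints "true true"
        System.out.println(startedOld(4, 2) + " " + startedNew(4, 2)); // prints "false true"
        System.out.println(startedOld(4, 4) + " " + startedNew(4, 4)); // prints "false false"
    }
}
```

Both checks agree whenever the count rises, so the change does not alter behavior in the cases the old check already handled.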
Daniel Koepke 2022-02-14 23:33:13 -08:00 committed by GitHub
parent 033989eb1d
commit 47153cd7bd
1 changed files with 6 additions and 4 deletions

@@ -129,13 +129,15 @@ public class KinesisAdminClient implements StreamAdminClient
     ITRetryUtil.retryUntil(
         () -> {
           int updatedShardCount = getStreamPartitionCount(streamName);
-          // Stream should be in active or updating state AND
-          // the number of shards must have increased irrespective of the value of newShardCount
+          // Retry until Kinesis records the operation is either in progress (UPDATING) or completed (ACTIVE)
+          // and the shard count has changed.
           return verifyStreamStatus(streamName, StreamStatus.ACTIVE, StreamStatus.UPDATING)
-                 && updatedShardCount > originalShardCount;
+                 && updatedShardCount != originalShardCount;
         }, true,
         300, // higher value to avoid exceeding kinesis TPS limit
-        30,
+        100,
         "Kinesis stream resharding to start (or finished)"
     );
   }