Ensure cluster is stable in ShrinkIndexIT.testShrinkThenSplitWithFailedNode (#44860)

The test ShrinkIndexIT.testShrinkThenSplitWithFailedNode sometimes fails 
because the resize operation is not acknowledged (see #44736). This resize 
operation creates a new index "splitagain" and it results in a cluster state 
update (TransportResizeAction uses MetaDataCreateIndexService.createIndex() 
to create the resized index). This cluster state update is expected to be 
acknowledged by all nodes (see IndexCreationTask.onAllNodesAcked()) but 
this is not always true: the data node that was just stopped in the test before
executing the resize operation might still be considered a "faulty" node
(and not yet removed from the cluster nodes) by the FollowersChecker. The
cluster state is then acked by all nodes but one, which results in an
unacknowledged resize operation.

This commit adds an ensureStableCluster() check after stopping the node in 
the test. The goal is to ensure that the data node has been correctly removed 
from the cluster and that all nodes are fully connected to each other before moving
forward with the resize operation.

Closes #44736
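
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class AckSketch and the method isAcknowledged are hypothetical names invented for illustration and are not Elasticsearch APIs. It models an update as acknowledged only when every node still listed in the cluster state has responded.

```java
import java.util.Set;

// Toy model only: these names are hypothetical and are not Elasticsearch APIs.
public class AckSketch {

    /** An update counts as acknowledged only if every node still listed in the cluster state responded. */
    static boolean isAcknowledged(Set<String> nodesInClusterState, Set<String> respondingNodes) {
        return respondingNodes.containsAll(nodesInClusterState);
    }

    public static void main(String[] args) {
        // Nodes that are actually running after one data node was stopped.
        Set<String> live = Set.of("master", "data-1");

        // Before the stopped node is removed, the cluster state still lists it,
        // so its acknowledgement is awaited and never arrives.
        Set<String> staleState = Set.of("master", "data-1", "data-2");
        System.out.println(isAcknowledged(staleState, live)); // false

        // Once the stopped node has been removed, the same live nodes are sufficient.
        Set<String> stableState = Set.of("master", "data-1");
        System.out.println(isAcknowledged(stableState, live)); // true
    }
}
```

In this simplified picture, waiting for the cluster to reach the expected size before issuing the resize corresponds to moving from the first case to the second.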
Tanguy Leroux 2019-07-26 10:12:59 +02:00
parent 6ea2b5dec0
commit 8848fcfb22
1 changed files with 2 additions and 0 deletions

@@ -580,7 +580,9 @@ public class ShrinkIndexIT extends ESIntegTestCase {
.build()).setResizeType(ResizeType.SHRINK).get());
ensureGreen();
final int nodeCount = cluster().size();
internalCluster().stopRandomNode(InternalTestCluster.nameFilter(shrinkNode));
ensureStableCluster(nodeCount - 1);
// demonstrate that the index.routing.allocation.initial_recovery setting from the shrink doesn't carry over into the split index,
// because this would cause the shrink to fail as the initial_recovery node is no longer present.