ClusterHealth shouldn't fail with "unexpected failure" if master steps down while waiting for events
In order to wait for events of a certain priority to pass, TransportClusterHealthAction submits a cluster state update task. If the current master steps down while this task is in the queue, the task fails, causing ClusterHealth to report an unexpected error. We often use this request to ensure cluster stability in tests after a disruption. However, depending on the nature of the failure, two master election rounds may be needed (if we're unfortunate). The above issue causes the get health request to fail after the first round. Instead, we should try to wait for a new master to be elected (or for the local node to be re-elected).

Closes #11493
parent 6ab3141021
commit 129d8ec29a
@@ -84,6 +84,12 @@ public class TransportClusterHealthAction extends TransportMasterNodeReadAction<
                     executeHealth(request, listener);
                 }

+                @Override
+                public void onNoLongerMaster(String source) {
+                    logger.trace("stopped being master while waiting for events with priority [{}]. retrying.", request.waitForEvents());
+                    doExecute(request, listener);
+                }
+
                 @Override
                 public void onFailure(String source, Throwable t) {
                     logger.error("unexpected failure during [{}]", t, source);
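The retry behavior added in this hunk can be illustrated with a small standalone model. This is only a sketch and not Elasticsearch code: `ToyClusterService`, `HealthRequest`, `Task`, and `simulate` are hypothetical names invented for illustration, and the toy queue assumes a step-down simply hands the task back via `onNoLongerMaster` rather than modeling real master election or the master-node action machinery behind `doExecute`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of the retry-on-master-loss pattern. */
public class RetryOnMasterLossSketch {

    /** Callbacks a queued task receives; names echo the diff, the interface is illustrative. */
    interface Task {
        void execute();
        void onNoLongerMaster(String source);
    }

    /** Toy task queue that "loses mastership" a fixed number of times before running tasks. */
    static class ToyClusterService {
        private final Deque<Task> queue = new ArrayDeque<>();
        private int stepDownsRemaining;

        ToyClusterService(int stepDowns) {
            this.stepDownsRemaining = stepDowns;
        }

        void submit(Task task) {
            queue.add(task);
        }

        /** Drains the queue; each remaining step-down rejects one task instead of running it. */
        void drain() {
            while (!queue.isEmpty()) {
                Task task = queue.poll();
                if (stepDownsRemaining > 0) {
                    stepDownsRemaining--;
                    task.onNoLongerMaster("toy-source");
                } else {
                    task.execute();
                }
            }
        }
    }

    /** Resubmits itself when the master steps down instead of reporting a failure. */
    static class HealthRequest implements Task {
        private final ToyClusterService service;
        int attempts = 0;
        boolean completed = false;

        HealthRequest(ToyClusterService service) {
            this.service = service;
        }

        void doExecute() {
            attempts++;
            service.submit(this);
        }

        @Override
        public void execute() {
            completed = true;
        }

        @Override
        public void onNoLongerMaster(String source) {
            // Retry rather than fail, mirroring the intent of the change above.
            doExecute();
        }
    }

    /** Runs one request against a service that steps down the given number of times. */
    static HealthRequest simulate(int stepDowns) {
        ToyClusterService service = new ToyClusterService(stepDowns);
        HealthRequest request = new HealthRequest(service);
        request.doExecute();
        service.drain();
        return request;
    }

    public static void main(String[] args) {
        HealthRequest request = simulate(2);
        System.out.println("attempts=" + request.attempts + " completed=" + request.completed);
    }
}
```

With two simulated step-downs, the request is submitted three times and still completes, which corresponds to the two-election-round case described in the commit message.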