ClusterHealth shouldn't fail with "unexpected failure" if master steps down while waiting for events

To wait for events of a certain priority to pass, TransportClusterHealthAction submits a cluster state update task. If the current master steps down while this task is in the queue, the task fails, causing the ClusterHealth request to report an unexpected error.

We often use this request to ensure cluster stability in tests after a disruption. However, depending on the nature of the failure, two master election rounds may be needed (if we're unfortunate). The above issue causes the get health request to fail after the first round. Instead, we should wait for a new master to be elected (or for the local node to be re-elected) and retry.
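
The retry behaviour described above can be sketched in isolation. This is a toy model, not the Elasticsearch implementation: `ToyCluster`, `Outcome`, `TaskListener` and `HealthRetrySketch` are hypothetical names invented for illustration, and only the callback names `onNoLongerMaster` and `onFailure` mirror the diff in this commit. The sketch assumes a task whose listener is told either that it was processed, that the master stepped down, or that it failed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HealthRetrySketch {

    /** Hypothetical stand-in for the callbacks of a cluster state update task. */
    interface TaskListener {
        void onProcessed();
        void onNoLongerMaster(String source);
        void onFailure(String source, Throwable t);
    }

    /** What happens to the next submitted task in the toy model. */
    enum Outcome { PROCESSED, MASTER_STEPPED_DOWN, FAILED }

    /** Hypothetical toy cluster that replays a scripted list of outcomes. */
    static final class ToyCluster {
        private final Deque<Outcome> script = new ArrayDeque<>();
        int submissions = 0;

        ToyCluster(Outcome... outcomes) {
            for (Outcome o : outcomes) {
                script.add(o);
            }
        }

        void submit(String source, TaskListener listener) {
            submissions++;
            // Once the script is exhausted, every task is processed normally.
            Outcome next = script.isEmpty() ? Outcome.PROCESSED : script.poll();
            switch (next) {
                case PROCESSED:
                    listener.onProcessed();
                    break;
                case MASTER_STEPPED_DOWN:
                    listener.onNoLongerMaster(source);
                    break;
                default:
                    listener.onFailure(source, new IllegalStateException("simulated failure"));
            }
        }
    }

    /** Runs the health check, retrying whenever the master steps down. */
    static String executeHealth(ToyCluster cluster) {
        final String[] result = new String[1];
        cluster.submit("cluster_health", new TaskListener() {
            @Override
            public void onProcessed() {
                result[0] = "ok";
            }

            @Override
            public void onNoLongerMaster(String source) {
                // Not an error: start over and wait for the next master.
                // Note: this toy version retries without any bound or timeout.
                result[0] = executeHealth(cluster);
            }

            @Override
            public void onFailure(String source, Throwable t) {
                result[0] = "error: " + t.getMessage();
            }
        });
        return result[0];
    }

    public static void main(String[] args) {
        // Two election rounds: the master steps down twice before the task is processed.
        ToyCluster cluster = new ToyCluster(Outcome.MASTER_STEPPED_DOWN, Outcome.MASTER_STEPPED_DOWN);
        System.out.println(executeHealth(cluster) + " after " + cluster.submissions + " submissions");
        // prints "ok after 3 submissions"
    }
}
```

Without the `onNoLongerMaster` branch, the first step-down would fall through to the failure path, which is the behaviour this commit removes.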

Closes #11493
Boaz Leskes 2015-06-04 13:07:16 +02:00
parent 6ab3141021
commit 129d8ec29a
1 changed files with 6 additions and 0 deletions

@@ -84,6 +84,12 @@ public class TransportClusterHealthAction extends TransportMasterNodeReadAction<
                     executeHealth(request, listener);
                 }
 
+                @Override
+                public void onNoLongerMaster(String source) {
+                    logger.trace("stopped being master while waiting for events with priority [{}]. retrying.", request.waitForEvents());
+                    doExecute(request, listener);
+                }
+
                 @Override
                 public void onFailure(String source, Throwable t) {
                     logger.error("unexpected failure during [{}]", t, source);