f7e2984248
This causes a leader that is stuck on IO during publication to step down, which eventually triggers a leader election. Issue Description --- The publication of cluster state is time-bound to 30s by the cluster.publish.timeout setting. If this timeout is reached before the new cluster state is committed, the cluster state change is rejected and the leader considers itself to have failed. It stands down and starts trying to elect a new master. There is a bug in the leader: when it publishes a new cluster state, it first acquires a lock so that it can flush the new state to disk under a mutex. The same lock is used to cancel the publication on timeout. Below is the state of the timeout scheduler meant to cancel the publication. So if flushing the cluster state is stuck on IO, the cancellation of the publication is stuck too, since both share the same mutex. The leader therefore does not step down and effectively blocks the cluster from making progress. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> |
||
---|---|---|
.. | ||
licenses | ||
src | ||
build.gradle |
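
The lock contention described in the commit message can be reproduced in isolation. The following is a minimal sketch and not the actual coordinator code: the class `PublicationLockSketch`, the method `cancelRuns`, and the use of a second lock for the cancellation path are all hypothetical names and choices made for illustration. The commit message does not say how the fix is implemented, so the "separate lock" branch only shows that cancellation can proceed once it no longer waits on the mutex held by the flush.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names come from the repository.
public class PublicationLockSketch {

    /**
     * Simulates a publication whose disk flush is stuck while holding a mutex.
     * Returns true if the timeout handler managed to run its cancellation
     * within waitMillis while the flush was still stuck.
     */
    public static boolean cancelRuns(boolean sharedLock, long waitMillis)
            throws InterruptedException {
        ReentrantLock mutex = new ReentrantLock();
        // Assumption: the cancellation path either reuses the flush mutex
        // (the reported bug) or uses an independent lock.
        ReentrantLock cancelLock = sharedLock ? mutex : new ReentrantLock();

        CountDownLatch flushStarted = new CountDownLatch(1);
        CountDownLatch releaseDisk = new CountDownLatch(1);
        CountDownLatch cancelled = new CountDownLatch(1);

        Thread publisher = new Thread(() -> {
            mutex.lock();
            try {
                flushStarted.countDown();
                releaseDisk.await(); // stands in for a flush stuck on IO
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                mutex.unlock();
            }
        });

        Thread timeoutHandler = new Thread(() -> {
            cancelLock.lock(); // blocks for as long as the flush holds this lock
            try {
                cancelled.countDown(); // stands in for cancelling the publication
            } finally {
                cancelLock.unlock();
            }
        });

        publisher.start();
        flushStarted.await(); // make sure the flush already holds the mutex
        timeoutHandler.start();

        boolean ran = cancelled.await(waitMillis, TimeUnit.MILLISECONDS);

        // Unblock the simulated IO so both threads can finish cleanly.
        releaseDisk.countDown();
        publisher.join();
        timeoutHandler.join();
        return ran;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared mutex, cancel ran in time: " + cancelRuns(true, 200));
        System.out.println("separate lock, cancel ran in time: " + cancelRuns(false, 200));
    }
}
```

With the shared mutex the cancellation cannot run until the simulated IO is released, which mirrors the leader failing to step down. With an independent lock the cancellation is expected to run right away, even though the flush is still blocked.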