2018-12-21 14:24:48 -05:00
[[cluster-fault-detection]]
=== Cluster fault detection

The elected master periodically checks each of the nodes in the cluster to
ensure that they are still connected and healthy. Each node in the cluster also
periodically checks the health of the elected master. These checks are known
respectively as _follower checks_ and _leader checks_.

Elasticsearch allows these checks to occasionally fail or time out without
taking any action. It considers a node to be faulty only after a number of
consecutive checks have failed. You can control fault detection behavior with
<<modules-discovery-settings,`cluster.fault_detection.*` settings>>.
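
As a rough illustration of the consecutive-failure rule, the following Python
sketch marks a node as faulty only once enough checks have failed in a row. The
class `ConsecutiveFailureTracker` and its `retry_count` parameter are
hypothetical names chosen for this example. They mirror the idea behind the
retry settings but are not Elasticsearch's implementation.

```python
class ConsecutiveFailureTracker:
    """Hypothetical helper: counts consecutive failed checks for one node."""

    def __init__(self, retry_count=3):
        # retry_count is an assumed threshold for this sketch.
        self.retry_count = retry_count
        self.failures = 0

    def record(self, check_succeeded):
        """Record one check result; return True if the node now counts as faulty."""
        if check_succeeded:
            # Any successful check resets the run of failures.
            self.failures = 0
        else:
            # A failed or timed-out check extends the run.
            self.failures += 1
        return self.failures >= self.retry_count


tracker = ConsecutiveFailureTracker(retry_count=3)
# Two failures, one success, then three failures in a row.
results = [tracker.record(ok) for ok in [False, False, True, False, False, False]]
```

In this sketch, the first two failures are forgiven because a successful check
follows them. Only the final entry in `results` is `True`, since that is the
first point at which three checks have failed consecutively.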

If the elected master detects that a node has disconnected, however, this
situation is treated as an immediate failure. The master bypasses the timeout
and retry setting values and attempts to remove the node from the cluster.
Similarly, if a node detects that the elected master has disconnected, this
situation is treated as an immediate failure. The node bypasses the timeout and
retry settings and restarts its discovery phase to try to find or elect a new
master.
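
The difference between a failed check and a disconnection can be sketched as
follows. This is a simplified Python model under stated assumptions: the
`Outcome` enum, the `first_faulty_index` function, and the `retry_count`
default are all hypothetical and exist only to illustrate the behavior
described above.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    FAILED = "failed"              # the check failed or timed out
    DISCONNECTED = "disconnected"  # the connection to the peer was lost


def first_faulty_index(outcomes, retry_count=3):
    """Return the index at which the peer is declared faulty, or None.

    retry_count is an assumed threshold for this sketch.
    """
    consecutive = 0
    for i, outcome in enumerate(outcomes):
        if outcome is Outcome.DISCONNECTED:
            # A disconnection bypasses the retry counting entirely.
            return i
        if outcome is Outcome.FAILED:
            consecutive += 1
            if consecutive >= retry_count:
                return i
        else:
            # A successful check resets the run of failures.
            consecutive = 0
    return None


slow = first_faulty_index([Outcome.FAILED, Outcome.FAILED, Outcome.FAILED])
fast = first_faulty_index([Outcome.OK, Outcome.DISCONNECTED])
healthy = first_faulty_index([Outcome.FAILED, Outcome.OK, Outcome.FAILED])
```

Here `slow` is reached only on the third consecutive failure, while `fast` is
reached at the first disconnection. What happens next depends on which side
detected the fault: the master attempts to remove the node, whereas a node
restarts its discovery phase.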