[[cluster-fault-detection]]
=== Cluster fault detection

The elected master periodically checks each of the nodes in the cluster to
ensure that they are still connected and healthy. Each node in the cluster also
periodically checks the health of the elected master. These checks are known
respectively as _follower checks_ and _leader checks_.
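
The direction of each check can be sketched in a few lines of Python. This is a minimal illustration of who checks whom in one round, not OpenSearch's implementation. The helper `check_pairs` and the node names are hypothetical.

```python
def check_pairs(nodes, master):
    """Hypothetical helper: return (sender, receiver, kind) tuples for one
    round of checks, given node names and the elected master's name."""
    pairs = []
    for node in nodes:
        if node == master:
            continue  # the elected master does not check itself
        # The elected master sends a follower check to each other node...
        pairs.append((master, node, "follower_check"))
        # ...and each other node sends a leader check to the elected master.
        pairs.append((node, master, "leader_check"))
    return pairs


for sender, receiver, kind in check_pairs(["node-1", "node-2", "node-3"], "node-1"):
    print(f"{sender} -> {receiver}: {kind}")
```

Each non-master node therefore takes part in exactly two checks per round: one as the target of a follower check and one as the sender of a leader check.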

Elasticsearch allows these checks to occasionally fail or time out without
taking any action. It considers a node to be faulty only after a configurable
number of consecutive checks have failed. You can control fault detection behavior with
<<modules-discovery-settings,`cluster.fault_detection.*` settings>>.
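
The retry behavior amounts to counting consecutive failures and resetting the count whenever a check succeeds. The following Python sketch illustrates the idea under that assumption. The class `ConsecutiveFailureDetector` is hypothetical, and its `retry_count` parameter only mirrors the role of the `cluster.fault_detection.leader_check.retry_count` and `cluster.fault_detection.follower_check.retry_count` settings. It does not reproduce their implementation.

```python
class ConsecutiveFailureDetector:
    """Hypothetical sketch: flag a peer as faulty only after
    `retry_count` consecutive failed checks."""

    def __init__(self, retry_count):
        self.retry_count = retry_count
        self.consecutive_failures = 0

    def record(self, check_succeeded):
        """Record one check result and return True if the peer is now faulty."""
        if check_succeeded:
            # Any success resets the count, so isolated failures are tolerated.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.retry_count


detector = ConsecutiveFailureDetector(retry_count=3)
results = [False, False, True, False, False, False]
print([detector.record(r) for r in results])
# [False, False, False, False, False, True]
```

The two early failures do not mark the peer as faulty because the success in between resets the counter. Only the final run of three consecutive failures does.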

If the elected master detects that a node has disconnected, however, it treats
this as an immediate failure. The master bypasses the timeout and retry
settings and attempts to remove the node from the cluster. Similarly, if a node
detects that the elected master has disconnected, it treats this as an
immediate failure. The node bypasses the timeout and retry settings and
restarts its discovery phase to try to find or elect a new master.
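
A disconnection can be modeled as a separate check outcome that skips the retry counter entirely. In this hypothetical Python sketch, `CheckOutcome` and `FaultDetector` are illustrative names, and `retry_count` only stands in for the retry settings. Ordinary failures accumulate toward the threshold, while a disconnection is reported as faulty at once.

```python
from enum import Enum


class CheckOutcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"            # e.g. the check timed out or was rejected
    DISCONNECTED = "disconnected"  # the connection to the peer was lost


class FaultDetector:
    """Hypothetical sketch: ordinary failures are retried, but a
    disconnection is treated as an immediate failure."""

    def __init__(self, retry_count):
        self.retry_count = retry_count
        self.consecutive_failures = 0

    def is_faulty_after(self, outcome):
        if outcome is CheckOutcome.DISCONNECTED:
            # Bypass the timeout and retry settings entirely.
            return True
        if outcome is CheckOutcome.SUCCESS:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.retry_count


detector = FaultDetector(retry_count=3)
print(detector.is_faulty_after(CheckOutcome.FAILURE))       # False: one failure is tolerated
print(detector.is_faulty_after(CheckOutcome.DISCONNECTED))  # True: disconnection fails at once
```

On the elected master, a `True` result corresponds to attempting to remove the node. On any other node, it corresponds to restarting discovery.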

[[cluster-fault-detection-filesystem-health]]
Additionally, each node periodically verifies that its data path is healthy by
writing a small file to disk and then deleting it again. If a node discovers
that its data path is unhealthy, it is removed from the cluster until the data
path recovers. You can control this behavior with the
<<modules-discovery-settings,`monitor.fs.health` settings>>.
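
The write-then-delete probe can be approximated in Python. In this sketch, the file is written into the data path, forced to disk with `os.fsync`, and then removed. This is a minimal sketch under stated assumptions, not OpenSearch's implementation: the function name `data_path_is_healthy`, the probe file name `.fs_health_probe`, and the choice to treat any `OSError` as unhealthy are all hypothetical.

```python
import os
import tempfile


def data_path_is_healthy(data_path):
    """Hypothetical sketch of a data path probe: write a small file,
    force it to disk, then delete it. Any I/O error counts as unhealthy."""
    probe = os.path.join(data_path, ".fs_health_probe")  # hypothetical file name
    try:
        with open(probe, "wb") as f:
            f.write(b"\x00")
            f.flush()
            os.fsync(f.fileno())  # make sure the write actually reaches the disk
        os.remove(probe)
        return True
    except OSError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    print(data_path_is_healthy(tmp))                               # True
    print(data_path_is_healthy(os.path.join(tmp, "missing-dir")))  # False
```

A real monitor would run such a probe on a schedule. In OpenSearch, the `monitor.fs.health` settings govern that behavior.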