From 8f4f844e6ec9ffa996a337fba2880e26b6ef52bf Mon Sep 17 00:00:00 2001
From: David Turner
Date: Tue, 7 Jul 2020 14:14:35 +0100
Subject: [PATCH] Add docs for filesystem health checks (#59134)

Documents the feature and settings introduced in #52680.

Co-authored-by: James Rodewig
---
 .../discovery/discovery-settings.asciidoc     | 19 +++++++++++++++++++
 .../discovery/fault-detection.asciidoc        |  7 +++++++
 2 files changed, 26 insertions(+)

diff --git a/docs/reference/modules/discovery/discovery-settings.asciidoc b/docs/reference/modules/discovery/discovery-settings.asciidoc
index f0ce103ebb3..f7f3b899293 100644
--- a/docs/reference/modules/discovery/discovery-settings.asciidoc
+++ b/docs/reference/modules/discovery/discovery-settings.asciidoc
@@ -245,3 +245,22 @@ WARNING: This setting replaces the `discovery.zen.no_master_block` setting in
 earlier versions. The `discovery.zen.no_master_block` setting is ignored.
 --
+
+`monitor.fs.health.enabled`::
+
+ (<>, boolean) If `true`, the node runs
+ periodic <>. Defaults to `true`.
+
+`monitor.fs.health.refresh_interval`::
+
+ (<>) Interval between successive
+ <>.
+ Defaults to `2m`.
+
+`monitor.fs.health.slow_path_logging_threshold`::
+
+ (<>) If a
+ <>
+ takes longer than this threshold then {es} logs a warning. Defaults to
+ `5s`.
diff --git a/docs/reference/modules/discovery/fault-detection.asciidoc b/docs/reference/modules/discovery/fault-detection.asciidoc
index 9062444b80d..56b5bc32a75 100644
--- a/docs/reference/modules/discovery/fault-detection.asciidoc
+++ b/docs/reference/modules/discovery/fault-detection.asciidoc
@@ -18,3 +18,10 @@ Similarly, if a node detects that the elected master has disconnected, this
 situation is treated as an immediate failure. The node bypasses the timeout and
 retry settings and restarts its discovery phase to try and find or elect a new
 master.
+
+[[cluster-fault-detection-filesystem-health]]
+Additionally, each node periodically verifies that its data path is healthy by
+writing a small file to disk and then deleting it again. If a node discovers
+its data path is unhealthy then it is removed from the cluster until the data
+path recovers. You can control this behavior with the
+<>.
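
Reviewer note: the write-then-delete check described in the added paragraph can be pictured roughly as follows. This is only an illustrative sketch, not Elasticsearch's actual implementation. The function name `check_path_health`, the temp-file prefix, and the returned tuple are all hypothetical. The 5-second default merely mirrors the documented default of `monitor.fs.health.slow_path_logging_threshold`.

```python
import os
import tempfile
import time


def check_path_health(data_path, slow_threshold_s=5.0):
    """Probe a data path by writing a small file and deleting it again.

    Hypothetical helper for illustration only.
    Returns a tuple (healthy, elapsed_seconds, slow).
    """
    start = time.monotonic()
    try:
        # Create a small temporary file directly inside the data path.
        fd, tmp_name = tempfile.mkstemp(prefix=".health_check_", dir=data_path)
        try:
            os.write(fd, b"healthcheck")
            os.fsync(fd)  # force the bytes to disk so a failing disk is noticed
        finally:
            os.close(fd)
            os.remove(tmp_name)  # delete the probe file again
    except OSError:
        # Any I/O failure means the path is treated as unhealthy.
        return False, time.monotonic() - start, False
    elapsed = time.monotonic() - start
    # A check that succeeds but exceeds the threshold would warrant a warning.
    return True, elapsed, elapsed > slow_threshold_s
```

In this sketch a caller would invoke the function once per refresh interval (the patch documents a default of `2m`), log a warning when `slow` is true, and treat `healthy == False` as the signal that the node should leave the cluster until a later check succeeds.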