From e97a14ae6f46ae401d133abceac403647fcdf46b Mon Sep 17 00:00:00 2001
From: Tsuyoshi Ozawa
Date: Thu, 16 Jul 2015 17:52:38 +0900
Subject: [PATCH] YARN-3805. Update the documentation of Disk Checker based on YARN-90. Contributed by Masatake Iwasaki.

(cherry picked from commit 1ba2986dee4bbb64d67ada005f8f132e69575274)
---
 hadoop-yarn-project/CHANGES.txt | 3 +++
 .../hadoop-yarn-site/src/site/markdown/NodeManager.md | 8 ++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index 933d0f11c14..a870ca53b8c 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -586,6 +586,9 @@ Release 2.8.0 - UNRELEASED
     YARN-3174. Consolidate the NodeManager and NodeManagerRestart documentation
     into one. (Masatake Iwasaki via ozawa)
 
+    YARN-3805. Update the documentation of Disk Checker based on YARN-90.
+    (Masatake Iwasaki via ozawa)
+
 Release 2.7.2 - UNRELEASED
 
   INCOMPATIBLE CHANGES
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
index 69e99a7796f..4724ea65e57 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
@@ -36,7 +36,9 @@ The NodeManager runs services to determine the health of the node it is executin
 
 ###Disk Checker
 
- The disk checker checks the state of the disks that the NodeManager is configured to use(local-dirs and log-dirs, configured using yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs respectively). The checks include permissions and free disk space. It also checks that the filesystem isn't in a read-only state. The checks are run at 2 minute intervals by default but can be configured to run as often as the user desires. If a disk fails the check, the NodeManager stops using that particular disk but still reports the node status as healthy. However if a number of disks fail the check(the number can be configured, as explained below), then the node is reported as unhealthy to the ResourceManager and new containers will not be assigned to the node. In addition, once a disk is marked as unhealthy, the NodeManager stops checking it to see if it has recovered(e.g. disk became full and was then cleaned up). The only way for the NodeManager to use that disk to restart the software on the node. The following configuration parameters can be used to modify the disk checks:
+ The disk checker checks the state of the disks that the NodeManager is configured to use(local-dirs and log-dirs, configured using yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs respectively). The checks include permissions and free disk space. It also checks that the filesystem isn't in a read-only state. The checks are run at 2 minute intervals by default but can be configured to run as often as the user desires. If a disk fails the check, the NodeManager stops using that particular disk but still reports the node status as healthy. However if a number of disks fail the check(the number can be configured, as explained below), then the node is reported as unhealthy to the ResourceManager and new containers will not be assigned to the node.
+
+The following configuration parameters can be used to modify the disk checks:
 
 | Configuration Name | Allowed Values | Description |
 |:---- |:---- |:---- |
@@ -48,7 +50,9 @@ The NodeManager runs services to determine the health of the node it is executin
 
 ###External Health Script
 
- Users may specify their own health checker script that will be invoked by the health checker service. Users may specify a timeout as well as options to be passed to the script. If the script exits with a non-zero exit code, times out or results in an exception being thrown, the node is marked as unhealthy. Please note that if the script cannot be executed due to permissions or an incorrect path, etc, then it counts as a failure and the node will be reported as unhealthy. Please note that speifying a health check script is not mandatory. If no script is specified, only the disk checker status will be used to determine the health of the node. The following configuration parameters can be used to set the health script:
+Users may specify their own health checker script that will be invoked by the health checker service. Users may specify a timeout as well as options to be passed to the script. If the script exits with a non-zero exit code, times out or results in an exception being thrown, the node is marked as unhealthy. Please note that if the script cannot be executed due to permissions or an incorrect path, etc, then it counts as a failure and the node will be reported as unhealthy. Please note that speifying a health check script is not mandatory. If no script is specified, only the disk checker status will be used to determine the health of the node.
+
+The following configuration parameters can be used to set the health script:
 
 | Configuration Name | Allowed Values | Description |
 |:---- |:---- |:---- |