diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index bb3087ff30e..87866f0a3d7 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -101,6 +101,7 @@
+
diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index e900c7a53f1..00d33c14549 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -92,6 +92,9 @@ Release 2.4.1 - UNRELEASED

     YARN-1892. Improved some logs in the scheduler. (Jian He via zjshen)

+    YARN-1696. Added documentation for ResourceManager fail-over. (Karthik
+    Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA via vinodkv)
+
   OPTIMIZATIONS

   BUG FIXES
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
new file mode 100644
index 00000000000..7da3f351b39
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
@@ -0,0 +1,234 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  ResourceManager High Availability
+  ---
+  ---
+  ${maven.build.timestamp}
+
+ResourceManager High Availability
+
+  \[ {{{./index.html}Go Back}} \]
+
+%{toc|section=1|fromDepth=0}
+
+* Introduction
+
+  This guide provides an overview of High Availability of YARN's
+  ResourceManager, and details how to configure and use this feature.
+  The ResourceManager (RM) is responsible for tracking the resources in a
+  cluster and scheduling applications (e.g., MapReduce jobs). Prior to Hadoop
+  2.4, the ResourceManager was the single point of failure in a YARN cluster.
+  The High Availability feature adds redundancy in the form of an
+  Active/Standby ResourceManager pair to remove this otherwise single point
+  of failure.
+
+* Architecture
+
+[images/rm-ha-overview.png] Overview of ResourceManager High Availability
+
+** RM Failover
+
+  ResourceManager HA is realized through an Active/Standby architecture - at
+  any point of time, one of the RMs is Active, and one or more RMs are in
+  Standby mode waiting to take over should anything happen to the Active.
+  The trigger to transition to Active comes either from the admin (through
+  the CLI) or from the integrated failover-controller when automatic failover
+  is enabled.
+
+*** Manual transitions and failover
+
+  When automatic failover is not enabled, admins have to manually transition
+  one of the RMs to Active. To fail over from one RM to the other, they are
+  expected to first transition the Active RM to Standby and then transition a
+  Standby RM to Active. All of this can be done using the <<<yarn rmadmin>>>
+  CLI.
+
+*** Automatic failover
+
+  The RMs have an option to embed the ZooKeeper-based ActiveStandbyElector to
+  decide which RM should be the Active. When the Active goes down or becomes
+  unresponsive, another RM is automatically elected to be the Active and
+  takes over. Note that there is no need to run a separate ZKFC daemon as is
+  the case for HDFS, because the ActiveStandbyElector embedded in the RMs
+  acts as both a failure detector and a leader elector.
+
+*** Client, ApplicationMaster and NodeManager on RM failover
+
+  When there are multiple RMs, the configuration (yarn-site.xml) used by
+  clients and nodes is expected to list all the RMs.
+  Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to
+  the RMs in a round-robin fashion until they hit the Active RM. If the
+  Active goes down, they resume the round-robin polling until they hit the
+  "new" Active. This default retry logic is implemented as
+  <<<ConfiguredRMFailoverProxyProvider>>>. You can override the logic by
+  implementing <<<org.apache.hadoop.yarn.client.RMFailoverProxyProvider>>>
+  and setting the value of <<<yarn.client.failover-proxy-provider>>> to the
+  class name.
+
+** Recovering previous active-RM's state
+
+  With {{{./ResourceManagerRestart.html}ResourceManager Restart}} enabled,
+  the RM being promoted to an active state loads the RM internal state and
+  continues to operate from where the previous active left off as much as
+  possible, depending on the RM restart feature. A new attempt is spawned for
+  each managed application previously submitted to the RM. Applications can
+  checkpoint periodically to avoid losing any work. The state-store must be
+  visible from both the Active and Standby RMs. Currently, there are two
+  RMStateStore implementations for persistence - FileSystemRMStateStore and
+  ZKRMStateStore. The <<<ZKRMStateStore>>> implicitly allows write access to
+  a single RM at any point in time, and hence is the recommended store to use
+  in an HA cluster. When using the ZKRMStateStore, there is no need for a
+  separate fencing mechanism to address a potential split-brain situation
+  where multiple RMs can potentially assume the Active role.
+
+* Deployment
+
+** Configurations
+
+  Most of the failover functionality is tunable using various configuration
+  properties. Following is a list of required/important ones. See
+  {{{../hadoop-yarn-common/yarn-default.xml}yarn-default.xml}} for the full
+  list of knobs, including default values. Also see
+  {{{./ResourceManagerRestart.html}the document for ResourceManager Restart}}
+  for instructions on setting up the state-store.
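+
+  For example, with the recommended ZKRMStateStore, a minimal recovery setup
+  in yarn-site.xml could look like the following (the values shown are
+  illustrative; the ResourceManager Restart document covers the state-store
+  setup in detail):
+
++---+
+<property>
+  <name>yarn.resourcemanager.recovery.enabled</name>
+  <value>true</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.store.class</name>
+  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
+</property>
++---+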
+
+*-------------------------+----------------------------------------------+
+|| Configuration Property || Description |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.zk-address | |
+| | Address of the ZK-quorum. |
+| | Used both for the state-store and embedded leader-election. |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.ha.enabled | |
+| | Enable RM HA. |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.ha.rm-ids | |
+| | List of logical IDs for the RMs, |
+| | e.g., "rm1,rm2". |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.hostname.<rm-id> | |
+| | For each <rm-id>, specify the hostname the |
+| | RM corresponds to. Alternately, one could set each of the RM's service |
+| | addresses. |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.ha.id | |
+| | Identifies the RM in the ensemble. This is optional; |
+| | however, if set, admins have to ensure that all the RMs have their own |
+| | IDs in the config. |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.ha.automatic-failover.enabled | |
+| | Enable automatic failover; |
+| | By default, it is enabled only when HA is enabled. |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.ha.automatic-failover.embedded | |
+| | Use the embedded leader-elector |
+| | to pick the Active RM, when automatic failover is enabled. By default, |
+| | it is enabled only when HA is enabled. |
+*-------------------------+----------------------------------------------+
+| yarn.resourcemanager.cluster-id | |
+| | Identifies the cluster. Used by the elector to |
+| | ensure an RM doesn't take over as Active for another cluster. |
+*-------------------------+----------------------------------------------+
+| yarn.client.failover-proxy-provider | |
+| | The class to be used by Clients, AMs and NMs to failover to the Active RM. |
+*-------------------------+----------------------------------------------+
+| yarn.client.failover-max-attempts | |
+| | The max number of times FailoverProxyProvider should attempt failover. |
+*-------------------------+----------------------------------------------+
+| yarn.client.failover-sleep-base-ms | |
+| | The sleep base (in milliseconds) to be used for calculating |
+| | the exponential delay between failovers. |
+*-------------------------+----------------------------------------------+
+| yarn.client.failover-sleep-max-ms | |
+| | The maximum sleep time (in milliseconds) between failovers. |
+*-------------------------+----------------------------------------------+
+| yarn.client.failover-retries | |
+| | The number of retries per attempt to connect to a ResourceManager. |
+*-------------------------+----------------------------------------------+
+| yarn.client.failover-retries-on-socket-timeouts | |
+| | The number of retries per attempt to connect to a ResourceManager on socket timeouts. |
+*-------------------------+----------------------------------------------+
+
+*** Sample configurations
+
+  Here is a sample of a minimal setup for RM failover.
+
++---+
+<property>
+  <name>yarn.resourcemanager.ha.enabled</name>
+  <value>true</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.cluster-id</name>
+  <value>cluster1</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.ha.rm-ids</name>
+  <value>rm1,rm2</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.hostname.rm1</name>
+  <value>master1</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.hostname.rm2</name>
+  <value>master2</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.zk-address</name>
+  <value>zk1:2181,zk2:2181,zk3:2181</value>
+</property>
++---+
+
+** Admin commands
+
+  <<<yarn rmadmin>>> has a few HA-specific command options to check the
+  health/state of an RM, and transition to Active/Standby. Commands for HA
+  take the service id of the RM, set by <<<yarn.resourcemanager.ha.rm-ids>>>,
+  as an argument.
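+
+  As an illustrative sketch, assuming automatic failover is disabled, a
+  manual failover could be driven with the transition options, for example:
+
++---+
+ $ yarn rmadmin -transitionToStandby rm1
+ $ yarn rmadmin -transitionToActive rm2
++---+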
+
++---+
+ $ yarn rmadmin -getServiceState rm1
+ active
+
+ $ yarn rmadmin -getServiceState rm2
+ standby
++---+
+
+  If automatic failover is enabled, you cannot use the manual transition
+  commands.
+
++---+
+ $ yarn rmadmin -transitionToStandby rm1
+ Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
+ Refusing to manually manage HA state, since it may cause
+ a split-brain scenario or other incorrect state.
+ If you are very sure you know what you are doing, please
+ specify the forcemanual flag.
++---+
+
+  See {{{./YarnCommands.html}YarnCommands}} for more details.
+
+** ResourceManager Web UI services
+
+  Assuming a standby RM is up and running, the Standby automatically
+  redirects all web requests to the Active, except for the "About" page.
+
+** Web Services
+
+  Assuming a standby RM is up and running, RM web-services described at
+  {{{./ResourceManagerRest.html}ResourceManager REST APIs}} when invoked on
+  a standby RM are automatically redirected to the Active RM.
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png
new file mode 100644
index 00000000000..e5ac65eb1b8
Binary files /dev/null and b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png differ