YARN-1017. Added documentation for ResourceManager Restart. (jianhe)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1582913 13f79535-47bb-0310-9956-ffa450edef68
commit 72051fc52c
parent 86ef362a7d
@@ -98,6 +98,7 @@
     <item name="YARN Architecture" href="hadoop-yarn/hadoop-yarn-site/YARN.html"/>
     <item name="Capacity Scheduler" href="hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html"/>
     <item name="Fair Scheduler" href="hadoop-yarn/hadoop-yarn-site/FairScheduler.html"/>
+    <item name="ResourceManager Restart" href="hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html"/>
     <item name="Web Application Proxy" href="hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html"/>
     <item name="YARN Timeline Server" href="hadoop-yarn/hadoop-yarn-site/TimelineServer.html"/>
     <item name="Writing YARN Applications" href="hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html"/>
@@ -335,6 +335,8 @@ Release 2.4.0 - UNRELEASED
     YARN-1891. Added documentation for NodeManager health-monitoring. (Varun
     Vasudev via vinodkv)
 
+    YARN-1017. Added documentation for ResourceManager Restart. (jianhe)
+
   OPTIMIZATIONS
 
     YARN-1771. Reduce the number of NameNode operations during localization of
@@ -0,0 +1,208 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  ResourceManager Restart
  ---
  ---
  ${maven.build.timestamp}

ResourceManager Restart

%{toc|section=1|fromDepth=0}

* {Overview}

  ResourceManager is the central authority that manages resources and schedules
  applications running on top of YARN. Hence, it is potentially a single point
  of failure in an Apache YARN cluster.

  This document gives an overview of ResourceManager Restart, a feature that
  enhances ResourceManager to keep functioning across restarts and also makes
  ResourceManager down-time invisible to end-users.

  The ResourceManager Restart feature is divided into two phases:

  ResourceManager Restart Phase 1: Enhance RM to persist application/attempt
  state and other credential information in a pluggable state-store. RM will
  reload this information from the state-store upon restart and restart the
  previously running applications. Users are not required to re-submit the
  applications.

  ResourceManager Restart Phase 2: Focus on re-constructing the running state of
  ResourceManager by reading back the container statuses from NodeManagers and
  container requests from ApplicationMasters upon restart. The key difference
  from Phase 1 is that previously running applications will not be killed after
  RM restarts, so applications will not lose their work because of an RM outage.

  As of the Hadoop 2.4.0 release, only ResourceManager Restart Phase 1 is
  implemented, which is described below.

* {Feature}

  The overall concept is that RM persists the application metadata
  (i.e. ApplicationSubmissionContext) in a pluggable state-store when the client
  submits an application, and also saves the final status of the application,
  such as the completion state (failed, killed, finished) and diagnostics, when
  the application completes. In addition, RM saves credentials such as security
  keys and tokens in order to work in a secure environment. Any time RM shuts
  down, as long as the required information (i.e. the application metadata and
  the accompanying credentials if running in a secure environment) is available
  in the state-store, then when RM restarts it can pick up the application
  metadata from the state-store and re-submit the application. RM will not
  re-submit applications that had already completed (i.e. failed, killed,
  finished) before RM went down.

  NodeManagers and clients keep polling RM during RM's down-time until RM comes
  up. When RM becomes alive, it sends a re-sync command to all the NodeManagers
  and ApplicationMasters it was talking to via heartbeats. Today, the behavior
  of NodeManagers and ApplicationMasters when handling this command is as
  follows: NMs kill all their managed containers and re-register with RM. From
  the RM's perspective, these re-registered NodeManagers are similar to newly
  joining NMs. AMs (e.g. the MapReduce AM) are today expected to shut down when
  they receive the re-sync command. After RM restarts, loads all the application
  metadata and credentials from the state-store and populates them into memory,
  it creates a new attempt (i.e. ApplicationMaster) for each application that
  had not yet completed and restarts that application as usual. As described
  before, the previously running applications' work is lost in this manner,
  since they are essentially killed by RM via the re-sync command on restart.

* {Configurations}

  This section describes the configurations involved in enabling the RM Restart
  feature.

* Enable ResourceManager Restart functionality.

  To enable RM Restart functionality, set the following property in
  <<conf/yarn-site.xml>> to true:

*--------------------------------------+--------------------------------------+
|| Property || Value |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.recovery.enabled>>> | |
| | <<<true>>> |
*--------------------------------------+--------------------------------------+
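
  For example, the entry in <<conf/yarn-site.xml>> could look like the
  following minimal sketch (shown inside the usual <<<configuration>>> root
  element):

+---+
<configuration>

  <!-- Enable RM Restart: persist and recover application/attempt state -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>

</configuration>
+---+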

* Configure the state-store that is used to persist the RM state.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.store.class>>> | |
| | The class name of the state-store to be used for saving application/attempt |
| | state and the credentials. The available state-store implementations are |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore>>>, |
| | a ZooKeeper based state-store implementation, and |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>>, |
| | a state-store implementation backed by a Hadoop FileSystem such as HDFS. |
| | The default value is |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>>. |
*--------------------------------------+--------------------------------------+
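
  As an illustration, picking the ZooKeeper based implementation would translate
  into a <<conf/yarn-site.xml>> fragment like the sketch below; to use the
  FileSystem backed store instead, substitute
  <<<FileSystemRMStateStore>>> as the value.

+---+
  <!-- Choose the state-store implementation; this sketch selects the
       ZooKeeper based store. -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
+---+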

* Configurations when using Hadoop FileSystem based state-store implementation.

  Configure the URI where the RM state will be saved in the Hadoop FileSystem
  state-store.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.fs.state-store.uri>>> | |
| | URI pointing to the location of the FileSystem path where RM state will |
| | be stored (e.g. hdfs://localhost:9000/rmstore). |
| | Default value is <<<${hadoop.tmp.dir}/yarn/system/rmstore>>>. |
| | If FileSystem name is not provided, <<<fs.default.name>>> specified in |
| | <<conf/core-site.xml>> will be used. |
*--------------------------------------+--------------------------------------+
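
  For example, to keep the RM state under the <<<hdfs://localhost:9000/rmstore>>>
  path mentioned above (the host and port are placeholders for an actual
  NameNode address), a fragment could be:

+---+
  <!-- Where the FileSystemRMStateStore keeps RM state; the NameNode
       address below is only an example. -->
  <property>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>hdfs://localhost:9000/rmstore</value>
  </property>
+---+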

  Configure the retry policy the state-store client uses to connect with the
  Hadoop FileSystem.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.fs.state-store.retry-policy-spec>>> | |
| | Hadoop FileSystem client retry policy specification. Hadoop FileSystem client retry |
| | is always enabled. Specified in pairs of sleep-time and number-of-retries, |
| | i.e. (t0, n0), (t1, n1), ...; the first n0 retries sleep t0 milliseconds on |
| | average, the following n1 retries sleep t1 milliseconds on average, and so on. |
| | Default value is (2000, 500). |
*--------------------------------------+--------------------------------------+
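
  Spelled out explicitly, a policy of 500 retries with a 2000 millisecond
  average sleep (mirroring the documented default) might look like the sketch
  below; the value string follows the sleep-time/number-of-retries pair format
  described above.

+---+
  <!-- Retry policy for the FileSystem state-store client: sleep 2000 ms
       on average, up to 500 retries. -->
  <property>
    <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
    <value>2000, 500</value>
  </property>
+---+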

* Configurations when using ZooKeeper based state-store implementation.

  Configure the ZooKeeper server address and the root path where the RM state
  is stored.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-address>>> | |
| | Comma separated list of Host:Port pairs. Each corresponds to a ZooKeeper server |
| | (e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002") to be used by the RM |
| | for storing RM state. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-state-store.parent-path>>> | |
| | The full path of the root znode where RM state will be stored. |
| | Default value is /rmstore. |
*--------------------------------------+--------------------------------------+
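
  A possible fragment for a three-node ZooKeeper ensemble, reusing the example
  addresses and the default root znode from the table above, is sketched below:

+---+
  <!-- ZooKeeper ensemble used for storing RM state; the addresses are
       only example values. -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002</value>
  </property>

  <!-- Root znode under which RM state is kept (default /rmstore). -->
  <property>
    <name>yarn.resourcemanager.zk-state-store.parent-path</name>
    <value>/rmstore</value>
  </property>
+---+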

  Configure the retry policy the state-store client uses to connect with the
  ZooKeeper server.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-num-retries>>> | |
| | Number of times RM tries to connect to the ZooKeeper server if the connection is lost. |
| | Default value is 500. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-retry-interval-ms>>> | |
| | The interval in milliseconds between retries when connecting to a ZooKeeper server. |
| | Default value is 2 seconds. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-timeout-ms>>> | |
| | ZooKeeper session timeout in milliseconds. This configuration is used by |
| | the ZooKeeper server to determine when the session expires. Session expiration |
| | happens when the server does not hear from the client (i.e. no heartbeat) within the |
| | session timeout period specified by this configuration. Default |
| | value is 10 seconds. |
*--------------------------------------+--------------------------------------+
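
  Written out explicitly with the documented defaults (500 retries, a 2 second
  retry interval and a 10 second session timeout), the fragment might look
  like:

+---+
  <!-- Connection retry and session timeout settings for the ZooKeeper
       state-store client; the values mirror the documented defaults. -->
  <property>
    <name>yarn.resourcemanager.zk-num-retries</name>
    <value>500</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-retry-interval-ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-timeout-ms</name>
    <value>10000</value>
  </property>
+---+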

  Configure the ACLs to be used for setting permissions on ZooKeeper znodes.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-acl>>> | |
| | ACLs to be used for setting permissions on ZooKeeper znodes. Default value is <<<world:anyone:rwcda>>>. |
*--------------------------------------+--------------------------------------+
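
  For example, keeping the default open ACL explicit in <<conf/yarn-site.xml>>
  would look like the sketch below; a real deployment may well want something
  more restrictive.

+---+
  <!-- ACL applied to the znodes created for RM state; this is the
       documented default. -->
  <property>
    <name>yarn.resourcemanager.zk-acl</name>
    <value>world:anyone:rwcda</value>
  </property>
+---+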

* Configure the maximum number of application attempts.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.am.max-attempts>>> | |
| | The maximum number of application attempts. It is a global |
| | setting for all ApplicationMasters. Each ApplicationMaster can specify |
| | its individual maximum number of application attempts via the API, but the |
| | individual number cannot be more than the global upper bound. If it is, |
| | the RM will override it. The default value is 2, to |
| | allow at least one retry for the AM. |
*--------------------------------------+--------------------------------------+

  The impact of this configuration extends beyond the scope of RM Restart: it
  controls the maximum number of attempts an application can have. In RM
  Restart Phase 1, this configuration matters because, as described earlier,
  each time RM restarts it kills the previously running attempt
  (i.e. the ApplicationMaster) and creates a new attempt; each occurrence of an
  RM restart therefore increases the attempt count by 1. In RM Restart Phase 2,
  this configuration will not be needed, since the previously running
  ApplicationMaster will not be killed and will simply re-sync with RM after RM
  restarts.
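
  Because every RM restart consumes one attempt in Phase 1, a deployment that
  expects occasional restarts may want to raise this cap. A sketch with an
  illustrative value of 4:

+---+
  <!-- Global upper bound on application attempts; 4 is an illustrative
       value, the default is 2. -->
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>4</value>
  </property>
+---+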