YARN-1017. Added documentation for ResourceManager Restart. (jianhe)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1582913 13f79535-47bb-0310-9956-ffa450edef68
commit 72051fc52c
parent 86ef362a7d
@@ -98,6 +98,7 @@
     <item name="YARN Architecture" href="hadoop-yarn/hadoop-yarn-site/YARN.html"/>
     <item name="Capacity Scheduler" href="hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html"/>
     <item name="Fair Scheduler" href="hadoop-yarn/hadoop-yarn-site/FairScheduler.html"/>
+    <item name="ResourceManager Restart" href="hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html"/>
     <item name="Web Application Proxy" href="hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html"/>
     <item name="YARN Timeline Server" href="hadoop-yarn/hadoop-yarn-site/TimelineServer.html"/>
     <item name="Writing YARN Applications" href="hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html"/>
@@ -335,6 +335,8 @@ Release 2.4.0 - UNRELEASED
     YARN-1891. Added documentation for NodeManager health-monitoring. (Varun
     Vasudev via vinodkv)
 
+    YARN-1017. Added documentation for ResourceManager Restart. (jianhe)
+
   OPTIMIZATIONS
 
     YARN-1771. Reduce the number of NameNode operations during localization of
@@ -0,0 +1,208 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  ResourceManager Restart
  ---
  ---
  ${maven.build.timestamp}

ResourceManager Restart

%{toc|section=1|fromDepth=0}

* {Overview}

  ResourceManager is the central authority that manages resources and schedules
  applications running on top of YARN. Hence, it is potentially a single point
  of failure in an Apache YARN cluster.

  This document gives an overview of ResourceManager Restart, a feature that
  enhances ResourceManager to keep functioning across restarts and also makes
  ResourceManager down-time invisible to end-users.

  The ResourceManager Restart feature is divided into two phases:

  ResourceManager Restart Phase 1: Enhance RM to persist application/attempt
  state and other credential information in a pluggable state-store. RM will
  reload this information from the state-store upon restart and restart the
  previously running applications. Users are not required to re-submit the
  applications.

  ResourceManager Restart Phase 2: Focus on re-constructing the running state of
  ResourceManager by reading back the container statuses from NodeManagers and
  container requests from ApplicationMasters upon restart. The key difference
  from Phase 1 is that previously running applications will not be killed after
  RM restarts, so applications will not lose their work because of an RM outage.

  As of the Hadoop 2.4.0 release, only ResourceManager Restart Phase 1 is
  implemented, which is described below.

* {Feature}

  The overall concept is that RM persists the application metadata
  (i.e. ApplicationSubmissionContext) in a pluggable state-store when the client
  submits an application, and also saves the final status of the application,
  such as the completion state (failed, killed, finished) and diagnostics, when
  the application completes. In addition, RM saves credentials such as security
  keys and tokens in order to work in a secure environment. Any time RM shuts
  down, as long as the required information (i.e. the application metadata and
  the accompanying credentials if running in a secure environment) is available
  in the state-store, then when RM restarts it can pick up the application
  metadata from the state-store and re-submit the application. RM will not
  re-submit applications that had already completed (i.e. failed, killed,
  finished) before RM went down.

  NodeManagers and clients keep polling RM during RM's down-time until RM comes
  up. When RM becomes alive, it sends a re-sync command to all the NodeManagers
  and ApplicationMasters it was talking to via heartbeats. Today, the behavior
  of NodeManagers and ApplicationMasters when handling this command is as
  follows: NMs kill all their managed containers and re-register with RM. From
  the RM's perspective, these re-registered NodeManagers are similar to newly
  joining NMs. AMs (e.g. the MapReduce AM) are today expected to shut down when
  they receive the re-sync command. After RM restarts, loads all the application
  metadata and credentials from the state-store and populates them into memory,
  it creates a new attempt (i.e. ApplicationMaster) for each application that
  had not yet completed and restarts that application as usual. As described
  before, the previously running applications' work is lost in this manner,
  since they are essentially killed by RM via the re-sync command on restart.

* {Configurations}

  This section describes the configurations involved in enabling the RM Restart
  feature.

* Enable ResourceManager Restart functionality.

  To enable RM Restart functionality, set the following property in
  <<conf/yarn-site.xml>> to true:

*--------------------------------------+--------------------------------------+
|| Property || Value |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.recovery.enabled>>> | |
| | <<<true>>> |
*--------------------------------------+--------------------------------------+
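
  For example, the entry in <<conf/yarn-site.xml>> could look like the
  following minimal sketch (shown inside the usual <<<configuration>>> root
  element):

+---+
<configuration>

  <!-- Enable RM Restart: persist and recover application/attempt state -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>

</configuration>
+---+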

* Configure the state-store that is used to persist the RM state.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.store.class>>> | |
| | The class name of the state-store to be used for saving application/attempt |
| | state and the credentials. The available state-store implementations are |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore>>>, |
| | a ZooKeeper based state-store implementation, and |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>>, |
| | a state-store implementation backed by a Hadoop FileSystem such as HDFS. |
| | The default value is |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>>. |
*--------------------------------------+--------------------------------------+
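
  As an illustration, picking the ZooKeeper based implementation would translate
  into a <<conf/yarn-site.xml>> fragment like the sketch below; to use the
  FileSystem backed store instead, substitute
  <<<FileSystemRMStateStore>>> as the value.

+---+
  <!-- Choose the state-store implementation; this sketch selects the
       ZooKeeper based store. -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
+---+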

* Configurations when using Hadoop FileSystem based state-store implementation.

  Configure the URI where the RM state will be saved in the Hadoop FileSystem
  state-store.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.fs.state-store.uri>>> | |
| | URI pointing to the location of the FileSystem path where RM state will |
| | be stored (e.g. hdfs://localhost:9000/rmstore). |
| | Default value is <<<${hadoop.tmp.dir}/yarn/system/rmstore>>>. |
| | If FileSystem name is not provided, <<<fs.default.name>>> specified in |
| | <<conf/core-site.xml>> will be used. |
*--------------------------------------+--------------------------------------+
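
  For example, to keep the RM state under the <<<hdfs://localhost:9000/rmstore>>>
  path mentioned above (the host and port are placeholders for an actual
  NameNode address), a fragment could be:

+---+
  <!-- Where the FileSystemRMStateStore keeps RM state; the NameNode
       address below is only an example. -->
  <property>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>hdfs://localhost:9000/rmstore</value>
  </property>
+---+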

  Configure the retry policy the state-store client uses to connect with the
  Hadoop FileSystem.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.fs.state-store.retry-policy-spec>>> | |
| | Hadoop FileSystem client retry policy specification. Hadoop FileSystem client retry |
| | is always enabled. Specified in pairs of sleep-time and number-of-retries, |
| | i.e. (t0, n0), (t1, n1), ...; the first n0 retries sleep t0 milliseconds on |
| | average, the following n1 retries sleep t1 milliseconds on average, and so on. |
| | Default value is (2000, 500). |
*--------------------------------------+--------------------------------------+
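
  Spelled out explicitly, a policy of 500 retries with a 2000 millisecond
  average sleep (mirroring the documented default) might look like the sketch
  below; the value string follows the sleep-time/number-of-retries pair format
  described above.

+---+
  <!-- Retry policy for the FileSystem state-store client: sleep 2000 ms
       on average, up to 500 retries. -->
  <property>
    <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
    <value>2000, 500</value>
  </property>
+---+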

* Configurations when using ZooKeeper based state-store implementation.

  Configure the ZooKeeper server address and the root path where the RM state
  is stored.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-address>>> | |
| | Comma separated list of Host:Port pairs. Each corresponds to a ZooKeeper server |
| | (e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002") to be used by the RM |
| | for storing RM state. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-state-store.parent-path>>> | |
| | The full path of the root znode where RM state will be stored. |
| | Default value is /rmstore. |
*--------------------------------------+--------------------------------------+
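
  A possible fragment for a three-node ZooKeeper ensemble, reusing the example
  addresses and the default root znode from the table above, is sketched below:

+---+
  <!-- ZooKeeper ensemble used for storing RM state; the addresses are
       only example values. -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002</value>
  </property>

  <!-- Root znode under which RM state is kept (default /rmstore). -->
  <property>
    <name>yarn.resourcemanager.zk-state-store.parent-path</name>
    <value>/rmstore</value>
  </property>
+---+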

  Configure the retry policy the state-store client uses to connect with the
  ZooKeeper server.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-num-retries>>> | |
| | Number of times RM tries to connect to the ZooKeeper server if the connection is lost. |
| | Default value is 500. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-retry-interval-ms>>> | |
| | The interval in milliseconds between retries when connecting to a ZooKeeper server. |
| | Default value is 2 seconds. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-timeout-ms>>> | |
| | ZooKeeper session timeout in milliseconds. This configuration is used by |
| | the ZooKeeper server to determine when the session expires. Session expiration |
| | happens when the server does not hear from the client (i.e. no heartbeat) within the |
| | session timeout period specified by this configuration. Default |
| | value is 10 seconds. |
*--------------------------------------+--------------------------------------+
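
  Written out explicitly with the documented defaults (500 retries, a 2 second
  retry interval and a 10 second session timeout), the fragment might look
  like:

+---+
  <!-- Connection retry and session timeout settings for the ZooKeeper
       state-store client; the values mirror the documented defaults. -->
  <property>
    <name>yarn.resourcemanager.zk-num-retries</name>
    <value>500</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-retry-interval-ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-timeout-ms</name>
    <value>10000</value>
  </property>
+---+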

  Configure the ACLs to be used for setting permissions on ZooKeeper znodes.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-acl>>> | |
| | ACLs to be used for setting permissions on ZooKeeper znodes. Default value is <<<world:anyone:rwcda>>>. |
*--------------------------------------+--------------------------------------+
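
  For example, keeping the default open ACL explicit in <<conf/yarn-site.xml>>
  would look like the sketch below; a real deployment may well want something
  more restrictive.

+---+
  <!-- ACL applied to the znodes created for RM state; this is the
       documented default. -->
  <property>
    <name>yarn.resourcemanager.zk-acl</name>
    <value>world:anyone:rwcda</value>
  </property>
+---+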

* Configure the maximum number of application attempts.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.am.max-attempts>>> | |
| | The maximum number of application attempts. It is a global |
| | setting for all ApplicationMasters. Each ApplicationMaster can specify |
| | its individual maximum number of application attempts via the API, but the |
| | individual number cannot be more than the global upper bound. If it is, |
| | the RM will override it. The default value is 2, to |
| | allow at least one retry for the AM. |
*--------------------------------------+--------------------------------------+

  The impact of this configuration extends beyond the scope of RM Restart: it
  controls the maximum number of attempts an application can have. In RM
  Restart Phase 1, this configuration matters because, as described earlier,
  each time RM restarts it kills the previously running attempt
  (i.e. the ApplicationMaster) and creates a new attempt; each occurrence of an
  RM restart therefore increases the attempt count by 1. In RM Restart Phase 2,
  this configuration will not be needed, since the previously running
  ApplicationMaster will not be killed and will simply re-sync with RM after RM
  restarts.
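
  Because every RM restart consumes one attempt in Phase 1, a deployment that
  expects occasional restarts may want to raise this cap. A sketch with an
  illustrative value of 4:

+---+
  <!-- Global upper bound on application attempts; 4 is an illustrative
       value, the default is 2. -->
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>4</value>
  </property>
+---+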