YARN-4100. Add Documentation for Distributed and Delegated-Centralized Node Labels feature. Contributed by Naganarasimha G R.

(cherry picked from commit db144eb1c5)
Devaraj K 2016-02-02 12:06:51 +05:30
parent c487453b91
commit aeea77ce14
3 changed files with 99 additions and 40 deletions

hadoop-yarn-project/CHANGES.txt

@@ -592,6 +592,9 @@ Release 2.8.0 - UNRELEASED
     YARN-4340. Add "list" API to reservation system. (Sean Po via wangda)
 
+    YARN-4100. Add Documentation for Distributed and Delegated-Centralized
+    Node Labels feature. (Naganarasimha G R via devaraj)
+
   OPTIMIZATIONS
 
     YARN-3339. TestDockerContainerExecutor should pull a single image and not

hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml

@@ -2281,26 +2281,26 @@
   <!-- Distributed Node Labels Configuration -->
   <property>
     <description>
-      When "yarn.node-labels.configuration-type" parameter in RM is configured as
-      "distributed", Administrators can configure in NM, the provider for the
+      When "yarn.node-labels.configuration-type" is configured with "distributed"
+      in RM, Administrators can configure in NM the provider for the
       node labels by configuring this parameter. Administrators can
-      specify "config", "script" or the class name of the provider. Configured
+      configure "config", "script" or the class name of the provider. Configured
       class needs to extend
       org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider.
-      If "config" is specified then "ConfigurationNodeLabelsProvider" and
-      "script" then "ScriptNodeLabelsProvider" will be used.
+      If "config" is configured, then "ConfigurationNodeLabelsProvider" and if
+      "script" is configured, then "ScriptNodeLabelsProvider" will be used.
     </description>
     <name>yarn.nodemanager.node-labels.provider</name>
   </property>
 
   <property>
     <description>
-      When node labels "yarn.nodemanager.node-labels.provider" is of type
-      "config" or the configured class extends AbstractNodeLabelsProvider then
-      periodically node labels are retrieved from the node labels provider.
-      This configuration is to define the interval. If -1 is configured then
-      node labels are retrieved from. provider only during initialization.
-      Defaults to 10 mins.
+      When "yarn.nodemanager.node-labels.provider" is configured with "config",
+      "Script" or the configured class extends AbstractNodeLabelsProvider, then
+      periodically node labels are retrieved from the node labels provider. This
+      configuration is to define the interval period.
+      If -1 is configured then node labels are retrieved from provider only
+      during initialization. Defaults to 10 mins.
     </description>
     <name>yarn.nodemanager.node-labels.provider.fetch-interval-ms</name>
     <value>600000</value>
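For illustration, the two descriptions above amount to one setting on the RM and two on each NM. A minimal yarn-site.xml sketch (the provider choice is an example; 600000 is the documented default):

```xml
<!-- RM side: delegate node-to-label mapping to the NodeManagers. -->
<property>
  <name>yarn.node-labels.configuration-type</name>
  <value>distributed</value>
</property>

<!-- NM side: pick a provider; "config" and "script" select the two
     built-in providers named in the description above. -->
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <value>config</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider.fetch-interval-ms</name>
  <value>600000</value> <!-- default: re-read labels every 10 minutes -->
</property>
```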
@@ -2308,8 +2308,8 @@
   <property>
     <description>
-      Interval at which node labels syncs with RM from NM.Will send loaded labels
-      every x intervals configured along with heartbeat from NM to RM.
+      Interval at which NM syncs its node labels with RM. NM will send its loaded
+      labels every x intervals configured, along with heartbeat to RM.
     </description>
     <name>yarn.nodemanager.node-labels.resync-interval-ms</name>
     <value>120000</value>
@@ -2317,19 +2317,18 @@
   <property>
     <description>
-      When node labels "yarn.nodemanager.node-labels.provider"
-      is of type "config" then ConfigurationNodeLabelsProvider fetches the
-      partition from this parameter.
+      When "yarn.nodemanager.node-labels.provider" is configured with "config"
+      then ConfigurationNodeLabelsProvider fetches the partition label from this
+      parameter.
     </description>
     <name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
   </property>
 
   <property>
     <description>
-      When node labels "yarn.nodemanager.node-labels.provider" is a class
-      which extends AbstractNodeLabelsProvider then this configuration provides
-      the timeout period after which it will stop querying the Node labels
-      provider. Defaults to 20 mins.
+      When "yarn.nodemanager.node-labels.provider" is configured with "Script"
+      then this configuration provides the timeout period after which it will
+      interrupt the script which queries the Node labels. Defaults to 20 mins.
     </description>
     <name>yarn.nodemanager.node-labels.provider.fetch-timeout-ms</name>
     <value>1200000</value>
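A sketch of the "config"-provider wiring that the two hunks above describe; the partition name "gpu" is an illustrative assumption (it must already exist in the RM's cluster label list), and 120000 is the documented resync default:

```xml
<property>
  <name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
  <value>gpu</value> <!-- example partition, not part of the commit -->
</property>
<property>
  <name>yarn.nodemanager.node-labels.resync-interval-ms</name>
  <value>120000</value> <!-- NM re-reports its labels to the RM every 2 minutes -->
</property>
```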
@@ -2351,8 +2350,8 @@
   <property>
     <description>
-      When node labels "yarn.node-labels.configuration-type" is of type
-      "delegated-centralized" then periodically node labels are retrieved
+      When "yarn.node-labels.configuration-type" is configured with
+      "delegated-centralized", then periodically node labels are retrieved
       from the node labels provider. This configuration is to define the
       interval. If -1 is configured then node labels are retrieved from
       provider only once for each node after it registers. Defaults to 30 mins.
@@ -2362,9 +2361,10 @@
   </property>
 
   <property>
-    <description>The Node Label script to run. Script output Lines starting with
-      "NODE_PARTITION:" will be considered for Node Labels. In case of multiple
-      lines having the pattern, last one will be considered</description>
+    <description>The Node Label script to run. Script output Line starting with
+      "NODE_PARTITION:" will be considered as Node Label Partition. In case of
+      multiple lines have this pattern, then last one will be considered
+    </description>
     <name>yarn.nodemanager.node-labels.provider.script.path</name>
   </property>
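As a sketch of the script-based provider this last hunk documents: the NM is pointed at an executable via yarn.nodemanager.node-labels.provider.script.path (with yarn.nodemanager.node-labels.provider set to "script"), and the script only has to print a NODE_PARTITION: line. The nvidia-smi probe and the "gpu" label below are illustrative assumptions, not part of the commit:

```sh
#!/bin/bash
# Hypothetical node-label script. Only lines starting with "NODE_PARTITION:"
# are parsed; if several lines match, the last one wins, per the description
# above. "gpu" must already be a valid label in the RM's cluster label list.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "NODE_PARTITION:gpu"
else
  # Report no partition (hypothetically leaving the node unlabeled, i.e.
  # in the DEFAULT partition).
  echo "NODE_PARTITION:"
fi
```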

hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md

@@ -15,7 +15,22 @@
 YARN Node Labels
 ===============
 
-# Overview
+* [Overview](#Overview)
+* [Features](#Features)
+* [Configuration](#Configuration)
+* [Setting up ResourceManager to enable Node Labels](#Setting_up_ResourceManager_to_enable_Node_Labels)
+* [Add/modify node labels list to YARN](#Add/modify_node_labels_list_to_YARN)
+* [Add/modify node-to-labels mapping to YARN](#Add/modify_node-to-labels_mapping_to_YARN)
+* [Configuration of Schedulers for node labels](#Configuration_of_Schedulers_for_node_labels)
+* [Specifying node label for application](#Specifying_node_label_for_application)
+* [Monitoring](#Monitoring)
+* [Monitoring through web UI](#Monitoring_through_web_UI)
+* [Monitoring through commandline](#Monitoring_through_commandline)
+* [Useful links](#Useful_links)
+
+Overview
+--------
 
 Node label is a way to group nodes with similar characteristics and applications can specify where to run.
 
 Now we only support node partition, which is:
@@ -28,20 +43,28 @@ Now we only support node partition, which is:
 
 User can specify set of node labels which can be accessed by each queue, one application can only use subset of node labels that can be accessed by the queue which contains the application.
 
-# Features
+Features
+--------
 
 The ```Node Labels``` supports the following features for now:
 
 * Partition cluster - each node can be assigned one label, so the cluster will be divided to several smaller disjoint partitions.
 * ACL of node-labels on queues - user can set accessible node labels on each queue so only some nodes can only be accessed by specific queues.
 * Specify percentage of resource of a partition which can be accessed by a queue - user can set percentage like: queue A can access 30% of resources on nodes with label=hbase. Such percentage setting will be consistent with existing resource manager
-* Specify required Node Label in resource request, it will only be allocated when node has the same label. If no node label requirement specified, such Resource Request will only be allocated on nodes belong to DEFAULT partition.
+* Specify required node label in resource request, it will only be allocated when node has the same label. If no node label requirement specified, such Resource Request will only be allocated on nodes belong to DEFAULT partition.
 * Operability
     * Node labels and node labels mapping can be recovered across RM restart
     * Update node labels - admin can update labels on nodes and labels on queues
       when RM is running
+* Mapping of NM to node labels can be done in three ways, but in all of the approaches Partition Label should be one among the valid node labels list configured in the RM.
+    * **Centralized :** Node to labels mapping can be done through RM exposed CLI, REST or RPC.
+    * **Distributed :** Node to labels mapping will be set by a configured Node Labels Provider in NM. We have two different providers in YARN: *Script* based provider and *Configuration* based provider. In case of script, NM can be configured with a script path and the script can emit the labels of the node. In case of config, node Labels can be directly configured in the NM's yarn-site.xml. In both of these options dynamic refresh of the label mapping is supported.
+    * **Delegated-Centralized :** Node to labels mapping will be set by a configured Node Labels Provider in RM. This would be helpful when label mapping cannot be provided by each node due to security concerns and to avoid interaction through RM Interfaces for each node in a large cluster. Labels will be fetched from this interface during NM registration and periodical refresh is also supported.
 
-# Configuration
-## Setting up ```ResourceManager``` to enable ```Node Labels```:
+Configuration
+-------------
+
+###Setting up ResourceManager to enable Node Labels
 
 Setup following properties in ```yarn-site.xml```
@@ -49,23 +72,50 @@ Property | Value
 --- | ----
 yarn.node-labels.fs-store.root-dir | hdfs://namenode:port/path/to/store/node-labels/
 yarn.node-labels.enabled | true
+yarn.node-labels.configuration-type | Set configuration type for node labels. Administrators can specify “centralized”, “delegated-centralized” or “distributed”. Default value is “centralized”.
 
 Notes:
 
 * Make sure ```yarn.node-labels.fs-store.root-dir``` is created and ```ResourceManager``` has permission to access it. (Typically from “yarn” user)
 * If user want to store node label to local file system of RM (instead of HDFS), paths like `file:///home/yarn/node-label` can be used
 
-### Add/modify node labels list and node-to-labels mapping to YARN
+###Add/modify node labels list to YARN
 
 * Add cluster node labels list:
     * Executing ```yarn rmadmin -addToClusterNodeLabels "label_1(exclusive=true/false),label_2(exclusive=true/false)"``` to add node label.
-    * If user dont specify “(exclusive=…)”, execlusive will be ```true``` by default.
+    * If user dont specify “(exclusive=…)”, exclusive will be ```true``` by default.
 * Run ```yarn cluster --list-node-labels``` to check added node labels are visible in the cluster.
 
-* Add labels to nodes
+###Add/modify node-to-labels mapping to YARN
+
+* Configuring nodes to labels mapping in **Centralized** NodeLabel setup
     * Executing ```yarn rmadmin -replaceLabelsOnNode “node1[:port]=label1 node2=label2”```. Added label1 to node1, label2 to node2. If user dont specify port, it added the label to all ```NodeManagers``` running on the node.
 
-## Configuration of Schedulers for node labels
-### Capacity Scheduler Configuration
+* Configuring nodes to labels mapping in **Distributed** NodeLabel setup
+
+Property | Value
+----- | ------
+yarn.node-labels.configuration-type | Needs to be set as *"distributed"* in RM, to fetch node to labels mapping from a configured Node Labels Provider in NM.
+yarn.nodemanager.node-labels.provider | When *"yarn.node-labels.configuration-type"* is configured with *"distributed"* in RM, Administrators can configure the provider for the node labels by configuring this parameter in NM. Administrators can configure *"config"*, *"script"* or the *class name* of the provider. Configured class needs to extend *org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider*. If *"config"* is configured, then *"ConfigurationNodeLabelsProvider"* and if *"script"* is configured, then *"ScriptNodeLabelsProvider"* will be used.
+yarn.nodemanager.node-labels.resync-interval-ms | Interval at which NM syncs its node labels with RM. NM will send its loaded labels every x intervals configured, along with heartbeat to RM. This resync is required even when the labels are not modified because admin might have removed the cluster label which was provided by NM. Default is 2 mins.
+yarn.nodemanager.node-labels.provider.fetch-interval-ms | When *"yarn.nodemanager.node-labels.provider"* is configured with *"config"*, *"script"* or the *configured class* extends AbstractNodeLabelsProvider, then periodically node labels are retrieved from the node labels provider. This configuration is to define the interval period. If -1 is configured, then node labels are retrieved from provider only during initialization. Defaults to 10 mins.
+yarn.nodemanager.node-labels.provider.fetch-timeout-ms | When *"yarn.nodemanager.node-labels.provider"* is configured with *"script"*, then this configuration provides the timeout period after which it will interrupt the script which queries the node labels. Defaults to 20 mins.
+yarn.nodemanager.node-labels.provider.script.path | The node label script to run. Script output Line starting with *"NODE_PARTITION:"* will be considered as node label Partition. In case multiple lines of script output have this pattern, then the last one will be considered.
+yarn.nodemanager.node-labels.provider.script.opts | The arguments to pass to the node label script.
+yarn.nodemanager.node-labels.provider.configured-node-partition | When *"yarn.nodemanager.node-labels.provider"* is configured with *"config"*, then ConfigurationNodeLabelsProvider fetches the partition label from this parameter.
+
+* Configuring nodes to labels mapping in **Delegated-Centralized** NodeLabel setup
+
+Property | Value
+----- | ------
+yarn.node-labels.configuration-type | Needs to be set as *"delegated-centralized"* to fetch node to labels mapping from a configured Node Labels Provider in RM.
+yarn.resourcemanager.node-labels.provider | When *"yarn.node-labels.configuration-type"* is configured with *"delegated-centralized"*, then administrators should configure the class for fetching node labels by ResourceManager. Configured class needs to extend *org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider*.
+yarn.resourcemanager.node-labels.provider.fetch-interval-ms | When *"yarn.node-labels.configuration-type"* is configured with *"delegated-centralized"*, then periodically node labels are retrieved from the node labels provider. This configuration is to define the interval. If -1 is configured, then node labels are retrieved from provider only once for each node after it registers. Defaults to 30 mins.
+
+###Configuration of Schedulers for node labels
+
+* Capacity Scheduler Configuration
 
 Property | Value
 ----- | ------
 yarn.scheduler.capacity.`<queue-path>`.capacity | Set the percentage of the queue can access to nodes belong to DEFAULT partition. The sum of DEFAULT capacities for direct children under each parent, must be equal to 100.
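The Centralized flow documented in the hunk above reduces to a short CLI session; the label names and hostnames below are placeholders, not from the commit:

```sh
# Define the labels once at the RM (exclusive by default).
yarn rmadmin -addToClusterNodeLabels "gpu(exclusive=true),ssd(exclusive=false)"

# Verify they are visible cluster-wide.
yarn cluster --list-node-labels

# Map nodes to labels; omitting the port applies to all NMs on the host.
yarn rmadmin -replaceLabelsOnNode "host1.example.com=gpu host2.example.com=ssd"
```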
@@ -114,27 +164,33 @@ Notes:
 
 * After finishing configuration of CapacityScheduler, execute ```yarn rmadmin -refreshQueues``` to apply changes
 * Go to scheduler page of RM Web UI to check if you have successfully set configuration.
 
-# Specifying node label for application
+Specifying node label for application
+-------------------------------------
 
 Applications can use following Java APIs to specify node label to request
 
 * `ApplicationSubmissionContext.setNodeLabelExpression(..)` to set node label expression for all containers of the application.
 * `ResourceRequest.setNodeLabelExpression(..)` to set node label expression for individual resource requests. This can overwrite node label expression set in ApplicationSubmissionContext
 * Specify `setAMContainerResourceRequest.setNodeLabelExpression` in `ApplicationSubmissionContext` to indicate expected node label for application master container.
 
-# Monitoring
+Monitoring
+----------
 
-## Monitoring through web UI
+###Monitoring through web UI
 
 Following label-related fields can be seen on web UI:
 
 * Nodes page: http://RM-Address:port/cluster/nodes, you can get labels on each node
 * Node labels page: http://RM-Address:port/cluster/nodelabels, you can get type (exclusive/non-exclusive), number of active node managers, total resource of each partition
 * Scheduler page: http://RM-Address:port/cluster/scheduler, you can get label-related settings of each queue, and resource usage of queue partitions.
 
-## Monitoring through commandline
+###Monitoring through commandline
 
 * Use `yarn cluster --list-node-labels` to get labels in the cluster
 * Use `yarn node -status <NodeId>` to get node status including labels on a given node
 
-# Useful links
+Useful links
+------------
 
 * [YARN Capacity Scheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html), if you need more understanding about how to configure Capacity Scheduler
 * Write YARN application using node labels, you can see following two links as examples: [YARN distributed shell](https://issues.apache.org/jira/browse/YARN-2502), [Hadoop MapReduce](https://issues.apache.org/jira/browse/MAPREDUCE-6304)
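To make the Java entry points under "Specifying node label for application" concrete, here is a hedged sketch against the YARN client API. The "gpu" label, resource sizes, and priority are illustrative; ContainerRequest is used here as the usual client-side wrapper that passes the expression through to `ResourceRequest.setNodeLabelExpression(..)`:

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Sketch of the APIs listed above; values are examples, not recommendations.
public class NodeLabelRequestSketch {

  // App-level default: all containers of this application ask for "gpu" nodes.
  static void labelWholeApp(ApplicationSubmissionContext ctx) {
    ctx.setNodeLabelExpression("gpu");
  }

  // Per-request override: the last constructor argument is the node label
  // expression, forwarded to the underlying ResourceRequest for this
  // request only.
  static ContainerRequest gpuContainerRequest() {
    Resource capability = Resource.newInstance(4096, 2); // 4 GB, 2 vcores
    return new ContainerRequest(capability, null, null,
        Priority.newInstance(1), true, "gpu");
  }
}
```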