YARN-8400. Fix typos in YARN Federation documentation page. Contributed by Giovanni Matteo Fumarola.
(cherry picked from commit 67fc70e09f)
parent 8526ba9a69
commit 1fa51d4542
@@ -42,7 +42,7 @@ The applications running in this federated environment see a unified large YARN
 ![YARN Federation Architecture | width=800](./images/federation_architecture.png)
 
 ###YARN Sub-cluster
-A sub-cluster is a YARN cluster with up to few thousands nodes. The exact size of the sub-cluster will be determined considering ease of deployment/maintenance, alignment
+A sub-cluster is a YARN cluster with up to a few thousand nodes. The exact size of the sub-cluster will be determined considering ease of deployment/maintenance, alignment
 with network or availability zones and general best practices.
 
 The sub-cluster YARN RM will run with work-preserving high-availability turned-on, i.e., we should be able to tolerate YARN RM, NM failures with minimal disruption.
@@ -80,7 +80,7 @@ to minimize overhead on the scheduling infrastructure (more in section on scalab
 
 ###Global Policy Generator
 Global Policy Generator overlooks the entire federation and ensures that the system is configured and tuned properly all the time.
-A key design point is that the cluster availability does not depends on an always-on GPG. The GPG operates continuously but out-of-band from all cluster operations,
+A key design point is that the cluster availability does not depend on an always-on GPG. The GPG operates continuously but out-of-band from all cluster operations,
 and provide us with a unique vantage point, that allows to enforce global invariants, affect load balancing, trigger draining of sub-clusters that will undergo maintenance, etc.
 More precisely the GPG will update user capacity allocation-to-subcluster mappings, and more rarely change the policies that run in Routers, AMRMProxy (and possible RMs).
 
@@ -111,7 +111,7 @@ on the home sub-cluster. Only in certain cases it should need to ask for resourc
 The federation Policy Store is a logically separate store (while it might be backed
 by the same physical component), which contains information about how applications and
 resource requests are routed to different sub-clusters. The current implementation provides
-several policies, ranging from random/hashing/roundrobin/priority to more sophisticated
+several policies, ranging from random/hashing/round-robin/priority to more sophisticated
 ones which account for sub-cluster load, and request locality needs.
 
 
@@ -218,7 +218,7 @@ SQL-Server scripts are located in **sbin/FederationStateStore/SQLServer/**.
 |`yarn.federation.policy-manager` | `org.apache.hadoop.yarn.server.federation.policies.manager.WeightedLocalityPolicyManager` | The choice of policy manager determines how Applications and ResourceRequests are routed through the system. |
 |`yarn.federation.policy-manager-params` | `<binary>` | The payload that configures the policy. In our example a set of weights for router and amrmproxy policies. This is typically generated by serializing a policymanager that has been configured programmatically, or by populating the state-store with the .json serialized form of it. |
 |`yarn.federation.subcluster-resolver.class` | `org.apache.hadoop.yarn.server.federation.resolver.DefaultSubClusterResolverImpl` | The class used to resolve which subcluster a node belongs to, and which subcluster(s) a rack belongs to. |
-|`yarn.federation.machine-list` | `<path of macihne-list file>` | Path of machine-list file used by `SubClusterResolver`. Each line of the file is a node with sub-cluster and rack information. Below is the example: <br/> <br/> node1, subcluster1, rack1 <br/> node2, subcluster2, rack1 <br/> node3, subcluster3, rack2 <br/> node4, subcluster3, rack2 |
+|`yarn.federation.machine-list` | `<path of machine-list file>` | Path of machine-list file used by `SubClusterResolver`. Each line of the file is a node with sub-cluster and rack information. Below is the example: <br/> <br/> node1, subcluster1, rack1 <br/> node2, subcluster2, rack1 <br/> node3, subcluster3, rack2 <br/> node4, subcluster3, rack2 |
 
 ###ON RMs:
 
@@ -242,7 +242,7 @@ These are extra configurations that should appear in the **conf/yarn-site.xml**
 | Property | Example | Description |
 |:---- |:---- |
 |`yarn.router.bind-host` | `0.0.0.0` | Host IP to bind the router to. The actual address the server will bind to. If this optional address is set, the RPC and webapp servers will bind to this address and the port specified in yarn.router.*.address respectively. This is most useful for making Router listen to all interfaces by setting to 0.0.0.0. |
-| `yarn.router.clientrm.interceptor-class.pipeline` | `org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor` | A comma-seperated list of interceptor classes to be run at the router when interfacing with the client. The last step of this pipeline must be the Federation Client Interceptor. |
+| `yarn.router.clientrm.interceptor-class.pipeline` | `org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor` | A comma-separated list of interceptor classes to be run at the router when interfacing with the client. The last step of this pipeline must be the Federation Client Interceptor. |
 
 Optional:
 
@@ -273,13 +273,13 @@ Optional:
 
 | Property | Example | Description |
 |:---- |:---- |
-| `yarn.nodemanager.amrmproxy.ha.enable` | `true` | Whether or not the AMRMProxy HA is enabled for multiple application attempt suppport. |
+| `yarn.nodemanager.amrmproxy.ha.enable` | `true` | Whether or not the AMRMProxy HA is enabled for multiple application attempt support. |
 | `yarn.federation.statestore.max-connections` | `1` | The maximum number of parallel connections from each AMRMProxy to the state-store. This value is typically lower than the router one, since we have many AMRMProxy that could burn-through many DB connections quickly. |
 | `yarn.federation.cache-ttl.secs` | `300` | The time to leave for the AMRMProxy cache. Typically larger than at the router, as the number of AMRMProxy is large, and we want to limit the load to the centralized state-store. |
 
 Running a Sample Job
 --------------------
-In order to submit jobs to a Federation cluster one must create a seperate set of configs for the client from which jobs will be submitted. In these, the **conf/yarn-site.xml** should have the following additional configurations:
+In order to submit jobs to a Federation cluster one must create a separate set of configs for the client from which jobs will be submitted. In these, the **conf/yarn-site.xml** should have the following additional configurations:
 
 | Property | Example | Description |
 |:--- |:--- |
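
Note on the `yarn.federation.machine-list` row touched by this commit: the description says each line of the file carries a node with its sub-cluster and rack, as in `node1, subcluster1, rack1`. The following is a minimal Python sketch of reading that layout, useful for sanity-checking a machine-list file before deploying it. It is not Hadoop's `SubClusterResolver` implementation; the names `NodePlacement` and `parse_machine_list` are hypothetical, and the strict three-field check is an assumption rather than documented behavior.

```python
from typing import Dict, NamedTuple


class NodePlacement(NamedTuple):
    """Hypothetical record: where a node sits in the federation."""
    subcluster: str
    rack: str


def parse_machine_list(text: str) -> Dict[str, NodePlacement]:
    """Map each node name to its (sub-cluster, rack) pair.

    Assumes one 'node, subcluster, rack' entry per line, as in the
    example shown in the documentation table.
    """
    placements: Dict[str, NodePlacement] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            # Assumption: anything other than three fields is an error.
            raise ValueError(f"expected 'node, subcluster, rack', got: {raw!r}")
        node, subcluster, rack = fields
        placements[node] = NodePlacement(subcluster, rack)
    return placements


# The four sample entries from the documentation table.
SAMPLE = """\
node1, subcluster1, rack1
node2, subcluster2, rack1
node3, subcluster3, rack2
node4, subcluster3, rack2
"""

placements = parse_machine_list(SAMPLE)
```

With the sample above, `placements["node3"]` holds sub-cluster `subcluster3` and rack `rack2`, which matches the third example line in the table.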