YARN-2683. [YARN-913] registry config options: document and move to core-default. (stevel)

2015-02-11 14:26:22 +00:00 · 2015-02-11 14:26:22 +00:00 · c3da2db48f
parent 362565cf5a
commit c3da2db48f
8 changed files with 805 additions and 205 deletions
--- a/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
+++ b/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
@ -1676,4 +1676,136 @@ for ldap providers in the same way as above does.
  </description>
 </property>

+
+  <!-- YARN registry -->
+
+  <property>
+    <description>
+      Is the registry enabled in the YARN Resource Manager?
+
+      If true, the YARN RM will, as needed.
+      create the user and system paths, and purge
+      service records when containers, application attempts
+      and applications complete.
+
+      If false, the paths must be created by other means,
+      and no automatic cleanup of service records will take place.
+    </description>
+    <name>hadoop.registry.rm.enabled</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <description>
+      The root zookeeper node for the registry
+    </description>
+    <name>hadoop.registry.zk.root</name>
+    <value>/registry</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper session timeout in milliseconds
+    </description>
+    <name>hadoop.registry.zk.session.timeout.ms</name>
+    <value>60000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper connection timeout in milliseconds
+    </description>
+    <name>hadoop.registry.zk.connection.timeout.ms</name>
+    <value>15000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper connection retry count before failing
+    </description>
+    <name>hadoop.registry.zk.retry.times</name>
+    <value>5</value>
+  </property>
+
+  <property>
+    <description>
+    </description>
+    <name>hadoop.registry.zk.retry.interval.ms</name>
+    <value>1000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper retry limit in milliseconds, during
+      exponential backoff.
+
+      This places a limit even
+      if the retry times and interval limit, combined
+      with the backoff policy, result in a long retry
+      period
+    </description>
+    <name>hadoop.registry.zk.retry.ceiling.ms</name>
+    <value>60000</value>
+  </property>
+
+  <property>
+    <description>
+      List of hostname:port pairs defining the
+      zookeeper quorum binding for the registry
+    </description>
+    <name>hadoop.registry.zk.quorum</name>
+    <value>localhost:2181</value>
+  </property>
+
+  <property>
+    <description>
+      Key to set if the registry is secure. Turning it on
+      changes the permissions policy from "open access"
+      to restrictions on kerberos with the option of
+      a user adding one or more auth key pairs down their
+      own tree.
+    </description>
+    <name>hadoop.registry.secure</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <description>
+      A comma separated list of Zookeeper ACL identifiers with
+      system access to the registry in a secure cluster.
+
+      These are given full access to all entries.
+
+      If there is an "@" at the end of a SASL entry it
+      instructs the registry client to append the default kerberos domain.
+    </description>
+    <name>hadoop.registry.system.acls</name>
+    <value>sasl:yarn@, sasl:mapred@, sasl:hdfs@</value>
+  </property>
+
+  <property>
+    <description>
+      The kerberos realm: used to set the realm of
+      system principals which do not declare their realm,
+      and any other accounts that need the value.
+
+      If empty, the default realm of the running process
+      is used.
+
+      If neither are known and the realm is needed, then the registry
+      service/client will fail.
+    </description>
+    <name>hadoop.registry.kerberos.realm</name>
+    <value></value>
+  </property>
+
+  <property>
+    <description>
+      Key to define the JAAS context. Used in secure
+      mode
+    </description>
+    <name>hadoop.registry.jaas.context</name>
+    <value>Client</value>
+  </property>
+
 </configuration>
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@ -263,6 +263,9 @@ Release 2.7.0 - UNRELEASED

    YARN-3100. Made YARN authorization pluggable. (Jian He via zjshen)

+    YARN-2683. [YARN-913] registry config options: document and move to
+    core-default. (stevel)
+
  OPTIMIZATIONS

    YARN-2990. FairScheduler's delay-scheduling always waits for node-local and 
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
@ -1568,131 +1568,6 @@
    <value>false</value>
  </property>

-  <!-- YARN registry -->
-
-  <property>
-    <description>
-      Is the registry enabled: does the RM start it up,
-      create the user and system paths, and purge
-      service records when containers, application attempts
-      and applications complete
-    </description>
-    <name>hadoop.registry.rm.enabled</name>
-    <value>false</value>
-  </property>
-
-  <property>
-    <description>
-    </description>
-    <name>hadoop.registry.zk.root</name>
-    <value>/registry</value>
-  </property>
-
-  <property>
-    <description>
-      Zookeeper session timeout in milliseconds
-    </description>
-    <name>hadoop.registry.zk.session.timeout.ms</name>
-    <value>60000</value>
-  </property>
-
-  <property>
-    <description>
-      Zookeeper session timeout in milliseconds
-    </description>
-    <name>hadoop.registry.zk.connection.timeout.ms</name>
-    <value>15000</value>
-  </property>
-
-  <property>
-    <description>
-      Zookeeper connection retry count before failing
-    </description>
-    <name>hadoop.registry.zk.retry.times</name>
-    <value>5</value>
-  </property>
-
-  <property>
-    <description>
-    </description>
-    <name>hadoop.registry.zk.retry.interval.ms</name>
-    <value>1000</value>
-  </property>
-
-  <property>
-    <description>
-      Zookeeper retry limit in milliseconds, during
-      exponential backoff: {@value}
-
-      This places a limit even
-      if the retry times and interval limit, combined
-      with the backoff policy, result in a long retry
-      period
-    </description>
-    <name>hadoop.registry.zk.retry.ceiling.ms</name>
-    <value>60000</value>
-  </property>
-
-  <property>
-    <description>
-      List of hostname:port pairs defining the
-      zookeeper quorum binding for the registry
-    </description>
-    <name>hadoop.registry.zk.quorum</name>
-    <value>localhost:2181</value>
-  </property>
-
-  <property>
-    <description>
-      Key to set if the registry is secure. Turning it on
-      changes the permissions policy from "open access"
-      to restrictions on kerberos with the option of
-      a user adding one or more auth key pairs down their
-      own tree.
-    </description>
-    <name>hadoop.registry.secure</name>
-    <value>false</value>
-  </property>
-
-  <property>
-    <description>
-      A comma separated list of Zookeeper ACL identifiers with
-      system access to the registry in a secure cluster.
-
-      These are given full access to all entries.
-
-      If there is an "@" at the end of a SASL entry it
-      instructs the registry client to append the default kerberos domain.
-    </description>
-    <name>hadoop.registry.system.acls</name>
-    <value>sasl:yarn@, sasl:mapred@, sasl:mapred@hdfs@</value>
-  </property>
-
-  <property>
-    <description>
-      The kerberos realm: used to set the realm of
-      system principals which do not declare their realm,
-      and any other accounts that need the value.
-
-      If empty, the default realm of the running process
-      is used.
-
-      If neither are known and the realm is needed, then the registry
-      service/client will fail.
-    </description>
-    <name>hadoop.registry.kerberos.realm</name>
-    <value></value>
-  </property>
-
-  <property>
-    <description>
-      Key to define the JAAS context. Used in secure
-      mode
-    </description>
-    <name>hadoop.registry.jaas.context</name>
-    <value>Client</value>
-  </property>
-
  <property>
    <description>Defines how often NMs wake up to upload log files.
    The default value is -1. By default, the logs will be uploaded when
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/index.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/index.md
@ -24,5 +24,6 @@ and use the binding information to connect with the services's network-accessibl
 endpoints, be they REST, IPC, Web UI, Zookeeper quorum+path or some other protocol.

 * [Architecture](yarn-registry.html)
+* [Configuration](registry-configuration.html)
 * [Using the YARN Service registry](using-the-yarn-service-registry.html)
 * [Security](registry-security.html)
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md
@ -0,0 +1,453 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Registry Configuration
+
+
+The YARN service registry is built on top of Apache Zookeeper.
+It is configured by way of a Hadoop `Configuration` class:
+the instance used to create the service controls the behavior of the client.
+
+This document lists the configuration parameters which control the
+registry client and its deployment in the YARN Resource Manager.
+
+The default values of all these settings are defined in `core-default.xml`.
+The values in this file may not match those listed in this document.
+If this is the case, the values in `core-default.xml` MUST be considered normative.
+
+## Changing the configuration values
+
+Changes to the configuration values SHOULD be done in `core-site.xml`.
+This will ensure that client and non-YARN applications will pick up
+the values, so enabling them to read from and potentially write to the registry.
+
+
+## Core Settings
+
+
+### Enabling the Registry in the Resource Manager
+
+The Resource Manager manages user directory creation and record cleanup
+on YARN container/application attempt/application completion.
+
+```
+  <property>
+    <description>
+      Is the registry enabled in the YARN Resource Manager?
+
+      If true, the YARN RM will, as needed.
+      create the user and system paths, and purge
+      service records when containers, application attempts
+      and applications complete.
+
+      If false, the paths must be created by other means,
+      and no automatic cleanup of service records will take place.
+    </description>
+    <name>hadoop.registry.rm.enabled</name>
+    <value>false</value>
+  </property>
+```
+
+If the property is set in `core-site.xml` or `yarn-site.xml`,
+the YARN Resource Manager will behave as follows:
+1. On startup: create the initial root paths of `/`, `/services` and `/users`.
+  On a secure cluster, access will be restricted to the system accounts (see below).
+2. When a user submits a job: create the user path under `/users`.
+3. When a container is completed: delete from the registry all service records
+   with a `yarn:persistence` field of value `container`, and a `yarn:id` field
+   whose value matches the ID of the completed container.
+4. When an application attempt is completed: remove all service records with
+   `yarn:persistence` set to `application-attempt` and `yarn:id` set to the
+   pplication attempt ID.
+5. When an application finishes: remove all service records with
+   `yarn:persistence` set to `application` and `yarn:id` set to the
+   application ID.
+
+All these operations are asynchronous, so that zookeeper connectivity problems
+do not delay RM operations or work scheduling.
+
+If the property `hadoop.registry.rm.enabled` is set to `false`, the RM will
+not interact with the registry —and the listed operations will not take place.
+The root paths may be created by other means, but service record cleanup will not take place.
+
+### Setting the Zookeeper Quorum: `hadoop.registry.zk.quorum`
+
+This is an essential setting: it identifies the lists of zookeeper hosts
+and the ports on which the ZK services are listening.
+
+
+```
+  <property>
+    <description>
+      List of hostname:port pairs defining the
+      zookeeper quorum binding for the registry
+    </description>
+    <name>hadoop.registry.zk.quorum</name>
+    <value>localhost:2181</value>
+  </property>
+```
+
+It takes a comma-separated list, such as `zk1:2181 ,zk2:2181, zk3:2181`
+
+### Setting the Zookeeper Registry Base path: `hadoop.registry.zk.root`
+
+This path sets the base zookeeper node for the registry
+
+```
+  <property>
+    <description>
+      The root zookeeper node for the registry
+    </description>
+    <name>hadoop.registry.zk.root</name>
+    <value>/registry</value>
+  </property>
+```
+
+The default value of `/registry` is normally sufficient. A different value
+may be needed for security reasons or because the `/registry` path is in use.
+
+The root value is prepended to all registry paths so as to create the absolute
+path. For example:
+
+* `/` maps to `/registry`
+* `/services` maps to `/registry/services`
+* `/users/yarn` maps to `/registry/users/yarn`
+
+A different value of `hadoop.registry.zk.root` would result in a different
+mapping to absolute zookeeper paths.
+
+
+## Security Options
+
+Registry security is enabled when the property `hadoop.registry.secure`
+is set to `true`. Once set, nodes are created with permissions, so that
+only a specific user *and the configured cluster "superuser" accounts*
+can write under their home path of `${hadoop.registry.zk.root}/users`.
+Only the superuser accounts
+will be able to manipulate the root path, including `${hadoop.registry.zk.root}/services`
+and `${hadoop.registry.zk.root}/users`.
+
+All write operations on the registry (including deleting entries and paths)
+must be authenticated. Read operations are still permitted by unauthenticated
+callers.
+
+The key settings for secure registry support are:
+
+* enabling the secure mode:  `hadoop.registry.secure`
+* listing the superuser zookeeper ACLs:  `hadoop.registry.system.acls`
+* listing the kerberos realm for the principals: `hadoop.registry.kerberos.realm`
+* identifying the JAAS context within the JAAS configuration which defines
+the user: `hadoop.registry.jaas.context`
+
+
+### Enabling security
+
+```
+  <property>
+    <description>
+      Key to set if the registry is secure. Turning it on
+      changes the permissions policy from "open access"
+      to restrictions on kerberos with the option of
+      a user adding one or more auth key pairs down their
+      own tree.
+    </description>
+    <name>hadoop.registry.secure</name>
+    <value>false</value>
+  </property>
+```
+
+### Identifying the client JAAS context
+
+The registry clients must identify the JAAS context which they use
+to authenticate to the registry.
+
+```
+  <property>
+    <description>
+      Key to define the JAAS context. Used in secure mode
+    </description>
+    <name>hadoop.registry.jaas.context</name>
+    <value>Client</value>
+  </property>
+```
+
+*Note* as the Resource Manager is simply another client of the registry, it
+too must have this context defined.
+
+
+### Identifying the system accounts `hadoop.registry.system.acls`
+
+These are the the accounts which are given full access to the base of the
+registry. The Resource Manager needs this option to create the root paths.
+
+Client applications writing to the registry access to the nodes it creates.
+
+1. The property `hadoop.registry.system.acls` takes a comma-separated list
+of zookeeper `ACLs` which are given full access to created nodes; the permissions
+`READ | WRITE | CREATE | DELETE | ADMIN`.
+2. Any zookeeper ACL scheme may be added to this, such as the `digest:` scheme.
+3. The SASL scheme, `sasl:`, is used to identify which callers identified
+by sasl have full access. These are the superuser accounts.
+4. They may be identified by elements such as `sasl:yarn@REALM.COM`.
+5. To aid portability of SASL settings, especially the default value,
+any `sasl:` entry without the realm value —that
+is, any entry that terminates in the `@` symbol— has the current realm appended
+to it.
+6. This realm is set to that of the current user.
+7. It may be overridden by the property `hadoop.registry.kerberos.realm`.
+
+
+```
+  <property>
+    <description>
+      A comma separated list of Zookeeper ACL identifiers with
+      system access to the registry in a secure cluster.
+      These are given full access to all entries.
+      If there is an "@" at the end of a SASL entry it
+      instructs the registry client to append the default kerberos domain.
+    </description>
+    <name>hadoop.registry.system.acls</name>
+    <value>sasl:yarn@, sasl:mapred@, sasl:mapred@, sasl:hdfs@</value>
+  </property>
+
+  <property>
+    <description>
+      The kerberos realm: used to set the realm of
+      system principals which do not declare their realm,
+      and any other accounts that need the value.
+      If empty, the default realm of the running process
+      is used.
+      If neither are known and the realm is needed, then the registry
+      service/client will fail.
+    </description>
+    <name>hadoop.registry.kerberos.realm</name>
+    <value></value>
+  </property>
+```
+
+Example: an `hadoop.registry.system.acls` entry of
+ `sasl:yarn@, sasl:admin@EXAMPLE.COM, sasl:system@REALM2`,
+would, in a YARN cluster with the realm `EXAMPLE.COM`, add the following
+admin accounts to every node
+
+* `sasl:yarn@EXAMPLE.COM`
+* `sasl:admin@EXAMPLE.COM`
+* `sasl:system@REALM2`
+
+The identity of a client application creating registry entries will be
+automatically included in the permissions of all entries created.
+If, for example, the account creating an entry was `hbase`, another
+entry would be created
+
+* `sasl:hbase@EXAMPLE.COM`
+
+
+**Important**: when setting the system ACLS, *it is critical to include the
+identity of the YARN Resource Manager*.
+
+The RM needs to be able to create the root and user paths, and delete service
+records during application and container cleanup.
+
+
+## Zookeeper connection management options
+
+Some low level options manage the ZK connection —more specifically, its failure
+handling.
+
+The Zookeeper registry clients use Apache Curator to connect to Zookeeper,
+a library which detects timeouts and attempts to reconnect to one of the
+servers which forms the zookeeper quorum. It is only after a timeout is detected
+that a retry is triggered.
+
+```
+  <property>
+    <description>
+      Zookeeper session timeout in milliseconds
+    </description>
+    <name>hadoop.registry.zk.session.timeout.ms</name>
+    <value>60000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper connection timeout in milliseconds
+    </description>
+    <name>hadoop.registry.zk.connection.timeout.ms</name>
+    <value>15000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper connection retry count before failing
+    </description>
+    <name>hadoop.registry.zk.retry.times</name>
+    <value>5</value>
+  </property>
+
+  <property>
+    <description>
+    </description>
+    <name>hadoop.registry.zk.retry.interval.ms</name>
+    <value>1000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper retry limit in milliseconds, during
+      exponential backoff.
+      This places a limit even
+      if the retry times and interval limit, combined
+      with the backoff policy, result in a long retry
+      period
+    </description>
+    <name>hadoop.registry.zk.retry.ceiling.ms</name>
+    <value>60000</value>
+  </property>
+```
+
+The retry strategy used in the registry client is
+[`BoundedExponentialBackoffRetry`](https://curator.apache.org/apidocs/org/apache/curator/retry/BoundedExponentialBackoffRetry.html):
+This backs off exponentially on connection failures before eventually
+concluding that the quorum is unreachable and failing.
+
+## Complete Set of Configuration Options
+
+```
+  <!-- YARN registry -->
+
+  <property>
+    <description>
+      Is the registry enabled: does the RM start it up,
+      create the user and system paths, and purge
+      service records when containers, application attempts
+      and applications complete
+    </description>
+    <name>hadoop.registry.rm.enabled</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <description>
+      List of hostname:port pairs defining the
+      zookeeper quorum binding for the registry
+    </description>
+    <name>hadoop.registry.zk.quorum</name>
+    <value>localhost:2181</value>
+  </property>
+
+  <property>
+    <description>
+      The root zookeeper node for the registry
+    </description>
+    <name>hadoop.registry.zk.root</name>
+    <value>/registry</value>
+  </property>
+
+  <property>
+    <description>
+      Key to set if the registry is secure. Turning it on
+      changes the permissions policy from "open access"
+      to restrictions on kerberos with the option of
+      a user adding one or more auth key pairs down their
+      own tree.
+    </description>
+    <name>hadoop.registry.secure</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <description>
+      A comma separated list of Zookeeper ACL identifiers with
+      system access to the registry in a secure cluster.
+
+      These are given full access to all entries.
+
+      If there is an "@" at the end of a SASL entry it
+      instructs the registry client to append the default kerberos domain.
+    </description>
+    <name>hadoop.registry.system.acls</name>
+    <value>sasl:yarn@, sasl:mapred@, sasl:mapred@, sasl:hdfs@</value>
+  </property>
+
+  <property>
+    <description>
+      The kerberos realm: used to set the realm of
+      system principals which do not declare their realm,
+      and any other accounts that need the value.
+
+      If empty, the default realm of the running process
+      is used.
+
+      If neither are known and the realm is needed, then the registry
+      service/client will fail.
+    </description>
+    <name>hadoop.registry.kerberos.realm</name>
+    <value></value>
+  </property>
+
+  <property>
+    <description>
+      Key to define the JAAS context. Used in secure
+      mode
+    </description>
+    <name>hadoop.registry.jaas.context</name>
+    <value>Client</value>
+  </property>
+
+
+  <property>
+    <description>
+      Zookeeper session timeout in milliseconds
+    </description>
+    <name>hadoop.registry.zk.session.timeout.ms</name>
+    <value>60000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper session timeout in milliseconds
+    </description>
+    <name>hadoop.registry.zk.connection.timeout.ms</name>
+    <value>15000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper connection retry count before failing
+    </description>
+    <name>hadoop.registry.zk.retry.times</name>
+    <value>5</value>
+  </property>
+
+  <property>
+    <description>
+    </description>
+    <name>hadoop.registry.zk.retry.interval.ms</name>
+    <value>1000</value>
+  </property>
+
+  <property>
+    <description>
+      Zookeeper retry limit in milliseconds, during
+      exponential backoff: {@value}
+
+      This places a limit even
+      if the retry times and interval limit, combined
+      with the backoff policy, result in a long retry
+      period
+    </description>
+    <name>hadoop.registry.zk.retry.ceiling.ms</name>
+    <value>60000</value>
+  </property>
+```
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-security.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-security.md
@ -24,8 +24,8 @@ This document is therefore relevant only to secure clusters.
 ## Security Model

 The security model of the registry is designed to meet the following goals
-a secur
-1. Deliver functional security on e ZK installation.
+a secure registry:
+1. Deliver functional security on a secure ZK installation.
 1. Allow the RM to create per-user regions of the registration space
 1. Allow applications belonging to a user to write registry entries
 into their part of the space. These may be short-lived or long-lived
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/using-the-yarn-service-registry.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/using-the-yarn-service-registry.md
@ -19,17 +19,39 @@

 The YARN service registry can be used in a numbe of ways :-

-1. To register dynamic YARN-deployed applications with entries that match the lifespan of the YARN application. Service Records can be set to be deleted on the completion of the YARN application, the application attempt, or an individual container.
-1. To look up static or dynamic applications and the mechanisms to communicate with them. Those mechanisms can incude: HTTP(S) URLs, Zookeeper paths, hostnames and ports and even paths in a Hadoop filesystem to configuration data.
-1. On a secure cluster, to verify that a service binding has been published by a specific user, or a system account. This can be done simply by looking at the path under which an entry has been placed.
-1. To register static applications. These will remain in the registry until deleted. They can be updated as required.
+1. To register dynamic YARN-deployed applications with entries that match the
+   lifespan of the YARN application.
+   Service Records can be set to be deleted on
+   the completion of the YARN application, the application attempt,
+   or an individual container.
+1. To look up static or dynamic applications and the mechanisms to communicate
+   with them.
+   Those mechanisms can incude: HTTP(S) URLs, Zookeeper paths,
+   hostnames and ports and even paths in a Hadoop filesystem to
+   configuration data.
+1. On a secure cluster, to verify that a service binding has been published
+   by a specific user, or a system account.
+   This can be done simply by looking at the path under which an entry has
+   been placed.
+1. To register static applications.
+   These will remain in the registry until deleted.
+    They can be updated as required.

-A user of the registry may be both a publisher of entries —Service Records— and a consumer of other services located via their service records. Different parts of a distributed application may also use it for different purposes -central manager services can publish bindings for use by the worker services, which can then look up the bindings to communicate with that manager even if it was restarted on different nodes in the cluster.
+A user of the registry may be both a publisher of entries —Service Records—
+and a consumer of other services located via their service records.
+Different parts of a distributed application may also use it for different
+purposes. As an example, the Application Master of a YARN application
+can publish bindings for use by its worker containers. The code running in the containers
+which can then look up the bindings to communicate with that manager even
+if it was restarted on different nodes in the cluster. Client applications can
+look up external service endpoints to interact with the AM via a public API.

 The registry cannot be used:-

 * To subscribe to service records or registry paths and listen for changes.
-* To directly share arbitrary data from a server for their clients. Such data must be published by some other means, a means which the registry entry can publish.
+* To directly share arbitrary data from a server for their clients.
+  Such data must be published by some other means, a means which the registry
+  entry can publish.
 * To share secrets between processes. The registry is world readable.


@ -38,13 +60,17 @@ The registry cannot be used:-

 ### Short-lived YARN Application Masters registering their public service endpoints.

-1. A YARN application is deployed. In a secure cluster, it is given the kerberos token to write to the registry.
+1. A YARN application is deployed. In a secure cluster, it is given the kerberos
+   token to write to the registry.
 2. When launched, it creates a service record at a known path
-3. This record MAY have application attempt persistence policy of and an ID of the application attempt
+3. This record MAY have application attempt persistence policy of and an ID of
+   the application attempt

 		yarn:persistence = "application_attempt"
 		yarn:id = ${application_attemptId}
-	This means that the record will be deleted when the application attempt completes, even if a new attempt is created. Every Application attempt will have to re-register the endpoint —which may be needed to locate the service anyway.
+
+	 This means that the record will be deleted when the application attempt
+	 completes, even if a new attempt is created. Every Application attempt will have to re-register the endpoint —which may be needed to locate the service anyway.
 4. Alternatively, the record MAY have the persistence policy of "application":

 		yarn:persistence = "application_attempt"
@ -52,7 +78,9 @@ The registry cannot be used:-
 	This means that the record will persist even between application attempts, albeit with out of date endpoint information.
 5. Client applications look up the service by way of the path.

-The choice of path is an application specific one. For services with a yarn application name guaranteed to be unique, we recommend a convention of:
+The choice of path is an application specific one.
+For services with a YARN application name guaranteed to be unique,
+we recommend a convention of:

 	/users/${username}/applications/${service-class}/${instance-name}

@ -68,16 +96,21 @@ Client applications may locate the service
 * From a supplied service class and instance name
 * If listed by application ID, from the service class and application ID.

-After locating a service record, the client may enumerate the `external` bindings and locate the entry with the desired API.
+After locating a service record, the client can enumerate the `external`
+bindings and locate the entry with the desired API.


 ### YARN Containers registering their public service endpoints

-Here all containers in a YARN application are publishing service endpoints for public consumption.
+Here all containers in a YARN application are publishing service endpoints
+for public consumption.

-1. The deployed containers are passed the base path under which they should register themselves.
-2. Long-lived containers must be passed an `id:password` pair which gives them the right to update these entries without the kerberos credentials of the user. This allows the containers to update their entries even after the user tokens granting the AM write access to a registry path expire.
-3. The containers instantiate a registry operations instance with the id:pass pair.
+1. The deployed containers are passed the base path under which they should
+   register themselves.
+2. Long-lived containers must be passed an `id:password` pair which gives
+   them the right to update these entries without the kerberos credentials of the user. This allows the containers to update their entries even after the user tokens granting the AM write access to a registry path expire.
+3. The containers instantiate a registry operations instance with the
+   `id:password` pair.
 4. They then a register service record on a path consisting of:

 		${base-path} + "/" + RegistryPathUtils.encodeYarnID(containerId)
@ -89,62 +122,152 @@ Here all containers in a YARN application are publishing service endpoints for p

 	When the container is terminated, the entry will be automatically deleted.

-5. The exported service endpoints of this container-deployed service should be listed in the  `external` endpoint list of the service record.
-6. Clients may enumerate all containers exported by a YARN application by listing the entries under `${base-path}`.
+5. The exported service endpoints of this container-deployed service should
+   be listed in the `external` endpoint list of the service record.
+6. Clients can enumerate all containers exported by a YARN application by
+   listing the entries under `${base-path}`.


 ### Registering Static cluster services.

-Services which are generally fixed in a cluster, but which need to publish binding and configuration information may be published in the registry. Example: an Apache Oozie service.
-Services external to the cluster to which deployed applications may also be published. Example: An Amazon Dynamo instance.
+Services which are generally fixed in a cluster, but which need to publish
+binding and configuration information may be published in the registry.
+Example: an Apache Oozie service.
+Services external to the cluster to which deployed applications may also
+be published. Example: An Amazon Dynamo instance.


-These services can be registered under paths which belong to the users running the service, such as `/users/oozie` or `/users/hbase`. Client applications would use this path. While this can authenticate the validity of the service record, it does rely on the client applications knowing the username a service is deployed on, or being configured with the full path.
+These services can be registered under paths which belong to the users
+running the service, such as `/users/oozie` or `/users/hbase`.
+Client applications would use this path.
+While this can authenticate the validity of the service record,
+it does rely on the client applications knowing the username a service
+is deployed on, or being configured with the full path.

-The alternative is for the services to be deployed under a static services path, under `/services`. For example, `/services/oozie` could contain the registration of the Oozie service. As the permissions for this path are restricted to pre-configured system accounts, the presence of a service registration does, on a secure cluster, confirm that it was registered by the cluster administration tools.
+The alternative is for the services to be deployed under a static services path,
+under `/services`. For example, `/services/oozie` could contain
+the registration of the Oozie service.
+As the permissions for this path are restricted to pre-configured
+system accounts, the presence of a service registration on this path on a secure
+cluster, confirms that it was registered by the cluster administration tools.

-1. The service is deployed by some management tool, or directly by the cluster operator.
-2. The deployed application can register itself under its own user name if given the binding information for the registry.
-3. If the application is to be registered under `/services` and it has been deployed by one of the system user accounts —it may register itself directly.
-4. If the application does not have the permissions to do so, the cluster administration tools must register the service instead.
-5. Client applications may locate a service by resolving its well known/configured path.
-5. If a service is stopped, the administration tools may delete the entry, or retain the entry but delete all it service endpoints. This is a proposed convention to indicate "the service is known but not currently reachable".
-6. When a service is restarted, its binding information may be updated, or its entire registry entry recreated.
+1. The service is deployed by some management tool, or directly by
+   the cluster operator.
+2. The deployed application can register itself under its own user name
+   if given the binding information for the registry.
+3. If the application is to be registered under `/services` and it has been
+   deployed by one of the system user accounts —it may register itself directly.
+4. If the application does not have the permissions to do so, the cluster
+   administration tools must register the service instead.
+5. Client applications may locate a service by resolving its well
+   known/configured path.
+6. If a service is stopped, the administration tools may delete the entry,
+   or retain the entry but delete all it service endpoints.
+   This is a proposed convention to indicate
+   "the service is known but not currently reachable".
+7. When a service is restarted, its binding information may be updated,
+   or its entire registry entry recreated.


 ### YARN containers locating their Application Master

-Here YARN containers register with their AM to receive work, usually by some heartbeat mechanism where they report in regularly. If the AM is configured for containers to outlive the application attempt, when an AM fails the containers keep running. These containers will need to bind to any restarted AM. They may also wish to conclude that if an AM does not restart, that they should eventually time out and terminate themselves. Such a policy helps the application react to network partitions.
+Here YARN containers register with their AM to receive work, usually by some
+heartbeat mechanism where they report in regularly.
+If the AM is configured for containers to outlive the application attempt,
+when an AM fails the containers keep running.
+These containers will need to bind to any restarted AM.
+They may also wish to conclude that if an AM does not restart,
+that they should eventually time out and terminate themselves.
+Such a policy helps the application react to network partitions.

-1. The YARN AM publishes its service endpoints such as the FQDN and socket port neede for IPC communications, or an HTTP/HTTPS URL needed for a REST channel. These are published in the `internal` endpoint list, with the specific API the containers use
-2. The YARN containers are launched with the path to the service record (somehow) passed to them. Environment variables or command line parameters are two viable mechanisms. Shared secrets may also be passed this way, so allowing the containers to authenticate themselves with the AM. An alternative means of publishing secrets is saving information to the filesystem in a path specified as one of the endpoints.
-2. The YARN containers look up the service registry to identify the communications binding.
-3. If the registered service entry cannot be found, the container MAY do one of: exit. spin with some (jittered) retry period, polling for the entry, until the entry reappears. This implies that the AM has been found.
-4. If the service entry is found, the client should attempt to communicate with the AM on its channel. Shared authentication details may be used to validate the client with the server and vice versa.
-5. The client report in to the AM until the connections start failing to connect or authenticate, or when a long lived connection is broken and cannot be restarted.
-6. A this point the client may revert to step (3). Again, some backoff policy with some jitter helps stop a newly-restarted AM being overloaded. Containers may also with to have some timeout after which they conclude that the AM is not coming back and exit.
-6. We recommend that alongside the functional commands that an AM may issue to a client, a "terminate" command can be issued to a container. This allows the system to handle the specific situation of the YARN Node Manager terminating while spawned containers keep running.
+1. The YARN AM publishes its service endpoints such as the FQDN and
+   socket port needed for IPC communications, or an HTTP/HTTPS URL needed
+   for a REST channel.
+   These are published in the `internal` endpoint list, with the
+   `api` field set to a URL of the specific API the containers use.
+1. The YARN containers are launched with the path to the service record
+   (somehow) passed to them.
+   Environment variables or command line parameters are two viable mechanisms.
+   Shared secrets should also be passed that way: command line parameters are
+   visible in the unix `ps` command.
+   More secure is saving shared secrets to the cluster filesystem,
+   passing down the path to the containers. The URI to such as path MAY be one
+   of the registered internal endpoints of the application.
+1. The YARN containers look up the service registry to identify the
+   communications binding.
+1. If the registered service entry cannot be found, the container MAY do one of:
+   exit. spin with some (jittered) retry period, polling for the entry, until
+   the entry reappears. This implies that the AM has been found.
+1. If the service entry is found, the client should attempt to communicate
+   with the AM on its channel.
+   Shared authentication details may be used to validate the client with the
+   server and vice versa.
+1. The client report in to the AM until the connections start failing to
+   connect or authenticate, or when a long lived connection is broken
+   and cannot be restarted.
+1. A this point the client may revert to step (3).
+   Again, some backoff policy with some jitter helps stop a
+   newly-restarted AM being overloaded.
+   Containers may also with to have some timeout after which they conclude
+   that the AM is not coming back and exit.
+1. We recommend that alongside the functional commands that an AM may
+   issue to a client, a "terminate" command can be issued to a container.
+   This allows the system to handle the specific situation of the
+   YARN Node Manager terminating while spawned containers keep running.

 ### YARN Applications and containers publishing their management and metrics bindings

-Management ports and bindings are simply another endpoint to publish. These should be published as *internal* endpoints, as they are not intended for public consumption. By convention, the name of the management protocol shoudl be used as the endpoint's declared API: `JMX`, `ganglia`, etc.
-
+Management ports and bindings are simply others endpoint to publish.
+These should be published as *internal* endpoints, as they are not
+intended for public consumption.

 ### Client application enumerating services by endpoint APIs

-A client application wishes to locate all services implementing a specific API, such as `"org.apache.hbase"`
+A client application wishes to locate all services implementing a specific API,
+such as `"classpath://org.apache.hbase"`

 1. The client starts from a path in the registry
-1. The client calls `registryOperations.list(path)` to list all nodes directly under that path, getting a relative list of child nodes.
-1. the client enumerates the child record statuses by calling `stat()` on each child.
-1. For all status entries, if the size of the entry is > the value of `ServiceRecordHeader.getLength()`, it MAY contain a service record.
-1. The contents can be retrieved using the `resolve()` operation. If successful, it does contain a service record —so the client can enumerate the `external` endpoints and locate the one with the desired API.
-1. The `children` field of each `RegistryPathStatus` status entry should be examined. If it is >= 0, the enumeration should be performed recursively on the path of that entry.
+1. The client calls `registryOperations.list(path)` to list all nodes directly
+   under that path, getting a relative list of child nodes.
+1. the client enumerates the child record statuses by calling `stat()`
+   on each child.
+1. For all status entries, if the size of the entry is greater than the
+   value of `ServiceRecordHeader.getLength()`, it MAY contain a service record.
+1. The contents can be retrieved using the `resolve()` operation.
+   If successful, it does contain a service record —so the client can enumerate
+   the `external` endpoints and locate the one with the desired API.
+1. The `children` field of each `RegistryPathStatus` status entry should
+   be examined. If it is >= 0, the enumeration should be performed recursively on the path of that entry.
 1. The operation ultimately completes with a list of all entries.
+1. One of the enumerated endpoints may be selected and used as the binding information
+   for a service

-This algorithm describes a depth first search of the registry tree. Variations are of course possible, including breadth first search, or immediately halting the search as soon as a single entry point. There is also the option of parallel searches of different subtrees —which may reduce search time, albeit at the price of a higher client load on the registry infrastructure.
+This algorithm describes a depth first search of the registry tree.
+Variations are of course possible, including breadth-first search,
+or immediately halting the search as soon as a single entry point.
+There is also the option of parallel searches of different subtrees
+—this may reduce search time, albeit at the price of a higher client
+load on the registry infrastructure.

-A Utility class `RegistryUtils` provides static utility methods for common registry operations,
-in particular, `RegistryUtils.listServiceRecords(registryOperations, path)`
+A utility class `RegistryUtils` provides static utility methods for
+common registry operations,in particular,
+`RegistryUtils.listServiceRecords(registryOperations, path)`
 performs the listing and collection of all immediate child record entries of
 a specified path.
+
+Client applications are left with the problem of "what to do when the endpoint
+is not valid", specifically, when a service is not running —what should be done?
+
+Some transports assume that the outage is transient, and that spinning retries
+against the original binding is the correct strategy. This is the default
+policy of the Hadoop IPC client.
+
+Other transports fail fast, immediately reporting the failure via an
+exception or other mechanism. This is directly visible to the client —but
+does allow the client to rescan the registry and rebind to the application.
+
+Finally, some application have been designed for dynamic failover from the
+outset: their published binding information is actually a zookeeper path.
+Apache HBase and Apache Accumulo are examples of this. The registry is used
+for the initial lookup of the binding, after which the clients are inherently
+resilient to failure.
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md
@ -19,9 +19,13 @@

 # Introduction and concepts

-This document describes a YARN service registry built to address a problem:
-*how can clients talk to YARN-deployed services and the components which form
-such services?*
+This document describes a YARN service registry built to address two problems:
+
+1. How can clients talk to YARN-deployed services and the components which form
+such services?
+1. Allow Hadoop core services to be registered and discovered thereby
+reducing configuration parameters and to allow core services to be more
+easily moved.

 Service registration and discovery is a long-standing problem in distributed
 computing, dating back to Xerox's Grapevine Service. This proposal is for a
@ -346,7 +350,6 @@ A Service Record contains some basic informations and two lists of endpoints:
 one list for users of a service, one list for internal use within the
 application.

-
 <table>
  <tr>
    <td>Name</td>
@ -370,7 +373,9 @@ application.
  </tr>
 </table>

-The type field MUST be `"JSONServiceRecord"`. Mandating this string allows future record types *and* permits rapid rejection of byte arrays that lack this string before attempting JSON parsing.
+The type field MUST be `"JSONServiceRecord"`. Mandating this string allows
+future record types *and* permits rapid rejection of byte arrays that
+lack this string before attempting to parse the data with a JSON parser.

 ### YARN Persistence policies

@ -379,8 +384,15 @@ as an application, attempt or container is completed.

 This allows service to register entries which have a lifespan bound to one of
 these aspects of YARN applications' lifecycles. This is a feature which is only
-supported when the RM has enabled its support, and would not apply to
-any use of the registry without the RM's participation.
+supported when the RM has had its registry integration enabled via the
+configuration option `hadoop.registry.rm.enabled`.
+
+If this option is enabled, and the YARN resource manager is running,
+it will clean up service records as defined
+below.
+
+If the option is disabled, the RM does not provide any registry support at all.
+

 The attributes, `yarn:id` and `yarn:persistence` specify which records
 *and any child entries* may be deleted as the associated YARN components complete.
@ -1006,7 +1018,8 @@ The details are irrelevant —note that they use an application-specific API
 value to ensure uniqueness.

 Internal:
-1. Two URLS to REST APIs offered by the AM for containers deployed by
+
+1. Two URLs to REST APIs offered by the AM for containers deployed by
 the application itself.

 Python agents running in the containers retrieve the internal endpoint