YARN-7191. Improve yarn-service documentation. Contributed by Jian He
parent
c70e5df100
commit
8851209788
|
@ -22,6 +22,8 @@ It also does all the heavy lifting work such as resolving the service definition
|
||||||
failed containers, monitoring components' healthiness and readiness, ensuring dependency start order across components, flexing up/down components,
|
failed containers, monitoring components' healthiness and readiness, ensuring dependency start order across components, flexing up/down components,
|
||||||
upgrading components etc. The end goal of the framework is to make sure the service is up and running in the state that the user desires.
|
upgrading components etc. The end goal of the framework is to make sure the service is up and running in the state that the user desires.
|
||||||
|
|
||||||
|
In addition, it leverages a lot of features in YARN core to accomplish scheduling constraints, such as
|
||||||
|
affinity and anti-affinity scheduling, log aggregation for services, automatic restart of a container if it fails, and in-place upgrade of a container.
|
||||||
|
|
||||||
### A Restful API-Server for deploying/managing services on YARN
|
### A Restful API-Server for deploying/managing services on YARN
|
||||||
A restful API server is developed to allow users to deploy/manage their services on YARN via a simple JSON spec. This avoids users
|
A restful API server is developed to allow users to deploy/manage their services on YARN via a simple JSON spec. This avoids users
|
||||||
|
@ -34,44 +36,11 @@ support HA, distribute the load etc.
|
||||||
|
|
||||||
### Service Discovery
|
### Service Discovery
|
||||||
A DNS server is implemented to enable discovering services on YARN via the standard mechanism: DNS lookup.
|
A DNS server is implemented to enable discovering services on YARN via the standard mechanism: DNS lookup.
|
||||||
The DNS server essentially exposes the information in YARN service registry by translating them into DNS records such as A record and SRV record.
|
|
||||||
Clients can discover the IPs of containers via standard DNS lookup.
|
The framework posts container information such as hostname and ip into the [YARN service registry](../registry/index.md). And the DNS server essentially exposes the
|
||||||
|
information in YARN service registry by translating them into DNS records such as A record and SRV record.
|
||||||
|
Clients can then discover the IPs of containers via standard DNS lookup.
|
||||||
|
|
||||||
The previous read mechanisms of YARN Service Registry were limited to a registry specific (java) API and a REST interface and are difficult
|
The previous read mechanisms of YARN Service Registry were limited to a registry specific (java) API and a REST interface and are difficult
|
||||||
to wire up with existing clients and services. The DNS based service discovery eliminates this gap. Please refer to this [DNS doc](ServiceDiscovery.md)
|
to wire up with existing clients and services. The DNS based service discovery eliminates this gap. Please refer to this [Service Discovery doc](ServiceDiscovery.md)
|
||||||
for more details.
|
for more details.
|
||||||
|
|
||||||
### Scheduling
|
|
||||||
|
|
||||||
A host of scheduling features are being developed to support long running services.
|
|
||||||
|
|
||||||
* Affinity and anti-affinity scheduling across containers ([YARN-6592](https://issues.apache.org/jira/browse/YARN-6592)).
|
|
||||||
* Container resizing ([YARN-1197](https://issues.apache.org/jira/browse/YARN-1197))
|
|
||||||
* Special handling of container preemption/reservation for services
|
|
||||||
|
|
||||||
### Container auto-restarts
|
|
||||||
|
|
||||||
[YARN-3998](https://issues.apache.org/jira/browse/YARN-3998) implements a retry-policy to let NM re-launch a service container when it fails.
|
|
||||||
The service REST API provides users a way to enable NodeManager to automatically restart the container if it fails.
|
|
||||||
The advantage is that it avoids the entire cycle of releasing the failed containers, re-asking new containers, re-do resource localizations and so on, which
|
|
||||||
greatly minimizes container downtime.
|
|
||||||
|
|
||||||
|
|
||||||
### Container in-place upgrade
|
|
||||||
|
|
||||||
[YARN-4726](https://issues.apache.org/jira/browse/YARN-4726) aims to support upgrading containers in-place, that is, without losing the container allocations.
|
|
||||||
It opens up a few APIs in NodeManager to allow ApplicationMasters to upgrade their containers via a simple API call.
|
|
||||||
Under the hood, NodeManager does below steps:
|
|
||||||
* Downloading the new resources such as jars, docker container images, new configurations.
|
|
||||||
* Stop the old container.
|
|
||||||
* Start the new container with the newly downloaded resources.
|
|
||||||
|
|
||||||
At the time of writing this document, core changes are done but the feature is not usable end-to-end.
|
|
||||||
|
|
||||||
### Resource Profiles
|
|
||||||
|
|
||||||
In [YARN-3926](https://issues.apache.org/jira/browse/YARN-3926), YARN introduces Resource Profiles which extends the YARN resource model for easier
|
|
||||||
resource-type management and profiles.
|
|
||||||
It primarily solves two problems:
|
|
||||||
* Make it easy to support new resource types such as network bandwidth ([YARN-2140](https://issues.apache.org/jira/browse/YARN-2140)) and disks ([YARN-2139](https://issues.apache.org/jira/browse/YARN-2139)).
|
|
||||||
Under the hood, it unifies the scheduler codebase to essentially parameterize the resource types.
|
|
||||||
* User can specify the container resource requirement by a profile name, rather than fiddling with varying resource-requirements for each resource type.
|
|
|
@ -52,7 +52,8 @@ The benefits of combining these workloads are two-fold:
|
||||||
|
|
||||||
* [Concepts](Concepts.md): Describes the internals of the framework and some features in YARN core to support running services on YARN.
|
* [Concepts](Concepts.md): Describes the internals of the framework and some features in YARN core to support running services on YARN.
|
||||||
* [Service REST API](YarnServiceAPI.md): The API doc for deploying/managing services on YARN.
|
* [Service REST API](YarnServiceAPI.md): The API doc for deploying/managing services on YARN.
|
||||||
* [Service Discovery](ServiceDiscovery.md): Deep dives into the YARN DNS internals.
|
* [Service Discovery](ServiceDiscovery.md): Describes the service discovery mechanism on YARN.
|
||||||
|
* [Registry DNS](RegistryDNS.md): Deep dives into the Registry DNS internals.
|
||||||
* [Examples](Examples.md): List some example service definitions (`Yarnfile`).
|
* [Examples](Examples.md): List some example service definitions (`Yarnfile`).
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -194,32 +194,10 @@ If you are building from source code, make sure you use `-Pyarn-ui` in the `mvn`
|
||||||
</property>
|
</property>
|
||||||
```
|
```
|
||||||
|
|
||||||
## Service Discovery with YARN DNS
|
# Try with Docker
|
||||||
YARN Service framework comes with a DNS server (backed by YARN Service Registry) which enables DNS based discovery of services deployed on YARN.
|
The above example is only for a non-docker container based service. YARN Service Framework also provides first-class support for managing docker based services.
|
||||||
That is, user can simply access their services in a well-defined naming format as below:
|
Most of the steps for managing docker based services are the same except that in docker the `Artifact` type for a component is `DOCKER` and the Artifact `id` is the name of the docker image.
|
||||||
|
For details on how to set up docker on YARN, please check [Docker on YARN](../DockerContainers.md).
|
||||||
|
|
||||||
```
|
With docker support, it also opens up a set of new possibilities to implement features such as discovering service containers on YARN with DNS.
|
||||||
${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}
|
Check [ServiceDiscovery](ServiceDiscovery.md) for more details.
|
||||||
```
|
|
||||||
For example, in a cluster whose domain name is `yarncluster` (as defined by the `hadoop.registry.dns.domain-name` in `yarn-site.xml`), a service named `hbase` deployed by user `dev`
|
|
||||||
with two components `hbasemaster` and `regionserver` can be accessed as below:
|
|
||||||
|
|
||||||
This URL points to the usual hbase master UI
|
|
||||||
```
|
|
||||||
http://hbasemaster-0.hbase.dev.yarncluster:16010/master-status
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
Note that YARN service framework assigns COMPONENT_INSTANCE_NAME for each container in a sequence of monotonically increasing integers. For example, `hbasemaster-0` gets
|
|
||||||
assigned `0` since it is the first and only instance for the `hbasemaster` component. In case of `regionserver` component, it can have multiple containers
|
|
||||||
and so be named as such: `regionserver-0`, `regionserver-1`, `regionserver-2` ... etc
|
|
||||||
|
|
||||||
`Disclaimer`: The DNS implementation is still experimental. It should not be used as a fully-functional corporate DNS.
|
|
||||||
|
|
||||||
### Start the DNS server
|
|
||||||
By default, the DNS runs on non-privileged port `5353`.
|
|
||||||
If it is configured to use the standard privileged port `53`, the DNS server needs to be run as root:
|
|
||||||
```
|
|
||||||
sudo su - -c "yarn org.apache.hadoop.registry.server.dns.RegistryDNSServer > /${HADOOP_LOG_FOLDER}/registryDNS.log 2>&1 &" root
|
|
||||||
```
|
|
||||||
Please refer to [YARN DNS doc](ServicesDiscovery.md) for the full list of configurations.
|
|
||||||
|
|
|
@ -0,0 +1,166 @@
|
||||||
|
<!---
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
you may not use this file except in compliance with the License.
|
||||||
|
You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software
|
||||||
|
distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
See the License for the specific language governing permissions and
|
||||||
|
limitations under the License. See accompanying LICENSE file.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# Registry DNS Server
|
||||||
|
|
||||||
|
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
The Registry DNS Server provides a standard DNS interface to the information posted into the YARN Registry by deployed applications. The DNS service serves the following functions:
|
||||||
|
|
||||||
|
1. **Exposing existing service-discovery information via DNS** - Information provided in
|
||||||
|
the current YARN service registry’s records will be converted into DNS entries, thus
|
||||||
|
allowing users to discover information about YARN applications using standard DNS
|
||||||
|
client mechanisms (e.g. a DNS SRV Record specifying the hostname and port
|
||||||
|
number for services).
|
||||||
|
2. **Enabling Container to IP mappings** - Enables discovery of the IPs of containers via
|
||||||
|
standard DNS lookups. Given the availability of the records via DNS, container
|
||||||
|
name-based communication will be facilitated (e.g. `curl
|
||||||
|
http://solr-0.solr-service.devuser.yarncluster:8983/solr/admin/collections?action=LIST`).
|
||||||
|
|
||||||
|
## Service Properties
|
||||||
|
|
||||||
|
The existing YARN Service Registry is leveraged as the source of information for the DNS Service.
|
||||||
|
|
||||||
|
The following core functions are supported by the DNS-Server:
|
||||||
|
|
||||||
|
### Functional properties
|
||||||
|
|
||||||
|
1. Supports creation of DNS records for end-points of the deployed YARN applications
|
||||||
|
2. Record names remain unchanged during restart of containers and/or applications
|
||||||
|
3. Supports reverse lookups (name based on IP). Note, this works only for
|
||||||
|
Docker containers because other containers share the IP of the host
|
||||||
|
4. Supports security using the standards defined by The Domain Name System Security
|
||||||
|
Extensions (DNSSEC)
|
||||||
|
5. Highly available
|
||||||
|
6. Scalable - The service provides the responsiveness (e.g. low-latency) required to
|
||||||
|
respond to DNS queries (timeouts yield attempts to invoke other configured name
|
||||||
|
servers).
|
||||||
|
|
||||||
|
### Deployment properties
|
||||||
|
|
||||||
|
1. Supports integration with existing DNS assets (e.g. a corporate DNS server) by acting as
|
||||||
|
a DNS server for a Hadoop cluster zone/domain. The server is not intended to act as a
|
||||||
|
primary DNS server and does not forward requests to other servers. Rather, a
|
||||||
|
primary DNS server can be configured to forward a zone to the registry DNS
|
||||||
|
server.
|
||||||
|
2. The DNS Server exposes a port that can receive both TCP and UDP requests per
|
||||||
|
DNS standards. The default port for DNS protocols is not in the restricted
|
||||||
|
range (5353). However, existing DNS assets may only allow zone forwarding to
|
||||||
|
non-custom ports. To support this, the registry DNS server can be started in
|
||||||
|
privileged mode.
|
||||||
|
|
||||||
|
## DNS Record Name Structure
|
||||||
|
|
||||||
|
The DNS names of generated records are composed from the following elements
|
||||||
|
(labels). Note that these elements must be compatible with DNS conventions
|
||||||
|
(see “Preferred Name Syntax” in [RFC 1035](https://www.ietf.org/rfc/rfc1035.txt)):
|
||||||
|
|
||||||
|
* **domain** - the name of the cluster DNS domain. This name is provided as a
|
||||||
|
configuration property. In addition, it is this name that is configured at a parent DNS
|
||||||
|
server as the zone name for the defined registry DNS zone (the zone for which
|
||||||
|
the parent DNS server will forward requests to registry DNS). E.g. yarncluster.com
|
||||||
|
* **username** - the name of the application deployer. This name is the simple short-name (for
|
||||||
|
e.g. the primary component of the Kerberos principal) associated with the user launching
|
||||||
|
the application. As the username is one of the elements of DNS names, it is expected
|
||||||
|
that this also conforms to DNS name conventions (RFC 1035 linked above), so it
|
||||||
|
is converted to valid DNS hostname entries using the punycode convention used
|
||||||
|
for internationalized DNS.
|
||||||
|
* **application name** - the name of the deployed YARN application. This name is inferred
|
||||||
|
from the YARN registry path to the application's node. Application name,
|
||||||
|
rather than application id, was chosen as a way of making it easy for users to refer to human-readable DNS
|
||||||
|
names. This obviously mandates certain uniqueness properties on application names.
|
||||||
|
* **container id** - the YARN assigned ID to a container (e.g.
|
||||||
|
container_e3741_1454001598828_01_000004)
|
||||||
|
* **component name** - the name assigned to the deployed component (for e.g. a master
|
||||||
|
component). A component is a distributed element of an application or service that is
|
||||||
|
launched in a YARN container (e.g. an HBase master). One can imagine multiple
|
||||||
|
components within an application. A component name is not yet a first class concept in
|
||||||
|
YARN, but is a very useful one that we are introducing here for the sake of registry DNS
|
||||||
|
entries. Many frameworks like MapReduce, Slider already have component names
|
||||||
|
(though, as mentioned, they are not yet supported in YARN in a first class fashion).
|
||||||
|
* **api** - the api designation for the exposed endpoint
|
||||||
|
|
||||||
|
### Notes about DNS Names
|
||||||
|
|
||||||
|
* In most instances, the DNS names can be easily distinguished by the number of
|
||||||
|
elements/labels that compose the name. The cluster’s domain name is always the last
|
||||||
|
element. After that element is parsed out, reading from right to left, the first element
|
||||||
|
maps to the application user and so on. Wherever it is not easily distinguishable, naming conventions are used to disambiguate the name using a prefix such as
|
||||||
|
“container” or suffix such as “api”. For example, an endpoint published as a
|
||||||
|
management endpoint will be referenced with the name *management-api.griduser.yarncluster.com*.
|
||||||
|
* Unique application name (per user) is not currently supported/guaranteed by YARN, but
|
||||||
|
it is supported by frameworks such as Apache Slider. The registry DNS service currently
|
||||||
|
leverages the last element of the ZK path entry for the application as an
|
||||||
|
application name. These application names have to be unique for a given user.
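The right-to-left reading described in these notes can be sketched in Python. This is only an illustration of the convention, not the Registry DNS implementation: the function name `split_registry_name` and the returned dictionary layout are hypothetical, and the sketch does not interpret prefixes such as "container" or suffixes such as "api".

```python
def split_registry_name(fqdn, domain):
    """Split a Registry DNS style name into its labels, read right to left.

    Hypothetical helper: the cluster domain is stripped first, then the
    first remaining label (from the right) is taken as the user.
    """
    name = fqdn.rstrip(".").lower()
    domain = domain.strip(".").lower()
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not inside the domain {domain!r}")
    # Labels to the left of the domain, reordered so index 0 is the rightmost.
    labels = name[: -len(suffix)].split(".")
    labels.reverse()
    return {"domain": domain, "user": labels[0], "rest": labels[1:]}


if __name__ == "__main__":
    print(split_registry_name("management-api.griduser.yarncluster.com",
                              "yarncluster.com"))
```

For the management endpoint example above, the user is `griduser` and the only remaining label is `management-api`.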
|
||||||
|
|
||||||
|
## DNS Server Functionality
|
||||||
|
|
||||||
|
The primary functions of the DNS service are illustrated in the following diagram:
|
||||||
|
|
||||||
|
![DNS Functional Overview](../images/dns_overview.png "DNS Functional Overview")
|
||||||
|
|
||||||
|
### DNS record creation
|
||||||
|
The following figure illustrates at slightly greater detail the DNS record creation and registration sequence (NOTE: service record updates would follow a similar sequence of steps,
|
||||||
|
distinguished only by the different event type):
|
||||||
|
|
||||||
|
![DNS Functional Overview](../images/dns_record_creation.jpeg "DNS Functional Overview")
|
||||||
|
|
||||||
|
### DNS record removal
|
||||||
|
Record removal follows a similar sequence:
|
||||||
|
|
||||||
|
![DNS Functional Overview](../images/dns_record_removal.jpeg "DNS Functional Overview")
|
||||||
|
|
||||||
|
(NOTE: The DNS Zone requires a record as an argument for the deletion method, thus
|
||||||
|
requiring similar parsing logic to identify the specific records that should be removed).
|
||||||
|
|
||||||
|
### DNS Service initialization
|
||||||
|
* The DNS service initializes both UDP and TCP listeners on a configured port.
|
||||||
|
If a port in the restricted range is desired (such as the standard DNS port
|
||||||
|
53), the DNS service can be launched using jsvc as described in the section
|
||||||
|
on starting the DNS server.
|
||||||
|
* Subsequently, the DNS service listens for inbound DNS requests. Those requests are
|
||||||
|
standard DNS requests from users or other DNS servers (for example, DNS servers that have the
|
||||||
|
RegistryDNS service configured as a forwarder).
|
||||||
|
|
||||||
|
## Start the DNS Server
|
||||||
|
By default, the DNS server runs on non-privileged port `5353`. Start the server
|
||||||
|
with:
|
||||||
|
```
|
||||||
|
yarn --daemon start registrydns
|
||||||
|
```
|
||||||
|
|
||||||
|
If the DNS server is configured to use the standard privileged port `53`, the
|
||||||
|
environment variables YARN\_REGISTRYDNS\_SECURE\_USER and
|
||||||
|
YARN\_REGISTRYDNS\_SECURE\_EXTRA\_OPTS must be uncommented in the yarn-env.sh
|
||||||
|
file. The DNS server should then be launched as root and jsvc will be used to
|
||||||
|
reduce the privileges of the daemon after the port has been bound.
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
The Registry DNS server reads its configuration properties from the yarn-site.xml file. The following are the DNS associated configuration properties:
|
||||||
|
|
||||||
|
| Name | Description |
|
||||||
|
| ------------ | ------------- |
|
||||||
|
| hadoop.registry.dns.enabled | The DNS functionality is enabled for the cluster. Default is false. |
|
||||||
|
| hadoop.registry.dns.domain-name | The domain name for Hadoop cluster associated records. |
|
||||||
|
| hadoop.registry.dns.bind-address | Address associated with the network interface to which the DNS listener should bind. |
|
||||||
|
| hadoop.registry.dns.bind-port | The port number for the DNS listener. The default port is 5353. |
|
||||||
|
| hadoop.registry.dns.dnssec.enabled | Indicates whether the DNSSEC support is enabled. Default is false. |
|
||||||
|
| hadoop.registry.dns.public-key | The base64 representation of the server’s public key. Leveraged for creating the DNSKEY Record provided for DNSSEC client requests. |
|
||||||
|
| hadoop.registry.dns.private-key-file | The path to the standard DNSSEC private key file. Must only be readable by the DNS launching identity. See [dnssec-keygen](https://ftp.isc.org/isc/bind/cur/9.9/doc/arm/man.dnssec-keygen.html) documentation. |
|
||||||
|
| hadoop.registry.dns-ttl | The default TTL value to associate with DNS records. The default value is set to 1 (a value of 0 has undefined behavior). A typical value should be approximate to the time it takes YARN to restart a failed container. |
|
||||||
|
| hadoop.registry.dns.zone-subnet | An indicator of the IP range associated with the cluster containers. The setting is utilized for the generation of the reverse zone name. |
|
||||||
|
| hadoop.registry.dns.zone-mask | The network mask associated with the zone IP range. If specified, it is utilized to ascertain the IP range possible and come up with an appropriate reverse zone name. |
|
||||||
|
| hadoop.registry.dns.zones-dir | A directory containing zone configuration files to read during zone initialization. This directory can contain zone master files named *zone-name.zone*. See [here](http://www.zytrax.com/books/dns/ch6/mydomain.html) for zone master file documentation.|
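As a rough illustration of how the properties in this table end up in `yarn-site.xml`, the following Python sketch renders a dictionary of settings as `<property>` elements. The helper name `render_properties` and the sample values are assumptions for illustration; only the property names come from the table.

```python
import xml.etree.ElementTree as ET


def render_properties(props):
    """Render a name -> value mapping as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values only; pick values that match your cluster.
    print(render_properties({
        "hadoop.registry.dns.enabled": "true",
        "hadoop.registry.dns.domain-name": "yarncluster",
        "hadoop.registry.dns.bind-port": "5353",
    }))
```

The output is a single unindented line of XML; the generated `<property>` elements would then be merged into the existing `yarn-site.xml`.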
|
|
@ -12,139 +12,112 @@
|
||||||
limitations under the License. See accompanying LICENSE file.
|
limitations under the License. See accompanying LICENSE file.
|
||||||
-->
|
-->
|
||||||
|
|
||||||
# YARN DNS Server
|
# Service Discovery
|
||||||
|
|
||||||
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
|
This document describes the mechanism of service discovery on YARN and the
|
||||||
|
steps for enabling it.
|
||||||
|
|
||||||
## Introduction
|
## Overview
|
||||||
|
A [DNS server](RegistryDNS.md) is implemented to enable discovering services on YARN via
|
||||||
|
the standard mechanism: DNS lookup.
|
||||||
|
|
||||||
The YARN DNS Server provides a standard DNS interface to the information posted into the YARN Registry by deployed applications. The DNS service serves the following functions:
|
The framework ApplicationMaster posts the container information such as hostname and IP address into
|
||||||
|
the YARN service registry. The DNS server exposes the information in the YARN service registry by translating it into DNS
|
||||||
|
records such as A and SRV records. Clients can then discover the IPs of containers via standard DNS lookup.
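From a client's point of view, such a lookup is an ordinary hostname resolution. The Python sketch below shows one way a client might collect the IPv4 addresses behind a container name. It assumes the host resolver is set up to reach the Registry DNS zone; the function name `container_ips` and the injectable `resolver` parameter are hypothetical and exist only to keep the example testable.

```python
import socket


def container_ips(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that `hostname` resolves to."""
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # The name below only resolves if Registry DNS serves the `yarncluster` zone.
    try:
        print(container_ips("hbasemaster-0.hbase.devuser.yarncluster"))
    except socket.gaierror as err:
        print("lookup failed:", err)
```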
|
||||||
|
|
||||||
1. **Exposing existing service-discovery information via DNS** - Information provided in
|
For non-docker containers (containers with null `Artifact` or with `Artifact` type set to `TARBALL`), since all containers on the same host share the same ip address,
|
||||||
the current YARN service registry’s records will be converted into DNS entries, thus
|
the DNS server supports forward DNS lookup, but not reverse DNS lookup.
|
||||||
allowing users to discover information about YARN applications using standard DNS
|
With docker, it supports both forward and reverse lookup, since each container
|
||||||
client mechanisms (for e.g. a DNS SRV Record specifying the hostname and port
|
can be configured to have its own unique IP. In addition, the DNS also supports configuring static zone files for both forward and reverse lookup.
|
||||||
number for services).
|
|
||||||
2. **Enabling Container to IP mappings** - Enables discovery of the IPs of containers via
|
|
||||||
standard DNS lookups. Given the availability of the records via DNS, container
|
|
||||||
name-based communication will be facilitated (e.g. ‘curl
|
|
||||||
http://myContainer.myDomain.com/endpoint’).
|
|
||||||
|
|
||||||
## Service Properties
|
## Docker Container IP Management in Cluster
|
||||||
|
To support the use case of one IP per container, containers must be launched with the `bridge` network. However, with the `bridge` network, containers
|
||||||
|
running on one node are not routable from other nodes by default. This is not an issue if you are only doing single-node testing; however, for
|
||||||
|
a multi-node environment, containers must be made routable from other nodes.
|
||||||
|
|
||||||
The existing YARN Service Registry is leveraged as the source of information for the DNS Service.
|
There are several approaches to solve this depending on the platform, such as GCE or AWS. Please refer to the platform-specific documentation for how to enable this.
|
||||||
|
For an on-prem cluster, one way to solve this issue is to configure the docker daemon on each node to use a custom bridge, say `br0`, which is routable from all nodes.
|
||||||
|
Also, assign an exclusive, contiguous range of IP addresses expressed in CIDR form, e.g. `172.21.195.240/26` (64 IPs), to each docker
|
||||||
|
daemon using the `fixed-cidr` option like below in the docker `daemon.json`:
|
||||||
|
```
|
||||||
|
"bridge": "br0",
|
||||||
|
"fixed-cidr": "172.21.195.240/26"
|
||||||
|
```
|
||||||
|
Check how to [customize docker bridge network](https://docs.docker.com/engine/userguide/networking/default_network/custom-docker0/) for details.
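The arithmetic behind a `fixed-cidr` range can be checked with Python's standard `ipaddress` module. The sketch below is an illustration only; the helper names `describe_pool` and `pools_overlap` are hypothetical. It shows the size and boundaries of a range and checks that the ranges given to two docker daemons are exclusive.

```python
import ipaddress


def describe_pool(cidr):
    """Summarize the address block that a CIDR expression falls into."""
    # strict=False accepts an address with host bits set, such as .240/26.
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "network": str(net),
        "size": net.num_addresses,
        "first": str(net[0]),
        "last": str(net[-1]),
    }


def pools_overlap(cidr_a, cidr_b):
    """Return True if the two ranges share any address."""
    a = ipaddress.ip_network(cidr_a, strict=False)
    b = ipaddress.ip_network(cidr_b, strict=False)
    return a.overlaps(b)


if __name__ == "__main__":
    print(describe_pool("172.21.195.240/26"))
    print(pools_overlap("172.21.195.192/26", "172.21.195.128/26"))
```

A `/26` prefix leaves 6 host bits, which gives the 64 addresses mentioned above. The address `172.21.195.240` falls in the block from `172.21.195.192` to `172.21.195.255`.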
|
||||||
|
|
||||||
The following core functions are supported by the DNS-Server:
|
|
||||||
|
|
||||||
### Functional properties
|
## Naming Convention with Registry DNS
|
||||||
|
With the DNS support, users can simply access their services using a well-defined naming format as below:
|
||||||
|
|
||||||
1. Supports creation of DNS records for end-points of the deployed YARN applications
|
```
|
||||||
2. Record names remain unchanged during restart of containers and/or applications
|
${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}
|
||||||
3. Supports reverse lookups (name based on IP). Note, this works only for Docker containers.
|
```
|
||||||
4. Supports security using the standards defined by The Domain Name System Security
|
For example, in a cluster whose domain name is `yarncluster` (as defined by the `hadoop.registry.dns.domain-name` in `yarn-site.xml`), a service named `hbase` deployed by user `devuser`
|
||||||
Extensions (DNSSEC)
|
with two components `hbasemaster` and `regionserver` can be accessed as below:
|
||||||
5. Highly available
|
|
||||||
6. Scalable - The service provides the responsiveness (e.g. low-latency) required to
|
|
||||||
respond to DNS queries (timeouts yield attempts to invoke other configured name
|
|
||||||
servers).
|
|
||||||
|
|
||||||
### Deployment properties
|
This URL points to the usual hbase master UI
|
||||||
|
```
|
||||||
|
http://hbasemaster-0.hbase.devuser.yarncluster:16010/master-status
|
||||||
|
```
|
||||||
|
|
||||||
1. Supports integration with existing DNS assets (e.g. a corporate DNS server) by acting as
|
|
||||||
a DNS server for a Hadoop cluster zone/domain. The server is not intended to act as a
|
|
||||||
primary DNS server and does not forward requests to other servers.
|
|
||||||
2. The DNS Server exposes a port that can receive both TCP and UDP requests per
|
|
||||||
DNS standards. The default port for DNS protocols is in a restricted, administrative port
|
|
||||||
range (5353), so the port is configurable for deployments in which the service may
|
|
||||||
not be managed via an administrative account.
|
|
||||||
|
|
||||||
## DNS Record Name Structure
|
Note that YARN service framework assigns `COMPONENT_INSTANCE_NAME` for each container in a sequence of monotonically increasing integers. For example, `hbasemaster-0` gets
|
||||||
|
assigned `0` since it is the first and only instance for the `hbasemaster` component. In case of `regionserver` component, it can have multiple containers
|
||||||
|
and so they are named as such: `regionserver-0`, `regionserver-1`, `regionserver-2`, etc.
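The naming format can be sketched in a few lines of Python. The helper names `instance_names` and `service_fqdn` are hypothetical, and the sketch assumes a freshly started service in which no instance has been replaced, so the numbering simply starts at `0`.

```python
def instance_names(component, count):
    """Instance names for a component: <component>-0, <component>-1, ..."""
    return [f"{component}-{i}" for i in range(count)]


def service_fqdn(instance, service, user, domain):
    """Join the labels as ${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}."""
    return ".".join([instance, service, user, domain])


if __name__ == "__main__":
    for name in instance_names("hbasemaster", 1) + instance_names("regionserver", 3):
        print(service_fqdn(name, "hbase", "devuser", "yarncluster"))
```

With the values from the example above, the first printed name is `hbasemaster-0.hbase.devuser.yarncluster`.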
|
||||||
|
|
||||||
The DNS names of generated records are composed from the following elements (labels). Note that these elements must be compatible with DNS conventions (see “Preferred Name Syntax” in RFC 1035):
|
`Disclaimer`: The DNS implementation is still experimental. It should not be used as a fully-functional DNS.
|
||||||
|
|
||||||
* **domain** - the name of the cluster DNS domain. This name is provided as a
|
|
||||||
configuration property. In addition, it is this name that is configured at a parent DNS
|
|
||||||
server as the zone name for the defined yDNS zone (the zone for which the parent DNS
|
|
||||||
server will forward requests to yDNS). E.g. yarncluster.com
|
|
||||||
* **username** - the name of the application deployer. This name is the simple short-name (for
|
|
||||||
e.g. the primary component of the Kerberos principal) associated with the user launching
|
|
||||||
the application. As the username is one of the elements of DNS names, it is expected
|
|
||||||
that this also confirms DNS name conventions (RFC 1035 linked above), so special translation is performed for names with special characters like hyphens and spaces.
|
|
||||||
* **application name** - the name of the deployed YARN application. This name is inferred
|
|
||||||
from the YARN registry path to the application's node. Application name, rather thn application id, was chosen as a way of making it easy for users to refer to human-readable DNS
|
|
||||||
names. This obviously mandates certain uniqueness properties on application names.
|
|
||||||
* **container id** - the YARN assigned ID to a container (e.g.
|
|
||||||
container_e3741_1454001598828_01_000004)
|
|
||||||
* **component name** - the name assigned to the deployed component (for e.g. a master
|
|
||||||
component). A component is a distributed element of an application or service that is
|
|
||||||
launched in a YARN container (e.g. an HBase master). One can imagine multiple
|
|
||||||
components within an application. A component name is not yet a first class concept in
|
|
||||||
YARN, but is a very useful one that we are introducing here for the sake of yDNS
|
|
||||||
entries. Many frameworks like MapReduce, Slider already have component names
|
|
||||||
(though, as mentioned, they are not yet supported in YARN in a first class fashion).
|
|
||||||
* **api** - the api designation for the exposed endpoint
|
|
||||||
|
|
||||||
### Notes about DNS Names
|
## Configure Registry DNS
|
||||||
|
|
||||||
* In most instances, the DNS names can be easily distinguished by the number of
|
Below is the set of configurations in `yarn-site.xml` required for enabling Registry DNS. A full list of properties can be found in the Configuration
|
||||||
elements/labels that compose the name. The cluster’s domain name is always the last
|
section of [Registry DNS](RegistryDNS.md).
|
||||||
element. After that element is parsed out, reading from right to left, the first element
|
|
||||||
maps to the application user and so on. Wherever it is not easily distinguishable, naming conventions are used to disambiguate the name using a prefix such as
|
|
||||||
“container” or suffix such as “api”. For example, an endpoint published as a
|
|
||||||
management endpoint will be referenced with the name *management-api.griduser.yarncluster.com*.
|
|
||||||
* Unique application name (per user) is not currently supported/guaranteed by YARN, but
|
|
||||||
it is supported by frameworks such as Apache Slider. The yDNS service currently
|
|
||||||
leverages the last element of the ZK path entry for the application as an
|
|
||||||
application name. These application names have to be unique for a given user.
|
|
||||||
|
|
||||||
## DNS Server Functionality
|
```
|
||||||
|
<property>
|
||||||
|
<description>The domain name for Hadoop cluster associated records.</description>
|
||||||
|
<name>hadoop.registry.dns.domain-name</name>
|
||||||
|
<value>ycluster</value>
|
||||||
|
</property>
|
||||||
|
|
||||||
The primary functions of the DNS service are illustrated in the following diagram:
|
<property>
|
||||||
|
<description>The port number for the DNS listener. The default port is 5353.
|
||||||
|
If the standard privileged port 53 is used, make sure to start the DNS server with jsvc support.</description>
|
||||||
|
<name>hadoop.registry.dns.bind-port</name>
|
||||||
|
<value>53</value>
|
||||||
|
</property>
|
||||||
|
|
||||||
![DNS Functional Overview](../images/dns_overview.png "DNS Functional Overview")
|
<property>
|
||||||
|
<description>The DNS functionality is enabled for the cluster. Default is false.</description>
|
||||||
|
<name>hadoop.registry.dns.enabled</name>
|
||||||
|
<value>true</value>
|
||||||
|
</property>
|
||||||
|
|
||||||
### DNS record creation
|
<property>
|
||||||
The following figure illustrates in slightly greater detail the DNS record creation and registration sequence (NOTE: service record updates would follow a similar sequence of steps,
|
<description>The network mask associated with the zone IP range. If specified, it is utilized to ascertain the
|
||||||
distinguished only by the different event type):
|
IP range possible and come up with an appropriate reverse zone name.</description>
|
||||||
|
<name>hadoop.registry.dns.zone-mask</name>
|
||||||
|
<value>255.255.255.0</value>
|
||||||
|
</property>
|
||||||
|
|
||||||
![DNS Record Creation](../images/dns_record_creation.jpeg "DNS Record Creation")
|
<property>
|
||||||
|
<description>An indicator of the IP range associated with the cluster containers. The setting is utilized for the
|
||||||
|
generation of the reverse zone name.</description>
|
||||||
|
<name>hadoop.registry.dns.zone-subnet</name>
|
||||||
|
<value>172.17.0</value>
|
||||||
|
</property>
|
||||||
|
|
||||||
### DNS record removal
|
```
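
To make the relationship between `hadoop.registry.dns.zone-subnet`, `hadoop.registry.dns.zone-mask`, and the reverse zone name more concrete, the following Python sketch derives a conventional `in-addr.arpa` zone name from the two sample values above. It is not the Registry DNS implementation; the function name `reverse_zone_name` is hypothetical, and the sketch assumes a mask that ends on an octet boundary.

```python
import ipaddress


def reverse_zone_name(zone_subnet, zone_mask):
    """Build an in-addr.arpa zone name from a (possibly partial) subnet and a netmask."""
    # Pad a partial subnet such as "172.17.0" out to four octets.
    octets = zone_subnet.strip(".").split(".")
    octets += ["0"] * (4 - len(octets))
    network = ipaddress.IPv4Network(".".join(octets) + "/" + zone_mask, strict=False)

    # Assumption: this sketch only handles masks on an octet boundary.
    if network.prefixlen not in (8, 16, 24):
        raise ValueError("only /8, /16 and /24 masks are handled in this sketch")

    # Keep the network octets and reverse them, as reverse zones are written.
    kept = str(network.network_address).split(".")[: network.prefixlen // 8]
    return ".".join(reversed(kept)) + ".in-addr.arpa"


print(reverse_zone_name("172.17.0", "255.255.255.0"))  # 0.17.172.in-addr.arpa
```

With the sample values, the three network octets `172.17.0` are reversed to give `0.17.172.in-addr.arpa`.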
|
||||||
Record removal follows a similar sequence:
|
|
||||||
|
|
||||||
![DNS Record Removal](../images/dns_record_removal.jpeg "DNS Record Removal")
|
|
||||||
|
|
||||||
(NOTE: The DNS Zone requires a record as an argument for the deletion method, thus
|
|
||||||
requiring similar parsing logic to identify the specific records that should be removed).
|
|
||||||
|
|
||||||
### DNS Service initialization
|
|
||||||
* The DNS service initializes both UDP and TCP listeners on a configured port. As
|
|
||||||
noted above, the standard DNS port 53 (unlike the default port of 5353) is in a restricted range that is only accessible to an
|
|
||||||
account with administrative privileges.
|
|
||||||
* Subsequently, the DNS service listens for inbound DNS requests. Those requests are
|
|
||||||
standard DNS requests from users or other DNS servers (for example, DNS servers that have the
|
|
||||||
YARN DNS service configured as a forwarder).
|
|
||||||
|
|
||||||
## Start the DNS Server
|
## Start the DNS Server
|
||||||
By default, the DNS runs on non-privileged port `5353`.
|
By default, the DNS server runs on non-privileged port `5353`. Start the server
|
||||||
If it is configured to use the standard privileged port `53`, the DNS server needs to be run as root:
|
with:
|
||||||
```
|
```
|
||||||
sudo su - -c "yarn org.apache.hadoop.registry.server.dns.RegistryDNSServer > /${HADOOP_LOG_FOLDER}/registryDNS.log 2>&1 &" root
|
yarn --daemon start registrydns
|
||||||
```
|
```
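
Once the server is running, a standard DNS query can confirm that it is answering on the configured port. The Python sketch below builds a plain A-record query by hand and sends it over UDP; it is a minimal check under stated assumptions, not part of the YARN tooling. The function names, the `127.0.0.1` address, and any host name passed in are hypothetical, and the port defaults to `5353` as described above.

```python
import socket
import struct


def build_a_query(name, query_id=0x1234):
    """Encode a standard DNS query for the A record of `name`."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        question += struct.pack("!B", len(encoded)) + encoded
    # Root label terminator, then QTYPE=1 (A) and QCLASS=1 (IN).
    question += b"\x00" + struct.pack("!HH", 1, 1)
    return header + question


def query_status(name, server="127.0.0.1", port=5353, timeout=2.0):
    """Send the query over UDP and return (rcode, answer_count) from the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name), (server, port))
        reply, _ = sock.recvfrom(512)
    _, flags, _, answer_count, _, _ = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, answer_count
```

Calling `query_status` with a name registered in the cluster domain should return a response code of `0` (NOERROR) and at least one answer; a response code of `3` (NXDOMAIN) means the server is reachable but has no record for that name.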
|
||||||
|
|
||||||
## Configuration
|
If the DNS server is configured to use the standard privileged port `53`, the
|
||||||
The YARN DNS server reads its configuration properties from the yarn-site.xml file. The following are the DNS associated configuration properties:
|
environment variables `YARN_REGISTRYDNS_SECURE_USER` and
|
||||||
|
`YARN_REGISTRYDNS_SECURE_EXTRA_OPTS` must be uncommented in the `yarn-env.sh`
|
||||||
| Name | Description |
|
file. The DNS server should then be launched as `root` and jsvc will be used to
|
||||||
| ------------ | ------------- |
|
reduce the privileges of the daemon after the port has been bound.
|
||||||
| hadoop.registry.dns.enabled | The DNS functionality is enabled for the cluster. Default is false. |
|
|
||||||
| hadoop.registry.dns.domain-name | The domain name for Hadoop cluster associated records. |
|
|
||||||
| hadoop.registry.dns.bind-address | Address associated with the network interface to which the DNS listener should bind. |
|
|
||||||
| hadoop.registry.dns.bind-port | The port number for the DNS listener. The default port is 5353. The standard DNS port 53 falls in an administrator-only range, so binding to it requires launching the server with administrative privileges. |
|
|
||||||
| hadoop.registry.dns.dnssec.enabled | Indicates whether the DNSSEC support is enabled. Default is false. |
|
|
||||||
| hadoop.registry.dns.public-key | The base64 representation of the server’s public key. Leveraged for creating the DNSKEY Record provided for DNSSEC client requests. |
|
|
||||||
| hadoop.registry.dns.private-key-file | The path to the standard DNSSEC private key file. Must only be readable by the DNS launching identity. See [dnssec-keygen](https://ftp.isc.org/isc/bind/cur/9.9/doc/arm/man.dnssec-keygen.html) documentation. |
|
|
||||||
| hadoop.registry.dns-ttl | The default TTL value to associate with DNS records. The default value is set to 1 (a value of 0 has undefined behavior). A typical value should approximate the time it takes YARN to restart a failed container. |
|
|
||||||
| hadoop.registry.dns.zone-subnet | An indicator of the IP range associated with the cluster containers. The setting is utilized for the generation of the reverse zone name. |
|
|
||||||
| hadoop.registry.dns.zone-mask | The network mask associated with the zone IP range. If specified, it is utilized to ascertain the IP range possible and come up with an appropriate reverse zone name. |
|
|
||||||
| hadoop.registry.dns.zones-dir | A directory containing zone configuration files to read during zone initialization. This directory can contain zone master files named *zone-name.zone*. See [here](http://www.zytrax.com/books/dns/ch6/mydomain.html) for zone master file documentation. |
|
|
||||||
|
|