YARN-8891. Documentation of the pluggable device framework. Contributed by Zhankun Tang.
Commit 9636fe4114 (parent 9c88695bcd).

<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# Develop Your Own Plugin

A device plugin is loaded into the framework when the NodeManager (NM) starts.
Your plugin class only needs to deal with two interfaces provided by the
framework: `DevicePlugin`, which must be implemented, and
`DevicePluginScheduler`, which is optional.

## DevicePlugin Interface

```java
package org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin;

import java.util.Set;

/**
 * A must interface for vendor plugin to implement.
 * */
public interface DevicePlugin {
  /**
   * Called first when device plugin framework wants to register.
   * @return DeviceRegisterRequest {@link DeviceRegisterRequest}
   * @throws Exception
   * */
  DeviceRegisterRequest getRegisterRequestInfo()
      throws Exception;

  /**
   * Called when updating the node resource.
   * @return a set of {@link Device}, {@link java.util.TreeSet} recommended
   * @throws Exception
   * */
  Set<Device> getDevices() throws Exception;

  /**
   * Asking how these devices should be prepared/used
   * before/when container launch. A plugin can do some tasks on its own or
   * define them in DeviceRuntimeSpec to let the framework do them.
   * For instance, define {@code VolumeSpec} to let the
   * framework create a volume before running the container.
   *
   * @param allocatedDevices A set of allocated {@link Device}.
   * @param yarnRuntime Indicate which runtime YARN will use.
   *        Could be {@code RUNTIME_DEFAULT} or {@code RUNTIME_DOCKER}
   *        in {@link DeviceRuntimeSpec} constants. The default means YARN's
   *        non-docker container runtime is used. The docker means YARN's
   *        docker container runtime is used.
   * @return a {@link DeviceRuntimeSpec} description about environment,
   *         {@link VolumeSpec}, {@link MountVolumeSpec}, etc.
   * @throws Exception
   * */
  DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
      YarnRuntimeType yarnRuntime) throws Exception;

  /**
   * Called after devices are released.
   * @param releasedDevices A set of released devices
   * @throws Exception
   * */
  void onDevicesReleased(Set<Device> releasedDevices)
      throws Exception;
}
```

The code above shows the `DevicePlugin` interface you need to implement.
Let's go through the methods that your plugin should implement:

* `getRegisterRequestInfo(): DeviceRegisterRequest`
* `getDevices(): Set<Device>`
* `onDevicesAllocated(Set<Device>, YarnRuntimeType yarnRuntime): DeviceRuntimeSpec`
* `onDevicesReleased(Set<Device>): void`

The `getRegisterRequestInfo` method lets the plugin advertise the resource type
name it handles to the NM and, through it, to the ResourceManager. The
`DeviceRegisterRequest` returned by the method consists of a plugin version and
a resource type name such as `nvidia.com/gpu`.

The `getDevices` method returns the latest list of vendor devices on this NM
node. The resource count pre-defined in `node-resources.xml` will be overridden
by what this method reports. It's recommended that the vendor plugin manage, in
its own configuration, which devices are allowed to be reported to YARN; YARN
itself only supports a blacklist, configured through `devices.denied-numbers` in
`container-executor.cfg`. In this method, you may run a shell command or call a
remote RESTful/RPC service to get the devices, whichever is convenient.

Note that a `Device` object can describe a fake device. If the major device
number, minor device number and device path are all left unset, the framework
won't do isolation for it. This makes it possible to define a fake device
without real hardware.

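To make the `getDevices` contract a little more concrete, here is a minimal, hypothetical sketch of the parsing step a plugin might run after invoking a vendor shell command. It does not use the real `Device` class: `DeviceInfo` is a stand-in defined only for illustration, and the input format (one `index,majorNumber,minorNumber,devPath` line per device, with empty fields meaning "unset") is an assumption, not the output of any real tool. It returns a `TreeSet`, as the interface Javadoc recommends, and flags entries whose major number, minor number and path are all unset as fake devices that the framework would not isolate.

```java
import java.util.Set;
import java.util.TreeSet;

final class DeviceDiscovery {
  // Hypothetical stand-in for the framework's Device class (illustration only).
  static final class DeviceInfo implements Comparable<DeviceInfo> {
    final int id;
    final Integer major;   // null when unset
    final Integer minor;   // null when unset
    final String devPath;  // null when unset

    DeviceInfo(int id, Integer major, Integer minor, String devPath) {
      this.id = id;
      this.major = major;
      this.minor = minor;
      this.devPath = devPath;
    }

    // Major, minor and path all unset: the framework does no isolation for it.
    boolean isFake() {
      return major == null && minor == null && devPath == null;
    }

    @Override
    public int compareTo(DeviceInfo other) {
      return Integer.compare(id, other.id);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof DeviceInfo && ((DeviceInfo) o).id == id;
    }

    @Override
    public int hashCode() {
      return Integer.hashCode(id);
    }
  }

  // Parses lines of the assumed form "index,major,minor,path".
  // Empty fields mean "unset". The result is ordered by index.
  static Set<DeviceInfo> parse(String output) {
    Set<DeviceInfo> devices = new TreeSet<>();
    for (String raw : output.split("\n")) {
      String line = raw.trim();
      if (line.isEmpty()) {
        continue;
      }
      String[] f = line.split(",", -1);  // -1 keeps trailing empty fields
      if (f.length != 4) {
        throw new IllegalArgumentException("Malformed device line: " + line);
      }
      String path = f[3].trim();
      devices.add(new DeviceInfo(Integer.parseInt(f[0].trim()),
          parseOptional(f[1]), parseOptional(f[2]),
          path.isEmpty() ? null : path));
    }
    return devices;
  }

  private static Integer parseOptional(String field) {
    String t = field.trim();
    return t.isEmpty() ? null : Integer.valueOf(t);
  }

  public static void main(String[] args) {
    String sample = "0,195,0,/dev/nvidia0\n1,195,1,/dev/nvidia1\n2,,,\n";
    for (DeviceInfo d : parse(sample)) {
      System.out.println(d.id + (d.isFake() ? ": fake, not isolated" : ": " + d.devPath));
    }
  }
}
```

In a real plugin the parsed entries would be turned into `Device` objects; the parsing is kept separate here so it can be tested without a GPU or a running NM.
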
The `onDevicesAllocated` method tells the framework how to use the allocated
devices. The NM invokes it so that the plugin can do preparation work before the
container launches, such as creating a volume, and give hints on how to expose
the devices to the container when launching it. `DeviceRuntimeSpec` is the
structure that carries these hints: it can describe container launch
requirements such as environment variables, device and volume mounts, and the
Docker runtime type.

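As a rough illustration of the kind of hint a `DeviceRuntimeSpec` can carry, the sketch below turns the IDs of the allocated devices into container environment variables. It does not use the real `DeviceRuntimeSpec` or `YarnRuntimeType` classes: `RuntimeKind` is a hypothetical stand-in for the two runtime values, and `buildEnv` is a hypothetical helper. `NVIDIA_VISIBLE_DEVICES` is the variable the Nvidia container runtime reads to decide which GPUs to expose; setting it only for the Docker runtime is an assumption of this sketch, not a description of what `NvidiaGPUPluginForRuntimeV2` does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.StringJoiner;
import java.util.TreeSet;

final class RuntimeHints {
  // Hypothetical stand-in for YarnRuntimeType (RUNTIME_DEFAULT / RUNTIME_DOCKER).
  enum RuntimeKind { DEFAULT, DOCKER }

  // Builds environment variables that expose the allocated devices.
  // Assumption: only the Docker runtime needs NVIDIA_VISIBLE_DEVICES.
  static Map<String, String> buildEnv(SortedSet<Integer> allocatedIds,
                                      RuntimeKind runtime) {
    if (runtime != RuntimeKind.DOCKER || allocatedIds.isEmpty()) {
      return Collections.emptyMap();
    }
    StringJoiner ids = new StringJoiner(",");
    for (int id : allocatedIds) {
      ids.add(Integer.toString(id));  // ascending order from the SortedSet
    }
    Map<String, String> env = new LinkedHashMap<>();
    env.put("NVIDIA_VISIBLE_DEVICES", ids.toString());
    return env;
  }

  public static void main(String[] args) {
    SortedSet<Integer> ids = new TreeSet<>(Arrays.asList(1, 0));
    System.out.println(buildEnv(ids, RuntimeKind.DOCKER));
  }
}
```

Running `main` prints `{NVIDIA_VISIBLE_DEVICES=0,1}`; with `RuntimeKind.DEFAULT` the returned map is empty.
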
The `onDevicesReleased` method lets the plugin do cleanup work after the
container finishes.

## Optional DevicePluginScheduler Interface

```java
package org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin;

import java.util.Map;
import java.util.Set;

/**
 * An optional interface to implement if custom device scheduling is needed.
 * If this is not implemented, the device framework will do scheduling.
 * */
public interface DevicePluginScheduler {
  /**
   * Called when allocating devices. The framework will do all device book
   * keeping and fail recovery. So this hook could be stateless and only do
   * scheduling based on available devices passed in. It could be
   * invoked multiple times by the framework. The hint in environment variables
   * passed in could be potentially used in making better scheduling decision.
   * For instance, GPU scheduling might support different kind of policy. The
   * container can set it through environment variables.
   * @param availableDevices Devices allowed to be chosen from.
   * @param count Number of device to be allocated.
   * @param env Environment variables of the container.
   * @return A set of {@link Device} allocated
   * */
  Set<Device> allocateDevices(Set<Device> availableDevices, int count,
      Map<String, String> env);
}
```

The code above shows the `DevicePluginScheduler` interface, which you need only
if you want to equip the plugin with a more efficient scheduler. YARN invokes
its `allocateDevices` method each time it asks the plugin to recommend devices
for one container. The interface is optional because YARN provides a very basic
scheduler of its own.

For an example of a plugin with a customized scheduler, see the
`NvidiaGPUPluginForRuntimeV2` plugin. Its scheduler targets Nvidia GPU
topology-aware scheduling and can give the container a considerable
performance boost.

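The scheduler contract described above (stateless, possibly invoked several times, picks `count` devices from `availableDevices`, may read hints from `env`) can be illustrated with a deliberately simple policy. This is a hypothetical sketch, not the framework's default scheduler and not the topology-aware logic of `NvidiaGPUPluginForRuntimeV2`: it works on plain integer device IDs instead of `Device` objects, and the environment variable `EXAMPLE_DEVICE_POLICY` with its `SPREAD` value is invented for illustration. Returning an empty set when the request cannot be met is also a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SimpleDeviceScheduler {
  // Hypothetical scheduling hint; not a real YARN setting.
  static final String POLICY_ENV = "EXAMPLE_DEVICE_POLICY";

  // Stateless: the result depends only on the arguments, so repeated
  // invocations with the same input return the same choice.
  static Set<Integer> allocateDevices(Set<Integer> availableDevices, int count,
                                      Map<String, String> env) {
    if (count <= 0 || count > availableDevices.size()) {
      return Collections.emptySet();  // request cannot be satisfied
    }
    List<Integer> sorted = new ArrayList<>(new TreeSet<>(availableDevices));
    int n = sorted.size();
    Set<Integer> chosen = new TreeSet<>();
    if ("SPREAD".equals(env.get(POLICY_ENV))) {
      // Evenly spaced picks; i * n / count is strictly increasing since n >= count.
      for (int i = 0; i < count; i++) {
        chosen.add(sorted.get(i * n / count));
      }
    } else {
      // Default: pack onto the lowest IDs.
      chosen.addAll(sorted.subList(0, count));
    }
    return chosen;
  }

  public static void main(String[] args) {
    Set<Integer> available = new TreeSet<>(Arrays.asList(0, 1, 2, 3));
    Map<String, String> env = new HashMap<>();
    System.out.println(allocateDevices(available, 2, env));
    env.put(POLICY_ENV, "SPREAD");
    System.out.println(allocateDevices(available, 2, env));
  }
}
```

With four available IDs `0` to `3` and a request for two, the default branch picks `[0, 1]` and the `SPREAD` branch picks `[0, 2]`. A real topology-aware scheduler would instead rank device combinations by their interconnect.
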
## Dependency in Plugin Project

When developing the plugin, add the following dependency to your project's
`pom.xml`. For instance:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-server-nodemanager</artifactId>
    <version>3.3.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

After that, you can implement the interfaces above using the classes provided
in `org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin`. Note that the
plugin project is coupled to the Hadoop YARN NM version, so build it against the
version that runs on your cluster.

## Test And Use Your Own Plugin

Once you have built your project and packaged a jar containing your plugin
class, you can try it out in your Hadoop cluster.

First, put the jar file in a directory on the Hadoop classpath (we recommend
`$HADOOP_COMMON_HOME/share/hadoop/yarn`). Then follow the configuration steps
described in [Pluggable Device Framework](./PluggableDeviceFramework.html) and
restart YARN.

<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# YARN Pluggable Device Framework

<!-- MACRO{toc|fromDepth=0|toDepth=2} -->

## Introduction

At present, YARN supports GPU and FPGA devices through a native, tightly coupled
approach. This makes it difficult for a vendor to implement a device plugin,
because the developer needs to understand the various integration points with
YARN and have a deep understanding of YARN internals related to the
NodeManager.

### Pain Points Of Current Device Plugin

Some of the pain points of current device plugin development and integration
are listed below:

* At least 6 classes must be implemented (if you want to support Docker, you
also need to implement a `DockerCommandPlugin`).
* When implementing the `ResourceHandler` interface, the developer must
understand YARN NM internal concepts such as the container launch mechanism,
cgroups operations and Docker runtime operations.
* If isolation is required, the native container-executor also needs a new
module written in C.

This places a burden on the community to maintain both the YARN core and
vendor-specific code. For more details, see the YARN-8851 design document.

For these reasons, and so that YARN and vendor-specific plugins can evolve
independently, we developed a new pluggable device framework that eases vendor
device plugin development and provides a more flexible way to integrate with
YARN.

## Quick Start

This pluggable device framework simplifies not only plugin development but also
the YARN configuration needed for plugin integration. Before we go through how
to implement your own device plugin, let's first see how to use an existing one.

As an example, the new framework includes a sample Nvidia GPU plugin that
supports detecting Nvidia GPUs, a custom scheduler, and isolating containers
that run with both YARN cgroups and the Nvidia Docker runtime v2.

### Prerequisites

1. The pluggable device framework depends on LinuxContainerExecutor (LCE) to
handle resource isolation and Docker support, so LCE and Docker must be enabled
on YARN. See [Using CGroups with YARN](./NodeManagerCgroups.html) and
[Docker on YARN](./DockerContainers.html).

2. The sample plugin `NvidiaGPUPluginForRuntimeV2` requires the Nvidia GPU
drivers and the Nvidia Docker runtime v2 to be installed on the nodes. See the
official Nvidia documentation for details.

3. If you use the YARN Capacity Scheduler, the following
`DominantResourceCalculator` configuration is needed (in
`capacity-scheduler.xml`):

```xml
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

### Enable Device Plugin Framework

Two properties enable the pluggable framework support. The first one is in
`yarn-site.xml`:

```xml
<property>
  <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
  <value>true</value>
</property>
```

Then enable the native isolation module in `container-executor.cfg`:

```
# The configs below deal with settings for resources handled by the pluggable device plugin framework
[devices]
module.enabled=true
# devices.denied-numbers=## Blacklisted devices not permitted to use. The format is comma separated "majorNumber:minorNumber". For instance, "195:1,195:2". Leaving it empty means all devices reported by the device plugin are allowed.
```

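The `devices.denied-numbers` value is a comma-separated list of `majorNumber:minorNumber` pairs. As a hedged illustration of how such a value can be interpreted (for instance, to double-check a setting before deploying it), the Java sketch below parses it and tests whether a device is denied. It is not the container-executor's own parser, which lives in the native C module; the class name `DeniedDevices` and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class DeniedDevices {
  // Stores denied devices as "major:minor" keys.
  private final Set<String> denied = new HashSet<>();

  // Parses a value such as "195:1,195:2". Null or empty means nothing is denied.
  static DeniedDevices parse(String value) {
    DeniedDevices result = new DeniedDevices();
    if (value == null || value.trim().isEmpty()) {
      return result;
    }
    for (String entry : value.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected majorNumber:minorNumber, got: '" + entry + "'");
      }
      int major = Integer.parseInt(parts[0].trim());
      int minor = Integer.parseInt(parts[1].trim());
      result.denied.add(major + ":" + minor);
    }
    return result;
  }

  boolean isDenied(int major, int minor) {
    return denied.contains(major + ":" + minor);
  }

  public static void main(String[] args) {
    DeniedDevices d = parse("195:1,195:2");
    System.out.println(d.isDenied(195, 1));  // true
    System.out.println(d.isDenied(195, 0));  // false
  }
}
```
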
### Configure Sample Nvidia GPU Plugin

The pluggable device framework loads one plugin and talks to it to learn which
resource name the plugin handles. The resource name must also be pre-defined in
`resource-types.xml`. Here we already know from the plugin implementation that
the resource name is `nvidia.com/gpu`:

```xml
<property>
  <name>yarn.resource-types</name>
  <value>nvidia.com/gpu</value>
</property>
```

After defining the resource name handled by the plugin, configure the plugin
class name in `yarn-site.xml`:

```xml
<property>
  <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.com.nvidia.NvidiaGPUPluginForRuntimeV2</value>
</property>
```

Note that the property value must be the fully qualified class name of the
plugin.

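Because the value must be a fully qualified class name that the NM can load, it may be worth sanity-checking it before restarting YARN. The small program below is a hypothetical helper, not part of Hadoop: it calls `Class.forName` on each argument without running static initializers and reports whether the class loads. Run it with the same classpath the NM would use (for example, your plugin jar plus the output of `hadoop classpath`); otherwise a class implementing `DevicePlugin` will fail to load because the interface itself cannot be resolved.

```java
final class CheckPluginClass {
  // Returns true if the class can be loaded by name from the current classpath.
  static boolean isLoadable(String className) {
    try {
      // initialize=false: load the class without running its static initializers.
      Class.forName(className, false, CheckPluginClass.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    for (String name : args) {
      System.out.println(name + (isLoadable(name) ? ": OK" : ": cannot be loaded"));
    }
  }
}
```

A simple name such as `NvidiaGPUPluginForRuntimeV2` will report "cannot be loaded" even when the jar is present, which is exactly why the full class name is required.
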
### Restart YARN And Run Job

After restarting YARN, you should see the `nvidia.com/gpu` resource count
displayed on the YARN UI2 Overview and NodeManagers pages, or by issuing the
command:

```
yarn node -list -showDetails
```

Then you can run a job requesting several `nvidia.com/gpu` resources as usual:

```
yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
  -shell_command nvidia-smi \
  -container_resources memory-mb=3072,vcores=1,nvidia.com/gpu=2 \
  -num_containers 2
```

### NM API To Query Resource Allocation

When a job runs with a resource such as `nvidia.com/gpu`, you can query an NM
node's resource allocation through the RESTful API below. Note that the
resource name must be URL-encoded (in this case, `nvidia.com%2Fgpu`).

```
node:port/ws/v1/node/resources/nvidia.com%2Fgpu
```

For instance, use the command below to get the resource allocation in JSON
format:

```
curl localhost:8042/ws/v1/node/resources/nvidia.com%2Fgpu | jq .
```

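If you query the endpoint from code rather than with `curl`, the resource name has to be encoded in the same way. The Java sketch below is illustrative only: `localhost:8042` comes from the example above, and `NodeResourceQuery` and its methods are hypothetical names. `URLEncoder` performs form encoding, which turns `/` into `%2F` and leaves `.` untouched; that suffices for names like `nvidia.com/gpu`, but it would turn a space into `+`, so it is not a general-purpose path encoder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class NodeResourceQuery {
  // Builds the NM REST URL for one resource, e.g. nvidia.com/gpu -> nvidia.com%2Fgpu.
  static String resourceUrl(String host, int port, String resourceName) {
    try {
      String encoded = URLEncoder.encode(resourceName, StandardCharsets.UTF_8.name());
      return "http://" + host + ":" + port + "/ws/v1/node/resources/" + encoded;
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  // Fetches the JSON response body; requires a reachable NodeManager.
  static String fetch(String url) throws IOException {
    StringBuilder body = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
        URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line).append('\n');
      }
    }
    return body.toString();
  }

  public static void main(String[] args) throws IOException {
    String url = resourceUrl("localhost", 8042, "nvidia.com/gpu");
    System.out.println(url);
    // Only contact the NM when explicitly asked, e.g. `--fetch`.
    if (args.length > 0 && args[0].equals("--fetch")) {
      System.out.println(fetch(url));
    }
  }
}
```

Without arguments, `main` only prints `http://localhost:8042/ws/v1/node/resources/nvidia.com%2Fgpu`; pass `--fetch` on a node running the NM to retrieve the JSON.
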
## Develop Your Own Plugin

Configuring an existing plugin is easy. But what about implementing your own?
That's easy too! See [Develop Device Plugin](./DevelopYourOwnDevicePlugin.html).