YARN-8891. Documentation of the pluggable device framework. Contributed by Zhankun Tang.

2019-02-22 20:00:13 +05:30 · 2019-02-22 20:00:13 +05:30 · 9636fe4114
parent 9c88695bcd
commit 9636fe4114
2 changed files with 328 additions and 0 deletions
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
@@ -0,0 +1,177 @@
 <!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
   http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
 -->
 # Develop Your Own Plugin
 A device plugin is loaded into the framework when the NodeManager (NM)
 starts. Your plugin class only needs to consider two interfaces provided
 by the framework: `DevicePlugin`, which must be implemented, and
 `DevicePluginScheduler`, which is optional.
 ## DevicePlugin Interface
 ```
 /**
 * A must interface for vendor plugin to implement.
 * */
 public interface DevicePlugin {
  /**
   * Called first when device plugin framework wants to register.
   * @return DeviceRegisterRequest {@link DeviceRegisterRequest}
   * @throws Exception
   * */
  DeviceRegisterRequest getRegisterRequestInfo()
      throws Exception;
  /**
   * Called when update node resource.
   * @return a set of {@link Device}, {@link java.util.TreeSet} recommended
   * @throws Exception
   * */
  Set<Device> getDevices() throws Exception;
  /**
   * Asking how these devices should be prepared/used
   * before/when container launch. A plugin can do some tasks in its own or
   * define it in DeviceRuntimeSpec to let the framework do it.
   * For instance, define {@code VolumeSpec} to let the
   * framework to create volume before running container.
   *
   * @param allocatedDevices A set of allocated {@link Device}.
   * @param yarnRuntime Indicate which runtime YARN will use
   *        Could be {@code RUNTIME_DEFAULT} or {@code RUNTIME_DOCKER}
   *        in {@link DeviceRuntimeSpec} constants. The default means YARN's
   *        non-docker container runtime is used. The docker means YARN's
   *        docker container runtime is used.
   * @return a {@link DeviceRuntimeSpec} description about environment,
   * {@link VolumeSpec}, {@link MountVolumeSpec}, etc.
   * @throws Exception
   * */
  DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
      YarnRuntimeType yarnRuntime) throws Exception;
  /**
   * Called after device released.
   * @param releasedDevices A set of released devices
   * @throws Exception
   * */
  void onDevicesReleased(Set<Device> releasedDevices)
      throws Exception;
 }
 ```
 The above code shows the `DevicePlugin` interface you need to implement.
 Let’s go through the methods that your plugin should implement.
 * getRegisterRequestInfo(): DeviceRegisterRequest
 * getDevices(): Set&lt;Device&gt;
 * onDevicesAllocated(Set&lt;Device&gt;, YarnRuntimeType yarnRuntime): DeviceRuntimeSpec
 * onDevicesReleased(Set&lt;Device&gt;): void
 The getRegisterRequestInfo method is used by the plugin to advertise a
 new resource type name so that it can be registered with the ResourceManager. The “DeviceRegisterRequest”
 returned by the method consists of a plugin version and a resource type name
 like “nvidia.com/gpu”.
 The getDevices method is used to get the latest vendor device list on this NM
 node.
 The resource count pre-defined in node-resources.xml will be overridden.
 It’s recommended that the vendor plugin manage the allowed devices reported
 to YARN in its own configuration. YARN itself only offers a blacklist
 configuration, `devices.denied-numbers`, in `container-executor.cfg`.
 In this method, you may invoke a shell command or call a remote service over
 REST/RPC to get the devices, whichever is convenient.
 Please note that the `Device` object can describe a fake device. If the major
 device number, minor device number and device path are left unset, the
 framework won't do isolation for it. This makes it possible for a user to
 define a fake device without real hardware.
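As a rough illustration of the device discovery step, the sketch below turns the text output of a vendor tool into a sorted set of device descriptions. Everything in it is an assumption made for the example: the line format `id major:minor path` is hypothetical, and `DeviceInfo` is a stand-in so the code runs without Hadoop on the classpath. A real plugin would build the framework's `Device` objects instead.

```java
import java.util.TreeSet;

/** Sketch only: parses hypothetical vendor tool output into sorted devices. */
final class DeviceListingSketch {

  /** Stand-in for the framework's Device class; NOT the Hadoop type. */
  static final class DeviceInfo implements Comparable<DeviceInfo> {
    final int id;
    final int major;
    final int minor;
    final String path;

    DeviceInfo(int id, int major, int minor, String path) {
      this.id = id;
      this.major = major;
      this.minor = minor;
      this.path = path;
    }

    @Override
    public int compareTo(DeviceInfo other) {
      // Order by id so that the set iterates deterministically.
      return Integer.compare(id, other.id);
    }
  }

  /** Parses lines of the hypothetical form "id major:minor path". */
  static TreeSet<DeviceInfo> parse(String commandOutput) {
    TreeSet<DeviceInfo> devices = new TreeSet<>();
    for (String line : commandOutput.split("\\R")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate blank lines in the tool output
      }
      String[] fields = trimmed.split("\\s+");
      if (fields.length != 3) {
        throw new IllegalArgumentException("Unexpected line: " + line);
      }
      String[] numbers = fields[1].split(":");
      if (numbers.length != 2) {
        throw new IllegalArgumentException("Expected major:minor in: " + line);
      }
      devices.add(new DeviceInfo(Integer.parseInt(fields[0]),
          Integer.parseInt(numbers[0]), Integer.parseInt(numbers[1]),
          fields[2]));
    }
    return devices;
  }
}
```

A sorted set is used here because the interface's Javadoc recommends `java.util.TreeSet` for the returned devices.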
 The onDevicesAllocated method is invoked to tell the framework how to use these devices.
 The NM invokes it to let the plugin do preparation work, such as creating a volume, before the container launches,
 and to get hints on how to expose the devices to the container when launching it. The
 `DeviceRuntimeSpec` is the structure that carries these hints. For instance,
 `DeviceRuntimeSpec` can describe container launch requirements such as
 environment variables, device and volume mounts, and the Docker runtime type.
 The onDevicesReleased method is used by the plugin to do cleanup work
 after the container finishes.
 ## Optional DevicePluginScheduler Interface
 ```
 /**
 * An optional interface to implement if custom device scheduling is needed.
 * If this is not implemented, the device framework will do scheduling.
 * */
 public interface DevicePluginScheduler {
  /**
   * Called when allocating devices. The framework will do all device book
   * keeping and fail recovery. So this hook could be stateless and only do
   * scheduling based on available devices passed in. It could be
   * invoked multiple times by the framework. The hint in environment variables
   * passed in could be potentially used in making better scheduling decision.
   * For instance, GPU scheduling might support different kind of policy. The
   * container can set it through environment variables.
   * @param availableDevices Devices allowed to be chosen from.
   * @param count Number of device to be allocated.
   * @param env Environment variables of the container.
   * @return A set of {@link Device} allocated
   * */
  Set<Device> allocateDevices(Set<Device> availableDevices, int count,
      Map<String, String> env);
 }
 ```
 The above code shows the `DevicePluginScheduler` interface that you might
 need if you want to arm the plugin with a more efficient scheduler.
 The `allocateDevices` method is invoked by YARN each time it asks the
 plugin to recommend devices for one container.
 This interface is optional because YARN provides a very basic scheduler.
 You can refer to the `NvidiaGPUPluginForRuntimeV2` plugin for an example of a plugin-customized
 scheduler. Its scheduler targets Nvidia GPU topology-aware
 scheduling and can give the container a considerable performance boost.
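To make the stateless contract of `allocateDevices` more concrete, here is a minimal sketch of a selection policy that depends only on its arguments. It is not the framework's default scheduler and not the Nvidia plugin's topology-aware one. The method is generic so that it runs without Hadoop on the classpath; a real plugin would implement `DevicePluginScheduler` and operate on `Set<Device>`. The environment variable `EXAMPLE_DEVICE_POLICY` is a hypothetical name invented for this example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: a stateless "take the first N" device selection policy. */
final class FirstFitAllocationSketch {

  // Hypothetical hint name; the framework does not define this variable.
  static final String POLICY_ENV = "EXAMPLE_DEVICE_POLICY";

  static <T extends Comparable<T>> Set<T> allocate(
      Set<T> availableDevices, int count, Map<String, String> env) {
    if (count < 0 || count > availableDevices.size()) {
      throw new IllegalArgumentException("Cannot allocate " + count
          + " of " + availableDevices.size() + " available devices");
    }
    // Sort a copy so the result depends only on the input contents.
    TreeSet<T> sorted = new TreeSet<>(availableDevices);
    // A container may ask, through its environment, for the other end.
    boolean reverse = env != null && "reverse".equals(env.get(POLICY_ENV));
    Iterable<T> order = reverse ? sorted.descendingSet() : sorted;
    Set<T> chosen = new TreeSet<>();
    for (T device : order) {
      if (chosen.size() == count) {
        break;
      }
      chosen.add(device);
    }
    return chosen;
  }
}
```

Because no state is kept between calls, the method can safely be invoked multiple times for the same container, which matches the note in the interface's Javadoc that the framework does the bookkeeping.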
 ## Dependency in Plugin Project
 When developing the plugin, you need to add the dependency below to
 your project's `pom.xml`. For instance,
 ```
 <dependencies>
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-yarn-server-nodemanager</artifactId>
     <version>3.3.0</version>
     <scope>provided</scope>
   </dependency>
 </dependencies>
 ```
 And after this, you can implement the above interfaces based on classes
 provided in `org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin`.
 Please note that the plugin project is coupled with the Hadoop YARN NM version.
 ## Test And Use Your Own Plugin
 Once you have built your project and packaged a jar that contains your plugin
 class, you can give it a try in your Hadoop cluster.
 First, put the jar file in a directory on the Hadoop classpath
 (`$HADOOP_COMMON_HOME/share/hadoop/yarn` is recommended). Second,
 follow the configuration steps described in [Pluggable Device Framework](./PluggableDeviceFramework.html) and restart YARN.
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md
@@ -0,0 +1,151 @@
 <!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
   http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
 -->
 # YARN Pluggable Device Framework
 <!-- MACRO{toc|fromDepth=0|toDepth=2} -->
 ## Introduction
 At present, YARN supports GPU/FPGA devices in a native, tightly coupled way.
 But it's difficult for a vendor to implement such a device plugin
 because the developer needs to understand various integration points with
 YARN and also needs a deeper understanding of YARN internals related to the NodeManager.
 ### Pain Points Of Current Device Plugin
 Some of the pain points for current device plugin development and integration
 are listed below:
 * At least 6 classes need to be implemented (if you want to support
 Docker, you’ll implement one more, “DockerCommandPlugin”).
 * When implementing the “ResourceHandler” interface,
 the developer must understand YARN NM internal concepts like the container
 launch mechanism, cgroups operations, and Docker runtime operations.
 * If one wants isolation, the native container-executor also needs a new module
 written in C.
 This burdens the community with maintaining both YARN
 core and vendor-specific code. For more details, check the YARN-8851 design document.
 For these reasons, and so that YARN and vendor-specific plugins can
 evolve independently, we developed a new pluggable device framework to ease
 vendor device plugin development and provide a more flexible way to integrate with YARN.
 ## Quick Start
 This pluggable device framework not only simplifies plugin development but
 also reduces the number of YARN configurations needed for plugin integration.
 Before we go through how to implement
 your own device plugin, let's first see how to use an existing plugin.
 As an example, the new framework includes a sample implementation of an Nvidia
 GPU plugin that supports detecting Nvidia GPUs, a custom scheduler, and isolating
 containers run with both YARN cgroups and Nvidia Docker runtime v2.
 ### Prerequisites
 1. The pluggable device framework depends on LinuxContainerExecutor (LCE) to handle
 resource isolation and Docker. So LCE and Docker must be enabled on
 YARN.
 See [Using CGroups with YARN](./NodeManagerCgroups.html) and [Docker on YARN](./DockerContainers.html).
 2. The sample plugin `NvidiaGPUPluginForRuntimeV2` requires Nvidia GPU drivers
 and Nvidia Docker runtime v2 to be installed on the nodes. See the official Nvidia
 documentation for this.
 3. If you use the YARN capacity scheduler, the
 `DominantResourceCalculator` configuration below is needed (in `capacity-scheduler.xml`):
 ```
 <property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
 </property>
 ```
 ### Enable Device Plugin Framework
 Two properties are needed to enable the pluggable framework support. The first one is
 in `yarn-site.xml`:
 ```
 <property>
  <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
  <value>true</value>
 </property>
 ```
 Then enable the native isolation module in `container-executor.cfg`:
 ```
 # The configs below deal with settings for resource handled by pluggable device plugin framework
 [devices]
  module.enabled=true
 #  devices.denied-numbers=## Blacklisted devices that containers are not permitted to use. The format is a comma-separated list of "majorNumber:minorNumber". For instance, "195:1,195:2". Leaving it empty means that all devices reported by the device plugin are allowed.
 ```
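The `devices.denied-numbers` value is a comma-separated list of `majorNumber:minorNumber` pairs. The sketch below shows one way a plugin author might validate such a string before writing it to the configuration file. It only illustrates the format described in the comment above; it is not the parsing code used by the native container-executor, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: validates a "major:minor,major:minor" style value. */
final class DeniedNumbersSketch {

  /** Returns one {major, minor} pair per entry; empty input yields no pairs. */
  static List<int[]> parse(String value) {
    List<int[]> pairs = new ArrayList<>();
    if (value == null || value.trim().isEmpty()) {
      return pairs; // an empty value denies nothing
    }
    for (String entry : value.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected majorNumber:minorNumber, got: " + entry);
      }
      // Integer.parseInt rejects non-numeric device numbers.
      pairs.add(new int[] {
          Integer.parseInt(parts[0].trim()),
          Integer.parseInt(parts[1].trim())});
    }
    return pairs;
  }
}
```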
 ### Configure Sample Nvidia GPU Plugin
 The pluggable device framework loads a plugin and queries it to learn
 which resource name the plugin handles. The resource name must be
 pre-defined in `resource-types.xml`. Here we already know from the plugin implementation that the resource name is
 `nvidia.com/gpu`.
 ```
 <property>
  <name>yarn.resource-types</name>
  <value>nvidia.com/gpu</value>
 </property>
 ```
 After defining the resource name handled by the plugin, we can configure the
 plugin name in `yarn-site.xml`:
 ```
 <property>
  <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.com.nvidia.NvidiaGPUPluginForRuntimeV2</value>
 </property>
 ```
 Note that the property value must be the fully qualified class name of the plugin.
 ### Restart YARN And Run Job
 After restarting YARN, you should see the `nvidia.com/gpu` resource count displayed
 on the YARN UI2 Overview and NodeManagers pages or when issuing the command:
 ```
 yarn node -list -showDetails
 ```
 Then you can run a job requesting several `nvidia.com/gpu` resources as usual:
 ```
 yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
       -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
       -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
       -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
       -shell_command nvidia-smi \
       -container_resources memory-mb=3072,vcores=1,nvidia.com/gpu=2 \
       -num_containers 2
 ```
 ### NM API To Query Resource Allocation
 When a job runs with a resource like `nvidia.com/gpu`, you can query an NM node's
 resource allocation through the RESTful API below. Note that the resource name
 must be URL encoded (in this case, "nvidia.com%2Fgpu").
 ```
 node:port/ws/v1/node/resources/nvidia.com%2Fgpu
 ```
 For instance, use the command below to get the resource allocation in JSON format:
 ```
 curl localhost:8042/ws/v1/node/resources/nvidia.com%2Fgpu | jq .
 ```
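If the query is built programmatically, the resource name has to be encoded in the same way. The following sketch builds the request URL with `java.net.URLEncoder`. The `http://` scheme and the class and method names are assumptions for this example. Note also that `URLEncoder` performs form-style encoding (a space becomes `+`), which is sufficient here only because resource names such as `nvidia.com/gpu` contain no spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch only: builds the NM resource query URL for a resource name. */
final class ResourceUrlSketch {

  static String resourceUrl(String nodeAddress, String resourceName) {
    try {
      // "/" in the resource name becomes "%2F"; letters, digits and "." stay.
      String encoded = URLEncoder.encode(resourceName, "UTF-8");
      return "http://" + nodeAddress + "/ws/v1/node/resources/" + encoded;
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required on every Java platform, so this is unexpected.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(resourceUrl("localhost:8042", "nvidia.com/gpu"));
  }
}
```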
 ## Develop Your Own Plugin
 Configuring an existing plugin is easy. But how about implementing your own?
 It's easy too! See [Develop Device Plugin](./DevelopYourOwnDevicePlugin.html).