<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2023-03-27
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 3.4.0-SNAPSHOT &#x2013; Launching Applications Using Docker Containers</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20230327" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2023-03-27
&nbsp;| Version: 3.4.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Launching Applications Using Docker Containers</h1>
<ul>
<li><a href="#Security_Warning">Security Warning</a></li>
<li><a href="#Overview">Overview</a></li>
<li><a href="#Cluster_Configuration">Cluster Configuration</a></li>
<li><a href="#Docker_Image_Requirements">Docker Image Requirements</a></li>
<li><a href="#CGroups_configuration_Requirements">CGroups configuration Requirements</a></li>
<li><a href="#Application_Submission">Application Submission</a></li>
<li><a href="#Using_Docker_Bind_Mounted_Volumes">Using Docker Bind Mounted Volumes</a></li>
<li><a href="#User_Management_in_Docker_Container">User Management in Docker Container</a></li>
<li><a href="#Privileged_Container_Security_Consideration">Privileged Container Security Consideration</a></li>
<li><a href="#Container_Reacquisition_Requirements">Container Reacquisition Requirements</a></li>
<li><a href="#Connecting_to_a_Docker_Trusted_Registry">Connecting to a Docker Trusted Registry</a></li>
<li><a href="#Example:_MapReduce">Example: MapReduce</a></li>
<li><a href="#Example:_Spark">Example: Spark</a></li>
<li><a href="#Docker_Container_ENTRYPOINT_Support">Docker Container ENTRYPOINT Support</a></li>
<li><a href="#Requirements_when_not_using_ENTRYPOINT_.28YARN_mode.29">Requirements when not using ENTRYPOINT (YARN mode)</a></li>
<li><a href="#Docker_Container_YARN_SysFS_Support">Docker Container YARN SysFS Support</a></li>
<li><a href="#Docker_Container_Service_Mode">Docker Container Service Mode</a></li></ul>
<section>
<h2><a name="Security_Warning"></a>Security Warning</h2>
<p><b>IMPORTANT</b> Enabling this feature and running Docker containers in your cluster has security implications. Given Docker&#x2019;s integration with many powerful kernel features, it is imperative that administrators understand <a class="externalLink" href="https://docs.docker.com/engine/security/security/">Docker security</a> before enabling this feature.</p></section><section>
<h2><a name="Overview"></a>Overview</h2>
<p><a class="externalLink" href="https://www.docker.io/">Docker</a> combines an easy-to-use interface to Linux containers with easy-to-construct image files for those containers. In short, Docker enables users to bundle an application together with its preferred execution environment to be executed on a target machine. For more information about Docker, see their <a class="externalLink" href="http://docs.docker.com">documentation</a>.</p>
<p>The Linux Container Executor (LCE) allows the YARN NodeManager to launch YARN containers to run either directly on the host machine or inside Docker containers. The application requesting the resources can specify for each container how it should be executed. The LCE also provides enhanced security and is required when deploying a secure cluster. When the LCE launches a YARN container to execute in a Docker container, the application can specify the Docker image to be used.</p>
<p>Docker containers provide a custom execution environment in which the application&#x2019;s code runs, isolated from the execution environment of the NodeManager and other applications. These containers can include special libraries needed by the application, and they can have different versions of native tools and libraries including Perl, Python, and Java. Docker containers can even run a different flavor of Linux than what is running on the NodeManager.</p>
<p>Docker for YARN provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine).</p></section><section>
<h2><a name="Cluster_Configuration"></a>Cluster Configuration</h2>
<p>The LCE requires that the container-executor binary be owned by root:hadoop and have 6050 permissions. To launch Docker containers, the Docker daemon must be running on every NodeManager host where Docker containers will be launched. The Docker client must also be installed on those hosts and must be able to start Docker containers.</p>
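<p>A minimal sketch of setting this ownership and mode, assuming the binary is installed at <code>$HADOOP_HOME/bin/container-executor</code> (this path is an assumption; adjust it to match your installation) and that the <code>hadoop</code> group already exists on the host:</p>
<div class="source">
<div class="source">
<pre># The path below is an assumption; substitute the actual location of container-executor.
sudo chown root:hadoop "$HADOOP_HOME/bin/container-executor"
# 6050 sets the setuid and setgid bits and grants read and execute to the group only.
sudo chmod 6050 "$HADOOP_HOME/bin/container-executor"
# Show the resulting owner, group, and mode for a visual check.
ls -l "$HADOOP_HOME/bin/container-executor"
</pre></div></div>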
<p>To prevent timeouts while starting jobs, any large Docker images to be used by an application should already be loaded in the Docker daemon&#x2019;s cache on the NodeManager hosts. A simple way to load an image is by issuing a Docker pull request. For example:</p>
<div class="source">
<div class="source">
<pre> sudo docker pull library/openjdk:8
</pre></div></div>
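<p>One way to pre-load the same image on several hosts is a simple loop over SSH. This is only a sketch: <code>nodemanagers.txt</code> is a hypothetical file listing one NodeManager host per line, and the loop assumes that the invoking user has SSH access and passwordless sudo on each host.</p>
<div class="source">
<div class="source">
<pre># nodemanagers.txt is a hypothetical file with one NodeManager host name per line.
while read -r host; do
  # -n stops ssh from consuming the host list on standard input.
  ssh -n "$host" sudo docker pull library/openjdk:8
done &lt; nodemanagers.txt
</pre></div></div>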
<p>The following properties should be set in yarn-site.xml:</p>
<div class="source">
<div class="source">
<pre>&lt;configuration&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.container-executor.class&lt;/name&gt;
&lt;value&gt;org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor&lt;/value&gt;
&lt;description&gt;
This is the container executor setting that ensures that all applications
are started with the LinuxContainerExecutor.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.linux-container-executor.group&lt;/name&gt;
&lt;value&gt;hadoop&lt;/value&gt;
&lt;description&gt;
The POSIX group of the NodeManager. It should match the setting in
&quot;container-executor.cfg&quot;. This configuration is required for validating
the secure access of the container-executor binary.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users&lt;/name&gt;
&lt;value&gt;false&lt;/value&gt;
&lt;description&gt;
Whether all applications should be run as the NodeManager process' owner.
When false, applications are launched instead as the application owner.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.allowed-runtimes&lt;/name&gt;
&lt;value&gt;default,docker&lt;/value&gt;
&lt;description&gt;
Comma separated list of runtimes that are allowed when using
LinuxContainerExecutor. The allowed values are default, docker, and
javasandbox.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.type&lt;/name&gt;
&lt;value&gt;&lt;/value&gt;
&lt;description&gt;
Optional. Sets the default container runtime to use.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.image-name&lt;/name&gt;
&lt;value&gt;&lt;/value&gt;
&lt;description&gt;
Optional. Default docker image to be used when the docker runtime is
selected.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.image-update&lt;/name&gt;
&lt;value&gt;false&lt;/value&gt;
&lt;description&gt;
Optional. Default option to decide whether to pull the latest image
or not.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.allowed-container-networks&lt;/name&gt;
&lt;value&gt;host,none,bridge&lt;/value&gt;
&lt;description&gt;
Optional. A comma-separated set of networks allowed when launching
containers. Valid values are determined by Docker networks available from
`docker network ls`
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.default-container-network&lt;/name&gt;
&lt;value&gt;host&lt;/value&gt;
&lt;description&gt;
The network used when launching Docker containers when no
network is specified in the request. This network must be one of the
(configurable) set of allowed container networks.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.host-pid-namespace.allowed&lt;/name&gt;
&lt;value&gt;false&lt;/value&gt;
&lt;description&gt;
Optional. Whether containers are allowed to use the host PID namespace.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed&lt;/name&gt;
&lt;value&gt;false&lt;/value&gt;
&lt;description&gt;
Optional. Whether applications are allowed to run in privileged
containers. Privileged containers are granted the complete set of
capabilities and are not subject to the limitations imposed by the device
cgroup controller. In other words, privileged containers can do almost
everything that the host can do. Use with extreme care.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed&lt;/name&gt;
&lt;value&gt;false&lt;/value&gt;
&lt;description&gt;
Optional. Whether or not users are allowed to request that Docker
containers honor the debug deletion delay. This is useful for
troubleshooting Docker container related launch failures.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.stop.grace-period&lt;/name&gt;
&lt;value&gt;10&lt;/value&gt;
&lt;description&gt;
Optional. A configurable value to pass to the Docker Stop command. This
value defines the number of seconds between the docker stop command sending
a SIGTERM and a SIGKILL.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.privileged-containers.acl&lt;/name&gt;
&lt;value&gt;&lt;/value&gt;
&lt;description&gt;
Optional. A comma-separated list of users who are allowed to request
privileged containers if privileged containers are allowed.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.capabilities&lt;/name&gt;
&lt;value&gt;CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE&lt;/value&gt;
&lt;description&gt;
Optional. This configuration setting determines the capabilities
assigned to docker containers when they are launched. While these may not
be case-sensitive from a docker perspective, it is best to keep these
uppercase. To run without any capabilities, set this value to
&quot;none&quot; or &quot;NONE&quot;
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.enable-userremapping.allowed&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;description&gt;
Optional. Whether docker containers are run with the UID and GID of the
calling user.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.userremapping-uid-threshold&lt;/name&gt;
&lt;value&gt;1&lt;/value&gt;
&lt;description&gt;
Optional. The minimum acceptable UID for a remapped user. Users with UIDs
lower than this value will not be allowed to launch containers when user
remapping is enabled.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.runtime.linux.docker.userremapping-gid-threshold&lt;/name&gt;
&lt;value&gt;1&lt;/value&gt;
&lt;description&gt;
Optional. The minimum acceptable GID for a remapped user. Users belonging
to any group with a GID lower than this value will not be allowed to
launch containers when user remapping is enabled.
&lt;/description&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div>
<p>In addition, a container-executor.cfg file must exist and contain settings for the container executor. The file must be owned by root with permissions 0400. The format of the file is the standard Java properties file format, for example:</p>
<div class="source">
<div class="source">
<pre>key=value
</pre></div></div>
<p>The following properties are required to enable Docker support:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Configuration Name </th>
<th align="left"> Description </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>yarn.nodemanager.linux-container-executor.group</code> </td>
<td align="left"> The Unix group of the NodeManager. It should match the value of <code>yarn.nodemanager.linux-container-executor.group</code> in the yarn-site.xml file. </td></tr>
</tbody>
</table>
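<p>As a minimal illustration, assuming the <code>hadoop</code> group used in the yarn-site.xml example above, the corresponding line in container-executor.cfg might look like this:</p>
<div class="source">
<div class="source">
<pre>yarn.nodemanager.linux-container-executor.group=hadoop
</pre></div></div>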
<p>The container-executor.cfg must contain a section to determine the capabilities that containers are allowed. It contains the following properties:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Configuration Name </th>
<th align="left"> Description </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>module.enabled</code> </td>
<td align="left"> Set to &#x201c;true&#x201d; or &#x201c;false&#x201d; to enable or disable launching Docker containers, respectively. The default value is 0 (disabled). </td></tr>
<tr class="a">
<td align="left"> <code>docker.binary</code> </td>
<td align="left"> The binary used to launch Docker containers. /usr/bin/docker by default. </td></tr>
<tr class="b">
<td align="left"> <code>docker.allowed.capabilities</code> </td>
<td align="left"> Comma separated capabilities that containers are allowed to add. By default no capabilities are allowed to be added. </td></tr>
<tr class="a">
<td align="left"> <code>docker.allowed.devices</code> </td>
<td align="left"> Comma separated devices that containers are allowed to mount. By default no devices are allowed to be added. </td></tr>
<tr class="b">
<td align="left"> <code>docker.allowed.networks</code> </td>
<td align="left"> Comma separated networks that containers are allowed to use. If no network is specified when launching the container, the default Docker network will be used. </td></tr>
<tr class="a">
<td align="left"> <code>docker.allowed.ro-mounts</code> </td>
<td align="left"> Comma separated directories that containers are allowed to mount in read-only mode. By default, no directories are allowed to be mounted. </td></tr>
<tr class="b">
<td align="left"> <code>docker.allowed.rw-mounts</code> </td>
<td align="left"> Comma separated directories that containers are allowed to mount in read-write mode. By default, no directories are allowed to be mounted. </td></tr>
<tr class="a">
<td align="left"> <code>docker.allowed.volume-drivers</code> </td>
<td align="left"> Comma separated list of volume drivers which are allowed to be used. By default, no volume drivers are allowed. </td></tr>
<tr class="b">
<td align="left"> <code>docker.host-pid-namespace.enabled</code> </td>
<td align="left"> Set to &#x201c;true&#x201d; or &#x201c;false&#x201d; to enable or disable using the host&#x2019;s PID namespace. Default value is &#x201c;false&#x201d;. </td></tr>
<tr class="a">
<td align="left"> <code>docker.privileged-containers.enabled</code> </td>
<td align="left"> Set to &#x201c;true&#x201d; or &#x201c;false&#x201d; to enable or disable launching privileged containers. Default value is &#x201c;false&#x201d;. </td></tr>
<tr class="b">
<td align="left"> <code>docker.privileged-containers.registries</code> </td>
<td align="left"> Comma separated list of privileged docker registries for running privileged docker containers. By default, no registries are defined. </td></tr>
<tr class="a">
<td align="left"> <code>docker.trusted.registries</code> </td>
<td align="left"> Comma separated list of trusted docker registries for running trusted privileged docker containers. By default, no registries are defined. </td></tr>
<tr class="b">
<td align="left"> <code>docker.inspect.max.retries</code> </td>
<td align="left"> Maximum number of times to inspect a Docker container for readiness. Inspections are 3 seconds apart, so the default value of 10 waits up to 30 seconds for the Docker container to become ready before the container is marked as failed. </td></tr>
<tr class="a">
<td align="left"> <code>docker.no-new-privileges.enabled</code> </td>
<td align="left"> Enable/disable the no-new-privileges flag for docker run. Set to &#x201c;true&#x201d; to enable, disabled by default. </td></tr>
<tr class="b">
<td align="left"> <code>docker.allowed.runtimes</code> </td>
<td align="left"> Comma separated runtimes that containers are allowed to use. By default no runtimes are allowed to be added.</td></tr>
<tr class="a">
<td align="left"> <code>docker.service-mode.enabled</code> </td>
<td align="left"> Set to &#x201c;true&#x201d; or &#x201c;false&#x201d; to enable or disable docker container service mode. Default value is &#x201c;false&#x201d;. </td></tr>
</tbody>
</table>
<p>Please note that if you wish to run Docker containers that require access to the YARN local directories, you must add them to the docker.allowed.rw-mounts list.</p>
<p>In addition, containers are not permitted to mount any parent of the container-executor.cfg directory in read-write mode.</p>
<p>The following properties are optional:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Configuration Name </th>
<th align="left"> Description </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>min.user.id</code> </td>
<td align="left"> The minimum UID that is allowed to launch applications. The default is no minimum. </td></tr>
<tr class="a">
<td align="left"> <code>banned.users</code> </td>
<td align="left"> A comma-separated list of usernames who should not be allowed to launch applications. The default setting is: yarn, mapred, hdfs, and bin. </td></tr>
<tr class="b">
<td align="left"> <code>allowed.system.users</code> </td>
<td align="left"> A comma-separated list of usernames who should be allowed to launch applications even if their UIDs are below the configured minimum. If a user appears in allowed.system.users and banned.users, the user will be considered banned. </td></tr>
<tr class="a">
<td align="left"> <code>feature.tc.enabled</code> </td>
<td align="left"> Must be &#x201c;true&#x201d; or &#x201c;false&#x201d;. &#x201c;false&#x201d; means traffic control commands are disabled. &#x201c;true&#x201d; means traffic control commands are allowed. </td></tr>
<tr class="b">
<td align="left"> <code>feature.yarn.sysfs.enabled</code> </td>
<td align="left"> Must be &#x201c;true&#x201d; or &#x201c;false&#x201d;. See YARN sysfs support for detail. The default setting is disabled. </td></tr>
</tbody>
</table>
<p>Part of a container-executor.cfg which allows Docker containers to be launched is below:</p>
<div class="source">
<div class="source">
<pre>yarn.nodemanager.linux-container-executor.group=yarn
[docker]
module.enabled=true
docker.privileged-containers.enabled=true
docker.privileged-containers.registries=local
docker.trusted.registries=centos
docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
docker.allowed.networks=bridge,host,none
docker.allowed.ro-mounts=/sys/fs/cgroup
docker.allowed.rw-mounts=/var/hadoop/yarn/local-dir,/var/hadoop/yarn/log-dir
</pre></div></div>
</section><section>
<h2><a name="Docker_Image_Requirements"></a>Docker Image Requirements</h2>
<p>In order to work with YARN, there are two requirements for Docker images.</p>
<p>First, the Docker container will be explicitly launched with the application owner as the container user. If the application owner is not a valid user in the Docker image, the application will fail. The container user is specified by the user&#x2019;s UID. If the user&#x2019;s UID is different between the NodeManager host and the Docker image, the container may be launched as the wrong user or may fail to launch because the UID does not exist. See <a href="#user-management">User Management in Docker Container</a> section for more details.</p>
<p>Second, the Docker image must have whatever is expected by the application in order to execute. In the case of Hadoop (MapReduce or Spark), the Docker image must contain the JRE and Hadoop libraries and have the necessary environment variables set: JAVA_HOME, HADOOP_COMMON_PATH, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR. Note that the Java and Hadoop component versions available in the Docker image must be compatible with what&#x2019;s installed on the cluster and in any other Docker images being used for other tasks of the same job. Otherwise the Hadoop components started in the Docker container may be unable to communicate with external Hadoop components.</p>
<p>If a Docker image has a <a class="externalLink" href="https://docs.docker.com/engine/reference/builder/#cmd">command</a> set, the behavior depends on whether <code>YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE</code> is set to true. If it is unset or set to false, the command is overridden when LCE launches the image with YARN&#x2019;s container launch script.</p>
<p>If a Docker image has an entry point set and <code>YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE</code> is set to true, launch_command will be passed to the ENTRYPOINT program as CMD parameters in Docker. The format of launch_command is param1,param2, which translates to CMD [ &#x201c;param1&#x201d;,&#x201c;param2&#x201d; ] in Docker.</p>
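<p>The mapping from a comma-separated launch_command to CMD parameters can be pictured with a small shell sketch. The function name <code>launch_command_to_cmd</code> is hypothetical and the code only illustrates the string translation; it is not how YARN builds the Docker command.</p>

```shell
# Illustration only: turn "param1,param2" into the exec-form CMD notation.
# launch_command_to_cmd is a hypothetical helper, not part of YARN.
launch_command_to_cmd() {
  local quoted
  # Replace every comma with "," so each parameter ends up individually quoted.
  quoted=$(printf '%s' "$1" | sed 's/,/","/g')
  printf 'CMD [ "%s" ]\n' "$quoted"
}

launch_command_to_cmd "param1,param2"
```

<p>Parameters that themselves contain commas or quotes are not handled by this sketch.</p>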
<p>If an application requests a Docker image that has not already been loaded by the Docker daemon on the host where it is to execute, the Docker daemon will implicitly perform a Docker pull command. Both MapReduce and Spark assume that tasks which take more than 10 minutes to report progress have stalled, so specifying a large Docker image may cause the application to fail.</p></section><section>
<h2><a name="CGroups_configuration_Requirements"></a>CGroups configuration Requirements</h2>
<p>The Docker plugin utilizes cgroups to limit resource usage of individual containers. Since launched containers belong to YARN, the command line option <code>--cgroup-parent</code> is used to define the appropriate control group.</p>
<p>Docker supports two different cgroups drivers: <code>cgroupfs</code> and <code>systemd</code>. Note that only <code>cgroupfs</code> is supported. Attempting to launch a Docker container with <code>systemd</code> results in an error message similar to the following:</p>
<div class="source">
<div class="source">
<pre>Container id: container_1561638268473_0006_01_000002
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as &quot;xxx.slice&quot;.
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
</pre></div></div>
<p>This means you have to reconfigure the Docker daemon on each host where the <code>systemd</code> driver is used.</p>
<p>Depending on what OS Hadoop is running on, reconfiguration might require different steps. However, if <code>systemd</code> was chosen as the cgroups driver, it is likely that the <code>systemctl</code> command is available on the system.</p>
<p>Check the <code>ExecStart</code> property of the Docker daemon:</p>
<div class="source">
<div class="source">
<pre>~$ systemctl show --no-pager --property=ExecStart docker.service
ExecStart={ path=/usr/bin/dockerd-current ; argv[]=/usr/bin/dockerd-current --add-runtime
docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd
--userland-proxy-path=/usr/libexec/docker/docker-proxy-current
--init-path=/usr/libexec/docker/docker-init-current
--seccomp-profile=/etc/docker/seccomp.json
$OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY $REGISTRIES ;
ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
</pre></div></div>
<p>This example shows that the <code>native.cgroupdriver</code> is <code>systemd</code>. You have to modify that in the unit file of the daemon.</p>
<div class="source">
<div class="source">
<pre>~$ sudo systemctl edit --full docker.service
</pre></div></div>
<p>This brings up the whole configuration for editing. Replace the <code>systemd</code> string with <code>cgroupfs</code>. Save the changes, then reload the systemd configuration and restart the Docker daemon:</p>
<div class="source">
<div class="source">
<pre>~$ sudo systemctl daemon-reload
~$ sudo systemctl restart docker.service
</pre></div></div>
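<p>To confirm the change, the driver can be read back from the <code>ExecStart</code> property. The sketch below is one possible way to do that; the helper name <code>cgroup_driver_from_execstart</code> is hypothetical, and the sample input is a shortened version of the output shown earlier.</p>

```shell
# Print the value of native.cgroupdriver found in the text read from stdin.
# Prints nothing if the option is not present.
# cgroup_driver_from_execstart is a hypothetical helper name.
cgroup_driver_from_execstart() {
  grep -o 'native\.cgroupdriver=[a-z]*' | head -n 1 | cut -d= -f2
}

# Shortened sample line; on a real host, pipe in the output of:
#   systemctl show --no-pager --property=ExecStart docker.service
echo 'argv[]=/usr/bin/dockerd-current --exec-opt native.cgroupdriver=cgroupfs' | cgroup_driver_from_execstart
```

<p>After a successful reconfiguration the helper should report <code>cgroupfs</code> for the real unit. The <code>docker info</code> command also reports the active driver on its Cgroup Driver line.</p>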
</section><section>
<h2><a name="Application_Submission"></a>Application Submission</h2>
<p>Before attempting to launch a Docker container, make sure that the LCE configuration is working for applications requesting regular YARN containers. If after enabling the LCE one or more NodeManagers fail to start, the cause is most likely that the ownership and/or permissions on the container-executor binary are incorrect. Check the logs to confirm.</p>
<p>In order to run an application in a Docker container, set the following environment variables in the application&#x2019;s environment:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> Environment Variable Name </th>
<th align="left"> Description </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_TYPE</code> </td>
<td align="left"> Determines whether an application will be launched in a Docker container. If the value is &#x201c;docker&#x201d;, the application will be launched in a Docker container. Otherwise a regular process tree container will be used. </td></tr>
<tr class="a">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_IMAGE</code> </td>
<td align="left"> Names which image will be used to launch the Docker container. Any image name that could be passed to the Docker client&#x2019;s run command may be used. The image name may include a repo prefix. </td></tr>
<tr class="b">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE</code> </td>
<td align="left"> Controls whether the Docker container&#x2019;s default command is overridden. When unset or set to false, the Docker container&#x2019;s command will be &#x201c;bash <i>path_to_launch_script</i>&#x201d;. When set to true, the override is disabled and the Docker image&#x2019;s default command is used. </td></tr>
<tr class="a">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK</code> </td>
<td align="left"> Sets the network type to be used by the Docker container. It must be a valid value as determined by the yarn.nodemanager.runtime.linux.docker.allowed-container-networks property. </td></tr>
<tr class="b">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_PORTS_MAPPING</code> </td>
<td align="left"> Allows a user to specify port mappings for a Docker container on the bridge network. The value of the environment variable should be a comma-separated list of port mappings, in the same format as the &#x201c;-p&#x201d; option of the Docker run command. If the value is empty, &#x201c;-P&#x201d; will be added. </td></tr>
<tr class="a">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_PID_NAMESPACE</code> </td>
<td align="left"> Controls which PID namespace will be used by the Docker container. By default, each Docker container has its own PID namespace. To share the namespace of the host, the yarn.nodemanager.runtime.linux.docker.host-pid-namespace.allowed property must be set to true. If the host PID namespace is allowed and this environment variable is set to host, the Docker container will share the host&#x2019;s PID namespace. No other value is allowed. </td></tr>
<tr class="b">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER</code> </td>
<td align="left"> Controls whether the Docker container is a privileged container. In order to use privileged containers, the yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed property must be set to true, and the application owner must appear in the value of the yarn.nodemanager.runtime.linux.docker.privileged-containers.acl property. If this environment variable is set to true, a privileged Docker container will be used if allowed. No other value is allowed, so the environment variable should be left unset rather than setting it to false. </td></tr>
<tr class="a">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS</code> </td>
<td align="left"> Adds additional volume mounts to the Docker container. The value of the environment variable should be a comma-separated list of mounts. All such mounts must be given as <code>source:dest[:mode]</code> and the mode must be &#x201c;ro&#x201d; (read-only) or &#x201c;rw&#x201d; (read-write) to specify the type of access being requested. If neither is specified, read-write will be assumed. The mode may include a bind propagation option. In that case, the mode should either be of the form <code>[option]</code>, <code>rw+[option]</code>, or <code>ro+[option]</code>. Valid bind propagation options are shared, rshared, slave, rslave, private, and rprivate. The requested mounts will be validated by container-executor based on the values set in container-executor.cfg for <code>docker.allowed.ro-mounts</code> and <code>docker.allowed.rw-mounts</code>. </td></tr>
<tr class="b">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_TMPFS_MOUNTS</code> </td>
<td align="left"> Adds additional tmpfs mounts to the Docker container. The value of the environment variable should be a comma-separated list of absolute mount points within the container. </td></tr>
<tr class="a">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL</code> </td>
<td align="left"> Allows a user to request delayed deletion of the Docker container on a per container basis. If true, Docker containers will not be removed until the duration defined by yarn.nodemanager.delete.debug-delay-sec has elapsed. Administrators can disable this feature through the yarn-site property yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed. This feature is disabled by default. When this feature is disabled or set to false, the container will be removed as soon as it exits. </td></tr>
<tr class="b">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_YARN_SYSFS_ENABLE</code> </td>
<td align="left"> Enables mounting the sysfs sub-directory of the container working directory into the Docker container at /hadoop/yarn/sysfs. This is useful for populating cluster information into the container. </td></tr>
<tr class="a">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE</code> </td>
<td align="left"> Enable Service Mode which runs the docker container as defined by the image but does not set the user (<code>--user</code> and <code>--group-add</code>). </td></tr>
<tr class="b">
<td align="left"> <code>YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG</code> </td>
<td align="left"> Sets the Docker client config that the Docker container executor uses to access the remote Docker image. </td></tr>
</tbody>
</table>
<p>The first two are required. The remainder can be set as needed. While controlling the container type through environment variables is somewhat less than ideal, it allows applications with no awareness of YARN&#x2019;s Docker support (such as MapReduce and Spark) to nonetheless take advantage of it through their support for configuring the application environment.</p>
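<p>As a rough sketch, the two required variables can be assembled into a single comma-separated string and handed to the application through its environment settings. The helper name <code>docker_runtime_env</code>, the image name, and the <code>MAPRED_EXAMPLE_JAR</code> variable are assumptions made for illustration.</p>

```shell
# Build the comma-separated environment string for the two required variables.
# docker_runtime_env is a hypothetical helper; the image name is a placeholder.
docker_runtime_env() {
  local image="$1"
  printf 'YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=%s\n' "$image"
}

vars=$(docker_runtime_env "local/ubuntu:latest")
echo "$vars"

# A MapReduce job could then pass the string through its environment
# properties, for example (assumes MAPRED_EXAMPLE_JAR points at the
# MapReduce examples jar):
#   yarn jar "$MAPRED_EXAMPLE_JAR" pi \
#     -Dyarn.app.mapreduce.am.env="$vars" \
#     -Dmapreduce.map.env="$vars" \
#     -Dmapreduce.reduce.env="$vars" 10 100
```

<p>Optional variables from the table can be appended to the same string, separated by commas.</p>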
<p>Once an application has been submitted to be launched in a Docker container, the application will behave exactly as any other YARN application. Logs will be aggregated and stored in the relevant history server. The application life cycle will be the same as for a non-Docker application.</p></section><section>
<h2><a name="Using_Docker_Bind_Mounted_Volumes"></a>Using Docker Bind Mounted Volumes</h2>
<p><b>WARNING</b> Care should be taken when enabling this feature. Enabling access to directories such as, but not limited to, /, /etc, /run, or /home is not advisable and can result in containers negatively impacting the host or leaking sensitive information. <b>WARNING</b></p>
<p>Files and directories from the host are commonly needed within the Docker containers, which Docker provides through <a class="externalLink" href="https://docs.docker.com/engine/tutorials/dockervolumes/">volumes</a>. Examples include localized resources, Apache Hadoop binaries, and sockets. To facilitate this need, YARN-6623 added the ability for administrators to set a whitelist of host directories that are allowed to be bind mounted as volumes into containers. YARN-5534 added the ability for users to supply a list of mounts that will be mounted into the containers, if allowed by the administrative whitelist.</p>
<p>In order to make use of this feature, the following must be configured.</p>
<ul>
<li>The administrator must define the volume whitelist in container-executor.cfg by setting <code>docker.allowed.ro-mounts</code> and <code>docker.allowed.rw-mounts</code> to the list of parent directories that are allowed to be mounted.</li>
<li>The application submitter requests the required volumes at application submission time using the <code>YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS</code> environment variable.</li>
</ul>
<p>The administrator supplied whitelist is defined as a comma separated list of directories that are allowed to be mounted into containers. The source directory supplied by the user must either match or be a child of the specified directory.</p>
<p>The user supplied mount list is defined as a comma separated list in the form <i>source</i>:<i>destination</i> or <i>source</i>:<i>destination</i>:<i>mode</i>. The source is the file or directory on the host. The destination is the path within the container where the source will be bind mounted. The mode defines the mode the user expects for the mount, which can be ro (read-only) or rw (read-write). If not specified, rw is assumed. The mode may also include a bind propagation option (shared, rshared, slave, rslave, private, or rprivate). In that case, the mode should be of the form <i>option</i>, rw+<i>option</i>, or ro+<i>option</i>.</p>
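<p>The whitelist rule described above (the source must match or be a child of a whitelisted directory) can be sketched in shell. This is a simplified illustration under the assumption that paths are compared as plain strings; it is not the validation code used by container-executor, and <code>mount_source_allowed</code> is a hypothetical name.</p>

```shell
# Return 0 if the source path equals, or lies below, one of the
# comma-separated whitelist directories; return 1 otherwise.
# Simplified sketch: no symlink or ".." resolution is attempted.
mount_source_allowed() {
  local src="$1" whitelist="$2" dir
  local IFS=','
  for dir in $whitelist; do
    dir="${dir%/}"            # drop a trailing slash, if any
    case "$src" in
      "$dir"|"$dir"/*) return 0 ;;
    esac
  done
  return 1
}

spec="/sys/fs/cgroup:/cgroup:ro"
src="${spec%%:*}"             # the text before the first colon is the source
if mount_source_allowed "$src" "/sys/fs/cgroup"; then
  echo "allowed: $src"
else
  echo "rejected: $src"
fi
```

<p>Note that the check applies to the source only; the destination path inside the container is not compared against the whitelist.</p>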
<p>The following example outlines how to use this feature to mount the commonly needed /sys/fs/cgroup directory into the container running on YARN.</p>
<p>The administrator sets docker.allowed.ro-mounts in container-executor.cfg to &#x201c;/sys/fs/cgroup&#x201d;. Applications can now request that &#x201c;/sys/fs/cgroup&#x201d; be mounted from the host into the container in read-only mode.</p>
<p>At application submission time, the YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS environment variable can then be set to request this mount. In this example, the environment variable would be set to &#x201c;/sys/fs/cgroup:/sys/fs/cgroup:ro&#x201d;. The destination path is not restricted, &#x201c;/sys/fs/cgroup:/cgroup:ro&#x201d; would also be valid given the example admin whitelist.</p></section><section>
<h2><a name="User_Management_in_Docker_Container"></a><a name="user-management"></a>User Management in Docker Container</h2>
<p>YARN&#x2019;s Docker container support launches container processes using the uid:gid identity of the user, as defined on the NodeManager host. User and group name mismatches between the NodeManager host and container can lead to permission issues, failed container launches, or even security holes. Centralizing user and group management for both hosts and containers greatly reduces these risks. When running containerized applications on YARN, it is necessary to understand which uid:gid pair will be used to launch the container&#x2019;s process.</p>
<p>As an example of what is meant by uid:gid pair, consider the following. By default, in non-secure mode, YARN will launch processes as the user <code>nobody</code> (see the table at the bottom of <a href="./NodeManagerCgroups.html">Using CGroups with YARN</a> for how the run as user is determined in non-secure mode). On CentOS based systems, the <code>nobody</code> user&#x2019;s uid is <code>99</code> and the <code>nobody</code> group is <code>99</code>. As a result, YARN will call <code>docker run</code> with <code>--user 99:99</code>. If the <code>nobody</code> user does not have the uid <code>99</code> in the container, the launch may fail or have unexpected results.</p>
<p>One exception to this rule is the use of Privileged Docker containers. Privileged containers will not set the uid:gid pair when launching the container and will honor the USER or GROUP entries in the Dockerfile. This allows running privileged containers as any user, which has security implications. Please understand these implications before enabling Privileged Docker containers.</p>
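<p>For non-privileged containers, the uid:gid pair for a given user can be previewed on the NodeManager host with the standard <code>id</code> command. The helper name <code>docker_user_flag</code> below is hypothetical, and the output format simply mirrors the <code>--user</code> option mentioned above.</p>

```shell
# Print the --user argument that corresponds to a user's uid and primary gid
# as defined on this host. docker_user_flag is a hypothetical helper name.
docker_user_flag() {
  local user="$1"
  printf '%s %s:%s\n' "--user" "$(id -u "$user")" "$(id -g "$user")"
}

# root is used here only because it exists on practically every host;
# in non-secure mode the interesting user would typically be nobody.
docker_user_flag root
```

<p>Running <code>id</code> for the same user name inside the Docker image and comparing the numbers is a quick way to spot a mismatch before submitting an application.</p>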
<p>There are many ways to address user and group management. Docker, by default, will authenticate users against <code>/etc/passwd</code> (and <code>/etc/shadow</code>) within the container. The default <code>/etc/passwd</code> supplied in the Docker image is unlikely to contain the appropriate user entries, which will result in launch failures. It is highly recommended to centralize user and group management. Several approaches to user and group management are outlined below.</p><section>
<h3><a name="Static_user_management"></a>Static user management</h3>
<p>The most basic approach to managing users and groups is to modify the user and group within the Docker image. This approach is only viable in non-secure mode where all container processes will be launched as a single known user, for instance <code>nobody</code>. In this case, the only requirement is that the uid:gid pair of the nobody user and group must match between the host and container. On a CentOS based system, this means that the nobody user in the container needs the UID <code>99</code> and the nobody group in the container needs GID <code>99</code>.</p>
<p>One approach to change the UID and GID is by leveraging <code>usermod</code> and <code>groupmod</code>. The following sets the correct UID and GID for the nobody user/group.</p>
<div class="source">
<div class="source">
<pre>usermod -u 99 nobody
groupmod -g 99 nobody
</pre></div></div>
<p>This approach is not recommended beyond testing given the inflexibility to add users.</p></section><section>
<h3><a name="Bind_mounting"></a>Bind mounting</h3>
<p>When organizations already have automation in place to create local users on each system, it may be appropriate to bind mount /etc/passwd and /etc/group into the container as an alternative to modifying the container image directly. To enable the ability to bind mount /etc/passwd and /etc/group, update <code>docker.allowed.ro-mounts</code> in <code>container-executor.cfg</code> to include those paths. When submitting the application, <code>YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS</code> will need to include <code>/etc/passwd:/etc/passwd:ro</code> and <code>/etc/group:/etc/group:ro</code>.</p>
<p>There are several challenges with this bind mount approach that need to be considered.</p>
<ol style="list-style-type: decimal">
<li>Any users and groups defined in the image will be overwritten by the host&#x2019;s users and groups</li>
<li>No users and groups can be added once the container is started, as /etc/passwd and /etc/group are immutable in the container. Do not mount these read-write, as doing so can render the host inoperable.</li>
</ol>
<p>This approach is not recommended beyond testing given the inflexibility to modify running containers.</p></section><section>
<h3><a name="SSSD"></a>SSSD</h3>
<p>An alternative approach that allows for centrally managing users and groups is SSSD. System Security Services Daemon (SSSD) provides access to different identity and authentication providers, such as LDAP or Active Directory.</p>
<p>The traditional schema for Linux authentication is as follows:</p>
<div class="source">
<div class="source">
<pre>application -&gt; libpam -&gt; pam_authenticate -&gt; pam_unix.so -&gt; /etc/passwd
</pre></div></div>
<p>If we use SSSD for user lookup, it becomes:</p>
<div class="source">
<div class="source">
<pre>application -&gt; libpam -&gt; pam_authenticate -&gt; pam_sss.so -&gt; SSSD -&gt; pam_unix.so -&gt; /etc/passwd
</pre></div></div>
<p>We can bind-mount the UNIX sockets SSSD communicates over into the container. This will allow the SSSD client side libraries to authenticate against the SSSD running on the host. As a result, user information does not need to exist in /etc/passwd of the docker image and will instead be serviced by SSSD.</p>
<p>Step by step configuration for host and container:</p>
<ol style="list-style-type: decimal">
<li>Host config</li>
</ol>
<ul>
<li>Install packages
<div class="source">
<div class="source">
<pre># yum -y install sssd-common sssd-proxy
</pre></div></div>
</li>
<li>create a PAM service for the container.
<div class="source">
<div class="source">
<pre># cat /etc/pam.d/sss_proxy
auth required pam_unix.so
account required pam_unix.so
password required pam_unix.so
session required pam_unix.so
</pre></div></div>
</li>
<li>create the SSSD config file, /etc/sssd/sssd.conf. Please note that the permissions must be 0600 and the file must be owned by root:root.
<div class="source">
<div class="source">
<pre># cat /etc/sssd/sssd.conf
[sssd]
services = nss,pam
config_file_version = 2
domains = proxy
[nss]
[pam]
[domain/proxy]
id_provider = proxy
proxy_lib_name = files
proxy_pam_target = sss_proxy
</pre></div></div>
</li>
<li>start sssd
<div class="source">
<div class="source">
<pre># systemctl start sssd
</pre></div></div>
</li>
<li>verify a user can be retrieved with sssd
<div class="source">
<div class="source">
<pre># getent passwd -s sss localuser
</pre></div></div>
</li>
</ul>
<ol style="list-style-type: decimal" start="2">
<li>Container setup</li>
</ol>
<p>It&#x2019;s important to bind-mount the /var/lib/sss/pipes directory from the host to the container since SSSD UNIX sockets are located there.</p>
<div class="source">
<div class="source">
<pre>-v /var/lib/sss/pipes:/var/lib/sss/pipes:rw
</pre></div></div>
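<p>When the container is launched through YARN rather than with <code>docker run</code> directly, the same mount would be requested through the <code>YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS</code> environment variable, and the directory would have to be listed in <code>docker.allowed.rw-mounts</code>. The following is a minimal sketch; the helper name <code>append_mount</code> is hypothetical.</p>

```shell
# Append one mount spec to a comma-separated mount list.
# append_mount is a hypothetical helper name.
append_mount() {
  local list="$1" spec="$2"
  if [ -z "$list" ]; then
    printf '%s\n' "$spec"
  else
    printf '%s,%s\n' "$list" "$spec"
  fi
}

# container-executor.cfg is assumed to whitelist the directory, for example:
#   docker.allowed.rw-mounts=/var/lib/sss/pipes
mounts=$(append_mount "" "/var/lib/sss/pipes:/var/lib/sss/pipes:rw")
echo "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$mounts"
```

<p>Any other mounts the application needs can be appended to the same list with further calls.</p>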
<ol style="list-style-type: decimal" start="3">
<li>Container config</li>
</ol>
<p>All the steps below should be executed on the container itself.</p>
<ul>
<li>
<p>Install only the sss client libraries</p>
<div class="source">
<div class="source">
<pre># yum -y install sssd-client
</pre></div></div>
</li>
<li>
<p>make sure sss is configured for passwd and group databases in</p>
<div class="source">
<div class="source">
<pre>/etc/nsswitch.conf
</pre></div></div>
</li>
<li>
<p>configure the PAM service that the application uses to call into SSSD</p>
<div class="source">
<div class="source">
<pre># cat /etc/pam.d/system-auth
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth required pam_env.so
auth sufficient pam_unix.so try_first_pass nullok
auth sufficient pam_sss.so forward_pass
auth required pam_deny.so
account required pam_unix.so
account [default=bad success=ok user_unknown=ignore] pam_sss.so
account required pam_permit.so
password requisite pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
password sufficient pam_unix.so try_first_pass use_authtok nullok sha512 shadow
password sufficient pam_sss.so use_authtok
password required pam_deny.so
session optional pam_keyinit.so revoke
session required pam_limits.so
-session optional pam_systemd.so
session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session required pam_unix.so
session optional pam_sss.so
</pre></div></div>
</li>
<li>
<p>Save the docker image and use the docker image as base image for your applications.</p>
</li>
<li>
<p>test the docker image launched in YARN environment.</p>
<div class="source">
<div class="source">
<pre>$ id
uid=5000(localuser) gid=5000(localuser) groups=5000(localuser),1337(hadoop)
</pre></div></div>
</li>
</ul></section></section><section>
<h2><a name="Privileged_Container_Security_Consideration"></a>Privileged Container Security Consideration</h2>
<p>A privileged Docker container can interact with host system devices, which can harm the host operating system if handled without proper care. In order to mitigate the risk of allowing privileged containers to run on a Hadoop cluster, we implemented a controlled process to sandbox unauthorized privileged Docker images.</p>
<p>The default behavior disallows any privileged Docker containers. Privileged Docker containers are only allowed when the Docker image is ENTRYPOINT enabled and <code>docker.privileged-containers.enabled</code> is set to true. The Docker image can then run with root privileges in the Docker container, but access to host level devices is disabled. This allows developers and testers to run Docker images from the internet with some restrictions to prevent harm to the host operating system.</p>
<p>When Docker images have been certified by developers and testers to be trustworthy, the trusted images can be promoted to a trusted Docker registry. The system administrator can define <code>docker.trusted.registries</code> and set up a private Docker registry server to promote trusted images. The system administrator may choose to allow official Docker images from Docker Hub to be part of the trusted registries. &#x201c;library&#x201d; is the name to use for trusting official Docker images. Container-executor.cfg example:</p>
<div class="source">
<div class="source">
<pre>[docker]
docker.privileged-containers.enabled=true
docker.trusted.registries=library
</pre></div></div>
<p>Fine grained access control can also be defined using <code>docker.privileged-containers.registries</code> to allow only a subset of Docker images to run as privileged containers. If <code>docker.privileged-containers.registries</code> is not defined, YARN will fall back to use <code>docker.trusted.registries</code> as access control for privileged Docker images. Fine grained access control example:</p>
<div class="source">
<div class="source">
<pre>[docker]
docker.privileged-containers.enabled=true
docker.privileged-containers.registries=local/centos:latest
docker.trusted.registries=library
</pre></div></div>
<p>In a development environment, local images can be tagged with a repository name prefix to enable trust. The recommendation is to choose a repository name that uses a local hostname and port number, or the reserved Docker Hub keyword &#x201c;local&#x201d;, to prevent accidentally pulling Docker images from Docker Hub. Docker run will look for Docker images on Docker Hub if the image does not exist locally, so using a local hostname and port in the image name prevents accidental pulling of canonical images from Docker Hub. Example of tagging an image with localhost:5000 as the trusted registry:</p>
<div class="source">
<div class="source">
<pre>docker tag centos:latest localhost:5000/centos:latest
</pre></div></div>
<p>Let&#x2019;s say you have an Ubuntu-based image with some changes in the local repository and you wish to use it. The following example tags the <code>local_ubuntu</code> image:</p>
<div class="source">
<div class="source">
<pre>docker tag local_ubuntu local/ubuntu:latest
</pre></div></div>
<p>Next, you have to add <code>local</code> to <code>docker.trusted.registries</code>. The image can be referenced by using <code>local/ubuntu</code>.</p>
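<p>As a minimal sketch, a container-executor.cfg that trusts both tagging conventions shown above might look like the following. It assumes <code>library</code> is already trusted and that <code>localhost:5000</code> and <code>local</code> are the prefixes used for locally tagged images; adjust the list to the prefixes actually in use on your cluster.</p>
<div class="source">
<div class="source">
<pre>[docker]
# Assumed prefixes: official images, locally tagged images, and a local registry.
docker.trusted.registries=library,local,localhost:5000
</pre></div></div>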
<p>Trusted images are allowed to mount external devices, such as HDFS via the NFS gateway, or host-level Hadoop configuration. If system administrators allow writing to external volumes using the <code>docker.allowed.rw-mounts</code> directive, a privileged Docker container can have full control of host-level files in the predefined volumes.</p>
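<p>The mount permissions described above could be expressed roughly as follows. The paths are illustrative assumptions only: <code>/etc/hadoop/conf</code> stands in for a host-level Hadoop configuration directory and <code>/hdfs</code> for an NFS-mounted HDFS path.</p>
<div class="source">
<div class="source">
<pre>[docker]
docker.trusted.registries=library
# Hypothetical paths; list only what containers genuinely need.
docker.allowed.ro-mounts=/etc/hadoop/conf
docker.allowed.rw-mounts=/hdfs
</pre></div></div>
<p>Keeping the read-write list as short as possible limits which host-level files a privileged container can modify.</p>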
<p>For the <a href="./yarn-service/Examples.html">YARN Service HTTPD example</a> to run, container-executor.cfg must list the centos Docker registry as trusted.</p></section><section>
<h2><a name="Container_Reacquisition_Requirements"></a>Container Reacquisition Requirements</h2>
<p>On restart, the NodeManager, as part of its recovery process, validates that a container is still running by checking for the existence of the container&#x2019;s PID directory in the /proc filesystem. For security purposes, the operating system administrator may enable the <i>hidepid</i> mount option for the /proc filesystem. If the <i>hidepid</i> option is enabled, the <i>yarn</i> user&#x2019;s primary group must be whitelisted by setting the gid mount flag, as in the example below. Without the <i>yarn</i> user&#x2019;s primary group whitelisted, container reacquisition will fail and the container will be killed on NodeManager restart.</p>
<div class="source">
<div class="source">
<pre>proc /proc proc nosuid,nodev,noexec,hidepid=2,gid=yarn 0 0
</pre></div></div>
</section><section>
<h2><a name="Connecting_to_a_Docker_Trusted_Registry"></a>Connecting to a Docker Trusted Registry</h2>
<p>The Docker client command draws its configuration from the default location, which is $HOME/.docker/config.json on the NodeManager host. The Docker configuration is where secure repository credentials are stored, so using the LCE with secure Docker repos through this method is discouraged.</p>
<p>YARN-5428 added support to Distributed Shell for securely supplying the Docker client configuration. See the Distributed Shell help for usage. Support for additional frameworks is planned.</p>
<p>As a work-around, you may manually log the Docker daemon on every NodeManager host into the secure repo using the Docker login command:</p>
<div class="source">
<div class="source">
<pre> docker login [OPTIONS] [SERVER]
Register or log in to a Docker registry server, if no server is specified
&quot;https://index.docker.io/v1/&quot; is the default.
-e, --email=&quot;&quot; Email
-p, --password=&quot;&quot; Password
-u, --username=&quot;&quot; Username
</pre></div></div>
<p>Note that this approach means that all users will have access to the secure repo.</p>
<p>Hadoop integrates with Docker Trusted Registry via the YARN service API. The Docker registry can store Docker images on HDFS, S3, or external storage using a CSI driver.</p><section>
<h3><a name="Docker_Registry_on_HDFS"></a>Docker Registry on HDFS</h3>
<p>The NFS Gateway provides the capability to mount HDFS as an NFS mount point. Docker Registry can be configured to write to the HDFS mount point using the standard file system API.</p>
<p>In hdfs-site.xml, configure the NFS settings:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
  &lt;name&gt;nfs.exports.allowed.hosts&lt;/name&gt;
  &lt;value&gt;* rw&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;nfs.file.dump.dir&lt;/name&gt;
  &lt;value&gt;/tmp/.hdfs-nfs&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;nfs.kerberos.principal&lt;/name&gt;
  &lt;value&gt;nfs/_HOST@EXAMPLE.COM&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;nfs.keytab.file&lt;/name&gt;
  &lt;value&gt;/etc/security/keytabs/nfs.service.keytab&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Run the NFS Gateway on all datanodes as the hdfs user:</p>
<div class="source">
<div class="source">
<pre>$ $HADOOP_HOME/bin/hdfs --daemon start nfs3
</pre></div></div>
<p>On each datanode, expose the NFS mount point at /hdfs:</p>
<div class="source">
<div class="source">
<pre># mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync $DN_IP:/ /hdfs
</pre></div></div>
<p>Where DN_IP is the IP address of the datanode.</p>
<p>Container-executor.cfg is configured to trust Docker images from library and from the registry host, and to allow read-write mounts of /hdfs:</p>
<div class="source">
<div class="source">
<pre>[docker]
docker.privileged-containers.enabled=true
docker.trusted.registries=library,registry.docker-registry.registry.example.com:5000
docker.allowed.rw-mounts=/tmp,/usr/local/hadoop/logs,/hdfs
</pre></div></div>
<p>Docker Registry can be started as a YARN service, defined in registry.json:</p>
<div class="source">
<div class="source">
<pre>{
&quot;name&quot;: &quot;docker-registry&quot;,
&quot;version&quot;: &quot;1.0&quot;,
&quot;kerberos_principal&quot; : {
&quot;principal_name&quot; : &quot;registry/_HOST@EXAMPLE.COM&quot;,
&quot;keytab&quot; : &quot;file:///etc/security/keytabs/registry.service.keytab&quot;
},
&quot;components&quot; :
[
{
&quot;name&quot;: &quot;registry&quot;,
&quot;number_of_containers&quot;: 1,
&quot;artifact&quot;: {
&quot;id&quot;: &quot;registry:latest&quot;,
&quot;type&quot;: &quot;DOCKER&quot;
},
&quot;resource&quot;: {
&quot;cpus&quot;: 1,
&quot;memory&quot;: &quot;256&quot;
},
&quot;run_privileged_container&quot;: true,
&quot;configuration&quot;: {
&quot;env&quot;: {
&quot;YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE&quot;:&quot;true&quot;,
&quot;YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS&quot;:&quot;/hdfs/apps/docker/registry:/var/lib/registry&quot;
},
&quot;properties&quot;: {
&quot;docker.network&quot;: &quot;host&quot;
}
}
}
]
}
</pre></div></div>
<p>The YARN service mounts /hdfs/apps/docker/registry from the host at /var/lib/registry inside the Docker container. Launch the service with:</p>
<div class="source">
<div class="source">
<pre>yarn app -launch docker-registry /tmp/registry.json
</pre></div></div>
<p>The Docker trusted registry is deployed in the YARN framework, and the URL for accessing the registry follows the Hadoop Registry DNS format:</p>
<div class="source">
<div class="source">
<pre>registry.docker-registry.$USER.$DOMAIN:5000
</pre></div></div>
<p>When the docker-registry application reaches the STABLE state in YARN, users can push Docker images to, or pull them from, the Docker Trusted Registry by prefixing the image name with registry.docker-registry.registry.example.com:5000/.</p></section><section>
<h3><a name="Docker_Registry_on_S3"></a>Docker Registry on S3</h3>
<p>Docker Registry provides its own S3 driver and YAML configuration. The YARN service configuration can generate the YAML template and enable Docker Registry to write directly to S3 storage. This option is the top choice for deploying Docker Trusted Registry on AWS. Configuring the Docker Registry storage driver for S3 requires mounting the /etc/docker/registry/config.yml file (through YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS); that file must specify an S3 bucket with its corresponding accesskey and secretkey.</p>
<p>Sample config.yml</p>
<div class="source">
<div class="source">
<pre>version: 0.1
log:
    fields:
        service: registry
http:
    addr: :5000
storage:
    cache:
        blobdescriptor: inmemory
    s3:
        accesskey: #AWS_KEY#
        secretkey: #AWS_SECRET#
        region: #AWS_REGION#
        bucket: #AWS_BUCKET#
        encrypt: #ENCRYPT#
        secure: #SECURE#
        chunksize: 5242880
        multipartcopychunksize: 33554432
        multipartcopymaxconcurrency: 100
        multipartcopythresholdsize: 33554432
        rootdirectory: #STORAGE_PATH#
</pre></div></div>
<p>Docker Registry can be started as a YARN service, defined in registry.json:</p>
<div class="source">
<div class="source">
<pre>{
&quot;name&quot;: &quot;docker-registry&quot;,
&quot;version&quot;: &quot;1.0&quot;,
&quot;kerberos_principal&quot; : {
&quot;principal_name&quot; : &quot;registry/_HOST@EXAMPLE.COM&quot;,
&quot;keytab&quot; : &quot;file:///etc/security/keytabs/registry.service.keytab&quot;
},
&quot;components&quot; :
[
{
&quot;name&quot;: &quot;registry&quot;,
&quot;number_of_containers&quot;: 1,
&quot;artifact&quot;: {
&quot;id&quot;: &quot;registry:latest&quot;,
&quot;type&quot;: &quot;DOCKER&quot;
},
&quot;resource&quot;: {
&quot;cpus&quot;: 1,
&quot;memory&quot;: &quot;256&quot;
},
&quot;run_privileged_container&quot;: true,
&quot;configuration&quot;: {
&quot;env&quot;: {
&quot;YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE&quot;:&quot;true&quot;,
&quot;YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS&quot;:&quot;&lt;path to config.yml&gt;:/etc/docker/registry/config.yml&quot;
},
&quot;properties&quot;: {
&quot;docker.network&quot;: &quot;host&quot;
}
}
}
]
}
</pre></div></div>
<p>For further details and the parameters that can be configured in the S3 storage driver, please refer to <a class="externalLink" href="https://docs.docker.com/registry/storage-drivers/s3/">https://docs.docker.com/registry/storage-drivers/s3/</a>.</p></section></section><section>
<h2><a name="Example:_MapReduce"></a>Example: MapReduce</h2>
<p>This example assumes that Hadoop is installed to <code>/usr/local/hadoop</code>.</p>
<p>Additionally, <code>docker.allowed.ro-mounts</code> in <code>container-executor.cfg</code> has been updated to include the directories: <code>/usr/local/hadoop,/etc/passwd,/etc/group</code>.</p>
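<p>A sketch of the corresponding container-executor.cfg section is shown below. The <code>docker.trusted.registries</code> line is an assumption, added so that the <code>library/openjdk:8</code> image used in this example is treated as trusted; the mount list is taken from the paragraph above.</p>
<div class="source">
<div class="source">
<pre>[docker]
# Assumption: official Docker Hub images are trusted on this cluster.
docker.trusted.registries=library
docker.allowed.ro-mounts=/usr/local/hadoop,/etc/passwd,/etc/group
</pre></div></div>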
<p>To submit the pi job to run in Docker containers, run the following commands:</p>
<div class="source">
<div class="source">
<pre> HADOOP_HOME=/usr/local/hadoop
YARN_EXAMPLES_JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
MOUNTS=&quot;$HADOOP_HOME:$HADOOP_HOME:ro,/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro&quot;
IMAGE_ID=&quot;library/openjdk:8&quot;
export YARN_CONTAINER_RUNTIME_TYPE=docker
export YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$IMAGE_ID
export YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS
yarn jar $YARN_EXAMPLES_JAR pi \
-Dmapreduce.map.env.YARN_CONTAINER_RUNTIME_TYPE=docker \
-Dmapreduce.map.env.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS \
-Dmapreduce.map.env.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$IMAGE_ID \
-Dmapreduce.reduce.env.YARN_CONTAINER_RUNTIME_TYPE=docker \
-Dmapreduce.reduce.env.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS \
-Dmapreduce.reduce.env.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$IMAGE_ID \
1 40000
</pre></div></div>
<p>Note that the application master, map tasks, and reduce tasks are configured independently. In this example, we are using the <code>openjdk:8</code> image for all three.</p></section><section>
<h2><a name="Example:_Spark"></a>Example: Spark</h2>
<p>This example assumes that Hadoop is installed to <code>/usr/local/hadoop</code> and Spark is installed to <code>/usr/local/spark</code>.</p>
<p>Additionally, <code>docker.allowed.ro-mounts</code> in <code>container-executor.cfg</code> has been updated to include the directories: <code>/usr/local/hadoop,/etc/passwd,/etc/group</code>.</p>
<p>To run a Spark shell in Docker containers, run the following command:</p>
<div class="source">
<div class="source">
<pre> HADOOP_HOME=/usr/local/hadoop
SPARK_HOME=/usr/local/spark
MOUNTS=&quot;$HADOOP_HOME:$HADOOP_HOME:ro,/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro&quot;
IMAGE_ID=&quot;library/openjdk:8&quot;
$SPARK_HOME/bin/spark-shell --master yarn \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$IMAGE_ID \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$IMAGE_ID \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS
</pre></div></div>
<p>Note that the application master and executors are configured independently. In this example, we are using the <code>openjdk:8</code> image for both.</p>
<p>When using a remote container registry, YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG must reference the config.json file containing the credentials used to authenticate.</p>
<div class="source">
<div class="source">
<pre>DOCKER_IMAGE_NAME=hadoop-docker
DOCKER_CLIENT_CONFIG=hdfs:///user/hadoop/config.json
spark-submit --master yarn \
--deploy-mode cluster \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
sparkR.R
</pre></div></div>
</section><section>
<h2><a name="Docker_Container_ENTRYPOINT_Support"></a>Docker Container ENTRYPOINT Support</h2>
<p>When Docker support was introduced in Hadoop 2.x, the platform was designed to run existing Hadoop programs inside Docker containers. Log redirection and environment setup are integrated with the NodeManager. In Hadoop 3.x, Docker support extends beyond running Hadoop workloads and supports running Docker containers in their native form, using the ENTRYPOINT from the Dockerfile. An application can choose YARN mode or Docker mode as its default by defining the YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE environment variable. The system administrator can also set this variable as a cluster-wide default to make ENTRYPOINT the default mode of operation.</p>
<p>In yarn-site.xml, add YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE to the NodeManager environment whitelist:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
  &lt;name&gt;yarn.nodemanager.env-whitelist&lt;/name&gt;
  &lt;value&gt;JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME,YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>In yarn-env.sh, define:</p>
<div class="source">
<div class="source">
<pre>export YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true
</pre></div></div>
</section><section>
<h2><a name="Requirements_when_not_using_ENTRYPOINT_.28YARN_mode.29"></a>Requirements when not using ENTRYPOINT (YARN mode)</h2>
<p>There are two requirements when ENTRYPOINT is not used:</p>
<ol style="list-style-type: decimal">
<li>
<p><code>/bin/bash</code> must be available inside the image. This is generally true; however, tiny Docker images (e.g., ones that use busybox for shell commands) might not have bash installed. In this case, the following error is displayed:</p>
<div class="source">
<div class="source">
<pre>Container id: container_1561638268473_0015_01_000002
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused &quot;exec: \&quot;bash\&quot;: executable file not found in $PATH&quot;.
Shell output: main : command provided 4
</pre></div></div>
</li>
<li>
<p>The <code>find</code> command must also be available inside the image. Not having <code>find</code> causes this error:</p>
<div class="source">
<div class="source">
<pre>Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/hadoopuser/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh: line 44: find: command not found
</pre></div></div>
</li>
</ol></section><section>
<h2><a name="Docker_Container_YARN_SysFS_Support"></a>Docker Container YARN SysFS Support</h2>
<p>YARN SysFS is a pseudo file system provided by the YARN framework that exports cluster information to Docker containers. Cluster information is exported to the /hadoop/yarn/sysfs path. This API allows application developers to obtain cluster information without external service dependencies. A custom application master can populate cluster information by calling the NodeManager REST API. The YARN service framework automatically populates cluster information to /hadoop/yarn/sysfs/app.json. For more information about YARN service, see: <a href="./yarn-service/Overview.html">YARN Service</a>.</p></section><section>
<h2><a name="Docker_Container_Service_Mode"></a>Docker Container Service Mode</h2>
<p>Docker Container Service Mode runs the container as defined by the image but does not set the user (--user and --group-add). This mode is disabled by default. To enable it, the administrator sets docker.service-mode.enabled to true in the docker section of container-executor.cfg.</p>
<p>The following container-executor.cfg excerpt enables Docker service mode:</p>
<div class="source">
<div class="source">
<pre>yarn.nodemanager.linux-container-executor.group=yarn
[docker]
module.enabled=true
docker.privileged-containers.enabled=true
docker.service-mode.enabled=true
</pre></div></div>
<p>An application user can enable or disable service mode at the job level by exporting the environment variable YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE in the application&#x2019;s environment with the value true or false, respectively.</p></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2023
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>