<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop Amazon Web Services support &#x2013; S3A Delegation Token Architecture</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
&nbsp;| Last Published: 2023-04-27
&nbsp;| Version: 3.4.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../../../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/AsyncProfilerServlet.html">Async Profiler</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../../../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../../../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../../../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../../../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../../../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../../../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
<li class="none">
<a href="../../../hadoop-huaweicloud/cloud-storage/index.html">Huaweicloud OBS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../../../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../../../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../../../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../../../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../../../hadoop-federation-balance/HDFSFederationBalance.html">HDFS Federation Balance</a>
</li>
<li class="none">
<a href="../../../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../../../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../../../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../../../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../../../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="../../images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>S3A Delegation Token Architecture</h1>
<p>This is an architecture document to accompany <a href="delegation_tokens.html">Working with Delegation Tokens</a></p><section>
<h2><a name="Background:_Delegation_Tokens"></a>Background: Delegation Tokens</h2>
<p>Delegation Tokens, &#x201c;DTs&#x201d;, are a common feature of Hadoop services. They are opaque byte arrays which can be issued by services such as HDFS, HBase and YARN, and which can be used to authenticate a request with that service.</p><section>
<h3><a name="Tokens_are_Issued"></a>Tokens are Issued</h3>
<p>In a Kerberized cluster, tokens are issued by the service after the caller has authenticated, so the principal is trusted to be who they claim to be. The issued DT can therefore attest that whoever includes that token on a request is authorized to act on behalf of that principal, for the specific set of operations which the DT grants.</p>
<p>As an example, an HDFS DT can be requested by a user and included in the launch context of a YARN application, say DistCp; the launched application can then talk to HDFS as if it were that user.</p></section><section>
<h3><a name="Tokens_are_marshalled"></a>Tokens are marshalled</h3>
<p>Tokens are opaque byte arrays. They are contained within a <code>Token&lt;T extends TokenIdentifier&gt;</code> class which includes an expiry time, the service identifier, and some other details.</p>
<p><code>Token&lt;&gt;</code> instances can be serialized as a Hadoop Writable, or saved to/from a protobuf format. This is how they are included in YARN application and container requests, and elsewhere. They can even be saved to files through the <code>hadoop dt</code> command.</p>
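<p>As an illustration of this marshalling, here is a minimal sketch (not code from the module; the <code>marshall</code>/<code>unmarshall</code> helper names are invented) which round-trips a token through its Writable form:</p>
<div class="source">
<div class="source">
<pre>import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.token.Token;

// Serialize a token to a byte array via its Writable implementation.
static byte[] marshall(Token&lt;?&gt; token) throws IOException {
  DataOutputBuffer out = new DataOutputBuffer();
  token.write(out);
  return Arrays.copyOf(out.getData(), out.getLength());
}

// The inverse: rebuild a Token from the marshalled bytes.
static Token&lt;?&gt; unmarshall(byte[] bytes) throws IOException {
  DataInputBuffer in = new DataInputBuffer();
  in.reset(bytes, bytes.length);
  Token&lt;?&gt; token = new Token&lt;&gt;();
  token.readFields(in);
  return token;
}
</pre></div></div>
</section><section>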
<h3><a name="Tokens_can_be_unmarshalled"></a>Tokens can be unmarshalled</h3>
<p>At the far end, tokens can be unmarshalled and converted back into instances of the Java classes. This assumes that all the dependent classes are on the classpath.</p></section><section>
<h3><a name="Tokens_can_be_used_to_authenticate_callers"></a>Tokens can be used to authenticate callers</h3>
<p>The Hadoop RPC layer and the web SPNEGO layer support tokens.</p></section><section>
<h3><a name="Tokens_can_be_renewed"></a>Tokens can be renewed</h3>
<p>DTs can be renewed by the specific principal declared at creation time as &#x201c;the renewer&#x201d;. In the example above, the YARN Resource Manager&#x2019;s principal can be declared as the renewer. Then, even while a token is attached to a queued launch request in the RM, the RM can regularly ask HDFS to renew the token.</p>
<p>There is an ultimate limit on how long tokens can be renewed for, but it is generally 72 hours or similar, so that medium-lived jobs can access services and data on behalf of a user.</p></section><section>
<h3><a name="Tokens_can_be_Revoked"></a>Tokens can be Revoked</h3>
<p>When tokens are no longer needed, the service can be told to revoke a token. Continuing the YARN example, after an application finishes, the YARN RM can revoke every token marshalled into the application launch request, at which point there is no risk associated with those tokens being compromised.</p>
<p><i>This is all how &#x201c;real&#x201d; Hadoop tokens work</i></p>
<p>The S3A Delegation Tokens are subtly different.</p>
<p>The S3A DTs actually include the AWS credentials within the token data marshalled and shared across the cluster. The credentials can be one of:</p>
<ul>
<li>The Full AWS (<code>fs.s3a.access.key</code>, <code>fs.s3a.secret.key</code>) login.</li>
<li>A set of AWS session credentials (<code>fs.s3a.access.key</code>, <code>fs.s3a.secret.key</code>, <code>fs.s3a.session.token</code>). These are obtained from the AWS Security Token Service (STS) when the token is issued.</li>
<li>A set of AWS session credentials binding the user to a specific AWS IAM role, further restricted to only access the S3 bucket. Again, these credentials are requested when the token is issued.</li>
</ul>
<p><i>Tokens can be issued</i></p>
<p>When an S3A Filesystem instance is asked to issue a token it can simply package up the login secrets (the &#x201c;Full&#x201d; tokens), or talk to the AWS STS service to get a set of session/assumed-role credentials. These are marshalled within the overall token, which is then passed on to applications.</p>
<p><i>Tokens can be marshalled</i></p>
<p>The AWS secrets are held in a subclass of <code>org.apache.hadoop.security.token.TokenIdentifier</code>. This class gets serialized to a byte array when the whole token is marshalled, and deserialized when the token is loaded.</p>
<p><i>Tokens can be used to authenticate callers</i></p>
<p>The S3A FS does not hand the token to AWS services to authenticate the caller. Instead it takes the AWS credentials included in the token identifier and uses them to sign the requests.</p>
<p><i>Tokens cannot be renewed</i></p>
<p>The tokens contain the credentials; you cannot use them to ask AWS for more.</p>
<p>For full credentials that is moot, but session and role credentials will expire, at which point the application will be unable to talk to the AWS infrastructure.</p>
<p><i>Tokens cannot be revoked</i></p>
<p>The AWS STS APIs don&#x2019;t let you revoke a single set of session credentials.</p></section></section><section>
<h2><a name="Background:_How_Tokens_are_collected_in_MapReduce_jobs"></a>Background: How Tokens are collected in MapReduce jobs</h2><section>
<h3><a name="org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal.28.29"></a><code>org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal()</code></h3>
<ol style="list-style-type: decimal">
<li>Calls <code>org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes()</code> for the job submission dir on the cluster FS (i.e. <code>fs.defaultFS</code>).</li>
<li>Reads in the property <code>mapreduce.job.hdfs-servers</code> and extracts DTs from those filesystems.</li>
<li>Tells the <code>FileInputFormat</code> and <code>FileOutputFormat</code> subclasses of the job to collect their source and dest FS tokens.</li>
</ol>
<p>All token collection is via <code>TokenCache.obtainTokensForNamenodes()</code></p></section><section>
<h3><a name="TokenCache.obtainTokensForNamenodes.28Credentials.2C_Path.5B.5D.2C_Configuration.29"></a><code>TokenCache.obtainTokensForNamenodes(Credentials, Path[], Configuration)</code></h3>
<ol style="list-style-type: decimal">
<li>Returns immediately if security is off.</li>
<li>Retrieves all the filesystems in the list of paths.</li>
<li>Retrieves a token from each unless it is in the list of filesystems in <code>mapreduce.job.hdfs-servers.token-renewal.exclude</code></li>
<li>Merges in any DTs stored in the file referenced under: <code>mapreduce.job.credentials.binary</code></li>
<li>Calls <code>FileSystem.collectDelegationTokens()</code>, which, if there isn&#x2019;t any token already in the credential list, issues and adds a new token. <i>There is no check to see if that existing credential has expired</i>.</li>
</ol>
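<p>A short usage sketch of this collection path follows; the bucket path is hypothetical, and this is the same call a job submitter makes on behalf of the job:</p>
<div class="source">
<div class="source">
<pre>import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.security.TokenCache;
import org.apache.hadoop.security.Credentials;

Credentials credentials = new Credentials();
Path[] paths = { new Path(&quot;s3a://example-bucket/input&quot;) };
TokenCache.obtainTokensForNamenodes(credentials, paths, new Configuration());
// credentials now holds one token per distinct filesystem in the list,
// unless the filesystem was excluded or a token was already present.
</pre></div></div>
</section><section>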
<h3><a name="FileInputFormat.listStatus.28JobConf_job.29:_FileStatus.5B.5D"></a><code>FileInputFormat.listStatus(JobConf job): FileStatus[]</code></h3>
<p>Enumerates the source paths (<code>mapreduce.input.fileinputformat.inputdir</code>); uses <code>TokenCache.obtainTokensForNamenodes()</code> to collect a token for all of these paths.</p>
<p>This operation is called by the public interface method <code>FileInputFormat.getSplits()</code>.</p></section><section>
<h3><a name="FileOutputFormat.checkOutputSpecs.28.29"></a><code>FileOutputFormat.checkOutputSpecs()</code></h3>
<p>Calls <code>getOutputPath(job)</code> and asks for the DTs of that output path FS.</p></section></section><section>
<h2><a name="Architecture_of_the_S3A_Delegation_Token_Support"></a>Architecture of the S3A Delegation Token Support</h2>
<ol style="list-style-type: decimal">
<li>The S3A FS client has the ability to be configured with a delegation token binding, the &#x201c;DT Binding&#x201d;, a class declared in the option <code>fs.s3a.delegation.token.binding</code>.</li>
<li>If set, when a filesystem is instantiated it asks the DT binding for its list of AWS credential providers (the list in <code>fs.s3a.aws.credentials.provider</code> is only used if the DT binding chooses to use it).</li>
<li>The DT binding scans the current principal (<code>UGI.getCurrentUser()</code>/&#x201c;the Owner&#x201d;) to see if they have any token in their credential cache whose service name matches the URI of the filesystem; see the lookup sketch after this list.</li>
<li>If one is found, it is unmarshalled and then used to authenticate the caller via some AWS Credential provider returned to the S3A FileSystem instance.</li>
<li>If none is found, the Filesystem is considered to have been deployed &#x201c;Unbonded&#x201d;. The DT binding has to return a list of the AWS credential providers to use.</li>
</ol>
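<p>A minimal sketch of that lookup, with an illustrative bucket name:</p>
<div class="source">
<div class="source">
<pre>import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// The canonical service name of an S3A filesystem is its URI.
Text service = new Text(&quot;s3a://example-bucket&quot;);
Token&lt;?&gt; found = null;
for (Token&lt;?&gt; token : UserGroupInformation.getCurrentUser().getTokens()) {
  if (service.equals(token.getService())) {
    found = token;   // the filesystem deploys &quot;bonded&quot; to this token
    break;
  }
}
// found == null: the filesystem deploys &quot;Unbonded&quot;.
</pre></div></div>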
<p>When requests are made of AWS services, the created credential provider(s) are used to sign requests.</p>
<p>When the filesystem is asked for a delegation token, the DT binding will generate a token identifier containing the marshalled tokens.</p>
<p>If the Filesystem was deployed with a DT, that is, it was deployed &#x201c;bonded&#x201d;, that existing DT is returned.</p>
<p>If it was deployed unbonded, the DT Binding is asked to create a new DT.</p>
<p>It is up to the binding what it includes in the token identifier, and how it obtains them. This new token identifier is included in a token which has a &#x201c;canonical service name&#x201d; of the URI of the filesystem (e.g. <code>s3a://landsat-pds</code>).</p>
<p>The issued/reissued token identifier can be marshalled and reused.</p><section>
<h3><a name="class_org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens"></a>class <code>org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens</code></h3>
<p>This joins up the S3A Filesystem with the pluggable DT binding classes.</p>
<p>One is instantiated in the S3A Filesystem instance if a DT Binding class has been declared. If so, it is invoked for:</p>
<ul>
<li>Building up the authentication chain during filesystem initialization.</li>
<li>Determining if the FS should declare that it has a canonical name (in <code>getCanonicalServiceName()</code>).</li>
<li>When asked for a DT (in <code>getDelegationToken(String renewer)</code>).</li>
</ul>
<p>The <code>S3ADelegationTokens</code> has the task of instantiating the actual DT binding, which must be a subclass of <code>AbstractDelegationTokenBinding</code>.</p>
<p>All the DT bindings, and <code>S3ADelegationTokens</code> itself, are subclasses of <code>org.apache.hadoop.service.AbstractService</code>; they follow the YARN service lifecycle of: create -&gt; init -&gt; start -&gt; stop. This means that a DT binding may, if it chooses, start worker threads when the service is started (<code>serviceStart()</code>); it must then stop them in the <code>serviceStop()</code> method, as sketched below. (Anyone doing this must be aware that the owner FS is not fully initialized in <code>serviceStart()</code>: they must not call into the filesystem.)</p>
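<p>A sketch of that lifecycle pattern, with a hypothetical worker thread (real bindings extend <code>AbstractDelegationTokenBinding</code>, which is itself such a service):</p>
<div class="source">
<div class="source">
<pre>import org.apache.hadoop.service.AbstractService;

public class RenewalWorkerService extends AbstractService {
  private Thread worker;

  public RenewalWorkerService() {
    super(&quot;RenewalWorkerService&quot;);
  }

  @Override
  protected void serviceStart() throws Exception {
    super.serviceStart();
    // Do not call into the owning filesystem here: it is not yet initialized.
    worker = new Thread(() -&gt; { /* background work */ }, &quot;dt-worker&quot;);
    worker.setDaemon(true);
    worker.start();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (worker != null) {
      worker.interrupt();
    }
    super.serviceStop();
  }
}
</pre></div></div>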
<p>The actions of this class are:</p>
<ul>
<li>Lookup of DTs associated with this S3A FS (scanning credentials, unmarshalling).</li>
<li>Initiating the DT binding in bound/unbound state.</li>
<li>Issuing DTs, either serving up the existing one, or requesting the DT Binding for a new instance of <code>AbstractS3ATokenIdentifier</code> and then wrapping a Hadoop token around it.</li>
<li>General logging, debugging, and metrics. Delegation token metrics are collected in <code>S3AInstrumentation.DelegationTokenStatistics</code>.</li>
</ul></section><section>
<h3><a name="class_org.apache.hadoop.fs.s3a.auth.delegation.AbstractS3ATokenIdentifier"></a>class <code>org.apache.hadoop.fs.s3a.auth.delegation.AbstractS3ATokenIdentifier</code></h3>
<p>All tokens returned are a subclass of <code>AbstractS3ATokenIdentifier</code>.</p>
<p>This class contains the following fields:</p>
<div class="source">
<div class="source">
<pre> /** Canonical URI of the bucket. */
private URI uri;
/**
* Encryption secrets to also marshall with any credentials.
* Set during creation to ensure it is never null.
*/
private EncryptionSecrets encryptionSecrets = new EncryptionSecrets();
/**
* Timestamp of creation.
* This is set to the current time; it will be overridden when
* deserializing data.
*/
private long created = System.currentTimeMillis();
/**
* An origin string for diagnostics.
*/
private String origin = &quot;&quot;;
/**
* This marshalled UUID can be used in testing to verify transmission,
 * and reuse; as it is printed you can see what is happening too.
*/
private String uuid = UUID.randomUUID().toString();
</pre></div></div>
<p>The <code>uuid</code> field is used for equality tests and debugging; the <code>origin</code> and <code>created</code> fields are also for diagnostics.</p>
<p>The <code>encryptionSecrets</code> structure enumerates the AWS encryption mechanism of the filesystem instance, and any declared key. This allows the client-side secret for SSE-C encryption to be passed to the filesystem, or the key name for SSE-KMS.</p>
<p><i>The encryption settings and secrets of the S3A filesystem on the client are included in the DT, so can be used to encrypt/decrypt data in the cluster.</i></p></section><section>
<h3><a name="class_SessionTokenIdentifier_extends_AbstractS3ATokenIdentifier"></a>class <code>SessionTokenIdentifier</code> extends <code>AbstractS3ATokenIdentifier</code></h3>
<p>This holds session tokens, and it also gets used as a superclass of the other token identifiers.</p>
<p>It adds a set of <code>MarshalledCredentials</code> containing the session secrets.</p>
<p>Every token/token identifier must have a unique <i>Kind</i>; this is how token identifier deserializers are looked up. For Session Credentials, it is <code>S3ADelegationToken/Session</code>. Subclasses <i>must</i> have a different token kind, else the unmarshalling and binding mechanism will fail.</p></section><section>
<h3><a name="classes_RoleTokenIdentifier_and_FullCredentialsTokenIdentifier"></a>classes <code>RoleTokenIdentifier</code> and <code>FullCredentialsTokenIdentifier</code></h3>
<p>These are subclasses of <code>SessionTokenIdentifier</code> with different token kinds, needed for that token unmarshalling.</p>
<p>Their kinds are <code>S3ADelegationToken/Role</code> and <code>S3ADelegationToken/Full</code> respectively.</p>
<p>Having different possible token bindings raises the risk that a job is submitted with one binding while the cluster expects another. Provided the configuration option <code>fs.s3a.delegation.token.binding</code> is not marked as final in the <code>core-site.xml</code> file, the value of that binding set in the job should propagate with the job: the choice of provider is automatic. A cluster can even mix bindings across jobs. However, if a <code>core-site.xml</code> file declares a specific binding for a single bucket and the job only has the generic <code>fs.s3a.delegation.token.binding</code> binding, then there will be a mismatch. Each binding must be rigorous about checking the <i>Kind</i> of any found delegation token and failing meaningfully, as sketched below.</p>
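<p>A sketch of such a check, assuming it runs inside a binding whose <code>getKind()</code> returns the expected token kind:</p>
<div class="source">
<div class="source">
<pre>// token: the delegation token found for this filesystem.
// Fail meaningfully if it is of a different kind.
if (!getKind().equals(token.getKind())) {
  throw new DelegationTokenIOException(
      &quot;Token mismatch: expected &quot; + getKind()
      + &quot; but found &quot; + token.getKind());
}
</pre></div></div>
</section><section>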
<h3><a name="class_MarshalledCredentials"></a>class <code>MarshalledCredentials</code></h3>
<p>Can marshall a set of AWS credentials (access key, secret key, session token) as a Hadoop Writable.</p>
<p>These can be given to an instance of class <code>MarshalledCredentialProvider</code> and used to sign AWS RPC/REST API calls.</p>
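<p>As a hedged sketch, assuming the class offers a three-argument constructor for the triple, creating one looks like this (all values are placeholders):</p>
<div class="source">
<div class="source">
<pre>import org.apache.hadoop.fs.s3a.auth.MarshalledCredentials;

// Triple of (access key, secret key, session token).
MarshalledCredentials credentials = new MarshalledCredentials(
    &quot;AKIA....&quot;,        // access key
    &quot;secret-key&quot;,      // secret key
    &quot;session-token&quot;);  // empty for full credentials
</pre></div></div>
</section></section><section>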
<h2><a name="DT_Binding:_AbstractDelegationTokenBinding"></a>DT Binding: <code>AbstractDelegationTokenBinding</code></h2>
<p>The plugin point for this design is the DT binding, which must be a subclass of <code>org.apache.hadoop.fs.s3a.auth.delegation.AbstractDelegationTokenBinding</code>.</p>
<p>This class:</p>
<ul>
<li>Returns the <i>Kind</i> of these tokens.</li>
<li>Declares whether tokens will actually be issued or not (the token issuing policy).</li>
<li>Defines the abstract operations below, which every binding must implement.</li>
</ul>
<div class="source">
<div class="source">
<pre> public abstract AWSCredentialProviderList deployUnbonded()
throws IOException;
</pre></div></div>
<p>The S3A FS has been brought up with DTs enabled, but none have been found for its service name. The DT binding is tasked with coming up with the fallback list of AWS credential providers.</p>
<div class="source">
<div class="source">
<pre>public abstract AWSCredentialProviderList bindToTokenIdentifier(
AbstractS3ATokenIdentifier retrievedIdentifier)
throws IOException;
</pre></div></div>
<p>A DT for this FS instance has been found. Bind to it and extract enough information to authenticate with AWS. Return the list of providers to use.</p>
<div class="source">
<div class="source">
<pre>public abstract AbstractS3ATokenIdentifier createEmptyIdentifier();
</pre></div></div>
<p>Return an empty identifier.</p>
<div class="source">
<div class="source">
<pre>public abstract AbstractS3ATokenIdentifier createTokenIdentifier(
Optional&lt;RoleModel.Policy&gt; policy,
EncryptionSecrets encryptionSecrets)
</pre></div></div>
<p>This is the big one: create a new token identifier for this filesystem, one which must include the encryption secrets, and which may make use of the role policy.</p>
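<p>Putting these operations together, a skeletal binding might look like the following. This is an orientation sketch only: the constructor form, the empty provider lists and the nested identifier class are illustrative assumptions, not working code from the module:</p>
<div class="source">
<div class="source">
<pre>import java.io.IOException;
import java.util.Optional;

import org.apache.hadoop.fs.s3a.AWSCredentialProviderList;
import org.apache.hadoop.fs.s3a.auth.RoleModel;
import org.apache.hadoop.io.Text;

public class ExampleTokenBinding extends AbstractDelegationTokenBinding {

  public ExampleTokenBinding() {
    // Assumed constructor form: a service name plus the unique token kind.
    super(&quot;ExampleTokenBinding&quot;, new Text(&quot;S3ADelegationToken/Example&quot;));
  }

  @Override
  public AWSCredentialProviderList deployUnbonded() throws IOException {
    // No DT was found for this FS: supply the fallback credential providers.
    return new AWSCredentialProviderList();
  }

  @Override
  public AWSCredentialProviderList bindToTokenIdentifier(
      AbstractS3ATokenIdentifier retrievedIdentifier) throws IOException {
    // Unmarshall the secrets held in the identifier; wrap them as providers.
    return new AWSCredentialProviderList();
  }

  @Override
  public AbstractS3ATokenIdentifier createEmptyIdentifier() {
    // Empty instances are used when deserializing token identifiers.
    return new ExampleTokenIdentifier();
  }

  @Override
  public AbstractS3ATokenIdentifier createTokenIdentifier(
      Optional&lt;RoleModel.Policy&gt; policy,
      EncryptionSecrets encryptionSecrets) {
    ExampleTokenIdentifier id = new ExampleTokenIdentifier();
    // Fill in secrets, the encryption settings, origin, etc. here.
    return id;
  }

  /** Hypothetical identifier; a real one must register its own unique kind. */
  public static class ExampleTokenIdentifier extends SessionTokenIdentifier {
  }
}
</pre></div></div>
</section><section>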
<h2><a name="Token_issuing"></a>Token issuing</h2><section>
<h3><a name="How_Full_Delegation_Tokens_are_issued."></a>How Full Delegation Tokens are issued.</h3>
<p>If the client is only logged in with session credentials: fail.</p>
<p>Else: take the AWS access/secret keys, store them in a <code>MarshalledCredentials</code> instance within a new <code>FullCredentialsTokenIdentifier</code>, and return it.</p></section><section>
<h3><a name="How_Session_Delegation_Tokens_are_issued."></a>How Session Delegation Tokens are issued.</h3>
<p>If the client is only logged in with session credentials: return these.</p>
<p>This is taken from the Yahoo! patch: if a user is logged in with a set of session credentials (including those from some 2FA login), they just get wrapped up and passed in.</p>
<p>There is no clue as to how long they will last, so a warning is printed.</p>
<p>If there is a full set of credentials, then <code>SessionTokenBinding.maybeInitSTS()</code> creates an STS client set up to communicate with the (configured) STS endpoint, retrying with the same retry policy as the filesystem.</p>
<p>This client is then used to request a set of session credentials.</p></section><section>
<h3><a name="How_Role_Delegation_Tokens_are_issued."></a>How Role Delegation Tokens are issued.</h3>
<p>If the client is only logged in with session credentials: fail.</p>
<p>We don&#x2019;t know whether this is a full user session or some role session, so rather than pass potentially more powerful secrets in with the job, the binding simply fails.</p>
<p>Else: as with session delegation tokens, an STS client is created. This time <code>assumeRole()</code> is invoked with the ARN of the role and an extra AWS role policy set to restrict access to:</p>
<ul>
<li>CRUD access to the specific bucket the token is being requested for</li>
<li>access to all KMS keys (assumption: AWS KMS is where restrictions are set up)</li>
</ul>
<p><i>Example Generated Role Policy</i></p>
<div class="source">
<div class="source">
<pre>{
&quot;Version&quot; : &quot;2012-10-17&quot;,
&quot;Statement&quot; : [ {
&quot;Sid&quot; : &quot;7&quot;,
&quot;Effect&quot; : &quot;Allow&quot;,
&quot;Action&quot; : [ &quot;s3:GetBucketLocation&quot;, &quot;s3:ListBucket*&quot; ],
&quot;Resource&quot; : &quot;arn:aws:s3:::example-bucket&quot;
}, {
&quot;Sid&quot; : &quot;8&quot;,
&quot;Effect&quot; : &quot;Allow&quot;,
&quot;Action&quot; : [ &quot;s3:Get*&quot;, &quot;s3:PutObject&quot;, &quot;s3:DeleteObject&quot;, &quot;s3:AbortMultipartUpload&quot; ],
&quot;Resource&quot; : &quot;arn:aws:s3:::example-bucket/*&quot;
}, {
&quot;Sid&quot; : &quot;1&quot;,
&quot;Effect&quot; : &quot;Allow&quot;,
&quot;Action&quot; : [ &quot;kms:Decrypt&quot;, &quot;kms:GenerateDataKey&quot; ],
&quot;Resource&quot; : &quot;arn:aws:kms:*&quot;
}]
}
</pre></div></div>
<p>These permissions are sufficient for all operations the S3A client currently performs on a bucket. If those requirements are expanded, these policies may change.</p></section></section><section>
<h2><a name="Testing."></a>Testing.</h2>
<p>Look in <code>org.apache.hadoop.fs.s3a.auth.delegation</code></p>
<p>It has proven impossible to create a full end-to-end test running an MR job:</p>
<ol style="list-style-type: decimal">
<li>MapReduce only collects DTs when Kerberos is enabled in the cluster.</li>
<li>A Kerberized MiniYARN cluster refuses to start on a local <code>file://</code> filesystem without the native libraries, which it needs in order to set directory permissions.</li>
<li>A Kerberized MiniHDFS cluster and MiniYARN cluster refuse to talk to each other reliably, at least within the week or so spent trying.</li>
</ol>
<p>The <code>ITestDelegatedMRJob</code> test works around this by using Mockito to mock the actual YARN job submit operation in <code>org.apache.hadoop.mapreduce.protocol.ClientProtocol</code>. The MR code does all the work of collecting tokens and attaching them to the launch context, &#x201c;submits&#x201d; the job, which then immediately succeeds. The job context is examined to verify that the source and destination filesystem DTs were extracted.</p>
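<p>In outline, an illustrative fragment of the mocking approach (not the actual wiring of the test):</p>
<div class="source">
<div class="source">
<pre>import static org.mockito.Mockito.mock;

import org.apache.hadoop.mapreduce.protocol.ClientProtocol;

// Mock the submission protocol so that &quot;submitting&quot; needs no cluster.
ClientProtocol submitter = mock(ClientProtocol.class);
// Wire the mock into the job client, call Job.submit(), then assert that
// the Credentials of the launch context hold the source and destination
// filesystem tokens.
</pre></div></div>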
<p>To test beyond this requires a real Kerberized cluster, or someone able to fix up Mini* clusters to run kerberized.</p></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2023
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>