<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2023-03-02
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop Amazon Web Services support &#x2013; Working with Delegation Tokens</title>
<style type="text/css" media="all">
@import url("../../css/maven-base.css");
@import url("../../css/maven-theme.css");
@import url("../../css/site.css");
</style>
<link rel="stylesheet" href="../../css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20230302" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
&nbsp;| Last Published: 2023-03-02
&nbsp;| Version: 3.4.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../../../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/AsyncProfilerServlet.html">Async Profiler</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../../../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../../../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../../../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../../../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../../../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../../../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
<li class="none">
<a href="../../../hadoop-huaweicloud/cloud-storage/index.html">Huaweicloud OBS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../../../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../../../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../../../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../../../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../../../hadoop-federation-balance/HDFSFederationBalance.html">HDFS Federation Balance</a>
</li>
<li class="none">
<a href="../../../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../../../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../../../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../../../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../../../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="../../images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Working with Delegation Tokens</h1>
<ul>
<li><a href="#Introducing_S3A_Delegation_Tokens."> Introducing S3A Delegation Tokens.</a></li>
<li><a href="#Background:_Hadoop_Delegation_Tokens."> Background: Hadoop Delegation Tokens.</a></li>
<li><a href="#S3A_Delegation_Tokens."> S3A Delegation Tokens.</a>
<ul>
<li><a href="#S3A_Session_Delegation_Tokens"> S3A Session Delegation Tokens</a></li>
<li><a href="#S3A_Role_Delegation_Tokens"> S3A Role Delegation Tokens</a></li>
<li><a href="#S3A_Full-Credential_Delegation_Tokens"> S3A Full-Credential Delegation Tokens</a></li></ul></li>
<li><a href="#Using_S3A_Delegation_Tokens"> Using S3A Delegation Tokens</a>
<ul>
<li><a href="#Configuration_Parameters">Configuration Parameters</a></li>
<li><a href="#Warnings">Warnings</a></li>
<li><a href="#Enabling_Session_Delegation_Tokens"> Enabling Session Delegation Tokens</a></li>
<li><a href="#Enabling_Role_Delegation_Tokens"> Enabling Role Delegation Tokens</a></li>
<li><a href="#Enabling_Full_Delegation_Tokens"> Enabling Full Delegation Tokens</a></li></ul></li>
<li><a href="#Managing_the_Delegation_Tokens_Duration"> Managing the Delegation Tokens Duration</a></li>
<li><a href="#Testing_Delegation_Token_Support"> Testing Delegation Token Support</a></li>
<li><a href="#Troubleshooting_S3A_Delegation_Tokens"> Troubleshooting S3A Delegation Tokens</a>
<ul>
<li><a href="#Submitted_job_cannot_authenticate">Submitted job cannot authenticate</a></li>
<li><a href="#Tokens_are_not_issued">Tokens are not issued</a></li>
<li><a href="#Error_No_AWS_login_credentials">Error No AWS login credentials</a></li>
<li><a href="#Tokens_Expire_before_job_completes">Tokens Expire before job completes</a></li>
<li><a href="#Error_DelegationTokenIOException:_Token_mismatch">Error DelegationTokenIOException: Token mismatch</a></li>
<li><a href="#Warning_Forwarding_existing_session_credentials">Warning Forwarding existing session credentials</a></li>
<li><a href="#Error_Cannot_issue_S3A_Role_Delegation_Tokens_without_full_AWS_credentials">Error Cannot issue S3A Role Delegation Tokens without full AWS credentials</a></li></ul></li>
<li><a href="#Implementation_Details"> Implementation Details</a>
<ul>
<li><a href="#Architecture"> Architecture</a></li>
<li><a href="#Security"> Security</a></li>
<li><a href="#Resilience"> Resilience</a></li>
<li><a href="#Scalability_limits_of_AWS_STS_service"> Scalability limits of AWS STS service</a></li></ul></li>
<li><a href="#Implementing_your_own_Delegation_Token_Binding">Implementing your own Delegation Token Binding</a>
<ul>
<li><a href="#Steps">Steps</a></li>
<li><a href="#Implementing_AbstractS3ATokenIdentifier">Implementing AbstractS3ATokenIdentifier</a></li>
<li><a href="#AWSCredentialProviderList_deployUnbonded.28.29">AWSCredentialProviderList deployUnbonded()</a></li>
<li><a href="#AWSCredentialProviderList_bindToTokenIdentifier.28AbstractS3ATokenIdentifier_id.29">AWSCredentialProviderList bindToTokenIdentifier(AbstractS3ATokenIdentifier id)</a></li>
<li><a href="#AbstractS3ATokenIdentifier_createEmptyIdentifier.28.29">AbstractS3ATokenIdentifier createEmptyIdentifier()</a></li>
<li><a href="#AbstractS3ATokenIdentifier_createTokenIdentifier.28Optional.3CRoleModel.Policy.3E_policy.2C_EncryptionSecrets_secrets.29">AbstractS3ATokenIdentifier createTokenIdentifier(Optional&lt;RoleModel.Policy&gt; policy, EncryptionSecrets secrets)</a></li>
<li><a href="#Testing">Testing</a></li></ul></li></ul>
<section>
<h2><a name="Introducing_S3A_Delegation_Tokens."></a><a name="introduction"></a> Introducing S3A Delegation Tokens.</h2>
<p>The S3A filesystem client supports <code>Hadoop Delegation Tokens</code>. This allows YARN applications such as MapReduce, DistCp, Apache Flink and Apache Spark to obtain credentials to access S3 buckets and pass them to jobs/queries, granting them access to the service with the same access permissions as the user.</p>
<p>Three different token types are offered.</p>
<p><i>Full Delegation Tokens:</i> include the full login values of <code>fs.s3a.access.key</code> and <code>fs.s3a.secret.key</code> in the token, so the recipient has access to the data as the submitting user, with unlimited duration. These tokens do not involve communication with the AWS STS service, so can be used with other S3 installations.</p>
<p><i>Session Delegation Tokens:</i> These contain an &#x201c;STS Session Token&#x201d; requested by the S3A client from the AWS STS service. They have a limited duration so restrict how long an application can access AWS on behalf of a user. Clients with this token have the full permissions of the user.</p>
<p><i>Role Delegation Tokens:</i> These contain an &#x201c;STS Session Token&#x201d; requested by the STS &#x201c;Assume Role&#x201d; API, granting the caller permission to interact with S3 using a specific IAM role, <i>with permissions restricted to accessing a specific S3 bucket</i>.</p>
<p>Role Delegation Tokens are the most powerful. By restricting the access rights of the granted STS token, no process receiving the token may perform any operations in the AWS infrastructure other than those for the S3 bucket, and even those operations are limited by the rights of the requested role ARN.</p>
<p>All three tokens also marshall the encryption settings: the encryption mechanism to use and the KMS key ID or SSE-C client secret. This allows encryption policy and secrets to be propagated from the client to the services.</p>
<p>This document covers how to use these tokens. For details on the implementation see <a href="delegation_token_architecture.html">S3A Delegation Token Architecture</a>.</p></section><section>
<h2><a name="Background:_Hadoop_Delegation_Tokens."></a><a name="background"></a> Background: Hadoop Delegation Tokens.</h2>
<p>A Hadoop Delegation Token is a byte array of data which is submitted to Hadoop services as proof that the caller has the permissions to perform the operation which it is requesting &#x2014; and which can be passed between applications to <i>delegate</i> those permissions.</p>
<p>Tokens are opaque to clients. Clients simply get a byte array of data which they must provide to a service when required. This normally contains encrypted data for use by the service.</p>
<p>The service, which holds the password to encrypt/decrypt this data, can decrypt the byte array and read the contents, knowing that it has not been tampered with, then use the presence of a valid token as evidence the caller has at least temporary permissions to perform the requested operation.</p>
<p>Tokens have a limited lifespan. They may be renewed, with the client making an IPC/HTTP request of a renewer service. This renewal service can also be executed on behalf of the caller by some other Hadoop cluster services, such as the YARN Resource Manager.</p>
<p>After use, tokens may be revoked: this relies on services holding tables of valid tokens, either in memory or, for any HA service, in Apache Zookeeper or similar. Revoking tokens is used to clean up after jobs complete.</p>
<p>Delegation Token support is tightly integrated with YARN: requests to launch containers and applications can include a list of delegation tokens to pass along. These tokens are serialized with the request, saved to a file on the node launching the container, and then loaded in to the credentials of the active user. Normally a token for the HDFS cluster is among those used here, added to the credentials through a call to <code>FileSystem.getDelegationToken()</code> (usually via <code>FileSystem.addDelegationTokens()</code>).</p>
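<p>As an illustration, the following sketch shows how a client might collect delegation tokens into a <code>Credentials</code> structure before submitting work; the bucket URI and the <code>yarn</code> renewer name are illustrative placeholders.</p>
<div class="source">
<div class="source">
<pre>// Sketch: collect delegation tokens for a filesystem before job submission.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class CollectTokens {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
    Credentials credentials = new Credentials();
    // Ask the filesystem for its delegation token(s), naming the renewer;
    // the tokens are added to the credentials as a side effect.
    Token&lt;?&gt;[] tokens = fs.addDelegationTokens("yarn", credentials);
    for (Token&lt;?&gt; t : tokens) {
      System.out.println("Collected token of kind " + t.getKind());
    }
  }
}
</pre></div></div>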
<p>Delegation Tokens are also supported with applications such as Hive: a query issued to a shared (long-lived) Hive cluster can include the delegation tokens required to access specific filesystems <i>with the rights of the user submitting the query</i>.</p>
<p>All these applications normally only retrieve delegation tokens when security is enabled. This is why the cluster configuration needs to enable Kerberos. Production Hadoop clusters need Kerberos for security anyway.</p></section><section>
<h2><a name="S3A_Delegation_Tokens."></a><a name="s3a-delegation-tokens"></a> S3A Delegation Tokens.</h2>
<p>S3A now supports delegation tokens, allowing a caller to acquire tokens from a local S3A Filesystem connector instance and pass them on to applications, granting them equivalent or restricted access.</p>
<p>These S3A Delegation Tokens are special in that they do not contain password-protected data opaque to clients; they contain the secrets needed to access the relevant S3 buckets and associated services.</p>
<p>They are obtained by requesting a delegation token from the S3A filesystem client. Issued tokens may be included in job submissions, passed to running applications, etc. Each token is specific to an individual bucket; a separate delegation token must be issued for every bucket a client wishes to work with.</p>
<p>S3A implements Delegation Tokens in its <code>org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens</code> class, which then supports multiple &#x201c;bindings&#x201d; behind it, so supporting different variants of S3A Delegation Tokens.</p>
<p>Because applications only collect Delegation Tokens in secure clusters, this means that to be able to submit delegation tokens in transient cloud-hosted Hadoop clusters, <i>these clusters must also have Kerberos enabled</i>.</p>
<p><i>Tip</i>: you should only be deploying Hadoop in public clouds with Kerberos enabled.</p><section>
<h3><a name="S3A_Session_Delegation_Tokens"></a><a name="session-tokens"></a> S3A Session Delegation Tokens</h3>
<p>A Session Delegation Token is created by asking the AWS <a class="externalLink" href="http://docs.aws.amazon.com/STS/latest/APIReference/Welcome.html">Security Token Service</a> to issue an AWS session password and identifier for a limited duration. These AWS session credentials are valid until the end of that time period. They are marshalled into the S3A Delegation Token.</p>
<p>Other S3A connectors can extract these credentials and use them to talk to S3 and related services.</p>
<p>Issued tokens cannot be renewed or revoked.</p>
<p>See <a class="externalLink" href="http://docs.aws.amazon.com/STS/latest/APIReference/API_GetSessionToken.html">GetSessionToken</a> for specifics details on the (current) token lifespan.</p></section><section>
<h3><a name="S3A_Role_Delegation_Tokens"></a><a name="role-tokens"></a> S3A Role Delegation Tokens</h3>
<p>A Role Delegation Token is created by asking the AWS <a class="externalLink" href="http://docs.aws.amazon.com/STS/latest/APIReference/Welcome.html">Security Token Service</a> for a set of &#x201c;Assumed Role&#x201d; session credentials with a limited lifetime, belonging to a given IAM Role. The resulting session credentials are restricted to the specific S3 bucket, along with access to all KMS keys. They are marshalled into the S3A Delegation Token.</p>
<p>Other S3A connectors can extract these credentials and use them to talk to S3 and related services. They may only work with the explicit AWS resources identified when the token was generated.</p>
<p>Issued tokens cannot be renewed or revoked.</p></section><section>
<h3><a name="S3A_Full-Credential_Delegation_Tokens"></a><a name="full-credentials"></a> S3A Full-Credential Delegation Tokens</h3>
<p>Full Credential Delegation Tokens contain the full AWS login details (access key and secret key) needed to access a bucket.</p>
<p>They never expire, so are the equivalent of storing the AWS account credentials in a Hadoop, Hive, Spark configuration or similar.</p>
<p>The differences are:</p>
<ol style="list-style-type: decimal">
<li>They are automatically passed from the client/user to the application. A remote application can use them to access data on behalf of the user.</li>
<li>When a remote application destroys the filesystem connector instances and tokens of a user, the secrets are destroyed too.</li>
<li>Secrets in the <code>AWS_</code> environment variables on the client will be picked up and automatically propagated.</li>
<li>They do not use the AWS STS service, so may work against third-party implementations of the S3 protocol.</li>
</ol></section></section><section>
<h2><a name="Using_S3A_Delegation_Tokens"></a><a name="enabling"></a> Using S3A Delegation Tokens</h2>
<p>A prerequisite to using S3A filesystem delegation tokens is to run with Hadoop security enabled &#x2014;which inevitably means with Kerberos. Even though S3A delegation tokens do not use Kerberos, the code in applications which fetch DTs is normally only executed when the cluster is running in secure mode; that is, where the <code>core-site.xml</code> configuration sets <code>hadoop.security.authentication</code> to <code>kerberos</code> or another valid authentication mechanism.</p>
<p><i>Without enabling security at this level, delegation tokens will not be collected.</i></p>
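<p>As a quick client-side check (a minimal sketch using the public <code>UserGroupInformation</code> API), a program can verify the security state before expecting delegation tokens to be gathered:</p>
<div class="source">
<div class="source">
<pre>import org.apache.hadoop.security.UserGroupInformation;

public class SecurityCheck {
  public static void main(String[] args) {
    // Delegation tokens are only collected when Hadoop security is enabled,
    // i.e. hadoop.security.authentication is "kerberos" or another
    // valid authentication mechanism.
    if (UserGroupInformation.isSecurityEnabled()) {
      System.out.println("Security enabled: delegation tokens will be collected");
    } else {
      System.out.println("Security disabled: delegation tokens will NOT be collected");
    }
  }
}
</pre></div></div>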
<p>Once Kerberos is enabled, the process for acquiring tokens is as follows:</p>
<ol style="list-style-type: decimal">
<li>Enable Delegation token support by setting <code>fs.s3a.delegation.token.binding</code> to the classname of the token binding to use.</li>
<li>Add any other binding-specific settings (STS endpoint, IAM role, etc.)</li>
<li>Make sure the settings are the same in the service as well as the client.</li>
<li>In the client, switch to using a <a href="../../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Hadoop Credential Provider</a> for storing your local credentials, with a local filesystem store (<code>localjceks:</code> or <code>jceks://file</code>), so as to keep the full secrets out of any job configurations.</li>
<li>Execute the client from a Kerberos-authenticated account, in an application configured with the login credentials for an AWS account able to issue session tokens.</li>
</ol><section>
<h3><a name="Configuration_Parameters"></a>Configuration Parameters</h3>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> <b>Key</b> </th>
<th> <b>Meaning</b> </th>
<th> <b>Default</b> </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>fs.s3a.delegation.token.binding</code> </td>
<td> delegation token binding class </td>
<td> (unset) </td></tr>
</tbody>
</table></section><section>
<h3><a name="Warnings"></a>Warnings</h3><section><section>
<h5><a name="Use_Hadoop_Credential_Providers_to_keep_secrets_out_of_job_configurations."></a>Use Hadoop Credential Providers to keep secrets out of job configurations.</h5>
<p>Hadoop MapReduce jobs copy their client-side configurations with the job. If your AWS login secrets are set in an XML file then they are picked up and passed in with the job, <i>even if delegation tokens are used to propagate session or role secrets</i>.</p>
<p>Spark-submit will take any credentials in the <code>spark-defaults.conf</code> file and again, spread them across the cluster. It will also pick up any <code>AWS_</code> environment variables and convert them into the <code>fs.s3a.access.key</code>, <code>fs.s3a.secret.key</code> and <code>fs.s3a.session.token</code> configuration options.</p>
<p>To guarantee that the secrets are not passed in, keep your secrets in a <a href="index.html#hadoop_credential_providers">hadoop credential provider file on the local filesystem</a>. Secrets stored there will not be propagated: the delegation tokens collected during job submission will be the sole AWS secrets passed in.</p></section><section>
<h5><a name="Token_Life"></a>Token Life</h5>
<ul>
<li>
<p>S3A Delegation tokens cannot be renewed.</p>
</li>
<li>
<p>S3A Delegation tokens cannot be revoked. It is possible for an administrator to terminate <i>all AWS sessions using a specific role</i> <a class="externalLink" href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_control-access_disable-perms.html">from the AWS IAM console</a>, if desired.</p>
</li>
<li>
<p>The lifespan of Session Delegation Tokens is limited to that of an AWS session: a maximum of 36 hours.</p>
</li>
<li>
<p>The lifespan of a Role Delegation Token is limited to 1 hour by default; a longer duration of up to 12 hours can be enabled in the AWS console for the specific role being used.</p>
</li>
<li>
<p>The lifespan of Full Delegation tokens is unlimited: the secret needs to be reset in the AWS Admin console to revoke it.</p>
</li>
</ul></section><section>
<h5><a name="Service_Load_on_the_AWS_Secure_Token_Service"></a>Service Load on the AWS Secure Token Service</h5>
<p>All delegation tokens are issued on a bucket-by-bucket basis: clients must request a delegation token from every S3A filesystem to which they desire access.</p>
<p>For Session and Role Delegation Tokens, this places load on the AWS STS service, which may trigger throttling amongst all users within the same AWS account using the same STS endpoint.</p>
<ul>
<li>In experiments, a few hundred requests per second are needed to trigger throttling, so this is very unlikely to surface in production systems.</li>
<li>The S3A filesystem connector retries all throttled requests to AWS services, including STS.</li>
<li>Other S3 clients which use the AWS SDK will, if configured, also retry throttled requests.</li>
</ul>
<p>Overall, the risk of triggering STS throttling appears low, and most applications will recover from throttling of what is generally an intermittently used AWS service.</p></section></section></section><section>
<h3><a name="Enabling_Session_Delegation_Tokens"></a><a name="enabling-session-tokens"></a> Enabling Session Delegation Tokens</h3>
<p>For session tokens, set <code>fs.s3a.delegation.token.binding</code> to <code>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding</code>.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> <b>Key</b> </th>
<th> <b>Value</b> </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>fs.s3a.delegation.token.binding</code> </td>
<td> <code>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding</code> </td></tr>
</tbody>
</table>
<p>There are some further configuration options:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> <b>Key</b> </th>
<th> <b>Meaning</b> </th>
<th> <b>Default</b> </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>fs.s3a.assumed.role.session.duration</code> </td>
<td> Duration of delegation tokens </td>
<td> <code>1h</code> </td></tr>
<tr class="a">
<td> <code>fs.s3a.assumed.role.sts.endpoint</code> </td>
<td> URL to service issuing tokens </td>
<td> (undefined) </td></tr>
<tr class="b">
<td> <code>fs.s3a.assumed.role.sts.endpoint.region</code> </td>
<td> region for issued tokens </td>
<td> (undefined) </td></tr>
</tbody>
</table>
<p>The XML settings needed to enable session tokens are:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
  &lt;name&gt;fs.s3a.delegation.token.binding&lt;/name&gt;
  &lt;value&gt;org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding&lt;/value&gt;
&lt;/property&gt;

&lt;property&gt;
  &lt;name&gt;fs.s3a.assumed.role.session.duration&lt;/name&gt;
  &lt;value&gt;1h&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<ol style="list-style-type: decimal">
<li>If the application requesting a token has full AWS credentials for the relevant bucket, then a new session token will be issued.</li>
<li>If the application requesting a token is itself authenticating with a session delegation token, then the existing token will be forwarded. The life of the token will not be extended.</li>
<li>If the application requesting a token does not have either of these, the token cannot be issued: the operation will fail with an error.</li>
</ol>
<p>The endpoint for STS requests is set by the same configuration property as for the <code>AssumedRole</code> credential provider and for Role Delegation Tokens.</p>
<div class="source">
<div class="source">
<pre>&lt;!-- Optional --&gt;
&lt;property&gt;
  &lt;name&gt;fs.s3a.assumed.role.sts.endpoint&lt;/name&gt;
  &lt;value&gt;sts.amazonaws.com&lt;/value&gt;
&lt;/property&gt;

&lt;property&gt;
  &lt;name&gt;fs.s3a.assumed.role.sts.endpoint.region&lt;/name&gt;
  &lt;value&gt;us-west-1&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>If the <code>fs.s3a.assumed.role.sts.endpoint</code> option is set to something other than the central <code>sts.amazonaws.com</code> endpoint, then the region property <i>must</i> also be set.</p>
<p>Both the Session and the Role Delegation Token bindings use the option <code>fs.s3a.aws.credentials.provider</code> to define the credential providers to authenticate to the AWS STS with.</p>
<p>Here is the effective list of providers if none are declared:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
  &lt;name&gt;fs.s3a.aws.credentials.provider&lt;/name&gt;
  &lt;value&gt;
    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
    com.amazonaws.auth.EnvironmentVariableCredentialsProvider,
    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  &lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Not all these authentication mechanisms provide the full set of credentials STS needs. The session token provider will simply forward any session credentials it is authenticated with; the role token binding will fail.</p><section>
<h4><a name="Forwarding_of_existing_AWS_Session_credentials."></a>Forwarding of existing AWS Session credentials.</h4>
<p>When the AWS credentials supplied to the Session Delegation Token binding through <code>fs.s3a.aws.credentials.provider</code> are themselves a set of session credentials, generated delegation tokens will simply contain these existing session credentials, not a new set of credentials obtained from STS. This is because the STS service does not let callers authenticated with session/role credentials request new sessions.</p>
<p>This feature is useful when generating tokens from an EC2 VM instance in one IAM role and forwarding them over to VMs which are running in a different IAM role. The tokens will grant the permissions of the original VM&#x2019;s IAM role.</p>
<p>The duration of the forwarded tokens will be exactly that of the current set of tokens, which may be very limited in lifespan. A warning will appear in the logs declaring this.</p>
<p>Note: Role Delegation tokens do not support this forwarding of session credentials, because there&#x2019;s no way to explicitly change roles in the process.</p></section></section><section>
<h3><a name="Enabling_Role_Delegation_Tokens"></a><a name="enabling-role-tokens"></a> Enabling Role Delegation Tokens</h3>
<p>For role delegation tokens, set <code>fs.s3a.delegation.token.binding</code> to <code>org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding</code>.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> <b>Key</b> </th>
<th> <b>Value</b> </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>fs.s3a.delegation.token.binding</code> </td>
<td> <code>org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding</code> </td></tr>
</tbody>
</table>
<p>There are some further configuration options:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> <b>Key</b> </th>
<th> <b>Meaning</b> </th>
<th> <b>Default</b> </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>fs.s3a.assumed.role.session.duration</code> </td>
<td> Duration of delegation tokens </td>
<td> <code>1h</code> </td></tr>
<tr class="a">
<td> <code>fs.s3a.assumed.role.arn</code> </td>
<td> ARN for role to request </td>
<td> (undefined) </td></tr>
<tr class="b">
<td> <code>fs.s3a.assumed.role.sts.endpoint.region</code> </td>
<td> region for issued tokens </td>
<td> (undefined) </td></tr>
</tbody>
</table>
<p>The option <code>fs.s3a.assumed.role.arn</code> must be set to a role which the user can assume. It must have permissions to access the bucket and any KMS encryption keys. The actual requested role will be this role, explicitly restricted to the specific bucket.</p>
<p>The XML settings needed to enable role delegation tokens are:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
  &lt;name&gt;fs.s3a.delegation.token.binding&lt;/name&gt;
  &lt;value&gt;org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding&lt;/value&gt;
&lt;/property&gt;

&lt;property&gt;
  &lt;name&gt;fs.s3a.assumed.role.arn&lt;/name&gt;
  &lt;value&gt;REQUIRED: ARN of role to request&lt;/value&gt;
&lt;/property&gt;

&lt;property&gt;
  &lt;name&gt;fs.s3a.assumed.role.session.duration&lt;/name&gt;
  &lt;value&gt;1h&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>A JSON role policy for the role/session will automatically be generated which will consist of:</p>
<ol style="list-style-type: decimal">
<li>Full access to the S3 bucket for all operations used by the S3A client (read, write, list, multipart operations, get bucket location, etc.).</li>
<li>Full user access to KMS keys. This is to be able to decrypt any data in the bucket encrypted with SSE-KMS, as well as encrypt new data if that is the encryption policy.</li>
</ol></section><section>
<h3><a name="Enabling_Full_Delegation_Tokens"></a><a name="enabling-full-tokens"></a> Enabling Full Delegation Tokens</h3>
<p>This passes the full credentials in, falling back to any session credentials which were used to configure the S3A FileSystem instance.</p>
<p>For Full Credential Delegation tokens, set <code>fs.s3a.delegation.token.binding</code> to <code>org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding</code></p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> <b>Key</b> </th>
<th> <b>Value</b> </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>fs.s3a.delegation.token.binding</code> </td>
<td> <code>org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding</code> </td></tr>
</tbody>
</table>
<p>There are no other configuration options.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
  &lt;name&gt;fs.s3a.delegation.token.binding&lt;/name&gt;
  &lt;value&gt;org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Key points:</p>
<ol style="list-style-type: decimal">
<li>If the application requesting a token has full AWS credentials for the relevant bucket, then a full credential token will be issued.</li>
<li>If the application requesting a token is itself authenticating with a session delegation token, then the existing token will be forwarded. The life of the token will not be extended.</li>
<li>If the application requesting a token does not have either of these, the tokens cannot be issued: the operation will fail with an error.</li>
</ol></section></section><section>
<h2><a name="Managing_the_Delegation_Tokens_Duration"></a><a name="managing_token_duration"></a> Managing the Delegation Tokens Duration</h2>
<p>Full Credentials have an unlimited lifespan.</p>
<p>Session and role credentials have a lifespan defined by the duration property <code>fs.s3a.assumed.role.session.duration</code>.</p>
<p>This can have a maximum value of &#x201c;36h&#x201d; for session delegation tokens.</p>
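<p>As an example, a client wanting longer-lived session tokens could set the duration programmatically before instantiating the filesystem; this sketch uses an illustrative eight-hour value, which stays within the 36 hour session limit.</p>
<div class="source">
<div class="source">
<pre>import org.apache.hadoop.conf.Configuration;

public class SessionTokenDuration {
  public static Configuration configure() {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.delegation.token.binding",
        "org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding");
    // Request eight-hour session delegation tokens; the value is
    // illustrative and must not exceed the 36h session maximum.
    conf.set("fs.s3a.assumed.role.session.duration", "8h");
    return conf;
  }
}
</pre></div></div>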
<p>For Role Delegation Tokens, the maximum duration of a token is that of the role itself: 1h by default, though this can be changed to 12h <a class="externalLink" href="https://console.aws.amazon.com/iam/home#/roles">In the IAM Console</a>, or from the AWS CLI.</p>
<p>Without increasing the duration of the role, one hour is the maximum value; the error message <code>The requested DurationSeconds exceeds the MaxSessionDuration set for this role</code> is returned if the requested duration of a Role Delegation Token is greater than that available for the role.</p></section><section>
<h2><a name="Testing_Delegation_Token_Support"></a><a name="testing"></a> Testing Delegation Token Support</h2>
<p>The easiest way to test that delegation support is configured is to use the <code>hdfs fetchdt</code> command, which can fetch tokens from S3A, Azure ABFS and any other filesystem which can issue tokens, as well as HDFS itself.</p>
<p>This will fetch the token and save it to the named file (here, <code>tokens.bin</code>), even if Kerberos is disabled.</p>
<div class="source">
<div class="source">
<pre># Fetch a token for the AWS landsat-pds bucket and save it to tokens.bin
$ hdfs fetchdt --webservice s3a://landsat-pds/ tokens.bin
</pre></div></div>
<p>If the command fails with <code>ERROR: Failed to fetch token</code> it means the filesystem does not have delegation tokens enabled.</p>
<p>If it fails for other reasons, the likely causes are configuration and possibly connectivity to the AWS STS Server.</p>
<p>Once collected, the token can be printed. This will show the type of token, details about encryption and expiry, and the host on which it was created.</p>
<div class="source">
<div class="source">
<pre>$ bin/hdfs fetchdt --print tokens.bin
Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://landsat-pds;
timestamp=1541683947569; encryption=EncryptionSecrets{encryptionMethod=SSE_S3};
Created on vm1.local/192.168.99.1 at time 2018-11-08T13:32:26.381Z.};
Session credentials for user AAABWL expires Thu Nov 08 14:02:27 GMT 2018; (valid))
for s3a://landsat-pds
</pre></div></div>
<p>The &#x201c;(valid)&#x201d; annotation means that the AWS credentials are considered &#x201c;valid&#x201d;: there is both a username and a secret.</p>
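<p>The saved token file can also be loaded programmatically; here is a minimal sketch using the public <code>Credentials</code> API, assuming the <code>tokens.bin</code> file from the example above.</p>
<div class="source">
<div class="source">
<pre>import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class PrintTokens {
  public static void main(String[] args) throws Exception {
    // Load the token file saved by "hdfs fetchdt".
    Credentials creds = Credentials.readTokenStorageFile(
        new File("tokens.bin"), new Configuration());
    for (Token&lt;?&gt; t : creds.getAllTokens()) {
      System.out.println(t.getKind() + " for " + t.getService());
    }
  }
}
</pre></div></div>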
<p>You can use the <code>s3guard bucket-info</code> command to see the delegation support for a specific bucket. If delegation support is enabled, it also prints the current Hadoop security level.</p>
<div class="source">
<div class="source">
<pre>$ hadoop s3guard bucket-info s3a://landsat-pds/
Filesystem s3a://landsat-pds
Location: us-west-2
Filesystem s3a://landsat-pds is not using S3Guard
The &quot;magic&quot; committer is not supported
S3A Client
Signing Algorithm: fs.s3a.signing-algorithm=(unset)
Endpoint: fs.s3a.endpoint=s3.amazonaws.com
Encryption: fs.s3a.server-side-encryption-algorithm=none
Input seek policy: fs.s3a.experimental.input.fadvise=normal
Change Detection Source: fs.s3a.change.detection.source=etag
Change Detection Mode: fs.s3a.change.detection.mode=server
Delegation Support enabled: token kind = S3ADelegationToken/Session
Hadoop security mode: SIMPLE
</pre></div></div>
<p>Although the S3A delegation tokens do not depend upon Kerberos, MapReduce and other applications only request tokens from filesystems when security is enabled in Hadoop.</p></section><section>
<h2><a name="Troubleshooting_S3A_Delegation_Tokens"></a><a name="troubleshooting"></a> Troubleshooting S3A Delegation Tokens</h2>
<p>The <code>hadoop s3guard bucket-info</code> command will print information about the delegation state of a bucket.</p>
<p>Consult <a href="assumed_roles.html#troubleshooting">troubleshooting Assumed Roles</a> for details on AWS error messages related to AWS IAM roles.</p>
<p>The <a class="externalLink" href="https://github.com/steveloughran/cloudstore">cloudstore</a> module&#x2019;s StoreDiag utility can also be used to explore delegation token support.</p><section>
<h3><a name="Submitted_job_cannot_authenticate"></a>Submitted job cannot authenticate</h3>
<p>There are many causes for this; delegation tokens add some more.</p></section><section>
<h3><a name="Tokens_are_not_issued"></a>Tokens are not issued</h3>
<ul>
<li>This user is not <code>kinit</code>-ed in to Kerberos. Use <code>klist</code> and <code>hadoop kdiag</code> to see the Kerberos authentication state of the logged in user.</li>
<li>The filesystem instance on the client does not have a token binding set in <code>fs.s3a.delegation.token.binding</code>, so does not attempt to issue any.</li>
<li>The job submission is not aware that access to the specific S3 buckets is required. Review the application&#x2019;s submission mechanism to determine how to list source and destination paths. For example, for MapReduce, tokens for the cluster filesystem (<code>fs.defaultFS</code>) and all filesystems referenced as input and output paths will be queried for delegation tokens.</li>
</ul>
<p>For Apache Spark, the cluster filesystem and any filesystems listed in the property <code>spark.yarn.access.hadoopFileSystems</code> are queried for delegation tokens in secure clusters. See <a class="externalLink" href="https://spark.apache.org/docs/latest/running-on-yarn.html">Running on YARN</a>.</p></section><section>
<h3><a name="Error_No_AWS_login_credentials"></a>Error <code>No AWS login credentials</code></h3>
<p>The client does not have any valid credentials to request a token from the Amazon STS service.</p></section><section>
<h3><a name="Tokens_Expire_before_job_completes"></a>Tokens Expire before job completes</h3>
<p>The default duration of session and role tokens as set in <code>fs.s3a.assumed.role.session.duration</code> is one hour, &#x201c;1h&#x201d;.</p>
<p>For session tokens, this can be increased to any time up to 36 hours.</p>
<p>For role tokens, it can be increased up to 12 hours, <i>but only if the role is configured in the AWS IAM Console to have a longer lifespan</i>.</p></section><section>
<h3><a name="Error_DelegationTokenIOException:_Token_mismatch"></a>Error <code>DelegationTokenIOException: Token mismatch</code></h3>
<div class="source">
<div class="source">
<pre>org.apache.hadoop.fs.s3a.auth.delegation.DelegationTokenIOException:
Token mismatch: expected token for s3a://example-bucket
of type S3ADelegationToken/Session but got a token of type S3ADelegationToken/Full
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.lookupToken(S3ADelegationTokens.java:379)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.selectTokenFromActiveUser(S3ADelegationTokens.java:300)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.bindToExistingDT(S3ADelegationTokens.java:160)
at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:423)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:265)
</pre></div></div>
<p>The value of <code>fs.s3a.delegation.token.binding</code> is different in the remote service than in the local client. As a result, the remote service cannot use the token supplied by the client to authenticate.</p>
<p>Fix: reference the same token binding class at both ends.</p></section><section>
<h3><a name="Warning_Forwarding_existing_session_credentials"></a>Warning <code>Forwarding existing session credentials</code></h3>
<p>This message is printed when an S3A filesystem instance has been asked for a Session Delegation Token, and it is itself only authenticated with a set of AWS session credentials (such as those issued by the IAM metadata service).</p>
<p>The created token will contain these existing credentials, credentials which can be used until the existing session expires.</p>
<p>The duration of this existing session is unknown: the message is warning you that it may expire without warning.</p></section><section>
<h3><a name="Error_Cannot_issue_S3A_Role_Delegation_Tokens_without_full_AWS_credentials"></a>Error <code>Cannot issue S3A Role Delegation Tokens without full AWS credentials</code></h3>
<p>An S3A filesystem instance has been asked for a Role Delegation Token, but the instance is only authenticated with session tokens. This means that a set of role tokens cannot be requested.</p>
<p>Note: no attempt is made to convert the existing set of session tokens into a delegation token, unlike the Session Delegation Tokens. This is because the role of the current session (if any) is unknown.</p></section></section><section>
<h2><a name="Implementation_Details"></a><a name="implementation"></a> Implementation Details</h2><section>
<h3><a name="Architecture"></a><a name="architecture"></a> Architecture</h3>
<p>Concepts:</p>
<ol style="list-style-type: decimal">
<li>The S3A FileSystem can create delegation tokens when requested.</li>
<li>These can be marshalled as per other Hadoop Delegation Tokens.</li>
<li>At the far end, they can be retrieved, unmarshalled and used to authenticate callers.</li>
<li>DT binding plugins can then use these directly, or, somehow, manage authentication and token issue through other services (for example: Kerberos)</li>
<li>Token Renewal and Revocation are not supported.</li>
</ol>
<p>There&#x2019;s support for different back-end token bindings through the <code>org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokenManager</code> class.</p>
<p>Every implementation of this must return a subclass of <code>org.apache.hadoop.fs.s3a.auth.delegation.AbstractS3ATokenIdentifier</code> when asked to create a delegation token; this subclass must be registered in <code>META-INF/services/org.apache.hadoop.security.token.TokenIdentifier</code> for unmarshalling.</p>
<p>This identifier must contain all information needed at the far end to authenticate the caller with AWS services used by the S3A client: AWS S3 and potentially AWS KMS (for SSE-KMS).</p>
<p>It must have its own unique <i>Token Kind</i>, to ensure that it can be distinguished from the other token identifiers when tokens are being unmarshalled.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> Kind </th>
<th> Token class </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>S3ADelegationToken/Full</code> </td>
<td> <code>org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenIdentifier</code> </td></tr>
<tr class="a">
<td> <code>S3ADelegationToken/Session</code> </td>
<td> <code>org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenIdentifier</code></td></tr>
<tr class="b">
<td> <code>S3ADelegationToken/Role</code> </td>
<td> <code>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenIdentifier</code> </td></tr>
</tbody>
</table>
<p>If implementing an external binding:</p>
<ol style="list-style-type: decimal">
<li>Follow the security requirements below.</li>
<li>Define a new token identifier; there is no requirement for the <code>S3ADelegationToken/</code> prefix, but it is useful for debugging.</li>
<li>Token Renewal and Revocation is not integrated with the binding mechanism; if the operations are supported, implementation is left as an exercise.</li>
<li>Be aware of the stability guarantees of the module: &#x201c;LimitedPrivate/Unstable&#x201d;.</li>
</ol></section><section>
<h3><a name="Security"></a><a name="security"></a> Security</h3>
<p>S3A DTs contain secrets valuable for a limited period (session secrets) or long-lived secrets with no explicit time limit.</p>
<ul>
<li>The <code>toString()</code> operations on token identifiers MUST NOT print secrets; this is needed to keep them out of logs.</li>
<li>Secrets MUST NOT be logged, even at debug level.</li>
<li>Prefer short-lived session secrets over long-term secrets.</li>
<li>Try to restrict the permissions granted to a client holding the delegated token to those needed to access data in the S3 bucket.</li>
<li>Implementations need to be resistant to attacks which pass in invalid data as their token identifier: validate the types of the unmarshalled data; set limits on the size of all strings and other arrays to read in, etc.</li>
</ul>
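<p>As an illustration of the first and last points, a sketch of a redacting <code>toString()</code> and a defensively bounded read; the <code>endpoint</code> and <code>expiry</code> fields are hypothetical, and the superclass <code>getUri()</code> accessor is assumed:</p>
<div class="source"><pre>
private long expiry;                // hypothetical fields, for illustration
private String endpoint = "";

@Override
public String toString() {
  // metadata only: never append access keys, secret keys or session tokens
  return "MyTokenIdentifier{" + getUri() + "; expiry=" + expiry + "}";
}

@Override
public void readFields(DataInput in) throws IOException {
  super.readFields(in);
  // bound every string read: reject oversized values rather than
  // allocating attacker-controlled amounts of memory
  endpoint = Text.readString(in, 1024);
}
</pre></div>
</section><section>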
<h3><a name="Resilience"></a><a name="resilience"></a> Resilience</h3>
<p>Implementations need to handle transient failures of any remote authentication service, and the risk of a large-cluster startup overloading it.</p>
<ul>
<li>All get/renew/cancel operations should be considered idempotent.</li>
<li>Clients should retry with backoff &amp; jitter on recoverable connectivity failures, as sketched below.</li>
<li>Clients should fail fast on unrecoverable failures (DNS, authentication).</li>
</ul>
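<p>A minimal sketch of such a retry loop; <code>requestToken()</code> and <code>isRecoverable()</code> are hypothetical placeholders for the binding&#x2019;s own logic:</p>
<div class="source"><pre>
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public final class TokenRetry {

  public static &lt;T&gt; T withBackoff(Callable&lt;T&gt; requestToken,
      int attempts, long baseMillis) throws Exception {
    for (int i = 0; ; i++) {
      try {
        return requestToken.call();        // idempotent, so safe to repeat
      } catch (IOException e) {
        if (i + 1 &gt;= attempts || !isRecoverable(e)) {
          throw e;                         // fail fast: DNS, authentication
        }
        long backoff = baseMillis &lt;&lt; i;    // exponential backoff...
        long jitter = ThreadLocalRandom.current().nextLong(backoff + 1);
        Thread.sleep(backoff + jitter);    // ...with jitter to spread retries
      }
    }
  }

  private static boolean isRecoverable(IOException e) {
    return true;  // hypothetical: a real binding would inspect the cause
  }
}
</pre></div>
</section><section>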
<h3><a name="Scalability_limits_of_AWS_STS_service"></a><a name="scalability"></a> Scalability limits of AWS STS service</h3>
<p>There is currently no documented rate limit for token requests against the AWS STS service.</p>
<p>We have two tests which attempt to generate enough requests for delegation tokens that the AWS STS service will throttle requests for tokens by that AWS account for that specific STS endpoint (<code>ILoadTestRoleCredentials</code> and <code>ILoadTestSessionCredentials</code>).</p>
<p>In the initial results of these tests:</p>
<ul>
<li>A few hundred requests a second can be made before STS blocks the caller.</li>
<li>The throttling does not last very long (seconds).</li>
<li>It does not appear to affect any other STS endpoints.</li>
</ul>
<p>If developers wish to experiment with these tests and provide more detailed analysis, we would welcome this. Do bear in mind that all users of the same AWS account in that region will be throttled. Your colleagues may notice, especially if the applications they are running do not retry on throttle responses from STS (it&#x2019;s not a common occurrence after all&#x2026;).</p></section></section><section>
<h2><a name="Implementing_your_own_Delegation_Token_Binding"></a>Implementing your own Delegation Token Binding</h2>
<p>The DT binding mechanism is designed to be extensible: if you have an alternate authentication mechanism, such as an S3-compatible object store with Kerberos support &#x2014;S3A Delegation tokens should support it.</p>
<p><i>If it can&#x2019;t: that&#x2019;s a bug in the implementation which needs to be corrected</i>.</p><section>
<h3><a name="Steps"></a>Steps</h3>
<ol style="list-style-type: decimal">
<li>Come up with a token &#x201c;Kind&#x201d;; a unique name for the delegation token identifier.</li>
<li>Implement a subclass of <code>AbstractS3ATokenIdentifier</code> which adds all information which is marshalled from client to remote services. This must override the <code>Writable</code> methods to read and write the data to a data stream: these overrides must call the superclass methods first.</li>
<li>Add a resource <code>META-INF/services/org.apache.hadoop.security.token.TokenIdentifier</code>, listing in it the classname of your new identifier.</li>
<li>Implement a subclass of <code>AbstractDelegationTokenBinding</code></li>
</ol>
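<p>Putting those steps together, the shape of a binding is roughly as below. <code>MyTokenBinding</code>, <code>MyTokenIdentifier</code> and the token kind are hypothetical names, the two-argument superclass constructor is an assumption based on the bundled bindings, and the overridden methods are covered one by one in the following sections:</p>
<div class="source"><pre>
import java.io.IOException;
import java.util.Optional;

import org.apache.hadoop.fs.s3a.AWSCredentialProviderList;
import org.apache.hadoop.fs.s3a.auth.RoleModel;
import org.apache.hadoop.fs.s3a.auth.delegation.AbstractDelegationTokenBinding;
import org.apache.hadoop.fs.s3a.auth.delegation.AbstractS3ATokenIdentifier;
import org.apache.hadoop.fs.s3a.auth.delegation.EncryptionSecrets;
import org.apache.hadoop.io.Text;

public class MyTokenBinding extends AbstractDelegationTokenBinding {

  /** The new token "Kind". */
  public static final Text MY_TOKEN_KIND = new Text("ExampleDT/MyToken");

  public MyTokenBinding() {
    super("MyTokenBinding", MY_TOKEN_KIND);
  }

  @Override
  public AWSCredentialProviderList deployUnbonded() throws IOException {
    throw new UnsupportedOperationException("sketched in a later section");
  }

  @Override
  public AWSCredentialProviderList bindToTokenIdentifier(
      AbstractS3ATokenIdentifier id) throws IOException {
    throw new UnsupportedOperationException("sketched in a later section");
  }

  @Override
  public AbstractS3ATokenIdentifier createEmptyIdentifier() {
    return new MyTokenIdentifier();   // empty instance, for unmarshalling
  }

  @Override
  public AbstractS3ATokenIdentifier createTokenIdentifier(
      Optional&lt;RoleModel.Policy&gt; policy,
      EncryptionSecrets secrets) throws IOException {
    throw new UnsupportedOperationException("sketched in a later section");
  }
}
</pre></div>
</section><section>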
<h3><a name="Implementing_AbstractS3ATokenIdentifier"></a>Implementing <code>AbstractS3ATokenIdentifier</code></h3>
<p>Look at the other examples to see what to do; <code>SessionTokenIdentifier</code> does most of the work.</p>
<p>Having a <code>toString()</code> method which is informative is ideal for the <code>hdfs creds</code> command as well as debugging: <i>but do not print secrets</i>.</p>
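<p>The marshalling pattern, sketched with a single hypothetical <code>endpoint</code> field added on top of whatever the superclass stores, using the <code>org.apache.hadoop.io.Text</code> string helpers:</p>
<div class="source"><pre>
private String endpoint = "";

@Override
public void write(DataOutput out) throws IOException {
  super.write(out);                      // superclass fields first
  Text.writeString(out, endpoint);
}

@Override
public void readFields(DataInput in) throws IOException {
  super.readFields(in);                  // must mirror write() exactly
  endpoint = Text.readString(in, 1024);  // bounded read; see Security above
}
</pre></div>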
<p><i>Important</i>: add no references to any AWS SDK class, so that the token identifier can be safely deserialized even when the SDK is not on the classpath. Best practice: avoid any references to classes which may not be on the classpath of core Hadoop services, especially the YARN Resource Manager and Node Managers.</p></section><section>
<h3><a name="AWSCredentialProviderList_deployUnbonded.28.29"></a><code>AWSCredentialProviderList deployUnbonded()</code></h3>
<ol style="list-style-type: decimal">
<li>Perform all initialization needed on an &#x201c;unbonded&#x201d; deployment to authenticate with the store.</li>
<li>Return a list of AWS Credential providers which can be used to authenticate the caller.</li>
</ol>
<p><b>Tip</b>: consider <i>not</i> doing all the checks to verify that DTs can be issued. That can be postponed until a DT is actually requested: in any deployment where a DT is never needed, failing at this point is overkill. As an example, <code>RoleTokenBinding</code> cannot issue DTs if it only has a set of session credentials, but it will deploy without them, so allowing <code>hadoop fs</code> commands to work on an EC2 VM with IAM role credentials.</p>
<p><b>Tip</b>: The class <code>org.apache.hadoop.fs.s3a.auth.MarshalledCredentials</code> holds a set of marshalled credentials, and so can be used within your own token identifier if you want to include a set of full/session AWS credentials.</p>
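<p>A sketch of the unbonded path; <code>MyAuthServiceCredentialProvider</code> is a hypothetical provider which can authenticate without a DT (for example, via Kerberos), and the <code>getConfig()</code> accessor is assumed from the binding base class:</p>
<div class="source"><pre>
@Override
public AWSCredentialProviderList deployUnbonded() throws IOException {
  // initialize just enough to authenticate with the store;
  // leave "can we issue DTs?" checks until a token is actually requested
  AWSCredentialProviderList providers = new AWSCredentialProviderList();
  providers.add(new MyAuthServiceCredentialProvider(getConfig()));
  return providers;
}
</pre></div>
</section><section>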
<h3><a name="AWSCredentialProviderList_bindToTokenIdentifier.28AbstractS3ATokenIdentifier_id.29"></a><code>AWSCredentialProviderList bindToTokenIdentifier(AbstractS3ATokenIdentifier id)</code></h3>
<p>The identifier passed in will be the one for the current filesystem URI and of your token kind.</p>
<ol style="list-style-type: decimal">
<li>Use <code>convertTokenIdentifier</code> to cast it to your DT type, or fail with a meaningful <code>IOException</code>.</li>
<li>Extract the secrets needed to authenticate with the object store (or whatever service issues object store credentials).</li>
<li>Return a list of AWS Credential providers which can be used to authenticate the caller with the extracted secrets.</li>
</ol>
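<p>A sketch of that sequence; the <code>getMarshalledCredentials()</code> accessor and the <code>toProviderList()</code> helper are hypothetical:</p>
<div class="source"><pre>
@Override
public AWSCredentialProviderList bindToTokenIdentifier(
    AbstractS3ATokenIdentifier retrievedIdentifier) throws IOException {
  // 1. narrow to our own type, failing with a meaningful IOException otherwise
  MyTokenIdentifier id = convertTokenIdentifier(retrievedIdentifier,
      MyTokenIdentifier.class);
  // 2. extract whatever secrets were marshalled at the issuing end
  MarshalledCredentials creds = id.getMarshalledCredentials();
  // 3. hand back providers which authenticate with those secrets
  return toProviderList(creds);
}
</pre></div>
</section><section>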
<h3><a name="AbstractS3ATokenIdentifier_createEmptyIdentifier.28.29"></a><code>AbstractS3ATokenIdentifier createEmptyIdentifier()</code></h3>
<p>Return an empty instance of your token identifier.</p></section><section>
<h3><a name="AbstractS3ATokenIdentifier_createTokenIdentifier.28Optional.3CRoleModel.Policy.3E_policy.2C_EncryptionSecrets_secrets.29"></a><code>AbstractS3ATokenIdentifier createTokenIdentifier(Optional&lt;RoleModel.Policy&gt; policy, EncryptionSecrets secrets)</code></h3>
<p>Create the delegation token.</p>
<p>If non-empty, the <code>policy</code> argument contains an AWS policy model to grant access to:</p>
<ul>
<li>The target S3 bucket.</li>
<li>The <code>kms:GenerateDataKey</code> and <code>kms:Decrypt</code> permissions for all KMS keys.</li>
</ul>
<p>This can be converted to a string and passed to the AWS <code>assumeRole</code> operation.</p>
<p>The <code>secrets</code> argument contains encryption policy and secrets: this should be passed to the superclass constructor as is; it is retrieved and used to set the encryption policy on the newly created filesystem.</p>
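<p>A sketch of token issue, in which the identifier constructor arguments and the <code>requestCredentials()</code> call are hypothetical; the fixed points are that <code>secrets</code> is passed through as-is and that the origin field is seeded with the default message (see the tip below):</p>
<div class="source"><pre>
@Override
public AbstractS3ATokenIdentifier createTokenIdentifier(
    Optional&lt;RoleModel.Policy&gt; policy,
    EncryptionSecrets secrets) throws IOException {
  // hypothetical call to whatever service issues restricted credentials;
  // the policy, if present, can be converted to JSON and sent along
  MarshalledCredentials creds = requestCredentials(policy);
  return new MyTokenIdentifier(
      getCanonicalUri(),          // the filesystem URI (assumed accessor)
      creds,                      // marshalled secrets for the far end
      secrets,                    // encryption secrets: pass through as-is
      AbstractS3ATokenIdentifier.createDefaultOriginMessage());
}
</pre></div>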
<p><i>Tip</i>: Use <code>AbstractS3ATokenIdentifier.createDefaultOriginMessage()</code> to create an initial message for the origin of the token &#x2014;this is useful for diagnostics.</p><section>
<h4><a name="Token_Renewal"></a>Token Renewal</h4>
<p>There&#x2019;s no support in the design for token renewal; it would be very complex to make renewal pluggable, and, as none of the bundled mechanisms support renewal, it would be untestable and hard to justify.</p>
<p>Any token binding which wants to add renewal support will have to implement it directly.</p></section></section><section>
<h3><a name="Testing"></a>Testing</h3>
<p>Use the tests <code>org.apache.hadoop.fs.s3a.auth.delegation</code> as examples. You&#x2019;ll have to copy and paste some of the test base classes over; <code>hadoop-common</code>&#x2019;s test JAR is published to Maven Central, but not the S3A one (a fear of leaking AWS credentials).</p><section>
<h4><a name="Unit_Test_TestS3ADelegationTokenSupport"></a>Unit Test <code>TestS3ADelegationTokenSupport</code></h4>
<p>This tests marshalling and unmarshalling of token identifiers. <i>Test that every field is preserved.</i></p>
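<p>A round-trip check of this form catches a <code>write()</code>/<code>readFields()</code> mismatch immediately; <code>MyTokenIdentifier</code> and its <code>endpoint</code> accessors are hypothetical, while the buffer classes are standard Hadoop test utilities:</p>
<div class="source"><pre>
import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class TestMyTokenIdentifier {

  @Test
  public void testRoundTrip() throws Exception {
    MyTokenIdentifier id = new MyTokenIdentifier();  // hypothetical identifier
    id.setEndpoint("https://example.org");           // hypothetical field

    DataOutputBuffer out = new DataOutputBuffer();
    id.write(out);                                   // marshall

    MyTokenIdentifier roundTripped = new MyTokenIdentifier();
    DataInputBuffer in = new DataInputBuffer();
    in.reset(out.getData(), out.getLength());
    roundTripped.readFields(in);                     // unmarshall

    // check every field survives the round trip
    assertEquals(id.getEndpoint(), roundTripped.getEndpoint());
  }
}
</pre></div>
</section><section>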
<h4><a name="Integration_Test_ITestSessionDelegationTokens"></a>Integration Test <code>ITestSessionDelegationTokens</code></h4>
<p>Tests the lifecycle of session tokens.</p></section><section>
<h4><a name="Integration_Test_ITestSessionDelegationInFileystem."></a>Integration Test <code>ITestSessionDelegationInFileystem</code>.</h4>
<p>This collects DTs from one filesystem, and uses them to create a new FS instance which then performs filesystem operations. A miniKDC is instantiated.</p>
<ul>
<li>Take care to remove all login secrets from the environment, so as to make sure that the second instance is picking up the DT information.</li>
<li><code>UserGroupInformation.reset()</code> can be used to reset user secrets after every test case (e.g. teardown), so that issued DTs from one test case do not contaminate the next.</li>
<li>Its subclass, <code>ITestRoleDelegationInFileystem</code>, adds a check that the current credentials in the DT cannot be used to access data on other buckets; that is, the active session really is restricted to the target bucket.</li>
</ul>
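<p>The reset can go in the teardown; a pattern along these lines, where the test base class name is hypothetical:</p>
<div class="source"><pre>
import org.apache.hadoop.security.UserGroupInformation;
import org.junit.After;

public class MyDelegationTestBase {

  @After
  public void resetUGI() {
    // discard any credentials/tokens this test case logged in with,
    // so issued DTs cannot contaminate the next test case
    UserGroupInformation.reset();
  }
}
</pre></div>
</section><section>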
<h4><a name="Integration_Test_ITestDelegatedMRJob"></a>Integration Test <code>ITestDelegatedMRJob</code></h4>
<p>It&#x2019;s not easy to bring up a YARN cluster with a secure HDFS and miniKDC controller in test cases. This test, the closest there is to an end-to-end test, mocks the RPC calls to the YARN AM, then verifies that the tokens have been collected in the job context.</p></section><section>
<h4><a name="Load_Test_ILoadTestSessionCredentials"></a>Load Test <code>ILoadTestSessionCredentials</code></h4>
<p>This attempts to collect many, many delegation tokens simultaneously and sees what happens.</p>
<p>Worth doing if you have a new authentication service provider, or are implementing custom DT support. Consider also a matching load test for going from a DT to AWS credentials, if your service implements that exchange. This is left as an exercise for the developer.</p>
<p><b>Tip</b>: don&#x2019;t go overboard here, especially against AWS itself.</p></section></section></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2023
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>