<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2023-03-25
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 3.4.0-SNAPSHOT &#x2013; Manifest Committer Architecture</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20230325" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2023-03-25
&nbsp;| Version: 3.4.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Manifest Committer Architecture</h1>
<p>This document describes the architecture and other implementation/correctness aspects of the <a href="manifest_committer.html">Manifest Committer</a>.</p>
<p>The protocol and its correctness are covered in <a href="manifest_committer_protocol.html">Manifest Committer Protocol</a>.</p>
<ul>
<li><a href="#Background">Background</a>
<ul>
<li><a href="#Terminology">Terminology</a></li></ul></li>
<li><a href="#The_Manifest_Committer:_A_high_performance_committer_for_Spark_on_Azure_and_Google_storage.">The Manifest Committer: A high performance committer for Spark on Azure and Google storage.</a>
<ul>
<li><a href="#The_Manifest">The Manifest</a></li>
<li><a href="#Task_Commit">Task Commit</a></li>
<li><a href="#Job_Commit">Job Commit</a></li>
<li><a href="#Ancestor_directory_preparation">Ancestor directory preparation</a></li>
<li><a href="#Parent_directory_creation">Parent directory creation</a></li>
<li><a href="#File_Rename">File Rename</a></li>
<li><a href="#Validation">Validation</a></li></ul></li>
<li><a href="#Benefits">Benefits</a>
<ul>
<li><a href="#Disadvantages">Disadvantages</a></li>
<li><a href="#Constraints">Constraints</a></li></ul></li>
<li><a href="#Implementation_Architecture">Implementation Architecture</a>
<ul>
<li><a href="#a"></a></li></ul></li>
<li><a href="#Auditing">Auditing</a></li></ul>
<p>The <i>Manifest</i> committer is a committer for work which provides performance on ABFS for &#x201c;real world&#x201d; queries, and performance and correctness on GCS.</p>
<p>This committer uses the extension point introduced for the S3A committers. Users can declare a new committer factory for <code>abfs://</code> and <code>gs://</code> URLs. It can be used through Hadoop MapReduce and Apache Spark.</p><section>
<h2><a name="Background"></a>Background</h2><section>
<h3><a name="Terminology"></a>Terminology</h3>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> Term </th>
<th> Meaning</th></tr>
</thead><tbody>
<tr class="b">
<td> Committer </td>
<td> A class which can be invoked by MR/Spark to perform the task and job commit operations. </td></tr>
<tr class="a">
<td> Spark Driver </td>
<td> The Spark process scheduling the work and choreographing the commit operation.</td></tr>
<tr class="b">
<td> Job </td>
<td> In MapReduce, the entire application. In Spark, a single stage in a chain of work. </td></tr>
<tr class="a">
<td> Job Attempt </td>
<td> A single attempt at a job. MR supports multiple Job attempts with recovery on partial job failure. Spark says &#x201c;start again from scratch&#x201d; </td></tr>
<tr class="b">
<td> Task </td>
<td> A subsection of a job, such as processing one file, or one part of a file. </td></tr>
<tr class="a">
<td> Task ID </td>
<td> ID of the task, unique within this job. Usually starts at 0 and is used in filenames (part-0000, part-0001, etc.) </td></tr>
<tr class="b">
<td> Task attempt (TA) </td>
<td> An attempt to perform a task. It may fail, in which case MR/Spark will schedule another. </td></tr>
<tr class="a">
<td> Task Attempt ID </td>
<td> A unique ID for the task attempt. The Task ID + an attempt counter.</td></tr>
<tr class="b">
<td> Destination directory </td>
<td> The final destination of work.</td></tr>
<tr class="a">
<td> Job Attempt Directory </td>
<td> A temporary directory used by the job attempt. This is always <i>underneath</i> the destination directory, so as to ensure it is in the same encryption zone in HDFS, the same storage volume in other filesystems, etc.</td></tr>
<tr class="b">
<td> Task Attempt directory </td>
<td> (also known as &#x201c;Task Attempt Working Directory&#x201d;). Directory exclusive to each task attempt, under which its files are written. </td></tr>
<tr class="a">
<td> Task Commit </td>
<td> Taking the output of a Task Attempt and making it the final/exclusive result of that &#x201c;successful&#x201d; Task.</td></tr>
<tr class="b">
<td> Job Commit </td>
<td> Aggregating all the outputs of all committed tasks and producing the final results of the job. </td></tr>
</tbody>
</table>
<p>The purpose of a committer is to ensure that the complete output of a job ends up in the destination, even in the presence of failures of tasks.</p>
<ul>
<li><i>Complete:</i> the output includes the work of all successful tasks.</li>
<li><i>Exclusive:</i> the output of unsuccessful tasks is not present.</li>
<li><i>Concurrent:</i> When multiple tasks are committed in parallel the output is the same as when the task commits are serialized. This is not a requirement of Job Commit.</li>
<li><i>Abortable:</i> jobs and tasks may be aborted prior to job commit, after which their output is not visible.</li>
<li><i>Continuity of correctness:</i> once a job is committed, the output of any failed, aborted, or unsuccessful task MUST NOT appear at some point in the future.</li>
</ul>
<p>For Hive&#x2019;s classic hierarchical-directory-structured tables, job committing requires the output of all committed tasks to be put into the correct location in the directory tree.</p>
<p>The committer built into the <code>hadoop-mapreduce-client-core</code> module is the <code>FileOutputCommitter</code>.</p></section></section><section>
<h2><a name="The_Manifest_Committer:_A_high_performance_committer_for_Spark_on_Azure_and_Google_storage."></a>The Manifest Committer: A high performance committer for Spark on Azure and Google storage.</h2>
<p>The Manifest Committer is a higher performance committer for ABFS and GCS storage for jobs which create files across deep directory trees through many tasks.</p>
<p>It will also work on <code>hdfs://</code> and indeed, <code>file://</code> URLs, but it is optimized to address listing and renaming performance and throttling issues in cloud storage.</p>
<p>It <i>will not</i> work correctly with S3, because it relies on an atomic rename-no-overwrite operation to commit the manifest file. It will also have the performance problems of copying rather than moving all the generated data.</p>
<p>Although it will work with MapReduce, there is no handling of multiple job attempts with recovery from previous failed attempts.</p><section>
<h3><a name="The_Manifest"></a>The Manifest</h3>
<p>The manifest file contains (along with IOStatistics and some other things):</p>
<ol style="list-style-type: decimal">
<li>A list of destination directories which must be created if they do not exist.</li>
<li>A list of files to rename, recorded as (absolute source, absolute destination, file-size) entries.</li>
</ol></section><section>
<h3><a name="Task_Commit"></a>Task Commit</h3>
<p>Task attempts are committed by:</p>
<ol style="list-style-type: decimal">
<li>Recursively listing the task attempt working dir to build:
<ul>
<li>A list of directories under which files are renamed.</li>
<li>A list of files to rename: source, destination, size and, optionally, etag.</li></ul></li>
<li>Saving this information in a manifest file in the job attempt directory with a filename derived from the Task ID. Note: writing to a temp file and then renaming to the final path will be used to ensure the manifest creation is atomic.</li>
</ol>
<p>No renaming takes place: the files are left in their original location.</p>
<p>The directory treewalk is single-threaded, so it is <code>O(directories)</code>, with each directory listing using one or more paged LIST calls.</p>
<p>This is simple, and for most tasks, the scan is off the critical path of the job.</p>
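<p>The task commit steps above can be sketched as follows. This is a minimal illustration only, not the Hadoop implementation: it uses <code>java.nio.file</code> on a local filesystem as a stand-in for the Hadoop <code>FileSystem</code> API, and the class names, the tab-separated manifest layout and the <code>-manifest.txt</code> suffix are all hypothetical.</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration of task commit; names and file format are assumptions. */
final class TaskCommitSketch {

  /** One file to rename at job commit: (absolute source, absolute destination, size). */
  static final class FileEntry {
    final Path source;
    final Path dest;
    final long size;

    FileEntry(Path source, Path dest, long size) {
      this.source = source;
      this.dest = dest;
      this.size = size;
    }
  }

  /** In-memory manifest: destination directories plus the files to rename. */
  static final class Manifest {
    final List<Path> destDirs = new ArrayList<>();
    final List<FileEntry> files = new ArrayList<>();
  }

  /** Single-threaded treewalk of the task attempt dir; nothing is renamed here. */
  static Manifest buildManifest(Path taskAttemptDir, Path destDir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(taskAttemptDir)) {
      paths = walk.collect(Collectors.toList());
    }
    Manifest manifest = new Manifest();
    for (Path p : paths) {
      if (p.equals(taskAttemptDir)) {
        continue; // the root of the walk maps to the destination itself
      }
      Path dest = destDir.resolve(taskAttemptDir.relativize(p));
      if (Files.isDirectory(p)) {
        manifest.destDirs.add(dest);
      } else {
        manifest.files.add(new FileEntry(p, dest, Files.size(p)));
      }
    }
    return manifest;
  }

  /** Write to a temp file, then rename, so a reader never sees a partial manifest. */
  static Path save(Manifest m, Path jobAttemptDir, String taskId) throws IOException {
    StringBuilder sb = new StringBuilder();
    for (Path d : m.destDirs) {
      sb.append("dir\t").append(d).append('\n');
    }
    for (FileEntry f : m.files) {
      sb.append("file\t").append(f.source).append('\t')
          .append(f.dest).append('\t').append(f.size).append('\n');
    }
    Path tmp = jobAttemptDir.resolve(taskId + "-manifest.tmp");
    Path fin = jobAttemptDir.resolve(taskId + "-manifest.txt");
    Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
    return Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

<p>A task attempt would call <code>buildManifest()</code> and then <code>save()</code>; the generated files stay where they were written until job commit processes the manifest.</p>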
<p>Statistics analysis may justify moving to parallel scans in future.</p></section><section>
<h3><a name="Job_Commit"></a>Job Commit</h3>
<p>Job Commit consists of:</p>
<ol style="list-style-type: decimal">
<li>List all manifest files in the job attempt directory.</li>
<li>Load each manifest file, create directories which do not yet exist, then rename each file in the rename list.</li>
<li>Save a JSON <code>_SUCCESS</code> file with the same format as the S3A committer (for testing; use write and rename for atomic save)</li>
</ol>
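<p>A sequential sketch of these three steps is shown below. It is an illustration under stated assumptions rather than the actual committer: the local filesystem stands in for the object store, the manifest is assumed to be a hypothetical tab-separated text file with <code>dir</code> and <code>file</code> lines in files named <code>*-manifest.txt</code>, and the <code>_SUCCESS</code> body is an empty JSON placeholder instead of the S3A-compatible format.</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of job commit; names and file formats are assumptions. */
final class JobCommitSketch {

  /** Commit the job and return the number of files renamed into place. */
  static int commitJob(Path jobAttemptDir, Path destDir) throws IOException {
    Files.createDirectories(destDir);

    // 1. List all manifest files in the job attempt directory.
    List<Path> manifests = new ArrayList<>();
    try (DirectoryStream<Path> listing =
        Files.newDirectoryStream(jobAttemptDir, "*-manifest.txt")) {
      for (Path p : listing) {
        manifests.add(p);
      }
    }

    // 2. Load each manifest, create missing directories, then rename each file.
    int renamed = 0;
    for (Path manifest : manifests) {
      List<String> lines = Files.readAllLines(manifest, StandardCharsets.UTF_8);
      for (String line : lines) {
        String[] fields = line.split("\t");
        if (fields[0].equals("dir")) {
          Files.createDirectories(Paths.get(fields[1])); // no-op if it already exists
        }
      }
      for (String line : lines) {
        String[] fields = line.split("\t");
        if (fields[0].equals("file")) {
          // Without REPLACE_EXISTING this fails if the destination already exists.
          Files.move(Paths.get(fields[1]), Paths.get(fields[2]));
          renamed++;
        }
      }
    }

    // 3. Save the success marker with write-then-rename for an atomic save.
    Path tmp = destDir.resolve("_SUCCESS.tmp");
    Files.write(tmp, "{}".getBytes(StandardCharsets.UTF_8)); // placeholder content
    Files.move(tmp, destDir.resolve("_SUCCESS"), StandardCopyOption.ATOMIC_MOVE);
    return renamed;
  }
}
```

<p>Everything here runs in the calling thread to keep the control flow visible; the parallel form is a matter of queuing the same operations into thread pools.</p>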
<p>The job commit phase supports parallelization for many tasks and many files per task, specifically:</p>
<ol style="list-style-type: decimal">
<li>Manifest tasks are loaded and processed in a pool of &#x201c;manifest processor&#x201d; threads.</li>
<li>Directory creation and file rename operations are each processed in a pool of &#x201c;executor&#x201d; threads: many renames can execute in parallel as they use minimal network IO.</li>
<li>Job cleanup can parallelize deletion of task attempt directories. This is relevant as directory deletion is <code>O(files)</code> on Google cloud storage, and also on ABFS when OAuth authentication is used.</li>
</ol></section><section>
<h3><a name="Ancestor_directory_preparation"></a>Ancestor directory preparation</h3>
<p>Optional scan of all ancestor paths: if any are files, delete them.</p></section><section>
<h3><a name="Parent_directory_creation"></a>Parent directory creation</h3>
<ol style="list-style-type: decimal">
<li>Probe the shared directory map for the directory. If found: the operation is complete.</li>
<li>If the map has no entry for the path, call <code>getFileStatus()</code> on the path.
<ul>
<li>Not found: create the directory, then add its entry and those of all parent paths.</li>
<li>Found and is a directory: add its entry and those of all parent paths.</li>
<li>Found and is a file: delete it, then create the directory as before.</li></ul></li>
</ol>
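<p>The probe-then-create logic above can be sketched as follows. This is a hypothetical illustration: <code>DirectoryMapSketch</code>, its <code>probes</code> counter and the <code>destRoot</code> bound are assumptions made for the example, and <code>java.nio.file.Files</code> on a local filesystem stands in for <code>getFileStatus()</code> and <code>mkdirs()</code> on the store.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical shared record of directories known to exist under the destination. */
final class DirectoryMapSketch {
  private final Path destRoot;
  private final Set<Path> known = ConcurrentHashMap.newKeySet();

  /** Count of filesystem probes issued; only here to show what the map saves. */
  final AtomicInteger probes = new AtomicInteger();

  DirectoryMapSketch(Path destRoot) {
    this.destRoot = destRoot;
  }

  /** Ensure that dir exists as a directory, deleting a file found at that path. */
  void ensureDirectory(Path dir) throws IOException {
    if (known.contains(dir)) {
      return; // found in the map: the operation is complete
    }
    probes.incrementAndGet();
    if (Files.isRegularFile(dir)) {
      Files.deleteIfExists(dir); // found and is a file: delete, then create as before
    }
    Files.createDirectories(dir); // no-op when the directory already exists
    // Add the entry and those of all parent paths up to the destination root.
    for (Path p = dir; p != null && p.startsWith(destRoot); p = p.getParent()) {
      known.add(p);
    }
  }
}
```

<p>Two threads may still probe the same path at the same moment; in this sketch that costs a redundant call rather than correctness, because directory creation is idempotent.</p>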
<p>Efficiently handling concurrent creation of directories (or delete+create) is going to be a troublespot; some effort is invested there to build the set of directories to create.</p></section><section>
<h3><a name="File_Rename"></a>File Rename</h3>
<p>Files are renamed in parallel.</p>
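<p>A minimal sketch of parallel renaming is shown below. It assumes a fixed-size pool whose lifetime is limited to the one call, and it follows the error handling proposed later in this document: a failure is caught, stored, used to skip work still queued, and rethrown at the end. The class and method names are hypothetical, and <code>Files.move()</code> on a local filesystem stands in for the store's rename.</p>

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration of parallel rename with abort-on-first-failure. */
final class ParallelRenameSketch {

  /** Rename each source (key) to its destination (value); return the count renamed. */
  static int renameAll(Map<Path, Path> renames, int threads) throws IOException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicBoolean abort = new AtomicBoolean(false);
    AtomicReference<IOException> firstFailure = new AtomicReference<>();
    AtomicInteger renamed = new AtomicInteger();
    try {
      for (Map.Entry<Path, Path> e : renames.entrySet()) {
        pool.execute(() -> {
          if (abort.get()) {
            return; // skip queued work once any rename has failed
          }
          try {
            Files.move(e.getKey(), e.getValue()); // fails if the destination exists
            renamed.incrementAndGet();
          } catch (IOException ex) {
            firstFailure.compareAndSet(null, ex); // caught and stored
            abort.set(true);                      // trigger the abort
          }
        });
      }
      pool.shutdown(); // accept no new work, let queued tasks drain
      if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
        throw new IOException("timed out waiting for renames");
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted while renaming");
    } finally {
      pool.shutdownNow(); // the pool never outlives this call
    }
    IOException failure = firstFailure.get();
    if (failure != null) {
      throw failure; // rethrown at the end of the operation
    }
    return renamed.get();
  }
}
```

<p>Renames already in flight when the flag is set still complete; only work not yet started is skipped, so a caller must treat a failed commit as having partially populated the destination.</p>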
<p>A pre-rename check for anything being at that path (and deleting it) will be optional. As Spark creates a new UUID for each file, a conflict is not expected, and skipping the check saves HTTP requests.</p></section><section>
<h3><a name="Validation"></a>Validation</h3>
<p>Optional scan of all committed files to verify their length and, if known, etag. For testing and diagnostics.</p></section></section><section>
<h2><a name="Benefits"></a>Benefits</h2>
<ul>
<li>Pushes the source tree list operations into the task commit phase, which is generally off the critical path of execution.</li>
<li>Provides an atomic task commit to GCS, as there is no expectation that directory rename is atomic.</li>
<li>IOStatistics can be passed from workers in the manifest.</li>
<li>Allows for some pre-rename operations similar to the S3A &#x201c;Partitioned Staging committer&#x201d;. This can be configured to delete all existing entries in directories scheduled to be created, or to fail if those partitions are non-empty. See <a href="../../hadoop-aws/tools/hadoop-aws/committers.html#The_.E2.80.9CPartitioned.E2.80.9D_Staging_Committer">Partitioned Staging Committer</a>.</li>
<li>Allows for an optional preflight validation check (verify no duplicate files created by different tasks)</li>
<li>Manifests can be viewed, size of output determined, etc, during development/debugging.</li>
</ul><section>
<h3><a name="Disadvantages"></a>Disadvantages</h3>
<ul>
<li>Needs a new manifest file format.</li>
<li>May make task commit more complex.</li>
</ul>
<p>This solution is necessary for GCS and should be beneficial on ABFS as listing overheads are paid for in the task committers.</p>
<h1>Implementation Details</h1></section><section>
<h3><a name="Constraints"></a>Constraints</h3>
<p>A key goal is to keep the manifest committer isolated and neither touch the existing committer code nor other parts of the hadoop codebase.</p>
<p>It must plug directly into MR and Spark without needing any changes other than those already implemented for the S3A Committers.</p>
<ul>
<li>Self-contained: MUST NOT require changes to hadoop-common, etc.</li>
<li>Isolated: MUST NOT make changes to existing committers</li>
<li>Integrated: MUST bind via <code>PathOutputCommitterFactory</code>.</li>
</ul>
<p>As a result of this there&#x2019;s a bit of copy and paste from elsewhere, e.g. <code>org.apache.hadoop.util.functional.TaskPool</code> is based on S3ACommitter&#x2019;s <code>org.apache.hadoop.fs.s3a.commit.Tasks</code>.</p>
<p>The <code>_SUCCESS</code> file MUST be compatible with the S3A JSON file. This is to ensure any existing test suites which validate S3A committer output can be retargeted at jobs executed by the manifest committer without any changes.</p><section>
<h4><a name="Progress_callbacks_in_job_commit."></a>Progress callbacks in job commit.</h4>
<p>When? Proposed: heartbeat until renaming finally finishes.</p></section><section>
<h4><a name="Error_handling_and_aborting_in_job_commit."></a>Error handling and aborting in job commit.</h4>
<p>We would want to stop the entire job commit. Some atomic boolean &#x201c;abort job&#x201d; would need to be checked in the processing of each task committer thread&#x2019;s iteration through a directory (or processing of each file?). Failures in listing or renaming will need to be escalated to halting the entire job commit. This implies that any <code>IOException</code> raised in an asynchronous rename operation or in a task committer thread must:</p>
<ol style="list-style-type: decimal">
<li>be caught</li>
<li>be stored in a shared field/variable</li>
<li>trigger the abort</li>
<li>be rethrown at the end of the <code>commitJob()</code> call</li>
</ol></section><section>
<h4><a name="Avoiding_deadlocks"></a>Avoiding deadlocks</h4>
<p>If a job commit stage is using a thread pool for per-task operations, e.g. loading files, that same thread pool MUST NOT be used for parallel operations within the per-task stage.</p>
<p>As every <code>JobStage</code> is executed in sequence within task or job commit, it is safe to share the same thread pool across stages.</p>
<p>In the current implementation, the only parallel &#x201c;per manifest&#x201d; operation in job commit is the loading of the manifest files. The operations to create directories and to rename files are performed without processing individual manifests in parallel.</p>
<p>Directory Preparation: merge the directory lists of all manifests, then queue for creation the (hopefully very much smaller) set of unique directories.</p>
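<p>Merging the directory lists amounts to building a set. A minimal sketch, assuming directories are represented as plain strings and using the hypothetical class <code>DirectoryMergeSketch</code>:</p>

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of merging per-manifest directory lists. */
final class DirectoryMergeSketch {

  /**
   * Merge the directory lists of all manifests into one set, so that each
   * directory is queued for creation once however many tasks wrote to it.
   * A sorted set is used here so that parents sort ahead of their children.
   */
  static SortedSet<String> uniqueDirectories(List<List<String>> perManifestDirs) {
    SortedSet<String> unique = new TreeSet<>();
    for (List<String> dirs : perManifestDirs) {
      unique.addAll(dirs);
    }
    return unique;
  }
}
```

<p>If three manifests each list <code>year=2023</code> and one of its two child directories, only three entries remain to be created rather than six.</p>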
<p>Rename: iterate through all manifests and queue their renames into a pool for renaming.</p></section><section>
<h4><a name="Thread_pool_lifetimes"></a>Thread pool lifetimes</h4>
<p>The lifespan of thread pools is constrained to that of the stage configuration, which will be limited to within each of the <code>PathOutputCommitter</code> methods to setup, commit, abort and cleanup.</p>
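<p>One way to bound a pool&#x2019;s lifetime to a single method call is try-with-resources. The sketch below is an assumption about how such scoping could look, not the committer&#x2019;s implementation; <code>ScopedPoolSketch</code> and its methods are hypothetical names.</p>

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a stage configuration which owns a pool. */
final class ScopedPoolSketch implements AutoCloseable {

  private final ExecutorService pool;

  ScopedPoolSketch(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Run work in the pool and wait for its result. */
  <T> T submitAndWait(Callable<T> work) throws IOException {
    try {
      return pool.submit(work).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while waiting for stage work", e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }

  boolean isShutdown() {
    return pool.isShutdown();
  }

  /** Closing the configuration stops the pool. */
  @Override
  public void close() {
    pool.shutdownNow();
  }

  /** The pool exists only for the duration of this call. */
  static String commitJob() throws IOException {
    try (ScopedPoolSketch stageConfig = new ScopedPoolSketch(2)) {
      return stageConfig.submitAndWait(() -> "committed");
    }
  }
}
```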
<p>This avoids the thread pool lifecycle problems of the S3A Committers.</p></section><section>
<h4><a name="Scale_issues_similar_to_S3A_HADOOP-16570."></a>Scale issues similar to S3A HADOOP-16570.</h4>
<p>This was a failure in terasorting where many tasks each generated many files; the full list of files to commit (and the etag of every block) was built up in memory and validated prior to execution.</p>
<p>The manifest committer assumes that the amount of data being stored in memory is less, because there is no longer the need to store an etag for every block of every file being committed.</p></section><section>
<h4><a name="Duplicate_creation_of_directories_in_the_dest_dir"></a>Duplicate creation of directories in the dest dir</h4>
<p>Combine all lists of directories to create and eliminate duplicates.</p></section></section></section><section>
<h2><a name="Implementation_Architecture"></a>Implementation Architecture</h2>
<p>The implementation architecture reflects lessons from the S3A Connector.</p>
<ul>
<li>Isolate the commit stages from the MR commit class, as that&#x2019;s got a complex lifecycle.</li>
<li>Instead, break up into a series of <i>stages</i> which can be tested in isolation and chained to provide the final protocol.</li>
<li>Don&#x2019;t pass MR data types (taskID etc.) down into the stages; pass down a configuration with general types (string etc.).</li>
<li>Also pass in a callback for store operations, for ease of implementing a fake store.</li>
<li>For each stage: define preconditions and postconditions, failure modes. Test in isolation.</li>
</ul><section><section>
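<p>The points above can be made concrete with a small sketch of a stage which sees only a store callback and plain types. Everything here is hypothetical: <code>StageSketch</code>, <code>StoreOps</code>, <code>FakeStore</code> and <code>renameStage()</code> are illustrative names, and paths are modelled as strings.</p>

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a stage isolated from MR types and the real store. */
final class StageSketch {

  /** Callback for store operations; a real binding would wrap a filesystem. */
  interface StoreOps {
    boolean exists(String path) throws IOException;

    void rename(String source, String dest) throws IOException;
  }

  /** In-memory fake store, for testing a stage in isolation. */
  static final class FakeStore implements StoreOps {
    final Set<String> paths = new TreeSet<>();

    @Override
    public boolean exists(String path) {
      return paths.contains(path);
    }

    @Override
    public void rename(String source, String dest) throws IOException {
      if (!paths.remove(source)) {
        throw new FileNotFoundException(source);
      }
      paths.add(dest);
    }
  }

  /** Stage configuration: general types only, no MR data types. */
  static final class StageConfig {
    final String jobId;
    final StoreOps store;

    StageConfig(String jobId, StoreOps store) {
      this.jobId = jobId;
      this.store = store;
    }
  }

  /** A stage which renames one file, checking its pre- and postconditions. */
  static String renameStage(StageConfig config, String source, String dest)
      throws IOException {
    // Precondition: the source must exist.
    if (!config.store.exists(source)) {
      throw new FileNotFoundException("job " + config.jobId + ": " + source);
    }
    config.store.rename(source, dest);
    // Postcondition: the destination must now exist.
    if (!config.store.exists(dest)) {
      throw new IOException("job " + config.jobId + ": rename lost " + dest);
    }
    return dest;
  }
}
```

<p>Because the stage takes its store through the callback, a test can populate a <code>FakeStore</code>, run the stage, and assert on both the success path and the missing-source failure mode without any cluster.</p></section><section>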
<h4><a name="Statistics"></a>Statistics</h4>
<p>The committer collects duration statistics on all the operations it performs/invokes against filesystems.</p>
<ul>
<li>Those collected during task commit are saved to the manifest (excluding the time to save and rename that file).</li>
<li>When these manifests are loaded during job commit, these statistics are merged to form aggregate statistics of the whole job.</li>
<li>The aggregate statistics are saved to the <code>_SUCCESS</code> file, and to any copy of that file in the directory specified by <code>mapreduce.manifest.committer.summary.report.directory</code>, if set.</li>
<li>The class <code>org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.ManifestPrinter</code> can load and print these.</li>
</ul>
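<p>The merge of per-task statistics into job-wide statistics can be sketched as follows. This is a simplified model, assuming each statistic is reduced to an operation count and a total duration; the class <code>DurationStatsSketch</code> and its <code>Summary</code> type are hypothetical and are not the statistics classes used by the committer.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of aggregating duration statistics across tasks. */
final class DurationStatsSketch {

  /** Count and total duration of one kind of operation. */
  static final class Summary {
    final long count;
    final long totalMillis;

    Summary(long count, long totalMillis) {
      this.count = count;
      this.totalMillis = totalMillis;
    }

    /** Combine two summaries of the same operation. */
    Summary plus(Summary other) {
      return new Summary(count + other.count, totalMillis + other.totalMillis);
    }

    long meanMillis() {
      return count == 0 ? 0 : totalMillis / count;
    }
  }

  /** Merge the statistics loaded from each task manifest into job totals. */
  static Map<String, Summary> aggregate(List<Map<String, Summary>> perTaskStats) {
    Map<String, Summary> job = new TreeMap<>();
    for (Map<String, Summary> task : perTaskStats) {
      // Map.merge() inserts the value if the key is new, else combines them.
      task.forEach((operation, summary) ->
          job.merge(operation, summary, Summary::plus));
    }
    return job;
  }
}
```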
<p>IO statistics from filesystems and input and output streams used in a query are not collected.</p></section></section></section><section>
<h2><a name="Auditing"></a>Auditing</h2>
<p>When invoking the <code>ManifestCommitter</code> via the <code>PathOutputCommitter</code> API, the following attributes are added to the active (thread) context:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> Key </th>
<th> Value </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>ji</code> </td>
<td> Job ID </td></tr>
<tr class="a">
<td> <code>tai</code> </td>
<td> Task Attempt ID </td></tr>
<tr class="b">
<td> <code>st</code> </td>
<td> Stage </td></tr>
</tbody>
</table>
<p>These are also all set in all the helper threads performing work as part of a stage&#x2019;s execution.</p>
<p>Any store/FS which supports auditing is able to collect this data and include it in their logs.</p>
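<p>How per-thread attributes reach helper threads can be shown with a self-contained sketch. It uses a plain <code>ThreadLocal</code> map rather than Hadoop&#x2019;s audit context, so the class <code>AuditContextSketch</code> and its methods are assumptions made for illustration; only the keys <code>ji</code>, <code>tai</code> and <code>st</code> come from the table above.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical sketch of propagating thread-context attributes to workers. */
final class AuditContextSketch {

  /** Per-thread attributes, standing in for the active audit context. */
  private static final ThreadLocal<Map<String, String>> CONTEXT =
      ThreadLocal.withInitial(HashMap::new);

  static void put(String key, String value) {
    CONTEXT.get().put(key, value);
  }

  /** Copy of the calling thread's attributes. */
  static Map<String, String> snapshot() {
    return new HashMap<>(CONTEXT.get());
  }

  /**
   * Wrap work so that the submitting thread's attributes are set in the
   * helper thread while the work runs, and restored afterwards.
   */
  static <T> Callable<T> withContext(Callable<T> work) {
    Map<String, String> captured = snapshot();   // taken on the caller thread
    return () -> {
      Map<String, String> previous = snapshot(); // taken on the helper thread
      CONTEXT.get().putAll(captured);
      try {
        return work.call();
      } finally {
        CONTEXT.set(previous);
      }
    };
  }
}
```

<p>A stage would set <code>ji</code>, <code>tai</code> and <code>st</code> once on its own thread and wrap each piece of work with <code>withContext()</code> before submitting it to the pool.</p>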
<p>To ease backporting, all audit integration is in the single class <code>org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.AuditingIntegration</code>.</p></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2023
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>